Delete orphaned field_collection_item entities when they are (programatically) removed from the host entity

Created on 13 June 2011, about 13 years ago
Updated 14 May 2023, about 1 year ago

If someone loads an entity, deletes (i.e unsets) one (of the many) values of it's field_collection field, and then saves that entity, that value of that field is deleted but behind-the-scenes the field_collection_item entity that corresponded to that value is never deleted and stays around forever.

This would appear to be a perfectly valid way of deleting the value of a field in Drupal, and so field_collection needs to make sure to detect this situation and also delete everything else associated with that value - namely the associated field_collection_item entity.

This is exactly the sort of behavior core file.module does to ensure that it can delete files when they are no longer referenced by any entities (see modules/file/file.field.inc: file_field_update())

I came across this issue when importing field_collection_items with feeds β†’ , since before an 'update' import, feeds simply unsets all current values of a field in preparation for them to be replaced by the import. We have found that our field_collection_item table was getting out of hand.

Patch to follow..

πŸ“Œ Task
Status

Closed: duplicate

Version

1.0

Component

Code

Created by

πŸ‡¬πŸ‡§United Kingdom jamsilver

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • πŸ‡«πŸ‡·France jvieille

    I am surprise that this issue was never seriously addressed.
    My field_collection_item contains 16579 records, with 15562 orphans among them - somewhat 94% garbage
    #22 provides useful information to develop a cleanup function.

    As I was puzzled by the SQL query, I asked ChatGPT to get me on :

    Can you explain this sql query : SELECT item_id
    FROM field_collection_item field_collection_item
    LEFT JOIN field_data_field_srelation field_data_field_srelation ON field_collection_item.item_id = field_data_field_srelation.field_srelation_value
    LEFT JOIN node field_srelation_field_collection_item ON field_data_field_srelation.entity_id = field_srelation_field_collection_item.nid
    WHERE (( (field_srelation_field_collection_item.nid IS NULL ) ))
    ChatGPT
    This SQL query is selecting the item_id column from the field_collection_item table, but with certain conditions specified using LEFT JOIN statements and a WHERE clause. Here's a breakdown of each part of the query:

    SELECT item_id: This part of the query specifies which column(s) to select from the table(s) being queried. In this case, we're selecting the item_id column from the field_collection_item table.

    FROM field_collection_item field_collection_item: This part of the query specifies the table(s) to query from. In this case, we're querying from the field_collection_item table. The second field_collection_item after the FROM keyword is an alias for the field_collection_item table, which allows us to refer to this table more easily in subsequent parts of the query.

    LEFT JOIN field_data_field_srelation field_data_field_srelation ON field_collection_item.item_id = field_data_field_srelation.field_srelation_value: This is a LEFT JOIN statement that specifies how to join the field_collection_item table to the field_data_field_srelation table. The ON keyword specifies the condition that must be met for a row in the field_data_field_srelation table to be joined with a row in the field_collection_item table. In this case, we're joining on the condition that the item_id column in field_collection_item is equal to the field_srelation_value column in field_data_field_srelation.

    LEFT JOIN node field_srelation_field_collection_item ON field_data_field_srelation.entity_id = field_srelation_field_collection_item.nid: This is another LEFT JOIN statement that specifies how to join the field_data_field_srelation table to the node table. The ON keyword specifies the condition that must be met for a row in the node table to be joined with a row in the field_data_field_srelation table. In this case, we're joining on the condition that the entity_id column in field_data_field_srelation is equal to the nid column in field_srelation_field_collection_item.

    WHERE (( (field_srelation_field_collection_item.nid IS NULL ) )): This part of the query specifies the condition(s) that rows must meet to be included in the query results. In this case, we're only including rows where the nid column in field_srelation_field_collection_item is NULL. This essentially means that we're only selecting rows from the field_collection_item table that do not have a matching row in the node table via the field_data_field_srelation table.

    To summarize, this SQL query selects the item_id column from the field_collection_item table, but only for rows that do not have a matching row in the node table via the field_data_field_srelation table.

    User
    why does this request correctly returns non matching rows via another table, say field_data_field_srelation?
    ChatGPT
    This query returns non-matching rows via the field_data_field_srelation table because it is using a LEFT JOIN statement to join the field_collection_item table with the field_data_field_srelation table. A LEFT JOIN returns all rows from the left table (field_collection_item) and only the matching rows from the right table (field_data_field_srelation), with NULL values for the columns of the right table where there is no match.

    By using a LEFT JOIN rather than an INNER JOIN, this query ensures that all rows from the field_collection_item table are included in the results, even if there is no matching row in the field_data_field_srelation table. This means that the query can return rows from the field_collection_item table that don't have a matching row in the node table via the field_data_field_srelation table, which is what we want in this case.

    The WHERE clause of the query specifies that we're only interested in rows where the nid column in field_srelation_field_collection_item is NULL, which essentially means that we're only selecting rows from the field_collection_item table that don't have a matching row in the node table via the field_data_field_srelation table. This allows us to filter out any rows that do have a matching row in the node table, and only return the non-matching rows that we're interested in.

    User
    Sorry, I mistaken the question, this is the correct one : why does this request correctly returns non matching rows via another table, say field_data_field_brelation?
    ChatGPT
    The reason this query correctly returns non-matching rows via the field_data_field_brelation table is because it uses a LEFT JOIN to join the field_collection_item table with the field_data_field_brelation table.

    A LEFT JOIN returns all rows from the left table (field_collection_item) and only the matching rows from the right table (field_data_field_brelation), with NULL values for the columns of the right table where there is no match.

    In this query, the condition for joining the tables is that field_collection_item.item_id is equal to field_data_field_brelation.field_brelation_value. This means that for each row in the field_collection_item table, the query tries to find a matching row in the field_data_field_brelation table based on the value in the field_brelation_value column.

    However, since we are using a LEFT JOIN, all rows from the field_collection_item table are included in the results, even if there is no matching row in the field_data_field_brelation table. In cases where there is no matching row in the field_data_field_brelation table, the columns from that table will have NULL values in the result set.

    The WHERE clause filters the result set to only include rows where field_srelation_field_collection_item.nid is NULL, which means that there is no matching row in the node table via the field_data_field_brelation table. By doing this, the query only returns the non-matching rows from the field_collection_item table that we're interested in.

    In summary, this query correctly returns non-matching rows via the field_data_field_brelation table because it uses a LEFT JOIN to include all rows from the field_collection_item table, and then filters the results to only include the non-matching rows based on the condition in the WHERE clause.

    User
    But the query only joins field_data_field_srelation, not field_data_field_brelation
    ChatGPT
    I apologize for my mistake in my previous response. Since the query only involves a join between the field_collection_item table and the field_data_field_srelation table, my previous explanation does not apply.

    The reason the query correctly returns non-matching rows is due to the use of a LEFT JOIN. Specifically, the LEFT JOIN is used to join the field_collection_item table with the field_data_field_srelation table, which includes all rows from the field_collection_item table regardless of whether there is a match in the field_data_field_srelation table or not.

    In the query, the condition for joining the tables is that field_collection_item.item_id is equal to field_data_field_srelation.field_srelation_value. This means that for each row in the field_collection_item table, the query tries to find a matching row in the field_data_field_srelation table based on the value in the field_srelation_value column.

    However, since we are using a LEFT JOIN, all rows from the field_collection_item table are included in the results, even if there is no matching row in the field_data_field_srelation table. In cases where there is no matching row in the field_data_field_srelation table, the columns from that table will have NULL values in the result set.

    The WHERE clause filters the result set to only include rows where field_srelation_field_collection_item.nid is NULL, which means that there is no matching row in the node table via the field_data_field_srelation table. By doing this, the query only returns the non-matching rows from the field_collection_item table that we're interested in.

    In summary, the query correctly returns non-matching rows via the field_data_field_srelation table because it uses a LEFT JOIN to include all rows from the field_collection_item table, and then filters the results to only include the non-matching rows based on the condition in the WHERE clause.

    User
    OK, I reformulate: why this query is supposed to returns all non-matching rows in field_collection_item in the node table via an intermediate table. there are 5 of these tables, and I get all non matching rows whatever intermediate table is involved. This is the query with field_data_field_srelation, which gives the same result with in the node table via the field_data_field_brelation:
    ChatGPT
    Based on the query you provided earlier, it looks like the purpose of the query is to find rows in the field_collection_item table that do not have a corresponding entry in the node table via an intermediate table (field_data_field_srelation or field_data_field_brelation).

    The query achieves this by using a LEFT JOIN to join the field_collection_item table with the intermediate table, and then another LEFT JOIN to join the intermediate table with the node table. The WHERE clause then filters the results to only include rows where there is no matching entry in the node table.

    The reason this query returns all non-matching rows in the field_collection_item table, regardless of which intermediate table is involved, is because of the use of LEFT JOIN statements.

    A LEFT JOIN returns all rows from the left table (field_collection_item in this case) and only the matching rows from the right table (field_data_field_srelation or field_data_field_brelation), with NULL values for the columns of the right table where there is no match.

    So when you use a LEFT JOIN to join the field_collection_item table with any of the intermediate tables (field_data_field_srelation, field_data_field_brelation, etc.), all rows from the field_collection_item table are included in the result set, even if there is no matching row in the intermediate table.

    Then, when you use another LEFT JOIN to join the intermediate table with the node table, all rows from the intermediate table are included in the result set, even if there is no matching row in the node table.

    Finally, the WHERE clause filters the results to only include rows where field_srelation_field_collection_item.nid (or the equivalent column from the other intermediate tables) is NULL, which means that there is no matching row in the node table.

    Therefore, the query returns all non-matching rows in the field_collection_item table, regardless of which intermediate table is involved, because of the use of LEFT JOIN statements and the WHERE clause that filters the results to only include non-matching rows.

    User
    Can this request wrongly report matching rows through another intermediate table ?
    ChatGPT
    It is possible for the query to incorrectly report matching rows through another intermediate table if there are duplicates or inconsistencies in the data.

    For example, if there are two or more entries in the intermediate table (field_data_field_srelation or field_data_field_brelation) with the same field_srelation_value or field_brelation_value value, but with different entity_id values, the query could potentially match to the wrong entry in the node table, or match to multiple entries in the node table, resulting in incorrect results.

    Similarly, if there are inconsistencies or errors in the data, such as missing or incorrect values in the intermediate table or the node table, the query could potentially match to the wrong entry in the node table or miss a valid match altogether, resulting in incorrect results.

    Therefore, it is important to carefully examine the data and the relationships between the tables to ensure that the query is providing accurate results. It may also be helpful to perform additional checks or validations to confirm the accuracy of the results.

Production build 0.69.0 2024