Duplication checks in aggregator are naive and incorrect

Created on 27 February 2006, over 18 years ago
Updated 8 March 2024, 7 months ago

In the RSS 2.0 "specification", a GUID is an optional element that is meant to "uniquely [identify] the item" within that particular feed. Drupal, however, goes several steps too far: if a GUID doesn't exist, it treats the LINK of an item as a unique value. This is incorrect. It assumes that the LINK is unique, which no version of RSS has ever stated. In many non-blogging feeds (such as a weather feed, where I am unable to get the second ITEM), multiple pieces of information are emitted, all with the same LINK. Drupal sees only one of those items, because it erroneously treats the LINK as if it were unique. Drupal should check for a GUID first and, failing that, do a direct TITLE + LINK + DESCRIPTION comparison for each item; only then should something be considered unique. (So yes, an item with a minor spelling correction would be considered a new unique item; some readers even go so far as to show a diff between two items that have met a threshold of similarity.)
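
To make the proposed check concrete, here is a minimal sketch in Python. It is not Drupal's aggregator code. The names `item_key` and `new_items`, the dict-based item shape, and the in-memory `seen` set are all assumptions made for illustration. It uses the GUID when one is present and otherwise falls back to a hash of TITLE + LINK + DESCRIPTION, never the LINK alone.

```python
import hashlib


def item_key(feed_url, item):
    """Return a deduplication key for one parsed feed item (a dict).

    Hypothetical helper: the item is assumed to carry optional
    "guid", "title", "link" and "description" string fields.
    """
    guid = (item.get("guid") or "").strip()
    if guid:
        # A GUID identifies the item within its own feed, so scope it by feed.
        return ("guid", feed_url, guid)
    # No GUID: compare the full content rather than the link alone.
    parts = [item.get(name) or "" for name in ("title", "link", "description")]
    digest = hashlib.sha256("\x00".join(parts).encode("utf-8")).hexdigest()
    return ("content", feed_url, digest)


def new_items(feed_url, items, seen):
    """Return the items not seen before, recording their keys in `seen`."""
    fresh = []
    for item in items:
        key = item_key(feed_url, item)
        if key not in seen:
            seen.add(key)
            fresh.append(item)
    return fresh
```

With this approach, two weather items that share a LINK but differ in TITLE or DESCRIPTION are both kept, while a second fetch of the same feed adds nothing. A real implementation would persist the keys (for example in a database column) instead of holding them in a set.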

πŸ› Bug report
Status

Active

Version

7.0 ⚰️

Component

aggregator.module

Created by

🇺🇸 United States morbus iff

