Prevent ingesting duplicates when using hash

Created on 8 October 2024, 4 months ago

Problem/Motivation

This is a "can it" question. We receive alerts from an RSS feed (of the type that could include "Active Shooter"). For whatever reason, the organization that provides the system for creating the alerts did not include a GUID in it, and does not seem inclined to do so. The result was that alerts were being missed, because the people who type them in use the same title over and over, and the title is what gets used as the GUID in the absence of one.

I had written a hook for Aggregator that takes the title and appends the feed item's timestamp to it, similar in idea to your hash. This swung the pendulum all the way to the other side: now no feed items get ignored. It reads them each time, adds the title+GUID, rinse and repeat. I'm guessing the decision whether to create the Aggregator item is made downstream from the hook.
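
A minimal Python sketch of the two failure modes described above. It is a simulation of a generic "store only unseen keys" check, not Aggregator's actual code; the item fields are hypothetical, and the appended timestamp is assumed to change every time an item is processed, which would explain why every read creates new items.

```python
import itertools

# Two distinct alerts that happen to share a title (hypothetical data).
FEED = [
    {"title": "Active Shooter", "created": "2024-10-08T09:15:00Z"},
    {"title": "Active Shooter", "created": "2024-10-08T14:40:00Z"},
]

# Stand-in for a processing-time timestamp: a new value on every call.
_clock = itertools.count()


def key_from_title(item: dict) -> str:
    # Fallback when the feed has no <guid>: the title alone.
    return item["title"]


def key_from_title_and_processing_time(item: dict) -> str:
    # Assumed hook behaviour: append a timestamp taken when the item is processed.
    return f'{item["title"]}|{next(_clock)}'


def count_stored(key_fn, runs: int) -> int:
    """Read the same feed `runs` times and count how many items get stored."""
    seen: set[str] = set()
    for _ in range(runs):
        for item in FEED:
            key = key_fn(item)
            if key not in seen:  # only an unseen key creates a new item
                seen.add(key)
    return len(seen)


print(count_stored(key_from_title, runs=3))                      # 1: second alert is missed
print(count_stored(key_from_title_and_processing_time, runs=3))  # 6: every read re-ingests both
```

A usable key has to satisfy both conditions at once: different for different alerts, and identical every time the same alert is read again.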

If I use Feeds Tamper to create the hash, is there anything in the coexistence of it and Aggregator that would stop Aggregator from re-ingesting the same items again and again?

Steps to reproduce

Proposed resolution

Remaining tasks

User interface changes

API changes

Data model changes

💬 Support request
Status

Active

Version

2.0

Component

Code

Created by

🇺🇸 United States j. ayen green


Comments & Activities

  • Issue created by @j. ayen green
  • 🇺🇸 United States j. ayen green
  • 🇳🇱 Netherlands megachriz

    I'm not sure I understand your question. From what I read, for the source RSS feed:

    • GUID is sometimes provided, but not always;
    • Title is not unique.

    You combine title+GUID, but because the GUID is not always available and the title is not unique, this is not guaranteed to produce a unique value.
    You then combine title + "the feed item's timestamp", which is probably always unique, so you cannot update items that way.
    Perhaps you could use title + "created date" as the unique value, or is 'created date' not available in the feed?
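
A rough sketch of the title + created date suggestion, assuming the feed exposes a per-item creation date (the `created` field name and the helper names are hypothetical). The two values are hashed into a fixed-length key that differs between alerts but stays the same on every re-read. MD5 from Python's `hashlib` is only an illustration; this does not claim to reproduce what Feeds Tamper's hash plugin does internally.

```python
import hashlib


def stable_guid(title: str, created: str) -> str:
    """Derive a repeatable identifier from fields that do not change between reads."""
    # A separator keeps ("ab", "c") and ("a", "bc") from hashing the same input.
    raw = f"{title}\x1f{created}".encode("utf-8")
    return hashlib.md5(raw).hexdigest()


def new_items(feed: list[dict], seen: set[str]) -> list[dict]:
    """Return only items whose derived GUID has not been stored yet."""
    fresh = []
    for item in feed:
        guid = stable_guid(item["title"], item["created"])
        if guid not in seen:
            seen.add(guid)
            fresh.append(item)
    return fresh


feed = [
    {"title": "Active Shooter", "created": "2024-10-08T09:15:00Z"},
    {"title": "Active Shooter", "created": "2024-10-08T14:40:00Z"},
]
seen: set[str] = set()
print(len(new_items(feed, seen)))  # first read: 2 (both alerts kept)
print(len(new_items(feed, seen)))  # second read: 0 (nothing re-ingested)
```

If two separate alerts could ever share both title and created date, this key would still collide, and another stable field would need to be added to the hash input.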

    I wonder what Aggregator has to do with this. Since you wrote it with a capital letter, do you mean the Aggregator module? To my knowledge, Aggregator and Feeds Tamper do not work together.

    Can you rephrase your question?
