Convert the Feed hash to the State API

Created on 15 August 2023, 11 months ago
Updated 31 January 2024, 5 months ago

Problem/Motivation

The Feed entity hash is a record of the feed's state when it was last downloaded. It allows us to skip processing the feed if its contents have not changed since the previous download. Currently it is stored as a property of the Feed entities. I believe that if this module were designed in the D8 era, then the hash would be stored in the State API. I don't think it makes sense to store this in the entity anymore. It could potentially cause problems. For instance, if anyone happens to be synchronizing Feed entities between sites, then the receiving site would get that hash and potentially think that it shouldn't update its items. Let's have this value indicate the state of the site, not the state of the entity.

Proposed resolution

We would remove the hash from the Feed entity baseFieldDefinitions(). The only question that I have is: should we deprecate and later remove the hash getter and setter? Or should we repurpose them to communicate with the State API and just keep them forever? I don't know that there's any reason we shouldn't do that other than adding a responsibility to the Feed class. And it could be a good idea so that calling code doesn't have to figure out how to construct the state key.

Remaining tasks

  • Reach a consensus on what all should be done.
  • Write a patch.
  • Commit

User interface changes

None

API changes

TBD

Data model changes

Delete the Feed entity's hash property in baseFieldDefinitions().

πŸ“Œ Task
Status

Fixed

Version

2.0

Component

Code

Created by

πŸ‡ΊπŸ‡ΈUnited States dcam

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

  • Issue created by @dcam
  • Status changed to Needs review 11 months ago
  • Open in Jenkins β†’ Open on Drupal.org β†’
    Core: 10.1.x + Environment: PHP 8.1 & MariaDB 10.3.22
    last update 11 months ago
    146 pass
  • πŸ‡ΊπŸ‡ΈUnited States dcam

    Let's see what breaks. I ran the full test suite locally and had seven failures. I started to correct them, but then I accidentally discarded the results and I don't want to wait for it to run again. lol

  • Open in Jenkins β†’ Open on Drupal.org β†’
    Core: 10.1.x + Environment: PHP 8.1 & MariaDB 10.3.22
    last update 11 months ago
    146 pass
  • πŸ‡ΊπŸ‡ΈUnited States dcam

    It bothered me to have the feed still handling the hash. Also, I realized that using the hash isn't part of the functionality of feed entities. It's part of the importer functionality. Thus, dealing with the hash should be its responsibility. And since the importer is a service, there's no problem in transferring the functions to it and deprecating those in the entity class. At least, there shouldn't be.

    Now this needs deprecation tests too, but I wanted to see if I broke anything before worrying about that.

  • Open in Jenkins β†’ Open on Drupal.org β†’
    Core: 10.1.x + Environment: PHP 8.1 & MariaDB 10.3.22
    last update 11 months ago
    147 pass
  • πŸ‡ΊπŸ‡ΈUnited States dcam

    I added the deprecation test and fixed a few issues that were found as a result.

  • Open in Jenkins β†’ Open on Drupal.org β†’
    Core: 10.1.x + Environment: PHP 8.1 & MariaDB 10.3.22
    last update 11 months ago
    148 pass
  • πŸ‡ΊπŸ‡ΈUnited States dcam

    I added the update path and a test for it. I think this is complete now.

  • πŸ‡ΊπŸ‡ΈUnited States dcam
    +++ b/tests/src/Functional/Update/AggregatorUpdateDeleteHashTest.php
    @@ -0,0 +1,34 @@
    +  /**
    +   * Ensure all items in the aggregator_feeds queue are deleted.
    +   */
    +  public function testUpdateHookN(): void {
    

    I forgot to update this docblock when I copied it from another test I'd written. But I'm not going to bother changing it right now. We'll probably have to update this class's docblock to change the number of the update function it covers anyway. It can be fixed then.

  • πŸ‡ΊπŸ‡ΈUnited States dcam

    There are other properties that should also probably be converted to state:
    * checked
    * queued
    * etag
    * modified - This is the timestamp when the feed was last fetched. It is not the same as our usual entity "changed" fields.

    Should these be handled separately or as part of this issue?

    It may also be worth looking into whether we can eliminate the queued property and instead access the queue service to check for an entry. It's been a few years since the last time I worked with queues. I don't remember if that's possible.

  • πŸ‡¦πŸ‡ΊAustralia larowlan πŸ‡¦πŸ‡ΊπŸ.au GMT+10

    The queue system has ::getNumberOfItems so that might suffice

    I have a concern here that we're increasing the number of rows in the key value table which is commonly queried in the hot path.

    We can create our own key value service in its own table and keep our stuff out of the primary table.

    \Drupal\Core\KeyValueStore\DatabaseStorage supports passing a table name to it's constructor, so I think we could use that.

    It would be on sites using non-DB instances to override that with e.g. Redis/Memcache.

    With that approach we could also use the collection property to partition by feed, or by property if we converted the other fields too.

    Thoughts?

  • Status changed to Needs work 10 months ago
  • πŸ‡ΊπŸ‡ΈUnited States dcam

    I like the idea of creating our own key-value store for it. I'll work on it.

    I considered using a cache bin. After all, the hash is basically just a cached representation of the feed. But the obvious concern is what happens when the cache is cleared. For most sites it probably wouldn't be a problem. But any site that has to process thousands of items, either because they have lots of feeds or those feeds have many items, will have to do a lot of extra work after any clear. Also, if we put the "modified" timestamp into the same cache, then the site would lose that information on a cache clear too which would make it unavailable on the admin feed list. I'm sure that would lead to a bug report or two. State just seems like the better option to me.

  • πŸ‡¦πŸ‡ΊAustralia larowlan πŸ‡¦πŸ‡ΊπŸ.au GMT+10

    Yeah agree keyvalue is the best here. Not sure any of them warrant key_value_expirable, but worth considering

  • Status changed to Needs review 10 months ago
  • Open in Jenkins β†’ Open on Drupal.org β†’
    Core: 10.1.x + Environment: PHP 8.1 & MariaDB 10.3.22
    last update 10 months ago
    157 pass
  • πŸ‡ΊπŸ‡ΈUnited States dcam

    Just a reroll of #5.

  • Open in Jenkins β†’ Open on Drupal.org β†’
    Core: 10.1.x + Environment: PHP 8.1 & MariaDB 10.3.22
    last update 10 months ago
    157 pass
  • πŸ‡ΊπŸ‡ΈUnited States dcam

    Here's an attempt at creating a custom key value store. I studied the two examples in Core to make it. It seems to be working on my local environment.

  • Status changed to RTBC 10 months ago
  • πŸ‡¦πŸ‡ΊAustralia larowlan πŸ‡¦πŸ‡ΊπŸ.au GMT+10

    This is some awesome work. Only one minor nit pick. Feel free to self RTBC on that.

    +++ b/src/Entity/Feed.php
    @@ -437,7 +432,8 @@ class Feed extends ContentEntityBase implements FeedInterface {
    +    @trigger_error('Feed::setHash() is deprecated in aggregator:2.1.0 and is removed from aggregator:3.0.0. Use \Drupal::service("aggregator.items.importer")->setHash($feed, $hash); instead. See https://www.drupal.org/project/aggregator/issues/3381298.', E_USER_DEPRECATED);
    
    +++ b/src/FeedInterface.php
    @@ -158,6 +158,11 @@ interface FeedInterface extends ContentEntityInterface {
    +   * @see https://www.drupal.org/project/aggregator/issues/3381298
    
    @@ -170,6 +175,12 @@ interface FeedInterface extends ContentEntityInterface {
    +   * @see https://www.drupal.org/project/aggregator/issues/3381298
    

    technically this should link to a change notice which you can create from https://www.drupal.org/node/add/changenotice?field_project=3262413&field... β†’

  • Status changed to Needs review 10 months ago
  • Open in Jenkins β†’ Open on Drupal.org β†’
    Core: 10.1.x + Environment: PHP 8.1 & MariaDB 10.3.22
    last update 10 months ago
    156 pass, 1 fail
  • πŸ‡ΊπŸ‡ΈUnited States dcam

    Thank you for catching that. I knew there was another patch where I needed to make a change record. Here it is: https://www.drupal.org/node/3386907 β†’ .

  • The last submitted patch, 14: 3381298-14.patch, failed testing. View results β†’
    - codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

  • Open in Jenkins β†’ Open on Drupal.org β†’
    Core: 10.1.x + Environment: PHP 8.1 & MariaDB 10.3.22
    last update 10 months ago
    156 pass, 1 fail
  • πŸ‡ΊπŸ‡ΈUnited States dcam

    I forgot to update the deprecation test. I'm not going to bother with an interdiff.

  • The last submitted patch, 16: 3381298-16.patch, failed testing. View results β†’
    - codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

  • Open in Jenkins β†’ Open on Drupal.org β†’
    Core: 10.1.x + Environment: PHP 8.1 & MariaDB 10.3.22
    last update 10 months ago
    157 pass
  • πŸ‡ΊπŸ‡ΈUnited States dcam

    This is what I get for being excited to commit this and not being careful about what I'm doing.

    And it's why we test every change, even minor ones. I thought briefly about not testing it. This will teach me to not think like that.

    • dcam β†’ committed e9387c44 on 2.x
      Issue #3381298 by dcam, larowlan: Convert the Feed hash to the State API
      
  • Open in Jenkins β†’ Open on Drupal.org β†’
    Core: 9.5.x + Environment: PHP 8.1 & MariaDB 10.3.22
    last update 10 months ago
    157 pass
    • dcam β†’ committed ab1d3c8d on 1.x
      Issue #3381298 by dcam, larowlan: Convert the Feed hash to the State API
      
  • Status changed to Fixed 10 months ago
  • πŸ‡ΊπŸ‡ΈUnited States dcam

    @larowlan thank you for the assist on this!

  • Automatically closed - issue fixed for 2 weeks with no activity.

  • Status changed to Fixed 5 months ago
  • πŸ‡³πŸ‡±Netherlands Eric_A

    Before this I think it was possible to call Feed::refreshItems() on a new, unsaved Feed entity.
    Since 2.1.0 this breaks ugly at the point when loading the feed by ID gets triggered and a NULL collection can't be found in the key-value store.

    Perhaps change record information will suffice. Perhaps a more meaningful exception should be thrown. Possibly as early as in Feed::refreshItems().

    Not sure at this point.

  • πŸ‡³πŸ‡±Netherlands Eric_A

    (Alternatively the old behavior could be restored by having the importer first saving any new entity before continuing...)

Production build 0.69.0 2024