Improve performance by comparing dependency hashing against tracking table

Created on 3 January 2021
Updated 19 January 2023

Problem/Motivation

On the subscriber side, before importing an entity, the subscriber must fetch the CDF documents of all of the entity's dependencies, recursively. Depending on the entity's dependency chain, this can mean dozens of API calls, which dramatically slows down imports for very large or deep dependency chains.
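To illustrate why this gets expensive, here is a sketch that walks a dependency chain with one request per CDF document. `fetch_cdf` is a hypothetical stand-in for the Content Hub request (not a real client method) that returns the dependency UUIDs of the requested entity.

```python
from typing import Callable, List, Set


def collect_dependencies(root: str, fetch_cdf: Callable[[str], List[str]]) -> Set[str]:
    """Walk the dependency chain of `root`, requesting each CDF document once.

    `fetch_cdf` is a hypothetical stand-in for a Content Hub request: given a
    UUID, it returns the UUIDs that entity depends on.
    """
    seen: Set[str] = set()
    pending = [root]
    while pending:
        uuid = pending.pop()
        if uuid in seen:
            continue  # already fetched; avoids duplicate requests and cycles
        seen.add(uuid)
        pending.extend(fetch_cdf(uuid))  # one API call per entity in the chain
    return seen
```

With a made-up chain where a node depends on a user and a term, and the user depends on a file, each of the four entities costs one request. The number of calls grows with the size of the whole dependency tree, even when most of those entities were already imported.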

When we get a CDF document, it already contains a dependencies attribute, in which each key is the UUID of a dependency and each value is that dependency's hash.

We also store the hash of every previously imported entity in the import tracking table.

To improve performance, can we compare hashes to determine whether a dependency needs to be updated before asking Content Hub for its CDF document?

Unfortunately, the hash stored in the tracking table is the CDF object's hash attribute, which is calculated in a different place, and in a different way, than the hashes in the dependencies attribute, so the two cannot be compared directly.

Proposed resolution

1. Calculate the hash consistently in the CDF hash attribute and the dependencies attribute.
2. Compare each dependency hash against the hashes of already imported entities in the tracking table, and request only the CDF documents that are missing or whose hash differs.
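A minimal sketch of both steps, under stated assumptions: `consistent_hash` is a hypothetical helper that serializes attributes canonically (sorted keys) and hashes them with SHA-1; the module may well use a different serialization or algorithm. The point is only that the CDF hash attribute and the dependencies attribute would both go through the same function. `dependencies_to_fetch` (also hypothetical) compares the dependencies map (UUID to hash) with the tracking-table hashes, modeled here as a plain dict rather than a database query.

```python
import hashlib
import json
from typing import Dict, List, Mapping


def consistent_hash(attributes: Mapping[str, object]) -> str:
    """Hypothetical shared hash: one canonical serialization, one algorithm.

    Using this single function for both the CDF hash attribute and the
    dependencies attribute keeps the two values comparable.
    """
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def dependencies_to_fetch(dependencies: Dict[str, str], tracked: Dict[str, str]) -> List[str]:
    """Return the UUIDs whose CDF documents still need to be requested.

    `dependencies` maps dependency UUID -> hash (from the CDF document);
    `tracked` maps UUID -> hash of already imported entities (tracking table).
    A UUID is kept if it was never imported or its hash has changed.
    """
    return [uuid for uuid, dep_hash in dependencies.items() if tracked.get(uuid) != dep_hash]
```

For dependencies `{"a": "h1", "b": "h2", "c": "h3"}` and tracked hashes `{"a": "h1", "b": "old"}`, only `b` (changed) and `c` (never imported) would be requested; `a` is skipped without an API call.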

🐛 Bug report
Status

Fixed

Version

2.0

Component

Code

Created by

weynhamz (Ningbo, China)

