How to avoid duplicates during import

Created on 25 February 2022
Updated 10 July 2024

Hi,
I have to periodically import hundreds of publications from different .bib files downloaded from Scopus.

The problem is that when I re-import the same publication, it is created again, producing tons of duplicate nodes. Is it possible to prevent this, for example by checking the DOI? If a DOI already exists in the database, the publication being imported should be skipped.

Thanks!
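
To make the request concrete, here is a minimal sketch of the DOI check described above. It is written in Python rather than as Drupal code and is not the module's implementation: `normalize_doi`, `import_entries`, the `existing_dois` set, and the dict-shaped entries are all hypothetical names standing in for the parsed .bib records and the DOIs already stored in the database.

```python
import re


def normalize_doi(raw):
    """Return a canonical lowercase DOI, or None if the value is empty."""
    if not raw:
        return None
    doi = raw.strip().lower()
    # Strip common resolver prefixes so the same DOI always compares equal.
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", "", doi)
    return doi or None


def import_entries(entries, existing_dois):
    """Split parsed entries into those to create and those to skip.

    `existing_dois` is a set of normalized DOIs already in the database
    (an assumption of this sketch); it is updated in place.
    """
    created, skipped = [], []
    for entry in entries:
        doi = normalize_doi(entry.get("doi"))
        if doi is not None and doi in existing_dois:
            skipped.append(entry)  # already imported: do not create a node
            continue
        if doi is not None:
            existing_dois.add(doi)  # also catches repeats within one file
        created.append(entry)  # entries without a DOI cannot be matched
    return created, skipped
```

Entries without a DOI are always created in this sketch, since there is nothing to match on; a real implementation would probably need a fallback such as title plus year.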

Feature request
Status: Needs work
Version: 2.0
Component: Code
Created by: lorisbel (🇮🇹 Italy)


Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • I reviewed the changes. I don't think users should have to set these options on every import; they should be global module settings.
    Also, what do you think about an option to update existing entities instead of just skipping them? https://www.drupal.org/project/bibcite/issues/2890092
    This issue won't be included in the next release.
    We have some ideas about how to implement this feature.

    Of course, you can use the patch as a temporary solution until the feature is finished.

  • This is a great patch, but I'm having one small issue with it that I haven't been able to figure out on my own.

    I have a user who wants to delete a reference and then re-import it. Even though the reference was deleted, the import still reports that a duplicate entry was found and won't add a new one.

    I thought it might just be a cached entry causing the problem, but the issue persists even if I go to "Configuration -> Performance -> Clear All Caches" and then try the import again after deleting.

    Open to any thoughts or solutions.

  • I could also use this feature on a project, +1. I agree that checking for duplicates should be a global module setting, and it would be nice to also have the option to update existing entities instead of just skipping them. That way the single source of truth is not confused.
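
The skip-versus-update choice discussed in the comments above could be expressed as a single module-wide setting. The sketch below is in Python, not Drupal PHP, and does not reflect the patch or the module's code: `apply_import`, the `on_duplicate` setting, and the list-of-dicts `store` are hypothetical names used only for illustration.

```python
def apply_import(entries, store, on_duplicate="skip"):
    """Import entries into `store` (a list of dicts), matching on DOI.

    `on_duplicate` stands in for a hypothetical global setting:
    "skip" leaves the existing record alone, "update" overwrites its fields.
    """
    if on_duplicate not in ("skip", "update"):
        raise ValueError(f"unknown duplicate policy: {on_duplicate!r}")

    # Build the lookup from what is in the store right now, so records
    # deleted before this import are not treated as duplicates.
    index = {r["doi"].lower(): r for r in store if r.get("doi")}
    counts = {"created": 0, "updated": 0, "skipped": 0}

    for entry in entries:
        doi = (entry.get("doi") or "").strip().lower()
        existing = index.get(doi) if doi else None
        if existing is None:
            record = dict(entry)
            store.append(record)
            if doi:
                index[doi] = record
            counts["created"] += 1
        elif on_duplicate == "update":
            existing.update(entry)  # refresh fields from the new import
            counts["updated"] += 1
        else:
            counts["skipped"] += 1
    return counts
```

Because the lookup is rebuilt from the current contents of `store` on every call, a reference that was deleted and then re-imported is created again in this sketch. Whether the delete-then-re-import problem reported above has a related cause in the patch is not something this thread establishes.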
