importing/deleting stops

Created on 24 November 2024, 5 months ago

Problem/Motivation

I have been trying to import my CSV file for several days now, still without success. The file has 324,000 lines/entries, and the import just stops/hangs partway through, showing the progress bar and no error. I've broken the file into smaller files of only 1000 lines each; one file completed its import, and then the second one hung too. The following questions came up while I was trying to debug the problem:

1. While setting up a new feed, I see an "Active" checkbox (see attached image). If I uncheck it, does it mean it should do the full import without stopping? If I leave it checked, does it mean the import will be broken up into smaller chunks and each should be automatically launched by some mechanism? What controls this periodic import and is there a way to adjust it? Please explain this Active checkbox more.
2. When the import hangs, I see an "unlock" link in the import view (e.g. /feed/9/list). What does that do?
3. Also, when an import hangs, I do not see "Delete" or "Delete items" in the import view (e.g. /feed/9) (see attached image). How can I safely delete only the items imported with this feed so that I can try again?
4. For a successful import, does "Delete" or "Delete items" delete ONLY the items imported with this specific feed and not others that have been imported with other feeds, even if all these imports are for the same content type?

All help/tips are appreciated!

💬 Support request
Status

Active

Component

User interface

Comments & Activities

  • Issue created by @monaw
  • 🇳🇱Netherlands megachriz
    1. Active checkbox: you can configure feed types to import sources regularly. This is called "Periodic import" (see image). Only feeds that are active will be used for periodic import. So when you deactivate it, this particular feed will no longer be imported regularly, only when you click "Import" or "Import in background".
    2. When an import for a feed starts, the feed gets locked. This is to prevent running another import for it. If two imports for the same feed ran at the same time, that could cause issues. For example, it could cause previously imported items to be deleted that should not be deleted (if you have configured Feeds to delete previously imported items that are no longer in the feed). You would unlock the feed if you believe the import got stuck. You can then restart the import. When unlocking, Feeds cleans up the metadata for the import that did not finish.
    3. If you want to start the import over, first unlock the feed. Then you can delete all imported items.
    4. "Delete items" only deletes items that were created or updated with this feed. It doesn't delete items from other feeds, unless those items were also created or updated with this feed. It is technically possible to configure feed types so that two feeds update the same content. Say feed 1 creates items A, B and C and feed 2 creates items D and E and updates item C. Then "Delete items" on feed 1 would delete items A, B and C and "Delete items" on feed 2 would delete items C, D and E. So in this case they would both delete item C, because they both "touched" item C.
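
The overlap described in point 4 can be pictured with plain sets. This is only an illustrative model, not how Feeds stores its data: the `touched_by` mapping and the `items_deleted_by` helper are hypothetical names made up for this sketch.

```python
# Hypothetical bookkeeping: which feed has created or updated which item.
# This models the example from point 4; it is not the Feeds data model.
touched_by = {
    "feed 1": {"A", "B", "C"},  # feed 1 creates A, B and C
    "feed 2": {"C", "D", "E"},  # feed 2 creates D and E and updates C
}


def items_deleted_by(feed: str) -> set[str]:
    """Items that "Delete items" on the given feed would remove in this model."""
    return set(touched_by[feed])


# Items that either feed's "Delete items" would remove.
shared = touched_by["feed 1"] & touched_by["feed 2"]

print(sorted(items_deleted_by("feed 1")))
print(sorted(items_deleted_by("feed 2")))
print(sorted(shared))
```

In this model item C sits in both sets, which is why "Delete items" on either feed removes it.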

    A tip for importing large files: I think it is a better idea to import these in the background (by using the "Import in background" button). This way, the import runs in chunks during cron runs, and the chance that the import will hang is smaller, because it doesn't depend on the browser being kept open. It can still hang or get stuck, however, for example when a fatal PHP error occurs in the process, or when the server shuts down. Or perhaps when running module updates (because that could cause module files to be temporarily removed, which could possibly cause fatal PHP errors too).
    Import in background does require cron to be configured. Per cron run, the import process runs for about a minute. So I can imagine 324000 lines would take quite a large number of cron runs too.

    I hope this answers your questions. Feel free to reopen this issue if you have more questions. :)

  • 🇳🇱Netherlands megachriz

    Ah, I see you just updated the issue summary. Feel free to add/update the documentation. :)

  • 🇳🇱Netherlands megachriz

    The "Delete" button on a feed deletes the feed itself, not the imported items.

  • thank you @megachriz for your helpful info! few more questions:

    1. what do you recommend as the best way to import 324,000 rows of CSV data?
    2. why does it hang during import and delete? i'm monitoring the system and nothing else is happening and nobody is logged in so why would it hang?
    3. if the import hangs, does unlocking the feed and then restarting start from the beginning again or from where it left off?
    4. if i configure the feed to import in the background and my cron is set to every hour, that should work right? i guess i'm afraid it will still hang...and if that happens, how can i kill the cron import?
  • how do i reopen this issue?

  • 🇳🇱Netherlands megachriz
    1. I don't have experience importing a CSV file with that many lines, but I would choose to import in background and let the import be done in chunks using cron.
    2. I don't know. There could be a bug (either in Feeds or another module) causing the import to stop. If this is the case, you should be able to find something about it in the server logs (the error may not be logged by Drupal). Another possibility is that the server thinks "This process runs for way too long, I'm going to stop it". Or there could be a lack of memory. These are probably reported in the server logs too.
    3. If you unlock the feed and then restart the import, the import will start from the beginning. If you have configured a CSV column as unique, Feeds would skip items it already has imported, but it will still go through each item in the file in order to check that. Say the import hangs at 2000 items, and you restart the import, then for the first 2000 items of the CSV file Feeds will check if they are already imported and would see that this is the case (and not import them again). But just doing these checks can also take a long time.
    4. Yes, setting cron to run once an hour would work, but I would try to run cron more often. If, for example, Feeds managed to import 500 items per cron run, it would need 648 cron runs (324000 / 500). With cron running once an hour, that is 648 hours to import all 324000 items, which is almost a month (27 days). If you want to stop the cron import, then you would unlock the feed.
    5. An import running in the UI (where you see a progress bar) stops shortly after you close the browser (it only finishes the chunk it was busy with). An import running on cron does not depend on the browser. You can restart the import by unlocking the feed and then starting the import again.
    6. Yes, an import running on cron does not depend on the browser.
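
To see how the cron interval in point 4 affects the total duration, here is a rough back-of-the-envelope sketch. The figure of 500 items per cron run is the example value from the reply, not a measured number, and the function names are made up for this illustration. It also ignores the duration of each run itself.

```python
import math


def cron_runs_needed(total_items: int, items_per_run: int) -> int:
    """Number of cron runs needed if every run imports `items_per_run` items."""
    return math.ceil(total_items / items_per_run)


def total_hours(total_items: int, items_per_run: int, minutes_between_runs: int) -> float:
    """Approximate wall-clock hours: one cron interval per required run."""
    runs = cron_runs_needed(total_items, items_per_run)
    return runs * minutes_between_runs / 60


TOTAL_ITEMS = 324_000
ITEMS_PER_RUN = 500  # assumed example value, not a measurement

for interval in (60, 15, 5):  # minutes between cron runs
    runs = cron_runs_needed(TOTAL_ITEMS, ITEMS_PER_RUN)
    hours = total_hours(TOTAL_ITEMS, ITEMS_PER_RUN, interval)
    print(f"cron every {interval} min: {runs} runs, {hours:.0f} h ({hours / 24:.2f} days)")
```

Under these assumptions, hourly cron takes 648 hours (27 days), while running cron every 15 or 5 minutes brings that down to 162 hours or 54 hours respectively.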

    Since I don't know what makes the import hang, I cannot guarantee that the import of all 324000 items will be successful when imported during cron. If it happens to hang, I would first check the server logs to see if there's any information about what made it hang. Then I would check how many items were already imported, remove that many items from the CSV file, and try again.
    You can tell that the import hangs if the number of imported items stays the same after a cron run.
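
The "count stays the same" check can be sketched as follows. This is a minimal illustration and not part of Feeds: it assumes you note down the number of imported items yourself after each cron run (for example from the feed page), and `is_stalled` is a hypothetical helper name.

```python
def is_stalled(counts: list[int], min_repeats: int = 2) -> bool:
    """Return True if the last `min_repeats` cron runs added no new items.

    `counts` holds the number of imported items noted after each cron run,
    oldest first. Note that a finished import also stops growing, so this
    only signals a hang while the import is known to be incomplete.
    """
    if len(counts) < min_repeats + 1:
        return False  # not enough observations yet
    recent = counts[-(min_repeats + 1):]
    return all(count == recent[0] for count in recent)


print(is_stalled([500, 1000, 1500, 2000]))
print(is_stalled([500, 1000, 1500, 2000, 2000, 2000]))
```

Requiring more than one unchanged reading (here two) is a design choice to avoid a false alarm when a single cron run happens to import nothing.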

    By installing the Queue UI module, you can inspect/monitor the import tasks that are scheduled to run. If the same task is retried over and over again, then the import hangs. (One day I hope to add functionality to Feeds that would detect that the same task is retried over and over again, so that it can warn the user that something went wrong during the import - or maybe even skip the task so it can continue doing the rest of the import.)

  • @megachriz, thanks again for your helpful info! one last question...if i don't have any unique columns, when the import hangs, is it best then to unlock the import, delete the imported items, and restart the import again?

  • 🇳🇱Netherlands megachriz

    If you can figure out why it hangs and then resolve that issue, then you could start the import from scratch. But if it hung once and you don't get that issue resolved, it will likely hang again on a restart of the import. So in that situation I would choose to remove the items that were already imported from the CSV file.
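
Removing the already imported items from the CSV file could be done with a small script like the one below. It is a sketch under assumptions: the first row is a header, every data row is one item, and items were imported in file order. The function name and the file names in the usage note are hypothetical.

```python
import csv


def drop_imported_rows(src_path: str, dst_path: str, already_imported: int) -> int:
    """Copy the CSV header plus all data rows after the first `already_imported`.

    Returns the number of data rows written to `dst_path`.
    """
    kept = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            return 0  # empty source file, nothing to copy
        writer.writerow(header)
        for index, row in enumerate(reader):
            if index >= already_imported:  # skip rows assumed to be imported
                writer.writerow(row)
                kept += 1
    return kept
```

For example, `drop_imported_rows("items.csv", "remaining.csv", 2000)` would write a new file without the first 2000 data rows. Using the `csv` module rather than cutting lines keeps quoted fields that span multiple lines intact.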

  • ok, now the files with 10,000 entries seem to be importing without hanging after I uncheck the "Active" option when setting up the feed AND I leave my MacBook plugged in! I checked my MacBook battery setting and turned off Low Power Mode option (see attached image) but that alone didn't seem to fix the hanging...so looks like it might be a combination of feed setting and my computer...interesting!

  • 🇳🇱Netherlands megachriz

    It's interesting that a MacBook going into sleep mode could cause the import to hang. For imports in the UI I think that makes sense (if the webserver is running on the MacBook), but I would expect that with imports on cron, the import would eventually continue.

    I have a Mac too. Would be interesting to run a large import and then at the exact moment that cron is running, put the Mac into sleep mode. And then see if that makes the import hang.

  • Automatically closed - issue fixed for 2 weeks with no activity.
