Download and serve archive.org captures locally

Created on 20 August 2025

Problem/Motivation

Imagine this scenario: archive.org has been closed, and all the archived pages are gone. Now what?

Steps to reproduce

Prepare for a future where archive.org no longer exists.

Proposed resolution

Add a belt-and-suspenders "Archive archive.org captures" kind of tool.

Two features probably need to be built:

  • A crawler-like bot, which visits relevant links and saves individual captured pages locally
  • A setting that lets Wayback Filter links point directly to a locally saved copy instead of the archive.org capture

As a side effect, serving local copies should also give end users much faster page loads.
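To make the two features a bit more concrete, here is a rough sketch in Python (not Drupal code, and not how the module works today). It assumes capture links follow the usual `https://web.archive.org/web/<timestamp>/<original URL>` pattern. The function names, the storage layout, and the `base_url` under which local copies would be published are all hypothetical.

```python
import hashlib
import re
import urllib.request
from pathlib import Path

# Assumed capture link shape: https://web.archive.org/web/<timestamp>/<original URL>
# (an optional modifier such as "id_" may follow the timestamp).
CAPTURE_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{1,14})[a-z_]*/(.+)$")


def local_path(capture_url, root):
    """Map a capture URL to a file path under root, or None if it is not a capture."""
    match = CAPTURE_RE.match(capture_url)
    if match is None:
        return None
    timestamp, original = match.groups()
    # Hash the original URL so that any URL yields a safe, fixed-length file name.
    digest = hashlib.sha256(original.encode("utf-8")).hexdigest()[:16]
    return Path(root) / timestamp / f"{digest}.html"


def http_get(url):
    """Fetch the raw bytes of a URL (default fetcher for save_capture)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def save_capture(capture_url, root, fetch=http_get):
    """Feature 1: save one captured page locally, skipping pages already saved."""
    path = local_path(capture_url, root)
    if path is None:
        raise ValueError(f"Not a Wayback capture URL: {capture_url}")
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(fetch(capture_url))
    return path


def link_target(capture_url, root, base_url):
    """Feature 2: link to the local copy if one exists, else to the capture itself."""
    path = local_path(capture_url, root)
    if path is not None and path.exists():
        return base_url.rstrip("/") + "/" + path.relative_to(root).as_posix()
    return capture_url
```

A crawler-like bot would call `save_capture()` for every capture link found in site content, and the filter would call `link_target()` when rendering a link. Falling back to the original capture URL means links keep working for pages that have not been saved yet.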

Remaining tasks

User interface changes

API changes

Data model changes

Feature request
Status

Active

Version

1.2

Component

Code

Created by

🇩🇰Denmark ressa Copenhagen


Comments & Activities

  • Issue created by @ressa
  • 🇩🇰Denmark Steven Snedker

    This would also benefit the External Link Preview Module immensely.

    Yet with all the awful companies out there sending specious invoices, local caching could mean a lot of Drupal sites losing a lot of money and a lot of sleep. We're at an impasse at "Make a local image copy for GDPR" (Active; read and shudder).

    Wayback Filter could branch out and support much smaller archive sites like Archive.today or Ghost Archive. There may be a few other OK-ish candidates on the List of web archiving initiatives. It could also autosubmit the URL where feasible.

    But with only 14 users, half of them me, and archive.org in OK health, I haven't spent any time on it. With only one user ever (!), Wayback Submit to Archive.org will never be updated.

    But back to the original question: caching sites?
    No. Having Drupal sites cache (store and publish) third-party sites locally is sadly way too risky.
