Problem/Motivation
If we are going to maintain many more (or all) contrib modules and themes on api.drupal.org, we need a simpler way to manage their files. Currently, we have scripts that maintain our projects/branches as Git clones, but these are time-consuming to set up. It would be easier if the API module (or a sub-module) could get the files from TGZ downloads instead. This would work in conjunction with "Grab project list and packages from Drupal database or XML" (Closed: outdated) to make a system that could automatically maintain the project/branch lists from metadata and download the files the API module needs to parse.
Proposed resolution
A few things would need to change:
a) File modification times are not reliable in TGZ files. So, instead of deciding whether a file needs to be reparsed based on its modification time, we would need to compare a hash or checksum of the file's contents, at least for projects/branches managed via TGZ files.
b) Ideally, if the TGZ itself has not changed, we could skip checking the hashes/checksums of the individual files during a branch update, because we would know that nothing needs to be reparsed.
c) The API module could probably gain a couple of new hooks, asking "Does this project/branch need to be fully checked?" and "Get me the files for this project/branch." Either a new submodule would implement these hooks using TGZ files for certain branches, or the main module would notice that a branch is TGZ-managed and use this method, while regular files branches keep working the old way.
d) Extracted files would have a time-to-live and could be cleaned up once the API module has finished with them.
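Points (a) and (b) together amount to a two-level change check: hash the whole archive first, and only hash individual files when the archive hash differs from the stored one. Below is a minimal, language-neutral sketch in Python (the API module itself is PHP). The function names `sha256_hex` and `plan_branch_update` are hypothetical, and SHA-256 is just one possible choice of hash/checksum; how the stored hashes are persisted is left out.

```python
import hashlib
from typing import Dict, Iterable, List, Optional, Tuple


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of some bytes (hash choice is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def plan_branch_update(
    archive_bytes: bytes,
    stored_archive_hash: Optional[str],
    files: Iterable[Tuple[str, bytes]],
    stored_file_hashes: Dict[str, str],
) -> Tuple[str, List[str], List[str], Dict[str, str]]:
    """Hypothetical helper: decide which files of a TGZ-managed branch need reparsing.

    Returns (archive_hash, changed_paths, removed_paths, new_file_hashes).
    """
    archive_hash = sha256_hex(archive_bytes)

    # (b) Archive unchanged: nothing to reparse, and `files` is never iterated,
    # so no per-file reading or hashing happens at all.
    if archive_hash == stored_archive_hash:
        return archive_hash, [], [], dict(stored_file_hashes)

    new_hashes: Dict[str, str] = {}
    changed: List[str] = []
    for path, content in files:
        digest = sha256_hex(content)
        new_hashes[path] = digest
        # (a) Compare content hashes instead of (unreliable) modification times.
        if stored_file_hashes.get(path) != digest:
            changed.append(path)

    # Files that were in the previous release but are gone from this one.
    removed = sorted(set(stored_file_hashes) - set(new_hashes))
    return archive_hash, sorted(changed), removed, new_hashes
```

Passing `files` as a lazy iterable is a deliberate choice in this sketch: when the archive hash matches, the archive never has to be unpacked.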
Remaining tasks
TBD
User interface changes
TBD
API changes
TBD
Data model changes
TBD
Original issue report
Directly using TGZ files downloaded from Drupal.org, or anywhere else, will greatly reduce the setup needed for each project. This is needed for "Grab project list and packages from Drupal database or XML" (Closed: outdated).
This could either be a new branch type, next to files, or changes to the files branch type. Some code will likely be shared.
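Two strategies I can think of are:
a) Extract the TGZ to the filesystem and parse the files from there.
b) Read the TGZ and its files in memory.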
I like in memory because it avoids the filesystem, and the permissions problems that come with it, entirely. In memory might take a lot of memory, but we already use a lot of memory for parsing.
Localize.drupal.org uses Archive_Tar to extract to the filesystem: http://drupalcode.org/project/l10n_server.git/blob/refs/heads/7.x-1.x:/c....
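To illustrate the in-memory strategy, here is a minimal sketch using Python's standard `tarfile` module rather than PHP's Archive_Tar, so it is not how localize.drupal.org or the API module actually does it. The function name `iter_tgz_members` is hypothetical. The downloaded archive is held as bytes and each regular file is read straight out of it, so nothing is written to disk and no temporary directory needs cleaning up.

```python
import io
import tarfile
from typing import Iterator, Tuple


def iter_tgz_members(archive_bytes: bytes) -> Iterator[Tuple[str, bytes]]:
    """Hypothetical helper: yield (path, contents) for each regular file in a .tar.gz.

    The archive is read from an in-memory buffer; no files are extracted to disk.
    """
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            # Skip directories, symlinks, and other non-regular entries.
            if not member.isfile():
                continue
            extracted = tar.extractfile(member)
            if extracted is None:
                continue
            # Files are read one at a time, so only one file's contents (plus
            # the compressed archive) needs to be in memory at any moment.
            yield member.name, extracted.read()
```

Because it is a generator, a caller can stop early or skip iterating entirely, for example when the whole archive is known to be unchanged.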