Update SQL query to generate project list for project_analysis and deprecation_status

Created on 26 October 2023, over 1 year ago
Updated 3 November 2023, over 1 year ago

Problem/Motivation

- Only the Drupal 9 to 10 list is available
- The list of projects is filtered too restrictively (we can do much more fine-grained filtering in the script logic)

Steps to reproduce

Proposed resolution

- Remove filters that don't serve us well and move filtering to the consuming script
- Include all current version branches

Remaining tasks

User interface changes

API changes

Data model changes

📌 Task
Status

Fixed

Version

3.0

Component

Code

Created by

🇭🇺 Gábor Hojtsy, Hungary


Comments & Activities

  • Issue created by @Gábor Hojtsy
  • 🇭🇺 Gábor Hojtsy, Hungary

    Made an ENTIRELY UNTESTED proposal in the MR. See the diff :) It's mostly removing filters that we should apply on the client side instead (more precise picking out of composer compatibility rather than the rough filter we had before, etc). This would allow us to use one file for both 9 to 10 and 10 to 11 testing.

    One more thing to add could be a concatenated list of maintainers, so we don't need to query that data.

    Also, this may still not return ALL the branches for a project, just the latest (which might no longer be D9 or D10 compatible), so that may need a bit of tweaking.

  • 🇭🇺 Gábor Hojtsy, Hungary

    I updated the script/query to what I think we would need. One more thing that would be great to have there is a concatenated string of maintainer names (separated by something other than tabs), but we can also keep using maintainers.json for each project, one by one.

    Will ask Björn Brala to review as well, to make sure this is still good for project_analysis and/or that there are no quick improvements we can make for that :)

  • @drumm opened merge request.
  • First commit to issue fork.
  • 🇳🇱 bbrala, Netherlands

    OK, I've tested this on project analysis. Unfortunately we do also need the 6th column, which is the composer name.

    /var/lib/drupalci/workspace/infrastructure/stats/project_analysis/analyzer.sh: line 36: $6: unbound variable
    

    See: https://git.drupalcode.org/project/project_analysis/-/jobs/238715

    In my earlier scan I actually overlooked this:

    analyzer.sh

    # Arguments are
    # 1: project name, 2: composer name, 3: release version,
    # 4: type of extension, 5: number of workspace, 6: concatenated component composer requirements

    Looking at the input for that script (analyzer.sh), it seems that this is actually column 8 in the original TSV:

    project_readiness.sh
    parallel -j${PROC_COUNT} --colsep '\t' --timeout 900 /var/lib/drupalci/workspace/infrastructure/stats/project_analysis/analyzer.sh "{1}" "{2}" "{3}" "{4}" "{%}" "{8}" :::: /var/lib/drupalci/workspace/projects.tsv

    Small example from the original:

    config_filter	config_filter	1.x-dev	project_module	NULL	65862	2855603	config_filter:primary:"^8 || ^9"
    fortytwo	fortytwo	5.0.x-dev	project_theme	NULL	267	3271663	fortytwo:primary:"^9 || ^10"
    acquiadam_asset_import	acquiadam_asset_import	2.0.x-dev	project_module	NULL	34	3249974	acquiadam_asset_import:primary:"^8 || ^9"
    

    Small example from the newly generated one:

    config_filter	config_filter	1.x-dev	project_module	NULL	79644		
    config_filter	config_filter	2.x-dev	project_module	NULL	79644		
    fortytwo	fortytwo	4.x-dev	project_theme	NULL	242		
    fortytwo	fortytwo	5.0.x-dev	project_theme	NULL	242	
    acquiadam_asset_import	acquiadam_asset_import	1.x-dev	project_module	NULL	49		
    acquiadam_asset_import	acquiadam_asset_import	2.0.x-dev	project_module	NULL	49		
    

    So I guess, at least for project analysis, I do need the composer info. I might be able to find it in the composer endpoint, but that would require me to download that whole set, so I'm guessing that is not ideal.
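
To make the column mismatch concrete, here is a minimal Python sketch of the selection that project_readiness.sh performs with `"{1}" "{2}" "{3}" "{4}" "{%}" "{8}"`. Python is used purely for illustration (the real pipeline is GNU parallel plus analyzer.sh), and the helper name `analyzer_args` is hypothetical. It only shows that the eighth column of the newly generated rows is empty, so the sixth analyzer.sh argument carries no composer requirements.

```python
# Sketch only: mirrors the column selection in project_readiness.sh
# ("{1}" "{2}" "{3}" "{4}" "{%}" "{8}"). analyzer_args is a hypothetical helper.

def analyzer_args(tsv_line, job_slot=1):
    """Return the six analyzer.sh arguments for one projects.tsv row."""
    cols = tsv_line.rstrip("\n").split("\t")
    # Pad short rows so a missing 8th column becomes an empty string.
    cols += [""] * (8 - len(cols))
    # job_slot stands in for GNU parallel's {%} replacement string.
    return [cols[0], cols[1], cols[2], cols[3], str(job_slot), cols[7]]


# Rows copied from the two samples quoted in this comment.
old_row = (
    "config_filter\tconfig_filter\t1.x-dev\tproject_module\tNULL\t65862"
    '\t2855603\tconfig_filter:primary:"^8 || ^9"'
)
new_row = "config_filter\tconfig_filter\t1.x-dev\tproject_module\tNULL\t79644\t\t"

print(analyzer_args(old_row)[5])        # composer requirements are present
print(repr(analyzer_args(new_row)[5]))  # empty string in the new file
```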

  • 🇳🇱 bbrala, Netherlands

    I also see stuff like this:

    acquia_contenthub_subscriber:subcomponent:"^8.9 || ^9",acquia_contenthub_status:subcomponent:"^8.9 || ^9",acquia_contenthub_diagnostic:subcomponent:"^8.9 || ^9",acquia_contenthub_audit:subcomponent:"^8.9 || ^9",acquia_contenthub:primary:"^8.9 || ^9"

    But it seems to only really use :primary:; I have no idea what that means exactly.
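
For reference, a consuming script could split that concatenated string roughly as follows. This is a sketch under an assumption: the entry shape `<component>:<role>:"<constraint>"`, joined by commas, is inferred only from the samples quoted in this thread and is not a documented format. The function names are hypothetical, and the sketch does not interpret what `primary` or `subcomponent` mean.

```python
import re

# Assumed entry shape, inferred from the samples in this thread:
#   <component>:<role>:"<composer constraint>", entries joined by commas.
# Matching the quoted part as a unit keeps commas inside a constraint intact.
ENTRY = re.compile(r'([^:,"]+):([^:,"]+):"([^"]*)"')


def parse_requirements(concatenated):
    """Return a list of (component, role, constraint) tuples."""
    return ENTRY.findall(concatenated)


def primary_constraint(concatenated):
    """Return the constraint of the first entry marked primary, or None."""
    for _component, role, constraint in parse_requirements(concatenated):
        if role == "primary":
            return constraint
    return None


sample = (
    'acquia_contenthub_subscriber:subcomponent:"^8.9 || ^9",'
    'acquia_contenthub:primary:"^8.9 || ^9"'
)
print(parse_requirements(sample))
print(primary_constraint(sample))
```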

  • 🇳🇱 bbrala, Netherlands

    Gábor mentioned that the 1st column is the machine name, so I adjusted the script and ran it again. It seems to work perfectly now: I ran it on 100 projects, and it even generated some patches.

  • Status changed to RTBC over 1 year ago
  • 🇳🇱 bbrala, Netherlands
    • drumm committed b2ae1145 on main
      Issue #3397020: Update SQL query to generate project list for...
  • 🇺🇸 drumm, NY, US

    Merged and set to run hourly: https://www.drupal.org/files/project_analysis/allprojects.tsv

    I’ll leave this open for getting the maintainer usernames added.

  • Status changed to Fixed over 1 year ago
  • 🇺🇸 drumm, NY, US

    Added a column of maintainers with VCS write access.

  • 🇳🇱 bbrala, Netherlands

    Drumm, I don't seem to see the maintainer info in the TSV?

    a11y	a11y	2.x-dev	project_module	NULL	131		
    a11y_autocomplete_element	a11y_autocomplete_element	1.x-dev	project_module	NULL	9		
    a11y_form_helpers	a11y_form_helpers	2.0.x-dev	project_module	NULL	88		
    a11y_paragraphs_tabs	a11y_paragraphs_tabs	1.x-dev	project_module	NULL	1035		
    a11y_paragraphs_tabs	a11y_paragraphs_tabs	2.0.x-dev	project_module	NULL	1035		
    a11yproject_checklist	a11yproject_checklist	1.0.x-dev	project_module	NULL	NULL	
    
  • Status changed to Needs work over 1 year ago
  • 🇳🇱 bbrala, Netherlands
  • Assigned to drumm
  • 🇺🇸 drumm, NY, US

    I’m going to move this to Drush so we can purge this URL at the CDN, and so that it is less likely to be forgotten.

    fputcsv($stream, $row, "\t"); would have changed the quoting a bit. I’m going ahead and switching to JSON, encoded line by line, so you can split the file on newlines and process it without keeping the whole thing in memory. We can of course change the output if needed.

    • drumm committed 57b5212e on 7.x-3.x
      Issue #3397020: Move project analysis data dumping to a drush command
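
A consumer of the line-by-line JSON output could read it along these lines. This is a minimal sketch, not the actual project_analysis code: the helper name `iter_projects` is hypothetical, and the field names in the sample data are invented for illustration, since the real structure of each line is not shown in this thread.

```python
import io
import json


def iter_projects(stream):
    """Yield one decoded JSON value per non-empty line of a text stream.

    Only the current line is held in memory, which is the point of
    encoding the file line by line.
    """
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical sample data: the real field names are an assumption here.
sample = io.StringIO(
    '{"name": "config_filter", "maintainers": ["alice", "bob"]}\n'
    '{"name": "fortytwo", "maintainers": []}\n'
)
for record in iter_projects(sample):
    print(record["name"], len(record["maintainers"]))
```

The same generator works on an open file handle, for example `with open("allprojects.json") as fh: ...`, assuming the file has been downloaded locally first.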
      
  • Status changed to Fixed over 1 year ago
  • 🇺🇸 drumm, NY, US

    The new data is at https://www.drupal.org/files/project_analysis/allprojects.json and will have the CDN cache cleared when regenerated.

  • Automatically closed - issue fixed for 2 weeks with no activity.
