Rebuild project list generation for Drupal 9 compatible projects

Created on 25 August 2023, over 1 year ago
Updated 31 October 2023, over 1 year ago

Problem/Motivation

Previously these scripts were run to produce project release lists:

https://git.drupalcode.org/project/infrastructure/-/blob/main/stats/proj...
https://git.drupalcode.org/project/infrastructure/-/blob/main/stats/proj...

These generated the TSV files for Drupal 8/9 compatible modules. They are no longer running, and the last data is from June 2023. However, project_analysis needs this data to analyse Drupal 10 compatibility and will need it to analyse Drupal 11 compatibility later on.

Steps to reproduce

Proposed resolution

I need a place to download projects_d11.tsv with all projects that are compatible with d10. This will be feeding project_analysis with the projects for d11 readiness.

We talked through possible solutions with Neil, Ryan Gabor and me here: https://drupal.slack.com/archives/C51GNJG91/p1692886611776179

Remaining tasks

When building this out, make the following changes:

  • Remove the special-case filtering (egrep -v 'geotimezone|ip2country|…'). Those special cases are better handled on the runner side
  • Remove splitting & attempting to send to the old dispatcher.drupalci.org infrastructure
  • Wrap everything in foreach (range(9, drupalorg_highest_core_major_version()) as $version) { so we automatically pick up the next version when it arrives, outputting one file for each core compatibility
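
A minimal sketch of the per-version loop from the last bullet, written in Python for illustration rather than as the eventual drush command. `highest_core_major` stands in for the value of drupalorg_highest_core_major_version(); `rows_for_version` and the output file name pattern are hypothetical and not taken from the existing scripts.

```python
import csv
from pathlib import Path
from typing import Callable, Iterable, List, Sequence


def write_project_lists(
    out_dir: Path,
    highest_core_major: int,
    rows_for_version: Callable[[int], Iterable[Sequence[str]]],
    lowest_core_major: int = 9,
) -> List[Path]:
    """Write one TSV per core major version, from 9 up to the highest one."""
    written = []
    # Inclusive upper bound, mirroring PHP's range(9, $highest).
    for major in range(lowest_core_major, highest_core_major + 1):
        # Hypothetical file name pattern; the real naming is not settled here.
        path = out_dir / f"projects_compatible_d{major}.tsv"
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
            writer.writerows(rows_for_version(major))
        written.append(path)
    return written
```

Because the upper bound is looked up at run time, a new core major version would produce an extra file without any code change.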

User interface changes

API changes

Data model changes

📌 Task
Status

Closed: duplicate

Version

3.0

Component

Code

Created by

🇳🇱 Netherlands bbrala Netherlands

Comments & Activities

  • Issue created by @bbrala
  • 🇺🇸 United States drumm NY, US

    This should land as a drush command to generate the files, which would go in drupalorg module.

  • 🇺🇸 United States drumm NY, US

    Added implementation notes from Slack to the Remaining tasks

  • 🇭🇺 Hungary Gábor Hojtsy Hungary

    Based on source code at https://git.drupalcode.org/project/infrastructure/-/blob/main/stats/proj..., current TSV contains these columns:

    1 project machine name
    2 composer package namespace
    3 composer version string
    4 type (module, theme, theme engine, distribution)
    5 next major version compatibility field text from project node
    6 usage count of project (from last week's stats?)
    7 (versioning entity ID, not used in processing)
    8 composer compatibility information

    Currently all project branches are included that are

    • Supported by the maintainer (release branch setting on d.o)
    • Composer compatibility matches '[~^]9' regex
    • Not one of 'geotimezone|ip2country|background_process|publisso_gold' (these historically tripped up the parser)
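
The column layout and the filters above can be sketched as follows. This is an illustration in Python, not the original script: the column keys are descriptive labels chosen here, the skip list is matched on the exact machine name (the original egrep pattern may have matched anywhere on the line), and the "supported by the maintainer" condition is not reproduced because it depends on a d.o setting that is not a column in the file.

```python
import csv
import io
import re
from typing import Dict, Iterator

# Descriptive labels (chosen for this sketch) for the eight columns listed above.
COLUMNS = [
    "machine_name",
    "composer_namespace",
    "composer_version",
    "type",
    "next_major_compat_text",
    "usage",
    "versioning_entity_id",
    "composer_compatibility",
]

D9_COMPATIBLE = re.compile(r"[~^]9")
SKIPPED = {"geotimezone", "ip2country", "background_process", "publisso_gold"}


def read_d9_rows(tsv_text: str) -> Iterator[Dict[str, str]]:
    """Yield rows whose composer compatibility matches '[~^]9'."""
    reader = csv.reader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for fields in reader:
        if len(fields) != len(COLUMNS):
            continue  # Skip malformed lines rather than guess at their layout.
        row = dict(zip(COLUMNS, fields))
        if row["machine_name"] in SKIPPED:
            continue
        if D9_COMPATIBLE.search(row["composer_compatibility"]):
            yield row
```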
  • 🇭🇺 Hungary Gábor Hojtsy Hungary

    I was trying to find the composer PHP API, but did not find much. How do you query the composer endpoint?

    To decide which dataset to start from, I looked at which projects have a machine name that differs from their composer name. I don't know how one could get from one to the other via the APIs: the D7 REST project API does not seem to have composer information, and I don't know whether the composer endpoint has project information. These 44 projects have composer names that differ from their machine names; I don't know the reason.

    acm	acm-acm
    amun	amun-amun
    amunet	amunet-amunet
    anhur	anhur-anhur
    aos	aos-aos
    augmentor_google_cloud_speech_to_text	augmentor_google_cloud_speech_to_text-augmentor_google_cloud_speech_to_text
    augmentor_google_cloud_text_to_speech	augmentor_google_cloud_text_to_speech-augmentor_google_cloud_text_to_speech
    cision_block	cision_block-cision_block
    civicrm_entity_leaflet	civicrm_entity_leaflet-civicrm_entity_leaflet
    consultation	consultation-consultation
    drupal_ad	drupal_ad-drupal_ad
    elasticsearch_helper_views	elasticsearch_helper_views-elasticsearch_helper_views
    entity_visitors	entity_visitors-entity_visitors
    field_layout	field_layout-field_layout
    field_widget_class	field_widget_class-field_widget_class
    forms_to_email	forms_to_email-forms_to_email
    gatsby_endpoints	gatsby_endpoints-gatsby_endpoints
    gleap	gleap-gleap
    googlelogin	googlelogin-googlelogin
    govdelivery_signup	govdelivery_signup-govdelivery_signup
    graphql_file	graphql_file-graphql_file
    graphql_link	graphql_link-graphql_link
    instapage_cms_plugin	instapage_cms_plugin-instapage_cms_plugin
    media_pexels	media_pexels-media_pexels
    openlayers	openlayers-openlayers
    openstack	openstack-openstack
    plugindecorator	plugindecorator-plugindecorator
    popup_block	popup_block-popup_block
    schemadotorg_demo	schemadotorg_demo-schemadotorg_demo
    schemadotorg_next	schemadotorg_next-schemadotorg_next
    search_api_pinecone	search_api_pinecone-search_api_pinecone
    setka_editor	setka_editor-setka_editor
    students	students-students
    timepicker	timepicker-timepicker
    tmgmt_wordbee	tmgmt_wordbee-tmgmt_wordbee
    translation_outdated	translation_outdated-translation_outdated
    twig	twig-twig
    uikit_views	uikit_views-uikit_views
    ui_patterns_field_group	ui_patterns_field_group-ui_patterns_field_group
    weather	weather-weather
    webform_location_html5	webform_location_html5-webform_location_html5
    webtheme_default_content	webtheme_default_content-webtheme_default_content
    xnttexif	xnttexif-xnttexif
    xtcentity	xtcentity-xtcentity
    
  • 🇭🇺 Hungary Gábor Hojtsy Hungary

    The actual issue BTW is that the Drupal 9 compatible project list is not getting updated and needs to be replaced with this new thing. So retitling for that and updating issue summary. Also moving to the project analysis queue for now assuming we can piece this together from publicly available information.

  • 🇭🇺 Hungary Gábor Hojtsy Hungary

    One thing we can do is try to composer require by the project name and, if that does not work, use project_name-project_name, since that seems to be the composer name of those projects whose composer name differs from the project name (I don't know why).
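
A minimal sketch of that fallback, assuming the `drupal/` vendor prefix used on packages.drupal.org. `try_require` is a hypothetical callback that would attempt the actual `composer require` and report whether it succeeded; it is injected so the lookup order can be shown without running composer.

```python
from typing import Callable, Optional


def resolve_composer_name(
    project: str,
    try_require: Callable[[str], bool],
    vendor: str = "drupal",
) -> Optional[str]:
    """Return the first package name that try_require accepts, or None."""
    candidates = [
        f"{vendor}/{project}",            # Usual case: same as the project name.
        f"{vendor}/{project}-{project}",  # Fallback: project_name-project_name.
    ]
    for package in candidates:
        if try_require(package):
            return package
    return None
```

For example, with a callback that only accepts `drupal/twig-twig`, `resolve_composer_name("twig", callback)` falls through to the second candidate.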

  • 🇳🇱 Netherlands bbrala Netherlands

    I kinda wonder what parts of the data we actually use in project analysis. You might be using part of the info because of the dashboard?

    In the issue for d10/d11 I added that there is a 'homepage' field that could be added to the JSON to at least point to the project page (and therefore the project key on d.o?). That page could also be used to get the node ID of that project, although that might be quite some overhead.

  • 🇪🇸 Spain fjgarlin

    For the composer namespace vs machine name, the property "composer_namespace" was added to the XML endpoint.
    For example: https://updates.drupal.org/release-history/webform/current
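
Reading that property could look like the sketch below. It only assumes that the release-history XML contains a `composer_namespace` element somewhere in the document; the exact position of the element in the feed is not specified in this thread, so the code searches the whole tree. Fetching the URL is left out so the parsing can be tried on any XML string.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def composer_namespace_from_xml(xml_text: str) -> Optional[str]:
    """Return the text of the first <composer_namespace> element, if any."""
    root = ET.fromstring(xml_text)
    # iter() walks the root and all descendants, so nesting depth does not matter.
    element = next(root.iter("composer_namespace"), None)
    if element is None or element.text is None:
        return None
    return element.text.strip() or None
```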

  • 🇭🇺 Hungary Gábor Hojtsy Hungary

    @drumm points out in Slack that there is https://www.drupal.org/files/project_analysis/allprojects_d10.tsv which does seem to be the new location/updated version of https://www.drupal.org/files/project_analysis/projects.tsv which we have been relying on so far (and was last updated in June 2023). So I updated the project_analysis pipeline with that URL and am trying that out :)

    On the other hand, I built a whole quick script based on the D7 REST API (project names, usage, porting field text) and the update XML data (composer name, thanks @fjgarlin!, composer compatibility to filter to only D9 compatible branches). It has local caching and all :D https://git.drupalcode.org/project/deprecation_status/-/blob/script/scri... Hopefully the above file URL update works, because this remote script runs a bit slowly; switching over to it is not impossible if we need to, though.

    For now watching https://git.drupalcode.org/project/project_analysis/-/pipelines/29026 that I kicked off with an update to the new URL.

  • 🇳🇱 Netherlands bbrala Netherlands

    Perhaps we should just leverage our own composer provider. We should have all the info we need there, since all projects are exposed through composer.

    This is the root: https://packages.drupal.org/8/packages.json

    List of all packages containing 'drupal' (which should be all)
    https://packages.drupal.org/8/search.json?s=drupal

    Single package endpoint:
    https://packages.drupal.org/files/packages/8/p2/drupal/zircon.json

    I'm guessing there should be some version constraint info in there somewhere. This would mean we could just use this as the source of truth.
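
As a sketch of what reading constraints from such a package file could involve, the code below assumes the file follows the Composer 2 repository metadata layout: a `packages` map from package name to a list of version entries, optionally minified so that each entry only carries the keys that differ from the previous one, with `"__unset"` marking removed keys. Whether packages.drupal.org serves minified files, and whether the core constraint always sits under `require["drupal/core"]`, are assumptions to verify against a real response.

```python
from typing import Any, Dict, List


def expand_versions(versions: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Undo minification: each entry is a diff against the previous entry."""
    expanded = []
    current: Dict[str, Any] = {}
    for entry in versions:
        current = dict(current)  # Copy so earlier results are not mutated.
        for key, value in entry.items():
            if value == "__unset":
                current.pop(key, None)
            else:
                current[key] = value
        expanded.append(current)
    return expanded


def core_constraints(metadata: Dict[str, Any], package: str) -> Dict[str, str]:
    """Map each version of `package` to its drupal/core constraint, if declared."""
    versions = metadata.get("packages", {}).get(package, [])
    if metadata.get("minified"):
        versions = expand_versions(versions)
    result = {}
    for entry in versions:
        require = entry.get("require")
        version = entry.get("version")
        if isinstance(require, dict) and version and "drupal/core" in require:
            result[version] = require["drupal/core"]
    return result
```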

  • 🇺🇸 United States drumm NY, US

    https://packages.drupal.org/8/search.json?s=drupal does a full-text search for “drupal”, so it is not a list of all projects. And looking at this uncovered some small problems (📌 Search should only search modules & themes, Fixed). I don’t recommend trying to use search.json this way.

  • 🇳🇱 Netherlands bbrala Netherlands

    If I look at Packagist and Satis, they also supply a list.json, but I cannot find that in the Drupal implementation. I'm mostly looking to get a full list of projects from the composer endpoint so we can parse that as input for different tools.

  • heddn Nicaragua

    So glad this is working better again. Such great results on D10 compatibility.

  • 🇳🇱 Netherlands bbrala Netherlands

    Regarding the search: this seems to match everything?

    https://packages.drupal.org/8/search.json?s=

  • 🇺🇸 United States drumm NY, US
  • 🇭🇺 Hungary Gábor Hojtsy Hungary

    I kinda wonder what parts of the data we actually use in project analysis. You might be using part of the info because of the dashboard?

    This is how we use that data in the dashboard/data parser:

    1. project machine name: used to link to project page, create standalone URL, etc. such as https://dev.acquia.com/drupal10/deprecation_status/projects/token
    2. composer package namespace: NOT USED in the dashboard, I think it's used in the analysis to require the right project though?
    3. composer version string: NOT USED in the dashboard, I think it's used in the analysis to require the right project though?
    4. type (module, theme, theme engine, distribution): used in the dashboard to categorize projects: https://dev.acquia.com/drupal10/deprecation_status/projects?type=Theme (note that we don't seem to have any results for theme engines or distributions, probably due to how we run the scans)
    5. next major version compatibility field text from project node: this was used earlier, but since projects use it so inconsistently, I decided to link to issue search for all projects instead, such as: https://drupal.org/project/issues/search/token?issue_tags=Drupal+10+comp... (the tag used by the bot and most humans)
    6. usage count of project (from last week's stats?): this is HEAVILY used to categorize projects into segments and also order them in the report, which is super important and useful to tell the impact of where the results are
    7. versioning entity ID: NOT USED, I think this is only there for SQL reasons
    8. composer compatibility information: NOT USED FROM HERE ANYMORE, it was used up until yesterday, but turns out the data was very misleading, so we query the update XML directly

    In short, I think the usage information is super important and I don't think it's on the composer endpoint. If we need the project endpoint for that, then we can get the porting info from there too. However, the composer compatibility info is not there, which is why I used the combination for the quick script I did. I don't think we want to add usage numbers to the composer info, because it would make it less cacheable and it would need updating every week. So we would need another way to get that info dump even if using the composer info. I don't know how the composer endpoint exposes all available releases, but we use the update XML to find the highest-stability version that is compatible with the next major Drupal version, so we potentially need to look at various releases of the project.

    Hope this helps @bbrala, happy to discuss in chat too :)
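
The "highest stability version that is next major compatible" selection described above could be sketched like this. It is an illustration, not the project_analysis code: the stability ranking, the suffix parsing of version strings, and the `is_compatible` callback (which would wrap whatever compatibility check the update XML allows) are all assumptions.

```python
import re
from typing import Callable, Iterable, Optional

# Assumed ranking from least to most stable; no suffix counts as stable.
STABILITY_RANK = {"dev": 0, "alpha": 1, "beta": 2, "rc": 3, "stable": 4}
SUFFIX = re.compile(r"-(dev|alpha|beta|rc)\d*$", re.IGNORECASE)


def stability(version: str) -> str:
    """Classify a version string such as '8.x-1.x-dev' or '2.0.0-beta1'."""
    if version.lower().startswith("dev-"):
        return "dev"  # Composer-style branch alias, e.g. 'dev-1.x'.
    match = SUFFIX.search(version)
    return match.group(1).lower() if match else "stable"


def best_compatible_release(
    versions: Iterable[str],
    is_compatible: Callable[[str], bool],
) -> Optional[str]:
    """Return the most stable compatible version; earlier entries win ties."""
    best = None
    for version in versions:
        if not is_compatible(version):
            continue
        if best is None or (
            STABILITY_RANK[stability(version)] > STABILITY_RANK[stability(best)]
        ):
            best = version
    return best
```

Passing the releases newest first means that, among equally stable compatible releases, the newest one is kept.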

  • Status changed to Closed: duplicate over 1 year ago
  • 🇺🇸 United States drumm NY, US