Create a service to search in code of all Drupal Contrib Modules

Created on 5 August 2022, over 2 years ago
Updated 10 February 2023, about 2 years ago

Documentation location/URL

https://api.drupal.org/

Problem/Motivation

When working with the Drupal API, developers often want to look up code examples to understand how other developers use that API.

And the source code of Drupal contrib modules is a very good place to find such examples!

But Drupal.org doesn't officially provide an easy tool to search through all Drupal contrib modules, which makes such searches harder than they should be.

Proposed resolution

It would be good to provide a web service that lets you type a function name or code block and search through the source code files of all Drupal contrib modules, showing the code fragments that match.

Additionally, it would be good to have filtering by file extension, module name, etc.

The Drupal Russian community has already made such a tool for its own needs; it is located here: http://grep.xnddx.ru/

But an official service from Drupal.org would be much better!

I think you could contact the author of that service and ask about sharing its source code, to avoid implementing everything from scratch.
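To illustrate the proposed behaviour, here is a minimal Python sketch that greps a local checkout of contrib modules for a pattern and prints the matching fragments, with filtering by file extension. The directory path and the set of extensions are assumptions for the example, not part of the proposal:

```python
import re
from pathlib import Path

def search_contrib(root, pattern, extensions=(".php", ".module", ".inc")):
    """Yield (path, line number, line) for every line under `root`
    that matches `pattern`, limited to the given file extensions."""
    regex = re.compile(pattern)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in extensions:
            for num, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
                if regex.search(line):
                    yield path, num, line.strip()

# Example usage against a hypothetical local mirror of contrib modules.
root = Path("/tmp/contrib")
if root.is_dir():
    for path, num, line in search_contrib(root, r"_entity_presave\("):
        print(f"{path}:{num}: {line}")
```

A real service would of course index the code instead of scanning it on every query, but the input (pattern plus filters) and output (file, line, fragment) would look much the same.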

Remaining tasks

✨ Feature request
Status

Closed: works as designed

Component

Docs infrastructure

Created by

πŸ‡¦πŸ‡²Armenia murz Yerevan, Armenia


Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • πŸ‡¦πŸ‡²Armenia murz Yerevan, Armenia

    I still haven't found a way to search via GitLab in all contrib module code only, excluding Drupal core, but grep.xnddx.ru still does this job well!

    Maybe someone can suggest the right query to me?

  • πŸ‡ΊπŸ‡ΈUnited States drumm NY, US

    There is a trick to it: search from the top bar at https://git.drupalcode.org/project. Once you click through to Code, there are results: https://git.drupalcode.org/search?group_id=2&scope=blobs&search=ScrollTo...

    We only index the project namespace, since there isn't a huge amount of value in searching sandbox & issue fork projects. Unfortunately, this makes it harder to find.
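    The group-scoped search described above can also be built programmatically. The sketch below (my own assumption of how the URL is composed, based on the example link above, not an official Drupal.org tool) constructs the same kind of blob-search URL for the git.drupalcode.org GitLab instance, where group 2 is the project namespace:

    ```python
    from urllib.parse import urlencode

    GITLAB = "https://git.drupalcode.org"
    PROJECT_GROUP_ID = 2  # the "project" namespace mentioned above

    def blob_search_url(term, group_id=PROJECT_GROUP_ID):
        """Build a group-scoped code (blob) search URL for the GitLab UI."""
        query = urlencode({"group_id": group_id, "scope": "blobs", "search": term})
        return f"{GITLAB}/search?{query}"

    print(blob_search_url("hook_entity_presave"))
    # → https://git.drupalcode.org/search?group_id=2&scope=blobs&search=hook_entity_presave
    ```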

  • πŸ‡¦πŸ‡²Armenia murz Yerevan, Armenia

    @drumm Thanks, but this searches across all projects, including Drupal core, and I need to exclude it (e.g., to count usages of some functions in contrib modules only).

  • πŸ‡§πŸ‡ͺBelgium falc0

    I've created my own "private" version of the Russian site with Hound in a Docker container on my NAS (not going to share it because my NAS can't handle too many visitors :p)

    Here are the steps I did (quick & dirty):

    1. Docker container (docker-compose.yml):

       version: "3"

       # More info at https://github.com/hound-search/hound
       services:
         hound:
           container_name: hound
           image: spyrolabs/hound-search:latest
           ports:
             - "8899:6080/tcp"
           volumes:
             - '/volume1/docker/hound/data:/data'
             - /var/services/homes/yvesAdmin/.ssh:/root/.ssh:ro

    2. Small Python script to get all contrib modules:

       import requests
       from bs4 import BeautifulSoup
       from pathlib import Path
       import json

       # Define the directory and base URL
       directory = "/volume1/docker/hound/data"
       URL = "https://www.drupal.org/project/project_module?f[3]=sm_core_compatibility:8&solrsort=iss_project_release_usage+desc&op=Search"
       base_url = "git@git.drupal.org:project/{}.git"

       # Define the initial Hound configuration
       config = {
           "max-concurrent-indexers": 2,
           "dbpath": "data",
           "title": "Hound",
           "health-check-uri": "/healthz",
           "vcs-config": {
               "git": {
                   "detect-ref": "true"
               }
           },
           "repos": {}
       }

       # Fetch the project names listed on one page of search results
       def fetch_projects(page_num):
           page = requests.get(URL + '&page=' + str(page_num))
           soup = BeautifulSoup(page.content, "html.parser")
           projects = soup.find_all("div", class_="node-project-module")
           return [project.find("a")['href'].split('/')[-1] for project in projects]

       # Number of result pages to scrape (pages 0..40 inclusive)
       pages = 40

       # Scrape each page and add its projects to the config
       for i in range(pages + 1):
           project_names = fetch_projects(i)
           for title in project_names:
               repo_url = base_url.format(title)
               config['repos'][title] = {
                   "url": repo_url
               }

       # Save the updated configuration to the file
       config_file_path = Path(directory) / 'config_test.json'
       with open(config_file_path, 'w') as file:
           json.dump(config, file, indent=4)

       print("Updated Hound configuration successfully.")
    After that, you can grep-search all the code, and when you click on a line, you end up at GitLab.
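    Besides the web UI, Hound also exposes a JSON search endpoint. As a sketch (the `/api/v1/search` path and the `q`, `repos`, and `files` parameters are my reading of the hound-search/hound project, so treat them as assumptions; the host and port match the compose file above), a query URL could be built like this:

    ```python
    from urllib.parse import urlencode

    HOUND = "http://localhost:8899"  # port published in the compose file above

    def hound_search_url(pattern, repos="*", files=None):
        """Build a URL for Hound's JSON search API (assumed /api/v1/search)."""
        params = {"q": pattern, "repos": repos}
        if files:
            params["files"] = files  # e.g. r"\.module$" to filter by extension
        return f"{HOUND}/api/v1/search?" + urlencode(params)

    print(hound_search_url(r"_entity_presave\(", files=r"\.module$"))
    ```

    Fetching that URL (e.g. with `requests.get`) returns JSON with per-repository match lists, which is handy for counting usages across contrib only.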

  • πŸ‡¨πŸ‡¦Canada joseph.olstad

    #10 is interesting; however, the Python script no longer works, so I've converted it to PHP.

    <?php
    // Setup instructions: composer require symfony/http-client symfony/dom-crawler symfony/css-selector

    require_once __DIR__ . '/vendor/autoload.php';

    // Import required classes
    use Symfony\Contracts\HttpClient\HttpClientInterface;
    use Symfony\Component\HttpClient\HttpClient;
    use Symfony\Component\DomCrawler\Crawler;

    // Define the directory and base URL
    $directory = "/donnees/apps/hound/drupal/data";
    // Note: the first URL is immediately overridden; only the project index below is used.
    $url = "https://www.drupal.org/project/project_module?f[3]=sm_core_compatibility:8&solrsort=iss_project_release_usage+desc&op=Search";
    $url = "https://www.drupal.org/project/project_module/index?project-status=full";
    $baseUrl = "git@git.drupal.org:project/%s.git";

    // Define the initial configuration
    $config = [
        "max-concurrent-indexers" => 2,
        "dbpath" => "data",
        "title" => "Hound",
        "health-check-uri" => "/healthz",
        "vcs-config" => [
            "git" => [
                "detect-ref" => "true"
            ]
        ],
        "repos" => []
    ];

    // Function to fetch project names from one page of the project index
    function fetchProjects($url, $pageNum, HttpClientInterface $httpClient) {
        $response = $httpClient->request('GET', $url . '&page=' . $pageNum);
        $html = $response->getContent();
        $crawler = new Crawler($html);

        $projects = $crawler->filter("div.view-project-index div.item-list a")->each(function (Crawler $node) {
            $test_string = basename($node->attr('href'));
            // Skip pager links, whose href contains the query string rather than a project name.
            // stripos() must be compared with === false, since it returns 0 for a match at position 0.
            if (stripos($test_string, 'project-status=full') === false) {
                return $test_string;
            }
            return null;
        });

        // each() returns null for the skipped links; drop them.
        return array_filter($projects);
    }

    // Number of pages to scrape
    $pages = 4;

    // Create an HTTP client
    $httpClient = HttpClient::create();

    // Scrape each page and add projects to the config
    for ($i = 0; $i <= $pages; $i++) {
        $projectNames = fetchProjects($url, $i, $httpClient);
        echo "\n";
        echo count($projectNames); // Debug: check how many project names were fetched per page.

        foreach ($projectNames as $title) {
            $repoUrl = sprintf($baseUrl, $title);
            $config['repos'][$title] = ["url" => $repoUrl];
        }
    }

    // Define the path for the configuration file
    $configFilePath = $directory . '/config_test.json';

    // Save the updated configuration to the file
    file_put_contents($configFilePath, json_encode($config, JSON_PRETTY_PRINT));

    echo "Updated Hound configuration successfully.\n";
    
    