Create a service to search in code of all Drupal Contrib Modules

Created on 5 August 2022, over 2 years ago
Updated 10 February 2023, about 2 years ago

Documentation location/URL

https://api.drupal.org/

Problem/Motivation

When working with the Drupal API, developers often want to look up code examples to understand how other developers use that API.

And the source code of Drupal contrib modules is a very good place to find such examples!

But Drupal.org doesn't officially provide an easy tool to search through all Drupal contrib modules, which makes such searches harder than they should be.

Proposed resolution

It would be good to provide a web service that lets you type a function name or code block and search through the source code files of all Drupal contrib modules, showing the code fragments that match.

Additionally, it would be good to have filtering by file extension, module name, etc.

The Drupal Russian community has already made such a tool for its own needs; it is located here: http://grep.xnddx.ru/

But an official service from Drupal.org would be much better!

I think you could contact the author of that service and ask about sharing its source code, to avoid implementing everything from scratch.
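To illustrate the proposed behaviour, here is a minimal Python sketch that greps a local checkout of contrib modules for a pattern and prints the matching fragments, with filtering by file extension. The directory path and the set of extensions are assumptions for the example, not part of the proposal:

```python
import re
from pathlib import Path

def search_contrib(root, pattern, extensions=(".php", ".module", ".inc")):
    """Yield (path, line number, line) for every line under `root`
    that matches `pattern`, limited to the given file extensions."""
    regex = re.compile(pattern)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in extensions:
            for num, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
                if regex.search(line):
                    yield path, num, line.strip()

# Example usage against a hypothetical local mirror of contrib modules.
root = Path("/tmp/contrib")
if root.is_dir():
    for path, num, line in search_contrib(root, r"_entity_presave\("):
        print(f"{path}:{num}: {line}")
```

A real service would of course index the code instead of scanning it on every query, but the input (pattern plus filters) and output (file, line, fragment) would look much the same.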

Remaining tasks

✨ Feature request
Status

Closed: works as designed

Component

Docs infrastructure

Created by

πŸ‡¦πŸ‡²Armenia murz Yerevan, Armenia


Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • πŸ‡¦πŸ‡²Armenia murz Yerevan, Armenia

    I still haven't found a way to search via GitLab in all contrib module code only, excluding Drupal core, but grep.xnddx.ru still does this job well!

    Maybe someone can suggest the right query to me?

  • πŸ‡ΊπŸ‡ΈUnited States drumm NY, US

    There is a trick to it: search from the top bar at https://git.drupalcode.org/project. Once you click through to Code, there are results: https://git.drupalcode.org/search?group_id=2&scope=blobs&search=ScrollTo...

    We only index the project namespace, since there isn't a huge amount of value in searching sandbox & issue fork projects. Unfortunately, this makes it harder to find.
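    The group-scoped search described above can also be built programmatically. The sketch below (my own assumption of how the URL is composed, based on the example link above, not an official Drupal.org tool) constructs the same kind of blob-search URL for the git.drupalcode.org GitLab instance, where group 2 is the project namespace:

    ```python
    from urllib.parse import urlencode

    GITLAB = "https://git.drupalcode.org"
    PROJECT_GROUP_ID = 2  # the "project" namespace mentioned above

    def blob_search_url(term, group_id=PROJECT_GROUP_ID):
        """Build a group-scoped code (blob) search URL for the GitLab UI."""
        query = urlencode({"group_id": group_id, "scope": "blobs", "search": term})
        return f"{GITLAB}/search?{query}"

    print(blob_search_url("hook_entity_presave"))
    # → https://git.drupalcode.org/search?group_id=2&scope=blobs&search=hook_entity_presave
    ```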

  • πŸ‡¦πŸ‡²Armenia murz Yerevan, Armenia

    @drumm Thanks, but this searches across all projects, including Drupal core, and I need to exclude it (e.g., to count usages of some functions in contrib modules only).

  • πŸ‡§πŸ‡ͺBelgium falc0

    I've created my own "private" version of the Russian site with Hound in a Docker container on my NAS (not going to share it because my NAS can't handle too many visitors :p)

    Here are the steps I did (quick & dirty):

    1. Docker container (docker-compose.yml):

       version: "3"

       # More info at https://github.com/hound-search/hound
       services:
         hound:
           container_name: hound
           image: spyrolabs/hound-search:latest
           ports:
             - "8899:6080/tcp"
           volumes:
             - '/volume1/docker/hound/data:/data'
             - /var/services/homes/yvesAdmin/.ssh:/root/.ssh:ro

    2. Small Python script to get all contrib modules:

       import requests
       from bs4 import BeautifulSoup
       from pathlib import Path
       import json

       # Define the directory and base URL
       directory = "/volume1/docker/hound/data"
       URL = "https://www.drupal.org/project/project_module?f[3]=sm_core_compatibility:8&solrsort=iss_project_release_usage+desc&op=Search"
       base_url = "git@git.drupal.org:project/{}.git"

       # Define the initial Hound configuration
       config = {
           "max-concurrent-indexers": 2,
           "dbpath": "data",
           "title": "Hound",
           "health-check-uri": "/healthz",
           "vcs-config": {
               "git": {
                   "detect-ref": "true"
               }
           },
           "repos": {}
       }

       # Fetch the project names listed on one page of search results
       def fetch_projects(page_num):
           page = requests.get(URL + '&page=' + str(page_num))
           soup = BeautifulSoup(page.content, "html.parser")
           projects = soup.find_all("div", class_="node-project-module")
           return [project.find("a")['href'].split('/')[-1] for project in projects]

       # Number of result pages to scrape (pages 0..40 inclusive)
       pages = 40

       # Scrape each page and add its projects to the config
       for i in range(pages + 1):
           project_names = fetch_projects(i)
           for title in project_names:
               repo_url = base_url.format(title)
               config['repos'][title] = {
                   "url": repo_url
               }

       # Save the updated configuration to the file
       config_file_path = Path(directory) / 'config_test.json'
       with open(config_file_path, 'w') as file:
           json.dump(config, file, indent=4)

       print("Updated Hound configuration successfully.")
    After that, you can grep-search all the code, and when you click on a line, you end up at GitLab.
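    Besides the web UI, Hound also exposes a JSON search endpoint. As a sketch (the `/api/v1/search` path and the `q`, `repos`, and `files` parameters are my reading of the hound-search/hound project, so treat them as assumptions; the host and port match the compose file above), a query URL could be built like this:

    ```python
    from urllib.parse import urlencode

    HOUND = "http://localhost:8899"  # port published in the compose file above

    def hound_search_url(pattern, repos="*", files=None):
        """Build a URL for Hound's JSON search API (assumed /api/v1/search)."""
        params = {"q": pattern, "repos": repos}
        if files:
            params["files"] = files  # e.g. r"\.module$" to filter by extension
        return f"{HOUND}/api/v1/search?" + urlencode(params)

    print(hound_search_url(r"_entity_presave\(", files=r"\.module$"))
    ```

    Fetching that URL (e.g. with `requests.get`) returns JSON with per-repository match lists, which is handy for counting usages across contrib only.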

  • πŸ‡¨πŸ‡¦Canada joseph.olstad

    #10 is interesting; however, the Python script no longer works, so I've converted it to PHP.

    <?php
    // Setup instructions: composer require symfony/http-client symfony/dom-crawler symfony/css-selector

    require_once __DIR__ . '/vendor/autoload.php';

    // Import required classes
    use Symfony\Contracts\HttpClient\HttpClientInterface;
    use Symfony\Component\HttpClient\HttpClient;
    use Symfony\Component\DomCrawler\Crawler;

    // Define the directory and base URL
    $directory = "/donnees/apps/hound/drupal/data";
    // Note: the first URL is immediately overridden; only the project index below is used.
    $url = "https://www.drupal.org/project/project_module?f[3]=sm_core_compatibility:8&solrsort=iss_project_release_usage+desc&op=Search";
    $url = "https://www.drupal.org/project/project_module/index?project-status=full";
    $baseUrl = "git@git.drupal.org:project/%s.git";

    // Define the initial configuration
    $config = [
        "max-concurrent-indexers" => 2,
        "dbpath" => "data",
        "title" => "Hound",
        "health-check-uri" => "/healthz",
        "vcs-config" => [
            "git" => [
                "detect-ref" => "true"
            ]
        ],
        "repos" => []
    ];

    // Function to fetch project names from one page of the project index
    function fetchProjects($url, $pageNum, HttpClientInterface $httpClient) {
        $response = $httpClient->request('GET', $url . '&page=' . $pageNum);
        $html = $response->getContent();
        $crawler = new Crawler($html);

        $projects = $crawler->filter("div.view-project-index div.item-list a")->each(function (Crawler $node) {
            $test_string = basename($node->attr('href'));
            // Skip pager links, whose href contains the query string rather than a project name.
            // stripos() must be compared with === false, since it returns 0 for a match at position 0.
            if (stripos($test_string, 'project-status=full') === false) {
                return $test_string;
            }
            return null;
        });

        // each() returns null for the skipped links; drop them.
        return array_filter($projects);
    }

    // Number of pages to scrape
    $pages = 4;

    // Create an HTTP client
    $httpClient = HttpClient::create();

    // Scrape each page and add projects to the config
    for ($i = 0; $i <= $pages; $i++) {
        $projectNames = fetchProjects($url, $i, $httpClient);
        echo "\n";
        echo count($projectNames); // Debug: check how many project names were fetched per page.

        foreach ($projectNames as $title) {
            $repoUrl = sprintf($baseUrl, $title);
            $config['repos'][$title] = ["url" => $repoUrl];
        }
    }

    // Define the path for the configuration file
    $configFilePath = $directory . '/config_test.json';

    // Save the updated configuration to the file
    file_put_contents($configFilePath, json_encode($config, JSON_PRETTY_PRINT));

    echo "Updated Hound configuration successfully.\n";
    
    