[PP-1] Assess the search quality

Created on 11 May 2024, 7 months ago

Problem/Motivation

Once the endpoint is available on d.o (see 📌 Promote drupal endpoint to full / default source plugin, Postponed), it would be a reasonable step to assess the general search quality and to repeat that exercise regularly afterwards. Whether a search is considered useful usually depends on whether an autocomplete feature is available and how good its suggestions are, and more importantly on whether the first 5 to 10 results are useful and relevant.

One example: I searched simply for ECA, and no matter which option I picked in the sort by section, without any category selected, I had a hard time getting anything I consider relevant in the context of the ECA module ecosystem.

Proposed resolution

One approach to take is the following:

  • Take the top 100 search queries.
  • Let an editorial team decide which three modules first come to mind for each search query.
  • Compare those three modules for each search term against the top 5 modules that Project Browser returns for that query.
  • All three modules named by the editorial team have to be within the top 5 results from Project Browser.
  • If they are, the query scores +1, otherwise 0. That way you get a score between 0 and 100.
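
The scoring rule above could be sketched roughly as follows. This is only an illustration: the function names, the dictionary shapes, and the placeholder module names are assumptions, and nothing here talks to Project Browser or d.o.

```python
from typing import Dict, List


def query_passes(expected: List[str], returned: List[str], top_n: int = 5) -> bool:
    """Return True if every expected module is among the first top_n results."""
    return set(expected) <= set(returned[:top_n])


def search_quality_score(
    expected_by_query: Dict[str, List[str]],
    results_by_query: Dict[str, List[str]],
    top_n: int = 5,
) -> int:
    """Count the queries where all editorial picks land in the top_n results.

    With 100 queries this gives a score between 0 and 100.
    """
    return sum(
        1
        for query, expected in expected_by_query.items()
        if query_passes(expected, results_by_query.get(query, []), top_n)
    )


# Placeholder data (hypothetical queries and module names).
expected_by_query = {
    "query one": ["module_a", "module_b", "module_c"],
    "query two": ["module_a", "module_b", "module_c"],
}
results_by_query = {
    # All three picks are within the first five results: +1.
    "query one": ["module_b", "module_x", "module_a", "module_c", "module_y", "module_z"],
    # module_c only shows up in sixth place: 0.
    "query two": ["module_a", "module_b", "module_x", "module_y", "module_z", "module_c"],
}

score = search_quality_score(expected_by_query, results_by_query)
print(f"{score} of {len(expected_by_query)} queries passed")
```

The all-or-nothing rule is strict on purpose: a query with two of three picks in the top 5 still scores 0, so the total only rises when the whole expected set is surfaced.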
📌 Task
Status

Postponed

Version

1.0

Component

API site

Created by

🇩🇪Germany rkoller Nürnberg, Germany

  • Usability

    Makes Drupal easier to use. Preferred over UX, D7UX, etc.


Comments & Activities

  • Issue created by @rkoller
  • Status changed to Active 7 months ago
  • 🇩🇪Germany rkoller Nürnberg, Germany

    According to @drumm this issue doesn't have to be postponed anymore since d.o has project browsing now.

  • 🇺🇸United States drumm NY, US

    The top 200 searches from Google Analytics data are:

     565 ckeditor
     536 commerce
     515 webform
     373 views
     297 menu
     286 admin toolbar
     280 slider
     252 token
     248 taxonomy
     235 "lazy load"
     234 calendar
     227 entity
     214 pathauto
     212 devel
     192 ckeditor5
     188 view
     179 layout builder
     174 paragraphs
     173 bootstrap
     170 migrate
     161 rules
     160 captcha
     154 smtp
     150 comment
     142 gallery
     127 pdf
     127 paragraph
     126 field
     125 image
     124 form
     123 profile
     122 layout
     120 carousel
     115 address
     112 ctools
     111 webp
     111 seo
     111 chat
     110 backup
     109 admin
     108 search
     105 entity reference
     103 php
     102 slick
     102 feeds
     102 breadcrumb
     101 workflow
      99 date
      98 media
      98 google
      97 slideshow
      96 css
      94 video
      94 color
      93 redirect
      92 metatag
      90 user
      89 taxonomy manager
      89 ai
      88 popup
      83 taxonomy menu
      82 mail
      81 inline_entity_form
      81 group
      81 book
      81 blog
      80 chatbot
      78 google analytics
      78 editor
      76 language
      74 wysiwyg
      74 filter
      74 cookie
      73 import
      73 imce
      73 drush
      71 toolbar
      71 recaptcha
      70 reference
      69 twig
      68 button
      68 api
      67 entity_reference_revisions
      66 h5p
      65 map
      65 forum
      64 search api
      64 openai
      63 sitemap
      63 poll
      63 lms
      63 file
      63 colorbox
      62 ldap
      62 field group
      62 event
      61 table
      61 scheduler
      61 path
      61 password
      60 timeline
      60 panels
      59 drupal commerce
      58 node
      58 link
      58 backup and migrate
      58 back to top
      57 youtube
      57 jquery
      56 theme
      56 json
      56 icon
      55 jquery_ui
      55 cache
      54 select
      54 rest
      53 shop
      53 gutenberg
      53 export
      52 quiz
      52 clone
      52 cart
      51 ckeditor 5
      50 newsletter
      50 modal
      50 context
      49 composer
      49 audio
      48 optimize images
      48 dashboard
      48 content access
      48 ajax
      47 qr code
      47 libraries
      47 facebook
      47 dns-prefetch
      47 booking
      46 upload
      46 tree
      45 taxonomy unique
      45 facets
      44 wordpress
      44 views ui
      44 linkit
      44 fields
      43 superfish
      43 share
      42 ubercart
      41 views slideshow
      41 opigno
      41 events
      40 user permissions
      40 stripe
      40 social
      40 gin
      39 title
      39 smart date
      39 glossary
      39 database
      39 accordion
      38 print
      38 login
      38 block
      38 admin_menu
      37 slide
      37 field_group
      37 charts
      37 background
      36 signup
      36 oauth
      36 hero
      36 crm
      36 config
      36 accessibility
      35 wiki
      35 whatsapp
      35 email
      35 builder
      34 simplenews
      34 registration
      34 memcache
      34 ecommerce
      34 chaos tools
      34 asset injector
      33 vote
      33 style
      33 mercury
      33 chart
      32 page manager
      32 news
      32 leaflet
      32 flag
      32 banner
      31 views data export
      31 contextual filters
      30 twitter
      30 survey
      30 qr
      30 ecwid
      30 custom block

    There is definitely a long tail, there were 8,719 unique queries.
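
To turn a count/query listing like the one above into a top-N list, something along these lines might work. It is a sketch under assumptions: the function names are made up, and folding spelling variants together (underscores vs. spaces, surrounding quotes) is a suggestion of this example, not something the Google Analytics export does.

```python
from collections import Counter


def normalize(query: str) -> str:
    """Lowercase, drop surrounding quotes, and treat underscores as spaces."""
    cleaned = query.strip().strip('"').lower().replace("_", " ")
    return " ".join(cleaned.split())


def parse_query_counts(text: str) -> Counter:
    """Parse lines of the form '<count> <query>' into summed counts per query."""
    counts: Counter = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        count, _, query = line.partition(" ")
        counts[normalize(query)] += int(count)
    return counts


# A few rows copied from the list above.
sample = """
     565 ckeditor
     235 "lazy load"
      62 field group
      37 field_group
"""

counts = parse_query_counts(sample)
top_queries = counts.most_common(2)
print(top_queries)
```

With this normalization, "field group" and "field_group" are counted as one query, which changes where such terms land in a top 100.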

  • 🇩🇪Germany rkoller Nürnberg, Germany

    Thank you for the list! Just for context, what period of time do these queries cover? Are those the top queries of the last week or the last month? And "searches from Google Analytics data": does that mean searches on Google, or searches directly on d.o that are just tracked with Google Analytics?

  • 🇺🇸United States drumm NY, US

    Last 28 days, Google Analytics data of what people search for directly on https://www.drupal.org/project/project_module

  • 🇩🇪Germany rkoller Nürnberg, Germany

    Thanks for the clarification! One thought I raised over on Slack but wanted to document here, per request by @drumm: would it make sense, and be possible, to get the search strings only when "works with" is set to any, Drupal 11, Drupal 10, Drupal 9, or Drupal 8, and to exclude searches targeting Drupal 4-7? Those wouldn't have any relevance for Project Browser anymore.

  • 🇺🇸United States leslieg

    The "Most Relevant" sort criterion should not be an option if no search terms are entered. The results displayed currently are:

    If search terms are present, it seems to make sense that Most Relevant becomes the default sort option.

  • 🇺🇸United States drumm NY, US

    Would it make sense, and be possible, to get the search strings only when "works with" is set to any, Drupal 11, Drupal 10, Drupal 9, or Drupal 8, and to exclude searches targeting Drupal 4-7? Those wouldn't have any relevance for Project Browser anymore.

    I haven't had a chance to get this filtering done. I’m hesitant to post the raw data since people don’t expect their searches to be made public, and there could be some unique searches in the long tail. And I think people are likely searching for the same things across versions, we’re still making websites and Drupal’s vocabulary isn’t changing too quickly. And searches for older versions will have already fallen off, sites under active development are on newer versions.

    The "Most Relevant" sort criterion should not be an option if no search terms are entered. The results displayed currently are:

    I think the sort from Drupal.org should always be “relevance”, or something like that. Number of installs should always be a factor in ranking. We may add other factors that are always included, like

    • Development status = no further development → negative boost
    • Has a version compatible with the latest version of Drupal → positive boost

    So there may be no pure sorting by popularity in the future.

    Text searches should consider all of that in weighting, but with the text matches on project title, name, description as the most important ranking factor.
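
As a rough illustration of how those factors might combine, here is a minimal sketch. Everything in it is assumed for the example: the `Project` fields, the weights, and the install numbers are hypothetical, and this is not how Drupal.org actually ranks results.

```python
import math
from dataclasses import dataclass


@dataclass
class Project:
    title: str
    machine_name: str
    description: str
    installs: int
    no_further_development: bool
    supports_latest_core: bool


def text_match(project: Project, query: str) -> float:
    """Crude substring match on title, name, and description (weights assumed)."""
    q = query.lower().strip()
    if not q:
        return 0.0
    score = 0.0
    if q in project.title.lower():
        score += 3.0
    if q in project.machine_name.lower():
        score += 3.0
    if q in project.description.lower():
        score += 1.0
    return score


def rank_score(project: Project, query: str = "") -> float:
    """Text match dominates; installs and status boosts always contribute."""
    score = 10.0 * text_match(project, query)
    score += math.log10(1 + project.installs)  # installs are always a factor
    if project.no_further_development:
        score -= 2.0  # negative boost
    if project.supports_latest_core:
        score += 1.0  # positive boost
    return score


# Hypothetical numbers, for illustration only.
webform = Project("Webform", "webform", "Build forms and surveys.", 400_000, False, True)
token = Project("Token", "token", "Provides additional tokens.", 700_000, False, True)

by_query = sorted([token, webform], key=lambda p: rank_score(p, "webform"), reverse=True)
no_query = sorted([token, webform], key=lambda p: rank_score(p), reverse=True)
print([p.title for p in by_query], [p.title for p in no_query])
```

Taking the logarithm of installs keeps popularity from drowning out a direct title or name match, while still ordering results when no search text is entered.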

  • 🇺🇸United States chrisfromredfin Portland, Maine
  • First commit to issue fork.
  • 🇪🇸Spain fjgarlin

    Trying to run some tests.

  • 🇪🇸Spain fjgarlin

    Related issue. I ran some tests connecting Project Browser to my local, which uses the code on that other issue, and the default results seem to be scored by active installs. I'm going to deploy the changes on that issue to see if there is an impact.

  • 🇪🇸Spain fjgarlin

    Default:

    Search for "webform":

  • 🇩🇪Germany rkoller Nürnberg, Germany

    Would it be possible to get a recent top 100 queries list for the last month, or even spanning a longer interval, so that we can then create the list of expected results? (Or should we go with the list in #3?)

  • 🇺🇸United States drumm NY, US

    Go with the list in #3, the search queries likely aren’t changing that often.
