Drush command to print some stats

Created on 31 October 2023, over 1 year ago
Updated 2 November 2023, over 1 year ago

I have a couple of SQL queries that highlights the frequency of removed tags/attributes. I'm just creating an issue to offer an MR.

   * @usage drush filter-format-audit-stats
   *   Print top 10 tags and attributes.
   * @usage drush filter-format-audit-stats --limit=0
   *   Print all stats output for tags.

Example output:

$ drush ffas --show

 !
 ! [CAUTION] These queries can be slow with a large analysis set, and will get faster as you resolve the biggest issues
 !           and re-run your results.
 !

Tag statistics, entity counts, descending frequency
===================================================


 [INFO]
                SELECT tag, etype, COUNT(eid) FROM (
                  SELECT DISTINCT
                  arst.stripped_tags_value as tag,
                  ar.content__target_type as etype,
                  ar.content__target_id as eid
                  FROM analysis_result ar
                  INNER JOIN analysis_result__stripped_tags arst
                  ON ar.tid = arst.entity_id
                ) as summary
                GROUP BY tag, etype
                ORDER BY COUNT(eid) DESC
                LIMIT 10


style	node	34
svg	paragraph	10
path	paragraph	10
style	paragraph	7
script	node	6
h1	paragraph	5
label	node	5
input	node	5
button	node	5
polygon	paragraph	5

Attribute statistics, entity counts, descending frequency
=========================================================


 [INFO]
                SELECT tag, etype, COUNT(eid) FROM (
                  SELECT DISTINCT
                  arsa.stripped_attributes_value as tag,
                  ar.content__target_type as etype,
                  ar.content__target_id as eid
                  FROM analysis_result ar
                  INNER JOIN analysis_result__stripped_attributes arsa
                  ON ar.tid = arsa.entity_id
                ) as summary
                GROUP BY tag, etype
                ORDER BY COUNT(eid) DESC
                LIMIT 10


img.class	node	185
img.title	node	165
p.style	paragraph	165
p.style	node	164
img.id	node	158
p.role	paragraph	152
p.aria-level	paragraph	152
table.style	node	120
span.style	node	113
td.style	node	105
📌 Task
Status

Active

Version

1.0

Component

Code

Created by

🇦🇺Australia sime Melbourne

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

  • Issue created by @sime
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    run-tests.sh fatal error
  • @sime opened merge request.
  • 🇦🇺Australia sime Melbourne
  • 🇦🇺Australia sime Melbourne
  • Status changed to Needs review over 1 year ago
  • 🇦🇺Australia sime Melbourne

    I've added what i have for a pub test.

  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    run-tests.sh fatal error
  • 🇦🇺Australia sime Melbourne
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    run-tests.sh fatal error
  • Status changed to Needs work over 1 year ago
  • 🇦🇺Australia sime Melbourne

    I've decided to keep this in "needs work". I'm working with a site that will have quite a lot of stripped tags and attributes after the migration, and the focus is on tracking the effected content.

    This module stores a reference to the entity (which could be a paragraph, block_content or node) and then displays the link to the "Host" in the views render. This means i can't easily query for a list of "hosts" - ie usually a list of nodes, but not necessarily - without

    a) significant views work which i don't think this is warranted for this, as this is a throw away analysis module, not like entity_hierarchy, or

    b) storing some extra meta in the analysis data, literally a host record if one is found, which I imagine in some cases could make the whole result generation quite slow as it would be like "I have a paragraph with hasIssue() = true, I'm going to find a node...".

    Either way this MR was supposed to capture a couple of useful queries. I haven't really got a handle on what these questions and leaving the MR open serves the purpose of capturing the queries anyway.

  • 🇦🇺Australia sime Melbourne

    While i let this issue percolate, if anyone just wants the queries the SQL can be found in the drush command. The SQL will work on any site (I mean as long as you aren't table prefixing).

    https://git.drupalcode.org/issue/filter_format_audit-3397880/-/blob/issu...

  • 🇦🇺Australia sime Melbourne

    Don't spend too much time on it unless you really think it's worth adding. There's a bit more that could be done but mentally i got stuck on wanting to know the host entity easily.

    Yeah the table option just needs exploding the data a couple of times to get an array.

  • 🇦🇺Australia sime Melbourne

    Disclaimer is that i don't do entity queries if i don't need to abstract it, or allow other processes to hook into it. I grew weary of the things you can't do or do easily that are easy with SQL. Like doing a distinct subquery of an aggregation query works quite well from a performance pov, and doing these as entity queries is not fun imo.

  • 🇦🇺Australia sime Melbourne

    Adding a patch i did to add a "host" DER field into the analysis result. I haven't pushed this to the branch just want to capture it in the same thinking space.

  • 🇦🇺Australia sime Melbourne

    With the above patch,

      SELECT
      ar.host__target_type as etype,
      ar.host__target_id as eid,
      GROUP_CONCAT(arst.stripped_tags_value) as tags
      FROM analysis_result ar
      INNER JOIN analysis_result__stripped_tags arst
      ON ar.tid = arst.entity_id
      GROUP BY ar.host__target_type, ar.host__target_id
    

    gives me a good insight into whats going on node by node

    
    block_content	16	style
    node	581	form,label,input,button,script
    node	996	article
    node	4151	path,polygon,h1,svg,path,svg,style
    ...
    
    
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    run-tests.sh fatal error
  • Status changed to Active over 1 year ago
  • 🇦🇺Australia sime Melbourne
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    run-tests.sh fatal error
  • 🇦🇺Australia sime Melbourne

    More random updates just capturing ideas at this point - refactoring would surely be needed. I put the output into tables and added a query for summarising content (which is better run in an SQL client).


    drush ffas --limit=10

    Tag summary
    ===========

    --------- ------------- ------------ ----------------------
    tag entity_type occurances fields
    --------- ------------- ------------ ----------------------
    style node 34 body
    path paragraph 10 field_formatted_text
    svg paragraph 10 field_formatted_text
    style paragraph 7 field_formatted_text
    script node 6 body
    button node 5 body
    h1 paragraph 5 field_formatted_text
    input node 5 body
    label node 5 body
    polygon paragraph 5 field_formatted_text

    --------- ------------- ------------ ----------------------

    Attributes summary
    ==================

    -------------- ------------- ------------ ----------------------
    attribute entity_type occurances fields
    -------------- ------------- ------------ ----------------------
    img.class node 185 body
    img.title node 165 body
    p.style paragraph 164 field_formatted_text
    img.id node 158 body
    p.aria-level paragraph 151 field_formatted_text
    p.role paragraph 151 field_formatted_text
    table.style node 120 body
    td.style node 105 body
    p.style node 103 body
    th.scope node 92 body

    -------------- ------------- ------------ ----------------------

    Entity summary ordered by entity IDs
    ====================================

    [INFO] This is a summary, best to use the query (--show-queries) and your SQL tools to output this summary to a
    spreadsheet for analysis.

    -------- --------------- ---------- ----------------- ---------------------- ------- ------------
    ffa_id host_type host_eid name fields tags attr_count
    -------- --------------- ---------- ----------------- ---------------------- ------- ------------
    1 block_content 16 Some Title body style 6
    2 block_content 26 Some Title body 0
    3 block_content 46 Some Title body style 1
    933 block_content 116 Some Title field_formatted_text 0
    886 block_content 116 Some Title field_formatted_text 0
    4 block_content 126 Some Title body 0
    5 block_content 216 Some Title body 0
    6 block_content 241 Some Title body 0
    7 block_content 243 Some Title body 0
    8 block_content 245 Some Title ( body style 1

    -------- --------------- ---------- ----------------- ---------------------- ------- ------------

Production build 0.71.5 2024