Tweak the summarize/taxonomy suggester input to get better output

Issue created by @kevinquillen
Comment over 2 years ago →
System Message

kevinquillen → committed 4704a73f on 1.0.x
Issue #3343862 by kevinquillen: Tweak the summarize/taxonomy suggester...
Comment over 2 years ago →
🇺🇸United States kevinquillen
Committed some changes to dev. The results coming back are already 10x better than they were previously. Here is an example from my own content:

It has been far more accurate with every article I have tried.

It would be good to expand on this a bit here later and allow the user to select which longtext field to summarize, but for now this is working. I may be able to get to that part (selecting which field) next week.
Comment over 2 years ago →
🇺🇸United States kevinquillen
It might be a good idea to implement a pass through DOMDocument too and just delete nodes that are pre or code formatted. I have noticed that code samples, even when passed through strip_tags, aren't cleanly removed and can interfere with summaries.
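To illustrate the idea, here is a minimal sketch in Python rather than the module's PHP. It assumes the goal is to drop everything nested inside pre or code elements and keep the remaining text. The class and function names (CodeStrippingParser, strip_code_and_tags) are hypothetical and not part of the module; the actual change would use PHP's DOMDocument.

```python
from html.parser import HTMLParser


class CodeStrippingParser(HTMLParser):
    """Collects text while skipping anything nested inside <pre> or <code>."""

    # Assumption: these are the only elements that carry code samples.
    SKIPPED_TAGS = {"pre", "code"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside every skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Joining with spaces keeps paragraphs from running together, at the
        # cost of occasionally splitting a word wrapped in inline tags.
        return " ".join(" ".join(self._chunks).split())


def strip_code_and_tags(html):
    """Return plain text with code samples removed (hypothetical helper)."""
    parser = CodeStrippingParser()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    sample = "<p>Intro text.</p><pre><code>$x = 1;</code></pre><p>More text.</p>"
    print(strip_code_and_tags(sample))
```

The point of the sketch is that a plain tag stripper keeps the text inside code blocks, whereas a parser that tracks nesting can discard those nodes before the text reaches the summarizer.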
Status changed to Needs work over 2 years ago (5:44pm 23 February 2023)
Comment over 2 years ago →
🇺🇸United States kevinquillen
So far so good. Still getting good results.

I did notice that we may have an issue trying to remove special tokens like or ... may have to figure out how to handle that too.
@kevinquillen opened merge request.
Comment over 2 years ago →
🇺🇸United States d0t101101
@kevinquillen - I can also confirm that these tweaks to the OpenAI queries made a dramatic improvement for summarization and taxonomy generation, which is part of the openai_content submodule (which now appears on the node edit pages). Tested this across 10 different nodes with varying subjects and lengths; working great.

Well done, sir!
Comment over 2 years ago →
🇺🇸United States kevinquillen
Ok, this is probably in a good enough position at the moment. I can go back and help other areas with the StringHelper (name subject to change) utility class, and implement the stopwords method that is currently in the queue worker and bring it all into one helper class.
Comment over 2 years ago →
System Message

kevinquillen → committed c4f09af4 on 1.0.x
Issue #3343862 by kevinquillen: Tweak the summarize/taxonomy suggester...
Status changed to Fixed over 2 years ago (3:49pm 28 February 2023)
Comment over 2 years ago →
🇺🇸United States d0t101101
The latest 'Suggest Taxonomy' feature is great and quite powerful! Nice touch with the English grammar too, now that it requests 'nouns and adjectives' only :)

I've noticed that if you click 'Suggest Taxonomy' repeatedly, the result is sometimes a numbered list and other times a comma separated list. Ideally it should always be a comma separated list so that it can be quickly copied and pasted into a Drupal Autocomplete Tags type of input. This prompt seems to do the trick:

'Suggest five words to classify the following text. The words must be nouns or adjectives, comma separated:'
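As a complement to the prompt change, the response could also be normalized after the fact, since the model may still return a numbered list now and then. This is a minimal sketch in Python rather than the module's PHP; normalize_terms is a hypothetical helper, not something the module provides, and the sample responses are made up for illustration.

```python
import re


def normalize_terms(response):
    """Turn a numbered, bulleted, or comma separated response into 'a, b, c'."""
    # Split on commas and line breaks so both output styles are covered.
    parts = re.split(r"[,\n]+", response)
    terms = []
    for part in parts:
        # Drop leading list markers such as "1.", "2)", "-" or "*".
        term = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", part).strip().rstrip(".")
        # Skip empty pieces and case-insensitive duplicates.
        if term and term.lower() not in (t.lower() for t in terms):
            terms.append(term)
    return ", ".join(terms)


if __name__ == "__main__":
    print(normalize_terms("1. Drupal\n2. OpenAI\n3. Taxonomy"))
    print(normalize_terms("Drupal, OpenAI, taxonomy."))
```

Either output style ends up in a form that can be pasted straight into an autocomplete tags input.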
Comment over 2 years ago →
System Message
Automatically closed - issue fixed for 2 weeks with no activity.
