Merged! Thanks for all of the work on this!
@robertoperuzzo – thanks for the work on this!
Our team was looking for a way to leverage search_api_attachments with Unstructured to clean up the text for our RAG search, and this is the perfect solution. We have tons of legacy content that hasn’t been OCR’d, so this works well for that.
I did run into a few issues and have a couple of wish-list items:
- Chunking elements – the form values don’t persist after save (adding those fields to submitConfigurationForm() fixed it for us).
- Chunking settings not applied – the options aren’t being included in the payload request, so they don’t appear to take effect yet.
- Large files time-out – anything over ~1 MB (we have one that’s 1.1 MB) hits a DelayedRequeueException loop. Increasing the Guzzle time-outs solved it locally; exposing these as configurable options would be great.
```php
$options = [
  RequestOptions::HEADERS => [
    'Accept' => 'application/json',
    'unstructured-api-key' => $api_key?->getKeyValue() ?? '',
  ],
  RequestOptions::MULTIPART => [
    [
      'name' => 'files',
      'contents' => $file_resource,
      'filename' => $file->getFilename(),
      'headers' => [
        'Content-Type' => $file_mime_type,
      ],
    ],
    [
      'name' => 'strategy',
      'contents' => 'ocr_only',
    ],
  ],
  RequestOptions::TIMEOUT => 300,
  RequestOptions::CONNECT_TIMEOUT => 30,
  RequestOptions::READ_TIMEOUT => 300,
];
```
- Expose strategy choices – it would be awesome to have a dropdown for extraction strategy (e.g., ocr_only, hi_res) so we can pick the best option for non-OCR’d PDFs.
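To make the wish-list items above a bit more concrete, here is a rough sketch of how the strategy, chunking options, and time-outs could all be driven from one settings array. This is only an illustration, not the module's actual code: the function name, the settings keys, and the chunking parameter names (`chunking_strategy`, `max_characters`, `overlap`) are assumptions on my part, and the string keys stand in for the matching Guzzle `RequestOptions` constants so the snippet runs on its own.

```php
<?php

/**
 * Builds Guzzle-style request options from a settings array.
 *
 * Hypothetical helper: the settings keys and chunking parameter names are
 * assumptions for illustration, not the module's real configuration schema.
 */
function build_unstructured_options(array $settings, string $api_key, $file_contents, string $filename, string $mime_type): array {
  $multipart = [
    [
      'name' => 'files',
      'contents' => $file_contents,
      'filename' => $filename,
      'headers' => ['Content-Type' => $mime_type],
    ],
    [
      // Falls back to the strategy that was hard-coded in the snippet above.
      'name' => 'strategy',
      'contents' => $settings['strategy'] ?? 'ocr_only',
    ],
  ];

  // Forward chunking options only when they are set, so they actually
  // end up in the request payload.
  foreach (['chunking_strategy', 'max_characters', 'overlap'] as $key) {
    if (isset($settings[$key]) && $settings[$key] !== '') {
      $multipart[] = ['name' => $key, 'contents' => (string) $settings[$key]];
    }
  }

  return [
    'headers' => [
      'Accept' => 'application/json',
      'unstructured-api-key' => $api_key,
    ],
    'multipart' => $multipart,
    // Time-outs in seconds, configurable with the values that worked locally
    // as defaults.
    'timeout' => (int) ($settings['timeout'] ?? 300),
    'connect_timeout' => (int) ($settings['connect_timeout'] ?? 30),
    'read_timeout' => (int) ($settings['read_timeout'] ?? 300),
  ];
}
```

With something like this, the plugin form would only need to save the values in submitConfigurationForm() and pass the stored configuration in when building the request.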
Anyway, great work on this; hope we can get it merged soon. Attaching the PDF that we have been having issues with in case you need it for testing.
Rebased and fixed coding standard issues.
Enabled GitLab CI and added some basic functional tests to validate the functionality.
@pmagunia - thanks for reporting this. I've added a new constraint that should now be used on 10.2+ for validation. Going to try and write up some tests for this before getting it in.
We are currently experiencing the same issue on 10.4.7 with ai / ai_agent on the latest dev release. It looks like it's caused by node_unpublish_by_keyword_action, which returned no attributes. Adding the above seems to have cured our problem.
@raveen_thakur51 - Thanks for adding that. I went ahead and fixed the phpcs and cspell issues as well. We should be good to go now.
j-barnes → created an issue.
j-barnes → created an issue.
@mkalkbrenner - this is working much better now! Do you think it's worth adding some extra logic in getRevisionableEntityTypes() to filter out some of the entity types that might not be applicable, like paragraphs, or unique ones like menu_link_content where the entity_type_id might differ?
Thanks for the work on this, this is looking great. Going to do some testing on it and we will get it in.
Good catch and thank you!
The correct command should be:
``drush rm:queue`` - Queue all enabled entities for revision deletion processing.
I've updated that part of the form title as it doesn't really make sense to have it under the "disable_automatic_queueing" section.
j-barnes → created an issue.
Added tests for this.
j-barnes → created an issue.
Thanks for testing!
We’re experiencing this same issue with larger file uploads. When using the widget, files are uploaded directly to S3 client-side. However, the fileInput element still retains the selected file. This means when the form is later submitted via FormData, the file is included again in the request body — even though it was already uploaded — which can cause server-side errors such as exceeding post_max_size.
I think the simple solution is just to clear the fileInput value so that FormData doesn't include the file.
j-barnes → made their first commit to this issue’s fork.
j-barnes → created an issue.
It looks like the project is listed under community projects, which has some limitations.
In the meantime, I’ve been using the patch below to purge items once the table reaches a specified size:
https://www.drupal.org/project/admin_audit_trail/issues/3197592 ✨ Add settings to toggle between expanded/collapsed filters. And set an option to limit database table size Needs work
Updated the latest patch to include a small change to allow for patterns similar to how config ignore works. We encountered a use case where enabling client-side validation for all webform submissions was necessary. However, manually adding hundreds of form IDs was impractical.
Example:
webform_submission_name_change_test_form
webform_submission_*
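For anyone curious how this kind of matching can work, below is a minimal standalone sketch. It is not the code from the patch: the function name is made up for illustration, and I'm assuming `*` is the only wildcard and that it matches any run of characters.

```php
<?php

/**
 * Checks a form ID against a list of exact IDs or "*" wildcard patterns.
 *
 * Hypothetical helper for illustration; not taken from the patch itself.
 */
function form_id_matches(string $form_id, array $patterns): bool {
  foreach ($patterns as $pattern) {
    // Escape regex metacharacters, then turn the escaped "*" into ".*".
    $regex = '/^' . str_replace('\*', '.*', preg_quote($pattern, '/')) . '$/';
    if (preg_match($regex, $form_id) === 1) {
      return TRUE;
    }
  }
  return FALSE;
}
```

An exact ID such as webform_submission_name_change_test_form still matches only itself, while webform_submission_* covers every webform submission form in one line.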
@nicholass – Thanks for the patch! It's working great. I've rebased this and made a few minor updates, including dependency injection and some coding standard improvements. I'm also opening a merge request to get more eyes on it. Appreciate your work on this!
j-barnes → made their first commit to this issue’s fork.
Also looking at including this in our project if a release is created, thanks!
Awesome, our team was actually just asking about this functionality. Seems to be working great. Thanks for the contrib!
j-barnes → made their first commit to this issue’s fork.
j-barnes → created an issue.
I've added a new token access test based off the webform populate test. Let me know if you'd like any changes or if you'd prefer some kind of base class added. This should at least get us headed in the right direction.
@Joe - Thanks for the contribution, this is working great for our team. Created the MR for this so we can hopefully get it merged soon.
Seems to be related to and fixed in: https://www.drupal.org/project/drupal/issues/2896169 🐛 Details elements have incorrect aria-describedby attributes Needs work
That makes sense to me, I'll do that going forward. I fixed the one issue, but I think the others may be unrelated. See below:
------ --------------------------------------------------------
Line tests/src/Functional/HandlePdfControllerTest.php
------ --------------------------------------------------------
20 @coversDefaultClass references an invalid class
\Drupal\fillpdf\Controller\HandlePdfController
Also covers \Drupal\fillpdf\Plugin\FillPdfActionPlugin
and \Drupal\fillpdf\OutputHandler..
🪪 phpunit.coversClass
------ --------------------------------------------------------
[ERROR] Found 1 error
FILE: ...3/web/modules/custom/fillpdf-3460893/tests/src/Traits/TestFillPdfTrait.php
--------------------------------------------------------------------------------
FOUND 0 ERRORS AND 1 WARNING AFFECTING 1 LINE
--------------------------------------------------------------------------------
112 | WARNING | [x] Empty PHP statement detected: superfluous semicolon.
| | (Generic.CodeAnalysis.EmptyPHPStatement.SemicolonWithoutCodeDetected)
--------------------------------------------------------------------------------
Thanks for the feature, this is working great for us! I went ahead and updated / refactored this for the new 5.2.x-dev branch.
j-barnes → changed the visibility of the branch 3460893-webform-fillpdf to hidden.
We are currently experiencing the same issue on the latest Webform 6.3.x-dev utilizing the name element when dealing with multiple values.
Below is the outputted value using a dump on the submission view, so the input does exist.
After digging a bit, you can see the below value that gets passed for rendering. The item-list template expects to receive a flat structure, so the item is never rendered. The table view uses a different composite rendering function, so it is unaffected by this issue.
A simple fix to get us started is to add modifications to: docroot/modules/contrib/webform/src/Plugin/WebformElementBase.php
  if ($item) {
    $items[] = is_array($item) ? reset($item) : $item;
  }
This adjustment flattens the item, allowing it to render correctly. However, further review is needed to ensure there are no unintended side effects from this change.
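To show what that one-liner does in isolation, here is a small standalone sketch. The wrapper function and the sample data are hypothetical and not the real structure Webform passes around; only the `is_array()` / `reset()` line comes from the change above.

```php
<?php

/**
 * Flattens a list of items so nested arrays collapse to their first value.
 *
 * Hypothetical wrapper around the one-line fix, for illustration only.
 */
function flatten_items(array $raw_items): array {
  $items = [];
  foreach ($raw_items as $item) {
    // Skip empty values, matching the if ($item) guard in the fix.
    if ($item) {
      // Nested arrays are reduced to their first element; scalars pass through.
      $items[] = is_array($item) ? reset($item) : $item;
    }
  }
  return $items;
}
```

Note that `reset()` only keeps the first element of a nested array, so any additional values inside it are dropped. That is one of the side effects worth checking during review.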
Attaching patch with update hook that we used to fix the issue.
Thanks for the patch, working great for us. Re-rolling against the latest changes.
j-barnes → made their first commit to this issue’s fork.
We ran into this issue when upgrading from 2.2.2 to 3.3.3. We realized that we had a drush cr before our drush updatedb.
Following the correct drush deploy sequence fixed our issue.
drush updatedb --no-cache-clear
drush cache:rebuild
drush config:import
drush cache:rebuild
drush deploy:hook
j-barnes → created an issue.
j-barnes → created an issue.
This is still an issue in the latest version; the above fixes the problem for us.
Unassigning myself for now due to higher priority tickets, will revisit soon.
Updated, thanks!
j-barnes → created an issue.
j-barnes → created an issue.
@Bradley-B - Removing the && ($conjunction == 'AND') worked great for our use case. We use "Contains Any" for our full text search, which does not appear to work correctly in conjunction with "match_entire_string". Thanks!
@asigrist - Thanks for the breakdown. We are facing the same issues on some of our forms that leverage "Acrobat's Extended" version. Ghostscript actually regenerates the PDF file but also flattens it, which results in fillable forms being in a printed state.
Originally, we leveraged ExifTool with a PHP wrapper. This allowed us to modify only the metadata rather than regenerate the entire PDF. At the time, the library was not being maintained, so we switched over to Ghostscript. It looks like it is maintained now, so this might be a good opportunity to make this module a bit more generic and allow multiple providers.
I think we can solve this by implementing the below:
- Introduce generic providers.
- Add back ExifTool with the ability to toggle between providers.
j-barnes → made their first commit to this issue’s fork.
j-barnes → created an issue.
j-barnes → made their first commit to this issue’s fork.
@jenny.tollerson / @alemadlei - Our team was running into the same issue using the "Views Infinite Scroll" pager in a view with ajax enabled. After the user clicks "Load More", the page jumps back up to the top after a minor delay. We have the translate block attached to the top of the page via block layout. Adding this has cured the problem for us, thanks for the patch!
@Defcon0 - I'm running into the same issue where my times went up significantly after enabling the Highlight processor (with "Retrieve result data from Solr" and "Retrieve highlighted snippets" enabled).
Query build time 49.77 ms
Query execute time 130.9 ms
View render time 229.75 ms
to
Query build time 53.55 ms
Query execute time 1988.97 ms
View render time 2075.75 ms
After troubleshooting a bit, I noticed that field / excerpt were not available when the field was not explicitly added to the view. After adding my rendered_html field to the view (hidden), my times were near instant again. (Originally I just had Search: Excerpt and would select my search fields in Search: Fulltext search)
With render html field added to the view:
Without render html field:
Added a simple option to allow for disabling the post process query.
j-barnes → created an issue.
Added a couple minor tweaks, but tested and this is working great! Going to have some internal team members QA this and we should be good to go to merge. Thanks again for the contribution!
@mostepaniukvm - What are your thoughts on adding a role field and also incorporating the prompt replacement {input}, similar to how the other augmentor behaves?
https://git.drupalcode.org/project/augmentor_chatgpt/-/blob/1.0.x/src/Pl...
  if ($role == 'user') {
    $content = str_replace('{input}', '"' . addslashes($input) . '"', $this->configuration['prompt']);
  }
}
@mostepaniukvm - Looks great, going to give this a try. Thanks for the MR.