Parsing PDF files for embedding

Created on 6 July 2024, 5 months ago
Updated 12 September 2024, 2 months ago

Problem/Motivation

When a content type has a media field containing PDF files, we want to extract the text inside each PDF and store it in Typesense's vector database so that it can be used for AI embeddings.
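
As a rough illustration of the target described above, the sketch below builds a Typesense collection schema in which a vector field is generated from the extracted PDF text. The collection name and the field names (`content_with_pdfs`, `pdf_text`, `embedding`) are hypothetical, and the built-in model is only one possible choice. This is not the schema the module actually produces.

```php
<?php

// Hypothetical collection and field names, chosen for illustration only.
$schema = [
  'name' => 'content_with_pdfs',
  'fields' => [
    ['name' => 'title', 'type' => 'string'],
    // Plain text extracted from the PDF attached to the media field.
    ['name' => 'pdf_text', 'type' => 'string'],
    // Vector field that Typesense generates from pdf_text at index time.
    [
      'name' => 'embedding',
      'type' => 'float[]',
      'embed' => [
        'from' => ['pdf_text'],
        // Assumed model; any supported embedding model could be used here.
        'model_config' => ['model_name' => 'ts/all-MiniLM-L12-v2'],
      ],
    ],
  ],
];

// Print the JSON body that would be sent when creating the collection.
echo json_encode($schema, JSON_PRETTY_PRINT), PHP_EOL;
```

With a schema of this shape, only the extracted text has to be sent when indexing a document; the vector is computed on the Typesense side.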

Feature request
Status

Needs review

Version

1.0

Component

Code

Created by

🇮🇹Italy robertoperuzzo 🇮🇹 Tezze sul Brenta, VI

Merge Requests

Comments & Activities

  • Issue created by @robertoperuzzo
  • Merge request !28 Resolve #3459544 "Parsing pdf files" → (Open) created by robertoperuzzo
  • Pipeline finished with Failed
    5 months ago
    Total: 199s
    #220198
  • Pipeline finished with Failed
    4 months ago
    Total: 217s
    #220496
  • Pipeline finished with Failed
    4 months ago
    Total: 327s
    #222193
  • Pipeline finished with Failed
    4 months ago
    Total: 189s
    #222439
  • Pipeline finished with Failed
    4 months ago
    Total: 183s
    #222446
  • Pipeline finished with Failed
    4 months ago
    Total: 173s
    #228261
  • Pipeline finished with Failed
    4 months ago
    Total: 160s
    #228283
  • 🇮🇹Italy robertoperuzzo 🇮🇹 Tezze sul Brenta, VI

    @lussoluca I can't fix this PHPStan error:

    $ php vendor/bin/phpstan analyze $_WEB_ROOT/modules/custom/$CI_PROJECT_NAME $PHPSTAN_CONFIGURATION --no-progress || EXIT_CODE=$?
     ------ --------------------------------------------------------------------- 
      Line   src/Attribute/EmbeddingModel.php                                     
     ------ --------------------------------------------------------------------- 
      32     Drupal\search_api_typesense\Attribute\EmbeddingModel::__construct()  
             does not call parent constructor from                                
             Drupal\Component\Plugin\Attribute\Plugin.                            
     ------ --------------------------------------------------------------------- 
     [ERROR] Found 1 error  
    

    Any advice?

  • 🇮🇹Italy lussoluca Italy

    This has been fixed in the latest 1.0.x version

  • Pipeline finished with Success
    4 months ago
    Total: 154s
    #234994
  • Pipeline finished with Failed
    3 months ago
    Total: 403s
    #258374
  • Pipeline finished with Failed
    3 months ago
    Total: 160s
    #270384
  • Pipeline finished with Failed
    3 months ago
    Total: 604s
    #273958
  • Pipeline finished with Failed
    3 months ago
    Total: 234s
    #273975
  • Pipeline finished with Success
    3 months ago
    Total: 236s
    #273993
  • Pipeline finished with Failed
    2 months ago
    Total: 187s
    #277329
  • Pipeline finished with Failed
    2 months ago
    Total: 221s
    #277750
  • Pipeline finished with Failed
    2 months ago
    Total: 180s
    #277801
  • Pipeline finished with Failed
    2 months ago
    Total: 175s
    #279609
  • Pipeline finished with Success
    2 months ago
    Total: 1169s
    #279619
  • Issue was unassigned.
  • Status changed to Needs review 2 months ago
  • Pipeline finished with Failed
    21 days ago
    Total: 188s
    #326103
  • Pipeline finished with Failed
    21 days ago
    Total: 398s
    #326116
  • Pipeline finished with Success
    18 days ago
    Total: 190s
    #328808
  • Pipeline finished with Success
    17 days ago
    Total: 198s
    #329892
  • Pipeline finished with Failed
    9 days ago
    Total: 320s
    #336683
  • Pipeline finished with Failed
    8 days ago
    Total: 199s
    #337724
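
For the PHPStan error reported in the thread above ("does not call parent constructor"), one common way to satisfy the check is to delegate to the parent constructor explicitly. The sketch below is a standalone illustration, not the fix that landed in 1.0.x, which the thread does not describe. The `Plugin` class here is a stand-in for `Drupal\Component\Plugin\Attribute\Plugin` so that the example runs outside Drupal, and the `$label` parameter is hypothetical.

```php
<?php

declare(strict_types=1);

// Stand-in for Drupal\Component\Plugin\Attribute\Plugin so the sketch runs
// without Drupal; the real class lives in Drupal core.
class Plugin {
  public function __construct(
    public readonly string $id,
    public readonly ?string $deriver = NULL,
  ) {}
}

// Hypothetical shape of an attribute such as EmbeddingModel; the $label
// parameter is an assumption made for illustration.
#[\Attribute(\Attribute::TARGET_CLASS)]
final class EmbeddingModel extends Plugin {
  public function __construct(
    string $id,
    public readonly string $label,
    ?string $deriver = NULL,
  ) {
    // Delegating to the parent constructor is what the PHPStan check expects.
    parent::__construct($id, $deriver);
  }
}

$attribute = new EmbeddingModel('example_model', 'Example model');
echo $attribute->id, ' / ', $attribute->label, PHP_EOL;
```

The child constructor takes `$id` as a plain parameter and leaves the property promotion to the parent, so the readonly `$id` property is initialised only once.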