Search API Attachments does not appear to support S3FS

Created on 18 January 2023, almost 2 years ago
Updated 6 March 2024, 8 months ago

Problem/Motivation

Drupal 9.5.0. Search API 8.x-1.28. S3FS 8.x-3.1.

I am testing whether Search API Attachments will index files on a site that uses the AWS S3 file system. So far, the answer appears to be no.

Steps to reproduce

Install Search API and Search API Attachments (configured to use the Solr extractor).

Index a site with several nodes that contain file attachments. Both the node text and the PDF attachment text are indexed.

Install the S3 File System module (https://www.drupal.org/project/s3fs) on the Drupal site and configure it to use AWS S3 as the file system.

Add new nodes with PDF file attachments.

Re-index the site. Text from the new nodes is indexed, but text from the newly added PDFs is not.

The files that are indexed still use the local filesystem and not the configured S3 filesystem.

Note that the file datasource links in the search results DO point to the S3 filesystem, but the content of files that are NOT on the local filesystem does not appear to be indexed.

Proposed resolution

If Search API Attachments does not already support indexing attachments stored on an s3fs file system, modify it so that it does.

Remaining tasks

It is possible that the limitation is in the extractor. If the Solr extractor can only read from the local hard drive, that would make it nearly impossible for Search API Attachments to accommodate this change.
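If the extractor really can only read local files, one possible workaround is to copy the remote file to a local temporary file first and hand that path to the extractor. The following is a minimal sketch of that idea in plain PHP, not code from the module. The function name `copy_to_local_temp()` is hypothetical, and the sketch assumes the source URI is readable through a registered stream wrapper (on a Drupal site with S3FS that might be an `s3://` URI; a local temp file stands in for it here).

```php
<?php

/**
 * Copies a file from any readable stream URI to a local temporary file.
 *
 * Hypothetical helper for illustration; not part of Search API Attachments.
 */
function copy_to_local_temp(string $uri): string {
  $tmp = tempnam(sys_get_temp_dir(), 'extract_');
  if ($tmp === false) {
    throw new RuntimeException('Could not create a temporary file.');
  }
  // copy() reads through whichever stream wrapper handles the URI's scheme.
  if (!@copy($uri, $tmp)) {
    unlink($tmp);
    throw new RuntimeException("Could not read $uri.");
  }
  return $tmp;
}

// Usage: a local file stands in for a remote URI (an assumption of this sketch).
$source = tempnam(sys_get_temp_dir(), 'src_');
file_put_contents($source, 'sample attachment body');
$local = copy_to_local_temp($source);
echo file_get_contents($local), PHP_EOL;
unlink($local);
unlink($source);
```

The caller is responsible for deleting the temporary copy once extraction has finished. Whether the module's extractors could accept such a path is not confirmed anywhere in this issue.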

User interface changes

I don't think any user interface changes would be required.

API changes

I've looked at the code and it appears the file that would need to be modified is FilesExtractor.php. So far, I've not been able to figure out where in the code that change should happen.

Data model changes

The only data model change I can think of is that the extractor would read and extract files from the S3 file system instead of the local file system when S3 is enabled.

💬 Support request
Status

Active

Version

1.0

Component

Code

Created by

🇺🇸 United States somebodysysop


Comments & Activities


  • 🇺🇸 United States Chris Burge

    @SomebodySysop are you seeing anything in the logs?

    Just looking through the code, TextExtractorPluginBase::getRealpath() is of interest to me:

      /**
       * Helper method to get the real path from an uri.
       *
       * @param string $uri
       *   The URI of the file, e.g. public://directory/file.jpg.
       *
       * @return mixed
       *   The real path to the file if it is a local file. An URL otherwise.
       */
      public function getRealpath($uri) {
        $wrapper = $this->streamWrapperManager->getViaUri($uri);
        if($wrapper != FALSE){
          $scheme = $this->streamWrapperManager->getScheme($uri);
          $local_wrappers = $this->streamWrapperManager->getWrappers(StreamWrapperInterface::LOCAL);
          if (in_array($scheme, array_keys($local_wrappers))) {
            return $wrapper->realpath();
          }
          else {
            return $wrapper->getExternalUrl();
          }
        }
      }
    

    I'm wondering if the method is failing to return a usable value here.
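To make the branch in question easier to see, here is a minimal stand-alone sketch of the decision `getRealpath()` makes, written without any Drupal dependencies. It is not the module's code: `describe_realpath_branch()` and the list of local schemes are hypothetical, and treating `s3://` as a non-local scheme is an assumption that would need to be checked on the affected site.

```php
<?php

/**
 * Mirrors the branch in getRealpath(): local schemes yield a filesystem path,
 * any other scheme yields an external URL. Hypothetical stand-in only.
 */
function describe_realpath_branch(string $uri, array $local_schemes): string {
  $pos = strpos($uri, '://');
  if ($pos === false) {
    // No scheme means no wrapper; the quoted method then returns nothing (NULL).
    return 'none';
  }
  $scheme = substr($uri, 0, $pos);
  return in_array($scheme, $local_schemes, true) ? 'realpath' : 'external_url';
}

// Assumed local schemes; a real site reports its own list via
// getWrappers(StreamWrapperInterface::LOCAL).
$local = ['public', 'private', 'temporary'];
foreach (['public://a.pdf', 's3://a.pdf', 'a.pdf'] as $uri) {
  echo $uri, ' => ', describe_realpath_branch($uri, $local), PHP_EOL;
}
```

If the S3 scheme does take the external URL branch, the extractor would receive a URL rather than a local path, which may be what it cannot read.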

  • 🇺🇸 United States somebodysysop

    Thanks for the response. I have not looked at it in over a year since I don't know enough about the module to develop a patch. It was my hope that someone with much more knowledge would stumble upon this and figure out a solution.
