AI Media Discovery: Investigate JavaScript scanners as a method of extracting contextual information about media from web pages

Created on 7 September 2025

Problem/Motivation

This issue extends the issue "AI Media Discovery: Investigate JavaScript scanners as a method of extracting information from images" to look at contextual information from web pages.

We would like to extract information from web pages so that we have data that is not currently available to a media entity, such as:
- When a crop yields undesired results (e.g. faces cropped awkwardly from an image)
- Effects applied to an image in a certain context
- Overridden alt text and information provided by captions
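
As a rough illustration of the kind of data listed above, the sketch below reads contextual information for a single rendered image. It is a minimal sketch, not a proposed implementation: the function name `describeImageContext`, the `defaultAlt` argument (the alt text assumed to be stored on the media entity) and the shape of the returned record are all hypothetical. It relies only on standard DOM properties, and the caller passes in the computed style so the function can also be exercised outside a browser.

```javascript
// Hypothetical helper: collects contextual data for one rendered image.
// img        - an HTMLImageElement (or an object exposing the same members)
// style      - the result of getComputedStyle(img)
// defaultAlt - alt text stored on the media entity (assumed to be known), or null
function describeImageContext(img, style, defaultAlt) {
  // A caption is assumed to live in a <figcaption> inside the enclosing <figure>.
  const figure = img.closest('figure');
  const captionEl = figure ? figure.querySelector('figcaption') : null;
  const caption = captionEl ? captionEl.textContent.trim() : null;

  const renderedAlt = img.getAttribute('alt');
  const cssFilter = style.filter && style.filter !== 'none' ? style.filter : null;

  return {
    src: img.currentSrc || img.src,
    renderedAlt,
    // True only when both values are known and they differ.
    altOverridden:
      renderedAlt !== null && defaultAlt != null && renderedAlt !== defaultAlt,
    caption,
    cssFilter,
    objectFit: style.objectFit || null,
    naturalSize: { width: img.naturalWidth, height: img.naturalHeight },
    renderedSize: { width: img.clientWidth, height: img.clientHeight },
  };
}

// In a browser this could be applied to every image on the page:
// const records = [...document.querySelectorAll('img')].map((img) =>
//   describeImageContext(img, getComputedStyle(img), null));
```

Comparing `naturalSize` with `renderedSize` together with `objectFit` gives a first hint that an image is being cropped in a given context, without needing a screenshot.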

Capturing and analysing screenshots was discussed previously, but JavaScript tools may be an efficient and effective alternative. Examples include the approach used by the Editoria11y module and libraries such as OpenCV.js and face-api.js.
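
To make the face-cropping case concrete, the sketch below shows the geometry that could sit on top of a face detector. It assumes the face boxes (`x`, `y`, `width`, `height` in source-image pixels) have already been obtained, for example from a library such as face-api.js; the detection call itself is left out. It also assumes the image is displayed with CSS `object-fit: cover` and the default centred `object-position`. The function names and the `0.9` threshold are illustrative only.

```javascript
// Region of the source image that remains visible when it is rendered into a
// box with `object-fit: cover` and a centred object-position (an assumption).
function visibleRegionForCover(naturalWidth, naturalHeight, boxWidth, boxHeight) {
  // "cover" scales the image until it fills the box in both directions.
  const scale = Math.max(boxWidth / naturalWidth, boxHeight / naturalHeight);
  const width = boxWidth / scale;
  const height = boxHeight / scale;
  return {
    x: (naturalWidth - width) / 2,
    y: (naturalHeight - height) / 2,
    width,
    height,
  };
}

// Fraction (0 to 1) of a face box that falls inside the visible region.
function visibleFraction(face, region) {
  const overlapW = Math.max(
    0,
    Math.min(face.x + face.width, region.x + region.width) - Math.max(face.x, region.x)
  );
  const overlapH = Math.max(
    0,
    Math.min(face.y + face.height, region.y + region.height) - Math.max(face.y, region.y)
  );
  const area = face.width * face.height;
  return area > 0 ? (overlapW * overlapH) / area : 0;
}

// Flags faces that are less visible than the (arbitrary) threshold.
function findCutOffFaces(faces, region, threshold = 0.9) {
  return faces
    .map((face) => ({ face, fraction: visibleFraction(face, region) }))
    .filter((entry) => entry.fraction < threshold);
}

// Example: a 1000x500 image shown in a 500x500 box keeps the middle 500px.
const region = visibleRegionForCover(1000, 500, 500, 500);
const flagged = findCutOffFaces([{ x: 200, y: 100, width: 100, height: 100 }], region);
console.log(flagged.length, flagged[0].fraction); // 1 0.5
```

Because this works on numbers read from the DOM rather than on pixels, it could run in the page alongside a scanner without capturing a screenshot.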

How to store and retrieve the data captured by these tools is not in scope of this issue. Perhaps the data can be stored in a vector database (see "AI Media Discovery: Store and retrieve extended media data in vector database").

Proposed resolution

Explore how JavaScript tools might be used in combination with AI tools to extract such data from rendered web pages, from content saved in Drupal, or from content at the point of creation and editing.

Remaining tasks

- Investigate and explore capabilities of JS libraries and tools
- Investigate and explore capabilities of AI provider tools
- Document general findings
- Provide code examples and technical information that can be used to help realise stories in the AI media track

🌱 Plan

Status

Active

Component

Planning

Created by

🇬🇧United Kingdom tonypaulbarker Leeds


Comments & Activities

Issue created by @tonypaulbarker
