[META] AI LLM Optimized Content

Created on 26 August 2025

Goals & Guiding Principles

  • Guide LLM crawlers: Direct large language model (LLM) crawlers to a high-quality, pre-optimized corpus of content so that they ingest relevant information, which should in turn reduce hallucinations.
  • Reduce training noise: Remove irrelevant HTML markup, navigation, and other "noise" that can confuse or dilute the core message of the content.
  • Protect primary SEO: Ensure that the simplified Markdown content is not indexed by traditional search engines, preventing duplicate content issues and maintaining the site's primary SEO strategy with its original HTML content.
  • Ease of use: Package the feature as a Drupal recipe to make it easy for site builders to implement best practices for LLM crawlers with a single installation.
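
The noise-reduction goal can be illustrated with a small sketch. It is not the recipe's implementation (the recipe would run inside Drupal); it only shows the idea of dropping navigation and other non-content markup while keeping headings and paragraphs as Markdown. The set of "noise" tags, the class name MarkdownExtractor and the function html_to_markdown are assumptions made for this example.

```python
from html.parser import HTMLParser

# Tags whose entire contents are treated as noise (an assumption for this sketch).
NOISE_TAGS = {"nav", "header", "footer", "aside", "script", "style"}
# Only three heading levels and <p> are handled, to keep the sketch short.
HEADING_PREFIX = {"h1": "# ", "h2": "## ", "h3": "### "}


class MarkdownExtractor(HTMLParser):
    """Collects headings and paragraphs as Markdown, skipping noise tags."""

    def __init__(self):
        super().__init__()
        self.blocks = []       # finished Markdown blocks
        self._buffer = []      # text fragments of the block being read
        self._prefix = None    # Markdown prefix of the open block, or None
        self._noise_depth = 0  # greater than 0 while inside a noise tag

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self._noise_depth += 1
        elif self._noise_depth == 0:
            if tag in HEADING_PREFIX:
                self._prefix, self._buffer = HEADING_PREFIX[tag], []
            elif tag == "p":
                self._prefix, self._buffer = "", []

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS:
            self._noise_depth = max(0, self._noise_depth - 1)
            return
        is_block = tag == "p" or tag in HEADING_PREFIX
        if self._noise_depth == 0 and is_block and self._prefix is not None:
            # Collapse runs of whitespace left over from the HTML source.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.blocks.append(self._prefix + text)
            self._prefix = None

    def handle_data(self, data):
        # Keep text only when inside an open block and outside noise tags.
        if self._noise_depth == 0 and self._prefix is not None:
            self._buffer.append(data)


def html_to_markdown(html):
    """Return the content blocks of an HTML string as Markdown text."""
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks) + "\n"
```

Calling html_to_markdown on a page that wraps its menu in a nav element and its content in h1 and p elements returns only the heading and paragraph text. Inline markup such as bold or links is flattened to plain text here; a real converter would preserve it.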

Technical Approach

This feature will be built in a similar way to the recipe "LLM support" (https://www.drupal.org/project/llm_support). It will use the following modules:

Potentially the two recipes will be merged.

MVP Functionality

The ai_recipe_llm_optimized_content recipe will extend Drupal CMS by providing a mechanism to expose a curated, static, and optimized content corpus to LLM crawlers.

  • Configuration Interface: Provide a simple configuration interface where the site administrator can select the content types and individual nodes to be exposed to LLM crawlers.
  • Content Conversion: Convert the selected content into a clean Markdown format. This process will also optimize the content for LLM consumption, for example, by adding "Key Questions" or FAQ sections that can be answered based on the content of the document.
  • llms.txt Generation: Automatically generate a valid llms.txt file at the site's root (e.g., example.com/llms.txt).
  • Crawler Guidance: The llms.txt file will contain a correctly formatted Allow rule for the site, with all included URLs ending in .md. Requesting an .md URL (e.g., example.com/my-page.md) will return the Markdown version of the page content.
  • Web Server Configuration Documentation: Provide clear, copy-and-paste instructions for the two most common web servers (Apache and Nginx) to ensure the .md files are not indexed by traditional search engines.
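
The URL and indexing rules in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the recipe's code: the function names markdown_url, build_llms_txt and extra_headers are hypothetical, and the file layout (a title followed by a Markdown link list) follows the public llms.txt proposal rather than anything specified in this issue. The X-Robots-Tag response header is one common way to keep a URL out of search indexes; on a real site it would typically be set in the web server (for example with Apache's Header directive from mod_headers or Nginx's add_header) rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_url(page_url):
    """Map a page URL to its .md counterpart, e.g. /my-page -> /my-page.md."""
    parts = urlsplit(page_url)
    # The front page has no path segment to extend, so "/index" is assumed.
    path = parts.path.rstrip("/") or "/index"
    if not path.endswith(".md"):
        path += ".md"
    # Drop query string and fragment: the .md URL names a static document.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def build_llms_txt(site_name, pages):
    """Render llms.txt text from (title, page_url) pairs chosen by the administrator."""
    lines = [f"# {site_name}", "", "## Pages", ""]
    for title, page_url in pages:
        lines.append(f"- [{title}]({markdown_url(page_url)})")
    return "\n".join(lines) + "\n"


def extra_headers(path):
    """Return extra response headers; .md paths are marked as not indexable."""
    if path.endswith(".md"):
        return {"X-Robots-Tag": "noindex"}
    return {}


if __name__ == "__main__":
    selected = [("My page", "https://example.com/my-page")]
    print(build_llms_txt("Example", selected), end="")
    print(extra_headers("/my-page.md"))
```

Keeping the URL mapping in one function means the llms.txt entries and the routes that serve Markdown cannot drift apart, and the header check depends only on the .md suffix, which mirrors how a suffix-based web server rule would match.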

Post MVP Functionality

  • Multilingual Support: Support for multilingual content.
  • Integration with other AI modules: Integration with other AI-driven modules for content generation.
  • Customizable Rules: A user interface to define custom optimization rules beyond the default "Key Questions" or FAQ sections.

🌱 Plan

Status: Active
Component: Planning
Created by: 🇩🇪 Germany breidert
