- Issue created by @d.fisher
- 🇬🇧 United Kingdom d.fisher
In the meantime I've started work on something here if anyone is interested:
https://www.drupal.org/project/mdsitemap
- 🇺🇸 United States christophweber
This is an interesting use case that also came up when we wrote the /llms.txt module and the LLM Support recipe. What I was envisaging then was a token that spits out the entirety or a subsection of available entities. Basically what your module does, but in token format, so that a "fuller" sitemap could be included under the Optional section of a canonical llms.txt.
From a strategic point of view, I think your code has a better home in the LLM Support recipe than here, because thus far Markdownify is only concerned with reformatting entities to Markdown. That said, nothing would stop us from moving your module into yet another Markdownify submodule. You'd gain instant security coverage, so there's that.
What is your preferred resolution, @d.fisher?
Earlier today, Adrian (imbatman) and I were talking about all this and concluded that the ideal solution would be for Drupal core to provide a basic service that returns the URLs and titles of a given entity subset, so that contrib modules could reformat that into whatever output format they require. The various sitemap modules likely all use similar code. But that's a dream for now, and your base classes seem simple enough that duplication won't hurt much.
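To make the "one shared service, many output formats" idea concrete, here is a rough sketch in Python rather than Drupal PHP. Everything in it is hypothetical: `EntityLink`, `list_entity_links()`, and the two formatters are illustrative names, not an existing core or contrib API, and the hard-coded entries stand in for a real entity query.

```python
from dataclasses import dataclass
from typing import Iterable
from xml.sax.saxutils import escape


@dataclass(frozen=True)
class EntityLink:
    """One published entity: a title plus its canonical URL (hypothetical type)."""
    title: str
    url: str


def list_entity_links() -> list[EntityLink]:
    """Stand-in for the shared service; a real one would query the site."""
    return [
        EntityLink("Services", "https://example.com/services"),
        EntityLink("Research & Development", "https://example.com/r-and-d"),
    ]


def to_markdown(links: Iterable[EntityLink]) -> str:
    """Format the entries as a Markdown bullet list of links."""
    return "\n".join(f"- [{link.title}]({link.url})" for link in links)


def to_sitemap_xml(links: Iterable[EntityLink]) -> str:
    """Format the same entries as a minimal sitemaps.org <urlset>."""
    rows = "\n".join(
        f"  <url><loc>{escape(link.url)}</loc></url>" for link in links
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{rows}\n"
        "</urlset>"
    )


if __name__ == "__main__":
    links = list_entity_links()
    print(to_markdown(links))
    print(to_sitemap_xml(links))
```

The point of the split is that only `list_entity_links()` needs to know about entities; each output format is a small pure function over the same list, which is where the duplicated code in the various sitemap modules would collapse.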
I'd also like to point out that LLMs easily parse XML (sitemaps), but the token count for XML is significantly higher than for Markdown (2-3x by our estimate).
- 🇬🇧 United Kingdom d.fisher
I like the idea that the URLs could be provided as a token for inclusion directly in the llms.txt. That makes a lot of sense. I'd initially linked to the XML sitemap from our llms.txt, but there were two reasons I thought a Markdown sitemap would be a good idea: 1. It is less expensive to parse. 2. It can link directly to the .md versions of the pages (again, less expensive to parse). In our llms.txt we've manually linked to key service and solution pages with much more context, but we wanted to provide a dynamic, complete list of URLs should any LLMs wish to crawl, index, or train on any of our content.
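As a rough illustration of that dynamic list, the sketch below (Python, not the module's actual code) rewrites page URLs to their Markdown variants and emits them as an "Optional" section for llms.txt. It assumes the Markdown version of a page lives at the same path with `.md` appended, which may not match every site's setup; `markdown_url()` and `optional_section()` are hypothetical names.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_url(page_url: str) -> str:
    """Return the assumed Markdown variant of a page URL (path + ".md")."""
    parts = urlsplit(page_url)
    path = parts.path.rstrip("/")
    if not path:
        # Assumption: the front page has no ".md" variant, so leave it alone.
        return page_url
    if not path.endswith(".md"):
        path += ".md"
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


def optional_section(pages: list[tuple[str, str]]) -> str:
    """Build an "## Optional" section from (title, url) pairs."""
    lines = ["## Optional", ""]
    lines += [f"- [{title}]({markdown_url(url)})" for title, url in pages]
    return "\n".join(lines)


if __name__ == "__main__":
    print(optional_section([
        ("Services", "https://example.com/services"),
        ("Solutions", "https://example.com/solutions/"),
    ]))
```

The manually curated links with more context would stay at the top of llms.txt; a generated section like this would only supply the complete list underneath.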
The module I've created was a quick first pass and I'd be very open to developing it further in collaboration with you and others. I agree it feels separate from the Markdownify module, but it does feel as though it would fit well in the LLM Support recipe. In terms of security coverage, I can opt projects into security advisory coverage, but I have to wait 2 weeks first. I literally wrote and published the module yesterday.
I'd be more than happy to make you, and anyone else involved in the LLM Support recipe or Markdownify, maintainers of the mdsitemap module so that we can shape it to work for all of our use cases.