50,000 sample files generated in one day

Created on 5 December 2023, 7 months ago
Updated 26 January 2024, 5 months ago

Problem/Motivation

On December 3, 2023, our site autogenerated some 50,000 text files, both public and private. We do not know how or why. These files did not exist on December 2. The only unusual activity that day was heavier-than-usual use of Layout Builder, the W3CSS Theme, and the W3CSS Paragraphs module.

Steps to reproduce

The files look like this:
-rw-rw-r-- 1 apache apache 987 Dec 3 12:20 zYDVeCE9Zv.txt
-rw-rw-r-- 1 apache apache 590 Dec 3 12:22 zYe4nLgmVT.txt
-rw-rw-r-- 1 apache apache 909 Dec 3 12:22 zyH4nWJB9l.txt
-rw-rw-r-- 1 apache apache 855 Dec 3 12:20 zyJ1pdpsHd.txt
-rw-rw-r-- 1 apache apache 1041 Dec 3 12:20 zYJ4MgI4Lz.txt
-rw-rw-r-- 1 apache apache 664 Dec 3 12:20 zykYnuPZij.txt
-rw-rw-r-- 1 apache apache 1232 Dec 3 12:20 zYo549BVRf.txt

They are all *.txt files, share the same filename structure, contain text that appears to be Latin, and are apparently stored with file descriptions (the file attachment's description field).
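The filename structure in the listing above (ten letters or digits followed by .txt) is regular enough to match with a glob. A minimal sketch, assuming a find that supports -maxdepth (GNU and BSD do); the function name list_sample_like_files is made up for illustration, and the pattern is inferred only from the names shown here:

```shell
# List regular files directly under a directory whose names look like the
# sample files above: exactly 10 ASCII letters/digits followed by ".txt".
# Assumption: the pattern is inferred from the listing, nothing more.
list_sample_like_files() {
  dir=$1
  c='[A-Za-z0-9]'
  pattern="$c$c$c$c$c$c$c$c$c$c.txt"
  # LC_ALL=C keeps the bracket ranges and the sort order byte-based.
  LC_ALL=C find "$dir" -maxdepth 1 -type f -name "$pattern" | LC_ALL=C sort
}
```

For example, `list_sample_like_files /path/to/files | wc -l` would count the candidates, where /path/to/files stands in for the real public or private directory. Legitimate uploads that happen to have ten-character names would also match, so the list is a starting point rather than a delete list.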

These files also have no top-level parent nodes. By contrast, every file we upload manually is attached either directly to a node or to a paragraph that is attached to a node. We NEVER upload files that are not attached to a node in some way.

These are the counts in the public and private directories after I deleted some 30,000 files.

Public Directory
[root@ip-172-31-13-94 files]# ls -1 *.txt | wc -l
9046
Private Directory
[root@ip-172-31-13-94 demo9]# ls -1 *.txt | wc -l
8896

This roughly matches the number of files without node parents (currently in a delete queue I created):
Queued: 17731 | Processing: 1 | Success: 30798 | Failure: 0
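To check how many of the remaining files date from the day in question, the counts above can be narrowed by modification time. A rough sketch, assuming a find with -newermt (GNU and recent BSD); count_txt_modified_on is a hypothetical helper, and dates are interpreted in the server's local time zone:

```shell
# Count *.txt files directly under DIR that were last modified on DAY.
# DAY and NEXT_DAY are YYYY-MM-DD; NEXT_DAY is the following date and is
# passed in explicitly to avoid non-portable date arithmetic.
count_txt_modified_on() {
  dir=$1
  day=$2
  next_day=$3
  # Keep files newer than midnight on DAY but not newer than midnight on NEXT_DAY.
  find "$dir" -maxdepth 1 -type f -name '*.txt' \
    -newermt "$day 00:00:00" ! -newermt "$next_day 00:00:00" \
    | wc -l | tr -d ' '
}
```

Something like `count_txt_modified_on /path/to/files 2023-12-03 2023-12-04` would then give the count for December 3 only. Unlike `ls -1 *.txt`, this does not expand the glob in the shell, which matters once a directory holds tens of thousands of files.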

Typical content:

Abdo antehabeo minim natu utrum validus vel. Abbas dignissim ex exputo genitus iriure nunc sagaciter vindico. Appellatio at enim incassum nimis oppeto refero vindico virtus. Nobis paratus premo singularis suscipit te uxor verto virtus. Comis conventio exputo fere letalis luctus nunc paulatim ut vero.
Adipiscing commoveo dignissim singularis. Commodo huic ullamcorper vereor. Abico aliquip conventio eu gemino iaceo lucidus tum. Exerci illum laoreet tum uxor validus vereor. Abbas importunus modo tincidunt. Appellatio distineo facilisi minim secundum singularis.
Commoveo et gilvus jugis nibh paulatim refero sino usitas vel. Adipiscing aptent euismod exerci ibidem modo natu singularis sudo. Facilisi nobis nostrud plaga quae ratis refero si utinam.

Some process is creating these files and actually saving them with titles (attachment descriptions):

type: file
dsid: 31722
title: Decet facilisi humo ille lenis loquor nobis ullamcorper venio. Blandit camur enim exerci gravis olim oppeto quidem sit sudo.
filename: jU1VCEBEhn.txt
file usage type: paragraph
public: N
timestamp: 2023-12-03T17:48:27Z
Orphaned file. Ignoring.

Again, I cannot tell what is creating these files or why. They appear to be the same type of file used as dummy text in Layout Builder, and they happened to appear on the same day that Layout Builder, W3CSS, and W3CSS Paragraphs were used more than usual.

Proposed resolution

I have been advised on Slack that the answer may lie within: \Drupal\file\Plugin\Field\FieldType\FileItem::generateSampleValue

No further details were given, so my question is: How do I stop Layout Builder, or indeed any module, from generating this excessive number of sample files?

Remaining tasks

User interface changes

If there is no interface for handling this, there should be!

API changes

But, I'll settle for an API solution.

Data model changes

Release notes snippet

💬 Support request
Status

Postponed: needs info

Version

10.1 ✨

Component
Layout builder


Created by

🇺🇸 United States SomebodySysop


Comments & Activities

  • Issue created by @SomebodySysop
  • Status changed to Postponed: needs info 7 months ago
  • 🇦🇺 Australia larowlan 🇦🇺🏝.au GMT+10

    Can you reproduce this by using layout builder the same way the person was on that day?

    There is nothing in Layout Builder itself that does this, but those contrib projects may be generating dummy files.

  • 🇺🇸 United States SomebodySysop

    The person is going to work on the site in the morning. I will ask him to note the number of files in the Solr index before he starts and after he finishes, and which modules he uses.

  • 🇺🇸 United States SomebodySysop

    I went through the site today and tried to duplicate the issue. The results are attached: https://www.drupal.org/files/issues/2023-12-05/2023-12-05%20activity%20l...

    In short, I couldn't duplicate it. The only significant difference I could find is that on the 3rd the theme was changed (upgraded/modified).

    I also searched the contrib and core modules for generateSampleValue:

    Contrib modules

    [root@ip-172-31-13-94 modules]# grep -R generateSampleValue *
    contrib/redirect/src/Plugin/Field/FieldType/RedirectSourceItem.php: public static function generateSampleValue(FieldDefinitionInterface $field_definition) {
    contrib/devel/devel_generate/README.md:implement `\Drupal\Core\Field\FieldItemInterface::generateSampleValue()`.
    contrib/devel/devel_generate/README.md:see: https://api.drupal.org/api/drupal/core!lib!Drupal!Core!Field!FieldItemI…
    contrib/entity_reference_revisions/src/Plugin/Field/FieldType/EntityReferenceRevisionsItem.php: public static function generateSampleValue(FieldDefinitionInterface $field_definition) {

    Core modules (with the tests removed from the results):

    [root@ip-172-31-13-94 modules]# grep -R generateSampleValue *
    comment/src/Plugin/Field/FieldType/CommentItem.php: public static function generateSampleValue(FieldDefinitionInterface $field_definition) {
    datetime/src/Plugin/Field/FieldType/DateTimeItem.php: public static function generateSampleValue(FieldDefinitionInterface $field_definition) {
    datetime_range/src/Plugin/Field/FieldType/DateRangeItem.php: public static function generateSampleValue(FieldDefinitionInterface $field_definition) {
    file/src/Plugin/Field/FieldType/FileItem.php: public static function generateSampleValue(FieldDefinitionInterface $field_definition) {
    file/src/Plugin/migrate/destination/EntityFile.php: $value = UriItem::generateSampleValue($field_definitions['uri']);
    file/src/Plugin/migrate/destination/EntityFile.php: // generateSampleValue() wraps the value in an array.
    file/tests/src/Kernel/FileItemTest.php: // Test the generateSampleValue() method.
    image/src/Plugin/Field/FieldType/ImageItem.php: public static function generateSampleValue(FieldDefinitionInterface $field_definition) {
    layout_builder/src/Plugin/Field/FieldType/LayoutSectionItem.php: public static function generateSampleValue(FieldDefinitionInterface $field_definition) {
    link/src/Plugin/Field/FieldType/LinkItem.php: public static function generateSampleValue(FieldDefinitionInterface $field_definition) {
    migrate/src/Plugin/migrate/destination/EntityContentBase.php: $values = $field_type_class::generateSampleValue($field_definition);
    options/src/Plugin/Field/FieldType/ListItemBase.php: public static function generateSampleValue(FieldDefinitionInterface $field_definition) {
    path/src/Plugin/Field/FieldType/PathItem.php: public static function generateSampleValue(FieldDefinitionInterface $field_definition) {
    text/src/Plugin/Field/FieldType/TextItemBase.php: public static function generateSampleValue(FieldDefinitionInterface $field_definition) {
    user/src/Plugin/migrate/destination/EntityUser.php: $name = UserNameItem::generateSampleValue($field_definitions['name']);
    user/src/Plugin/migrate/destination/EntityUser.php: $mail = EmailItem::generateSampleValue($field_definitions['mail']);
    user/src/StatusItem.php: public static function generateSampleValue(FieldDefinitionInterface $field_definition) {
    user/src/TimeZoneItem.php: public static function generateSampleValue(FieldDefinitionInterface $field_definition) {
    user/src/UserNameItem.php: public static function generateSampleValue(FieldDefinitionInterface $field_definition) {
    [root@ip-172-31-13-94 modules]#

    The site being worked on is not open to the public yet, so the only activity that day and today was essentially layout work.

    Can anyone think of any circumstance where 50,000 sample files would be generated within the space of a few hours?

  • 🇳🇿 New Zealand DanielVeza Brisbane, AU

    This is indeed caused by FileItem::generateSampleValue.

    This can be confirmed by:

    1. Install a standard site
    2. Enable Layout builder on the page CT
    3. Add a file field to the page CT
    4. Edit the default layout and confirm that the file has been created on the server.
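
To confirm step 4 (or to catch the process in the act on a live site), one option is to snapshot the files directory before the layout work and compare afterwards. A minimal sketch using only find, sort and comm; the function names and paths are hypothetical:

```shell
# Print a sorted list of all regular files under a directory.
snapshot_files() {
  find "$1" -type f | LC_ALL=C sort
}

# Print files that exist now but were not in an earlier snapshot.
# $1 = snapshot file written earlier, $2 = directory to re-scan.
new_files_since() {
  # comm -13 keeps only lines unique to the second input (the fresh scan).
  snapshot_files "$2" | LC_ALL=C comm -13 "$1" -
}
```

Usage would look like `snapshot_files /path/to/files > /tmp/before.list` before editing the layout, then `new_files_since /tmp/before.list /path/to/files` afterwards. Keep the snapshot file outside the scanned directory so it does not show up as a new file itself.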

    Out of the box a file should only be created by viewing the default layout of a content type that has a file field on it. That shouldn't generate 50,000 files in a few hours. There may be something else at play here with the paragraph setup. Do the paragraphs have file fields?

    This seems to be working as designed. My only thought, after trying to replicate this and looking at the code, is that the file created in ::generateSampleValue could be marked as a temporary file. That way the file would be garbage collected in the future.

  • 🇺🇸 United States SomebodySysop

    Yes, we do have at least one paragraph content type configured with file attachment fields.
