Add sitemap.xml tests for XML Sitemap

Created on 19 July 2023, over 1 year ago
Updated 11 August 2023, over 1 year ago

Add a test that confirms sitemap.xml is generated.
Do this for Cypress and Playwright.
Update documentation.

✨ Feature request
Status

Fixed

Version

1.0

Component

Code

Created by

πŸ‡ΊπŸ‡ΈUnited States kreynen

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

  • Issue created by @kreynen
  • πŸ‡ΊπŸ‡ΈUnited States kreynen

    We are dealing with this now. Went ahead and created an issue since it's listed as "coming up" on the project page.

    We are using https://www.drupal.org/project/xmlsitemap β†’ . To be able to render sitemaps on every site instance on Pantheon with the correct base URL, we've added this to our settings.php...

      if (isset($_ENV['DRUSH_OPTIONS_URI'])) {
        $settings['xmlsitemap_base_url'] = 'https://' . $_ENV['DRUSH_OPTIONS_URI'];
      }
    

    That works when dev, test and live have domains configurated as well as for multidev instances without a DNS entry, but still requires something to trigger the sitemap to be rebuilt.

    It seems wrong to use a test to do that. Looking at potentially using https://docs.pantheon.io/guides/quicksilver/install-script to call a PHP script to facilitate the rebuild, but it seems unlikely that would get added to the XML sitemap project. We could write our own script to do a rebuild after any clone workflow step on Pantheon, but I figured I'd ask what the plan was to be able to test the sitemap.xml here.

    Is the plan for keep ATK contrib agnostic? I'm not sure if https://www.drupal.org/project/simple_sitemap β†’ has this base URL issue or not, but it doesn't have the features we want.

    Would the sitemap rebuild script for the xmlsitemap module be something that could be added to https://git.drupalcode.org/project/automated_testing_kit/-/blob/1.0.x/mo... ?

  • πŸ‡ΊπŸ‡ΈUnited States aangel

    Great questions.

    The idea for the sitemap I had was to include a test that validates every link that the sitemap includes. However, I see that there should be two groups of tests and people then choose what they want to use.

    1. Are the sitemap file(s) present? There are variations to this.
    2. Run through each sitemap.xml file and test every URL. Don't validate URLs that match a pattern.

    I imagine that most people will just want #1 but I could be wrong.

    So far I haven't been afraid to have a contrib module as a dependency because, I figured, if it doesn't work for people they can make the changes they need for their installation once they copy the tests. Not everyone will use Webform from which I get the Contact Us form but it's a good model for people to see how to use Ethereal.email.

    For sitemaps it seems that this should be fairly agnostic but perhaps once I get into it that won't be true. Still, I'd like these sitemap tests to be agnostic since it's testing a result that is a web standard.

    As for triggering building the sitemap, I think that's up to the tester. If they are testing against production, I could see them checking if the sitemap is present and has been built recently. On other targets, perhaps deleting then building the sitemap to ensure it appears is best. Mix and max as the tester desires.

    There is plenty of room to add a function in ATK that executes a drush command that deletes then creates the sitemap. I do that sort of thing testing the creation of users: start with a delete (may fail if the user doesn't exist), create user then delete user that was just created.

    I haven't picked the test yet for next week so this looks like a good one to take on next.

    1a. Do sitemap file(s) exist? Test for first file and report back how many more were found.
    1b. Add argument for "rebuilt within x hours" (confirms cron is running and sitemaps are included as a task)
    1c. Create function that cleans up sitemap files and performs a rebuild.

    Get the above sorted and handle tests for #2 in the following week.

    What do you think?

  • πŸ‡ΊπŸ‡ΈUnited States kreynen

    The /sitemap.xml returning a 200 that is valid XML is obviously important, but valid XML with URLs that aren't valid because of a bad base URL or permissions is something of a false positive.

    2. Run through each sitemap.xml file and test every URL. Don't validate URLs that match a pattern.

    I'm thinking more of a 1Β½... Is the sitemap.xml accurate after events like a page is published, a page is unpublished or the anonymous role's permissions for a content type is changed?

    Because Google doesn't like to get 404 and 403 from URLs included in the sitemap and you've already added πŸ“Œ Add 404 test Fixed and πŸ“Œ Add 403 test. Fixed , it would make sense to me to confirm that these URLs have been removed from the sitemap.xml in addition to the HTTP response check.

    That wouldn't get into which contrib module generates the sitemap.xml, but would verify that the XML is updated as part of the publishing and permission changes.

    Does that make sense?

  • πŸ‡ΊπŸ‡ΈUnited States aangel

    "I'm thinking more of a 1Β½... Is the sitemap.xml accurate after events like a page is published, a page is unpublished or the anonymous role's permissions for a content type is changed?"

    That's even better. So:

    1a. Do sitemap file(s) exist? Test for first file and report back how many more were found.
    1b. Add argument for "rebuilt within x hours" (confirms cron is running and sitemaps are included as a task)
    1c. Create function that cleans up sitemap files and performs a rebuild.
    1d. Change the status/permissions of a page and confirm it appears and disappears from the sitemap.

    I can see that 1d might have tons of variations possible, such as multi-lingual and making sure node (pages) of different types are included in the sitemap. However, this sort of structure can then be expanded as people see fit.

    Y'all are using Cypress, right? Have you got any code yet to start us off?

  • πŸ‡ΊπŸ‡ΈUnited States kreynen

    We have nothing running for role-based feature testing for D10.

    With D7, we ran Behat with Travis.ci at https://github.com/cuboulder/express. We also hosted that at https://www.drupal.org/project/express β†’ , but didn't do any testing with Jenkins/Drupal CI.

    We used Cypress to do testing of front-end projects like https://github.com/CUCentralAdvancement/giving-frontend.

    We've added some basic testing to the contrib projects we maintain like https://www.drupal.org/project/webform_pardot β†’ , but haven't shifted those to GitLab CI yet.

    We also recently shifted from an on-prem GitLab with limited CI minutes/integration options to GitLab SaaS with nearly unlimited minutes. While we were waiting for that to happen, we shifted our workflow to leverage Pantheon's Autopilot for VRT before using a Pull Mirror to pull those changes back into GitLab for additional testing. We can also create Pingdom Advanced Transaction tests that can be run by Solar Winds Web Performance Monitor, but those have to be paused any time we are updating a DOM element used in the test. The test cannot be updated with the code.

    Everything we're doing with Playwright is green field.

  • πŸ‡ΊπŸ‡ΈUnited States aangel

    I thought this would be easy:

    1b. Add argument for "rebuilt within x hours" (confirms cron is running and sitemaps are included as a task)

    but the date I remembered being in the spec is for individual URLs. There doesn't seem to be a reliable way to tell when the sitemap as a whole was last generated unless one has access to the filesystem; bumping this test to the end.

    However, 1a. is working.

    As for 1c., because the xmlsitemap module intercepts the sitemap url and diverts it to \Drupal\xmlsitemap\Controller\XmlSitemapController::renderSitemapXml, it's not clear to me how to confirm the sitemap files have been regenerated without also having access to the filesystem. The files are stored in the files area not in the site home and thus I could compare filesystem dates. However, at this point, I don't want to tie this test so closely to XMLSitemap.

    1d. should still be possible but I'll have it work only with single file sitemaps for now.

    In any case, you can check out ATK-PW-1070 here:
    https://git.drupalcode.org/issue/automated_testing_kit-3375630/-/blob/1....

  • πŸ‡ΊπŸ‡ΈUnited States aangel
  • πŸ‡ΊπŸ‡ΈUnited States aangel

    I've created (ATK-PW-1071) Regenerate sitemap files.

    1. Find Site ID of default sitemap (change for your installation).
    2. Delete existing files on the filesystem.
    3. Use drush xmlsitemap:regenerate to create new files.
    4. Validate new files.

    I'm going to write ATK-CY-1070 and ATK-CY-1071 and issue the next beta.

    Let me know if ATK-PW-1071 is what you were thinking of (it's in this issue fork).

  • πŸ‡ΊπŸ‡ΈUnited States aangel

    Just wrote the Cypress versions and updated the documentation.

  • πŸ‡ΊπŸ‡ΈUnited States aangel

    I've renamed this to apply specifically to XML Sitemap β†’ since the regenerate function is specific to XML Sitemap.

    I'm going to say this is done for now and will open a new ticket for future work or tests for Simple XML Sitemap β†’ .

    Current tests are in Beta 6.

  • Status changed to Fixed over 1 year ago
  • πŸ‡ΊπŸ‡ΈUnited States aangel
  • πŸ‡ΊπŸ‡ΈUnited States aangel
  • πŸ‡ΊπŸ‡ΈUnited States aangel
  • Automatically closed - issue fixed for 2 weeks with no activity.

Production build 0.71.5 2024