Duplicate entry error for paths containing trailing spaces in redirect_404 table

Comment over 2 years ago →
🇺🇸United States loopy1492
Similar error on both sql-sync and fin db import on our site.

> ERROR 1062 (23000) at line 25846: Duplicate entry '/consumer/ca_ren.htm-en' for key 'redirect_404.PRIMARY'

I ran sql-sync with --extra-dump='--no-tablespaces --insert-ignore' but got:
> ERROR 1227 (42000) at line 18: Access denied; you need (at least one of) the SUPER, SYSTEM_VARIABLES_ADMIN or SESSION_VARIABLES_ADMIN privilege(s) for this operation

Clearly I do not have permissions for on our shared hosting environment.

At any rate, I went into admin/config/search/redirect/404 and actually searched for this redirect. I was shocked/not shocked to see that there actually were two of these in there.

Apparently, this module uses the url as the unique key for this field instead of a unique ID. That probably should be changed.
Comment over 2 years ago →
🇺🇸United States loopy1492
So, it seems that telling the module to do nothing with one of these urls does not delete it from the table. Instead, it marks it as resolved. Interestingly, when you mark one as resolved, that stops both it and the duplicate from displaying on the page. You cannot resolve the other one. It wouldn't matter anyways because both are still in the database table. Time to break out the big guns:

fin drush @nydfs.dev sql-query "SELECT * FROM redirect_404 WHERE path LIKE '%consumer\/ca_ren%'"

The response:

/consumer/ca_ren.htm en 1 1675924164 0 0
/consumer/ca_ren.htm en 1 1675924164 1 0

Okay!

fin drush @nydfs.dev sql-query "DELETE FROM redirect_404 WHERE path LIKE '%consumer\/ccra_registration.htm%

Of course, when I do that, ANOTHER site comes up as a duplicate. What's a boy to do? Oh right....

fin drush @nydfs.dev sql-query "DELETE FROM redirect_404" && echo "GET REKT!"
Comment over 2 years ago →
🇺🇸United States joegl
Apparently, this module uses the url as the unique key for this field instead of a unique ID. That probably should be changed; either that or some kind of check needs to be performed that denies the user the creation of a duplicate entry.

I am guessing the issue has to do with collation of a database and how it stores unique characters. Because if the unique key is the URL, then duplicate entries by definition shouldn't exist. The only reason they exist is because they're not duplicates, and one is probably using a special character of some kind. The problem is when the database gets dumped and imported, somewhere along that process the unique character which makes two entries separate gets stripped or modified and thus the entries now match and are considered duplicates.

In my case, it was because the difference in MySQL versions between the production site and my local where I was importing and how certain characters were treated between those versions.
Status changed to Postponed: needs info over 2 years ago10:17am 4 March 2023
Comment over 2 years ago →
🇨🇭Switzerland berdir Switzerland
I don't see what this module could change to fix that, as #16 explains, this is a collation problem and different databases with different settings.

If you get this on local environments for development, it's perfectly fine to just ignore this table unless you for some reason want to work specifically on this. On staging/test environments, you likely want to ensure that you use the same collation.
Status changed to Active over 2 years ago6:39pm 10 March 2023
Comment over 2 years ago →
🇨🇦Canada joelpittet Vancouver
@Berdir, we are running into this, the path is used as an index, and MySQL 8 changes what can be in an index and seems to normalize the values where MySQL 5.7 didn't care that there was some weird character some hacker tried to put in the path should also be an index.

Maybe we can either remove the index/primary key on path, or apply a filter to remove characters that can't be real paths as per some RFC (didn't google yet)
Comment over 2 years ago →
🇨🇦Canada joelpittet Vancouver
Some examples of paths where this MySQL 8 key collision occurred

/programs/vancouver） /programs/vancouver) /∼user/demos/mdp/vi.html /~user/demos/mdp/vi.html /user/demos/mdp/vi.html /user/demos/mdp/vi.html
Can you spot the funny characters?
Comment over 2 years ago →
🇨🇭Switzerland berdir Switzerland
> that can't be real paths as per some RFC (didn't google yet)

Well, they can be in the path or the server wouldn't be able to receive and log them.

But yeah, fair enough, I'm open to trimming and otherwise removing characters, but not sure how easy that part is, there's plenty of international characters that might actually be used in an URL.
Comment over 2 years ago →
🇨🇦Canada joelpittet Vancouver
@Berdir, how about URL encoding them?

There may be some that are worth stripping, I'll look around and see if there is a spec to reference.
Comment over 2 years ago →
🇨🇦Canada joelpittet Vancouver
This is a really good answer, but still trying to grok how to implement it. https://stackoverflow.com/a/36667242/80281 Basically what you said written as RFC 3987 is out-of-date from reality.
Comment over 2 years ago →
🇨🇦Canada joelpittet Vancouver
We should have tests to show a sampling of characters that should work. And maybe if they are valid characters for URLs we ensure they save properly in MySQL 8.0?
Like maybe we don't index the path, which would be a performance issue I imagine.

There is probably a nice middle ground here... but lets start with test coverage to set the expectations.
Comment about 2 years ago →
🇨🇦Canada sseto
I'm also getting this from trying to dump from 5.7 to 8.0. Is it safe to just delete the data from the table `redirect_404` and mysqldump that way?
Comment almost 2 years ago →
🇺🇸United States loopy1492
Just got it on another site. Need to copy the db to a different environment upstream, and wipe the redirect_404 table before I can download it and use it locally.
Comment over 1 year ago →
🇺🇦Ukraine Taran2L Lviv
URL encoding should be the way here, as @joelpittet suggested
Comment over 1 year ago →
🇺🇸United States scott_earnest
We have been copying our database to a lower environment, then running a database update, then sql-sync to local.

This command will clear the problem records:
drush @stage sql-query "DELETE FROM redirect_404 WHERE path LIKE '%my-duplicate-key-redirect%'"

Or we have dropped the PRIMARY KEY, this also works:
drush @stage sql-query 'ALTER TABLE redirect_404 DROP PRIMARY KEY'

This is our current workaround.
Comment over 1 year ago →
🇺🇸United States tedfordgif
Comment over 1 year ago →
🇺🇸United States tedfordgif
Adding core issue for reference on handling case sensitivity, but with only one mention of "collation".
Comment over 1 year ago →
🇺🇸United States tedfordgif
Another approach might be to simply diverge from core's handling of URLs, and make redirect_404 everything-sensitive, e.g. throw 'binary' on the 'path' field in the redirect_404 schema. That would be unexpected for users/devs, and is probably not correct, but would have the same effect as the URL-encoding suggestion (absent normalization).

This is not the right ticket to discuss it, but another place we end up with inconsistencies is that public files don't get served by Drupal, but the 404 page for the same does. Should core have a redirect if the file_managed uri matches the path modulo collation? That would also mean fewer redirect_404 records would get created.

The issue with allowing the database collation to determine matches is that we may want different behavior in different circumstances. For example, if a single character in /some/long-URL-slüg has a diacritic mark like that, we probably want to serve the same page, especially if the canonical path has the diacritic and the request path does not. However, in a short url slug, such as you might see on /a/view-page-with/tags/slüg, some languages probably have word collisions we wouldn't want. Maybe the correct approach is to always have unambiguous short slugs (add context to the "machine name", e.g. garden_slug).
Comment over 1 year ago →
🇺🇸United States loopy1492
Still busted.

🇦🇺Australia sonnykt Melbourne, Australia

The issue is still occurring with MySQL 8:

MySQL [v8032]> SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 8.0.32    |
+-----------+
1 row in set (0.000 sec)

MySQL [v8032]> SELECT @@character_set_database, @@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| utf8mb4                  | utf8mb4_0900_ai_ci   |
+--------------------------+----------------------+
1 row in set (0.001 sec)

MySQL [v8032]> SHOW CREATE TABLE redirect_404\G
*************************** 1. row ***************************
       Table: redirect_404
Create Table: CREATE TABLE `redirect_404` (
  `path` varchar(191) NOT NULL COMMENT 'The path of the request.',
  `langcode` varchar(12) CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL DEFAULT 'und' COMMENT 'The language of this request.',
  `count` int unsigned NOT NULL DEFAULT '0' COMMENT 'The number of requests with that path and language.',
  `timestamp` int unsigned NOT NULL DEFAULT '0' COMMENT 'The timestamp of the last request with that path and language.',
  `resolved` int NOT NULL DEFAULT '0' COMMENT 'Boolean indicating whether or not this path has a redirect assigned.',
  `daily_count` int unsigned NOT NULL DEFAULT '0' COMMENT 'The number of requests with that path and language in a day.',
  PRIMARY KEY (`path`,`langcode`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='Stores 404 requests.'
1 row in set (0.001 sec)

#*** one trailing space 0x20
MySQL [v8032]> INSERT INTO redirect_404 VALUES ('/test-trailing ','en',0,0,0,0);
Query OK, 1 row affected (0.004 sec)

#** two trailing spaces 0x20
MySQL [v8032]> INSERT INTO redirect_404 VALUES ('/test-trailing  ','en',0,0,0,0);
Query OK, 1 row affected (0.002 sec)

#** one trailing nbsp 0xC2A0
MySQL [v8032]> INSERT INTO redirect_404 VALUES ('/test-trailing ','en',0,0,0,0);
ERROR 1062 (23000): Duplicate entry '/test-trailing -en' for key 'redirect_404.PRIMARY'

MySQL [v8032]> SELECT "a " = "a  ";
+--------------+
| "a " = "a  " |
+--------------+
|            1 |
+--------------+
1 row in set (0.001 sec)

MySQL [v8032]> SELECT "a " = "a ";
+--------------+
| "a " = "a "  |
+--------------+
|            0 |
+--------------+
1 row in set (0.001 sec)

The special character is 0xC2A0 (UTF-8 non-breaking space).

Comment 11 months ago →
🇵🇹Portugal joao.ramos.costa
Faced the same issue with mysql 8.

ERROR 1062 (23000) at line 11049: Duplicate entry '/mypage page/example/1.pdf-en' for key 'redirect_404.PRIMARY'
Comment 10 months ago →
🇺🇸United States loopy1492
With Acquia moving over to mysql 8 soon, folks on that platform may want to petition Acquia to contribute to this module.
Comment 9 months ago →
🇺🇸United States loopy1492
still busted
Merge request !1263201335: Normalize redirect_404 paths before attempting to insert them in the database. → (Open) created by loopy1492
Pipeline finished with Success
9 months ago
Total: 179s
#354216
Pipeline finished with Success
9 months ago
Total: 204s
#354336
Comment 9 months ago →
🇺🇸United States loopy1492
While I've been developing a solution, I can confirm @joao.ramos.costa that we're getting this issue still with upstream MySQL 8. For some reason, locally with Docksal running mysql 8.0.28, it thinks the full-width kana characters are the same as the medium width ones.
Pipeline finished with Success
9 months ago
Total: 180s
#354409
Pipeline finished with Success
9 months ago
Total: 227s
#354479
Pipeline finished with Success
9 months ago
Total: 202s
#354502
Pipeline finished with Success
9 months ago
Total: 182s
#354548
Comment 9 months ago →
🇺🇸United States loopy1492
I started by changing how the URLs were stored, but then I realized we'd have to change how they were accessed as well. Different characters that happen to look the same are still different after all. I didn't want to risk the possibility that there were intentional uses of these characters.

I went back to my original thought that we simply need a proper unique key. The merge request/patch creates an update that creates a new id field, assigns an id value to all the existing 404 entries, then sets that ID as the primary key, then sets the value as auto-incrementing.
Comment 9 months ago →
🇺🇸United States loopy1492
Testing steps:

Deploy the previous tag to dev

Copy the database and files from prod to dev

Fire up a local site. You will might get a database sync error if your local build includes a sql-sync. That is fine.

Now, on the upstream DEV SITE, log into the remote site

From the admin screen, paste these urls into the address bar and visit them one at a time:

/example/path/
/example/path
/example%2Fpath
/example%20path/
/example%20path
/example path
/example/path）
/example/path)
/example）/path
/example)/path
/∼example/path
/~example/path

Note that the two parentheses are different… U+FF09 and U+0029

Note that the two tildes are different as well… U+223C and U+007E

Check the /admin/config/search/redirect/404 page. There should be a bunch of new records in there.

Deploy the branch with the patch to the dev server

Run drush cr && drush updb -y on the dev server

Run drush sql-query "DESCRIBE redirect_404;", then…

Run drush sql-query "select * from redirect_404;"

Make note of the count (the third field).

Hit ./~example/path followed by running drush cr several times. We have to do this because cache protects the resource from incrementing after the first hit.

Run fin drush sql-query "select * from redirect_404;" and check to make sure that path has incremented.

Now run drush sql-sync @thesite.dev @self -y

You should not get a database sync error – it should successfully import the database.
Comment 9 months ago →
🇺🇸United States loopy1492
Comment 9 months ago →
🇵🇹Portugal joao.ramos.costa
Hi @loopy1492, I will soon test your steps, although I think we should add them as kernel/unit tests. As for the table auto-incremental unique id, here's my two cents: Seems indeed like the most wise choice; otherwise, we might get lost in the infinite creativity of character combinations.
Status changed to Needs review 3 months ago1:01pm 7 May 2025
Comment 3 months ago →
🇧🇪Belgium Grayle
Besides any possible performance issues with joins which should be investigated, and writing an update hook to update the existing column, adding `binary` to the hook schema should "fix it" no?

https://www.drupal.org/node/1237252 →

https://git.drupalcode.org/project/redirect/-/blob/8.x-1.x/modules/redir...

That will result in binary collation. So if the database default charset is utf8 the collation would essentially be utf8_bin. If the charset is utf8mb4 the collation would then be utf8mb4_bin, etc.

Duplicate entry error for paths containing trailing spaces in redirect_404 table

Problem/Motivation

Proposed resolution

Merge Requests

!126Duplicate entry error for paths containing trailing spaces in redirect_404 table
Open

Comments & Activities

Duplicate entry error for paths containing trailing spaces in redirect_404 table

Problem/Motivation

Proposed resolution

Merge Requests

!126Duplicate entry error for paths containing trailing spaces in redirect_404 tableOpen

Comments & Activities

!126Duplicate entry error for paths containing trailing spaces in redirect_404 table
Open