File sanitization not removing unused replacement character and dedup issues

Created on 24 February 2025, 8 months ago

Problem/Motivation

When file sanitization is set up, the unused replacement character remains as a valid character.

e.g. if you have a file called `file_name.txt` and the replacement character is `-`, the filename is not changed. I would expect the file name to change to `file-name.txt`.

Also, if the filename contains something like `_-` or ` _`, and deduplication is enabled with a replacement character of `-`, both examples are changed to `-`. This is inconsistent: in one place the unused replacement character is treated as valid, while in another it is replaced.
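To make the report easier to reproduce, here is a rough Python model of the behaviour described above. It is only a sketch built from the description in this issue, not Drupal's actual code: the function name `sanitize_filename`, its parameters and the exact regular expressions are assumptions.

```python
import re


def sanitize_filename(filename: str, replacement: str = "-", deduplicate: bool = True) -> str:
    """Simplified, hypothetical model of the behaviour reported in this issue."""
    # Split off the extension so that its dot is never touched.
    stem, dot, extension = filename.rpartition(".")
    if not dot:
        stem, extension = filename, ""
    # Whitespace becomes the replacement character.
    stem = re.sub(r"\s", replacement, stem)
    # Only characters outside this set are invalid; '_', '.' and '-' all count as valid.
    stem = re.sub(r"[^0-9A-Za-z_.-]", replacement, stem)
    if deduplicate:
        # Any run of two or more separators, mixed or not, collapses to the replacement.
        stem = re.sub(r"[_.-]{2,}", replacement, stem)
    return stem + dot + extension


print(sanitize_filename("file_name.txt"))   # file_name.txt (underscore is kept)
print(sanitize_filename("file_-name.txt"))  # file-name.txt (underscore is dropped)
print(sanitize_filename("file _name.txt"))  # file-name.txt (underscore is dropped)
```

In this model a lone `_` survives, while the same `_` next to another separator disappears, which is the inconsistency being reported.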

Proposed resolution

I see 2 ways to handle this.
1. Treat the unused replacement characters as invalid characters and convert them to the replacement character, which would make this consistent with the deduplication process.
2. Add a new parameter that lets the user decide whether to remove the unused replacement characters, and then resolve the problem with the deduplication.
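Both options could look roughly like the following Python sketch. Everything here is hypothetical: the name `replace_unused`, the `replace_unused_separators` flag and the assumption that `-` and `_` are the only candidate replacement characters are made up for illustration and are not taken from Drupal.

```python
import re

# Assumption: these are the characters a site can choose as the replacement.
CANDIDATE_SEPARATORS = "-_"


def replace_unused(stem: str, replacement: str = "-", replace_unused_separators: bool = True) -> str:
    """Convert candidate separators that are not the chosen replacement."""
    # Option 2: the behaviour is opt-in, so it can be switched off entirely.
    if not replace_unused_separators:
        return stem
    # Option 1: every unused candidate separator is treated as invalid.
    unused = CANDIDATE_SEPARATORS.replace(replacement, "")
    if not unused:
        return stem
    return re.sub("[" + re.escape(unused) + "]", replacement, stem)


print(replace_unused("file_name"))              # file-name
print(replace_unused("file-name", "_"))         # file_name
print(replace_unused("file_name", "-", False))  # file_name
```

With the flag always on this corresponds to option 1; exposing the flag as a setting corresponds to option 2.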

I also think there is a problem with the dedup process: it treats underscores, hyphens and dots as a single character and merges them together, e.g. `_-.` will be converted into the replacement character.
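The difference between merging mixed separators and collapsing only repeats of the same character can be sketched in Python as follows. Both functions are hypothetical illustrations (`dedup_mixed` models the behaviour described in this issue, `dedup_same` is one possible alternative); neither is Drupal's implementation.

```python
import re


def dedup_mixed(stem: str, replacement: str = "-") -> str:
    """Model of the reported behaviour: any run of separators becomes the replacement."""
    return re.sub(r"[_.-]{2,}", replacement, stem)


def dedup_same(stem: str) -> str:
    """Alternative: only collapse a run of one repeated separator to that separator."""
    return re.sub(r"([_.-])\1+", r"\1", stem)


print(dedup_mixed("a_-.b"))    # a-b
print(dedup_same("a_-.b"))     # a_-.b
print(dedup_same("a__b--c"))   # a_b-c
```

Whether the second behaviour is actually desirable is a design question for this issue; the sketch only shows that the two interpretations of "deduplicate" give different results.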

🐛 Bug report
Status

Active

Version

10.2 ✨

Component

file system

Created by

🇦🇺 Australia gordon Melbourne

Comments & Activities

  • Issue created by @gordon
  • 🇳🇿 New Zealand quietone

    Changes are made on 11.x (our main development branch) first, and are then backported as needed according to the Core change policies.

  • 🇦🇺 Australia kim.pepper 🏄‍♂️🇦🇺 Sydney, Australia

    Would be great to have a test that covered this.

  • 🇦🇺 Australia kim.pepper 🏄‍♂️🇦🇺 Sydney, Australia

    I think this is the expected behaviour.

    1. '_' is a valid separation character and sanitization only replaces invalid characters.
    2. '_-' gets converted to '-' because we de-duplicate separation characters.
  • Status changed to Closed: works as designed 2 months ago
  • 🇺🇸 United States smustgrave

    Since there's been no follow-up, I'm going to close this one out. If this is still a bug, please re-open, addressing @kim.pepper's comment in #4.
