add long table name abbreviation to Database API

Created on 7 November 2024, 5 months ago

Problem/Motivation

The entity field storage system handles table names that are too long by abbreviating the table name with a hash.

The same functionality is needed in other places in core and contrib:

- ๐Ÿ› migrate mapping & messages table names are truncated, can lead to incorrect mapping lookups Needs work
- ๐Ÿ› occurrences table name can be too long Active

Steps to reproduce

Proposed resolution

Add functionality to Drupal\Core\Database\Schema to create a potentially too-long table. Modules which use a dynamic name for their tables should call this to ensure they don't crash the DB.

e.g.:

$schema->createAbbreviatedTable(
  // The full name of the table you want to create. Could be too long!
  $actual_table_name,
  // The first part of this name that you definitely don't want to be hashed.
  // This can't be over a certain length.
  $fixed_prefix.
);

We'd also need to add a method to get the abbreviated table name for use in queries.

Remaining tasks

User interface changes

Introduced terminology

API changes

Data model changes

Release notes snippet

โœจ Feature request
Status

Active

Version

11.0 ๐Ÿ”ฅ

Component

database system

Created by

๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom joachim

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @joachim
  • ๐Ÿ‡ฎ๐Ÿ‡นItaly mondrake ๐Ÿ‡ฎ๐Ÿ‡น
  • Pipeline finished with Failed
    5 months ago
    #336236
  • Pipeline finished with Failed
    5 months ago
    Total: 2008s
    #336243
  • I have added both the mentioned functions : createAbbreviatedTable and getAbbreviatedTableName.

    The table names which will be created will be named as : @fixed_prefix . $hashed_suffix, and it will look like : test_abbrev_55ec7ea143ca25227bd0a4eff922798c8f7693d8e59 (which will be less than 64 characters).

    The createAbbreviatedTable takes 3 inputs :
    1. TableName - actual table name
    2. Fixed Prefix - to be attached in the starting of the abbreviated table name.
    3. Schema (array) - as the function uses the createTable function which requires two parameters : tableName and the schema.

    Please check it.

  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom joachim

    Looks good!

    I've made some comments on the MR.

    Needs tests too.

  • Hey joachim โ†’ ,

    The comments you have made on the MR have been resolved now.
    Please look into it.

  • Pipeline finished with Failed
    5 months ago
    Total: 2813s
    #337066
  • ๐Ÿ‡บ๐Ÿ‡ธUnited States smustgrave

    For the test coverage

  • Status changed to Needs work about 1 month ago
  • Hey, I just wanted to clarifyโ€”should we add a new test case here to cover this scenario, or is it enough to simply resolve the pipeline errors without adding a new test case?

    Please let me know how you'd like to proceed so I can move forward accordingly.
    Thanks!

  • Pipeline finished with Success
    about 1 month ago
    Total: 566s
    #435385
  • Hi,

    As part of the fix for handling long table names in the Database API, I see two possible approaches:

    1. Modify createTable() to check the table name length and, if necessary, prompt users to provide a prefix before automatically calling createAbbreviatedTable().

    • Pros: Ensures all table creations automatically handle long names, reducing the chance of errors.
    • Cons: Alters core behavior, which may introduce unexpected changes for existing modules.

    2. Keep createTable() unchanged ( as requested above and the current approach ) and let module developers explicitly use createAbbreviatedTable() when needed.

    • Pros: Maintains existing behavior and gives developers full control.
    • Cons: Modules that donโ€™t explicitly use the new method might still run into long table name issues.

    Which approach would be preferable? Should we enforce automatic handling within createTable(), or should it remain an opt-in feature for developers?

    Looking forward to your thoughts. Thanks!

  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom joachim

    > 1. Modify createTable() to check the table name length and, if necessary, prompt users to provide a prefix before automatically calling createAbbreviatedTable().

    This is an API. We can't prompt users. And in any case, users shouldn't be exposed to internal details like database table names.

  • ๐Ÿ‡ณ๐Ÿ‡ฑNetherlands daffie

    The big problem for me is what are we going to do with, for instance, tables from entity field storage system that have long tables names that have been abbreviating with a hash? Do we still want to support that or do you want to rename them? To me that is the big problem. If you have a better idea on how to solve this, then please speak up.

  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom joachim

    The entity field storage system currently hashes its own table names, right?

    In which case, can't it just carry on doing that if it wants to? Using this new API is optional.

    A follow-up could take care of renaming those tables and switching the entity field storage system to using the API.

  • ๐Ÿ‡ณ๐Ÿ‡ฑNetherlands daffie

    In which case, can't it just carry on doing that if it wants to? Using this new API is optional.

    We shall need to be very careful that we do not break any core/contrib/custom code.

    A follow-up could take care of renaming those tables and switching the entity field storage system to using the API.

    That will be a very big update for site owners. Are you sure that this improvement is worth the update for site owners?

    We have similar issues like: ๐Ÿ› migrate mapping & messages table names are truncated, can lead to incorrect mapping lookups Needs work and ๐Ÿ› Identifiers longer than 63 characters are truncated, causing Views to break on Postgres Needs work .

  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom joachim

    > We shall need to be very careful that we do not break any core/contrib/custom code.

    The MR isn't changing any existing behaviour:

    - call createTable() -- same behaviour as before. If the table name you supply is too long, then Bad Things happen!
    - createAbbreviatedTable() -- new API, ensures that your table name is not too long.

    For follow-ons, the rule is:

    - were you passing table names to createTable() that MIGHT sometimes be too long? When the table name was too long, then your code was crashing anyway. Change to calling createAbbreviatedTable() to prevent crashing
    - were you already passing hashed table names to createTable()? If so, you need to either:
    - a. Keep doing that, no change
    - b. Change to calling createAbbreviatedTable(), and rename your tables in an update function. Proceed with care! This can only really be done if your table names are internal. If contrib or custom code might be referring to your tables, there's a problem! We need a way to deprecate table names!

  • ๐Ÿ‡ณ๐Ÿ‡ฑNetherlands daffie

    I am ok with adding the functionality for abbreviated table/alias names and the new methods createAbbreviatedTable() and getAbbreviatedAlias(). The last method shall be used by the views module.

  • I'll surely look on the changes suggested above and update the MR.
    Thanks!

  • Hi,

    Thank you for the feedback on the MR.

    I've updated the patch per the review feedback:

    1. Replaced the constant with a protected property for the maximum table name length.
    2. Consolidated the abbreviation logic into a single helper function, which is now used in both createAbbreviatedTable() and the new getAbbreviatedAlias() method.
    3. Updated PHPDoc comments to clarify that table/alias names are only abbreviated when they exceed the allowed length.

    I have tested these changes locally using a custom module, and everything works as expected. Please let me know if there are any further changes to be made here.
    Ready for further review. Thanks !

  • Pipeline finished with Success
    about 1 month ago
    Total: 425s
    #440416
  • ๐Ÿ‡ณ๐Ÿ‡ฑNetherlands daffie
  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom joachim

    @daffie your suggested change is a big change to how this was envisaged to work:

       * When a table name or alias combined with the database prefix is greater
       * than the maximum from the class variable $maxTableNameLength, the name
       * will be abbreviated. The abbreviation will remove the middle part of the
       * name and replace it with a 10 character hash. The first and last parts of
       * the name will be preserved.
    

    What counts as the 'middle' part? What if the first and last part don't guarantee uniqueness? I think it's best to leave it up to callers to know which part of their table name must be kept for uniqueness.

  • ๐Ÿ‡ณ๐Ÿ‡ฑNetherlands daffie

    To make the change as useful as possible for the views module, the last part of the original name/alias is very important. For others it is the first part. Therefore my suggestion is to replace the 'middle' part. As little as possible and replace it with a short 10 character hash. In this way we can preserve as much as possible from the original name/alias.
    The problem with the views module is that we have a lot of aliases that are being abbreviated in the way you want and that results in aliases like:
    - QWERTYUIOPASDFGHJKLZXCVBNM1234567890qwertyuiopasdfgh__DFSGHeabdfag
    - QWERTYUIOPASDFGHJKLZXCVBNM1234567890qwertyuiopasdfgh__bdfhbEWRGEA

    I have 2 aliases and that are different and know idea what the originals are. Yes, they are from 2 different joins, only which is which. The first part is important and the last part is important. That is why I would like to remove the "middle" part.

    We can ask a framework manager/committer for some input/ideas?

  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom joachim

    > To make the change as useful as possible for the views module, the last part of the original name/alias is very important. For others it is the first part. Therefore my suggestion is to replace the 'middle' part

    That's fair enough. But what's the algorithm for deciding what counts as the 'middle'?

  • ๐Ÿ‡ณ๐Ÿ‡ฑNetherlands daffie

    But what's the algorithm for deciding what counts as the 'middle'?

    Take the maximum allowed length and subtract the strlen($this->getPrefix()).
    From that subtract 14 characters for the hash value and 2 underscore characters before and after the hash value.
    Now you got a number characters left. Half is for the first part of the table/alias name and the rest is for the last part of the table/alias name.

    For instance: QWERTYUIOPASDFGHJKLZXCVBNM1234567890qwertyuiopasdfghjklzxcvbnm
    Becomes something like: QWERTYUIOPASDFGHJK__10charhash__iopasdfghjklzxcvbnm

    I have also made in the PR a suggestion for the method for how to do it..

  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom joachim

    > Now you got a number characters left. Half is for the first part of the table/alias name and the rest is for the last part of the table/alias name.

    That might cut off words in the middle, which is going to make some ugly table names.

    I think it's better if callers can specify which bits of the table name they want to preserve -- it's fine if that's bits at the start AND at the end.

  • ๐Ÿ‡ฎ๐Ÿ‡นItaly mondrake ๐Ÿ‡ฎ๐Ÿ‡น

    I haven't gone through the details, but just wanted to say I'd be +1 on making the shortened table name the result of a deterministic algo only based on the length of the full name (including prefix). That would be what I'd be pursuing in ๐Ÿ“Œ [WIP] Decouple identifier management from database Connection, introduce an IdentifierHandler class Postponed .

  • ๐Ÿ‡ฎ๐Ÿ‡นItaly mondrake ๐Ÿ‡ฎ๐Ÿ‡น

    ๐Ÿ“Œ [WIP] Decouple identifier management from database Connection, introduce an IdentifierHandler class Postponed now has a MR in NR that addresses this (for the time being for table identifiers) on the Database API level.

Production build 0.71.5 2024