Translations are skipped when importing

Created on 12 April 2020, over 4 years ago
Updated 15 March 2023, almost 2 years ago

Problem/Motivation

I'm currently testing translation support in Drupal 8.9.0. I'm using Norwegian Bokmål which is 100% translated for Drupal core. I have experienced that several strings are missing after importing. (This issue is not the same as 🐛 No translation updates for current version of project Needs work .)

We are importing translations in batches of 200. One translation is skipped per batch - roughly 50 strings if there are around 10000 translations in total. The reason is that PoStreamReader::readItem calls readLine (repeatedly) to get the next PO item. To be sure that the item is complete, you need to read one extra line (containing the msg ID of the next string). This isn't a problem inside one operation - the PoStreamReader keeps state, but when you start a new operation the state is lost.

Steps to reproduce

Proposed resolution

I have attached a patch the reports the buffer pointer (byte offset) after the last completed PO item - not the last read line, so we don't loose any items.

Importing Norwegian Bokmål without patch:

From log: Translations imported: 9594 added, 0 updated, 0 removed.
From language stats: Norwegian Bokmål 9594/10027 (95.68%)

Importing Norwegian Bokmål with patch:

From log: Translations imported: 9640 added, 0 updated, 0 removed.
From language stats: Norwegian Bokmål 9640/10068 (95.75%)

I think the reason we don't get 100% is the language list. The test above is done using drush si standard --locale=nb

Remaining tasks

User interface changes

None.

API changes

New public method: core/lib/Drupal/Component/Gettext/PoStreamReader::getSeekLastItem()

Data model changes

None.

Release notes snippet

🐛 Bug report
Status

Needs work

Version

10.1

Component
Locale 

Last updated 4 days ago

Created by

🇳🇴Norway hansfn

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • The Needs Review Queue Bot tested this issue. It either no longer applies to Drupal core, or fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".

    Apart from a re-roll or rebase, this issue may need more work to address feedback in the issue or MR comments. To progress an issue, incorporate this feedback as part of the process of updating the issue. This helps other contributors to know what is outstanding.

    Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.

  • 🇭🇺Hungary huzooka Hungary 🇭🇺🇪🇺
  • Issue was unassigned.
  • Status changed to Needs review almost 2 years ago
  • 🇭🇺Hungary huzooka Hungary 🇭🇺🇪🇺

    I couldn't simplify the logic unfortunately, but this addresses #38 (and the edge case is added to both tests).

  • Assigned to huzooka
  • Status changed to Needs work almost 2 years ago
  • 🇭🇺Hungary huzooka Hungary 🇭🇺🇪🇺

    Sweetchuck suggested that it would be good to ensure that the line breaks at the end of the .po (fixture) files are preserved... (They might be stripped away by a properly configured IDE.) And he is right!

  • Issue was unassigned.
  • Status changed to Needs review almost 2 years ago
  • 🇭🇺Hungary huzooka Hungary 🇭🇺🇪🇺

    This patch moves both /core/modules/locale/tests/test.nb.po and /core/tests/fixtures/PoStreamReader/basic.hu.po into Nowdoc strings.

    Side-effect: same patch can be applied on 10.1.x and 9.5.x 🙃

  • 🇭🇺Hungary huzooka Hungary 🇭🇺🇪🇺
  • 🇭🇺Hungary huzooka Hungary 🇭🇺🇪🇺
  • Status changed to Needs work almost 2 years ago
  • 🇺🇸United States smustgrave

    Tried testing by importing Norwegian Bokmål and got 99.55%

    The issue summary should be updated with the default template, what's the proposed solution? remaining tasks? any api changes?

  • Status changed to Needs review almost 2 years ago
  • 🇺🇸United States dww

    Tried to give this a proper summary.

    I haven't closely reviewed all the new test coverage, so not RTBC'ing, but moving back to NR...

  • 🇺🇸United States smustgrave

    Is this still an issue though? I got up to 99% without the fix when the patch mentions I should get to 95

    I Can retest also if there’s a missing step

  • 🇺🇸United States dww

    Yeah, not totally sure about steps to reproduce manually and how to really see the bug in obvious ways. Back in #11 we had a failing test-only (vs. 8.9.x). Perhaps we should do another test-only here to see if all the tests still fail in 10.1.x without the fix. 😅

    Also, removing credit from a couple of 1-off rerolls that didn't work and were then abandoned by the author. 😢

  • 🇺🇸United States dww

    Yeah, locally, the new Kernel test still fails in 10.1.x:

    % ./vendor/bin/phpunit -c core/phpunit.xml core/modules/locale/tests/src/Kernel/LocaleBatchImportTest.php
    PHPUnit 9.5.26 by Sebastian Bergmann and contributors.
    
    Testing Drupal\Tests\locale\Kernel\LocaleBatchImportTest
    F                                                                   1 / 1 (100%)
    
    Time: 00:01.324, Memory: 10.00 MB
    
    There was 1 failure:
    
    1) Drupal\Tests\locale\Kernel\LocaleBatchImportTest::testImportBatches
    Failed asserting that two arrays are equal.
    --- Expected
    +++ Actual
    @@ @@
         3 => 'Brukergrensesnitt'
         4 => 'Tittel'
         5 => 'Brødtekst'
    -    6 => 'Neste'
    -    7 => 'Små grønne menn'
    -    8 => 'Verten'
    +    6 => 'Små grønne menn'
    +    7 => 'Verten'
     )
    

    The new Unit test also fails, but it's less interesting: 😂

    1) Drupal\Tests\Component\Gettext\PoStreamReaderTest::testReadItemsInMultipleSequence
    Error: Call to undefined method Drupal\Component\Gettext\PoStreamReader::getSeekLastItem()
    

    But if I restore the fix, then intentionally break PoStreamReader::getSeekLastItem(), the unit test fails more interestingly:

    There was 1 failure:
    
    1) Drupal\Tests\Component\Gettext\PoStreamReaderTest::testReadItemsInMultipleSequence
    PO item 3
    Failed asserting that two strings are identical.
    --- Expected
    +++ Actual
    @@ @@
    -'msgid "id-03"
    -msgstr "str-03"
    +'msgid ""
    +msgstr ""
    +"Project-Id-Version: Drupal core (9.2.7)\n"
    +"POT-Creation-Date: "
    +"2021-10-27 10:29+0000\n"
    +"PO-Revision-Date: YYYY-mm-DD "
    +"HH:MM+ZZZZ\n"
    +"Language-Team: Hungarian\n"
    +"MIME-Version: "
    +"1.0\n"
    +"Content-Type: text/plain; "
    +"charset=utf-8\n"
    +"Content-Transfer-Encoding: 8bit\n"
    +"Plural-Forms: "
    +"nplurals=2; plural=(n!=1);\n"
    +""
    
     '
    

    So I'm 99.9% sure this is still a bug, but the steps to manually reproduce it could be better explained. 😅 Tagging for that.

  • 🇺🇸United States dww

    Grrr, I just did a new fairly long dreditor review, which refused to paste. 😢 But maybe it's for the best, since I decided it was easier to just fix everything and upload a new patch/interdiff than to re-do the review. 😂 Hopefully the interdiff mostly speaks for itself.

    While I'm at it, here's a new test-only.

  • 🇺🇸United States dww

    This is the only hunk of the interdiff that might need explanation:

    +++ b/core/tests/Drupal/Tests/Component/Gettext/PoStreamReaderTest.php
    @@ -232,14 +232,7 @@
    -    $item = $reader->readItem();
    -    $this->assertSame(
    -      $expected['items'][5],
    -      (string) $item,
    -      'PO item 5',
    -    );
    -
    -    for ($i = 6; $i < count($expected['items']); $i++) {
    +    for ($i = 5; $i < count($expected['items']); $i++) {
    

    The removed lines are identical to the body of the for() loop. I saw no reason to do item 5 manually, then a loop for 6 and up. So I folded item 5 into the loop.

  • Status changed to Needs work almost 2 years ago
  • 🇺🇸United States smustgrave

    Moving to NW for an answer to #54 and steps to reproduce.

    Even with the patch I still only get to 99%

    Going to ping the subsystem maintainer too

Production build 0.71.5 2024