- 🇨🇦Canada Shiraz Dindar Sooke, BC
Thanks for this patch, Graber!
This works great for me for chunking translations that are over the character limit.
I did have to re-roll it to work on the latest dev release.
I also took out the "translate only text" part of this patch -- it was throwing errors, and in any case, I was just interested in the chunking.
So attached is the patch as I rerolled it, which only does chunking.
Anyone reading this should also check out my other issue, 🐛 increase max character limit to 50000, with a patch that increases the character limit to 50,000 (from 5,000); that alone may take care of your needs. In my case we have translations over 50,000 characters, so chunking is needed.
- 🇨🇦Canada Shiraz Dindar Sooke, BC
Please note that, even with chunking in place, I have run into certain translations that fail with "429 - too many requests" from the API. This is described at https://towardsdatascience.com/advanced-guide-avoiding-max-character-lim....
I've tried sleeping between the chunk translation submissions, but even 2-second sleeps weren't enough, presumably because the limit is per minute. (It's not well documented; the link above is the best I could find.) A rough sketch of the backoff approach I've been trying is at the end of this comment.
To be sure, this is separate from the character limitation.
I'm not sure if the account tier makes a difference here. I'll update this task as I find out more.
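In case it helps anyone, here's a rough sketch of the retry-with-backoff approach; the translateChunk() helper and the timings are placeholders I made up, not code from the patch:

```php
<?php

/**
 * Translate chunks, backing off and retrying when the API returns
 * 429. translateChunk() is a hypothetical stand-in for the module's
 * API call; the delays are guesses, since the per-minute quota
 * isn't clearly documented.
 */
function translateChunksWithBackoff(array $chunks): array {
  $translated = [];
  foreach ($chunks as $chunk) {
    $delay = 2;
    while (TRUE) {
      try {
        $translated[] = translateChunk($chunk);
        break;
      }
      catch (\Exception $e) {
        // Only retry on rate limiting; rethrow anything else,
        // and give up once the delay grows past a minute.
        if (strpos($e->getMessage(), '429') === FALSE || $delay > 60) {
          throw $e;
        }
        sleep($delay);
        $delay *= 2;
      }
    }
  }
  return $translated;
}
```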
- 🇨🇦Canada Shiraz Dindar Sooke, BC
1. The 429 "too many requests" error I mentioned above is no longer occurring; I think it was just a temporary thing, not a real issue.
2. I found that some of the nodes I was submitting for chunked translation were failing because the previous patch threw an exception whenever a single sentence exceeded the character limit (since that made for an untranslatable chunk). The over-limit "sentences" in question were actually base64-embedded images in the text of the field I was translating. I've updated the patch so it no longer fails on these: over-limit "sentences" are not submitted for translation, but they are still included in place. That way base64 images still appear in the translated node and nothing fails. (A sketch of this logic follows the composer.json instructions below.)
3. Further to #2, the regex that was being used to split text into sentences was failing on text containing base64-encoded images (i.e. it was not splitting them correctly). I played around with several tweaks to the regex but couldn't get it to work satisfactorily. Instead I found a PHP library on GitHub that is designed specifically for splitting text into sentences. It did a couple of extra things that caused their own issues, so I forked that repo with the changes needed to make it work. So, for anyone who happens to be reading this (I suspect in the future this *will* be needed): to get this patch to work, you will also need to add these lines to your project's composer.json:
In the repositories section:
{ "type": "git", "url": "https://github.com/kanopi/php-sentence" }
In the require section:
"vanderlee/php-sentence": "dev-do-not-clean-unicode-and-do-not-replace-floats",
Hoping this helps someone out!
- heddn Nicaragua
Additions of new external dependencies are going to be harder to land than optional dependencies. Could this be re-worked into something that checks whether the php-sentence codebase is available and, if it is, uses it?
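Something along these lines (a rough sketch; the class name is assumed from the library mentioned above, and the regex fallback just stands in for whatever splitting the patch currently does):

```php
<?php

/**
 * Use the php-sentence library when it's installed; otherwise fall
 * back to a regex-based splitter.
 */
function splitIntoSentences(string $text): array {
  if (class_exists(\Vanderlee\Sentence\Sentence::class)) {
    $splitter = new \Vanderlee\Sentence\Sentence();
    return $splitter->split($text);
  }
  // Fallback: naive split on sentence-ending punctuation.
  return preg_split('/(?<=[.!?])\s+/', $text) ?: [$text];
}
```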