🇺🇸 United States @nnewton

Account created on 27 November 2006, over 17 years ago
Recent comments

🇺🇸 United States nnewton

This may actually become possible soon; it is on the list of near-term changes.

In the meantime, I have hit this before with my IPv6-only deployments in AWS, as quite a few sites are IPv4-only. I am not sure it will work for you, but for now I'd look into enabling DNS64 for the subnet: https://aws.amazon.com/blogs/aws/let-your-ipv6-only-workloads-connect-to...

-N

🇺🇸 United States nnewton

The defaults discussion was due to someone suggesting that distros were changing the default, which they are not.

Obviously we change numerous default settings in drupal-infra. As I mentioned in my previous comment, this setting is very difficult to change in a manageable, secure way on an EKS cluster in our config management, and we won't be doing so currently. We are working desperately to reduce maintenance overhead, and this would increase it for no clear advantage (if people start using external images en masse, to the point that 8 would be co-scheduled on a node, we can address that then).

If this change is not merged, what we will do for now is limit per-node concurrency, not change the setting. That is why I suggested the change: it would stabilize the runs without requiring per-node concurrency limits. Changing this setting is not currently an option. We may be able to revisit it in the future.

🇺🇸 United States nnewton

Which distros have this set to something other than 65536? Debian/RHEL/AL2 all seem to have this set to the default of 65536. Either way, our (and everyone else's) EKS/AL2-based clusters will have this set to 65536. Modifying this would require a custom launch template or marking this sysctl as unsafe but allowed at the kubelet level. I would advise changing this at the container level, as that is a far cleaner solution and would resolve this portably between clusters.

🇺🇸 United States nnewton

We are starting to hit this on core GitLab CI as we try to consolidate runs on nodes. I would suggest we globally disable AIO for these containers (MySQL/MariaDB). There are solutions on the node side, but they are ugly and won't be portable between testing environments.

On our larger nodes we can reproduce this fairly consistently while watching aio-nr.

1 Job - 1 Node

root@runner-s4yvuuu9g-project-78834-concurrent-0-hpbibdd6:/var/www/html# sysctl -a 2> /dev/null | grep fs.aio
fs.aio-max-nr = 65536
fs.aio-nr = 8805

4 Jobs - 1 Node

root@runner-s4yvuuu9g-project-78834-concurrent-3-qg6kwxrj:/var/www/html# sysctl -a 2> /dev/null  | grep aio
fs.aio-max-nr = 65536
fs.aio-nr = 35220

And if we push 8 jobs to double that, the 8th will fail with:

[ERROR] InnoDB: io_setup() failed with EAGAIN after 5 attempts.
[service:drupalci/mysql-5.7-database] 2024-04-17T21:58:10.746455040Z 2024-04-17T21:58:10.746096Z 0 [Note] InnoDB: You can disable Linux Native AIO by setting innodb_use_native_aio = 0 in my.cnf
[service:drupalci/mysql-5.7-database] 2024-04-17T21:58:10.746456647Z 2024-04-17T21:58:10.746156Z 0 [ERROR] InnoDB: Cannot initialize AIO sub-system
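
As a rough sanity check on the readings above, here is a minimal Python sketch. The function names are hypothetical, and it assumes fs.aio-nr grows linearly with the number of jobs, which the 1-job and 4-job readings suggest but do not prove.

```python
def parse_aio_sysctl(text):
    """Parse `sysctl -a | grep fs.aio` style output into a dict of ints."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip().startswith("fs.aio"):
            values[key.strip()] = int(value.strip())
    return values


def jobs_that_fit(aio_max_nr, per_job_aio_nr):
    """Whole jobs that fit under the limit, assuming linear per-job usage."""
    return aio_max_nr // per_job_aio_nr


# Readings copied from the 1-job run above.
one_job = parse_aio_sysctl("fs.aio-max-nr = 65536\nfs.aio-nr = 8805")
per_job = one_job["fs.aio-nr"]
limit = one_job["fs.aio-max-nr"]

print(jobs_that_fit(limit, per_job))  # 7
print(8 * per_job)                    # 70440, above the 65536 limit
```

Under that assumption only 7 jobs fit on a node at the default limit, which is consistent with the 8th job being the one that fails.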

🇺🇸 United States nnewton

This is now possible and is done. Marking fixed.