- First commit to issue fork.
- 🇧🇬Bulgaria pfrenssen Sofia
Created issue fork and rebased patch #9 on the latest 8.x-3.x dev branch.
- @pfrenssen opened merge request.
- 🇧🇬Bulgaria pfrenssen Sofia
I've spent some time looking at the patches, but it isn't entirely clear to me why we are trying to implement a streaming cURL handler inside our module. I've been reading through the comments, and I did some digging, but I think we should try to approach this in a simpler way.
My findings:
- Most of the questions in #5 have been answered in later comments, but the main motivation for this approach seems to be to work around an error that is reported in #2901060: Generating derivative images behind a proxy fails.
- In Guzzle's choose_handler() function, cURL is skipped entirely for streaming requests (as mentioned in #8). The reason for this is explained in Guzzle issue #1938: cURL downloads the entire file first, which is not suitable for streaming purposes. Most of the work in the patch is trying to turn cURL into a streaming-capable HTTP client.
- Guzzle by default allocates a cURL handler in choose_handler(); however, if the system has 'allow_url_fopen' enabled in php.ini and the request is a streaming request, it will override that handler and provide a StreamWrapper based on PHP's stream wrapper. In an unauthenticated TCP proxy environment this would be fine, but when a proxy needs authentication or special handling the PHP-based wrapper isn't up to the task.
This confused me a bit. Guzzle instantiates a StreamHandler object here, not a StreamWrapper, but Guzzle does offer the \GuzzleHttp\Psr7\StreamWrapper object in the (separate) PSR-7 library. I didn't debug it in detail, so I am assuming this is either a mixup, or the StreamWrapper does get instantiated somewhere.

In any case, if Guzzle needs to use cURL instead of the fopen()-based StreamHandler when a proxy is used, isn't this something that should be fixed in Guzzle rather than in this module? It hardly seems our responsibility. The same issue would arise in a Laravel or WordPress project using S3 over an authenticated proxy.

I did some searching, and I found a PR that might be relevant: #2850 Support the cURL (http://) scheme for StreamHandler proxies. Possibly the issue has already been fixed in Guzzle in the meantime?
I think for the moment it would be a good idea to narrow the scope of this issue to something closer to the original patch that was provided in comment #2: for now, let's just forward the proxy settings to S3. This will already solve the issue for the majority of use cases involving proxies.
If there are still problems with authenticated proxies, we can handle these in separate tickets, preferably on the Guzzle / aws-sdk-php side so these issues can be solved for the entire PHP ecosystem. After all, it is the responsibility of the aws-sdk-php project to handle the proxy connection correctly.
- Status changed to Needs review
10:24am 29 March 2023 - 🇧🇬Bulgaria pfrenssen Sofia
This patch reboots the issue with a limited scope to just forward the proxy settings to the aws-sdk-php library. It is inspired by the original patch #2 but rewritten for the latest 8.x-3.x dev branch.
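For illustration, a minimal sketch of what forwarding the proxy settings could look like, assuming Drupal's $settings['http_client_config'] structure and the aws-sdk-php 'http' client option (which is passed through to Guzzle as request options). The variable names and region are illustrative; the patch itself is the authoritative implementation.

```php
<?php

use Aws\S3\S3Client;
use Drupal\Core\Site\Settings;

// Read Drupal's global HTTP client proxy settings, if any.
$proxy = Settings::get('http_client_config', [])['proxy'] ?? [];

$client_args = [
  'version' => 'latest',
  'region' => 'us-east-1',
];

// The SDK forwards the 'http' array to Guzzle as request options, and
// Guzzle's 'proxy' option accepts either a string or an array with
// 'http', 'https' and 'no' keys, which matches Drupal's settings format.
if (!empty($proxy)) {
  $client_args['http']['proxy'] = $proxy;
}

$client = new S3Client($client_args);
```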
- 🇺🇸United States cmlara
Most of the questions in #5 have been answered in later comments, but the main motivation for this approach seems to be to work around an error that is reported in #2901060
Correct, it was a critical break in the streamWrapper. It indicated that s3fs couldn't be trusted to provide files to callers; in effect the entire fundamental foundation of the streamWrapper was broken.
I didn't debug it in detail, so I am assuming this is either a mixup, or the StreamWrapper does get instantiated somewhere.
I'm not 100% sure, but I think it is safe to assume this was a mixup, given that the rest of the comments also revolve around StreamHandler.
I did some searching, and I found a PR that might be relevant: #2850 Support the cURL (http://) scheme for StreamHandler proxies. Possibly the issue is already fixed in the meantime in Guzzle?
At first glance this appears to be limited to Guzzle 7 and above, which can be installed in D9 but may not be compatible on all sites; D10 always has Guzzle 7. It is nice to see that it is in there, however, as it looks like it could resolve the issues (for http/https proxies at least) that were seen in the related issue, allowing #2/#15 to work. I'm a bit concerned that this might be moving us back to cURL and leave #8-#10 as an open issue. At the very least we may need a way to detect this and not pass the proxy in for Guzzle 6 if we use this fallback; a rough sketch of such a version check is below.
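A hedged sketch of how that detection might look. The helper name is hypothetical, not part of any patch; it only relies on the fact that Guzzle 7 defines the MAJOR_VERSION constant on ClientInterface while Guzzle 6 does not.

```php
<?php

use GuzzleHttp\ClientInterface;

/**
 * Illustrative helper (hypothetical name): TRUE when Guzzle 7 or later is
 * installed, i.e. when the newer StreamHandler proxy handling discussed
 * above could be expected to be available.
 */
function s3fs_guzzle_supports_stream_proxy(): bool {
  // Guzzle 7 defines MAJOR_VERSION; Guzzle 6 only defines VERSION, so the
  // defined() check short-circuits safely on Guzzle 6.
  return defined(ClientInterface::class . '::MAJOR_VERSION')
    && ClientInterface::MAJOR_VERSION >= 7;
}
```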
After all it is the responsibility of the aws-sdk-php project to handle the proxy connection correctly.
Strictly speaking this might actually be a lower-level Guzzle issue at that point, since the SDK calls out that it uses Guzzle by default but allows the caller to override it. The AWS SDK is just as much a victim of the Guzzle implementation in that case as we are. It is also possible for the SDK to handle proxies in a manner that works for it but doesn't necessarily align with our needs.
#8-#10 are part of this. In likely the vast majority of S3Client use cases, not being able to stream the download is acceptable; however, since we need to deliver directly to users (as part of FileDownloadController) we can't wait a minute for a 5 GB file to download before we start sending data, otherwise the connection may time out. This is something to try to improve at lower levels (if the problem still exists), yet at the same time it is a 'special situation' that could be considered our responsibility as well, since we are the ones who need the streaming ability.
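To make the streaming requirement concrete, a minimal sketch assuming a Guzzle client in $client and a resolvable $object_url (both illustrative, not from any patch): with 'stream' => TRUE the body is read incrementally, so bytes can be relayed to the browser as they arrive instead of only after the whole object has been downloaded.

```php
<?php

// Request the object without buffering the whole body in memory or on disk.
$response = $client->request('GET', $object_url, ['stream' => TRUE]);

$body = $response->getBody();
while (!$body->eof()) {
  // Relay 8 KB chunks straight to the user as they come in.
  print $body->read(8192);
  flush();
}
```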
The biggest issue though is #9, which represents a DoS vulnerability where an attacker (using FileDownloadController) could conceivably consume a large number of sockets, up to the FD limit of the server. I'm not sure if that still exists, or would still exist with #15 since it is not a custom handler (my issues above could have been developer error, as noted in #10), but since it appears Guzzle would still use the cURL handler I am concerned that it does still exist.
That said, if it is 'safe' then yes, I may have been too focused at the time on making this work with every proxy, to the point that I lost track of the truth that some proxy support is better than no proxy support.
Adding the needs manual testing tag as this will need me to come back with some manual lab review to look at the previous concerns and run the test suite in a firewalled environment.
- 🇧🇬Bulgaria pfrenssen Sofia
Thanks for the clarification! I did not consider the use case of piping S3 files through FileDownloadController. Also a good point about aws-sdk-php putting the HTTP client ball in the user's court; that does in fact make it our responsibility.
The DoS vulnerability you mention is interesting. Not only interrupted transfers, but an attacker could also open a number of large downloads simultaneously, and read the data very slowly (similar to a slowloris attack). This could exhaust server resources. I did some research and I found two options which could be useful to prevent this:
- CURLOPT_LOW_SPEED_LIMIT: The transfer speed, in bytes per second, that the transfer should be below during the count of CURLOPT_LOW_SPEED_TIME seconds before PHP considers the transfer too slow and aborts.
- CURLOPT_LOW_SPEED_TIME: The number of seconds the transfer speed should be below CURLOPT_LOW_SPEED_LIMIT before PHP considers the transfer too slow and aborts.
We could for example specify that if the transfer drops below 1 kB/second for 10 seconds, the transfer should be terminated (see the sketch below). I'm not sure which values would be good in all scenarios; we have to consider that users might be on unstable, slow and intermittent mobile connections.
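A sketch of how those options could be applied through Guzzle's documented 'curl' request option, which passes raw cURL options to the cURL handler. The values and the $client / $object_url variables are illustrative only, and the option has no effect when a non-cURL handler serves the request.

```php
<?php

// Abort the transfer if it stays below 1 KB/s for 10 consecutive seconds.
$options = [
  'curl' => [
    CURLOPT_LOW_SPEED_LIMIT => 1024,
    CURLOPT_LOW_SPEED_TIME => 10,
  ],
];

$response = $client->request('GET', $object_url, $options);
```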
- 🇺🇸United States cmlara
Indeed, a slowloris attack is similar. The real problem in #9/#10 is that the sockets remain open for an extended period of time even after the connection is destroyed.
CURLOPT_LOW_SPEED_TIME might pose an interesting solution for that, though finding the balance is key, and I wonder how this plays out given that the S3Client might obtain larger blocks of data at a time. Looking at #10, it looks like I had excluded all forms of timeout at the time; I'm not sure if I had considered the low-speed options or tested them. It would need some experimenting. It wouldn't be perfect, since the sockets would still exist, but they wouldn't remain open anywhere near as long, possibly moving this into territory where it is not as viable an attack.
As another option, I wonder if we can handle this at a middleware level?
For an HTTP/HTTPS proxy all we need to do is always connect to the proxy and handle authentication (after 14 years as an SE dealing with web filtering, I know this isn't a simple subject) while passing the rest of the request through.
Perhaps such a middleware already exists in the open source world, allowing us to keep using the lower-level fopen() handler instead of cURL? A very rough sketch of the idea is below.
Implementing SOCKS proxies would be significantly more complex, which supports the argument that some proxy support is better than no proxy support, allowing SOCKS to be deferred.
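As a very rough, hedged sketch of the middleware idea (placeholder credentials, not a complete solution): a Guzzle request middleware that injects basic proxy credentials so a request could pass an authenticating HTTP proxy without switching handlers. CONNECT tunnelling, HTTPS and NTLM/Kerberos authentication would need far more than this.

```php
<?php

use GuzzleHttp\HandlerStack;
use GuzzleHttp\Middleware;
use Psr\Http\Message\RequestInterface;

$stack = HandlerStack::create();
$stack->push(Middleware::mapRequest(function (RequestInterface $request) {
  // Placeholder credentials; real values would come from configuration.
  $credentials = base64_encode('proxy-user:proxy-pass');
  return $request->withHeader('Proxy-Authorization', 'Basic ' . $credentials);
}));
```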
- Status changed to Needs work
9:50am 12 May 2023 - 🇧🇬Bulgaria pfrenssen Sofia
Apart from the remarks above there is another issue that needs to be addressed: the AWS endpoint might be whitelisted in the no_proxy setting, which is stored in $settings['http_client_config']['proxy']['no']. If our endpoint is included there we should NOT use the proxy (a rough sketch of such a check is below).

I unfortunately have no more availability to work on this issue for the time being. Our hosting infrastructure recently allowed bypassing the proxy for accessing S3, so I have been assigned to different tasks.
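A hedged sketch of the no_proxy check mentioned above; the helper name and the simple suffix matching are illustrative assumptions, not part of any patch.

```php
<?php

use Drupal\Core\Site\Settings;

/**
 * Hypothetical helper: TRUE when the S3 endpoint host appears in Drupal's
 * $settings['http_client_config']['proxy']['no'] list, meaning proxy
 * settings should not be forwarded for that endpoint.
 */
function s3fs_endpoint_bypasses_proxy(string $endpoint_host): bool {
  $no_proxy = Settings::get('http_client_config', [])['proxy']['no'] ?? [];
  foreach ((array) $no_proxy as $domain) {
    $domain = ltrim($domain, '.');
    // Simple suffix match; full no_proxy semantics (wildcards, IP ranges,
    // ports) would need more care.
    if ($endpoint_host === $domain || str_ends_with($endpoint_host, '.' . $domain)) {
      return TRUE;
    }
  }
  return FALSE;
}
```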
- First commit to issue fork.