- Merge request !2017StreamWrapperInterface return types for realpath() and dirname() are inconsistent with documented use and intention β (Open) created by lhridley
- πΊπΈUnited States lhridley
Following up on this issue, which is now back on my radar screen.
StreamWrappers for remote object stores cannot comply with creating a directory, as that concept does not exist for remote object stores. By limiting the Interface dirname() to require a value, you are effectively making the StreamWrapperInterface incompatible with remote object stores such as S3, Digital Ocean Spaces, Google's object store (don't remember the name currently), etc, requiring any contrib modules that allow utilizing a remote object store for file storage to write their own interfaces, which may or may not be compatible with Drupal's File system interfaces, which all originate with the StreamWrapperInterface.
the S3 protocol complies with the php streamwrapper class, which also does not have the method dirname(), and the documentation indicates that calls to the class methods rmdir() and mkdir() should not be made if the wrapper does not support creating or removing directories.
By not allowing FALSE as a return value, it significantly limits the use of the Core StreamWrapperInterface as a viable interface for alternative file storage mechanisms such as those that comply with the S3 protocol, which at its heart IS a streaming interface.
- πΊπΈUnited States cmlara
Good timing on bringing this up again, I was just going through another round of PHPSTAN L9 on the s3fs module and thinking "did we have an issue open yet for these incorrectly documented methods?"
Actionable Concern:
Really my ultimate concern in the proposed patch that sent this back to needs-work is as mentioned in #5 is dirname() and FileSystemInterface.
If we change StreamWrapperInterface::dirname() to allow a false return (as the comment implies is allowed but the @return annotation does not) we have an issue with FIleSytemInterface::dirname()
Here is the current implementation of FileSystem (inline comments my own)
/** * Gets the name of the directory from a given path. * * PHP's dirname() does not properly pass streams, so this function fills that * gap. It is backwards compatible with normal paths and will use PHP's * dirname() as a fallback. * * Compatibility: normal paths and stream wrappers. * * @param string $uri * A URI or path. * * @return string * A string containing the directory name. * * @see dirname() // Note that php dirname() does NOT support returning FALSE. * @see https://www.drupal.org/node/515192 * @ingroup php_wrappers */ public function dirname($uri) { $scheme = StreamWrapperManager::getScheme($uri); if ($this->streamWrapperManager->isValidScheme($scheme)) { return $this->streamWrapperManager->getViaScheme($scheme)->dirname($uri); // This calls StreamWrapperInterface::dirname(). MR2017 suggests this be allowed to return FALSE, while self::dirname() says false is not an approved return value. If Core ran a higher level of PHPstan this would of been flagged by automated testing. } else { return dirname($uri); // PHP dirname() ALWAYS returns a string. } }
That all means we need to figure out FIleSystemInterface::dirname() do we add
@return |FALSE
here too? or do we need to have FileSystem::dirname() catch the FALSE return and convert it to a string of some form?Even if a storage doesn't support the concept of '/' being a directory, it could still return "scheme://" eg "http://www.example.org/dir/file1/file.txt" can still have a dirname() of "http://www.example.org/dir/" and get a useful outcome.
This is why I think we might be better for dirname() to just remove the 'false' return comment and keep the return typehint as is, though that itself might be a BC break.
More generic thoughts:
On a more generic standpoint of dealing with these issues (that may be much wider scope than should be consider here)
By not allowing FALSE as a return value, it significantly limits the use of the Core StreamWrapperInterface as a viable interface for alternative file storage mechanisms
There is some validity to this.
I'll note that the Drupal s3fs module has implemented dirname(), to my knowledge it ends up working since anywhere your working with /path/to/folder a streamWrapper works too, its only realpath() we return FALSE on. The aws-sdk-php doesn't implement it because PHP doesn't pass dirname() to streamWrappers. I will note \AWS\S3\StreamWrapper (used by s3fs) actually DOES have a concept of directories, even though S3 itself is a flat storage system.
As you mention mkdir() in PhpStreamWraperInterface is a method that should NOT be implemented if the streamWrapper does not support it. Obviously that can not be done if you call 'implements PhpStreamWrapperInterface' which really means methods like this should NOT be on the base interfaces.
Since items like dirname() and realpath() are our own methods they strictly don't have to follow PHP's requirements, however I would question the wisdom of them deviating too far from them, the more our methods are 'drop in' replacements the better it is for code compatibility.
- πΊπΈUnited States lhridley
Updated the PR for the 10.3.x branch.
Traced documentation and implementation of dirname() on FileSystemInterface, while not directly related to StreamWrapperInterface, is nonetheless relevant.
All implementations of StreamWrapperInterface in core trace back to abstract class LocalStream, which handles (and can) returning a FALSE value on ::realpath(), so changing the documentation and return value typehinting on StreamWrapperInterface::realpath() to reflect a return value of FALSE.
Re-reviewed the implementation of FileSystemInterface::dirname() as well as PHP's dirname(), neither will return FALSE. LocalStream::dirname() also does not return FALSE as well. Revised the return value comment to remove reference to a FALSE return value.
- Status changed to Needs review
8 months ago 7:26pm 31 March 2024 - Status changed to Needs work
8 months ago 7:41pm 31 March 2024 - πΊπΈUnited States cmlara
I believe core would require this target 11.x first and be back ported to older branches (if eligible).
It would be better do the rebase now to avoid this issue being delayed in a few months.
- π¦πΊAustralia kim.pepper πββοΈπ¦πΊSydney, Australia
Can we take this further and add string|false a the typehint rather than just the docs?
- π¦πΊAustralia kim.pepper πββοΈπ¦πΊSydney, Australia
Ah, we can't add return typehints on interfaces without breaking bc.
- Merge request !7401#3271894 Fix documented StreamWrapperInterface return types for realpath() and dirname() β (Closed) created by kim.pepper
- Merge request !7402Issue #3271894 by lhridley: StreamWrapperInterface return types for... β (Open) created by kim.pepper
- π¦πΊAustralia kim.pepper πββοΈπ¦πΊSydney, Australia
Created a new MR based on 11.x.
I think we should go through the rest of the methods to ensure we have the correct documented return types.
- πΊπΈUnited States lhridley
@kim.pepper -- I reviewed every defined in Drupal\Core\StreamWrapper\StreamWrapperInterface back in 2022 when I opened this issue. The two identified in this issue were the only ones whose documentation differed from the actual return type defined on the interface signature.
- Status changed to Needs review
8 months ago 6:21am 10 April 2024 - Status changed to RTBC
7 months ago 10:08pm 10 April 2024 - π¦πΊAustralia kim.pepper πββοΈπ¦πΊSydney, Australia
Thanks for confirming. This docs change looks good to me.
- Status changed to Needs work
7 months ago 7:57pm 11 April 2024 - π¬π§United Kingdom alexpott πͺπΊπ
\Drupal\file_test\StreamWrapper\DummyExternalReadOnlyWrapper::dirname() returns FALSE. I do not think we should be changing that part of the interface.
We need to see what contrib is doing and a cursory search shows https://git.drupalcode.org/project/brandfolder/-/blob/2.0.x/src/StreamWr...
- πΊπΈUnited States cmlara
@alexpott Can you advice how you want this resolved given the points raised in #10 that the rest of core/contrib is not equipped to handle a FALSE return from dirname()?
Sometimes the @return is wrong and sometimes the comments are wrong. In this case it appears to me the documentation is wrong for dirname() not the @return.
- π¦πΊAustralia kim.pepper πββοΈπ¦πΊSydney, Australia
@alexpott #25 we only ever return a string inline with the PHP
dirname()
function, so our documentation is wrong. The example you provided typehints the return type asbool
which is wrong. - πΊπΈUnited States lhridley
@alexpott -- being able to return FALSE for dirname() would be awesome, especially for remote stream management to/from assets on object stores like S3. The original docblock notes indicated that both ::realpath() and ::dirname() werre supposed to be capable of returning a boolean, but the actual type hint on the return value as implemented were set for strings only.
(Begin my soapbox contribution)
IMO, we should not have either ::realpath() or ::dirname() on the StreamWrapperInterface, as the concept of both may not actually exist on the Streaming application that the wrapper is intended to integrate. For key/value object stores it's absolutely not applicable. If we ARE going to keep them on the StreamWrapperInterface, then we should make their implementation consistent with the PHP implementation of both:
https://www.php.net/manual/en/function.realpath (returns string or FALSE)
https://www.php.net/manual/en/function.dirname.php (returns string)PHP's dirname() and realpath() don't support "streaming" per se. They both are implicitely structured to work with a traditional, local file system. Implementing dirname() and realpath() on Drupal\Core\File\FilesystemInterface is appropriate (somewhat), but I still think even there they should return FALSE when they're not applicable (like streaming applications).
- AWS PHP SDK StreamWrapper Interface: https://github.com/aws/aws-sdk-php/blob/master/src/S3/StreamWrapper.php <== neither method name exists
- Google Cloud Storage StreamWrapper class: https://github.com/googleapis/google-cloud-php-storage/blob/main/src/Str... <== neither method name exists
This ticket, and the ensuing discussion, exposes what is IMO a fundamental flaw in the way the Drupal Core has implemented StreamWrappers. The implementation has in implicit assumption that the underlying roots of any StreamWrapper implementation is working with a traditional mounted file system. In practice, these two methods cause workarounds that have to be implemented for a non-traditional alternative to the more traditional mounted file system, such as remote object stores.
As we have evolved other parts of Drupal Core to adapt to a constantly changing tech landscape, this is a particular component that needs to be revisited, as simple changes to the implementation open up a broader world of possibilities for more effectively using alternative file systems as they are intended to work. Breaking changes will likely result, but sometimes breaking changes are necessary to evolve.
A more radical change would be to decouple Drupal\Core\StreamWrapper\StreamWrapperInterface from Drupal\Core\StreamWrapper\PhpStreamWrapperInterface. This could actually be done without alot of disruption simply by:
- Making the two Wrapper Interfaces independent (in other words don't have StreamWrapperInterface implement PhpStreamWrapperInterface).
- Deprecate the "extended" methods on StreamWrapperInterface that are actually a part of PhpStreamWrapperinterface, with messages to revise any custom implementation (custom code and contrib modules) to implement BOTH interfaces.
- Pick a date for deprecations to take effect (11.x release perhaps)
- Revise Core to decouple the two interfaces
The code changes would be relatively minor, and this would make the StreamWrapper Management component of Drupal infinitely more flexible with regard to managing a wider variety of streaming services, including video streaming, podcasting, etc in addition to more traditional asset storage that is using non-traditional mounted file systems for storage (such as remote object stores, for example).
(End my soapbox contribution)
- πΊπΈUnited States cmlara
being able to return FALSE for dirname() would be awesome, especially for remote stream management to/from assets on object stores like S3
As an S3 streamWrapper maintainer (s3fs module) I'm not able to see any advantage to allowing FALSE, Can you please provide an example of how not having dirname() would help an object store streamWrapper to help me understand why this would be important?
AWS PHP SDK StreamWrapper Interface: https://github.com/aws/aws-sdk-php/blob/master/src/S3/StreamWrapper.php <== neither method name exists
Google Cloud Storage StreamWrapper class: https://github.com/googleapis/google-cloud-php-storage/blob/main/src/Str... <== neither method name existsIt is important to note that no PHP streamWrapper ever implements these methods. See https://www.php.net/manual/en/class.streamwrapper.php
I will admit I always found it a bit clumsy that in Drupal a streamWrapper class implements code intended for use by PHP directly and code that Drupal Core expects to call into directly rather than separating them into two different systems.
That said, both methods were added to the Drupal API to allow more control rather than depending solely on PHP's implementation. This is especially true for PHP realpath() which will always return FALSE for a streamWrapper, even if its backed by local disk storage. Though there is a good argument that no Drupal streamWrapper should ever implement realpath() to ensure file access always goes through the streamWrapper.
As an s3fs maintainer: we would have no issue with these methods being removed and relying on PHP implementations of these methods.
I will note that if you absolutely need a flat structure this simple code would work for dirname()
public function dirname($uri): string { return 's3://'; }
- πΊπΈUnited States lhridley
@cmlara You summed it up nicely:
It is important to note that no PHP streamWrapper ever implements these methods. See https://www.php.net/manual/en/class.streamwrapper.php
I will admit I always found it a bit clumsy that in Drupal a streamWrapper class implements code intended for use by PHP directly and code that Drupal Core expects to call into directly rather than separating them into two different systems.
The example of how not having the
::dirname()
defined on the StreamWrapperInterface is, a module maintainer doesn't have to fight with working out how to handle a method that is totally irrelevant to the functionality of a StreamWrapper, and write and maintain unnecessary code that contributes no value to the functional aspects of the StreamWrapper adapter that is being registered with Drupal's StreamWrapperManager service.I've been a maintainer of the Flysystem project for about 4 years, I got involved whenone of the maintainers left the community and we had several client projects that utilized Flysystem. The buggiest part of the initial implementation of Flysystem was the unnecessary methods such as dirname(), realpath(), and related methods that were also badly implemented. These have been refactored somewhat to make them better, but untangling the hacks that were put in place to force them to work is like trying to hook a race horse to a plow and expecting it to be effective :).
Flysystem defines protocols in configuration, which for example will allow for different definitions of a stream wrapper adapter that implement different protocols (i.e, "docx://", "pdf://", "image://" etc. I can then define media entities that are configured to utilize different stream wrapper adapters that will store an entry in the file managed table utilizing the protocol defined for the different types.
As a result, I can then send "docx://" to one S3 bucket, "pdf://" to a different S3 bucket, "image://" to Google Cloud Storage, and "styles://" to the local file system if my business use case necessitated such a structure. And I've used it to manage scientific studies that were PDFs living on different S3 buckets that were managed by the respective research teams in legacy systems outside of Drupal and uploaded to an S3 instance that they managed. We built a Drupal content management system to index the studies in Drupal and make them searchable with Solr. The team would then add the metadata about the study to the Drupal entry, and "publish" the study in Drupal, which would then in turn make it visible for Solr to index.
We used Flysystem to manage the Drupal integration with the different S3 buckets.
- πΊπΈUnited States lhridley
Additional info on contrib modules: Remote Stream Wrapper β , which implements
Drupal\Core\StreamWrapper\StreamWrapperInterface
by extension. Looks like this module implementedStreamWrapperInterface::realPath()
by the documentation and not by the interface defined return type.StreamWrapperInterface::dirname()
is implemented by the defined return type, which is string.::realPath
Returns FALSE;
https://git.drupalcode.org/project/remote_stream_wrapper/-/blob/2.x/src/...
::dirname
Returns a string;
https://git.drupalcode.org/project/remote_stream_wrapper/-/blob/2.x/src/... - πΊπΈUnited States lhridley
Circling back for additional information in looking at the implementation of stream wrappers in Drupal core, as it is relevant to this discussion.
Drupal core has three active stream wrappers currently registered:
Two of the three stream wrappers extend the abstract class
\Drupal\Core\StreamWrappers\LocalStream
(PublicStream
andTemporaryStream
), and the third,AssetsStream
, extendsPublicStream
.LocalStream::realpath()
returns the result ofLocalStream::getLocalPath()
, which has return types ofstring|bool
, and in code will return either FALSE, or a string..
PublicStream::getLocalPath()
overrides the parent (LocalStream::getLocalPath()
) method, and also return either FALSE, or a string.
TemporaryStream
by inheritance implements the code defined in it's parent,LocalStream
.In practice, it appears that Drupal Core's stream wrappers fail to adhere to the current Interface as it is described.
In tracing the source of the implementation of StreamWrapperInterface::dirname() back through the same streamwrappers, all inherit from LocalStream::dirname(), which is defined with a return type of string on the interface, and adheres to that.
Conclusion: the definition of
StreamWrapperInterface::realpath()
should be modified to implement return types ofstring|bool
, as described in the current comments.This change will not break B/C in core, but should probably be identified in a release as a breaking change, as it could impact contrib modules that are using
Drupal\Core\StreamWrapper\StreamWrapperInterface
. Also worth noting, none of the stream wrappers in contrib modules that I've reviewed implement ::realpath() in adherance to the Interface, so they should in practice be fine. - π¦πΊAustralia kim.pepper πββοΈπ¦πΊSydney, Australia
A typehint of
string|false
would be more accurate. We never returnTRUE
. - πΊπΈUnited States lhridley
lhridley β changed the visibility of the branch drupal-3271894-3271894-streamwrapperinterface-return-types to hidden.
- Status changed to Needs review
7 months ago 5:11am 16 April 2024 - πΊπΈUnited States lhridley
Changing description of issue, since interface definition of return types on
StreamWrapperInterface::dirname()
is no longer relevant, and pursuant to the recently added comment #32 π StreamWrapperInterface return types for realpath() and dirname() are inconsistent with documented use and intention Needs work .@kim.pepper -- after looking at the return types traced back on the actual implementation of protected methods in LocalStream, I modified it for consistency.
I have no strong feelings about
string|bool
vsstring|FALSE
, but changed the original PR for consistency with current implementations. I'm open to modifying this tostring|FALSE
, but would that not also mean that the return type forLocalStream::getLocalPath()
would also need to be updated? - πΊπΈUnited States smustgrave
smustgrave β changed the visibility of the branch 3271894-streamwrapperinterface-return-types to hidden.
- πΊπΈUnited States smustgrave
Went to review and hiding older branch.
But seems needs clarity on string|bool vs string|FALSE
- πΊπΈUnited States lhridley
@smustgrave, thank you for asking.
1. Clarity on whether the typehint needs to be
string|bool
vsstring|FALSE
2. If the decision isstring|FALSE
do we also need to revise the return types forLocalStream::getLocalPath()
andPublicStream::getLocalPath()
to string|FALSE as well.For clarity, the method
::getLocalPath()
in theLocalStream
andPublicStream
classes are protected methods that are called directly byLocalStream::dirname()
andPublicStream::dirname()
but have a return type ofstring|bool
instead ofstring|FALSE
.At least, this is the next step from my vantage point.
- Status changed to RTBC
6 months ago 10:10pm 29 May 2024 - π¦πΊAustralia kim.pepper πββοΈπ¦πΊSydney, Australia
Thanks @lhridley for your detailed investigation. I'm happy with the changes in the MR.
- πΊπΈUnited States cmlara
A note from a PHPStan standpoint:
@return string|bool
Means that anyone calling the method needs to check for TRUE as well before passing the return to a function that only accepts strings.This means the classic
if ($value === FALSE) {}
check needs to usually be re-written toif (is_bool($value)) {}
even though the comments only describe FALSE. - π³πΏNew Zealand quietone
I have read the issue summary, the comments and the MR. After reading the MR I see that the proposed resolution does not match the code changes.
I am not sure about this.There is an instance in core that returns FALSE for dirname and this issue is removing mentioning that in the comment for the return. And the change to realpath allows boolean, but there is no implementation in core the returns TRUE. The change to the @return and the return comment should be in agreement.
- πΊπΈUnited States cmlara
Updated the IS to match the steps agreed in the issue and SLACK conversations.
There is an instance in core that returns FALSE for dirname
Our reviews did not find this in functional code. I've updated the IS based on the belief this isn't the case.
- π³πΏNew Zealand quietone
I don't know what the historical information is that is being referred to. Using git I learned that the first use of 'sting|false' was in in 2012.
- Status changed to Needs work
5 months ago 9:00am 15 June 2024 - π¬π§United Kingdom alexpott πͺπΊπ
The file that returns FALSE is core/modules/file/tests/file_test/src/StreamWrapper/DummyExternalReadOnlyWrapper.php it needs fixing if we're going to make this change.
- πΊπΈUnited States lhridley
The return type declared in LocalStream::getLocalPath() also needs to be redefined as string|FALSE. In reviewing the code, it does not appear that TRUE is ever returned, you either get a string, or a FALSE.
Same with PublicStream::getLocalPath(), which inherits the doc comments from LocalStream::getLocalPath(), which it extends.