Discovery as described in https://oembed.com/#section4 currently only kicks in if we get a URL which does not match any of the URL schemes defined for an endpoint; see \Drupal\media\OEmbed\UrlResolver::getProviderByUrl().
I'm working with a provider which currently does not provide <link rel="alternate" type="application/json+oembed" href="..." /> discovery tags on the actual resource URL, but they do provide that tag on a different URL (on another subdomain). The URL given in that tag correctly contains the endpoint URL and the full actual resource URL as the url query parameter. If you request it, it returns the full oEmbed JSON object we expect.
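To make the discovery step concrete, here is a minimal standalone sketch of reading such a tag and pulling the resource URL out of its url query parameter. The helper name, the markup and all example.com URLs are made up for illustration; this is not Drupal's own discovery code, just plain DOMDocument and DOMXPath.

```php
<?php

/**
 * Finds the oEmbed JSON discovery URL in an HTML document, if any.
 *
 * Hypothetical helper for illustration; not part of Drupal core.
 */
function find_oembed_discovery_url(string $html): ?string {
  $document = new DOMDocument();
  // Real-world markup is rarely valid, so keep libxml warnings internal.
  $previous = libxml_use_internal_errors(TRUE);
  $document->loadHTML($html);
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  $xpath = new DOMXPath($document);
  $links = $xpath->query('//link[@rel="alternate"][@type="application/json+oembed"]');
  foreach ($links as $link) {
    $href = $link->getAttribute('href');
    if ($href !== '') {
      return $href;
    }
  }
  return NULL;
}

// Made-up markup standing in for the provider's "secondary" page.
$html = <<<HTML
<html><head>
<link rel="alternate" type="application/json+oembed"
  href="https://api.example.com/oembed?url=https%3A%2F%2Fplayer.example.com%2Fv%2F123&amp;format=json" />
</head><body></body></html>
HTML;

$discovered = find_oembed_discovery_url($html);

// The url query parameter carries the actual resource URL.
parse_str((string) parse_url($discovered, PHP_URL_QUERY), $query);
$resource_url = $query['url'] ?? NULL;
```

In this sketch the discovered URL points at the endpoint, while the url parameter holds the player URL rather than the page the tag was found on, which is the situation described above.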
If I specify the URL scheme to the player / resource URL and our editors input it when creating a Media entity in Drupal, everything works as expected.
However, the provider clearly exposes the other URL in their interface, so our editors are likely to see that and try to create a Media entity using that - which leads to an "unknown provider" error.
If I add the secondary URL as another scheme for the same endpoint, it won't work. The endpoint only accepts the actual player / resource URL and won't return any JSON for the secondary URL.
Since the secondary URL page does contain a working <link/> for discovery, as shown above, I thought we'd be fine anyway if the editors did this, and we almost are.
What happens is, Drupal:
- starts the main process using Drupal\media\Plugin\media\Source\OEmbed::getMetadata()
- wants to know the URL to the oEmbed resource so uses UrlResolver::getResourceUrl() on the user input (the "secondary" URL in my case)
- checks the schemes for matches to the input URL using UrlResolver::getProviderByUrl()
- finds no matching schemes
- falls back on discovery in the protected method UrlResolver::discoverResourceUrl() and requests the input URL directly using an HTTP client
- parses markup, sees the <link/> and pops that URL back up to UrlResolver::discoverResourceUrl()
- requests the discovered URL with ResourceFetcher::fetchResource() (it's pointing to the endpoint with the correct url parameter)
- gets valid oEmbed JSON it can parse
- grabs the name of the provider
- looks up the provider definition based on the provider name
- creates a complete \Drupal\media\OEmbed\Resource instance, passing in the now found provider definition
- returns that oEmbed resource up to UrlResolver::getProviderByUrl()
- throws away the resource and returns just the provider definition up to UrlResolver::getResourceUrl()
- still inside UrlResolver::getResourceUrl(), it wants to find out which endpoint is appropriate for this resource URL using UrlResolver::getEndpointMatchingUrl()
- iterates through any defined endpoints in the provider, testing the URL schemes for matches to the input ("secondary") URL (or in most normal cases a valid resource URL which just wasn't listed in the schemes and we needed the discovery process for)
- finds no matching URL schemes and falls back to whichever endpoint is listed first and pops that back up to UrlResolver::getResourceUrl()
- makes another request to that endpoint, using the secondary URL as the url parameter
- (For this provider, this request does not match the expected endpoint URL format based on the input string, and is thus not a request that has been cached by the resource fetcher. Even if it did match the requested format, the URL would not have been fetched before, or we would have known to call it earlier and would not have ended up in the discovery phase.)
- ends up with a 404 because the secondary URL used on that endpoint is not actually a valid resource, and throws an exception
- OEmbed::getMetadata() catches that exception, prints the "The provided URL does not represent a valid oEmbed resource." validation error, and returns NULL, leading to a form error.
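The steps above can be reproduced in a small standalone simulation. Everything in it is an assumption made for illustration: the provider definition, the example.com URLs, the canned responses and the FakeHttp class. The function names only mirror the UrlResolver methods; the bodies are a simplified imitation of the call flow, not core's code.

```php
<?php

// Fake HTTP layer: returns canned bodies, treats anything else as a 404.
final class FakeHttp {
  public array $log = [];

  public function __construct(private array $responses) {}

  public function get(string $url): string {
    $this->log[] = $url;
    if (!isset($this->responses[$url])) {
      throw new RuntimeException("404 Not Found: $url");
    }
    return $this->responses[$url];
  }
}

// Turns a scheme such as https://host/v/* into a regular expression.
function scheme_matches(string $scheme, string $url): bool {
  $regex = str_replace('\*', '.*', preg_quote($scheme, '~'));
  return preg_match('~^' . $regex . '$~', $url) === 1;
}

function get_provider_by_url(string $url, array $provider, FakeHttp $http): array {
  foreach ($provider['endpoints'] as $endpoint) {
    foreach ($endpoint['schemes'] as $scheme) {
      if (scheme_matches($scheme, $url)) {
        return $provider;
      }
    }
  }
  // No scheme matched: discover via the <link> tag on the input URL.
  if (!preg_match('/<link[^>]+application\/json\+oembed[^>]+href="([^"]+)"/', $http->get($url), $m)) {
    throw new RuntimeException('No provider found.');
  }
  // This fetch succeeds and yields the complete oEmbed resource ...
  $resource = json_decode($http->get(html_entity_decode($m[1])), TRUE);
  if ($resource['provider_name'] !== $provider['name']) {
    throw new RuntimeException('Unknown provider.');
  }
  // ... but only the provider is returned, so the resource is lost here.
  return $provider;
}

function get_resource_url(string $url, array $provider, FakeHttp $http): string {
  $provider = get_provider_by_url($url, $provider, $http);
  // Fall back to the first endpoint when no scheme matches the input URL.
  $endpoint = $provider['endpoints'][0];
  foreach ($provider['endpoints'] as $candidate) {
    foreach ($candidate['schemes'] as $scheme) {
      if (scheme_matches($scheme, $url)) {
        $endpoint = $candidate;
        break 2;
      }
    }
  }
  // The input URL, not the discovered one, becomes the url parameter.
  return $endpoint['url'] . '?url=' . rawurlencode($url);
}

$provider = [
  'name' => 'Example',
  'endpoints' => [
    ['url' => 'https://api.example.com/oembed', 'schemes' => ['https://player.example.com/v/*']],
  ],
];
$http = new FakeHttp([
  'https://www.example.com/watch/123' => '<link rel="alternate" type="application/json+oembed" href="https://api.example.com/oembed?url=https%3A%2F%2Fplayer.example.com%2Fv%2F123" />',
  'https://api.example.com/oembed?url=https%3A%2F%2Fplayer.example.com%2Fv%2F123' => '{"type":"video","provider_name":"Example","html":"<iframe></iframe>"}',
]);

$input = 'https://www.example.com/watch/123';
$resource_url = get_resource_url($input, $provider, $http);

$error = NULL;
try {
  $http->get($resource_url);
}
catch (RuntimeException $e) {
  $error = $e->getMessage();
}
```

After running this, $http->log holds three requests: the secondary page, the discovered URL (which returned valid JSON), and the rebuilt endpoint URL carrying the secondary URL, which is the one that fails.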
I can see a sort of elegance to doing the discovery process inside the getProviderByUrl() method and just returning the provider there since that's what that method is looking for, avoiding calling code having to care about the internal discovery steps we needed to go through to actually find a valid provider.
However, it also means the calling code does not know the correct resource has already been found, parsed, and thrown away. It must therefore go through the entire URL scheme matching again as outlined above, guess which endpoint to use (even though we already know none of them matched), and hope to find the embed code we already had as part of the resource fetched earlier.
Should we not move the discovery handling out of getProviderByUrl()?
The closest candidate locations for handling the discovery process would be getResourceUrl() and OEmbedResourceConstraintValidator::validate(), but then we have nearly the same issue of asking for a resource, not finding the provider, falling back on discovery, getting a full resource object, extracting the URL, and then throwing it away to just keep the provider. For the constraint validator I think that may actually be fine, at least if the resource fetcher caches the response, but otherwise it just bumps the problem up a level.
\Drupal\media\OEmbed\UrlResolverInterface has no other methods, so we're out of candidate locations here. Doing it a level higher (other than in the validator) would mean we're all the way up in the OEmbed media source class, but maybe that's not so bad. It knows about the intricacies of oEmbed anyway, and could fall back on trying to directly fetch the resource using the discovery URL if UrlResolver::getResourceUrl() didn't work.
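As a rough sketch of what that fallback could look like, assuming the source class has some way to resolve, discover and fetch: the function below and its three callables are hypothetical stand-ins, not existing Drupal APIs, and the example.com URLs are made up.

```php
<?php

/**
 * Hypothetical sketch of the fallback a media source could apply.
 *
 * $resolve maps an input URL to an endpoint request URL, $discover maps an
 * input URL to a discovered URL or NULL, and $fetch returns the oEmbed
 * resource for a request URL or throws on failure.
 */
function fetch_resource_with_fallback(string $url, callable $resolve, callable $discover, callable $fetch): array {
  try {
    return $fetch($resolve($url));
  }
  catch (RuntimeException $e) {
    // Scheme-based resolution failed, so use the discovered URL as-is
    // instead of rebuilding an endpoint URL from the input.
    $discovered = $discover($url);
    if ($discovered === NULL) {
      throw $e;
    }
    return $fetch($discovered);
  }
}

// Stubs standing in for the resolver, discovery and resource fetcher.
$known = [
  'https://api.example.com/oembed?url=https%3A%2F%2Fplayer.example.com%2Fv%2F123' => [
    'type' => 'video',
    'provider_name' => 'Example',
  ],
];
$fetch = function (string $request_url) use ($known): array {
  if (!isset($known[$request_url])) {
    throw new RuntimeException("404 Not Found: $request_url");
  }
  return $known[$request_url];
};
$resolve = fn(string $url): string => 'https://api.example.com/oembed?url=' . rawurlencode($url);
$discover = fn(string $url): ?string => $url === 'https://www.example.com/watch/123'
  ? 'https://api.example.com/oembed?url=https%3A%2F%2Fplayer.example.com%2Fv%2F123'
  : NULL;

// The secondary URL only works through the discovery fallback.
$resource = fetch_resource_with_fallback('https://www.example.com/watch/123', $resolve, $discover, $fetch);
```

The point of the sketch is only the ordering: the discovered URL is requested directly and its response is kept, rather than being used solely to identify the provider.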
It would not require big API changes if we basically just moved the protected methods doing the discovery process up there, but it would mean anyone using the media.oembed.url_resolver service directly would not automatically fall back to using the discovery process.
There are a few alternatives if we want to preserve that behavior, such as extending UrlResolverInterface with new optional parameters to disable the automatic use of discovery when the caller wants to do it manually, or perhaps creating a new service just for this purpose.
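One possible shape for the optional-parameter variant, purely as an illustration: the interface and class below are hypothetical, the $discover flag does not exist on the real UrlResolverInterface, and the provider is reduced to a plain string name to keep the sketch short.

```php
<?php

// Hypothetical interface; $discover is the proposed optional parameter.
interface SketchUrlResolverInterface {

  public function getProviderByUrl(string $url, bool $discover = TRUE): string;

}

final class SketchUrlResolver implements SketchUrlResolverInterface {

  /**
   * @param array<string, string> $schemes
   *   Provider names keyed by URL scheme with * as wildcard.
   * @param \Closure $discovery
   *   Stand-in for the discovery process: fn(string $url): ?string.
   */
  public function __construct(private array $schemes, private Closure $discovery) {}

  public function getProviderByUrl(string $url, bool $discover = TRUE): string {
    foreach ($this->schemes as $scheme => $provider) {
      $regex = str_replace('\*', '.*', preg_quote($scheme, '~'));
      if (preg_match('~^' . $regex . '$~', $url) === 1) {
        return $provider;
      }
    }
    // Callers that handle discovery themselves pass FALSE to skip this.
    $provider = $discover ? ($this->discovery)($url) : NULL;
    if ($provider === NULL) {
      throw new RuntimeException('No matching provider found.');
    }
    return $provider;
  }

}

$resolver = new SketchUrlResolver(
  ['https://player.example.com/v/*' => 'Example'],
  fn(string $url): ?string => str_starts_with($url, 'https://www.example.com/') ? 'Example' : NULL
);

// Default behavior is unchanged: discovery still kicks in automatically.
$automatic = $resolver->getProviderByUrl('https://www.example.com/watch/123');

// With discovery disabled the caller gets the failure and can take over.
$manual_failed = FALSE;
try {
  $resolver->getProviderByUrl('https://www.example.com/watch/123', FALSE);
}
catch (RuntimeException $e) {
  $manual_failed = TRUE;
}
```

Defaulting the flag to TRUE would keep existing callers of the service working as they do today, which is the behavior the paragraph above wants to preserve.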