Potential DOS vector in oEmbed resource fetcher

Created on 4 March 2025, about 1 month ago
This was originally reported to the security team and has been cleared as a public hardening.

Problem/Motivation

If:
1. An oEmbed resource consistently fails to be fetched
2. The oEmbed provider is sluggish to respond for whatever reason

then I could imagine that a site could be taken offline in short order even through non-malicious means. I've added a hacked up / trimmed down / commented snippet of the code in question below.

We've seen the dreaded "Failed to fetch oembed resource" message quite frequently due to folks making videos private without coordinating with our web content team -- so I know this can happen in the wild.

  /**
   * {@inheritdoc}
   */
  public function fetchResource($url) {
    $cache_id = "media:oembed_resource:$url";

    $cached = $this->cacheBackend->get($cache_id);
    if ($cached) {
      return $this->createResource($cached->data, $url);
    }

     // Make a blocking HTTP call that can tie things up for 5 seconds.
    $response = $this->httpClient->request('GET', $url, [
      RequestOptions::TIMEOUT => 5,
    ]);

    // blah blah, throw exceptions on failure.

    // Important!
    // 
    // A cached result is only set on successful retrievals from the oEmbed endpoint.
    // If a resource consistently fails (i.e. a YouTube video has been made private), 
    // and YouTube is slow to respond, this can lead to a full-blown denial of service
    // in fairly short order.
    $this->cacheBackend->set($cache_id, $data);

    return $this->createResource($data, $url);
  }

Steps to reproduce

  1. Forcibly throttle traffic to an oembed provider (host file / custom app?).
  2. Mash the F5 key with anonymous sessions.
  3. Watch the world burn.

Proposed resolution

Can we cache the failed result with a much lower TTL...say 60 seconds? That would at least prevent a stampede from occurring.

Remaining tasks

User interface changes

None

Introduced terminology

None

API changes

None

Data model changes

None

Release notes snippet

None

πŸ“Œ Task
Status

Active

Version

11.0 πŸ”₯

Component

media system

Created by

πŸ‡ΊπŸ‡ΈUnited States luke.leber Pennsylvania

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

  • Issue created by @luke.leber
  • πŸ‡¬πŸ‡§United Kingdom catch

    Caching a failed response for 60 seconds seems OK.

    Could also use the lock API on cache misses so that only a single request to any particular resource can happen at once maybe.

Production build 0.71.5 2024