Bad usage of the PHP hash_equals function

Created on 3 December 2024, 8 months ago

After discussing this with the security team, I'm publicly reposting the following issue :

Hi,

The core of drupal is using the PHP function hash_equals in multiple places.
The way this function is used does not always respect the api contract of the function
We can take Drupal\user\Controller\UserController::validatePathParameters() as an example :

protected function validatePathParameters(UserInterface $user, int $timestamp, string $hash, int $timeout = 0): bool {
    $current = \Drupal::time()->getRequestTime();
    $timeout_valid = ((!empty($timeout) && $current - $timestamp < $timeout) || empty($timeout));
    return ($timestamp >= $user->getLastLoginTime()) && $timestamp <= $current && $timeout_valid && hash_equals($hash, user_pass_rehash($user, $timestamp));
  }
  1. The length of the 2 parameters must be the same, otherwise the function return immediatly. In my example, the lengths are not check to be the same which can allow an attacker to determine the lenght of the known string by checking the timing of the comparaison => this is not really an issue since Drupal is open source and the length are known
  2. I am less sure about this one after checking the C implementation of PHP (https://github.com/php/php-src/blob/PHP-8.4.0/ext/hash/hash.c#L1114 and https://github.com/php/php-src/blob/PHP-8.4.0/main/safe_bcmp.c) : if we read the doc (https://www.php.net/manual/en/function.hash-equals.php), it is important to supply the known string first. In my example (and in multiple other places), the known string is supplied as the second argument

I known this is a very remote security issue. But since the goal of using hash_equals() is to mitigate timing attacks, I think we can pursue every little details.

Thanks.

📌 Task
Status

Active

Version

10.3 ✨

Component

other

Created by

🇨🇭Switzerland epieddy

Live updates comments and jobs are added and updated live.
  • Security improvements

    It makes Drupal less vulnerable to abuse or misuse. Note, this is the preferred tag, though the Security tag has a large body of issues tagged to it. Do NOT publicly disclose security vulnerabilities; contact the security team instead. Anyone (whether security team or not) can apply this tag to security improvements that do not directly present a vulnerability e.g. hardening an API to add filtering to reduce a common mistake in contributed modules.

Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @epieddy
  • First commit to issue fork.
  • Pipeline finished with Success
    about 1 month ago
    Total: 901s
    #520998
  • 🇫🇷France prudloff Lille

    I looked at every hash_equals() call in core and found only one place where the parameter order was wrong.

  • 🇮🇹Italy apaderno Brescia, 🇮🇹
  • 🇯🇵Japan neptune-dc

    >I looked at every hash_equals() call in core and found only one place where the parameter order was wrong.

    I came to the same conclusion. +1 RBTC

  • 🇮🇹Italy apaderno Brescia, 🇮🇹
  • 🇮🇹Italy apaderno Brescia, 🇮🇹

    I take that in the following code the $known_hash parameter should be $computed_hash, and the $user_string parameter should be $stored_hash.

    return $computed_hash && hash_equals($stored_hash, $computed_hash);

    (In other words, the correct code should be the following one.

    return $computed_hash && hash_equals($computed_hash, $stored_hash);

    I did not check all the calls to hash_equals(). There could be other cases where the arguments must be inverted.

    I am not sure the code should be changed as for the first point. Checking the string lengths in PHP could make the code more susceptible of timing attacks. If that is not a reason for changing code, the first point should be removed from the issue summary.

    Is there any evidence that passing the arguments in the wrong order would increase the possibilities of timing attacks?
    Given that base64_encode() and base64_decode() are not constant-time, and both the functions are used by Drupal core, would just fixing the arguments-order for hash_equals() make Drupal core code less prone of timing attacks?

  • 🇮🇹Italy apaderno Brescia, 🇮🇹

    Looking at the C code used by php_safe_bcmp(), I would say that inverting the arguments does not increase the possibilities of timing attacks.

    	const volatile unsigned char *ua = (const volatile unsigned char *)ZSTR_VAL(a);
    	const volatile unsigned char *ub = (const volatile unsigned char *)ZSTR_VAL(b);
    	size_t i = 0;
    	int r = 0;
    
    	if (ZSTR_LEN(a) != ZSTR_LEN(b)) {
    		return -1;
    	}
    
    	/* This is security sensitive code. Do not optimize this for speed. */
    	while (i < ZSTR_LEN(a)) {
    		r |= ua[i] ^ ub[i];
    		++i;
    	}
    
    	return r;
    

    r |= ua[i] ^ ub[i]; and r |= ub[i] ^ ua[i]; would take the same amount of time to run.

  • 🇫🇷France prudloff Lille

    According to this discussion the order was important in a previous implementation but not in the current one.

    It that's really the case, should we try to get the PHP doc fixed?

  • 🇮🇹Italy apaderno Brescia, 🇮🇹

    It that's really the case, should we try to get the PHP doc fixed?

    The PHP documentation should not show the It is important to provide the user-supplied string as the second parameter, rather than the first. warning, which seems to be a left-over from the previous implementation.
    IMO, it does not generally make sense. For example, in Drupal, hashes are not always values users provide.

  • 🇫🇷France prudloff Lille

    Should we close this as wontfix then?

    I opened an issue to clarify the PHP doc: https://github.com/php/doc-en/issues/4767

    IMO, it does not generally make sense. For example, in Drupal, hashes are not always values users provide.

    Well, if we use hash_equals() it's usually because the hash is provided by the user or calculated based on user-provided values.
    If that's not the case, timing attacks are not possible and there is no reason to use hash_equals().

  • 🇫🇷France prudloff Lille

    A PHP contributor confirmed that the order does not matter with the current implementation but it could change in the future: https://github.com/php/doc-en/issues/4767#issuecomment-3054071094

    So I think we should try to use the correct order to be future-proof, so setting this back to needs review.
    I also updated the summary to remove the point about checking the string length. I don't think leaking the hash length is really a problem and it would hard to prevent.

    (I also added back the empty sections of the issue summary, it helps seeing what still needs to be done.)

    I take that in the following code the $known_hash parameter should be $computed_hash, and the $user_string parameter should be $stored_hash.

    The user string is the hash calculated from values supplied in the HTTP request.
    In your example, $computed_hash is calculated from the password provided to be checked, so having it as the second parameter is correct.

  • 🇮🇹Italy apaderno Brescia, 🇮🇹

    Lines like the following ones needs to be changed, then.

    return $computed_hash && hash_equals($stored_hash, $computed_hash);

    $computed_hash is the expected hash, not the user-provided value.
    In that case, there is no user-provided value, but a value stored in the database, which could be altered. If I follow that warning, it is $stored_hash that needs to be passed as second argument.

    I did not check all the hash_equals() calls, but there are more lines that should be changed, if we follow to the letter the warning given in the PHP documentation.

  • 🇫🇷France prudloff Lille

    Please see my comment in #14, $computed_hash is calculated from a user-provided password. $stored_hash comes from the database so is not user-provided.
    User-provided in the context of timing attacks means provided in the HTTP request.

  • 🇮🇹Italy apaderno Brescia, 🇮🇹

    $known_string is the calculated hash, which must be kept secret. It cannot be a value stored in the database.

  • 🇮🇹Italy apaderno Brescia, 🇮🇹

    In the example used by the PHP documentation, the hash calculated from a value provided by the user is passed as first parameter, not second.

    // Value and signature are provided by the user, e.g. within the URL
    // and retrieved using $_GET.
    $value = 'username=rasmuslerdorf';
    $signature = '8c35009d3b50caf7f5d2c1e031842e6b7823a1bb781d33c5237cd27b57b5f327';
    
    if (hash_equals(hash_hmac('sha256', $value, $secretKey), $signature)) {
        echo "The value is correctly signed.", PHP_EOL;
    } else {
        echo "The value was tampered with.", PHP_EOL;
    }
  • 🇫🇷France prudloff Lille

    The order does not depend on which variable is pre-calculated or not, it depends on which variable the attacker is trying to brute force.

    In the PHP doc example, the attacker is trying to brute force a valid signature, so the signature submitted by the user has to be second (and it is compared with a signature calculated from the secret key).
    In PhpassHashedPasswordBase::check(), the attacker is trying to guess the password so the hash computed from the sent password has to be second.

    It cannot be a value stored in the database.

    It can totally be a value stored in database. The known string is the correct value that does not vary per request, it is either stored pre-calculated or calculated from known values like a secret key.
    When the PHP doc says the known string "must be kept secret" it does not mean it can't be stored, it means it is the sensitive string that an attacker would be trying to get with a timing attack.
    The user string is the hash that varies with each brute force request.

    Timing attacks work by sending a lot of different values and checking if the time it takes the server to respond vary.
    For example when trying to brute force a password, the website will usually have a hash of the password stored in the database and will hash the user-supplied password and compare it to the hash stored in database.
    In that scenario, the known string is the hash stored in the database and the user string is the hash calculated from the password sent by the user.

  • 🇮🇹Italy apaderno Brescia, 🇮🇹

    It is not a matter of pre-calculated values, which would eventually be values I calculate before I need them (and store in some places), not right before calling hash_equals().

    In PhpassHashedPasswordBase::check(), the attacker is trying to guess the password so the hash computed from the sent password has to be second.

    $computed_hash is not user-input. The user can just change the entered password, but finding a password that gives a specific hash is only possible by brute force, by definition of cryptographic hash, which is invertible and collision resistant.
    With passwords, the operation is more complex, and it is more difficult to find a password that gives an exact $computed_hash.

    When the PHP doc says the known string "must be kept secret" it does not mean it can't be stored, it means it is the sensitive string that an attacker would be trying to get with a timing attack.

    You missed the point. Neither $computed_hash nor $stored_hash is user-input (provided in the HTTP request), so there is no difference in choosing which value to pass as second argument; my preference is for $stored_hash, which could not be the preference somebody else has. It is no longer a matter of user input. (This does not mean the arguments order should not be changed.)

    Timing attacks work by sending a lot of different values and checking if the time it takes the server to respond vary.

    The only way to avoid timing attacks is using constant-time code, which does not mean just using hash_equals().

  • 🇮🇹Italy apaderno Brescia, 🇮🇹

    The issue summary says Drupal core uses hash_equals() in multiple places. The merge request just fixes the given example, but it should fix all the hash_equals() calls, when the user input is passed as first argument.

  • 🇮🇹Italy apaderno Brescia, 🇮🇹

    Truly, the arguments order does not matter with the previous implementation either.

    mod_len = MAX(known_len, 1);
    
    /* This is security sensitive code. Do not optimize this for speed. */
    result = known_len - user_len;
    for (j = 0; j < user_len; j++) {
        result |= known_str[j % mod_len] ^ user_str[j];
    }
    
    RETURN_BOOL(0 == result);
    

    C code to compare in constant time two strings cannot be much different from the previous implementation and the actual implementation. I cannot see in which way they could rewrite the code to be dependent from the arguments order and still be constant-time code.

  • 🇫🇷France prudloff Lille

    $computed_hash is not user-input.

    Well I think this is were we disagree on the definition of user-input. IMHO a hash calculated from user input should be treated as user input here but I could be wrong.
    To be fair, the PHP doc does not give a lot of examples and the only example is not very similar to our use cases.
    I opened an issue about improving the examples: https://github.com/php/doc-en/issues/4771

    I cannot see in which way they could rewrite the code to be dependent from the arguments order and still be constant-time code.

    I'm not fluent in C but my understanding is that it matters in algorithms that can handle hashes with a different length (which hash_equals() does not support currently but could add in the future) because it could stop the comparison early if the user string and known string are inverted.
    I have not checked if there a places in core where we compare hashes that can have a different length.

  • 🇮🇹Italy apaderno Brescia, 🇮🇹

    IMHO a hash calculated from user input should be treated as user input here but I could be wrong.

    The example given in the PHP documentation is clear about what they call "user input": It is a value completely provided by the user. They also make clear that a value given in the HTTP request is considered "user input." $computed_hash is not provided in the HTTP request, so it is not "user input."
    $known_string is not a value stored in a database, though; it is the value I get from the function to calculate the "hash." To make a different example, if I store in a database the largest integer supported by PHP, what I should I consider the known value when checking the value stored in the database? PHP_INT_MAX, not the value stored in the database.

    I'm not fluent in C but my understanding is that it matters in algorithms that can handle hashes with a different length (which hash_equals() does not support currently but could add in the future) because it could stop the comparison early if the user string and known string are inverted.

    Two strings are equal if they have the same length and all their bytes are equal. With a four-byte string and a five-byte string, only four bytes can be compared, but the strings are still different because they have different lengths.

    Since the code cannot be optimized, C code is not much different from PHP code. In both the cases, two bytes can be compared in constant-time by XORing their value. (A XOR B is 0 when A and B have the same value, and it is different from 0 when A and B have different values.) Eventually, since constant-time code avoids if statements, the code could use $result = $known_string_length - $user_string_length; instead of if ($known_string_length !== $user_string_length) { return; }, but the code woud still be XORing the bytes.
    Eventually, the C code could be different if it is rewritten to in Assembly and all the CPUs would have instructions to compare two strings in constant-time. The alghoritm would not change, though.

Production build 0.71.5 2024