Message body missing from logs

Created on 10 February 2025, about 2 months ago

I'm using Seq to ingest logs from Drupal/OpenTelemetry. However, the log messages appears in Seq without a message body. I'm getting lots of other meta data but not the message itself.

I wrote a patch based on how the OpenTelemetry Log โ†’ module handles this and also took the liberty to use the Drupal log channel as the event.name if it's available to make selection based on channel possible.

๐Ÿ› Bug report
Status

Active

Version

1.0

Component

Code

Created by

๐Ÿ‡ธ๐Ÿ‡ชSweden peter tรถrnstrand

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @peter tรถrnstrand
  • ๐Ÿ‡จ๐Ÿ‡ญSwitzerland berdir Switzerland

    I'm using SigNoz and the message is part of the body. Signoz has features to extract things from the body and populate into attributes.

    I'm also doing that to make the channel available, and I'm using the event.name to identify logs from Drupal vs system logs and other applications. If the channel is set as event name, how would you do that? Also, I'm still pretty new to OpenTelemetry but channels are often user readable strings with spaces, which I think violates naming guidelines, at least what it should be according to https://opentelemetry.io/docs/specs/semconv/general/naming.

  • First commit to issue fork.
  • ๐Ÿ‡ฆ๐Ÿ‡ฒArmenia murz Yerevan, Armenia

    Thank you for reporting and the provided patch. I created an MR https://git.drupalcode.org/project/opentelemetry/-/merge_requests/63 with it, but while reviewing, I found things to simplify, so reworked it - please review and test it.

  • ๐Ÿ‡จ๐Ÿ‡ญSwitzerland berdir Switzerland

    I tested this a bit, and from what I see, the event.name that currently is there isn't just replaced, it is completely removed. I didn't understand that there's a scope name (which this now set to the channel, which is clearly much better than the service name which is currently duplicated into this).

    Looking at https://github.com/open-telemetry/opentelemetry-specification/blob/main/..., logs having an event.name makes them "events", a special type of more structured logs? I think it makes sense to keep that, we can easily do both, and keep event.name = drupal-log and set the correct scope? IMHO it is useful to keep a way to identify all logs coming from Drupal vs other logs? The alternative would be a namespaced scope_name as mentioned.

    As to the actual log body, in HEAD that is as a JSON structure that contains the message. With this change, it's *only* the body. Quite a bit of stuff in the current body is duplicated and repetitive, but not everything. For example, Drupal logs often contain a referer, this is lost. Exceptions also contain a backtrace, this information is also lost (if not used as a placeholder in the message). Per the specification, both is allowed.

  • ๐Ÿ‡จ๐Ÿ‡ญSwitzerland znerol

    Reading through the current code and the patch, it looks like the most significant change regarding the message body is that before it was structured and after it is flat (a string).

    From that I conclude that Seq requires the message body to be a string. In order to fix the bug without breaking existing deployments.

    Also it looks like some backends will extract the backtrace from attributes (mine does), while others don't. So, it might be sensible to just make the body format configurable.

    Message body configuration knobs could include:

    • Message body as string vs. body as object
    • Include stack trace in message vs. exclude it

    Also note that the log APIs seem to be still in flux in the OT spec. E.g., v1.39.0 and v1.41.0. So maybe wait with refactoring until the spec trickled down into the OpenTelemetry PHP package.

  • ๐Ÿ‡จ๐Ÿ‡ญSwitzerland berdir Switzerland

    I see that https://git.drupalcode.org/project/otlog/-/blob/1.x/src/Logger/OtLog.php... does set the backtrace on the standard attribute, no reason to not add that here as well I guess.

    As for string vs array, not sure how much configuration there should be, if the backtrace is there as well, there isn't much left. currently, it's already passed through twice (as string and as array), so doesn't really hart to reduce that to one as the official attribute.

    One option would be to unset anything from $context that's explicitly used (essentially covering ๐Ÿ› Opentelemetry logs - context/ placeholder replacements gettingprefixed to log message Active ) and any keys that are left in there coud be added as a drupal.XY attribute.

  • ๐Ÿ‡จ๐Ÿ‡ญSwitzerland znerol

    You mean something like this @berdir?

          // Extract context values to attributes not interpolated in message.
          $extraAttributes = array_filter(
            $placeholders,
            fn (string $key) => !str_contains($message, $key),
            ARRAY_FILTER_USE_KEY
          );
          unset($extraAttributes['uid']);
          unset($extraAttributes['request_uri']);
          unset($extraAttributes['ip']);
    
          foreach ($extraAttributes as $key => $value) {
            $record = $record->setAttribute($this->extraAttributeKey($key), $value);
          }
    

    The extraAttributeKey() method would need to sanitize the context value key and add a prefix (e.g., org.drupal.channel).

  • ๐Ÿ‡จ๐Ÿ‡ญSwitzerland berdir Switzerland

    Yes, something like that. There's still the question of whether or not the scope name (that now uses the channel) should also be handled like that (org.drupal.access_denied instead of "access denied".

  • ๐Ÿ‡จ๐Ÿ‡ญSwitzerland znerol

    That seems to be still unclear upstream (see this GH issue). Personally, I'm leaning towards a fixed value for the instrumentation name/logger name. I have no particular reason other than a preference to keep things simple.

  • ๐Ÿ‡จ๐Ÿ‡ญSwitzerland berdir Switzerland

    I was quite confused about the event.name thing. Because the issue summary currently talks about that, but when testing the MR, it's gone.

    Took me a while, but that's because of the EventLogger that's removed here, this was apparently deprecated and is already removed from the documentation (the deprecated docs are a 404: https://github.com/open-telemetry/opentelemetry-specification/pull/4353). As @znerol said, the whole logs stuff seems quite unstable and in flux.

    That explains why the event.name attribute is gone and the channel is now set as the scope name instead.

    I also found https://opentelemetry.io/docs/concepts/instrumentation-scope/, which to me sounds like the channel is indeed a reasonably good scope, but it also links to https://opentelemetry.io/docs/specs/otel/glossary/#instrumentation-scope, which says the library is a good default choice, which would again by the default org.drupal.opentelemetry I suppose. But it also says "A typical approach to ensure uniqueness is to use the fully qualified name of the emitting code (e.g. fully qualified library name or fully qualified class name).".

    So IMHO, it's one of two options:

    1. We always use scope name "org.drupal.opentelemetry", and put the channel into attributes per #12). That gives us a stable, reliable way to identify drupal logs that doesn't require partial matching or something.

    2. We normalize the channel into org.drupal.

    I think I'd prefer 1, like @znerol.

  • ๐Ÿ‡จ๐Ÿ‡ญSwitzerland berdir Switzerland

    I pushed a few improvements to the MR now, implementing option 1 from 15 with a single logger (not sure if the logger should be kept in a property now with this, creating one does go through several factories and collects a bunch of stuff).

    I had to refactor the attributes part a bit, as it was running after message replacement, so the logic with the placeholders didn't work, including the existing match on %function and so on.

    This now results in:

    {
      "body": "Request GET /admin/config/development/opentelemetry, trace id 1cadec4301a25007cf21634df06114ac",
      "id": "2tAzfwrLk2SggLshd3y2GxgxobC",
      "timestamp": "2025-02-17T16:29:33Z",
      "attributes": {
        "client.address": "172.19.0.9",
        "org.drupal.channel": "opentelemetry",
        "org.drupal.referer": "https://DOMAIN/admin/config/development/opentelemetry",
        "url.full": "https://DOMAIN/admin/config/development/opentelemetry",
        "user.id": "1"
      },
      "resources": {
    .....
      },
      "scope": {},
      "severity_text": "DEBUG",
      "severity_number": 5,
      "scope_name": "org.drupal.opentelemetry_logs",
      "scope_version": "",
      "span_id": "8e88c0dec8755205",
      "trace_flags": 1,
      "trace_id": "1cadec4301a25007cf21634df06114ac"
    }
    
  • Pipeline finished with Failed
    about 2 months ago
    Total: 227s
    #426662
  • Pipeline finished with Failed
    about 2 months ago
    Total: 164s
    #426669
  • ๐Ÿ‡จ๐Ÿ‡ญSwitzerland berdir Switzerland

    edit: wrong issue.

  • ๐Ÿ‡จ๐Ÿ‡ญSwitzerland znerol

    Looking at DbLog.php, it appears that there is a finite list of supported context values which are not placeholders:

    • channel
    • ip
    • link
    • referer
    • request_uri
    • timestamp
    • uid

    Everything else is lost unless it matches a placeholder in the message template. The remaining context keys could easily be matched manually together with uid, request_uri and ip:

    $attributes = [
      TraceAttributes::USER_ID => $context['uid'],
      TraceAttributes::URL_FULL => $context['request_uri'],
      TraceAttributes::CLIENT_ADDRESS => $context['ip'],
      TraceAttributes::HTTP_RESPONSE_HEADER . '.referer' => $context['referer'],
      'drupal.log.channel' => $context['channel'],
      'drupal.log.link' => $context['link'],
    ];
    
  • ๐Ÿ‡จ๐Ÿ‡ญSwitzerland berdir Switzerland

    That sounds good to me. Updated the logic and also adjusted the tests, to use proper context and extend with an exception to get test coverage for what I could.

    The tests are written in nightwatch and it took me a while to get them to run and have opentelemetry running as it expects, but eventually got it to work. But it didn't like the escaped, I also don't think it could do any other kind of asserts on those logs, such as that the backtrace exists without checking the exact value, or that certain things are _not_ logged. so I just put in there what I could assert.

  • Pipeline finished with Failed
    about 2 months ago
    Total: 182s
    #426930
  • ๐Ÿ‡จ๐Ÿ‡ญSwitzerland znerol

    Took a look at SysLog and found that they are using strip_tags() on the $message and also on $context['link'].

    I was curious on how links are stored in the database, so I took a look (SELECT link FROM watchdog WHERE link<>"";). Result:

    <a href="/node/1" hreflang="de">Ansicht</a>
    [...]
    

    That is unfortunate:

    • href contains a relative path
    • After strip_tags(), only the link text will be visible (Ansicht)

    Since this is totally useless, I'd suggest to leave out the link field altogether.

  • ๐Ÿ‡จ๐Ÿ‡ญSwitzerland berdir Switzerland

    Yes, dealing with html tags is a good idea. I wonder if we want to do something specific about
    tags though, kind of the opposite of nl2br(). The perimeter module for example uses that, right now they're pretty unreadable (Banned: %ip for requesting %pattern <br />Source: %source <br /> User Agent: %browser). But half of that is duplicating information that's already logged (ip and referrer), so this could likely be simplified and shortened anyway.

    I'm OK with removing link, I forgot that it's a HTML link and not just a URL in my test additions and it's the reason they failed as well afaik (checkout prefix on Gitlab CI). We could try to extract the URL, it can be useful info, but likely more trouble than it's worth.

  • Pipeline finished with Success
    about 2 months ago
    Total: 188s
    #428025
  • ๐Ÿ‡จ๐Ÿ‡ญSwitzerland znerol

    Thanks. Took this to a test-drive with loki and grafana. I noticed that if there is a backtrace, then it appears both in the code.stacktrace attribute as well as in the log body. Grafana displays the whole log body of each row in the explore view, and that can easily fill my laptop screen if an exception was thrown deep enough in the stack.

    In order to prevent that, I'm setting $message_placeholders['@backtrace_string'] = '' before replacing the placeholders into the message in otlog.

    There are some minor code style issues flagged by eslint (patch can be downloaded from the job) and cspell.

  • ๐Ÿ‡จ๐Ÿ‡ญSwitzerland berdir Switzerland

    I was wondering about the backtrace string, noticed that too but wasn't sure if that's too opinionated in case some backend doesn't support attributes like that. But agree that seems like a sensible default and if someone wants that they can propose a change to make it configurable.

  • Pipeline finished with Failed
    about 2 months ago
    Total: 180s
    #428376
  • Pipeline finished with Success
    about 2 months ago
    Total: 194s
    #428425
  • ๐Ÿ‡จ๐Ÿ‡ญSwitzerland znerol

    Thanks! Maybe @peter tรถrnstrand could check whether that works in Seq as well.

  • ๐Ÿ‡ณ๐Ÿ‡ฟNew Zealand jonathan_hunt

    I've applied this patch to latest commit bcf6219c for drupal/opentelemetry:1.0.x-dev but I get a fatal error Error: Undefined constant OpenTelemetry\SemConv\TraceAttributes::HTTP_REQUEST_HEADER in Drupal\opentelemetry_logs\Logger\OpentelemetryLogs->buildAttributesFromContext() (line 91 of modules/contrib/opentelemetry/modules/opentelemetry_logs/src/Logger/OpentelemetryLogs.php).. What am I missing?

  • ๐Ÿ‡ณ๐Ÿ‡ฟNew Zealand jonathan_hunt

    Ah, needed to update "open-telemetry/sem-conv": "^1.30", all good.

  • ๐Ÿ‡ณ๐Ÿ‡ฟNew Zealand jonathan_hunt

    This patch works for me and I can send Drupal logs via Alloy to Grafana Cloud. It would be good to have HTTP_RESPONSE_STATUS_CODE in the data. Should that be added here? If not, what's the best way to augment the log record with additional attributes?

  • ๐Ÿ‡จ๐Ÿ‡ญSwitzerland znerol

    The log record cannot know the final http status. You need tracing for that.

    If you are recording traces and the status code is missing from the spans, then I suggest to open a new feature request.

Production build 0.71.5 2024