Ideas and feedback

Created on 10 August 2024, 5 months ago

Problem/Motivation

Tested this project a bit, there are a few issues that I noticed, wanted to collect them first, can try to create more specific issues when there's agreement.

* Exception handling and resilience, since this operates on low level stuff, it can pretty much brick your site in ways that are hard recover when for example logging or other subscribers is broken. For example, OpentelemetryLogs->onTerminate() can fail if the endpoint is unreachable and things go _very_ badly when config is missing (e.g. when you directly install opentelemetry_logs). Se also the wrong fix in 🐛 opentelemetry_logs missing module dependency Fixed .
* Configuration in low level services like logging is discouraged, it can result in recursion. There are core isues about syslog as well. I'd recommend moving most things to settings or service parameters. One option would be to inject it into service parameters but would require container rebuilds on config change. Tricky for BC and less convenient maybe to try it out, the settings form could just be a place that shows current settings and provides snippets that you can just copy paste into settings.php?
* Trace stuff could maybe use percentage settings, e.g. detailed traces for database and future stuff only on 1% of the requests, similar to how blackfire and other APM's operate.
* In SigNoz, I saw each request twice, once with a (request) suffix, not if that's on purpose but seems to add little.

Steps to reproduce

Proposed resolution

Remaining tasks

User interface changes

API changes

Data model changes

🌱 Plan
Status

Active

Version

1.0

Component

Code

Created by

🇨🇭Switzerland berdir Switzerland

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

  • Issue created by @berdir
  • Status changed to Fixed 2 months ago
  • 🇦🇲Armenia murz Yerevan, Armenia

    Thank you for the feedback! Was busy with other project and had no time to contribute to OpenTelemetry, but now got some time and improved some things:

    1. In the issue Add support for open-telemetry/sdk v.1.1 and make missing dependencies exceptions more clear Active I improved error handling that should prevent the recursion, and clearly describes if the required libraries are missing - please test.

    2. Implement covering by functional tests, which should prevent us from broken functionality and regressions on the new versions.

    3. About this:

    * In SigNoz, I saw each request twice, once with a (request) suffix, not if that's on purpose but seems to add little.

    - it's the intended behavior, because Drupal actually starts processing the request later, and spending some time on the initialization of the Drupal Kernel and so on, so it's good to see the full image of what is happening under the hood. Maybe I'll add an option to disable this, will think about it.

    4. About this:

    * Trace stuff could maybe use percentage settings, e.g. detailed traces for database and future stuff only on 1% of the requests, similar to how blackfire and other APM's operate.

    Yes, I have this feature in the roadmap, filled a separate issue about this: Add the OpenTelemetry Sampler and the sampling rate configuration Active
    For now, probably you can already control this using the env variables like this:

    export OTEL_TRACES_SAMPLER="traceidratio"
    export OTEL_TRACES_SAMPLER_ARG="0.5"

    - more details here: https://opentelemetry.io/docs/languages/sdk-configuration/general/

    But I will test this more later.

    So, I hope that covered all your points, let's close this issue and feel free to fill separate issues about each bug and feature request - it will be easier to track them separately.

  • Automatically closed - issue fixed for 2 weeks with no activity.

Production build 0.71.5 2024