- Issue created by @guedressel
- 🇨🇭Switzerland berdir
Doing some research on this at the moment.
OpenMetrics doesn't seem too active, while OpenTelemetry seems to be gaining more traction. OpenTelemetry is mostly push-based rather than pull-based. We're looking into using SigNoz, and https://signoz.io/blog/openmetrics-vs-opentelemetry/ seems a fairly good overview.
My understanding is that they can be combined: an OpenTelemetry Collector can retrieve/scrape an OpenMetrics endpoint.
Also just saw https://horovits.medium.com/openmetrics-is-archived-merged-into-promethe..., apparently OpenMetrics is basically dead?
So really it would be the Prometheus format then.
- 🇦🇹Austria guedressel
I agree - OpenTelemetry is the way forward.
- 🇨🇭Switzerland berdir
https://github.com/open-telemetry/opentelemetry-php/tree/main/examples/m... has some examples using the PHP SDK.
But after some research and discussions, we plan to implement a monitoring_prometheus submodule (since OpenMetrics is no more) that exposes the data on a /metrics route. This can then be combined with https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/m...
That means fewer dependencies and an easier setup in our infrastructure, as we have limited cron job capabilities and no local collectors.
But it should also be possible to adapt the implementation into something that uses the OpenTelemetry SDK to push the metrics in a cron hook and/or Drush command.
- 🇬🇧United Kingdom catch
Just a note that there is https://www.drupal.org/project/opentelemetry - I have not used it, just briefly looked at it before working on Gander/core performance testing, which also uses OpenTelemetry for the Grafana dashboard.
When I looked at OpenTelemetry stacks, I also looked at SigNoz, but ended up landing on Grafana + Grafana Tempo + Prometheus because they seemed a bit more mature/well supported than SigNoz. But if you are using OpenTelemetry, then it should be agnostic about what it gets fed into, at least in theory.
- 🇨🇭Switzerland berdir
@catch: Yes, I'm evaluating that project as well. For us, being able to pull the metrics is currently easier to manage, but it should be pretty easy to adapt what I've implemented here to push the metrics to an OpenTelemetry receiver.
This now implements the endpoint, inspired by the prometheus_exporter project. I've introduced two new flags on sensor plugin definitions that allow sensors to opt in to being metrics; only those sensors are exported. That's done to avoid exposing lots of requirements and other sensors that can't be expressed in a meaningful way as a metric. They could just export a value for ok/warning/critical, I guess, but I don't think that is very useful.
Instead, I added an extra metric for the total sensor count per status, which makes it possible to set up alerts that fire as soon as there are any critical/warning sensors. Additionally, I opened the issue "Allow to log status changes to watchdog as well" (status: Active), which makes it possible, for example, to use the opentelemetry module to collect those logs too, so they become visible again there. I haven't found any other way to handle non-metric checks/sensors (for example maintenance mode on/off or Twig configured for production) with OpenTelemetry.
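To make the export idea above concrete, here is a minimal sketch in Python rather than the module's PHP. It is not the actual implementation: the sensor names, the `drupal_monitoring` prefix, the `sensor_count` metric name and the tuple layout are all assumptions for illustration. It only shows opt-in sensors being rendered as gauges in the Prometheus text format, plus one extra gauge with the sensor count per status.

```python
from collections import Counter

# Hypothetical sensor results: (name, status, numeric value, opted in as metric?)
SENSORS = [
    ("user_new", "ok", 12.0, True),
    ("cron_last_run_age", "warning", 7200.0, True),
    ("maintenance_mode", "critical", None, False),  # not meaningful as a metric
]


def render_metrics(sensors, prefix="drupal_monitoring"):
    """Render opt-in sensors and a per-status count in Prometheus text format."""
    lines = []
    for name, status, value, is_metric in sensors:
        # Only sensors that opted in (and have a numeric value) are exported.
        if is_metric and value is not None:
            metric = f"{prefix}_{name}"
            lines.append(f"# TYPE {metric} gauge")
            lines.append(f"{metric} {value}")

    # Extra metric: number of sensors per status, counting ALL sensors,
    # so non-metric sensors still influence alerting.
    counts = Counter(status for _, status, _, _ in sensors)
    count_metric = f"{prefix}_sensor_count"
    lines.append(f"# TYPE {count_metric} gauge")
    for status in ("ok", "warning", "critical"):
        lines.append(f'{count_metric}{{status="{status}"}} {counts.get(status, 0)}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_metrics(SENSORS), end="")
```

With something like this, an alert rule only needs to check whether the count for `status="critical"` or `status="warning"` is above zero; the non-metric sensors themselves never appear as individual series.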
Remaining tasks that we need ourselves; once they are done I'll merge and close this, and additional features can be done in new issues:
* Allow restricting access to the /metrics page by IP. This should be a textarea setting that is stored as a sequence in config, and the controller should check it if it is not empty. The UI can use logic similar to the redirect_404 excludes.
* Add the ability to add custom labels/attributes, so that for example each Drupal instance can have a name. Similar UI: a textarea, stored as a sequence. E.g. "service_name=Foo\n environment=production". It should be stored as key/value pairs, with a token replace run on the value.
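The two remaining tasks could look roughly like the following sketch, again in Python and not the module's code. The function names `ip_allowed` and `parse_labels` are hypothetical, accepting CIDR ranges in the allow-list is an assumption beyond what is described above, and the token replacement step is left out.

```python
import ipaddress


def ip_allowed(client_ip, allowed):
    """Return True if client_ip may access /metrics.

    An empty allow-list means no restriction. Entries may be single
    addresses or CIDR ranges (the latter is an assumption of this sketch).
    """
    if not allowed:
        return True
    addr = ipaddress.ip_address(client_ip)
    return any(
        addr in ipaddress.ip_network(entry.strip(), strict=False)
        for entry in allowed
    )


def parse_labels(text):
    """Parse 'key=value' lines from a textarea into a dict of labels."""
    labels = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and lines without a key/value separator.
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        labels[key.strip()] = value.strip()
    return labels


if __name__ == "__main__":
    print(ip_allowed("10.1.2.3", ["10.0.0.0/8"]))
    print(parse_labels("service_name=Foo\n environment=production"))
```

Splitting on the first `=` only keeps values that themselves contain `=` intact, and stripping whitespace tolerates the leading space in the example input.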