MartinDotNet
martindotnet.bsky.social
Observability Evangelist, DevRel @honeycomb.io, Microsoft MVP and #OpenTelemetry contributor.

I talk on stage about o11y and otel stuff... basically.
If you think o11y and monitoring are about the data/signals, you should probably do a bit more reading.
November 7, 2025 at 7:38 PM
And not just coding agents. The agents that live in observability platforms could use that context too.

Tests that already exist describe what the system is required to do; the telemetry tells you what it's actually doing. Couple those two things and you have a winning combo.
November 2, 2025 at 2:13 AM
Yup, the point is that as much as you need it when debugging locally, it's just as important (maybe more) when running in production, so make it part of your telemetry.

That talk was before coding agents were mainstream. Nowadays, that context from traces can help agents locally too!
November 2, 2025 at 1:59 AM
It really depends on your backend. Modern observability backends will allow you to aggregate and create charts based on those attributes without having to pre-aggregate them.

Profiling linked to span ids will allow more.
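A minimal sketch of what query-time aggregation looks like, assuming spans are stored as plain dictionaries of attributes (the span records, attribute names and the `aggregate` helper are hypothetical, not any particular backend's API). Because the raw attributes are kept, you pick the grouping when you ask the question rather than when you instrument:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw span records as a backend might keep them: no pre-aggregation.
spans = [
    {"http.route": "/orders", "duration_ms": 120, "for_loop.items.iteration_count": 40},
    {"http.route": "/orders", "duration_ms": 900, "for_loop.items.iteration_count": 5000},
    {"http.route": "/users", "duration_ms": 35, "for_loop.items.iteration_count": 3},
]


def aggregate(spans, group_by, value):
    """Group spans by one attribute at query time and summarise another."""
    groups = defaultdict(list)
    for span in spans:
        if value in span:
            groups[span.get(group_by)].append(span[value])
    return {
        key: {"count": len(vals), "max": max(vals), "avg": mean(vals)}
        for key, vals in groups.items()
    }


print(aggregate(spans, "http.route", "duration_ms"))
```

Swapping `"duration_ms"` for `"for_loop.items.iteration_count"` answers a different question from the same stored data, with nothing decided up front.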
October 29, 2025 at 12:39 AM
Not just that, you'd probably want the handler + for loop combination as well.

If you're adding the instrumentation, just add that property to the span.

`for_loop.<name>.iteration_count`

Then you can aggregate the span data and forget about the metric cardinality.
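A rough sketch of that idea, using a hypothetical `Span` stand-in so it runs without any SDK installed. In the OpenTelemetry Python API the equivalent call is `set_attribute(key, value)` on the span returned by `trace.get_current_span()`; the handler and the loop name `items` (standing in for `<name>`) are illustrative only:

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical stand-in for a tracing span; it only records attributes."""
    name: str
    attributes: dict = field(default_factory=dict)

    def set_attribute(self, key: str, value) -> None:
        self.attributes[key] = value


def handle_order(span: Span, items: list) -> None:
    """Hypothetical handler: the span already identifies the handler, so the
    loop count recorded on it keeps the handler + loop combination together."""
    iteration_count = 0
    for item in items:
        iteration_count += 1
        # ... per-item work would go here ...
    # Record the loop size once, after the loop, not once per iteration.
    span.set_attribute("for_loop.items.iteration_count", iteration_count)


span = Span(name="POST /orders")
handle_order(span, ["a", "b", "c"])
print(span.attributes)
```

The count is written once after the loop finishes, so each span carries a single extra attribute regardless of how many iterations ran.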
October 28, 2025 at 11:19 PM
I mean, you know the code, but I'm fairly certain a continuous profiling system with tracing enabled (so profiles are correlated to trace data) would show you that.
October 28, 2025 at 11:16 PM
This is why we promote adding that instrumentation to augment trace data rather than pre-aggregating it, since a sample of the traces would likely still capture the size of the large loops causing issues.

I think you should have another look at profiling, though; it's come a long way.
October 28, 2025 at 9:24 PM
If you add tracing + profiling, depending on the language, that will do exactly what you need, or at least get close enough to the same outcome that you wouldn't care.

The issue is, you don't need this all the time, so a metric isn't really the right answer. When you need it, you need it.
October 28, 2025 at 9:24 PM
Couple of things... The cardinality of the metric would also depend on the application it's running in, and also where it's running, unless you do some post-processing to remove those.

For the profiling approach, you would be able to get a lot closer than you think, it's not "just" syscalls.
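A back-of-the-envelope sketch of why that matters, using hypothetical label counts: the worst-case number of time series for a metric is the product of the distinct values of each of its labels, so the application and host labels multiply everything else.

```python
from math import prod

# Hypothetical number of distinct values for each label on a loop-iteration metric.
label_cardinality = {
    "service.name": 20,  # which application emitted it
    "host.name": 50,     # where that instance is running
    "handler": 30,       # which request handler
    "loop.name": 10,     # which for loop inside the handler
}


def worst_case_series(cardinality):
    """Upper bound on time series: one per combination of label values."""
    return prod(cardinality.values())


print(worst_case_series(label_cardinality))
without_host = {k: v for k, v in label_cardinality.items() if k != "host.name"}
print(worst_case_series(without_host))
```

With these made-up numbers, dropping `host.name` in post-processing takes the worst case from 300,000 series down to 6,000.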
October 28, 2025 at 9:24 PM
Ultimately, I wouldn't use metrics for this; the cardinality would just be too high, so "cheap to store" goes out of the window.

Adding the count to a trace is better, but hard to pinpoint.

Utilising continuous profiling, along with tracing, will show you the outcome you're looking for.
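A sketch of how the two fit together, assuming a profiler that tags each stack sample with the ID of the span that was active when it was taken (the data shapes and the `hottest_functions` helper are hypothetical). Tracing tells you which request was slow; the correlated samples tell you where inside it the time went:

```python
from collections import Counter

# Hypothetical data: spans from tracing, stack samples from a profiler that
# tags each sample with the active span ID. Stacks list callers first, leaf last.
spans = {
    "a1": {"name": "GET /orders", "duration_ms": 900},
    "b2": {"name": "GET /users", "duration_ms": 35},
}
samples = [
    {"span_id": "a1", "stack": ["handler", "load_orders", "price_item"]},
    {"span_id": "a1", "stack": ["handler", "load_orders", "price_item"]},
    {"span_id": "a1", "stack": ["handler", "serialize"]},
    {"span_id": "b2", "stack": ["handler", "load_user"]},
]


def hottest_functions(span_id, samples):
    """Count the leaf (currently executing) function across one span's samples."""
    return Counter(s["stack"][-1] for s in samples if s["span_id"] == span_id)


# Start from the slowest span, then look at where its samples landed.
slowest = max(spans, key=lambda sid: spans[sid]["duration_ms"])
print(spans[slowest]["name"], hottest_functions(slowest, samples).most_common(1))
```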
October 28, 2025 at 5:47 PM
I think what you're actually reaching for here is continuous code profiling. That would give you an understanding of the performance of these things.

Metrics, logs and traces are meant to be intentional: you instrument the things that matter, not everything.
October 28, 2025 at 10:23 AM
What do you mean by "codemods"? I'm assuming you can't just add a call to the meter from otel?
October 28, 2025 at 12:58 AM
In v9, they have added OpenTelemetry support, so you should be able to push to any OpenTelemetry compatible endpoint now!
October 28, 2025 at 12:16 AM
It's totally possible to know that there is a tool that covers your use case, but not know how to use it to achieve that without learning it.
October 27, 2025 at 2:42 PM
I feel a lot of people just want to build something cool, when actually, the cool thing is already built, your job is to evangelise and enable the engineers to get the best out of that tool.
October 27, 2025 at 12:31 PM
If you have to mandate that people use your tool, it's not actually the right tool.

If people are asking to use another tool, perhaps your tool is missing something? How do you prioritise building that feature? Do you have the expertise in the current team to do that?
October 27, 2025 at 12:31 PM
I recently came across 2 teams that use NPS (Net Promoter Score) to monitor whether the team is actually valuable.

This measures, at a simple level, how many people would recommend your service to a friend or colleague.

Ultimately, given a choice, would people use your platform?
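For reference, the standard NPS calculation on a 0 to 10 survey: respondents scoring 9 or 10 are promoters, 0 to 6 are detractors, and the score is the percentage of promoters minus the percentage of detractors. The survey responses below are made up:

```python
def net_promoter_score(scores):
    """NPS = % promoters (9-10) minus % detractors (0-6), on a 0-10 survey scale."""
    if not scores:
        raise ValueError("need at least one response")
    if any(not 0 <= s <= 10 for s in scores):
        raise ValueError("scores must be between 0 and 10")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    # Passives (7-8) count towards the total but neither add nor subtract.
    return 100 * (promoters - detractors) / len(scores)


# Hypothetical survey of engineers using an internal observability platform.
responses = [10, 9, 9, 8, 7, 6, 3, 10, 9, 5]
print(net_promoter_score(responses))
```

The result ranges from -100 (everyone is a detractor) to 100 (everyone is a promoter).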
October 27, 2025 at 12:31 PM
Whether it's scaling the database out, managing high user load, or building in standards and auditing, it's a lot.

That's not counting how you then develop new features as the need arises.

This is why Observability teams MUST become product teams and not Infra/SRE teams.
October 27, 2025 at 12:31 PM
Last week I was talking to an organisation that was sitting at 30 SREs just for the observability team that maintain their in-house tooling, not counting the platform SREs.

That's just not sustainable for the majority of organisations.
October 27, 2025 at 12:31 PM
That's not always about cost, but also the number of forms, security assessments, questionnaires etc. This can make it feel like it's easier to just build it yourself.

Once you hit scale, both in data volume and in the number of people using those tools, it becomes a full-time job for multiple people.
October 27, 2025 at 12:31 PM
It's especially prevalent in the Observability world (because Observability is just a database and some graphs right?), but also common in other areas like IaC, Build tooling, and tonnes of other places.

The problem is normally procurement, or people being scared of them.
October 27, 2025 at 12:31 PM
Happy to help if something isn't working. As long as it isn't Rust.
October 25, 2025 at 8:59 PM