Lightnews — Scholar-powered news

MartinDotNet

@martindotnet.bsky.social

If you think o11y and monitoring are about the data/signals, you should probably do a bit more reading.

November 7, 2025 at 7:38 PM

MartinDotNet

@martindotnet.bsky.social

And not just coding agents. The agents that live in observability platforms could use that context too.

Tests that already exist describe what the system is required to do; the telemetry tells you what it's actually doing. Couple those two things and you have a winning combo.

November 2, 2025 at 2:13 AM

MartinDotNet

@martindotnet.bsky.social

Yup, the point is that however much you need it when debugging locally, it's just as important (maybe more so) when running in production, so make it part of your telemetry.

That talk was before coding agents were mainstream; nowadays, that context from traces can help agents locally too!

November 2, 2025 at 1:59 AM

MartinDotNet

@martindotnet.bsky.social

It really depends on your backend: modern observability backends will allow you to aggregate and create charts based on those attributes without having to pre-aggregate them.

Profiling linked to span IDs will let you go even further.
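To illustrate query-time aggregation, here is a minimal, stdlib-only Python sketch that is not tied to any particular backend: raw span records keep all their attributes, and grouping and summarising happen when you query, not when the data is written. The span records, the attribute names (`app.handler`, `app.items`) and the `aggregate_by` helper are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw span records: every attribute is kept on the span,
# nothing is pre-aggregated at write time.
spans = [
    {"name": "process_order", "attributes": {"app.handler": "orders", "app.items": 3}},
    {"name": "process_order", "attributes": {"app.handler": "orders", "app.items": 40}},
    {"name": "sync_catalog", "attributes": {"app.handler": "catalog", "app.items": 900}},
    {"name": "sync_catalog", "attributes": {"app.handler": "catalog", "app.items": 850}},
]

def aggregate_by(spans, group_key, value_key):
    """Group spans by one attribute and summarise another, at query time."""
    groups = defaultdict(list)
    for span in spans:
        attrs = span["attributes"]
        if group_key in attrs and value_key in attrs:
            groups[attrs[group_key]].append(attrs[value_key])
    return {
        group: {"count": len(values), "max": max(values), "mean": mean(values)}
        for group, values in groups.items()
    }

summary = aggregate_by(spans, "app.handler", "app.items")
for handler, stats in sorted(summary.items()):
    print(handler, stats)
```

Because the grouping key is chosen at query time, you can slice the same spans by any other attribute later without having decided on it up front.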

October 29, 2025 at 12:39 AM

MartinDotNet

@martindotnet.bsky.social

Not just that, you'd probably want the handler + for loop combination as well.

If you're adding the instrumentation, just add that property to the span.

`for_loop.<name>.iteration_count`

Then you can aggregate the span data and forget about the metric cardinality.
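A minimal Python sketch of that pattern, with the post's `<name>` placeholder filled in by a hypothetical loop name, `line_items`. In the OpenTelemetry Python API the corresponding call is `span.set_attribute(key, value)`. To keep the example runnable without an SDK, a hypothetical `FakeSpan` class and `start_span` helper stand in for a real tracer and only record attributes.

```python
from contextlib import contextmanager

class FakeSpan:
    """Hypothetical stand-in for a tracing span that only records attributes.

    Real OpenTelemetry spans expose set_attribute(key, value) with the same
    shape, so the handler code below would read the same with a real tracer.
    """
    def __init__(self, name):
        self.name = name
        self.attributes = {}

    def set_attribute(self, key, value):
        self.attributes[key] = value

finished_spans = []

@contextmanager
def start_span(name):
    # Hypothetical helper: open a span, yield it, keep it once it ends.
    span = FakeSpan(name)
    try:
        yield span
    finally:
        finished_spans.append(span)

def handle_order(line_items):
    with start_span("handle_order") as span:
        iterations = 0
        for _item in line_items:
            iterations += 1  # per-item work would go here
        # One attribute on the handler's span instead of a separate metric:
        # the span already identifies the handler, so the metric would need
        # no extra labels for the handler + loop combination.
        span.set_attribute("for_loop.line_items.iteration_count", iterations)

handle_order(["a", "b", "c"])
print(finished_spans[0].name, finished_spans[0].attributes)
```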

October 28, 2025 at 11:19 PM

MartinDotNet

@martindotnet.bsky.social

I mean, you know the code, but I'm fairly certain a continuous profiling system with tracing enabled (so profiles are correlated to trace data) would show you that.

October 28, 2025 at 11:16 PM

MartinDotNet

@martindotnet.bsky.social

This is why we promote adding that instrumentation to augment trace data rather than pre-aggregating it: a sample of the traces would likely still capture the sizes of the large loops causing issues.

I think you should have another look at profiling though; it's come a long way.

October 28, 2025 at 9:24 PM

MartinDotNet

@martindotnet.bsky.social

If you add tracing + profiling, depending on the language, that will do exactly what you need, or at least get close enough to the same outcome that you wouldn't care.

The issue is, you don't need this all the time, so a metric isn't really the right answer. When you need it, you need it.

October 28, 2025 at 9:24 PM

MartinDotNet

@martindotnet.bsky.social

Couple of things... The cardinality of the metric would also depend on the application it's running in, and on where it's running, unless you do some post-processing to remove those labels.

For the profiling approach, you would be able to get a lot closer than you think, it's not "just" syscalls.
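To make the cardinality point concrete: a metric produces, at worst, one time series per combination of label values, so the series count is the product of each label's distinct values. A back-of-the-envelope Python sketch; all the label counts below are made-up figures for illustration, not numbers from the thread.

```python
from math import prod

def series_count(label_cardinalities):
    """Worst-case number of time series: one per combination of label values."""
    return prod(label_cardinalities.values())

# Hypothetical label counts for a per-loop iteration-count metric.
labels = {
    "handler_loop": 200,  # distinct handler + loop combinations
    "service": 20,        # application the code is running in
    "instance": 30,       # where it's running (pods/hosts per service)
}
print(series_count(labels))

# Post-processing that drops the "where it's running" label before storage.
without_instance = {k: v for k, v in labels.items() if k != "instance"}
print(series_count(without_instance))
```

Each extra label multiplies the total, which is why the "application" and "where it's running" dimensions matter as much as the loop itself.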

October 28, 2025 at 9:24 PM

MartinDotNet

@martindotnet.bsky.social

Ultimately, I wouldn't use metrics for this; the cardinality would be just too high, so "cheap to store" goes out of the window.

Adding the count to a trace is better, but hard to pinpoint.

Utilising continuous profiling, along with tracing, will show you the outcome you're looking for.

October 28, 2025 at 5:47 PM

MartinDotNet

@martindotnet.bsky.social

I think what you're actually reaching for here is continuous code profiling. That would give you an understanding of the performance of these things.

Metrics, logs and traces are meant to be intentional: you instrument the things that matter, not everything.

October 28, 2025 at 10:23 AM

MartinDotNet

@martindotnet.bsky.social

What do you mean by "codemods"? I'm assuming you can't just add a call to the meter from OTel?

October 28, 2025 at 12:58 AM

MartinDotNet

@martindotnet.bsky.social

In v9, they have added OpenTelemetry support, so you should be able to push to any OpenTelemetry compatible endpoint now!

October 28, 2025 at 12:16 AM

MartinDotNet

@martindotnet.bsky.social

It's totally possible to know that there is a tool that covers your use case, but not know how to use it to achieve that without learning it.

October 27, 2025 at 2:42 PM

MartinDotNet

@martindotnet.bsky.social

I feel a lot of people just want to build something cool when, actually, the cool thing is already built; your job is to evangelise and enable the engineers to get the best out of that tool.

October 27, 2025 at 12:31 PM

MartinDotNet

@martindotnet.bsky.social

If you have to mandate that people use your tool, it's not actually the right tool.

If people are asking to use another tool, perhaps your tool is missing something? How do you prioritise building that feature? Do you have the expertise in the current team to do that?

October 27, 2025 at 12:31 PM

MartinDotNet

@martindotnet.bsky.social

I recently came across 2 teams that use NPS (Net Promoter Score) to monitor whether the team is actually valuable.

This measures, at a simple level, how many people would recommend your service to a friend or colleague.

Ultimately, given a choice, would people use your platform?
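For reference, NPS has a standard calculation: respondents answer "how likely are you to recommend...?" on a 0 to 10 scale; 9 to 10 count as promoters, 7 to 8 as passives and 0 to 6 as detractors. NPS is the percentage of promoters minus the percentage of detractors, so it ranges from -100 to 100. A minimal Python sketch; the survey responses are invented for illustration.

```python
def net_promoter_score(scores):
    """Return NPS (-100..100) from 0-10 'would you recommend us?' responses."""
    if not scores:
        raise ValueError("need at least one response")
    if any(not 0 <= s <= 10 for s in scores):
        raise ValueError("responses must be on a 0-10 scale")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    # Passives (7-8) count towards the total but not the numerator.
    return 100 * (promoters - detractors) / len(scores)

# Hypothetical survey of ten engineers using an internal observability platform.
responses = [10, 9, 9, 8, 8, 7, 6, 5, 10, 3]
print(net_promoter_score(responses))
```

Here 4 promoters and 3 detractors out of 10 responses give a score of 10: positive, but far from a team people would actively choose.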

October 27, 2025 at 12:31 PM

MartinDotNet

@martindotnet.bsky.social

Whether it's scaling the database out, managing high user load, or building in standards and auditing, it's a lot.

That's not counting how you then develop new features as the need arises.

This is why Observability teams MUST become product teams and not Infra/SRE teams.

October 27, 2025 at 12:31 PM

MartinDotNet

@martindotnet.bsky.social

Last week I was talking to an organisation that was sitting at 30 SREs just for the observability team that maintain their in-house tooling, not counting the platform SREs.

That's just not sustainable for the majority of organisations.

October 27, 2025 at 12:31 PM

MartinDotNet

@martindotnet.bsky.social

That's not always about cost, but also about the number of forms, security assessments, questionnaires, etc. This can make it feel like it's easier to just build it yourself.

Once you hit scale, from the perspective of both the data and the users of those tools, it becomes a full-time job for multiple people.

October 27, 2025 at 12:31 PM

MartinDotNet

@martindotnet.bsky.social

It's especially prevalent in the Observability world (because Observability is just a database and some graphs, right?), but also common in other areas like IaC, build tooling, and tonnes of other places.

The problem is normally procurement, or people being scared of them.

October 27, 2025 at 12:31 PM

MartinDotNet

@martindotnet.bsky.social

Happy to help if something isn't working. As long as it isn't Rust.

October 25, 2025 at 8:59 PM
