Fred Hebert
ferd.ca
@ferd.ca
Staff SRE @ honeycomb.io, Tech Book Author, Resilience in Software Foundation board member, Erlang Ecosystem Foundation co-founder, Resilience Engineering fan. SRE-not-sorry.

blog: https://ferd.ca
notes: https://ferd.ca/notes/
Pinned
Fred Hebert @ferd.ca · Mar 17
People tend to have a mental model where a system is stable until disturbed, far more often than they have one where the system is balanced because it is constantly intervened with.

The latter is a more useful approach to thinking about complex systems.
It’s that time of the year where Home Alone is going to be on TV a bunch. As relatives go “they should have had a battery-powered alarm” and make other judgments on the family’s lack of responsibility in forgetting their kid, get a better source to learn from what happened.

ferd.ca/home-alone-a...
Home Alone: a Post-Incident Review
A post-incident review of the first Home Alone movie and how parents could leave Kevin behind.
ferd.ca
December 19, 2025 at 12:54 PM
this is where all my posting energy went over the last few weeks
We experienced a catastrophic Kafka failure in @honeycomb.io's EU instance on 5 Dec, exacerbated by missing automation and safeguards. Recovery is now complete, and we have posted an outline of critical events during the incident. A fuller retro will follow in Jan. status.honeycomb.io/incidents/pj...
status.honeycomb.io
December 18, 2025 at 7:25 PM
An incident is an opportunity to learn and reorient how you navigate a tradeoff space, and the desire to wall incidents off as a disjoint/irrelevant side effect of development you shouldn't spend time thinking about can do major disservices to organizations. But it is desirable for entrenched structures.
December 3, 2025 at 4:09 PM
Reposted by Fred Hebert
Spent my Thanksgiving playing with public incident data and seeing if it was under statistical control.

surfingcomplexity.blog/2025/11/27/f...
Fun with incident data and statistical process control
Last year, I wrote a post called TTR: the out-of-control metric. In that post, I argued that the incident response process (in particular, the time-to-resolution metric for incidents) will never be …
surfingcomplexity.blog
November 28, 2025 at 5:16 AM
A related thing I’ve seen is decent software folks saying “at least what I do isn’t critical” as a way to distance themselves from the perceived responsibility/stress/burden of the things we work on, knowing the tradeoffs we make.

We’re given huge amounts of power and should act accordingly.
all I see are "they're not engineers" and "that's not what engineering is". I want a thick, good, really real piece to sink my teeth into about all the parts of this that ARE "like engineering"
November 24, 2025 at 10:55 PM
Seeing more reports and industry players blaming code reviews for slowing down the quick development done with AI. It's unclear whether anyone's asking if this is just moving the cognitive bottleneck of "understanding what's happening" around. "Add AI to the reviews" seems to be the end goal here.
November 21, 2025 at 2:29 PM
New paper notes on The Failure Gap, showing that people systematically underestimate failures across many domains. The authors show a link to information balance in media, how to possibly close the gap, and the positive effects of doing so.

Source: papers.ssrn.com/sol3/papers....
Notes: ferd.ca/notes/paper-...
Paper: The Failure Gap
ferd.ca
November 17, 2025 at 12:49 AM
As someone who tends to know things (or how to access this knowledge) about systems I'm continuously involved with, I had a few people literally tell me the cool thing about AI would be not having to ask me questions anymore.

And I get what they mean but I also hear what they say.
November 4, 2025 at 5:11 PM
I had been struggling for a short while with a bass riff that I felt should have been way easier than it was, and after taking a good break and coming back at it with a slightly different approach, it’s absolutely straightforward.

This happens outside music too, it’s great.
November 3, 2025 at 12:31 AM
post-incident item: make all changes and effects advertent
October 29, 2025 at 6:21 PM
I’ll do some pedant stuff and question the frame: move from software-as-object into software-as-boundary.

What is good software? What’s a good vehicle? What’s a good meal? What’s a good home? What’s a good school? What’s good land?

The answer depends on who/what interacts with it and how.
for context: the second edition of "Observability Engineering" starts off swinging with an opinionated chapter called "The Fundamentals of Building Good Software".

which begs the question... what IS ✨ good software ✨?

i'll drop my answers 👇 but curious to hear others.
@charity.wtf asked a great question on LinkedIn today. Her question was: "What is ✨ good software ✨?"

I'd love to hear your answers, but I'd also like to share mine. So if you'll allow me a moment of indulgence... 🧵
October 27, 2025 at 2:41 PM
Reposted by Fred Hebert
Hey, can y'all do me a favor? If you run across any post-incident write-ups from companies affected by the AWS outage, could you send those to me? Feel free to leave a link in the comments, or you can drop them here:

www.thevoid.community/submit-incid...

Thanks!
Submit an Incident Report
The VOID is a community-contributed collection of software-related incident reports.
www.thevoid.community
October 24, 2025 at 10:50 PM
Reposted by Fred Hebert
5. For example, research produced and funded by tech companies often either frames problems as user-driven, or explores solutions as the obligation of users (e.g. community notes). Seldom does it explore consequences of design, UX, or algorithmic implementation, let alone the business model.
October 24, 2025 at 1:01 AM
Dealing with the AWS outage yesterday felt kinda fun in a high-adrenaline sense, and demanded a lot of creativity and improvisation. Today I was just goddamn drained though.

Usually fatigue starts on the same day, and one can shitpost through it, but the drive to sarcasm wasn’t even there today.
October 22, 2025 at 2:35 AM
Reposted by Fred Hebert
Learned that an anonymous outside expert on submersibles did an interview with the OceanGate Titan investigation, and they released a transcript, with all the names redacted. The first line of his first answer? "I'm sure you're familiar with my film Titanic."
October 16, 2025 at 9:28 PM
Last year I had grown some decently sized carrots, which felt pretty cool. This year’s harvest is just full on monstrously large eldritch taproots.

They taste great, they’re just comically impractical to store.
October 13, 2025 at 3:54 AM
Washing all my synthetic fabric clothes, putting them in the dryer, and eating a spoonful of whatever is in the lint filter.

You gotta do what you gotta do to maintain brain plasticity and adapt to all of them products rolling out workflow changes and dark patterns.

Helps deal with the news too.
October 7, 2025 at 11:07 PM
Radiology is a specialty that was long promised to be automated via ML, but that still persists today. There are fascinating studies on AI/human joint performance around it.

I appreciate this article, and the distinction between benchmark settings and the more situated nature of real work.
AI radiology today is powerful, but it consists of many narrow islands of automation that have failed to replace radiologists' time.

This isn't the full picture. Read more in the @worksinprogress.bsky.social piece: worksinprogress.co/issue/the-a...
The algorithm will see you now - Works in Progress Magazine
Radiology combines digital images, clear benchmarks, and repeatable tasks. But replacing humans with AI is harder than it seems.
worksinprogress.co
September 27, 2025 at 3:57 AM
The rebar3 -> rebar4 kickstarter ends in a few hours!

Late pledges stay open, but rewards (T-shirt, mug, stickers) disappear with the campaign.

We're seeking a more sustainable future for #Erlang build tools, and this campaign helps support this goal: www.kickstarter.com/projects/pee...
From Rebar3 to Rebar4: Integrating with Erlang/OTP
Building on top of Rebar3 to Fully Integrate with Erlang/OTP for All BEAM Languages, creating Rebar4 the next generation build tool.
www.kickstarter.com
September 24, 2025 at 1:59 PM
Reposted by Fred Hebert
I was doing some software history research and stumbled on this absolutely FASCINATING letter from 1964: dl.acm.org/doi/10.1145/...

Some random defense contractor writes in to say "You should deliver a minimal prototype as fast as possible to get feedback and involve users at every stage of labor"
Some observations concerning large programming efforts | Proceedings of the April 21-23, 1964, spring joint computer conference
dl.acm.org
September 22, 2025 at 8:40 PM
This analogy was in my head for months and I had to write about it. Treating incidents as "fix it and move on" is a losing proposition. It's more useful to see them as emergent outcomes of all the tradeoffs we make, and to use them as "landmarks" to orient ourselves in a solution space.
Ongoing Tradeoffs, and Incidents as Landmarks
Think of incidents as landmarks when finding your way. The tradeoffs you make can inform the type of incidents you get, and they in turn let you evaluate how you balance priorities and goal conflicts.
ferd.ca
September 20, 2025 at 2:18 PM
New SRE team swag is in
September 19, 2025 at 12:44 PM
Reposted by Fred Hebert
Software incidents are painful, and we're trying to help change that. If you deal with incidents in your work, please help us help you!

Take the survey here: www.thevoid.community/survey
VOID Survey
The VOID is a community-contributed collection of software-related incident reports.
www.thevoid.community
September 10, 2025 at 3:38 PM
Reposted by Fred Hebert
Proud to back the Rebar4 Kickstarter — moving the BEAM ecosystem forward with the community. 🙌
Thanks to the @theerlef.bsky.social (EEF) for a €1,750 contribution to our Rebar4 Kickstarter. It moves us closer to funding work to prepare for OTP integration and cut external deps.
Back: www.kickstarter.com/projects/pee...
September 8, 2025 at 7:04 PM
‘helm’ has a Levenshtein distance of 1 from both ‘help’ and ‘hell’ and I think that’s on purpose
August 28, 2025 at 11:08 PM
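For anyone who wants to check the arithmetic behind the helm joke, here is a minimal sketch of the standard dynamic-programming Levenshtein distance, where insertion, deletion, and substitution each cost 1. The function name `levenshtein` is just an illustrative choice, not taken from any particular library.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion, and substitution."""
    # prev[j] = distance between the first i-1 chars of a and the first j chars of b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca -> cb (free if they match)
            ))
        prev = curr
    return prev[-1]


for word in ("help", "hell"):
    print(word, levenshtein("helm", word))  # prints "help 1", then "hell 1"
```

In both cases the only edit is swapping the final ‘m’, so one substitution is enough.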