Distributed Systems
distribsystems.bsky.social
I tweet/retweet interesting stuff about #DistributedSystems and #compsci. Suggest links/papers/conversations via DM! Tag me for retweets. Run by @fponzi.me
https://distsys.fponzi.me/
Reproducing the AWS Outage Race Condition with a Model Checker
wyounas.github.io/aws/concurre...
We’ll use a model checker to see how such a race could happen. Formal verification can’t prevent every failure, but it helps us think more clearly about correctness and reason about subtle bugs.
Reproducing the AWS Outage Race Condition with a Model Checker | Waqas Younas' blog
November 10, 2025 at 12:02 PM
TLA+ Modeling of AWS outage DNS race condition
muratbuffalo.blogspot.com/2025/11/tla-...
AWS’s N. Virginia region suffered a DynamoDB outage triggered by a DNS automation defect. This post focuses narrowly on the race condition at the core of the bug, which is best understood through TLA+ modeling.
On Oct 19–20, 2025, AWS’s N. Virginia region suffered a major DynamoDB outage triggered by a DNS automation defect that broke endpoint resol...
November 6, 2025 at 12:01 PM
TernFS — an exabyte scale, multi-region distributed filesystem
www.xtxmarkets.com/tech/2025-te...
This post motivates TernFS, explains its high-level architecture, and then explores some key implementation details.
November 3, 2025 at 12:01 PM
Just make it scale: An Aurora DSQL story
www.allthingsdistributed.com/2025/05/just...
Each component follows the Unix mantra—do one thing, and do it well—but working together they are able to offer all the features users expect from a database.
AWS Senior Principal Engineers, Niko Matsakis and Marc Bowes, take us inside Aurora DSQL's development: scaling write operations without two-phase commit, overcoming garbage collection hurdles, and…
October 27, 2025 at 12:04 PM
Aurora DSQL: How authentication and authorization works
marc-bowes.com/dsql-auth.html
How connections to Aurora DSQL are authenticated and authorized. This information is meant to be supplemental to what is found in the official Amazon Aurora DSQL documentation.
October 20, 2025 at 11:02 AM
Dynamo, DynamoDB, and Aurora DSQL
brooker.co.za/blog/2025/08...
People often ask me about the architectural relationship between Amazon Dynamo, Amazon DynamoDB and Aurora DSQL. I’ll start off by comparing how the systems achieve a few key properties.
Dynamo, DynamoDB, and Aurora DSQL - Marc's Blog
Names are hard, ok?
October 14, 2025 at 6:52 PM
Linearizability testing S2 with deterministic simulation
s2.dev/blog/lineari...
We can gain confidence that S2 is linearizable by taking an empirical validation approach, using a model checker like Knossos or Porcupine.
September 30, 2025 at 11:00 AM
How I solved a distributed queue problem after 15 years
dbos.dev/blog/durable...
What we really needed to make distributed task queueing robust are durable queues that checkpoint the status of our queued tasks to a durable store like Postgres.
September 22, 2025 at 6:27 PM
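To make the idea in the post above concrete, here is a minimal sketch of a queue that checkpoints task status to a durable store. It is not the DBOS API: the table layout and the helper names (open_queue, enqueue, claim_next, complete, recover) are hypothetical, SQLite from the Python standard library stands in for Postgres, and a single worker is assumed.

```python
import sqlite3

def open_queue(path=":memory:"):
    """Open (or create) the task table. Pass a file path for real durability."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        " id INTEGER PRIMARY KEY,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    db.commit()
    return db

def enqueue(db, payload):
    """Record a new task as 'pending' and return its id."""
    with db:  # commits on success, rolls back on error
        cur = db.execute("INSERT INTO tasks (payload) VALUES (?)", (payload,))
    return cur.lastrowid

def claim_next(db):
    """Mark the oldest pending task as 'running' and return (id, payload)."""
    with db:
        row = db.execute(
            "SELECT id, payload FROM tasks"
            " WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        db.execute("UPDATE tasks SET status = 'running' WHERE id = ?", (row[0],))
    return row

def complete(db, task_id):
    """Checkpoint that a task finished."""
    with db:
        db.execute("UPDATE tasks SET status = 'done' WHERE id = ?", (task_id,))

def recover(db):
    """After a crash, return tasks left 'running' to 'pending'; report how many."""
    with db:
        cur = db.execute(
            "UPDATE tasks SET status = 'pending' WHERE status = 'running'"
        )
    return cur.rowcount

if __name__ == "__main__":
    db = open_queue()
    task_id = enqueue(db, "send-email")
    print(claim_next(db))   # the worker "crashes" before calling complete()
    print(recover(db))      # the task is handed back to the queue
    print(claim_next(db))   # and can be claimed again
    complete(db, task_id)
```

Because every status change is committed before the function returns, a restarted worker can call recover and pick up where the previous one stopped. With several workers on Postgres, the claim step would need row locking, which this sketch leaves out.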
Understanding Paxos the intuitive way
relentless-leader.com/dive-deep-in...
August 9, 2025 at 2:07 PM
Murat Demirbas and Aleksey Charapko read and discuss the HotOS paper "Real Life Is Uncertain. Consensus Should Be Too!"
July 31, 2025 at 9:37 PM
Learning about distributed systems: where to start?
muratbuffalo.blogspot.com/2020/06/lear...
A principled, foundations-up study of distributed systems, which will take a good three months in the first pass, and many more months to build competence after that.
May 30, 2025 at 11:00 AM
FLP Result: Impossibility of Distributed Consensus with One Faulty Process (1985)
groups.csail.mit.edu/tds/papers/L...
May 29, 2025 at 11:00 AM
Just make it scale: An Aurora DSQL story
www.allthingsdistributed.com/2025/05/just...
A few weeks ago, at our internal dev conference, I watched a talk from two of our PEs on building DSQL. I asked if they’d be willing to turn their insights into a deeper exploration of DSQL’s development.
May 28, 2025 at 11:01 AM
Reasoning about Distributed Protocols with Smart Casual Verification
decentralizedthoughts.github.io/2025-05-23-s...
Reasoning about distributed algorithms is hard at the best of times, with state split across remote nodes, asynchrony, concurrency, and non-determinism in the order that events occur.
May 27, 2025 at 11:00 AM
Apache Iceberg Internals Dive Deep On Performance
relentless-leader.com/apache-icebe...
Apache Iceberg is an ACID table format designed for large-scale analytics workloads.
May 15, 2025 at 11:01 AM
Concurrency bugs in Lucene: How to fix optimistic concurrency failures
www.elastic.co/search-labs/...
Debugging concurrency bugs is no picnic, but we're going to get into it. Enter Fray, a deterministic concurrency testing framework that turns flaky failures into reproducible ones.
May 12, 2025 at 11:04 AM
Erlang’s not about lightweight processes and message passing…
stevana.github.io/erlangs_not_...
To me it’s clear that the big idea there isn’t lightweight processes and message passing, but rather the generic components which in Erlang are called behaviours.
May 9, 2025 at 11:01 AM
So, You Want to Learn More About Deterministic Simulation Testing?
pierrezemb.fr/posts/learn-...
A curated collection of resources about deterministic simulation testing for distributed systems.
May 8, 2025 at 11:01 AM
May thy bits chip and shatter: Patterns for Building High-Performance Observability Pipelines at Scale
sumercip.com/posts/patter...
May 7, 2025 at 11:02 AM
Parallel, Concurrent and Distributed Programming
ilyasergey.net/YSC4231/
This course on basic concurrent and parallel algorithms has been taught by Ilya Sergey at Yale-NUS College in 2019-2024.
May 6, 2025 at 11:02 AM
Systems Correctness Practices at AWS: Leveraging Formal and Semi-formal Methods
dl.acm.org/doi/10.1145/...
May 5, 2025 at 11:03 AM
Distributed consensus
shachaf.net/w/consensus
This page is a relatively informal discussion of distributed consensus and Paxos, what it does, how it works, and some tricks and variants.
April 28, 2025 at 11:02 AM
Why is the raft consensus algorithm called "raft"?
groups.google.com/g/raft-dev/c...
April 25, 2025 at 11:01 AM
Three Clocks are Better than One
tigerbeetle.com/blog/2021-08...
CLOCK_MONOTONIC_RAW, CLOCK_MONOTONIC and CLOCK_BOOTTIME are all monotonic clock stopwatches provided by the Linux kernel through the clock_gettime(2) syscall to measure elapsed time.
April 22, 2025 at 11:00 AM
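For readers who want to look at the three clocks named in the post above, a small sketch using Python's time.clock_gettime may help. It only reads the clocks and compares elapsed time; it does not reproduce how TigerBeetle uses them. The helper names read_clocks and elapsed are hypothetical, and the sketch assumes a Unix-like system (CLOCK_BOOTTIME in particular is Linux-specific), so clocks missing on the platform are skipped.

```python
import time

# Clock IDs named in the post. Not every platform exposes all three.
CLOCK_NAMES = ("CLOCK_MONOTONIC_RAW", "CLOCK_MONOTONIC", "CLOCK_BOOTTIME")

def read_clocks():
    """Return {name: seconds} for each listed clock available here."""
    readings = {}
    for name in CLOCK_NAMES:
        clock_id = getattr(time, name, None)
        if clock_id is None:
            continue  # constant not defined on this platform
        try:
            readings[name] = time.clock_gettime(clock_id)
        except OSError:
            continue  # defined, but the kernel rejected it
    return readings

def elapsed(work):
    """Run work() and return the elapsed seconds seen by each clock."""
    before = read_clocks()
    work()
    after = read_clocks()
    return {name: after[name] - before[name] for name in before if name in after}

if __name__ == "__main__":
    for name, seconds in elapsed(lambda: time.sleep(0.05)).items():
        print(f"{name}: {seconds:.6f} s")
```

Over a short sleep the three deltas should be nearly identical. They diverge over longer periods because they count differently: CLOCK_MONOTONIC_RAW is not subject to NTP frequency adjustment, and CLOCK_BOOTTIME also counts time the machine spends suspended.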
Building a modern Durable Execution Engine from First Principles
restate.dev/blog/buildin...
We built a precursor, and from all the lessons learned there, we arrived at a design with a self-contained complete stack, centered around a command log and event-processor, shipping as a single Rust binary.
April 21, 2025 at 5:10 PM