Distributed Systems
distribsystems.bsky.social
I tweet/retweet interesting stuff about #DistributedSystems and #compsci. Suggest links/papers/conversations via DM! Tag me for retweets. Run by @fponzi.me
https://distsys.fponzi.me/
TernFS — an exabyte scale, multi-region distributed filesystem
www.xtxmarkets.com/tech/2025-te...
This post motivates TernFS, explains its high-level architecture, and then explores some key implementation details.
November 3, 2025 at 12:01 PM
Linearizability testing S2 with deterministic simulation
s2.dev/blog/lineari...
We can gain confidence that S2 is linearizable by taking an empirical validation approach, using a model checker like Knossos or Porcupine.
September 30, 2025 at 11:00 AM
How I solved a distributed queue problem after 15 years
dbos.dev/blog/durable...
What we really needed to make distributed task queueing robust are durable queues that checkpoint the status of our queued tasks to a durable store like Postgres.
September 22, 2025 at 6:27 PM
Understanding Paxos the intuitive way
relentless-leader.com/dive-deep-in...
August 9, 2025 at 2:07 PM
Murat Demirbas and Aleksey Charapko read and discuss the HotOS paper "Real Life Is Uncertain. Consensus Should Be Too!"
July 31, 2025 at 9:37 PM
Learning about distributed systems: where to start?
muratbuffalo.blogspot.com/2020/06/lear...
A principled, foundations-up study of distributed systems, which will take a good three months in the first pass, and many more months to build competence after that.
May 30, 2025 at 11:00 AM
Just make it scale: An Aurora DSQL story
www.allthingsdistributed.com/2025/05/just...
A few weeks ago, at our internal dev conference, I watched a talk from two of our PEs on building DSQL. I asked if they’d be willing to turn their insights into a deeper exploration of DSQL’s development.
May 28, 2025 at 11:01 AM
Reasoning about Distributed Protocols with Smart Casual Verification
decentralizedthoughts.github.io/2025-05-23-s...
Reasoning about distributed algorithms is hard at the best of times, with state split across remote nodes, asynchrony, concurrency, and non-determinism in the order that events occur.
May 27, 2025 at 11:00 AM
Apache Iceberg Internals Dive Deep On Performance
relentless-leader.com/apache-icebe...
Apache Iceberg is an ACID table format designed for large-scale analytics workloads.
May 15, 2025 at 11:01 AM
Concurrency bugs in Lucene: How to fix optimistic concurrency failures
www.elastic.co/search-labs/...
Debugging concurrency bugs is no picnic, but we're going to get into it. Enter Fray, a deterministic concurrency testing framework that turns flaky failures into reproducible ones.
May 12, 2025 at 11:04 AM
Erlang’s not about lightweight processes and message passing…
stevana.github.io/erlangs_not_...
To me it’s clear that the big idea there isn’t lightweight processes and message passing, but rather the generic components which in Erlang are called behaviours.
May 9, 2025 at 11:01 AM
So, You Want to Learn More About Deterministic Simulation Testing?
pierrezemb.fr/posts/learn-...
A curated collection of resources about deterministic simulation testing for distributed systems.
May 8, 2025 at 11:01 AM
May thy bits chip and shatter: Patterns for Building High-Performance Observability Pipelines at Scale
sumercip.com/posts/patter...
May 7, 2025 at 11:02 AM
Parallel, Concurrent and Distributed Programming
ilyasergey.net/YSC4231/
This course on basic concurrent and parallel algorithms has been taught by Ilya Sergey at Yale-NUS College in 2019-2024.
May 6, 2025 at 11:02 AM
Systems Correctness Practices at AWS: Leveraging Formal and Semi-formal Methods
dl.acm.org/doi/10.1145/...
May 5, 2025 at 11:03 AM
Distributed consensus
shachaf.net/w/consensus
This page is a relatively informal discussion of distributed consensus and Paxos, what it does, how it works, and some tricks and variants.
April 28, 2025 at 11:02 AM
Why is the raft consensus algorithm called "raft"?
groups.google.com/g/raft-dev/c...
April 25, 2025 at 11:01 AM
Building a modern Durable Execution Engine from First Principles
restate.dev/blog/buildin...
We built a precursor and from all the lessons learned there, we arrived at a design with a self-contained complete stack, centered around a command log and event-processor, shipping as a single Rust binary
April 21, 2025 at 5:10 PM
Decomposing Transactional Systems
transactional.blog/blog/2025-de...
Every transactional system does four things: execute, order, validate, and persist transactions.
All four of these things must be done before the system may acknowledge a transaction’s result to a client.
April 18, 2025 at 11:00 AM
How crawlers impact the operations of the Wikimedia projects
diff.wikimedia.org/2025/04/01/h...
Since the beginning of 2024, the demand for the content created by the Wikimedia volunteer community – especially for the 144 million images, videos, and other files on Wikimedia Commons – has grown.
April 15, 2025 at 11:02 AM
Memcached: VerifyThis Long-term Challenge
verifythis.github.io/ltc/03memcac...
VerifyThis Long-Term Challenge aims at proving that deductive program verification can produce relevant results for real systems with acceptable effort on a large scale in a collaborative manner.
April 10, 2025 at 11:02 AM
Testing Distributed Systems
asatarin.github.io/testing-dist...
April 1, 2025 at 6:35 PM
How concurrency works: A visual guide
wyounas.github.io/concurrency/...
Concurrent programming is hard.
March 24, 2025 at 10:02 PM
ChoRus is a library that enables Choreographic Programming in Rust.
lsd-ucsc.github.io/ChoRus/
Choreographic Programming is a programming paradigm that allows programmers to write "choreographies" that describe the desired behavior of a system as a whole.
March 18, 2025 at 12:00 PM
Antithesis of a One-in-a-Million Bug: Taming Demonic Nondeterminism
www.cockroachlabs.com/blog/demonic...
March 17, 2025 at 12:00 PM