Diana Darie
@blog.theengineeringcompass.com
Gophering @JET👩🏻‍💻 | Scaling Distributed Systems | Running in between 🏃🏻‍♀️

Writing at 👉 https://blog.theengineeringcompass.com/
Running at 👉 https://www.strava.com/athletes/38921278
(10/10) Next step for us: try the Lambda approach instead. Spend more time going from PoC to MVP and prove the solution is feasible under production load.
December 18, 2024 at 3:02 PM
(9/10) So you've made the wrong decision. What next? Do you live with the consequences and a not-so-perfect solution, or do you start from scratch and take another approach?
December 18, 2024 at 3:02 PM
(8/10) According to the docs, the shard iterator should be sequential. In practice, we never managed to prove this, as we never got back an iterator we'd seen before. Testing with production-scale data can reveal architectural flaws that aren't visible in small-scale PoCs.
December 18, 2024 at 3:02 PM
(7/10) It seems that in practice, open DynamoDB Streams can spawn thousands of empty shards for no apparent reason, making what should be simple stream processing surprisingly complex. We ran this on a prod table with a fairly modest load and realised we were processing thousands of empty shards.
December 18, 2024 at 3:02 PM
(6/10) There are multiple approaches to reading from streams. The preferred ones are via #lambdas or the #kinesis adapter. For various reasons these wouldn't work for us, so we were left with the manual approach of polling the streams ourselves. According to the docs, this should have been pretty easy to do.
December 18, 2024 at 3:02 PM
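The manual polling described in 6/10 boils down to: list the shards, get an iterator per shard, then follow the next-iterator chain. Here is a minimal runnable sketch; the `StreamsAPI` interface and `fakeStreams` stand-in are my own illustration (the real thing would call `DescribeStream`, `GetShardIterator`, and `GetRecords` on the AWS SDK's DynamoDB Streams client).

```go
package main

import "fmt"

type Record struct{ Change string }

// StreamsAPI abstracts the three DynamoDB Streams calls the polling loop
// needs. The method shapes are simplified stand-ins, not the SDK signatures.
type StreamsAPI interface {
	ListShards(streamArn string) []string              // DescribeStream in the real API
	GetShardIterator(streamArn, shardID string) string // e.g. a TRIM_HORIZON iterator
	GetRecords(iterator string) ([]Record, string)     // records + next iterator
}

// pollStream walks every shard and drains it by following the next-iterator
// chain. Here an empty next iterator marks exhaustion; a real OPEN shard keeps
// returning a next iterator (often with zero records -- see post 7/10).
func pollStream(api StreamsAPI, streamArn string) []Record {
	var out []Record
	for _, shardID := range api.ListShards(streamArn) {
		it := api.GetShardIterator(streamArn, shardID)
		for it != "" {
			recs, next := api.GetRecords(it)
			out = append(out, recs...)
			it = next
		}
	}
	return out
}

// fakeStreams simulates one shard holding two records plus one empty shard --
// the situation the thread describes at much larger scale.
type fakeStreams struct{}

func (fakeStreams) ListShards(string) []string              { return []string{"shard-1", "shard-2"} }
func (fakeStreams) GetShardIterator(_, shardID string) string { return shardID + "/it-0" }
func (fakeStreams) GetRecords(it string) ([]Record, string) {
	if it == "shard-1/it-0" {
		return []Record{{"INSERT"}, {"MODIFY"}}, ""
	}
	return nil, "" // empty shard: no records, chain ends
}

func main() {
	recs := pollStream(fakeStreams{}, "stream-arn")
	fmt.Println(len(recs)) // 2
}
```

The empty second shard costs a `GetShardIterator` plus at least one `GetRecords` call yet yields nothing, which is why thousands of empty shards make this loop so expensive.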
(5/10) Any #dynamodb table with streams enabled has exactly one active DynamoDB stream at any point in time. A #stream can have multiple shards, a shard can contain zero or more records, and a record is the change we want captured.
December 18, 2024 at 3:01 PM
(4/10) #DynamoDB Streams captures a time-ordered sequence of item-level modifications in any DynamoDB table and stores this information in a log for up to 24 hours. Applications can access this log and view the data items as they appeared before and after they were modified, in near-real time.
December 18, 2024 at 3:01 PM
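The "before and after" views in 4/10 arrive as old/new item images on each stream record. A small sketch of how an audit log could diff them; the struct is a simplified model of the stream record shape, and `changedKeys` is a hypothetical helper of mine, not from the thread.

```go
package main

import "fmt"

// StreamRecord is a simplified model of a DynamoDB Streams record:
// OldImage is the item before the change (nil for INSERT),
// NewImage the item after it (nil for REMOVE).
type StreamRecord struct {
	EventName string // INSERT | MODIFY | REMOVE
	OldImage  map[string]string
	NewImage  map[string]string
}

// changedKeys lists the attributes whose value differs between the two
// images -- the kind of comparison an audit log built on streams would make.
func changedKeys(r StreamRecord) []string {
	var out []string
	for k, newV := range r.NewImage {
		if oldV, ok := r.OldImage[k]; !ok || oldV != newV {
			out = append(out, k)
		}
	}
	return out
}

func main() {
	r := StreamRecord{
		EventName: "MODIFY",
		OldImage:  map[string]string{"status": "pending", "owner": "diana"},
		NewImage:  map[string]string{"status": "shipped", "owner": "diana"},
	}
	fmt.Println(changedKeys(r)) // [status]
}
```

Note the 24-hour retention from the post: whatever consumes these records has to keep up, or the images are gone.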
(3/10) The above is not ideal: too many extra resources and too much complexity. An intriguing alternative is a CDC (change data capture) approach, which is what we chose, relying on #DynamoDBStreams for it.
December 18, 2024 at 3:01 PM
(2/10) So this is more about lessons learnt when PoCs and balancing trade-offs aren't enough. Our only requirement: near-real-time audit logs on all DynamoDB tables. One possible approach: record any DB change via events and the transactional outbox pattern.
December 18, 2024 at 3:01 PM
The good news and the key difference is in the architecture. The infrastructure is built to allow users to run their own services, migrate their data, and participate in the network without relying on Bluesky's systems. The foundation is built for users and developers to break free when ready. 6/6
December 13, 2024 at 3:44 PM
While Bluesky's design allows for decentralisation, today it runs more like Twitter: most users are on Bluesky's servers. Running a full node requires 16TB of fast storage - showing the practical challenges of true decentralisation. 5/6
December 13, 2024 at 3:43 PM
Your Bluesky identity combines an easy-to-read username with a permanent cryptographic ID. Think of it like having both a memorable email address and a secure passport number - you can change the first while keeping the second. 4/6
December 13, 2024 at 3:43 PM
The platform allows developers to create their own interfaces and algorithms since data is separated from presentation. This means you could see the same content through different apps, each with its own unique take. 3/6
December 13, 2024 at 3:43 PM