Konstantin Tarkus
@koistya.com
Building the rails that AI workflows run on. Principal Engineer architecting the foundation for safe & scalable intelligence. ☕️ // Code #DevOps #Startups #AI
November 5, 2025 at 8:59 AM
9/9 Need the full checklist, recovery-time calculator, and TypeScript snippets for per-user, per-route, and cost-based buckets? I broke it all down here →

levelup.gitconnected.com/designing-fa...

github.com/kriasoft/ws-...
November 3, 2025 at 5:59 PM
8/9 Before rollout: 10× baseline load, jitter burst test, abuse flood, mixed workloads. After launch: watch rate-limit hit %, histogram tokens-at-reject, and keep limiter latency <1 ms or it becomes the bottleneck.
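A rough sketch of those two launch metrics, assuming a hypothetical limiter whose tryConsume() reports remaining tokens, and a generic metrics.observe():

```ts
type Decision = { allowed: boolean; tokensLeft: number };

function checkWithMetrics(
  limiter: { tryConsume(key: string, cost: number): Decision },
  metrics: { observe(name: string, value: number): void },
  key: string,
  cost: number,
): boolean {
  const start = performance.now();
  const { allowed, tokensLeft } = limiter.tryConsume(key, cost);
  // Budget check: if this histogram drifts past ~1 ms, the limiter
  // itself is becoming the bottleneck.
  metrics.observe("limiter_latency_ms", performance.now() - start);
  if (!allowed) {
    // Tokens remaining at the moment of rejection: rejects near zero
    // suggest sustained overload; rejects with tokens left point at
    // expensive operations being priced above the remaining balance.
    metrics.observe("tokens_at_reject", tokensLeft);
  }
  return allowed;
}
```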
November 3, 2025 at 5:59 PM
7/9 Cost-based limiting is underrated. Set text = 1 token, file upload = 20, admin command = 10. One bucket per user, different prices. Suddenly your limiter tracks backend spend instead of treating every packet as equal.
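A sketch of the pricing, reusing the same token-bucket primitive sketched under 2/9 below (costs and operation names are illustrative):

```ts
// Illustrative prices: tokens charged per operation type.
const COST: Record<string, number> = {
  "chat:text": 1,
  "file:upload": 20,
  "admin:command": 10,
};

// One bucket per user, same refill; only the price per call differs.
function allowOp(
  bucket: { tryConsume(cost: number): boolean },
  op: string,
): boolean {
  return bucket.tryConsume(COST[op] ?? 1);
}
```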
November 3, 2025 at 5:59 PM
6/9 Layer limits like a pyramid: Global → per-IP/device → per-user-per-route → per-operation cost. Global caps stop botnets, lower tiers keep heavy routes or chatty users from starving everyone else.
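Sketched in TypeScript, assuming a hypothetical getBucket(key) store that returns one token bucket per key:

```ts
type Bucket = { tryConsume(cost: number): boolean };

function allow(
  getBucket: (key: string) => Bucket,
  ip: string,
  userId: string,
  route: string,
  cost: number,
): boolean {
  const layers = [
    getBucket("global"),                  // stops botnet-scale floods
    getBucket(`ip:${ip}`),                // per-IP/device
    getBucket(`user:${userId}:${route}`), // per-user-per-route
  ];
  // Caveat: this consumes from upper tiers even when a lower tier
  // rejects; a production version would refund, or peek before consuming.
  return layers.every((bucket) => bucket.tryConsume(cost));
}
```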
November 3, 2025 at 5:59 PM
5/9 Three failure modes I still see weekly:

• Capacity ≫ rate → giant attack window
• Only per-user buckets → aggregate floods melt the DB
• Zero jitter headroom → works on Ethernet, fails on 4G

All three come from sizing purely in the lab.
November 3, 2025 at 5:59 PM
4/9 Starting points that work well:

• Chat: 100–200 capacity @ 1–2 tokens/sec (lets users paste history; yes, recovery is ~50–200 s, but bursts are rare)
• 30 Hz games: 10–15 capacity @ 35–40 tokens/sec (≈30 updates/sec plus jitter)
• Streaming, when tokens track KB: capacity ≈ bitrate × buffer seconds
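As bucket configs, with mid-range picks from the ranges above (purely illustrative numbers):

```ts
const PRESETS = {
  chat:     { capacity: 150, refillPerSec: 1.5 }, // paste-history bursts OK
  game30hz: { capacity: 12,  refillPerSec: 38 },  // ~30 updates/sec + jitter
  // Streaming, if 1 token = 1 KB: capacity ≈ bitrate × buffer seconds,
  // e.g. 500 KB/s with a 2 s buffer:
  stream:   { capacity: 1000, refillPerSec: 500 },
};
```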
November 3, 2025 at 5:59 PM
3/9 Shortcut: recovery_time = capacity / refill_rate. Want a user to bounce back from empty in 3 s? At 10 tokens/sec you need ~30 capacity; at 40 tokens/sec you need ~120. Pick the recovery window first, then solve for both numbers.
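The same math as a two-line helper:

```ts
// recovery_time = capacity / refill_rate, so fix the recovery window
// first and solve for capacity.
const capacityFor = (recoverySec: number, refillPerSec: number) =>
  recoverySec * refillPerSec;

capacityFor(3, 10); // => 30 tokens
capacityFor(3, 40); // => 120 tokens
```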
November 3, 2025 at 5:59 PM
2/9 Two knobs drive everything: capacity = burst tolerance, refill rate = sustained throughput (tokens/sec). Treat them as a pair. Set them independently and you either throttle legitimate spikes or hand attackers a long grace period.
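A minimal token bucket with exactly those two knobs (lazy refill on each call, no timers):

```ts
class TokenBucket {
  private tokens: number;
  private last = Date.now();

  constructor(
    private capacity: number,     // burst tolerance
    private refillPerSec: number, // sustained throughput
  ) {
    this.tokens = capacity; // start full so the first burst is allowed
  }

  tryConsume(cost = 1): boolean {
    const now = Date.now();
    // Refill proportionally to elapsed time, clamped at capacity.
    this.tokens = Math.min(
      this.capacity,
      this.tokens + ((now - this.last) / 1000) * this.refillPerSec,
    );
    this.last = now;
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}
```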
November 3, 2025 at 5:59 PM
TL;DR

- "A distributed lock ≠ a mutex"
- "Locks are leases"
- "Leases expire"
- "Fencing tokens make expiry safe"

🧩 Full explanation + examples (Redis, Postgres, Firestore):

levelup.gitconnected.com/beyond-the-l...
October 18, 2025 at 2:38 PM
Even if a process wakes up from a 1-minute pause, it can’t corrupt data anymore — its stale token is rejected.

The system becomes safe by design, not by timing.
October 18, 2025 at 2:38 PM
The fix is adding a third party — the resource itself.

Every lock acquisition returns a fencing token, a monotonically increasing number.

Each write includes the token, and the resource rejects any write with an older token.

✅ Deterministic correctness.
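A single-process sketch of the resource-side check; real resources (a Postgres row version, an object-store write precondition) would do this compare atomically:

```ts
const lastToken = new Map<string, number>();

function fencedWrite(key: string, token: number, write: () => void): boolean {
  const newest = lastToken.get(key) ?? -1;
  if (token <= newest) return false; // stale holder: write rejected
  lastToken.set(key, token);         // remember the newest token seen
  write();                           // no older holder can now overwrite
  return true;
}
```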
October 18, 2025 at 2:38 PM
That’s not a bug in Redis.
Not even a bug in your code.

It’s a two-party problem:

- "Client thinks it holds the lock"
- "Lock manager knows it expired"

There’s no one to stop the stale client from writing bad data.
October 18, 2025 at 2:38 PM
While your process is frozen, the lock expires.
Another instance grabs the same lock and processes the payment again.

When your original process wakes up, it still thinks it holds the lock — and charges the customer twice.
October 18, 2025 at 2:38 PM
Imagine this:

Your service acquires a Redis lock with a 30-second TTL to process a payment.

Looks safe, right?

But then a JVM GC pause or a network stall freezes your process for 35 seconds… 🧟
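The acquisition step from that scenario, sketched with ioredis as one atomic SET ... NX PX (key names are illustrative):

```ts
import Redis from "ioredis";

const redis = new Redis();

async function acquirePaymentLock(paymentId: string, holder: string) {
  // Sets the key only if it doesn't already exist, with a 30 s TTL;
  // that TTL is exactly the lease that can expire mid-pause.
  const ok = await redis.set(
    `lock:payment:${paymentId}`, holder, "PX", 30_000, "NX",
  );
  return ok === "OK";
}
```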
October 18, 2025 at 2:38 PM