Abhay Bothra
swe.dev
@swe.dev
Co-founder/CTO @fennel.ai / Databases #DataBS / Distributed Systems / Infrastructure. @bothra90 on Twitter.
King to c7?
December 20, 2024 at 4:22 PM
Caveat: Some of these could be unique to Fennel’s architecture because of our reliance on Kafka for exactly-once semantics and recovery.
December 6, 2024 at 5:43 AM
Why use large batches at all? To amortize the cost of Kafka transactions, which we rely on for exactly-once semantics.
December 6, 2024 at 5:43 AM
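A back-of-the-envelope sketch of why batching amortizes transaction cost, assuming each Kafka transaction carries a roughly fixed overhead regardless of how many records it covers (which is what "amortize" implies). The function name and the cost figures in the usage are hypothetical, chosen only for illustration, not measurements from Fennel.

```python
def amortized_cost_per_record(batch_size: int, txn_overhead_ms: float, per_record_ms: float) -> float:
    """Per-record cost when one fixed-cost transaction commit is shared by a whole batch."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # The fixed commit overhead is split across every record in the batch.
    return txn_overhead_ms / batch_size + per_record_ms


if __name__ == "__main__":
    # Hypothetical numbers: 10 ms per transaction commit, 0.01 ms of work per record.
    for size in (1, 100, 10_000):
        print(size, amortized_cost_per_record(size, 10.0, 0.01))
```

As the batch grows, the per-record share of the commit overhead shrinks toward zero and the per-record work dominates; the trade-off is that larger batches hold more data in flight per transaction.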
The latter also keeps memory utilization proportional to mini-batch size.
December 6, 2024 at 5:43 AM
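A minimal sketch of mini-batch streaming through an operator chain, assuming operators are plain functions from a list of records to a list of records (a hypothetical simplification; stateful operators would also carry state across mini-batches). Each mini-batch is pushed through the whole chain and yielded right away, so a producer can send it without waiting for the rest of the batch, and only one mini-batch's intermediate results exist at a time. `run_chain_streaming` and the `sink` list standing in for a Kafka producer are illustrative names, not Fennel's code.

```python
from typing import Callable, Iterator, List

# Hypothetical operator shape: takes a list of records, returns a list of records.
Operator = Callable[[List[int]], List[int]]


def mini_batches(batch: List[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of at most `size` records."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(batch), size):
        yield batch[start:start + size]


def run_chain_streaming(batch: List[int], operators: List[Operator], mini_batch_size: int) -> Iterator[List[int]]:
    """Run each mini-batch through the whole chain and yield its output immediately.

    Intermediate results only ever cover one mini-batch, so their memory footprint
    scales with mini_batch_size rather than with the full batch.
    """
    for mb in mini_batches(batch, mini_batch_size):
        out = mb
        for op in operators:
            out = op(out)
        yield out  # the caller can hand this to a producer right away


if __name__ == "__main__":
    ops = [lambda xs: [x * 2 for x in xs], lambda xs: [x for x in xs if x % 3 != 0]]
    sink = []  # stand-in for a Kafka producer
    for chunk in run_chain_streaming(list(range(10)), ops, mini_batch_size=4):
        sink.extend(chunk)
    print(sink)
```

For record-wise operators like these, the concatenated output is identical to running the chain on the full batch at once; only the timing of when output becomes available changes.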
We got around that by internally sharding each batch of records and processing sub-shards in parallel.
We also break our batches down into mini-batches so that the chain’s output can be streamed to Kafka without waiting for execution of the full batch to finish.
December 6, 2024 at 5:43 AM
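A rough sketch of sharding a batch and processing sub-shards in parallel, assuming records are `(key, value)` pairs and that operators only need records with the same key (both hypothetical assumptions). Hashing by key keeps every record for a key in one sub-shard, which preserves per-key order. Fennel's operators are in Rust; in CPython, threads don't give CPU parallelism for pure-Python work because of the GIL, so this shows the structure rather than the speedup. `shard_batch`, `process_batch_sharded`, and `double_values` are illustrative names.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Record = Tuple[str, int]  # hypothetical record shape: (key, value)
Operator = Callable[[List[Record]], List[Record]]


def shard_batch(batch: List[Record], num_shards: int) -> List[List[Record]]:
    """Split a batch by key hash; all records for a key land in the same sub-shard, in input order."""
    shards: List[List[Record]] = [[] for _ in range(num_shards)]
    for record in batch:
        key = record[0]
        # crc32 gives a stable hash across runs, unlike Python's randomized str hash.
        shards[zlib.crc32(key.encode()) % num_shards].append(record)
    return shards


def process_batch_sharded(batch: List[Record], operator: Operator, num_shards: int = 4) -> List[Record]:
    """Run `operator` on each sub-shard concurrently and concatenate the outputs."""
    shards = shard_batch(batch, num_shards)
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        outputs = list(pool.map(operator, shards))  # map keeps results in shard order
    return [record for shard_out in outputs for record in shard_out]


def double_values(records: List[Record]) -> List[Record]:
    # Example per-key operator used for illustration.
    return [(key, value * 2) for key, value in records]


if __name__ == "__main__":
    batch = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
    print(process_batch_sharded(batch, double_values, num_shards=3))
```

The overall output order can differ from the input order, but the order of records within each key is preserved, which is usually what per-key stateful operators care about.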
Cons: This architecture prevents operators from running concurrently (fully async), since each batch has to be processed in full by the operator chain before the next batch can start. That, in turn, kept us from running at full throttle even when CPU capacity was available.
December 6, 2024 at 5:43 AM
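A simplified timing model of the cost described above, assuming each operator takes a fixed time per batch, that a pipelined design gives each operator its own worker, and that queues between operators are unbounded. These are hypothetical modeling assumptions, not a description of Fennel's scheduler. In the batch-at-a-time chain only one operator is busy at any moment; in the pipelined version, operator `s` can work on batch `j` while operator `s + 1` works on batch `j - 1`.

```python
from typing import Sequence


def batch_at_a_time_makespan(num_batches: int, stage_times: Sequence[float]) -> float:
    """Each batch runs through every operator before the next batch starts."""
    return num_batches * sum(stage_times)


def pipelined_makespan(num_batches: int, stage_times: Sequence[float]) -> float:
    """Each operator has its own worker and processes batches as soon as they arrive."""
    if not stage_times:
        return 0.0
    finish = [0.0] * len(stage_times)  # when each operator finished its previous batch
    for _ in range(num_batches):
        upstream_done = 0.0  # when this batch left the previous operator
        for s, t in enumerate(stage_times):
            # An operator starts once it is free AND the batch has left upstream.
            start = max(finish[s], upstream_done)
            finish[s] = start + t
            upstream_done = finish[s]
    return finish[-1]


if __name__ == "__main__":
    # Hypothetical: three operators at 2 ms per batch, four batches.
    print(batch_at_a_time_makespan(4, [2, 2, 2]), pipelined_makespan(4, [2, 2, 2]))
```

With three operators at 2 ms per batch and four batches, the batch-at-a-time chain takes 24 ms while the pipelined one takes 12 ms. Under these assumptions the pipelined time works out to `sum(stage_times) + (num_batches - 1) * max(stage_times)`, so throughput is bounded by the slowest operator rather than by the whole chain.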
In hindsight, what would the right API for this look like?
November 27, 2024 at 8:29 PM
Yes, I think they do this so that the ‘a’ region doesn’t become a hotspot. Was definitely surprising when I found out, but ultimately made sense.
November 27, 2024 at 7:44 PM
It occupies a very interesting point in the design space of caches, but the fact that you can’t immediately read your own writes is a problem you still need to design for. I wonder if that is its undoing.
@jonhoo.eu might have more thoughts on this.
November 20, 2024 at 4:51 PM
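One generic way to design around missing read-your-writes is a session token: each write returns a monotonically increasing version, and a read carrying that token is served from the cache only if the cache has already applied that version; otherwise it falls back to the source of truth. This is a toy, single-process sketch; `AsyncCache`, `Store`, and the versioning scheme are hypothetical and not the API of any particular cache.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class AsyncCache:
    """Toy cache: writes are queued and become visible only when apply_pending() runs."""

    def __init__(self) -> None:
        self._pending: Deque[Tuple[int, str, str]] = deque()
        self._data: Dict[str, str] = {}
        self.applied_version = 0  # version of the last write this cache has applied

    def enqueue(self, version: int, key: str, value: str) -> None:
        self._pending.append((version, key, value))

    def apply_pending(self) -> None:
        # Simulates asynchronous propagation catching up.
        while self._pending:
            version, key, value = self._pending.popleft()
            self._data[key] = value
            self.applied_version = version

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class Store:
    """Source of truth fronted by AsyncCache; each write returns a version token."""

    def __init__(self, cache: AsyncCache) -> None:
        self._db: Dict[str, str] = {}
        self._version = 0
        self.cache = cache

    def write(self, key: str, value: str) -> int:
        self._version += 1
        self._db[key] = value
        self.cache.enqueue(self._version, key, value)
        return self._version  # the client keeps this as its session token

    def read(self, key: str, min_version: int = 0) -> Optional[str]:
        # Serve from the cache only once it has applied the caller's own writes.
        if self.cache.applied_version >= min_version:
            return self.cache.get(key)
        return self._db.get(key)
```

A client that writes `x` and immediately reads it with its token gets the fresh value from the source of truth; a read without the token may see the stale cache. The fallback keeps correctness at the cost of cache hit rate right after writes; an alternative is to block the read until the cache catches up.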
That was their implementation of Noria?
November 20, 2024 at 8:10 AM
We’ve built an incremental view maintenance (IVM) engine at Fennel that supports Python UDFs by running them on a fleet of Python workers, while keeping the other operators in Rust. Hope to write a lot more about the technical details soon. One problem we’ve had to solve is providing IVM with time travel.
November 20, 2024 at 7:59 AM
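"Time travel" isn't defined in the post; one common reading is being able to query a maintained view as it stood at an earlier event timestamp. Assuming that reading, here is a minimal sketch of an incrementally maintained per-key sum that keeps its history and answers as-of queries with binary search. `TimeTravelSum` is a hypothetical name, the in-order-timestamps restriction is a simplifying assumption, and this is not Fennel's implementation.

```python
import bisect
from collections import defaultdict
from typing import Dict, List


class TimeTravelSum:
    """Per-key running sum that can be read as of any past timestamp."""

    def __init__(self) -> None:
        self._times: Dict[str, List[int]] = defaultdict(list)  # sorted update timestamps per key
        self._sums: Dict[str, List[int]] = defaultdict(list)   # running sum after each timestamp

    def apply(self, key: str, ts: int, delta: int) -> None:
        """Incrementally fold one update into the view."""
        times, sums = self._times[key], self._sums[key]
        if times and ts < times[-1]:
            raise ValueError("this sketch assumes in-order timestamps per key")
        prev = sums[-1] if sums else 0
        if times and ts == times[-1]:
            sums[-1] = prev + delta  # same timestamp: update the latest version in place
        else:
            times.append(ts)
            sums.append(prev + delta)

    def as_of(self, key: str, ts: int) -> int:
        """Value of the view for `key` including all updates with timestamp <= ts."""
        times = self._times.get(key, [])
        i = bisect.bisect_right(times, ts)
        return self._sums[key][i - 1] if i else 0
```

Keeping every version makes as-of reads a binary search, but memory grows with the number of updates; a real system would typically need to compact history older than some retention window and handle late, out-of-order events.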