jonathanesperanza.com
declarative infra to meet the needs of an individual use case, in this case a feature.
- groupBy -- aggregation compute
- join -- join compute
- stagingQuery -- arbitrary compute as Spark SQL
time-based aggregation and windowing are first-class concepts, along with SQL primitives (see the sketch below).
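to make the declarative part concrete, here's roughly what a groupBy (aggregation compute) declaration looks like in Chronon's Python API, adapted from its quickstart docs -- the table and column names are made up, and exact import paths and argument names may differ across versions:

```python
# sketch of a Chronon GroupBy (aggregation compute), adapted from the
# quickstart; data.purchases / user_id / purchase_price are hypothetical
from ai.chronon.api.ttypes import EventSource, Source
from ai.chronon.query import Query, select
from ai.chronon.group_by import (
    Aggregation, GroupBy, Operation, TimeUnit, Window,
)

purchases = Source(events=EventSource(
    table="data.purchases",                        # hive table of purchase events
    query=Query(
        selects=select("user_id", "purchase_price"),
        time_column="ts",                          # event time drives the windows
    ),
))

purchase_features = GroupBy(
    sources=[purchases],
    keys=["user_id"],                              # the feature's entity key
    aggregations=[
        Aggregation(
            input_column="purchase_price",
            operation=Operation.SUM,
            # windowed, time-based aggregation as a first-class concept
            windows=[Window(length=d, timeUnit=TimeUnit.DAYS) for d in (3, 14, 30)],
        ),
    ],
)
```

if i'm reading the docs right, Chronon derives both the batch backfill and the streaming update pipeline from a single definition like this, which is what makes the declaration itself the unit of infra.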
seems Chronon orchestrates pipelines for declared features
infra: Kafka, Spark, Hive, Airflow, and a customizable key-value store
- frontend events are crucial for real-time activity needs; combined with backend data, they enable complete, intelligent decision making
- in-memory processing minimizes cost and latency
- use a streaming process with in-memory state ➡️ low latency and no storage costs
- manage Flink state in the RocksDB state backend
- use Flink high availability mode with checkpoints persisted to S3 (see the sketch after this list)
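the post doesn't show code, so here's a minimal sketch of that setup in PyFlink -- my own reconstruction, with a hypothetical bucket path and an API that is roughly PyFlink 1.15+:

```python
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.state_backend import EmbeddedRocksDBStateBackend

env = StreamExecutionEnvironment.get_execution_environment()

# keep large keyed session state on local disk via RocksDB instead of
# the JVM heap, so hundreds of GB of state stays manageable
env.set_state_backend(EmbeddedRocksDBStateBackend())

# checkpoint every 60s; on failure the job restores the last completed
# checkpoint instead of losing in-flight sessions
env.enable_checkpointing(60_000)
env.get_checkpoint_config().set_checkpoint_storage_dir(
    "s3://my-bucket/flink/checkpoints"  # hypothetical bucket
)
```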
- a batch process for sessionization isn't feasible due to cost and latency
- Flink needs to keep an hour's worth of activity for each user as its state ➡️ hundreds of GB
- handling Flink job failure and recovery
- ack quorum (AQ): once a minimum number of bookies ack an entry, the write is considered durable
- write quorum (WQ): data is written sequentially to a ledger ➡️ each entry is replicated to a subset of bookies
- guarantees that once an entry meets AQ, it will be fully replicated to its WQ bookies (toy model below)
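to get the AQ/WQ relationship straight in my head, a toy model in Python -- this is just the quorum arithmetic, not the BookKeeper client API, and the ensemble size, WQ=3, AQ=2, and failure rate are made-up numbers:

```python
import random

# toy model of BookKeeper quorums -- illustrative only, not the client API
ENSEMBLE = ["bookie-1", "bookie-2", "bookie-3", "bookie-4", "bookie-5"]
WQ = 3  # write quorum: how many bookies each entry is replicated to
AQ = 2  # ack quorum: acks required before the write counts as durable

def write_entry(entry_id: int) -> bool:
    """Returns True once the entry is durable (AQ acks received)."""
    # each entry goes to a WQ-sized subset of the ensemble (striped round-robin)
    targets = [ENSEMBLE[(entry_id + i) % len(ENSEMBLE)] for i in range(WQ)]
    # pretend each bookie independently acks, with some being slow or down
    acks = [b for b in targets if random.random() > 0.1]
    # durable at AQ acks; BookKeeper separately guarantees the entry still
    # reaches all WQ bookies eventually (re-replication)
    return len(acks) >= AQ

print([write_entry(i) for i in range(5)])
```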
- latency delays
- concurrency issues
- synchronization challenges
the author points out that Apache BookKeeper is able to solve these, so i'll take a look at it
- atproto.com/guides/data-...
- atproto.com/specs/lexicon
- atproto.com/specs/crypto...
authentic decentralization of data for real-world applications that must scale
- unified data model and personal data server allow me to plug my own "trusted agent" into the Bluesky network (self-hosting)
- Lexicons are the building blocks of atproto as they drive APIs and schemas (sketch below)
- cryptography is used to sign commits to data repos (second sketch below)
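to make the Lexicon point concrete, a minimal Lexicon document sketched as a Python dict -- the shape follows the examples at atproto.com/specs/lexicon, but `com.example.note` is a hypothetical NSID and the record fields are my own:

```python
# minimal Lexicon document, modeled on the spec's examples -- the NSID
# and record fields are hypothetical
note_lexicon = {
    "lexicon": 1,                 # lexicon language version
    "id": "com.example.note",     # NSID: the schema's globally unique name
    "defs": {
        "main": {
            "type": "record",     # this Lexicon defines a repo record type
            "key": "tid",         # records keyed by timestamp identifiers
            "record": {
                "type": "object",
                "required": ["text", "createdAt"],
                "properties": {
                    "text": {"type": "string", "maxLength": 3000},
                    "createdAt": {"type": "string", "format": "datetime"},
                },
            },
        }
    },
}
```

the same documents can also define XRPC query/procedure endpoints, which is how one schema language ends up driving both storage and APIs.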
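and for the signing point, a toy sketch of what "sign a commit" means. atproto actually signs DAG-CBOR-serialized commit objects with secp256k1 or P-256 keys published in the repo owner's DID document; this uses JSON and the `cryptography` package instead, so it's illustrative rather than wire-compatible:

```python
import json
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

# toy commit signing -- atproto really uses DAG-CBOR + secp256k1/P-256;
# JSON + P-256 here keeps the sketch short and runnable
signing_key = ec.generate_private_key(ec.SECP256R1())

commit = {
    "did": "did:example:alice",  # hypothetical repo owner
    "rev": "3jui7kd54zh2y",      # hypothetical revision id
    "data": "<mst-root-cid>",    # placeholder for the MST root CID
}
unsigned = json.dumps(commit, sort_keys=True).encode()

sig = signing_key.sign(unsigned, ec.ECDSA(hashes.SHA256()))
commit["sig"] = sig.hex()

# anyone holding the public key (from the DID document) can verify;
# verify() raises InvalidSignature if the commit was tampered with
signing_key.public_key().verify(
    bytes.fromhex(commit["sig"]), unsigned, ec.ECDSA(hashes.SHA256())
)
print("commit verified")
```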