Small workloads? Use @DuckDB or @Polars for faster in-memory performance.
Massive datasets? MPP systems like @Spark or @Snowflake scale dynamically.
Experiment: @DuckDB outperformed Spark at <100GB.
💡 Don't go grocery shopping in a tank!
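A rough sketch of that rule of thumb in Python. The 100GB cutoff is taken from the experiment mentioned above and is an assumption, not a hard limit; the function name pick_engine and the returned labels are hypothetical, for illustration only.

```python
# Assumed cutoff, borrowed from the "<100GB" experiment in the post.
# Real break-even points depend on hardware, query shape, and concurrency.
SINGLE_NODE_LIMIT_GB = 100


def pick_engine(dataset_gb: float) -> str:
    """Suggest an engine class for a dataset of the given size (hypothetical helper)."""
    if dataset_gb < 0:
        raise ValueError("dataset size cannot be negative")
    if dataset_gb < SINGLE_NODE_LIMIT_GB:
        # Small workloads: an in-process engine on one machine.
        return "single-node (DuckDB or Polars)"
    # Massive datasets: a distributed MPP system.
    return "MPP (Spark or Snowflake)"


if __name__ == "__main__":
    for size_gb in (5, 80, 2_000):
        print(f"{size_gb} GB -> {pick_engine(size_gb)}")
```

Treat the threshold as a starting point to benchmark against your own workload, not as a rule.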
Object storage like S3 has become a database: scalable & efficient for transactional & analytical workloads.
Emerging programming models:
1️⃣ Distributed DBs: On files
2️⃣ Serverless: Focus on code
3️⃣ Wasm: Portable execution
Challenge: "one-way-door" innovation
Modern data is evolving:
→ Iceberg now leads open table formats (Snowflake & Databricks adoption confirms it).
→ Cloud-native storage is a must (legacy systems won’t keep up).
→ AI thrives on scalable, open architectures.
More innovation. Less vendor lock-in.
Ready to shift?
→ $300K/year on Snowflake, and 90% is spent on queries.
→ Most queries are tiny (median: 100MB, 99.9% <300GB).
→ Most workloads = ingestion + transformation (not analytics).
💡 Small Data > Massive Complexity.
Are we overpaying for simplicity?
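For scale, here is the arithmetic behind those spend figures as a small Python sketch. It uses only the two numbers quoted above ($300K/year and 90%); the function name split_spend is hypothetical.

```python
ANNUAL_SPEND_USD = 300_000  # from the post: $300K/year on Snowflake
QUERY_SHARE_PCT = 90        # from the post: 90% is spent on queries


def split_spend(annual_usd: int, query_pct: int) -> tuple[int, int]:
    """Split annual spend into (query cost, everything else), in whole dollars."""
    query_usd = annual_usd * query_pct // 100
    return query_usd, annual_usd - query_usd


query_usd, other_usd = split_spend(ANNUAL_SPEND_USD, QUERY_SHARE_PCT)
print(f"queries: ${query_usd:,}/year, other: ${other_usd:,}/year")
# prints: queries: $270,000/year, other: $30,000/year
```

So roughly $270K a year goes to running queries, most of which touch very little data.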
More Data ≠ Better Results.
→ Recent data is the most valuable.
→ Smaller AI models deliver bigger impact.
→ Local-first development works.
Stop relying on distributed complexity when single machines get the job done.
The #SmallData Movement is here. Are you in?
Most enterprises have <100GB of active data but overpay for tools designed for massive scale (#Snowflake, #Databricks, etc.).
Focus on #SmallData:
→ Easier to analyze
→ Cheaper to manage
→ Faster insights
Time to rethink your data strategy. #SmallData