Joachim Rosskopf
jrosskopf.bsky.social
MPP vs. Single-Node Engines

Small workloads? Use @DuckDB or @Polars for faster in-memory performance.
Massive datasets? MPP systems like @Spark or @Snowflake scale dynamically.

Experiment: @DuckDB outperformed Spark at <100GB.

💡 Don't go grocery shopping in a tank!
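
A rough sketch of that rule of thumb in Python. The 100 GB cutoff is taken from the experiment above and treated as an assumption, not a benchmark result you can rely on; `choose_engine` and `SINGLE_NODE_LIMIT_BYTES` are hypothetical names made up for illustration.

```python
GB = 1024 ** 3

# Assumed cutoff, borrowed from the "<100GB" figure in the post.
# Tune it for your own hardware and workload.
SINGLE_NODE_LIMIT_BYTES = 100 * GB


def choose_engine(dataset_bytes: int, limit_bytes: int = SINGLE_NODE_LIMIT_BYTES) -> str:
    """Return 'single-node' for data below the limit, otherwise 'mpp'."""
    if dataset_bytes < 0:
        raise ValueError("dataset_bytes must be non-negative")
    return "single-node" if dataset_bytes < limit_bytes else "mpp"


if __name__ == "__main__":
    # Print the suggested engine class for a few example sizes.
    for size_gb in (5, 80, 500):
        print(f"{size_gb} GB -> {choose_engine(size_gb * GB)}")
```

In practice the decision also depends on memory, concurrency, and query shape, so a size check like this is only a first filter.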
December 10, 2024 at 10:51 AM
The Future of Distributed Systems

Object storage like S3 has become a database: scalable & efficient for transactional & analytical workloads.

Emerging programming models:
1️⃣ Distributed DBs: On files
2️⃣ Serverless: Focus on code
3️⃣ Wasm: Portable execution

Challenge: “one-way-door” innovation
December 8, 2024 at 10:51 AM
The Iceberg Effect

Modern data is evolving:
→ Iceberg now leads open table formats (Snowflake & Databricks adoption confirms it).
→ Cloud-native storage is a must (legacy systems won’t keep up).
→ AI thrives on scalable, open architectures.

More innovation. Less vendor lock-in.
Ready to shift?
December 6, 2024 at 3:14 PM
What Do Data Warehouses Really Do?

→ $300K/year on Snowflake, and 90% is spent on queries.
→ Most queries are tiny (median: 100MB, 99.9% <300GB).
→ Most workloads = ingestion + transformation (not analytics).

💡 Small Data > Massive Complexity.
Are we overpaying for simplicity?
December 4, 2024 at 10:51 AM
Think Small. Make Big Impact.

More Data ≠ Better Results.
→ Recent data is the most valuable.
→ Smaller AI models deliver bigger impact.
→ Local-first development works.

Stop relying on distributed complexity when single machines get the job done.

The #SmallData Movement is here. Are you in?
December 2, 2024 at 10:51 AM
BigData isn’t the problem—it never was.

Most enterprises have <100GB in active data but overpay for tools designed for massive scale (#Snowflake, #Databricks, etc.).

Focus on #SmallData:
→ Easier to analyze
→ Cheaper to manage
→ Faster insights

Time to rethink your data strategy. #SmallData
November 30, 2024 at 3:44 PM