Julien Hurault
hachej.bsky.social
Freelance Data | Weekly Data Eng. Newsletter 📨 juhache.substack.com - 4k+ readers

New blog post: Building a 0$ Data Distribution System.

juhache.substack.com/p/0-data-dis...
0$ Data Distribution
Ju Data Engineering Weekly - Ep 78
juhache.substack.com
December 6, 2024 at 7:44 PM
Reposted by Julien Hurault
Some learnings after helping 50+ companies with high-performance data engineering projects

javisantana.com/2024/11/30/l...
javisantana.com
November 30, 2024 at 12:17 PM
November 30, 2024 at 8:59 AM
1 Docker container embedding app code + SQLite DB → live chat app with 10k simultaneous users.

youtu.be/0rlATWBNvMw
DHH discusses SQLite (and Stoicism)
YouTube video by Aaron Francis
youtu.be
November 29, 2024 at 3:25 PM
Reposted by Julien Hurault
Yup, there are almost fifteen million SQLite databases on Bluesky’s PDS servers. It’s wildly efficient and simple, but not without trade-offs of course.

Makes sense for this use case in large part because each user's atproto repository is self-contained, with links to other repos, like a website.
November 11, 2024 at 6:51 AM
Reposted by Julien Hurault
Here's what I put in my ~/.zshrc file to make sure my virtualenv autoactivates when I move to a directory with a .venv file. Works well for me so far. Do the rest of you do something like this?

#Python #DataBS
November 27, 2024 at 9:29 PM
Prediction: people will monetize custom feeds
github.com/bluesky-soci...
November 28, 2024 at 6:53 PM
Reposted by Julien Hurault
Something interesting is brewing in Iceberg-on-S3 land. 👀

lists.apache.org/thread/v7x65...

cc @eatonphil.bsky.social
lists.apache.org
November 26, 2024 at 7:26 PM
Building a data pipeline =
50% bringing data from A to B at time t
50% making pipeline fixing easy at time t+1
November 25, 2024 at 1:22 PM
New blog post: GCP & Iceberg

open.substack.com/pub/juhache/...
November 24, 2024 at 6:21 PM
Reposted by Julien Hurault
Now more than ever
November 24, 2024 at 3:42 AM
Is there a starter pack for data engineering here on Bluesky?
November 24, 2024 at 8:49 AM
2010 — 2017:
ML = pip install scikit-learn

2017 — 2023:
ML = pip install torch

2023 — :
ML = pip install requests
November 23, 2024 at 12:26 PM
FINALLY! 🎉
aws.amazon.com/blogs/comput...

No more endless searching for support with intrinsic functions:
docs.aws.amazon.com/step-functio...
November 22, 2024 at 9:24 PM
What about dumping Snowflake Marketplace free listings to R2?
@jakthom.bsky.social @ssp.sh @tobilg.com
November 19, 2024 at 10:14 AM
The Snowflake unbundling continues:
• Snowflake Storage → Iceberg
• Snowflake Marketplace → Cloudflare R2 + DuckDB :)
ipinfo datasets are now available (for free!) in hive.buz.dev!

From @duckdb.org, running:

attach 'https://hive.buz.dev/ipinfo/catalog' as ipinfo;

will load the following tables:

- asn
- country
- country_asn

ip-enrich to your ❤️'s content...
November 19, 2024 at 8:17 AM
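
For anyone who wants to try the attach statement quoted above from a script rather than a SQL prompt, here is a minimal Python sketch. The catalog URL and the three table names are taken from the post; the helper names (`attach_statement`, `preview_queries`, `run_preview`) are hypothetical, and the sketch assumes the `duckdb` Python package is installed, that DuckDB can auto-load its HTTP support, and that the catalog is still reachable. The table schemas are not given in the post, so the queries only select a few rows without naming columns.

```python
# Sketch only: the URL and table names come from the post; all helper
# names below are hypothetical and exist just for this illustration.
CATALOG_URL = "https://hive.buz.dev/ipinfo/catalog"
ALIAS = "ipinfo"
TABLES = ["asn", "country", "country_asn"]


def attach_statement(url: str, alias: str) -> str:
    """Build the ATTACH statement quoted in the post."""
    escaped = url.replace("'", "''")  # escape single quotes for the SQL literal
    return f"ATTACH '{escaped}' AS {alias};"


def preview_queries(alias: str, tables: list[str], limit: int = 5) -> list[str]:
    """One small SELECT per table; columns are unknown, so use SELECT *."""
    return [f"SELECT * FROM {alias}.{table} LIMIT {limit};" for table in tables]


def run_preview() -> None:
    """Attach the remote catalog and print a few rows from each table.

    Needs network access and the duckdb package (assumption: the catalog
    is publicly readable, as the post suggests).
    """
    import duckdb  # imported lazily so the helpers above work without it

    con = duckdb.connect()  # in-memory database
    con.execute(attach_statement(CATALOG_URL, ALIAS))
    for query in preview_queries(ALIAS, TABLES):
        print(query)
        for row in con.execute(query).fetchall():
            print(row)
```

Calling `run_preview()` performs the attach and prints up to five rows per table; the two helper functions only build SQL strings, so they can be inspected without touching the network.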