kafka-by-the-sea.bsky.social
@kafka-by-the-sea.bsky.social
delta-rs 1.0 has broken all of the polars, dask, and duckdb wrappers ooppssss
May 29, 2025 at 11:48 AM
I just realised I'm supposed to start intern candidate interviews tomorrow...I haven't made any list of questions...oh no
May 28, 2025 at 11:03 PM
Our celery tasks have a 5 minute time limit on them and my z order operation keeps failing once the table grows a week old. This is gonna be a fun argument with the dev team 🫠
May 28, 2025 at 12:12 AM
#databs anyone who's used delta_scan with duckdb? It seems like partition pruning doesn't take place at all. It makes HEAD requests to all parquet files in the S3 bucket regardless of the filter condition.

Any alternatives that read from deltalake without this headache? Pref not Spark
February 27, 2025 at 9:04 AM
Hello #databs. My data science team wants to query some IoT data to train their models, but they're finding reading in from long format and pivoting to be slow and expensive. Is it common / best practice to store this data in wide format (i.e. some 7-10k columns)?
February 20, 2025 at 8:51 PM
Caveat: ever since I got jacked and lean the amount of attention I've gotten has increased tenfold. Turns out looking good is helpful lol
I really wish I could impress on all these young guys who are being told by creeps and con artists that women only want a jacked guy with money how much women like a guy who can cook and/or make them laugh.
January 5, 2025 at 5:58 PM
I think I've managed to destroy my romance-related dopamine receptors. I only have interest in someone until I know they're interested in me and then I completely lose all desire to go further. It's like a messed up game in my head that I'd ideally like to fix
December 19, 2024 at 12:44 AM
If no one else got me, I know psql -d COPY got me
December 13, 2024 at 7:24 AM
Told there's a demo and we need a new instance of timescale to have a bunch of data (~1 billion rows) ready in three hours. Data needs to be extracted, transformed, and then uploaded back to new instance.

Pg_dump errors out, copy to parquet doesn't work, duckdb doesn't work.

I've had a bad day 🫠
December 11, 2024 at 11:17 PM
Hello duckdb folks @duckdb.org is the postgres adapter broken? I'm running a super simple SELECT * WHERE timestamp > static val query over a cloud instance and it's fully stuck.

It seems like despite the query, the wrapper tries to fetch all the data available instead and gets stuck?
December 11, 2024 at 11:12 PM
I just need ONE tool that
1) does lazy loading / eval
2) has an easy to use dataframe API
3) has a straightforward sql read/write. I should need to only pass the query, the connection string, and maybe params and it'll handle the rest.
4) doesn't go OOM

Today's been a ridiculous day 🫠
December 11, 2024 at 11:10 PM
This client's given us 100gigs of raw CSVs that I now need to get into a warehouse...ran my usual ingestion script and it's taking so long my temporary AWS creds are expiring 🫠
December 10, 2024 at 12:22 PM
Reposted
Day 4! (It's not too late to catch up!) youtu.be/Jaann2QiGwQ
30 Days of Orchestration - Day 4
YouTube video by Sean Lopp @ Dagster Labs
December 2, 2024 at 4:57 PM
We've got _two_ devops guys who are entirely perpetually busy and now I need to twiddle my thumbs waiting around just to create a glue catalogue in dev. End me god
November 18, 2024 at 10:14 AM
Someone make it so duckdb autodetects creds instead of having to create a secret every time pls
November 17, 2024 at 11:07 AM
Just started my first-ever proper bulk after cutting down 18 kilos and getting to roughly 12-13% bf (visible abs, lots of veins etc) and I have MISSED feeling full all the time SO much. Just had the most fun push day in months.
November 17, 2024 at 10:54 AM
Okay I think I wanna try and make a cool fully-AWS version and then an open-source version of the same end-to-end pipeline. I'm thinking ingestion, orchestration, ML, and dashboarding...now the question is what topic do I actually pick for said project lol
November 15, 2024 at 11:53 AM
Trying to get deltalake to work with Dask and having to jump through three different slack channels...I should've just used Spark
November 15, 2024 at 9:04 AM
The time column I need doesn't have an index on it, the time column that DOES have an index on it doesn't have an offset and I'm assuming is in CST. The client's contractor who's set up their warehouse doesn't wanna put indexes on the column we actually need. A third column is in Unix timestamp. 🫠
November 12, 2024 at 11:12 PM
Ok I got my first release out w this and DAMN it works so seamlessly. Very little configuration required and so beautifully OOB.

I'm never writing a spark job again in my life.
I'm in charge of designing the DE side of things at this startup I work at. I've decided the analytics side is gonna be dlthub for the ingestion, delta lake to store, and duckdb to query. Let's see how this goes lol.
November 12, 2024 at 9:58 AM
The last time I was seriously into ML was when the vision transformers paper had just dropped -- I used to specialise in computer vision. I'd like to pick up where I left off, any ideas where to begin? Specifically looking to learn all about LLMs lol #dataBS
November 11, 2024 at 9:49 AM
I need to learn how compliance and data contracts work better...so tempted to scrap S3 deltalake altogether and directly start loading data into @motherduckdb.bsky.social but I'm fairly certain that wouldn't be allowed by our clients 😭
November 7, 2024 at 11:52 AM
Ok now that this is a safe space -- it is genuinely hilarious how primarily the uneducated vote for Trump lol.

The guy goes and says he wants to dismantle the department of education and raise tariffs to 20% and people vote for him lmaooo.

Xi was right, democracy is a sham.
November 6, 2024 at 1:41 PM
Why is working w time-series data so hard. Doesn't require any modelling but there's just...so many problems trying to interpolate and store it nicely
November 5, 2024 at 2:59 PM
I wish I had a 15 year+ exp solutions architect bestie who would tell me I'm doing well
October 31, 2024 at 11:04 AM