Shiv Gupta
@sgupta.xyz
Data Engineer @ Chainalysis. Views my own
📍Toronto
Reposted by Shiv Gupta
Shivanshu Gupta is a Data Engineer at Chainalysis, where they run Dagster to orchestrate and maintain complex data pipelines.

🔍 In today's data landscape, engineers like Shivanshu need tools that reduce cognitive load while handling increasingly complex pipelines.
March 24, 2025 at 5:06 PM
Reposted by Shiv Gupta
S3 (Iceberg) Tables is everything I dreamt of, and more. I blogged some long-form thoughts: meltware.com/2024/12/04/s...

I think we're about to see an explosion of data tools (@materialize.com, @clickhouse.com, @duckdb.org, et al.) learn to write Iceberg tables via S3 table buckets.

#databs
A First Look at S3 (Iceberg) Tables
AWS announced S3 Tables today, which brings native support for Apache Iceberg to S3. It’s hard to overstate how exciting this is for the data analytics ecosystem. This post is a quick rundown of my th...
meltware.com
December 4, 2024 at 10:02 AM
Anyone here try to self-host Rill (@rilldata.com) or is that not a thing? $250/mo to deploy on Rill Cloud feels prohibitive for most hobby projects... #databs
November 29, 2024 at 6:23 PM
Reposted by Shiv Gupta
this is the only good WebApp out there, don't at me
November 23, 2024 at 2:20 AM
Reposted by Shiv Gupta
I've been seeing posts from people coming to #dataBS and getting overwhelmed and worrying they can't contribute.

I get it. I still feel that sometimes.

But I'm trying to use my impostor syndrome as a strength. It's permission to accept that we'll never know everything.

So join in the discussion!
November 20, 2024 at 9:45 AM
Anyone here manage to attach a Databricks-hosted Unity Catalog to DuckDB? github.com/duckdb/uc_ca...
GitHub - duckdb/uc_catalog: Proof-of-concept extension combining the delta extension with Unity Catalog
github.com
November 20, 2024 at 5:58 PM
Love that #DuckDB & #R2 are powering quick & easy access to open datasets. How long until we get the entire library of BQ public datasets but without the GCP fluff? #DataBS
abuse.ch malware feed is now available in the hive:

attach 'https://hive.buz.dev/abuse_ch/catalog' as abuse_ch;

select * from abuse_ch.malware limit 10;
November 20, 2024 at 4:09 AM
Custom Bluesky handle off my domain, neat
November 20, 2024 at 4:04 AM
Reposted by Shiv Gupta
I open my Bluesky feed and it is full of really cool people saying really smart things and being really passionate about really interesting stuff.

I like this version of social media, and the world.

Thanks, folks.
November 19, 2024 at 12:53 PM
I underestimated how far you can get with prompt engineering. Every time I thought I needed to bite the fine-tuning bullet, what I really needed was better prompts
November 20, 2024 at 3:28 AM
Going from CSV > #DuckDB > Delta Lake in S3 is a breeze. Only thing that's missing is write support to Delta Lake tables natively in SQL (vs having to go through Python). And DuckDB Google Sheets integration makes me want to use it for everything
November 20, 2024 at 3:27 AM
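A minimal sketch of the CSV > DuckDB > Delta Lake path from the post above, assuming the `duckdb` and `deltalake` Python packages are installed and AWS credentials come from the environment. The function names, file name, bucket, and table path are hypothetical placeholders, not anything from the original post.

```python
def delta_table_uri(bucket: str, prefix: str) -> str:
    """Build an s3:// URI for a Delta table (bucket and prefix are hypothetical)."""
    return f"s3://{bucket}/{prefix.strip('/')}"


def csv_to_delta(csv_path: str, table_uri: str, mode: str = "append") -> int:
    """Load a CSV with DuckDB, write it to a Delta Lake table, return the row count."""
    # Third-party imports are kept local so this module loads without them.
    import duckdb
    from deltalake import write_deltalake

    con = duckdb.connect()  # in-memory database
    try:
        # DuckDB infers column names and types from the CSV,
        # then materializes the result as a pyarrow Table.
        arrow_table = con.read_csv(csv_path).fetch_arrow_table()
    finally:
        con.close()

    # The write goes through Python here, since the post notes that
    # DuckDB SQL had no native Delta Lake write support.
    write_deltalake(table_uri, arrow_table, mode=mode)
    return arrow_table.num_rows
```

Under those assumptions, `csv_to_delta("events.csv", delta_table_uri("my-bucket", "lake/events"))` would append the CSV rows to the table. Depending on the `deltalake` version, S3 writes may need extra `storage_options` for safe concurrent commits.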
Starting to really love DuckDB
November 20, 2024 at 3:23 AM