Bijil Subhash
@bijilsubhash.bsky.social
Data Engineer, Recovering Academic, and Entrepreneur | bijilsubhash.io | Sydney, Australia
Databricks vs Fabric feels a lot like Pied Piper vs Nucleus. Fans of the Silicon Valley show will get the reference :)

#databs #databricks #fabric
March 8, 2025 at 10:03 PM
(1/3) Among programming languages, I consider #Python to be a relatively easy-to-learn language, opening doors for many to start coding without formal training. However, this also results in some poorly written, unmaintainable, and non-extensible code: the infamous spaghetti code.
February 18, 2025 at 4:00 PM
Just finished watching the dbt team's webinar introducing SDF. After seeing SDF in action, I have to admit that I am really looking forward to the future of the dbt engine. I was wondering when dbt was going to bring notable changes to the developer experience, and this might be it.

#databs
February 1, 2025 at 4:00 PM
I was speaking with someone who went all in on promoting duckdb to their clients. I did not get a chance to ask what exactly they are doing with duckdb. But I am curious to understand how duckdb is utilised in modern data pipelines.
January 21, 2025 at 4:00 PM
(1/4) SDF acquisition by dbt

If you work in data, you probably would have come across a version of this headline this past week. A small disclaimer: I have not used SDF, nor do I have a solid understanding of the tech that sits behind it, so take what I say with a grain of salt.
January 18, 2025 at 4:00 PM
(1/2) Maybe an unpopular opinion: SQL is a powerful language and, despite what anyone says, it is unlikely to be replaced by an LLM, at least not with the models we have today. LLMs are powerful and can be leveraged to generate ideas or as a tool to unblock yourself when you are stuck.

#databs
January 16, 2025 at 4:00 PM
(1/3) Continuing from my previous thread on infrastructure as code for managing #Databricks. I have recently had the pleasure of working with an open source tool called Laktory, an abstraction that sits on top of Terraform/Pulumi to manage your Databricks workflows using YAML.

#databs
January 14, 2025 at 4:00 PM
A default approach that I take when it comes to data modelling. It works because OBT is optimized for the modern vectorized data warehouses. At the same time, the underlying data is modelled using established best practices from Kimball.
Really like the approach. OBT as a set of “views” (whether materialized as tables or not) that give business users / BI tools access with fewer joins and less db understanding required, while developers are able to think in terms of Kimball facts and dims.

#dataBS

www.brooklyndata.co/ideas/2025/0...
Our Hybrid Kimball & OBT Data Modeling Approach
We use a hybrid of Kimball’s Dimensional Modeling and One Big Table (OBT) to model our clients’ data — learn about both methods and why we combine them.
www.brooklyndata.co
January 11, 2025 at 8:38 PM
(1/2) Infrastructure as code (IaC) is ubiquitous in the data space. That being said, I have stayed away from doing any IaC work for as long as I can remember, mainly due to its aura of being difficult and also because I could pass the ball to the platform team.

#datasky #databs
January 10, 2025 at 1:44 AM
(1/4) What do you use for #data ingestion?

It's true that there is no shortage of tools when it comes to data ingestion. But before you open your wallet to one of the many options out there, it might be worth doing thorough due diligence based on your current and future needs.

#databs
January 7, 2025 at 4:00 PM
I have been binging on the early chapters of the new book from @joereis.bsky.social on data modeling. I haven't consumed a lot of material on this topic besides Kimball, but this one is a must-read if you work with data in a modern context. Looking forward to the official release in 2025!

#databs
January 4, 2025 at 4:00 PM
(1/5) dbt core vs dbt cloud

dbt has been a game changer for many #data teams, mainly for writing reusable and version-controlled transformation logic. We are also witnessing an explosion of tools that want to become the next #dbt. Which is better, dbt core or dbt cloud?

#databs
January 2, 2025 at 4:00 PM
Reposted by Bijil Subhash
Buckle up because we're banging into the new year with my annual retrospective of the last year in databases! Highlights include license change blowback, Databricks vs. Snowflake gangwar, @duckdb.org's shotgun weddings, and buying a quarterback to impress your lover: www.cs.cmu.edu/~pavlo/blog/...
Databases in 2024: A Year in Review
Andy rises from the ashes of his dead startup and discusses what happened in 2024 in the database game.
www.cs.cmu.edu
January 1, 2025 at 2:02 PM
(1/5) What is Unity Catalog (UC) in the context of #databricks?

The standard definition you get is that it is a unified governance solution built into Databricks. It is accurate, but it was not intuitive to me when I started building on UC. See 🧵 for some additional context on UC.

#databs
December 31, 2024 at 4:00 PM
(1/2) Autoloader is without doubt one of my favourite features in Databricks!

In a nutshell, it is an abstraction that simplifies incremental ingestion by monitoring files as they arrive in cloud storage, supporting reliable, resilient, and cost-efficient data pipelines.
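In code, Auto Loader is exposed through the `cloudFiles` streaming source. A minimal sketch of the idea, assuming a Databricks runtime (the paths and table name below are placeholders, not from the post):

```
# Auto Loader sketch: incrementally pick up new files landing in cloud storage.
# Requires a Databricks cluster; `spark` is the session provided by the runtime.
df = (
    spark.readStream
    .format("cloudFiles")                                 # Auto Loader source
    .option("cloudFiles.format", "json")                  # format of incoming files
    .option("cloudFiles.schemaLocation", "/tmp/schema")   # schema inference + evolution state
    .load("/mnt/landing/events")                          # monitored storage path
)

(
    df.writeStream
    .option("checkpointLocation", "/tmp/checkpoint")      # tracks which files were processed
    .trigger(availableNow=True)                           # process the backlog, then stop
    .toTable("bronze.events")
)
```

The checkpoint is what makes ingestion incremental: files already seen are skipped on the next run, so reruns are cheap and idempotent.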

#databs
December 28, 2024 at 8:31 PM
If you are an aspiring data engineer, do yourself a favour and learn the basics of data modelling, amongst other fundamentals (Python and SQL), before jumping into whatever tool is in the headlines.
December 27, 2024 at 9:35 PM
Great write up on LLM frameworks.

www.anthropic.com/research/bui...

Without attaching a name to it, I have tested all except the agent workflow in 2024.

Currently running an orchestrator-worker, prompt chaining, and routing workflows across a handful of projects in production.
Building effective agents
A post for developers with advice and workflows for building effective AI agents
www.anthropic.com
December 22, 2024 at 8:59 PM
I am sure some of us can relate to this.

#dataengineer #data #dataarchitecture
December 4, 2024 at 2:26 AM
Five ways to copy a dictionary in Python:

- Unpacking with ** into a dict literal*
- Using the copy method*
- Using the dict constructor*
- Dictionary comprehension*
- Using deepcopy (from the copy module)

The first three have similar performance, followed by dictionary comprehension, and finally deep copy.

*shallow copy
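The five options above, side by side, in a small sketch that also shows the shallow-vs-deep distinction (the dict contents are illustrative):

```python
import copy

original = {"a": [1, 2], "b": 3}

# Shallow copies: the top-level dict is new, but nested objects are shared
c1 = {**original}                          # unpacking into a dict literal
c2 = original.copy()                       # copy method
c3 = dict(original)                        # dict constructor
c4 = {k: v for k, v in original.items()}   # dictionary comprehension

# Deep copy: nested objects are recursively duplicated as well
c5 = copy.deepcopy(original)

original["a"].append(99)
print(c1["a"])  # [1, 2, 99] -- shallow copy shares the inner list
print(c5["a"])  # [1, 2]     -- deep copy does not
```

Mutating the shared inner list is visible through every shallow copy, which is exactly why deepcopy exists despite being the slowest option.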
December 2, 2024 at 8:28 AM
Do we have any data engineers in the #buildinpublic community? If so, what are you building?
December 1, 2024 at 9:07 PM
What is your go to analogy for explaining what a data engineer does?

Check the 🧵 for the one that I use from time to time.
December 1, 2024 at 7:48 PM