Bijil Subhash
@bijilsubhash.bsky.social
Data Engineer, Recovering Academic, and Entrepreneur | bijilsubhash.io | Sydney, Australia
Databricks vs Fabric feels a lot like Pied Piper vs Nucleus. Fans of the Silicon Valley show will get the reference :)

#databs #databricks #fabric
March 8, 2025 at 10:03 PM
(1/3) Among programming languages, I consider #Python to be a relatively easy-to-learn language, opening doors for many to start coding without formal training. However, this also results in some poorly written, unmaintainable, and non-extensible code: the infamous spaghetti code.
February 18, 2025 at 4:00 PM
Just finished watching the dbt team's webinar introducing SDF. After seeing SDF in action, I have to admit that I am really looking forward to the future of the dbt engine. I was wondering when dbt was going to bring notable changes to the developer experience, and this might be it.

#databs
February 1, 2025 at 4:00 PM
I was speaking with someone who went all in on promoting duckdb to their clients. I did not get a chance to ask what exactly they are doing with duckdb. But I am curious to understand how duckdb is utilised in modern data pipelines.
January 21, 2025 at 4:00 PM
(1/4) SDF acquisition by dbt

If you work in data, you probably would have come across a version of this headline this past week. A small disclaimer: I have not used SDF, nor do I have a solid understanding of the tech that sits behind it, so take what I say with a grain of salt.
January 18, 2025 at 4:00 PM
(1/2) Maybe an unpopular opinion: SQL is a powerful language and, despite what anyone says, it is unlikely to be replaced by an LLM, at least not with the models we have today. LLMs are powerful and can be leveraged to generate ideas or as a tool to unblock yourself when you are stuck.

#databs
January 16, 2025 at 4:00 PM
(1/3) Continuing from my previous thread on infrastructure as code for managing #Databricks. I have recently had the pleasure of working with an open source tool called Laktory, an abstraction that sits on top of Terraform/Pulumi to manage your Databricks workflows using YAML.

#databs
January 14, 2025 at 4:00 PM
A default approach that I take when it comes to data modelling. It works because OBT is optimized for the modern vectorized data warehouses. At the same time, the underlying data is modelled using established best practices from Kimball.
Really like the approach. OBT as a set of “views” (whether materialized as tables or not) that give business users / BI tools access with fewer joins and less db understanding required, while developers are able to think in terms of Kimball facts and dims.

#dataBS

www.brooklyndata.co/ideas/2025/0...
Our Hybrid Kimball & OBT Data Modeling Approach
We use a hybrid of Kimball’s Dimensional Modeling and One Big Table (OBT) to model our clients’ data — learn about both methods and why we combine them.
www.brooklyndata.co
January 11, 2025 at 8:38 PM
(1/2) Infrastructure as code (IaC) is ubiquitous in the data space. That being said, I have stayed away from doing any IaC work for as long as I can remember, mainly due to its aura of being difficult and also because I could pass the ball to the platform team.

#datasky #databs
January 10, 2025 at 1:44 AM
(1/4) What do you use for #data ingestion?

It's true that there is no shortage of tools when it comes to data ingestion. But before you open your wallet to one of the many options out there, it might be worth doing thorough due diligence based on your current and future needs.

#databs
January 7, 2025 at 4:00 PM
I have been binging on the early chapters of the new book from @joereis.bsky.social on data modeling. I haven't consumed a lot of material on this topic besides Kimball, but this one is a must-read if you work with data in a modern context. Looking forward to the official release in 2025!

#databs
January 4, 2025 at 4:00 PM
(1/5) dbt core vs dbt cloud

dbt has been a game changer for many #data teams, mainly for writing reusable and version-controlled transformation logic. We are also witnessing an explosion of tools that want to become the next #dbt. Which is better, dbt core or dbt cloud?

#databs
January 2, 2025 at 4:00 PM
Reposted by Bijil Subhash
Buckle up because we're banging into the new year with my annual retrospective of the last year in databases! Highlights include license change blowback, Databricks vs. Snowflake gangwar, @duckdb.org's shotgun weddings, and buying a quarterback to impress your lover: www.cs.cmu.edu/~pavlo/blog/...
Databases in 2024: A Year in Review
Andy rises from the ashes of his dead startup and discusses what happened in 2024 in the database game.
www.cs.cmu.edu
January 1, 2025 at 2:02 PM
(1/5) What is Unity Catalog (UC) in the context of #databricks?

The standard definition you get is that it is a unified governance solution built into Databricks. It is accurate, but it was not intuitive to me when I started building on UC. See 🧵 for some additional context on UC.

#databs
December 31, 2024 at 4:00 PM
(1/2) Autoloader is without doubt one of my favourite features in Databricks!

In a nutshell, it is an abstraction that simplifies incremental ingestion by monitoring files as they arrive in cloud storage, supporting reliable, resilient, and cost-efficient data pipelines.
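In code, Auto Loader is exposed through the `cloudFiles` streaming source. A minimal sketch of the idea, assuming a Databricks runtime (the paths and table name below are placeholders, not from the post):

```
# Auto Loader sketch: incrementally pick up new files landing in cloud storage.
# Requires a Databricks cluster; `spark` is the session provided by the runtime.
df = (
    spark.readStream
    .format("cloudFiles")                                 # Auto Loader source
    .option("cloudFiles.format", "json")                  # format of incoming files
    .option("cloudFiles.schemaLocation", "/tmp/schema")   # schema inference + evolution state
    .load("/mnt/landing/events")                          # monitored storage path
)

(
    df.writeStream
    .option("checkpointLocation", "/tmp/checkpoint")      # tracks which files were processed
    .trigger(availableNow=True)                           # process the backlog, then stop
    .toTable("bronze.events")
)
```

The checkpoint is what makes ingestion incremental: files already seen are skipped on the next run, so reruns are cheap and idempotent.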

#databs
December 28, 2024 at 8:31 PM
If you are an aspiring data engineer, do yourself a favour and learn the basics of data modelling, amongst other fundamentals (Python and SQL), before jumping into whatever tool is in the headlines.
December 27, 2024 at 9:35 PM
Great write up on LLM frameworks.

www.anthropic.com/research/bui...

Without attaching a name to it, I have tested all except the agent workflow in 2024.

Currently running an orchestrator-worker, prompt chaining, and routing workflows across a handful of projects in production.
Building effective agents
A post for developers with advice and workflows for building effective AI agents
www.anthropic.com
December 22, 2024 at 8:59 PM
I am sure some of us can relate to this.

#dataengineer #data #dataarchitecture
December 4, 2024 at 2:26 AM
Five ways to copy a dictionary in Python:

- Unpacking with ** into a dict literal*
- Using the copy method*
- Using the dict constructor*
- Dictionary comprehension*
- Using deepcopy (from the copy module)

The first three have similar performance, followed by dictionary comprehension, and finally deep copy.

*shallow copy
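The five options above, side by side, in a small sketch that also shows the shallow-vs-deep distinction (the dict contents are illustrative):

```python
import copy

original = {"a": [1, 2], "b": 3}

# Shallow copies: the top-level dict is new, but nested objects are shared
c1 = {**original}                          # unpacking into a dict literal
c2 = original.copy()                       # copy method
c3 = dict(original)                        # dict constructor
c4 = {k: v for k, v in original.items()}   # dictionary comprehension

# Deep copy: nested objects are recursively duplicated as well
c5 = copy.deepcopy(original)

original["a"].append(99)
print(c1["a"])  # [1, 2, 99] -- shallow copy shares the inner list
print(c5["a"])  # [1, 2]     -- deep copy does not
```

Mutating the shared inner list is visible through every shallow copy, which is exactly why deepcopy exists despite being the slowest option.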
December 2, 2024 at 8:28 AM
Do we have any data engineers in the #buildinpublic community? If so, what are you building?
December 1, 2024 at 9:07 PM
What is your go to analogy for explaining what a data engineer does?

Check the 🧵 for the one that I use from time to time.
December 1, 2024 at 7:48 PM