braaannigan.bsky.social
@braaannigan.bsky.social
Polars has built-in date/datetime/duration functions. I use them a lot because they have a consistent API across Python versions, and the syntax for working with timezones is a lot easier to remember than Python datetimes!
September 26, 2025 at 10:03 AM
Polars has neat built-in approaches for casting common string datetime formats these days. So long, .str.strptime followed by some format pattern I could never remember!
September 25, 2025 at 3:32 PM
Need to find performance bottlenecks? Then pyinstrument is an excellent tool. Recently it showed me that my pipeline runs weren't slow because of my data - it was because I was re-authenticating to AWS on every run. You get a nice visual output which makes it easy to spot the laggards
September 8, 2025 at 10:03 AM
New blog post from NVIDIA and Polars showing how you can process datasets too large to fit in GPU memory (link below). For a single GPU it may be best to use the spill-to-system-memory approach, while for multi-GPU setups there is a new streaming-engine approach
July 2, 2025 at 1:02 PM
As projects mature you will want to invest in a tool to validate the schema and data in your dataframes. This blog post sets out a good summary on the different options for Polars users: https://posit-dev.github.io/pointblank/blog/validation-libs-2025/
June 18, 2025 at 10:03 AM
PyPI download stats work in mysterious ways. In the last few months Polars exhibited slow, steady growth. Then, basically overnight, downloads almost doubled and became much more variable. Why?
June 9, 2025 at 11:31 AM
Using pytest with Polars? When there's an error the default traceback is often very long and you have to scroll through a lot to get to the relevant part. You can make it snappier by passing --tb=short to your pytest command to get to the point!
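If you want short tracebacks as the default rather than typing the flag every time, one option is a pytest config file (a sketch; any of pytest's config locations works):

```ini
# pytest.ini
[pytest]
addopts = --tb=short
```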
May 1, 2025 at 12:31 PM
You can add a new column to a Polars DataFrame at a specified index position with insert_column. Your data needs to be a Polars Series first
May 1, 2025 at 10:03 AM
We can handle tricky JSON with Polars nested dtypes.

Here the data is a list of dicts, but each row also contains an inner list of dicts. We deal with this by exploding the inner list of dicts to get each entry on its own row, then unnesting the inner dicts so each field becomes its own column
April 30, 2025 at 9:12 AM
One thing to be careful of in Polars is using pl.when.then in cases where it isn't needed, as Polars pre-calculates all of the possible paths. It may be that a pl.when.then can be replaced by a join or replace_strict. This query, for example, is 5x faster as a join
April 22, 2025 at 9:02 AM
Polars has native support for nested data types - it's a long way from object columns with Python dictionaries in Pandas. Native support means Polars has an API built to work with nested data and a query engine that can do vectorized transformations on nested data
April 2, 2025 at 10:02 AM
One tool I use a lot these days is token-count. I use it to check how many tokens there are in one or more files before adding them to model context. It's a command line tool that can be pip installed. In this example we see that there are 300k tokens in just one Polars crate!
April 1, 2025 at 8:01 AM
Frantically trying to finish my Polars LLM evals experiments before my online event on Wednesday. I'll be evaluating which models work best for Polars and how you can prompt-engineer your way to even better results. Deepseek-v3 is the hot (and cheap) new entrant!
March 30, 2025 at 9:01 AM
You can change display properties for Polars with pl.Config settings. In the snippet below I switch to markdown formatting. This can be very handy - in JIRA, for example, a dataframe in markdown format renders as a nice table rather than a mess of data
March 28, 2025 at 10:45 AM
You can set a default engine for Polars instead of specifying it in every .collect statement. You do this with the POLARS_ENGINE_AFFINITY env var. The options are in-memory (default), streaming or gpu. If your query isn't supported with the last 2 then it reverts to in-memory
March 26, 2025 at 10:02 AM
We can make a column based on if-elif-else in Polars with when.then.otherwise. The trick is that we can chain together as many when.thens as we need.

In this example we classify under 18 as a child, 18-64 as working age and over 64 as retired (as if any of us will retire at 65 😭)
March 25, 2025 at 9:18 AM
The new Polars streaming engine does lots of async stuff internally, but you can also execute lazy Polars queries asynchronously. You await the query with the collect_async method inside an async function
March 24, 2025 at 10:01 AM
Big changes afoot in Polars: you now specify how you want to evaluate lazy queries with the engine parameter to e.g. .collect(). The options are:
- auto (default, =in-memory)
- in-memory 
- streaming
- gpu
March 20, 2025 at 10:10 AM
If loading data from a database is a bottleneck for you I recommend reading this very accessible paper (link below). The authors show that there is a lot of low-hanging fruit to speed up database queries and how they addressed this in connectorx - which is the default DB parsing engine for Polars
March 19, 2025 at 12:02 PM
Polars doesn't validate a lazy query as we write it - instead this typically happens when we call .collect() to trigger execution. This validation is normally fast as it is checking logic rather than data.

If we want to validate without executing we can call .collect_schema()
March 19, 2025 at 10:02 AM
Polars looks for optimizations everywhere. Take this query where we add a column with the mean price and a column with the difference from the mean price. Polars identifies that we use the same sub-expression twice, so it calculates it once and caches the output for re-use
March 18, 2025 at 10:02 AM
If you are doing LLM experiments with claude/openai/local models/... I recommend using the LLM plugins (link below). They provide you with A) a consistent interface across models and B) automatic caching of your prompts and the model responses - helpful if you do many iterations
March 17, 2025 at 10:06 AM
Periodic reminder that Polars (and Arrow) store null counts at all times so you can just look them up. You should never need some kind of is_null().sum() operation!
March 14, 2025 at 11:02 AM
The Gemma 3 open model released by Google today is an interesting option for Polars, especially as the training cutoff for these Google models is obviously more recent than the other big players'. I can run the 27B on my MacBook with 36 GB of memory. Not perfect, e.g. it doesn't use expressions enough
March 12, 2025 at 12:30 PM