Author | Lightnews

Data Elixir

@dataelixir.com

ClickHouse solved Advent of Code 2025 puzzles using single ClickHouse queries. No UDFs, no temp tables, no preprocessing. Just pure SQL doing things SQL probably shouldn't do. Impressive and slightly cursed in the best way.

Solving the "Impossible" in ClickHouse: Advent of Code 2025

At ClickHouse, we don't like the word "impossible." We believe that with the right tools, everything is a data problem. To prove it, we decided to complete the 2025 Advent of Code unconventionally:…

clickhouse.com

January 10, 2026 at 1:37 AM

Data Elixir

@dataelixir.com

If you're wondering where database tech is really headed in 2026, Andy Pavlo's annual review cuts through the noise. Sharding, Postgres evolution, and hot takes you won't find in vendor blogs.

Databases in 2025: A Year in Review

The world tried to kill Andy off but he had to stay alive to talk about what happened with databases in 2025.

www.cs.cmu.edu

January 9, 2026 at 2:01 AM

Data Elixir

@dataelixir.com

MarkItDown (Microsoft) converts a variety of file types to Markdown optimized for LLMs. Handles structure, not just text. Pairs well with Kreuzberg for extraction (supports 50+ file formats!). If you're building RAG systems, these belong in your stack.

Top Python libraries of 2025

Explore our 11th annual Top Python Libraries roundup, featuring two curated Top 10 lists for General Use and AI / ML / Data tools that matter today.

tryolabs.com

January 6, 2026 at 1:37 AM

Data Elixir

@dataelixir.com

jax-js compiles NumPy-style array code into WebAssembly and WebGPU kernels that run entirely client-side. No server, no dependencies, just JAX's programming model in your browser. This changes what's possible for interactive ML demos.

jax-js: an ML library for the web

JAX in pure JavaScript, as a flexible machine learning library and compiler.

ss.ekzhang.com

January 3, 2026 at 3:34 PM

Data Elixir

@dataelixir.com

Jan Van Haaren's 2025 soccer analytics review is a solid reference if you're doing applied ML in sports or want to see how domain experts handle sequential data. Covers spatio-temporal models, graph methods, Bayesian forecasting, and tracking data metrics. janvanhaaren.be/posts/soccer...

Soccer Analytics 2025 Review – Jan Van Haaren

Collection of the soccer analytics content that I liked the most in 2025!

janvanhaaren.be

January 2, 2026 at 8:57 PM

Data Elixir

@dataelixir.com

ML progress isn't driven by elegant theory. It's benchmarks, leaderboards, and engineering culture. In this post, Ben Recht explores why empirical testing beats clean math in practice and why that tension defines the field.

Benchmark Studies

It is impossible to disentangle technical innovation from technical debt

www.argmin.net

December 20, 2025 at 4:02 PM

Data Elixir

@dataelixir.com

The paradox: show a complex graph and execs check their phones. Show a simple one and they demand endless breakdowns. The solution isn't more or less data. It's recentering discussions on the actual decision at hand. methodmatters.github.io/true-stories...

True Stories from the (Data) Battlefield – Part 1: Communicating About Data

A blog about data science, statistics, and data analysis with open-source software.

methodmatters.github.io

December 19, 2025 at 3:47 PM

Data Elixir

@dataelixir.com

Your browser can run Python (Pyodide), execute OCR on PDFs, crop videos, and call LLM APIs—all without uploading anything to a server. The localStorage + CORS pattern makes surprisingly powerful tools possible with zero backend infrastructure.

Useful patterns for building HTML tools

I’ve started using the term HTML tools to refer to HTML applications that I’ve been building which combine HTML, JavaScript, and CSS in a single file and use them to …

simonwillison.net

December 18, 2025 at 3:37 PM

Data Elixir

@dataelixir.com

Tired of charts that hide the story? Density plots + percentile intervals reveal what averages can't: full variability, extremes, historical context. Nine examples comparing Spain's temperature data show how geometry choice changes what readers understand.

Broken Chart: discover 9 visualization alternatives

Researcher in climate science at MBG-CSIC

dominicroye.github.io

December 17, 2025 at 6:04 PM

Data Elixir

@dataelixir.com

Fisher arbitrarily chose p<0.05 a century ago and we've just... kept it. The problem: calling it "arbitrary" only works if you can suggest something less arbitrary. No one has, so here we are circling p-values close to 0.05 like it means something.

vilgot-huhn.github.io/mywebsite/po...

December 16, 2025 at 1:13 PM

Data Elixir

@dataelixir.com

Haskell for data science? dataHaskell adds dataframes, NSE-style column operations, and compiler optimizations that turn chained operations into single-pass computations. Immutability + strong types + functional composition might be the combo we've been missing. jcarroll.com.au/2025/12/05/h...

Haskell IS a Great Language for Data Science

I’ve been learning Haskell for a few years now and I am really liking a lot of the features, not least the strong typing and functional approach. I thought it was lacking some of the things I missed…

jcarroll.com.au

December 15, 2025 at 7:48 PM

Data Elixir

@dataelixir.com

Learning SQL is like learning a foreign language: you need to read more variations than you'll actually write. Learn disciplined canonical syntax for your own queries, but understand the messy dialects others use.

A modern guide to SQL JOINs

There are many SQL JOINs guides and tutorials, but this one takes a different approach. We try to avoid misleading wording and imagery, and we structure the material in a different way. The goal of…

kb.databasedesignbook.com

December 5, 2025 at 2:01 AM

Data Elixir

@dataelixir.com

Side projects still open more doors than traditional applications in data roles. The trick: build small, interesting things that signal skills and make your work discoverable. Especially relevant given the current job market.

Make Things, Tell People

On side projects and finding work

presentofcoding.substack.com

December 1, 2025 at 1:37 PM

Data Elixir

@dataelixir.com

Most time-to-event metrics are broken. Amazon thought customer support wait times were under 1 min. Bezos called in: 10+ minutes. Why? They only measured customers who stayed on hold long enough to be served. The ones who hung up? Never counted. www.counting-stuff.com/why-you-are-...

Why You Are (Probably) Measuring Time Wrong: Why do we need to use Survival Analysis more

Author: Michał Chorowski [Hey everyone! A guest post from Michał this week! This newsletter is always willing to host/share data-related content, so if you've created anything you'd like to share,…

www.counting-stuff.com

November 29, 2025 at 4:02 PM
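
A minimal sketch of the measurement gap described in the post above, using made-up wait times (none of these numbers come from Amazon or the linked article). It compares the naive average over served callers with a hand-rolled Kaplan-Meier estimate that treats hang-ups as censored observations instead of dropping them. The function name and data are illustrative assumptions.

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] for each time t at which at least one event occurred.

    durations: how long each caller was on hold (minutes).
    observed:  True if the caller was served (event seen),
               False if they hung up first (censored).
    """
    records = sorted(zip(durations, observed))
    at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        events = 0
        leaving = 0
        # Group everyone who left the queue at the same time t.
        while i < len(records) and records[i][0] == t:
            events += 1 if records[i][1] else 0
            leaving += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical data: four callers were served quickly, six gave up.
waits = [1, 1, 1, 2, 5, 8, 10, 12, 15, 20]
served = [True, True, True, True, False, False, False, False, False, False]

# Naive metric: average only over callers who were actually served.
naive_mean = sum(w for w, s in zip(waits, served) if s) / sum(served)

# Survival view: share of callers still waiting after each served time.
curve = kaplan_meier(waits, served)

print(naive_mean)
for t, s in curve:
    print(t, round(s, 3))
```

In this toy data the naive mean is 1.25 minutes, while the survival curve flattens at roughly 0.6: about 60% of callers were still on hold after two minutes and were never seen being served.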

Data Elixir

@dataelixir.com

Hot take: Your analysts doing "some basic data engineering" is killing your analytics function. The MTA hired 5 dedicated data engineers and it unlocked everything else. Stop asking data scientists to maintain pipelines. www.mta.info/article/less...

Lessons learned in starting a central data team

Learn how the MTA succeeded in setting up a central data team and a general purpose, cloud-based platform for data analytics.

www.mta.info

November 29, 2025 at 3:47 AM

Data Elixir

@dataelixir.com

Sometimes the best polars pattern is knowing when to exit the DataFrame. partition_by() splits data into a dict of frames, letting you process with list comprehensions. Cleaner than forcing everything through map_groups() when further wrangling isn't needed.

Python Rgonomics: User-defined functions in polars | Emily Riederer

Polars provides a consistent API for conducting transformations against a DataFrame. But what do you do when you need to apply a user-defined function beyond the native API? This post surveys the…

www.emilyriederer.com

November 24, 2025 at 1:37 PM

Data Elixir

@dataelixir.com

Most devs treat AI coding agents like infinite context machines. Reality: a 200k token window fills fast. The /compact feature is a trap. Better approach: /clear + document state in markdown, then resume. Treat context like disk space. You need a cleanup strategy.

How I Use Every Claude Code Feature

A brain dump of all the ways I've been using Claude Code.

blog.sshh.io

November 23, 2025 at 4:15 PM

Data Elixir

@dataelixir.com

Most modern dimensionality reduction (t-SNE, UMAP, Isomap) shares a pattern: represent data as a graph capturing local similarity, then embed to preserve that structure. It's graphs all the way down.

A Visual Introduction to Dimensionality Reduction with Isomap

"To deal with hyper-planes in a 14-dimensional space, visualize a 3D space and say 'fourteen' to yourself very loudly. Everyone does it." - Geoffrey Hinton

alechelbling.com

November 22, 2025 at 3:47 AM

Reposted by Data Elixir

Maria Antoniak

@mariaa.bsky.social

I curated some readings for class on "data tensions" and the list felt worth sharing. Come on a tour of datasets, books, the web, and AI with me...

We'll start with this piece on the Google Books project: the hopes, dreams, disasters, and aftermath of building a public library on the internet.

1/n

Torching the Modern-Day Library of Alexandria

“Somewhere at Google there is a database containing 25 million books and nobody is allowed to read them.”

www.theatlantic.com

November 14, 2025 at 4:39 PM

Data Elixir

@dataelixir.com

Everyone's rushing to pgvector for "simple" vector search in Postgres. This reality check shows what actually happens at scale: indexing nightmares and performance walls. Simple isn't always sustainable in production.

The Case Against pgvector | Alex Jacobs

What happens when you try to run pgvector in production and discover all the things the blog posts conveniently forgot to mention

alex-jacobs.com

November 14, 2025 at 2:01 AM

Data Elixir

@dataelixir.com

When healthcare becomes algorithmic, what gets optimized out? This Guardian essay asks the hard question about AI spreading through diagnostics and therapy: are we trading care quality for efficiency without realizing the cost?

What we lose when we surrender care to algorithms | Eric Reinhart

A dangerous faith in AI is sweeping American healthcare – with consequences for the basis of society itself

www.theguardian.com

November 13, 2025 at 4:16 AM

Data Elixir

@dataelixir.com

Thinking Machines Lab solved a problem everyone accepted as unsolvable: LLM nondeterminism at temperature 0. Same prompt, same model, 1000 runs → 80 different outputs. With batch-invariant kernels? Bitwise identical every time. Open sourced. www.distributedthoughts.org/will-i-make-...

Will I Make It To The Restaurant Before The Soup Dumplings Get Cold? (And Other Problems In Machine Learning)

I'm chronically late. Not because I want to be rude - I feel terrible about it every single time - but because I'm catastrophically bad at predicting how long it takes to get anywhere. Turns out…

www.distributedthoughts.org

November 8, 2025 at 3:47 AM
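
A toy illustration of why results can change when only the batching changes. It assumes the usual explanation that floating-point addition is not associative, so a reduction split into different chunk sizes can round differently. This is plain Python, not the batch-invariant kernels mentioned in the post, and the helper name is hypothetical.

```python
def reduce_in_chunks(values, chunk_size):
    """Sum values chunk by chunk, then sum the partial results.

    Explicit loops are used instead of the built-in sum(), which applies
    compensated summation to floats on newer Python versions.
    """
    partials = []
    for start in range(0, len(values), chunk_size):
        acc = 0.0
        for v in values[start:start + chunk_size]:
            acc += v
        partials.append(acc)
    total = 0.0
    for p in partials:
        total += p
    return total


# Same four numbers, two different chunkings of the same reduction.
values = [1e16, 1.0, -1e16, 1.0]
one_pass = reduce_in_chunks(values, 4)
two_chunks = reduce_in_chunks(values, 2)

# The classic three-term version of the same effect.
left_first = (0.1 + 0.2) + 0.3
right_first = 0.1 + (0.2 + 0.3)

print(one_pass, two_chunks)
print(left_first == right_first)
```

The inputs are identical in both calls; only the grouping differs, and the two totals disagree with each other (and with the exact answer, 2). A kernel whose chunking depends on batch size can show the same kind of drift, which is the behavior a batch-invariant design is meant to rule out.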

Data Elixir

@dataelixir.com

Most marketplaces have SKUs. Etsy has 100M+ unique items with no standard attributes. How do you build filters when one listing is a "porcelain sculpture that looks like a t-shirt" and dimensions live in random photo text? www.etsy.com/codeascraft/...

www.etsy.com

November 6, 2025 at 3:37 AM

Data Elixir

@dataelixir.com

GeoUtil converts between GeoJSON, TopoJSON, Shapefile, KML, WKT, and CSV without touching a server. TopoJSON compression alone cuts file sizes 80%+ while preserving topology. All free, all browser-based. geoutil.com

GeoUtil — Free Online Map & Geography Tools

All-in-one online geography toolkit. Measure distance & area, convert GeoJSON, TopoJSON, JSON, merge or minify files, and more — fast, free, and browser-based.

geoutil.com

October 31, 2025 at 2:47 PM

Data Elixir

@dataelixir.com

Debugging constraint problems is backwards: remove constraints until something works, then figure out what broke. No stack traces, just an "unsatisfiable." Forces you to think differently about what you're actually asking the system to solve. www.righto.com/2025/10/solv...

Solving the NYTimes Pips puzzle with a constraint solver

The New York Times recently introduced a new daily puzzle called Pips. You place a set of dominoes on a grid, satisfying various condition...

www.righto.com

October 30, 2025 at 3:04 PM
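
A rough sketch of the "remove constraints until something works" workflow from the post above. It uses a brute-force search over a tiny hypothetical problem rather than a real constraint solver; the constraint names, the domain, and both helper functions are assumptions made for illustration.

```python
from itertools import combinations, product

# Hypothetical toy problem: choose x and y from 0..3.
constraints = {
    "sum_is_4": lambda x, y: x + y == 4,
    "x_less_than_y": lambda x, y: x < y,
    "x_at_least_2": lambda x, y: x >= 2,
    "y_at_most_3": lambda x, y: y <= 3,
}


def satisfiable(active):
    """True if some (x, y) in the domain passes every active constraint."""
    return any(
        all(check(x, y) for check in active.values())
        for x, y in product(range(4), repeat=2)
    )


def smallest_relaxations(all_constraints):
    """Return every smallest set of constraints whose removal makes the rest satisfiable."""
    names = list(all_constraints)
    for k in range(len(names) + 1):
        found = []
        for dropped in combinations(names, k):
            kept = {n: c for n, c in all_constraints.items() if n not in dropped}
            if satisfiable(kept):
                found.append(set(dropped))
        if found:
            return found
    return []


print(satisfiable(constraints))
print(smallest_relaxations(constraints))
```

Here the full problem is unsatisfiable, and dropping any one of the first three constraints fixes it while dropping `y_at_most_3` does not. That points at the three-way interaction as the thing to re-examine, which is about as close to a stack trace as this style of debugging gets.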
