Lightnews — Scholar-powered news

Gabor Szarnyas

@szarnyasg.org

Thanks! All the fiddling with SVGs was worth it! 🙌

November 17, 2025 at 9:34 PM

Gabor Szarnyas

@szarnyasg.org

DuckDB does not support GraphQL. GraphQL itself is a bit of a misnomer, as it is not a full-fledged graph query language: it's primarily intended to query REST endpoints. GQL and SQL/PGQ are full-fledged graph query languages, supporting both pattern matching and path finding.

October 24, 2025 at 7:31 AM

Gabor Szarnyas

@szarnyasg.org

I work at DuckDB Labs so obviously I am biased but this really looks like a prime use case for @duckdb.org

Last year I reimplemented a lot of the cut / awk / csvkit examples of the “Data Science at the Command Line” book in DuckDB and got good results:

szarnyasg.org/posts/data-s...

Data Science at the Command Line Book in DuckDB

Today I solved the exercises in Chapter 5 of the Data Science at the Command Line book using the DuckDB command line client. This page documents my solutions. Prerequisites Clone the https://github.co...

szarnyasg.org

May 16, 2025 at 12:13 PM

Gabor Szarnyas

@szarnyasg.org

I don't think there is such a test in DuckDB at the moment. You'd have to look at the binary code with a disassembler and try to find vector instructions.

May 9, 2025 at 8:30 AM

Gabor Szarnyas

@szarnyasg.org

It would be an interesting experiment to try to make use of RISC-V RVV but I'm not aware of any attempts.

To ensure portability, the engine in the official DuckDB code base doesn't contain any platform-specific code. So it's up to the compilers to auto-vectorize the code.

May 8, 2025 at 10:12 AM

Gabor Szarnyas

@szarnyasg.org

Here is a query making use of prefix aliases in all three clauses:

SELECT
    "Station name": s.name_short,
    "Max distance": max(d.distance)
FROM s: 's3://duckdb-blobs/stations.parquet'
JOIN d: 's3://duckdb-blobs/distances.parquet'
    ON d.station1 = s.code
GROUP BY ALL
ORDER BY "Max distance" DESC;

February 25, 2025 at 3:04 PM

Gabor Szarnyas

@szarnyasg.org

I recently added your instructions for building DuckDB on RISC-V to the DuckDB documentation: duckdb.org/docs/dev/bui...

Thanks for the great work on this!

Unofficial and Unsupported Platforms

Warning The platforms listed on this page are not officially supported. The build instructions are provided on a best-effort basis. Community contributions are very welcome. DuckDB is built and distri...

duckdb.org

February 21, 2025 at 8:10 PM

Gabor Szarnyas

@szarnyasg.org

I don't think this is possible at the moment. I would go the other route and try to do unnest and join. To save memory, you could peel away the nested column (CREATE TEMP TABLE tmp AS SELECT column FROM original_table), and do the unnest and join on this table, then join it back to the original.

February 19, 2025 at 11:19 AM

Gabor Szarnyas

@szarnyasg.org

The list_reduce function iterates through the list and picks the correct category.

You can generalize this and put a MAP value into the list_reduce function to capture the mapping, then do exact matching on the MAP's keys. For more details, see list_reduce in the docs: duckdb.org/docs/sql/fun...

Lambda Functions

Lambda functions enable the use of more complex and flexible expressions in queries. DuckDB supports several scalar functions that operate on LISTs and accept lambda functions as parameters in the for...

duckdb.org

February 19, 2025 at 10:27 AM

Gabor Szarnyas

@szarnyasg.org

I ran into a similar problem recently when I needed to categorize posts according to their length:

– 0: 0 ≤ length < 40
– 1: 40 ≤ length < 80
– 2: 80 ≤ length < 160
– 3: 160 ≤ length

I came up with this:

list_reduce([0, 40, 80, 160], (acc, x, i) -> IF(x <= length, i - 1, acc)) AS category

February 19, 2025 at 10:27 AM
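
For readers who want to trace the list_reduce expression in the post above step by step, here is a minimal Python sketch of the same fold. It is not DuckDB code: the function name categorize is made up for illustration, and it assumes that the lambda's index i is the 1-based position of x and that the reduction starts with the first list element as the accumulator.

```python
from functools import reduce

# Lower bounds of the four length categories from the post.
THRESHOLDS = [0, 40, 80, 160]


def categorize(length, thresholds=THRESHOLDS):
    """Return the category index for a length (hypothetical helper)."""
    # Pair every threshold with its 1-based position (assumed meaning of i).
    indexed = list(enumerate(thresholds, start=1))
    # The accumulator starts as the first element (0), which doubles as
    # category 0; the fold then visits the remaining thresholds in order.
    return reduce(
        lambda acc, pair: pair[0] - 1 if pair[1] <= length else acc,
        indexed[1:],
        thresholds[0],
    )


if __name__ == "__main__":
    for n in (0, 39, 40, 79, 80, 159, 160, 1000):
        print(n, categorize(n))
```

The fold keeps overwriting the accumulator while the thresholds are still at or below the length, so the last matching threshold determines the category.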

Gabor Szarnyas

@szarnyasg.org

My post on DuckDB vs. wc received a lot of feedback. Based on this, I ran a few more experiments to see how DuckDB stacks up against parallelized wc and grep/ripgrep on Linux.

I wrote up my results in a blog post.

TL;DR: it depends but DuckDB is still pretty fast!
szarnyasg.org/posts/duckdb...

December 4, 2024 at 9:25 PM

Gabor Szarnyas

@szarnyasg.org

Oops, that's the difference of reading the CSV with or without its header. Well-spotted!

December 2, 2024 at 10:58 PM

Gabor Szarnyas

@szarnyasg.org

3) The ts command adds a timestamp at the beginning of each line. On macOS, it's available in the moreutils package on Homebrew.

November 30, 2024 at 7:50 PM

Gabor Szarnyas

@szarnyasg.org

2) A single sed command can include multiple search and replace pairs separated by semicolons. This makes sed commands *even less readable*, so use it with caution.

November 30, 2024 at 7:50 PM

Gabor Szarnyas

@szarnyasg.org

1) The bat tool – an alternative to cat – prints the newline characters if it's invoked with the -A switch. This output mode reveals whether a file is using CR/LF or LF newlines (or both).

November 30, 2024 at 7:50 PM
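
Where bat is not installed, the same question from tip 1 (does a file use CR/LF, LF, or both?) can be answered by counting byte sequences. The sketch below is a swapped-in Python alternative, not a description of how bat works; the function name newline_styles is made up for illustration.

```python
def newline_styles(data: bytes) -> dict:
    """Count CR/LF, bare LF, and bare CR line endings in raw bytes."""
    crlf = data.count(b"\r\n")
    # Every CR/LF pair also contains one LF and one CR, so subtract the pairs.
    lf = data.count(b"\n") - crlf
    cr = data.count(b"\r") - crlf
    return {"CRLF": crlf, "LF": lf, "CR": cr}


if __name__ == "__main__":
    sample = b"first\r\nsecond\nthird\r\n"
    print(newline_styles(sample))
```

To inspect a real file, read it in binary mode (open(path, "rb")) so that Python does not translate the newlines before they are counted.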
