Pavel Veselý
pavelvesely.bsky.social
Computer scientist at Charles University, Prague 🇨🇿 I like all kinds of efficient algorithms and data structures for large datasets || also ⛰️🇺🇦 https://iuuk.mff.cuni.cz/~vesely/
Reposted by Pavel Veselý
Pettiness

This one word best describes the act of one of this country's newly elected constitutional officials

Starting a term in office by taking the Ukrainian flag down from the building nicely illustrates what he is after. Not improving this country, but merely stirring up passions

What to do about it? I bought a drone
November 6, 2025 at 8:57 PM
During the next two months, I will give two long talks about streaming algorithms / data sketching for high-school students. Have you given a similar talk? What was your experience?
October 2, 2025 at 1:23 PM
Tomorrow at ESA: my former postdoc Nick Matsakis will present our streaming algorithm for diameter in high-dimensional spaces. Very simple: just 4 lines of pseudocode, and yet, achieving optimal approximation. Joint work with Magnús M. Halldórsson. arxiv.org/abs/2505.16720
Streaming Diameter of High-Dimensional Points
We improve the space bound for streaming approximation of Diameter but also of Farthest Neighbor queries, Minimum Enclosing Ball and its Coreset, in high-dimensional Euclidean spaces. In particular, o...
arxiv.org
September 16, 2025 at 8:44 PM
Reposted by Pavel Veselý
Pythagorean Triple Square Day, as one man affectionately calls 9/16/25, is a day like no other this century.
On 9/16/25, celebrate a date of mathematical beauty
n.pr
September 16, 2025 at 11:50 AM
Reposted by Pavel Veselý
Zstandard's --long range mode works wonders for assemblies, but needs uninterrupted single line sequences.

*AllTheBacteria 661k, multiline fasta*
gzip (pigz): 751GB
zstandard --long: 641GB (30% original size)

*Single line fasta*
gzip (pigz): 700GB
zstandard --long: 232GB (10% original size)
September 9, 2025 at 10:27 AM
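The single-line requirement in the post above can be met by unwrapping each FASTA record before compression. A minimal Python sketch, assuming plain-text FASTA input; the function name `unwrap_fasta` is hypothetical and does not come from the original post:

```python
from typing import Iterable, Iterator


def unwrap_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with each record's sequence joined onto one line."""
    chunks = []  # sequence fragments of the record currently being read
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record, if it had any sequence.
            if chunks:
                yield "".join(chunks)
                chunks = []
            yield line
        elif line:
            # Skip blank lines; collect wrapped sequence fragments.
            chunks.append(line)
    if chunks:
        yield "".join(chunks)
```

The generator streams record by record, so memory use is bounded by the longest single sequence rather than by the whole file.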
Reposted by Pavel Veselý
🌎👩‍🔬 For 15+ years biology has accumulated petabytes (million gigabytes) of 🧬 DNA sequencing data 🧬 from the far reaches of our planet. 🦠🍄🌵

Logan now democratizes efficient access to the world’s most comprehensive genetics dataset. Free and open.

doi.org/10.1101/2024...
September 3, 2025 at 8:39 AM
Reposted by Pavel Veselý
At scale, the way that we store (and process) data matters! Many may think that the way we keep data, the file formats we adopt, and the way that we compress data are unimportant details, but they are, in fact, critical considerations to allow science to move forward at scale!
I can't believe the biggest bottleneck in my lab right now is scrounging to afford more storage for processing data, not sequencing costs, technique or analysis difficulty.
August 26, 2025 at 2:45 PM
Reposted by Pavel Veselý
Springer publishes a P ≠ NP "proof" and Eric Allender has words to say.

blog.computationalco...
Some thoughts on journals, refereeing, and the P vs NP problem
A guest post by Eric Allender prompted by an (incorrect) P ≠ NP proof recently published in Springer Nature's Frontiers of Computer Scie...
blog.computationalcomplexity.org
August 4, 2025 at 6:08 PM
Reposted by Pavel Veselý
A monumental collaborative effort with many incredible people ☺️ Proud to be part of this!
Hans-Peter Lehmann, Thomas Mueller, Rasmus Pagh, Giulio Ermanno Pibiri, Peter Sanders, Sebastiano Vigna, Stefan Walzer
Modern Minimal Perfect Hashing: A Survey
https://arxiv.org/abs/2506.06536
June 10, 2025 at 8:21 AM
Reposted by Pavel Veselý
Slides from my talk (with @kamilsjaron.bsky.social) on a history of k-mers in bioinformatics: rayan.chikhi.name/pdf/2025-kme...
June 3, 2025 at 9:25 AM
Reposted by Pavel Veselý
Nicely written blog post by David Eppstein on the Boyer–Moore (deterministic) streaming algorithm to find a majority element in a stream, and its extensions, first to the turnstile model, and then to frequency estimation (Misra–Gries).
11011110.github.io/blog/2025/05... via @theory.report
Turnstile majority
A famous algorithm of Boyer and Moore for the majority problem finds a majority element in a stream of elements while storing only two values, a single tenta...
11011110.github.io
May 6, 2025 at 1:30 PM
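The two insertion-only algorithms named in the post above, the Boyer–Moore majority vote and the Misra–Gries frequency summary, can be sketched in a few lines. This is a textbook-style illustration, not code from the linked blog post; the function names and the parameter `k` are assumptions chosen here, and the turnstile extension is not covered:

```python
from typing import Dict, Hashable, Iterable, Optional


def majority_candidate(stream: Iterable[Hashable]) -> Optional[Hashable]:
    """Boyer-Moore majority vote: keeps one candidate and one counter.

    If some element makes up more than half of the stream, it is returned.
    Otherwise the result is arbitrary, so a second pass is needed to verify.
    """
    candidate, count = None, 0
    for x in stream:
        if count == 0:
            candidate, count = x, 1
        elif x == candidate:
            count += 1
        else:
            count -= 1
    return candidate


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """Misra-Gries summary using at most k - 1 counters.

    Each stored count underestimates the true frequency by at most n / k,
    where n is the stream length; items not stored have frequency <= n / k.
    """
    counters: Dict[Hashable, int] = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # No free counter: decrement all and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

With `k = 2`, `misra_gries` keeps a single counter and behaves like the majority vote, which is one way to see the second algorithm as a generalization of the first.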
Reposted by Pavel Veselý
We finally concluded the meeting. Thanks to all attendees for their scientific contributions and for traveling (near or far) to the meeting! Thanks to the local organizers for the infrastructure and catering, and thanks to the co-organizers @yaronorenstein.bsky.social @camillemrcht.bsky.social!
April 25, 2025 at 8:18 AM
Reposted by Pavel Veselý
@pavelvesely.bsky.social (CSI) on the mother of SPSS: masked superstrings, which help you represent k-mer sets in a very compact way. He actually takes a lot from @brinda.eu since he squeezed 3 papers into a 12-min talk 😳
April 24, 2025 at 5:05 AM
Reposted by Pavel Veselý
🚀 Just 2 days to go!

Excitement is building for #RECOMBseq 2025 in Seoul 🎉
Join leading researchers as we dive into cutting-edge computational genomics, from single-cell to long-read sequencing.

🗓 April 24-25
📍 Seoul
📄 Program: recomb-seq.github.io/program/

#RECOMB2025 #Genomics #Bioinformatics
Program
RECOMB-seq is the RECOMB Satellite Conference on Biological Sequence Analysis
recomb-seq.github.io
April 22, 2025 at 11:57 AM
Reposted by Pavel Veselý
A decade ago, we had thousands of bacterial genomes. Now, we have millions. How to scale computational methods?

Our paper in @naturemethods.bsky.social answers this: use evolutionary history to guide compression and search.

rdcu.be/eg4OA

w/ @baym.lol, @zaminiqbal.bsky.social et al. 🧵1/
April 11, 2025 at 3:01 PM
Reposted by Pavel Veselý
So glad this is finally out. The method has been instrumental in allowing us to compress the AllTheBacteria data: ~2 million bacterial genomes shrink from 3 terabytes (gzipped) to 100 GB using phylogenetic compression. Great work by @brinda.eu
April 9, 2025 at 10:27 PM
Reposted by Pavel Veselý
The European Symposium on Algorithms 2025 will be held in Warsaw in September, as part of ALGO 2025. Do you have great work on design and analysis of algorithms? Submit it by April 23! algo-conference.org/2025/esa/
ESA – ALGO2025
algo-conference.org
April 8, 2025 at 2:45 PM
STOC in Prague! With FOCS deadline over, there's no excuse to postpone the registration.
acm-stoc.org/stoc2025/
April 4, 2025 at 7:00 AM
Reposted by Pavel Veselý
Taking a break from the submission season? Swing by the Workshop on Algorithms for Large Data (Online), WALDO 2025 🗓️ April 14–16: waldo-workshop.github.io/2025.html
Registration is free! (but necessary by April 7)
Workshop on Algorithms for Large Data (Online) 2025
waldo-workshop.github.io
April 4, 2025 at 6:44 AM
Reposted by Pavel Veselý
Aleksander Łukasiewicz, Jakub Tětek, Pavel Veselý
SplineSketch: Even More Accurate Quantiles with Error Guarantees
https://arxiv.org/abs/2504.01206
April 3, 2025 at 5:13 AM
Reposted by Pavel Veselý
The upcoming trade war with the penguins is a good excuse to mention the following fantastic book...
en.m.wikipedia.org/wiki/War_wit...
War with the Newts - Wikipedia
en.m.wikipedia.org
April 3, 2025 at 12:31 PM
Reposted by Pavel Veselý
The talks regarding Ukraine and Russia will fail. We have been there multiple times and this pattern will not suddenly change. Even the chances of a mere ceasefire, though not zero, are rather modest. And if one did occur, it would eventually fall apart, probably soon. It is inevitable.
March 31, 2025 at 11:29 AM
Reposted by Pavel Veselý
🚨 Late-Breaking Posters! 🚨

Missed the deadline? We’ve got you covered! Submit your poster abstracts for RECOMB-seq and showcase your work in sequence analysis & computational genomics.

🗓️ Late poster deadline: April 5, 23:59 AoE

Don't miss this last chance! 🚀 #RECOMBseq
March 25, 2025 at 10:46 AM
Open question for the CS community: What should we teach about modern parallel algorithms to 2nd-year CS undergrads? We teach sorting networks (bitonic sort) and boolean circuits (summation, multiplication), which is fine but does not feel very modern. What else to show about parallelism in, say, 2 lectures?
March 24, 2025 at 11:41 PM
Reposted by Pavel Veselý
I'm going to slowly repost my math notes from the other site🐦 here🦋; it's the only thing I posted over there that I think may have some long-term value and is worth not deleting.

These started out as notes for myself, but people seem to appreciate them. 😅

I'll keep track of all of them in this thread.
November 14, 2024 at 5:18 PM