Kenny Pavan
kennypavan.com
@kennypavan.com
Science, nature photos, mental-health advocate, and Star Trek 🖖

Bioinformatics | ML | PhD Candidate @ OHSU studying synaptic connectivity.

myOwnOpinions=True
Probably just the duck from DuckDB running a quick query. It'll be gone soon.
August 15, 2025 at 12:16 AM
We're excited to offer the first #SQL-based single-cell exploration and analysis tool. If you encounter anomalies or bugs, please report them on the AnnSQL GitHub issue board: github.com/ArpiarSaunde...
GitHub - ArpiarSaundersLab/annsql: The AnnSQL package enables SQL based queries on AnnData objects.
March 24, 2025 at 5:29 PM
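
For anyone new to the idea, here is a minimal sketch of what a SQL query over a cells-by-genes table can look like. It does not use AnnSQL or DuckDB: Python's built-in sqlite3 module stands in for the database engine, and the table name `X`, the columns `gene_1` and `gene_2`, and the counts are all made up for illustration. The real API is documented at docs.annsql.com.

```python
import sqlite3

# Stand-in only: sqlite3 replaces AnnSQL/DuckDB here.
# The table name, gene columns, and values are hypothetical.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE X (cell_id TEXT PRIMARY KEY, gene_1 REAL, gene_2 REAL)")
con.executemany(
    "INSERT INTO X VALUES (?, ?, ?)",
    [("cell_a", 0.0, 5.0), ("cell_b", 3.0, 1.0), ("cell_c", 7.0, 0.0)],
)

# Select the cells expressing gene_1 above a threshold, highest first.
expressing = con.execute(
    "SELECT cell_id, gene_1 FROM X WHERE gene_1 > 2 ORDER BY gene_1 DESC"
).fetchall()
print(expressing)  # [('cell_c', 7.0), ('cell_b', 3.0)]
```

The filter is a single declarative statement, and the engine decides how to scan the table.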
𝗦𝗽𝗲𝗲𝗱: AnnSQL was built for querying large datasets.

tl;dr: If your dataset is > ~100k cells or doesn't fit in memory, consider using AnnSQL. If your dataset is < ~100k cells or fits in memory, Scanpy or Seurat are the winners.
March 24, 2025 at 5:29 PM
𝗜𝗻𝘁𝗲𝗿𝗼𝗽𝗲𝗿𝗮𝗯𝗶𝗹𝗶𝘁𝘆: AnnSQL databases (.asql) can be shared and play nicely with a variety of languages, even R!

docs.annsql.com/interoperabi...
March 24, 2025 at 5:29 PM
𝗠𝗲𝗺𝗼𝗿𝘆: Using the #DuckDB in-process database engine behind the scenes, AnnSQL is respectful of memory. AnnSQL also provides a chunking parameter for all preprocessing methods, making it a powerful ally capable of processing massive datasets on a laptop.
March 24, 2025 at 5:29 PM
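
A rough sketch of the general idea behind chunking, for illustration only. This is not AnnSQL's implementation: the helper names `iter_chunks` and `total_counts_per_gene` and the `chunk_size` parameter are hypothetical. Rows are processed in fixed-size batches, so only one batch needs to be held in memory at a time.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def iter_chunks(rows: Iterable, chunk_size: int) -> Iterator[List]:
    """Yield successive batches of at most chunk_size rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def total_counts_per_gene(rows: Iterable[List[float]], n_genes: int,
                          chunk_size: int = 2) -> List[float]:
    """Sum counts per gene while holding only one chunk of cells at a time."""
    totals = [0.0] * n_genes
    for chunk in iter_chunks(rows, chunk_size):
        for row in chunk:
            for j, value in enumerate(row):
                totals[j] += value
    return totals


# Hypothetical data: five cells, two genes, generated lazily.
cells = ([float(i), float(2 * i)] for i in range(5))
gene_totals = total_counts_per_gene(cells, n_genes=2, chunk_size=2)
print(gene_totals)  # [10.0, 20.0]
```

A smaller `chunk_size` lowers peak memory at the cost of more passes through the loop.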
𝗗𝗼𝗰𝘂𝗺𝗲𝗻𝘁𝗮𝘁𝗶𝗼𝗻: Tutorials and API usage can be viewed at: docs.annsql.com
March 24, 2025 at 5:29 PM
𝗡𝗲𝘄 𝗙𝗲𝗮𝘁𝘂𝗿𝗲𝘀:
- Save raw layer
- Filter by cell & gene counts
- Highly variable gene selection
- PCA (experimental)
- Leiden clustering, UMAP, differential expression
- Various plotting utilities
March 24, 2025 at 5:29 PM
𝗣𝗿𝗲𝗽𝗿𝗶𝗻𝘁: In our updated preprint, we profile the runtime of filters vs. queries using AnnSQL, AnnData, and Seurat.

www.biorxiv.org/content/10.1...
March 24, 2025 at 5:29 PM
Happy to see lots of DuckDuckGoers. It's also worth considering Brave search. 🦁

search.brave.com
February 14, 2025 at 9:29 PM
Whoa, that's dedication 💪 Hope you find what you're looking for soon!
February 8, 2025 at 4:15 AM
Currently using ScEasy or SeuratDisk. Both are great tools and I've had some success with each; however, they can sometimes fail depending on the dataset and the layers present. The interoperability page you linked has some options I've yet to explore. Thank you for sharing!
February 5, 2025 at 4:26 AM
SQL is way underappreciated in #bioinfo. Our lab just dropped a preprint for a tool (AnnSQL) that uses DuckDB for single-cell analysis.

www.biorxiv.org/content/10.1...
AnnSQL: A Python SQL-based package for large-scale single-cell genomics analysis on a laptop
As single-cell genomics technologies continue to accelerate biological discovery, software tools that use elegant syntax and minimal computational resources to analyze atlas-scale datasets are increas...
January 28, 2025 at 3:53 AM
Usually I try to reduce it to an easily digestible analogy. Something like... Biology is like a giant puzzle and we use computers to try and put all the pieces together.
January 28, 2025 at 3:49 AM
Named after a sci-fi series whose most notable AI component, the Replicators, sought to consume humans and all other resources in the galaxy 👍
January 22, 2025 at 2:35 AM