Lightnews — Scholar-powered news

Florian Huber

@me-datapoint.bsky.social

2.1K followers 600 following 48 posts

Professor for data science at HSD, @zdd-hsd.bsky.social
| ML fan & critic | current research mostly #datascience, #machinelearning, #cheminformatics #dataviz #nlp | ✨ #openscience #openaccess #rse | living data point 🚲


Florian Huber

@me-datapoint.bsky.social

New #matchms release (0.31)🚀

With functionalities that were on our TODO list for a looooong time: Flash Entropy and BLINK scores! The new "FlashSimilarity" allows computing modified cosine, spectral entropy etc., about 100x faster (or more if you use Linux).

#Python #opensource #massspec

October 6, 2025 at 4:00 PM

Florian Huber

@me-datapoint.bsky.social

@jorainer.bsky.social and @philouail.bsky.social gave a great overview of the ecosystem around #RforMassSpectrometry and #XCMS!

#MetSoc25
I am super glad they now also provide options to combine with #Python and #matchms (thanks🙏)

June 26, 2025 at 9:32 AM

Florian Huber

@me-datapoint.bsky.social

Great keynote by @sneumann.bsky.social at #MetSoc25, strongly advocating for #opensource , data-sharing, and making things interoperable.

Glad to also spot #matchms in this universe :)

Slide from presentation of Steffen Neumann

June 25, 2025 at 7:35 AM

Florian Huber

@me-datapoint.bsky.social

4/4
We also highlight options for count fingerprints, such as log-counts and IDF-weighted counts. The latter can be used to adjust the bit importance to a dataset of your choice.

One example use case is chemical space visualization.

Preprint: www.biorxiv.org/content/10.1...

Chemical Space Visualizations using UMAP and various molecular fingerprints.

June 23, 2025 at 9:22 AM
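
The log-count and IDF weighting options mentioned in post 4/4 can be illustrated with a minimal sketch. This is not the code from the preprint: the sparse dict representation, the toy `dataset`, the function names, and the plain `log(N / df)` form of IDF are all assumptions made for illustration, and the preprint may define the weighting differently.

```python
import math
from collections import Counter

# Hypothetical toy data: each count fingerprint is a sparse dict {bit_id: count}.
dataset = [
    {1: 2, 5: 1},
    {1: 1, 7: 3},
    {1: 4},
]

def log_counts(fp):
    """Dampen large counts: count -> log(1 + count)."""
    return {bit: math.log1p(count) for bit, count in fp.items()}

def idf_weights(fingerprints):
    """Weight each bit by log(N / df), where df is the number of
    fingerprints containing that bit (assumed IDF variant)."""
    n = len(fingerprints)
    doc_freq = Counter(bit for fp in fingerprints for bit in fp)
    return {bit: math.log(n / df) for bit, df in doc_freq.items()}

def apply_idf(fp, weights):
    """Scale the counts of one fingerprint by dataset-specific bit weights."""
    return {bit: count * weights.get(bit, 0.0) for bit, count in fp.items()}

weights = idf_weights(dataset)
weighted = [apply_idf(fp, weights) for fp in dataset]
```

In this toy dataset, bit 1 occurs in every fingerprint and therefore receives weight zero, while the rarer bits 5 and 7 are weighted up. Because the weights are computed from a reference dataset, choosing a different dataset changes which bits count as important.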

Florian Huber

@me-datapoint.bsky.social

3/4
A huge issue is bit collisions.
Fingerprints with a high bit occupation (RDKit, MAP4) often lead to (1) arbitrary misinterpretations, (2) shifts to high Tanimoto scores, (3) very different handling of small and large molecules.

--> Consider using sparse fingerprints!
--> Morgan >> MAP4 / RDKit

June 23, 2025 at 9:22 AM
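
How bit collisions can shift Tanimoto scores upwards, as described in post 3/4, can be sketched with a toy example. The feature IDs below are made up, and folding by a simple modulo is a simplification; real fingerprint libraries hash substructures into bit positions in their own ways.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)

def fold(features, n_bits):
    """Fold sparse feature IDs into a fixed-size bit vector (simplified: modulo)."""
    return {feature % n_bits for feature in features}

# Hypothetical sparse feature IDs of two molecules that share no feature.
mol_a = {10, 1034, 5000}
mol_b = {77, 2058, 9096}

sparse_score = tanimoto(mol_a, mol_b)
folded_score = tanimoto(fold(mol_a, 1024), fold(mol_b, 1024))
```

Before folding, the two molecules have no feature in common and the score is 0. After folding to 1024 bits, unrelated features land on the same positions and the score rises to 2/3, purely as an artifact of the collisions. Keeping the fingerprint sparse avoids this effect.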

Florian Huber

@me-datapoint.bsky.social

2/4
We focused on weaknesses of the fingerprints.
Many show frequent duplicates, i.e., the same fingerprint for different compounds. Most problematic: this can include *very* different compounds ending up with identical fingerprints.

- MAP4 >> Morgan-type >> daylight
- count >> binary

#cheminformatics

Benchmarking plot on fingerprint duplications.

June 23, 2025 at 9:22 AM
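
A simple way to look for the fingerprint duplicates described in post 2/4 is to group compounds by their exact fingerprint. The sketch below is not the benchmarking code from the preprint: the compound names, the sparse dict representation, and the helper functions are hypothetical.

```python
from collections import defaultdict

def find_duplicates(fingerprints):
    """Return groups of compound names that share an identical fingerprint."""
    groups = defaultdict(list)
    for name, fp in fingerprints.items():
        key = tuple(sorted(fp.items()))  # hashable form of the sparse fingerprint
        groups[key].append(name)
    return [names for names in groups.values() if len(names) > 1]

def binarize(fp):
    """Drop the counts and keep only which bits are present."""
    return {bit: 1 for bit, count in fp.items() if count > 0}

# Hypothetical count fingerprints {bit_id: count} for three compounds.
count_fps = {
    "A": {1: 1, 2: 1},
    "B": {1: 3, 2: 1},  # same substructures as A, but one occurs three times
    "C": {4: 2},
}
binary_fps = {name: binarize(fp) for name, fp in count_fps.items()}

count_duplicates = find_duplicates(count_fps)
binary_duplicates = find_duplicates(binary_fps)
```

In this toy case the count fingerprints are all distinct, while the binary versions of A and B collapse into one duplicate group. This shows one mechanism by which count fingerprints can produce fewer duplicates than binary ones.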

Florian Huber

@me-datapoint.bsky.social

New preprint out!
1/4

@julianpollmann.bsky.social and I went down several rabbit holes to assess some commonly used molecular fingerprints.

Bottom line: For large datasets, make an effort to select suitable settings. "We used Tanimoto" is not good enough.

--> www.biorxiv.org/content/10.1...

Sketch of count/binary fingerprints and weighting options.

June 23, 2025 at 9:22 AM

Florian Huber

@me-datapoint.bsky.social

Good start for me at #metabolomics2025 with a hands-on workshop on MS2LDA by Jonas Dietrich, Rosina Torres Ortega and @jjjvanderhooft.bsky.social.

June 23, 2025 at 8:11 AM

Florian Huber

@me-datapoint.bsky.social

Went by train to #Prague for #metabolomics2025.

These are the kinds of moments that remind me how great the European project is. No border controls, no visas. Just a train following a river to the neighboring country.

Elbe river seen from a train somewhere after Dresden.

June 22, 2025 at 2:02 PM

Florian Huber

@me-datapoint.bsky.social

The mayor @duesseldorf.bsky.social can invoke the "cycling capital" (🥹🤭😭) as often as he likes... it still takes a bit more than a few dabs of paint.

#Düsseldorf still holding steady at a grade of 4- in the #ADFC climate test. Going great. @adfcnrw.bsky.social @adfc-duesseldorf.de

Screenshot of the ADFC Fahrradklima-Test 2024 for Düsseldorf.

June 17, 2025 at 7:35 PM

Florian Huber

@me-datapoint.bsky.social

When you prepare lesson material while hungry...

(added some text edits and more sketches/figures to the NLP chapters of the "Hands-on Introduction to #DataScience with #Python" textbook)

florian-huber.github.io/data_science...

#OpenScience #Teaching #CCBY

June 6, 2025 at 11:38 AM

Florian Huber

@me-datapoint.bsky.social

New release of my "Hands-on Introduction to Data Science with Python" textbook!

Contains many text edits and figure updates, for instance in the sections on Clustering and Machine Learning.

All fully #opensource and #openaccess. Figures are #CCBY.

--> florian-huber.github.io/data_science...

May 14, 2025 at 8:10 PM

Florian Huber

@me-datapoint.bsky.social

I would say triumph.

April 3, 2025 at 3:31 PM

Florian Huber

@me-datapoint.bsky.social

Here is a short blog post on the typical data science workflow (if something like that even exists). Happy to take any feedback or suggestions.
--> medium.com/@f.huber/wat...

#DataScience #Python #Teaching

March 25, 2025 at 12:49 PM

Florian Huber

@me-datapoint.bsky.social

@jjjvanderhooft.bsky.social sharing his vision on #matchms during our developer workshop @zdd-hsd.bsky.social.

#opensource #Python #massspec

March 13, 2025 at 9:16 AM

Florian Huber

@me-datapoint.bsky.social

#matchms workshop in full swing!
Great fun to work with this fantastic group of people on improving, expanding, applying matchms for handling #massspec data in #Python.

March 12, 2025 at 3:28 PM

Florian Huber

@me-datapoint.bsky.social

It took us quite a while... not easy with most authors undergoing career switches (me, too). But finally the last bit of my postdoc work with @marileend.bsky.social is now published in @biophysj.bsky.social.

#microtubules with EB3 growing against rigid barriers.
--> www.cell.com/biophysj/ful...

Microtubules with EB3 growing against rigid barriers.

December 3, 2024 at 3:21 PM

Florian Huber

@me-datapoint.bsky.social

Enjoyed exploring the beautiful map of #bluesky (from 2024-11-07) from @syntacrobat.xyz!

--> aurora.ndimensional.xyz

UMAP plot of all bluesky accounts of 2024-11-07 made with https://aurora.ndimensional.xyz/.

November 22, 2024 at 7:06 PM

Florian Huber

@me-datapoint.bsky.social

Delighted that a project that took shape within a #machinelearning workshop at the @esciencecenter.bsky.social, initiated and led by #RozaKamioglu and #DisaSauter, led to a data analysis of human #laughter sounds.

--> royalsocietypublishing.org/doi/10.1098/...

#OpenAccess #DataScience

Sketch of analysis framework for laughter context.

November 20, 2024 at 11:18 AM

Florian Huber

@me-datapoint.bsky.social

I continue to work on #opensource versions of my various teaching materials. Here is the first complete draft of my Python Introduction (for now only in German, but an English version is on the to-do list).
--> florian-huber.github.io/python-intro...

#DataScience #teaching

Cover for the German Programming with Python for Data Science Course.

November 15, 2024 at 8:26 PM

Florian Huber

@me-datapoint.bsky.social

New version of my (renamed) textbook: "Hands-on Introduction to Data Science using Python" 🎉

The content is now mostly complete. Text and figures will undergo further polishing.

--> florian-huber.github.io/data_science...
#datascience #opensource #teaching

October 14, 2024 at 12:48 PM
