Florian Huber
me-datapoint.bsky.social
Professor of data science at HSD, @zdd-hsd.bsky.social
| ML fan & critic | current research mostly #datascience, #machinelearning, #cheminformatics #dataviz #nlp | ✨ #openscience #openaccess #rse | living data point 🚲
New #matchms release (0.31)🚀

With functionalities that were on our TODO list for a looooong time: Flash Entropy and BLINK scores! The new "FlashSimilarity" allows computing modified cosine, spectral entropy etc., about 100x faster (or more if you use Linux).

#Python #opensource #massspec
October 6, 2025 at 4:00 PM
@jorainer.bsky.social and @philouail.bsky.social gave a great overview of the ecosystem around #RforMassSpectrometry and #XCMS!

#MetSoc25
I am super glad they now also provide options to combine with #Python and #matchms (thanks🙏)
June 26, 2025 at 9:32 AM
Great keynote by @sneumann.bsky.social at #MetSoc25, strongly advocating for #opensource, data-sharing, and making things interoperable.

Glad to also spot #matchms in this universe :)
June 25, 2025 at 7:35 AM
4/4
We also highlight options for count fingerprints, such as log-counts and IDF weighted counts. The latter can be used to adjust the bit importance to a dataset of your choice.

An example use case is chemical space visualization.

Preprint: www.biorxiv.org/content/10.1...
June 23, 2025 at 9:22 AM
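
A minimal sketch of the two count weightings mentioned in 4/4, using toy sparse count fingerprints stored as `{bit_id: count}` dicts. The helper names and the plain `log(N / df)` IDF formula are assumptions for illustration; the preprint may define the weights differently.

```python
import math
from collections import Counter


def log_counts(fp):
    """Dampen large counts: count -> log(1 + count)."""
    return {bit: math.log1p(count) for bit, count in fp.items()}


def idf_weights(fingerprints):
    """IDF per bit: log(N / number of fingerprints that contain the bit)."""
    n = len(fingerprints)
    doc_freq = Counter(bit for fp in fingerprints for bit in fp)
    return {bit: math.log(n / df) for bit, df in doc_freq.items()}


def apply_idf(fp, weights):
    """Scale each count by the dataset-specific weight of its bit."""
    return {bit: count * weights.get(bit, 0.0) for bit, count in fp.items()}


# Toy dataset (hypothetical bit ids): bit 1 occurs in every fingerprint.
dataset = [
    {1: 3, 7: 1},
    {1: 1, 9: 2},
    {1: 2, 7: 1, 42: 1},
]

weights = idf_weights(dataset)
weighted = [apply_idf(fp, weights) for fp in dataset]
damped = [log_counts(fp) for fp in dataset]
```

In this toy dataset, bit 1 is present everywhere and ends up with weight zero, while the rare bits 9 and 42 get the largest weight. That is the sense in which the weighting adapts bit importance to a chosen dataset.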
3/4
A huge issue is bit collisions.
Fingerprints with a high bit occupation (RDKit, MAP4) often lead to (1) arbitrary misinterpretations, (2) shifts to high Tanimoto scores, (3) very different handling of small and large molecules.

--> Consider using sparse fingerprints!
--> Morgan >> MAP4 / RDKit
June 23, 2025 at 9:22 AM
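
A toy illustration of the bit-collision effect from 3/4. Feature ids are hypothetical integers, and folding by modulo is a stand-in assumption for the hashing and folding that real fingerprint implementations use; it is not the preprint's code.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of bit ids."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fold(features, n_bits):
    """Fold sparse feature ids into a fixed-size bit vector (modulo folding)."""
    return {feature % n_bits for feature in features}


# Two hypothetical molecules with 300 features each and no feature in common.
mol_a = {7 * i for i in range(300)}
mol_b = {7 * i + 3 for i in range(300)}

# Sparse (unfolded) comparison: no shared features, so the score is 0.0.
sparse_score = tanimoto(mol_a, mol_b)

# Folded into 512 bits: both molecules occupy 300 of 512 bits, so they must
# share bits purely through collisions and the score becomes positive.
folded_score = tanimoto(fold(mol_a, 512), fold(mol_b, 512))
```

The higher the bit occupation, the more of the score comes from such collisions rather than from shared substructures, which is the argument for sparse fingerprints.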
2/4
We focused on weaknesses of the fingerprints.
Many show frequent duplicates, i.e., the same fingerprint for different compounds. Most problematic: this can include *very* different compounds ending up with identical fingerprints.

- MAP4 >> Morgan-type >> daylight
- count >> binary

#cheminformatics
June 23, 2025 at 9:22 AM
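
A small sketch of how fingerprint duplicates like those in 2/4 could be detected. The compound names, bit ids and the `find_duplicates` helper are hypothetical, and the example reads "count >> binary" as "count fingerprints produce fewer duplicates than binary ones", which is an assumption.

```python
from collections import defaultdict


def find_duplicates(fingerprints):
    """Group compound names that share an identical fingerprint."""
    groups = defaultdict(list)
    for name, fp in fingerprints.items():
        key = tuple(sorted(fp.items()))  # hashable, order-independent key
        groups[key].append(name)
    return [names for names in groups.values() if len(names) > 1]


# Toy count fingerprints: {bit_id: count}
count_fps = {
    "A": {1: 2, 5: 1},
    "B": {1: 1, 5: 1},
    "C": {1: 2, 5: 1},
}

# Binary version of the same fingerprints: every count collapses to 1.
binary_fps = {name: {bit: 1 for bit in fp} for name, fp in count_fps.items()}

count_duplicates = find_duplicates(count_fps)
binary_duplicates = find_duplicates(binary_fps)
```

Here the count fingerprints still separate B from A and C, while the binary fingerprints make all three compounds indistinguishable.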
New preprint out!
1/4

@julianpollmann.bsky.social and I went down several rabbit holes to assess some commonly used molecular fingerprints.

Bottom line: For large datasets, make an effort to select suitable settings. "We used Tanimoto" is not good enough.

--> www.biorxiv.org/content/10.1...
June 23, 2025 at 9:22 AM
Good start for me at #metabolomics2025 with a hands-on workshop on MS2LDA by Jonas Dietrich, Rosina Torres Ortega and @jjjvanderhooft.bsky.social.
June 23, 2025 at 8:11 AM
Went by train to #Prague for #metabolomics2025.

These are the kind of moments that remind me how great the European project is. No border controls, no visas. Just a train following a river to the neighboring country.
June 22, 2025 at 2:02 PM
The mayor @duesseldorf.bsky.social can invoke the "cycling capital" (🥹🤭😭) as often as he likes... it still takes a bit more than a few dabs of paint.

#Düsseldorf still holding steady at a grade of 4- in the #ADFC Klimatest. Going great. @adfcnrw.bsky.social @adfc-duesseldorf.de
June 17, 2025 at 7:35 PM
When you prepare lesson material while hungry...

(added some text edits and more sketches/figures to the NLP chapters of the "Hands-on Introduction to #DataScience with #Python" textbook)

florian-huber.github.io/data_science...

#OpenScience #Teaching #CCBY
June 6, 2025 at 11:38 AM
New release of my "Hands-on Introduction to Data Science with Python" textbook!

Contains many text edits and figure updates. For instance, in the sections on Clustering and Machine Learning.

All fully #opensource and #openaccess. Figures are #CCBY.

--> florian-huber.github.io/data_science...
May 14, 2025 at 8:10 PM
I would say triumph.
April 3, 2025 at 3:31 PM
Here is a short blog post on the typical data science workflow (if something like that even exists). Happy to take any feedback or suggestions.
--> medium.com/@f.huber/wat...

#DataScience #Python #Teaching
March 25, 2025 at 12:49 PM
@jjjvanderhooft.bsky.social sharing his vision on #matchms during our developer workshop @zdd-hsd.bsky.social.

#opensource #Python #massspec
March 13, 2025 at 9:16 AM
#matchms workshop in full swing!
Great fun to work with this fantastic group of people on improving, expanding, applying matchms for handling #massspec data in #Python.
March 12, 2025 at 3:28 PM
It took us quite a while... not easy with most authors undergoing career switches (me, too). But finally the last bit of my postdoc work with @marileend.bsky.social is now published in @biophysj.bsky.social.

#microtubules with EB3 growing against rigid barriers.
--> www.cell.com/biophysj/ful...
December 3, 2024 at 3:21 PM
Enjoyed exploring the beautiful map of #bluesky (from 2024-11-07) from @syntacrobat.xyz!

--> aurora.ndimensional.xyz
November 22, 2024 at 7:06 PM
Delighted that a project that took shape within a #machinelearning workshop at the @esciencecenter.bsky.social, initiated and led by #RozaKamioglu and #DisaSauter, led to a data analysis of human #laughter sounds.

--> royalsocietypublishing.org/doi/10.1098/...

#OpenAccess #DataScience
November 20, 2024 at 11:18 AM
I continue to work on #opensource versions of my various teaching materials. Here is the first complete draft of my Python introduction (for the moment only in German, but an English version is on the to-do list).
--> florian-huber.github.io/python-intro...

#DataScience #teaching
November 15, 2024 at 8:26 PM
New version of my (renamed) textbook: "Hands-on Introduction to Data Science using Python" 🎉

The content is now mostly complete. Text and figures will undergo further polishing.

--> florian-huber.github.io/data_science...
#datascience #opensource #teaching
October 14, 2024 at 12:48 PM