Mostly #NLP, #AI, #SoftwareEngineering
With functionalities that were on our TODO list for a looooong time: Flash Entropy and BLINK scores! The new "FlashSimilarity" allows computing modified cosine, spectral entropy etc., about 100x faster (or more if you use Linux).
#Python #opensource #massspec
We also highlight options for count fingerprints, such as log-counts and IDF weighted counts. The latter can be used to adjust the bit importance to a dataset of your choice.
An example use case is chemical space visualization.
Preprint: www.biorxiv.org/content/10.1...
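To make the two count variants concrete, here is a minimal sketch in plain Python. It assumes sparse count fingerprints stored as `{bit id: count}` dicts, a `log(1 + c)` scaling, and a textbook IDF of `log(N / document frequency)`. The function names and the exact formulas are illustrative assumptions, not necessarily what the preprint uses.

```python
import math
from collections import Counter


def log_counts(fp):
    """Dampen large counts: c -> log(1 + c)."""
    return {bit: math.log1p(count) for bit, count in fp.items()}


def idf_weights(fingerprints):
    """IDF per bit: log(N / number of fingerprints that contain the bit)."""
    n = len(fingerprints)
    doc_freq = Counter(bit for fp in fingerprints for bit in fp)
    return {bit: math.log(n / df) for bit, df in doc_freq.items()}


def idf_weighted(fp, idf):
    """Scale each count by the IDF of its bit (assumption: unseen bits get weight 0)."""
    return {bit: count * idf.get(bit, 0.0) for bit, count in fp.items()}


# Toy reference dataset of sparse count fingerprints (hypothetical values).
dataset = [
    {1: 3, 7: 1},
    {1: 1, 9: 2},
    {1: 2, 7: 4, 9: 1},
]

idf = idf_weights(dataset)
weighted = idf_weighted(dataset[0], idf)
```

Bit 1 occurs in every fingerprint of the toy dataset, so its IDF is zero and it drops out of the weighted fingerprint. That is the sense in which the weighting adapts bit importance to the chosen dataset.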
A huge issue is bit collisions.
Fingerprints with a high bit occupation (RDKit, MAP4) often lead to (1) arbitrary misinterpretations, (2) shifts to high Tanimoto scores, (3) very different handling of small and large molecules.
--> Consider using sparse fingerprints!
--> Morgan >> MAP4 / RDKit
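A toy illustration of the collision effect, assuming features are folded into a fixed-size bit vector by taking the feature id modulo the vector length. The feature ids below are made up and real fingerprints use hashed substructures, so treat this as a sketch of the mechanism only.

```python
def fold(features, n_bits):
    """Fold sparse feature ids into n_bits positions (assumed: modulo folding)."""
    return {f % n_bits for f in features}


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


# Two hypothetical molecules with 150 features each and no feature in common.
mol_a = set(range(0, 150))
mol_b = set(range(1000, 1150))

sparse_score = tanimoto(mol_a, mol_b)    # no shared features
folded_a = fold(mol_a, 256)
folded_b = fold(mol_b, 256)
folded_score = tanimoto(folded_a, folded_b)
occupation = len(folded_a) / 256         # fraction of bits set after folding
```

With roughly 59% of the 256 bits occupied, the two unrelated feature sets go from a Tanimoto score of 0 to about 0.72 after folding. Keeping the sparse (unfolded) representation avoids that shift.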
We focused on weaknesses of the fingerprints.
Many show frequent duplicates, i.e. the same fingerprint for different compounds. Most problematic: this can include *very* different compounds ending up with identical fingerprints.
- MAP4 >> Morgan-type >> daylight
- count >> binary
#cheminformatics
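A minimal way to check your own dataset for such duplicates, assuming fingerprints are available as sparse `{bit id: count}` dicts keyed by compound name. The compound names and values are hypothetical, and the helper names are not taken from any library.

```python
from collections import defaultdict


def duplicate_groups(fingerprints):
    """Return groups of compound names that share an identical fingerprint."""
    groups = defaultdict(list)
    for name, fp in fingerprints.items():
        key = tuple(sorted(fp.items()))  # hashable, order-independent key
        groups[key].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def binarize(fp):
    """Drop the counts and keep only which bits are present."""
    return {bit: 1 for bit, count in fp.items() if count > 0}


# Hypothetical count fingerprints for three compounds.
counts = {
    "cmpd_A": {3: 1, 8: 2},
    "cmpd_B": {3: 4, 8: 2},  # same bits as cmpd_A, different counts
    "cmpd_C": {5: 1},
}
binary = {name: binarize(fp) for name, fp in counts.items()}

count_duplicates = duplicate_groups(counts)
binary_duplicates = duplicate_groups(binary)
```

In this toy case the count fingerprints are all distinct, while binarizing makes `cmpd_A` and `cmpd_B` indistinguishable.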
1/4
@julianpollmann.bsky.social and I went down several rabbit holes to assess some commonly used molecular fingerprints.
Bottom line: For large datasets, make an effort to select suitable settings. "We used Tanimoto" is not good enough.
--> www.biorxiv.org/content/10.1...
Curious to listen to some interesting talks and meet new people 😀
rlhfbook.com
Reality is, there is no AGI, just language models. If you think they are giving you some truth, the stupidity is on you, not the LLM.
@chrmanning.bsky.social @shikharmurty.bsky.social
#WissZeitVG
@drkeichhorn.bsky.social @kubon.bsky.social
It can't. But, what DOGE accidentally revealed about themselves in the process is fascinating. 🧵
I can't stop thinking about that.
👉 2025.aclweb.org/calls/studen...
Send your paper by ⏰ May 18th, 2025 🚀 #nlproc
@cvprconference.bsky.social ! 🚀
🌐 sites.google.com/view/mmfm3rd...
📅 Deadline: Apr 1, 2025 (non-proceedings)
📝 Submission: cmt3.research.microsoft.com/MMFM2025