Chris Mungall
cmungall.bsky.social
Berkeley Lab, Environmental Genomics and Systems Biology division. #GeneOntology #MonarchInitiative #AllianceGenome #NationalMicrobiomeDataCollaborative #OBOFoundry.
In order to fine-tune the reasoner model, the authors used three kinds of soft verifiers in the RL loop: experimental (e.g. CRISPRi knockdown), "simulation" (e.g. Transcriptformer), and knowledge-based. For knowledge-based, they used GO @geneontology.bsky.social!
August 25, 2025 at 12:41 AM
The applications of this are very interesting, allowing for interrogation in natural language, as well as background reasoning over the wealth of biology in the literature. So you can ask what happens to other genes if you knock down a gene in a cell type, and get a biological explanation.
August 25, 2025 at 12:41 AM
@severaltimes.bsky.social talking about the BioPortal MCP at #BOSC2025 / #BOKR2025 #ISMBECCB2025
July 22, 2025 at 1:31 PM
And thank you to the ENCODE team who laid the groundwork for this AI work ten years ago, and took such care with annotating using standard ontologies pmc.ncbi.nlm.nih.gov/articles/PMC...
June 27, 2025 at 6:04 PM
And the example Colab notebook shows how you can use UBERON terms in API calls to explore tissue specificity. Nice! colab.research.google.com/github/googl...
June 27, 2025 at 6:04 PM
De Crécy-Lagard showed that when the DL approach was used on proteins that differed from those in the training set (the "unknome"), many of the predicted functions were biologically implausible or impossible, based on prior "deep knowledge" of microbial gene function, and hence likely wrong.
June 6, 2025 at 4:49 AM
Here are the results with the quotes. It’s reasonable to assume something like these snippets is included in the prompt, which will confound a simple LLM (even though to us it’s obvious they are unrelated).
January 4, 2025 at 5:40 AM
Hey @Sunbasketmeals, your delivery company forgot to configure the RAG on their rubbish AI chatbot
December 7, 2024 at 10:34 PM
Glad to see our OntoGPT/SPIRES paper finally out in Bioinformatics!
academic.oup.com/bioinformati.... Great work from Harry Caufield who led the study, and all the authors. SPIRES uses a schema and ontology driven approach to extract complex knowledge nuggets from text.
February 22, 2024 at 10:41 PM
Hey @Docker, what's with the sudden revoking of our sponsored open source subscription? We are getting this for @linkml_data and colleagues getting the same thing for @OBOFoundry. 🙏 for the O/S subscription, but more advance warning of its cancelation would have been nice☹️
December 7, 2024 at 10:34 PM
The idea here is to extract structured information from free text, e.g. a description of a person into a LinkML schema such as the tutorial PersonInfo schema https://github.com/linkml/linkml/blob/main/examples/PersonSchema/personinfo.yaml
December 7, 2024 at 10:44 PM
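A minimal sketch of the idea, assuming a SPIRES-like "field: value" prompt format. The three-slot schema below is a hypothetical, simplified stand-in for PersonInfo (not the real YAML), and no LLM is called: a canned reply plays the model's part.

```python
# Hypothetical, simplified schema: slot name -> description shown to the model.
PERSON_SCHEMA = {
    "name": "the person's full name",
    "age_in_years": "age as an integer",
    "occupation": "the person's job",
}


def build_prompt(schema: dict, text: str) -> str:
    """Ask for one 'slot: value' line per schema slot, followed by the input text."""
    lines = ["Extract the following fields from the text.",
             "Reply with one 'field: value' line per field.", ""]
    lines += [f"{slot}: <{desc}>" for slot, desc in schema.items()]
    lines += ["", "Text:", text]
    return "\n".join(lines)


def parse_reply(schema: dict, reply: str) -> dict:
    """Keep only 'key: value' lines whose key is a known slot; drop anything else."""
    result = {}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in schema:
            result[key.strip()] = value.strip()
    return result


text = "Jane Doe, 42, works as a nurse in Oakland."
prompt = build_prompt(PERSON_SCHEMA, text)
# Canned stand-in for a model reply; "city" is not in the schema and is discarded.
reply = "name: Jane Doe\nage_in_years: 42\noccupation: nurse\ncity: Oakland"
print(parse_reply(PERSON_SCHEMA, reply))
# {'name': 'Jane Doe', 'age_in_years': '42', 'occupation': 'nurse'}
```

Discarding keys that are not in the schema is one cheap guard against the model inventing fields.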
It also turns out the latent GPT KB ("no synopsis") method has an unfair advantage. For larger gene sets, synopses get truncated due to prompt length constraints. When we control for this and look at only gene sets <75 genes in size, ontological descriptions emerge as the winner!
December 7, 2024 at 11:56 PM
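To make the truncation effect concrete, here is a minimal sketch assuming a fixed character budget as a stand-in for the real token limit (the 2000-character budget and 40-character synopses are invented for illustration). The size filter mirrors the <75-gene control described above.

```python
# Hypothetical budget; real limits are in tokens, characters are used here for simplicity.
MAX_PROMPT_CHARS = 2000


def fit_synopses(synopses: dict, budget: int = MAX_PROMPT_CHARS):
    """Append 'GENE: synopsis' lines in order until the next one would exceed the budget.

    Returns the prompt fragment and how many genes made it in."""
    parts, used = [], 0
    for gene, synopsis in synopses.items():
        entry = f"{gene}: {synopsis}\n"
        if used + len(entry) > budget:
            break
        parts.append(entry)
        used += len(entry)
    return "".join(parts), len(parts)


def small_gene_sets(gene_sets: dict, max_size: int = 75) -> dict:
    """Keep only gene sets with fewer than max_size genes."""
    return {name: genes for name, genes in gene_sets.items() if len(genes) < max_size}


# Toy data: every synopsis is 40 characters long.
big = {f"GENE{i}": "x" * 40 for i in range(100)}
small = {f"GENE{i}": "x" * 40 for i in range(30)}
print(fit_synopses(big)[1])    # 41: most of the 100 synopses are cut
print(fit_synopses(small)[1])  # 30: everything fits
print(sorted(small_gene_sets({"set_a": list(big), "set_b": list(small)})))  # ['set_b']
```

With a large set, the genes at the end of the list silently lose their descriptions, which is exactly the kind of asymmetry that favours the "no synopsis" condition.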
Our first results show that, on the one hand, GPT does pretty well: when using gpt-3.5-turbo, most of the terms returned are actually statistically significant (0.65 when using RefSeq summaries). However, we rarely saw the most informative term included.
December 7, 2024 at 11:46 PM
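The two measures in this post can be computed along these lines. This is a minimal sketch; treating the term with the lowest adjusted p-value as the "most informative" one is an assumption made here for illustration and may not match the definition used in the actual analysis. Term IDs are placeholders.

```python
def evaluate_terms(returned: list, adjusted_p: dict, alpha: float = 0.05):
    """Score LLM-returned term IDs against conventional enrichment results.

    adjusted_p (non-empty) maps term ID -> adjusted p-value.
    Returns (fraction of returned terms that are significant,
             whether the lowest-p term is among the returned terms)."""
    significant = {t for t, p in adjusted_p.items() if p < alpha}
    fraction = sum(t in significant for t in returned) / len(returned) if returned else 0.0
    top_term = min(adjusted_p, key=adjusted_p.get)  # assumed proxy for "most informative"
    return fraction, top_term in returned


adjusted_p = {"T1": 1e-6, "T2": 0.01, "T3": 0.2, "T6": 0.001, "T7": 0.03}
# Three of four returned terms are significant, but the top term T1 is missing.
print(evaluate_terms(["T2", "T3", "T6", "T7"], adjusted_p))  # (0.75, False)
```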
Here is an example of SPINDOCTOR results overlaid on top of bona fide GO enrichment results for sensory ataxia genes (significant terms indicated with Bonferroni-adjusted p-values). Gene description sources are shown as boxes (ONT = ontological description, NAR = narrative RefSeq, NS = no summary).
December 7, 2024 at 11:31 PM
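For readers who want the baseline spelled out, here is a minimal sketch of a standard one-sided hypergeometric over-representation test with Bonferroni adjustment, the general kind of analysis behind conventional GO enrichment. It is not necessarily the exact tool, background, or settings used for this figure; TERM_A, TERM_B and the gene sets are toy placeholders.

```python
from math import comb


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population M, n annotated, N drawn)."""
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, min(n, N) + 1))
    return hits / comb(M, N)


def enrichment(study: set, background: set, annotations: dict, alpha: float = 0.05):
    """One-sided over-representation test per term with Bonferroni adjustment.

    Returns (term, overlap, adjusted p) for terms passing alpha, most significant first."""
    M, N = len(background), len(study & background)
    n_tests = len(annotations)
    hits = []
    for term, genes in annotations.items():
        annotated = genes & background
        k = len(study & annotated)
        p_adj = min(1.0, hypergeom_sf(k, M, len(annotated), N) * n_tests)
        if p_adj < alpha:
            hits.append((term, k, p_adj))
    return sorted(hits, key=lambda h: h[2])


background = {f"G{i}" for i in range(100)}
study = {f"G{i}" for i in range(10)}
annotations = {
    "TERM_A": {f"G{i}" for i in range(8)} | {"G50", "G51"},  # 8 of 10 in the study set
    "TERM_B": {f"G{i}" for i in range(20, 40)},              # none in the study set
}
result = enrichment(study, background, annotations)
print([(t, k) for t, k, _ in result])  # [('TERM_A', 8)]
```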
We created a tool called SPINDOCTOR that performs summarization of gene sets. The idea is simple: after normalizing the input gene sets, it retrieves external gene descriptions and generates a prompt, which is fed to @OpenAI. The results are then parsed into ontology terms.
December 7, 2024 at 11:05 PM
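The three steps can be sketched as below, assuming tiny in-memory tables in place of real gene databases and ontology lookups, and a canned reply in place of the OpenAI call. The function and table names are hypothetical and are not SPINDOCTOR's actual code; the two GO IDs are real (cell cycle, apoptotic process).

```python
# Hypothetical lookup tables standing in for gene databases and an ontology index.
ALIASES = {"p53": "TP53"}
DESCRIPTIONS = {
    "TP53": "tumor suppressor that induces cell cycle arrest or apoptosis",
    "MDM2": "E3 ubiquitin ligase that targets TP53 for degradation",
}
LABEL_TO_ID = {"cell cycle": "GO:0007049", "apoptotic process": "GO:0006915"}


def normalize(symbols: list) -> list:
    """Map aliases to official symbols and drop duplicates, keeping input order."""
    seen = []
    for s in symbols:
        sym = ALIASES.get(s.lower(), s.upper())
        if sym not in seen:
            seen.append(sym)
    return seen


def build_prompt(genes: list) -> str:
    """List each gene with its retrieved description, then ask for shared functions."""
    lines = [f"{g}: {DESCRIPTIONS.get(g, 'no description available')}" for g in genes]
    lines.append("List the functions these genes have in common, separated by semicolons.")
    return "\n".join(lines)


def parse_terms(reply: str) -> list:
    """Split the reply into labels and ground each to an ontology ID (None if unknown)."""
    labels = [t.strip().lower() for t in reply.split(";") if t.strip()]
    return [(label, LABEL_TO_ID.get(label)) for label in labels]


genes = normalize(["p53", "mdm2", "TP53"])
prompt = build_prompt(genes)
reply = "Cell cycle; apoptotic process; guardian of the genome"  # canned model reply
print(parse_terms(reply))
```

Labels that fail to ground (here "guardian of the genome") come back with None, which makes it easy to separate ontology terms from free-text flourishes.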
Now all of a sudden everyone is asking their biological questions of ChatGPT and other AI agents. You can even feed it a list of gene symbols, and say "what's going on with all these genes?". And it will give you a plausible answer!
December 7, 2024 at 10:54 PM
Are #LLMs capable of interpreting the results of high-throughput genomics experiments? Given a list of genes (e.g. all genes overexpressed under a certain condition), can an LLM tell us what those genes have in common, suggesting underlying biological mechanisms? 🧵
December 7, 2024 at 10:34 PM
Qualitatively, the results are variable but usually interesting. We learned a bit about controlling hallucinations. Extraction tasks are in general less prone to this than querying the GPT "knowledge base" directly, but this all feels more art than science at the moment...
December 7, 2024 at 11:20 PM
As an optional next step, this can be further transformed into an OWL TBox and reasoned over, allowing results to be auto-classified and validated for logical inconsistencies.
December 7, 2024 at 11:15 PM
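One way such a transformation could look, as a minimal sketch: the example.org namespace, the has_ingredient property, and the pattern of one existential restriction per ingredient are assumptions for illustration. The output is OWL functional syntax, which a standard OWL reasoner could then classify and check for unsatisfiable classes; the reasoning step itself is not shown.

```python
def recipe_to_ofn(recipe_id: str, ingredient_ids: list,
                  base: str = "http://example.org/") -> str:
    """Render one extracted recipe as OWL functional-syntax TBox axioms.

    Each ingredient becomes an existential restriction on the recipe class."""
    lines = [f"Prefix(:=<{base}>)",
             f"Ontology(<{base}recipes>",
             "  Declaration(ObjectProperty(:has_ingredient))",
             f"  Declaration(Class(:{recipe_id}))"]
    for ing in ingredient_ids:
        lines.append(f"  Declaration(Class(:{ing}))")
        lines.append(f"  SubClassOf(:{recipe_id} ObjectSomeValuesFrom(:has_ingredient :{ing}))")
    lines.append(")")
    return "\n".join(lines)


ofn = recipe_to_ofn("garlic_toast", ["garlic", "bread"])
print(ofn)
```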
This results in a structured nested document (YAML or RDF) conforming to the @linkml_data schema. Results are highly variable but usually informative about gaps in ontologies. Here we can see #FoodOn has good coverage but still some gaps...
December 7, 2024 at 11:10 PM
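A minimal sketch of how such gaps can surface, assuming extracted strings are matched against an ontology label index and anything unmatched gets a placeholder ID. The AUTO: prefix and the EX: identifiers are illustrative placeholders, not real FoodOn IDs.

```python
# Hypothetical label index; a real pipeline would search FoodOn through an ontology lookup.
# The IDs below are made-up placeholders, not real FoodOn identifiers.
FOOD_INDEX = {"garlic": "EX:0001", "olive oil": "EX:0002", "bread": "EX:0003"}


def ground(label: str) -> str:
    """Return an ontology ID for a label, or an AUTO: placeholder if nothing matches."""
    key = label.strip().lower()
    return FOOD_INDEX.get(key, "AUTO:" + key.replace(" ", "_"))


def ground_recipe(recipe: dict):
    """Ground every ingredient; ungrounded ones are candidate ontology gaps."""
    ingredients = [ground(i) for i in recipe["ingredients"]]
    gaps = [i for i in ingredients if i.startswith("AUTO:")]
    return {"label": recipe["label"], "ingredients": ingredients}, gaps


doc, gaps = ground_recipe({"label": "Garlic toast",
                           "ingredients": ["Garlic", "olive oil", "smoked paprika"]})
print(doc)   # {'label': 'Garlic toast', 'ingredients': ['EX:0001', 'EX:0002', 'AUTO:smoked_paprika']}
print(gaps)  # ['AUTO:smoked_paprika']
```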
SPIRES stands for Structured Prompt Interrogation and Recursive Extraction of Semantics. It's geared toward rich schemas (more than just relation extraction): for example, a recipe or a biological pathway doesn't really fit into a flat TSV structure, so instead we break it into nested classes.
December 7, 2024 at 10:44 PM
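The recursive part can be sketched as follows, assuming a simplified reading of SPIRES in which a class-valued slot triggers a fresh extraction on each sub-span. The schema, the ';' convention for multivalued slots, and the canned replies are hypothetical; no real model is called.

```python
from typing import Callable, Dict, Optional

# Hypothetical mini-schema: slot -> range class name, or None for a plain string slot.
SCHEMA: Dict[str, Dict[str, Optional[str]]] = {
    "Recipe": {"label": None, "ingredients": "Ingredient"},
    "Ingredient": {"food": None, "quantity": None},
}


def parse_lines(reply: str, slots) -> dict:
    """Collect 'slot: value' lines for known slots."""
    out = {}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in slots:
            out[key.strip()] = value.strip()
    return out


def extract(cls: str, text: str, llm: Callable[[str, str], str]) -> dict:
    """Extract one instance of cls, recursing into slots whose range is another class."""
    slots = SCHEMA[cls]
    result = {}
    for slot, value in parse_lines(llm(cls, text), slots).items():
        range_cls = slots[slot]
        if range_cls is None:
            result[slot] = value
        else:
            # Assume multivalued nested slots come back ';'-separated; re-prompt per item.
            result[slot] = [extract(range_cls, item.strip(), llm)
                            for item in value.split(";") if item.strip()]
    return result


RECIPE_TEXT = "Rub 2 cloves garlic on toast and drizzle with 1 tbsp olive oil."
# Canned replies keyed by (class, text) stand in for real model calls.
CANNED = {
    ("Recipe", RECIPE_TEXT): "label: garlic toast\ningredients: 2 cloves garlic; 1 tbsp olive oil",
    ("Ingredient", "2 cloves garlic"): "food: garlic\nquantity: 2 cloves",
    ("Ingredient", "1 tbsp olive oil"): "food: olive oil\nquantity: 1 tbsp",
}


def fake_llm(cls: str, text: str) -> str:
    return CANNED[(cls, text)]


print(extract("Recipe", RECIPE_TEXT, fake_llm))
```

The result is a nested object (a recipe holding a list of ingredient objects) rather than a flat row, which is the shape a TSV cannot express directly.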
Here's our preprint describing our GPT-3-based knowledge extraction tool SPIRES: https://arxiv.org/abs/2304.02711. Great work from @harry_caufield et al! SPIRES allows you to specify a knowledge schema (in @linkml_data) and then populate instances of that schema from unstructured...
December 7, 2024 at 10:34 PM
And special thanks to @figgyjam for the awesome logo!
December 7, 2024 at 11:10 PM
We also have some preliminary support for import and export from the @cytoscape CX format, and for retrieving networks from the awesome @NDExProject (thanks to help from @benjamingyori). See https://github.com/INCATools/ontology-access-kit/pull/479 for more details
December 7, 2024 at 10:49 PM
Now you can explore RO (https://oborel.github.io/) like any other ontology, with inverses, domains, and ranges treated as edges. See https://github.com/INCATools/ontology-access-kit/pull/466
December 7, 2024 at 10:44 PM
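Conceptually, "treated as edges" means relation-level axioms become ordinary graph edges whose subject is the relation itself. Below is a minimal sketch of that flattening, not OAK's implementation; part of (BFO:0000050) and has part (BFO:0000051) are real identifiers, while EX:regulates and EX:Process are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PropertyAxioms:
    """Hypothetical container for axioms asserted about one relation."""
    inverse_of: List[str] = field(default_factory=list)
    domains: List[str] = field(default_factory=list)
    ranges: List[str] = field(default_factory=list)


def property_edges(axioms: dict) -> List[Tuple[str, str, str]]:
    """Flatten relation-level axioms into (subject, predicate, object) edges."""
    edges = []
    for prop, ax in axioms.items():
        edges += [(prop, "owl:inverseOf", o) for o in ax.inverse_of]
        edges += [(prop, "rdfs:domain", o) for o in ax.domains]
        edges += [(prop, "rdfs:range", o) for o in ax.ranges]
    return edges


axioms = {
    "BFO:0000050": PropertyAxioms(inverse_of=["BFO:0000051"]),  # part of / has part
    "EX:regulates": PropertyAxioms(domains=["EX:Process"], ranges=["EX:Process"]),
}
for edge in property_edges(axioms):
    print(edge)
```

Once relations are nodes with edges like these, the same graph traversal and visualization used for class hierarchies applies to the relation ontology too.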