Charles Tapley Hoyt
cthoyt.scholar.social.ap.brid.gy
Charles Tapley Hoyt
@cthoyt.scholar.social.ap.brid.gy
Bio/cheminformatician, software developer, open scientist. 🇺🇸 living in Bonn 🇩🇪🇪🇺 (he/him)

Current projects: @NFDI4Chem and @dalia

🌉 bridged from ⁂ https://scholar.social/@cthoyt, follow @ap.brid.gy to interact
Challenges with Semantic Mappings
There are many challenges associated with the curation, publication, acquisition, and usage of semantic mappings. This post examines their philosophical, technical, and practical implications, highlights existing solutions, and describes opportunities for next steps for the community of curators, semantic engineers, software developers, and data scientists who make and use semantic mappings.

### Proliferation of Formats

The first major challenge with semantic mappings is the variety of forms they can take. This includes both different data models and different serializations of those models. Let’s start with a lightning review (please let me know if I missed something):

* The Simple Knowledge Organization System (SKOS) is a data model for RDF to represent controlled vocabularies, taxonomies, dictionaries, thesauri, and other semantic artifacts. It defines several semantic mapping predicates, including for broad matches, narrow matches, close matches, related matches, and exact matches.
* JSKOS (JSON for Knowledge Organization Systems) is a JSON-based extension of the SKOS data model. I recently wrote a post about converting between SSSOM and JSKOS.
* The Web Ontology Language (OWL) is primarily used for ontologies. It has first-class language support for encoding equivalences between classes, properties, or individuals. Other semantic mappings can be encoded as annotation properties on classes, properties, or individuals, e.g., using SKOS predicates.
* The OBO Flat File Format is a simplified version of OWL with macros most useful for curating biomedical ontologies. It has the same abilities as OWL, plus the `xref` macro, which corresponds to `oboInOwl:hasDbXref` relations; these are by nature imprecise and therefore used in a variety of ways.
* The Simple Standard for Sharing Ontological Mappings (SSSOM) is a fit-for-purpose format for semantic mappings between classes, properties, or individuals. SSSOM guides curators towards inputting key metadata that are typically missing from other formalisms and is gaining wider community adoption. Importantly, SSSOM integrates into ontology curation workflows, especially for Ontology Development Kit (ODK) users.
* The Expressive and Declarative Ontology Alignment Language (EDOAL) lives in a similar space to SSSOM, but IMO was much less approachable (cf. XML + Java) and has not seen a lot of traction in the biomedical space.
* OntoPortal has its own data model for semantic mappings that has low metadata precision. I recently wrote a post on converting OntoPortal to SSSOM. OntoPortal would also like to invest more in SSSOM infrastructure if it can organize funding and human resources.
* Wikidata has its own data model for semantic mappings that includes higher precision metadata. I recently wrote a post on mapping between the data models from SSSOM and Wikidata.
* Finally, there’s a long tail of mappings that live in poorly annotated CSV, TSV, Excel, and other formats. Similarly, mappings can live in plain RDF files, e.g., encoded with SKOS predicates, but without high precision metadata.

### Scattered, Partially Overlapping, and Incomplete

Semantic mappings are not centralized, meaning that multiple sources of semantic mappings often need to be integrated to map between two semantic spaces. Even then, these integrated mappings are often incomplete. Using Medical Subject Headings (MeSH) and the Human Phenotype Ontology (HPO) as an example, we can see the following:

1. MeSH doesn’t maintain any mappings to HPO.
2. HPO maintains some mappings as primary mappings.
3. The Unified Medical Language System (UMLS) maintains some mappings as secondary mappings. HPO suggests using UMLS as a supplementary mapping resource.
4. Biomappings maintains some community-curated mappings as secondary mappings.

This actually might not be the best example - it would have been better to show a pair of resources that both partially map to the other. When I first made this chart, I had to engineer the UMLS inference by hand. Eventually, the need to generalize this workflow led to the development of the Semantic Mapping Reasoner and Assembler (SeMRA) Python package, which does this automatically and at scale. The fact that there were missing mappings that even UMLS inference couldn’t retrieve led to establishing the Biomappings project for prediction and semi-automated curation of semantic mappings. The underlying technology stack from Biomappings eventually got spun out to SSSOM Curator and is now fully domain-agnostic.

### Different Precision or Conflicts

Another challenge with semantic mappings arises when different resources have different levels of precision. In the example below, Orphanet uses low-precision mapping predicates (i.e., `oboInOwl:hasDbXref`) while MONDO uses high-precision mapping predicates (i.e., `skos:exactMatch`). It makes sense to take the highest quality mapping in this situation, but having a coherent software stack to do this at scale was the big challenge (solved by SeMRA). This can get a bit dicier when there is conflicting information, for example, if one resource says exact match and another says broader match. In SeMRA, I devised a confidence assessment scheme (which should get its own post later).

### Common Conflations

There are three flavors of conflations that make curating and reviewing mappings difficult that I want to highlight.

#### Different Ontology Encodings

Classes, instances, and properties are mutually exclusive by design. This means that any semantic mappings between them are nonsense, but there are many situations where such mappings might get produced by an automated system or by a curator who is less knowledgeable about the ontological aspect of semantic mappings. There’s also a much more subtle discussion about classes, instances, and metaclasses (see this discussion) that I would set aside. As a concrete example, the Information Artifact Ontology (IAO) has a class that represents the section of a document that contains its abstract: abstract (IAO:0000315). Schema.org has an annotation property whose domain is a creative work and whose range is the text of the abstract itself: schema:abstract. These both have the same label `abstract`, which means it’s possible to conflate them (i.e., accidentally map them).

#### Different Entity Types

The second kind of conflation is even more subtle: when two classes, instances, or properties come from similar but distinct hierarchies. For example, there’s a subtle difference between what is a phenotype and what is a disease. Ontologies are well suited to encoding this subtlety with _axioms_ that can then be used by reasoners. This can become a problem for curating and reviewing semantic mappings because some diseases are named after the phenotype that they present or that causes them. Using MeSH’s disease hierarchy and HPO’s phenotype hierarchy as an example, we can see that Staghorn Calculi (mesh:D000069856) and Staghorn calculus (HP:0033591) should not get mapped.
Many more examples can be produced (which also show there are even more subtleties here) using SSSOM Curator with the command `sssom_curator predict lexical doid hp`. See the SSSOM Curator documentation for more information on the lexical matching workflow.

#### Different Senses

The Basic Formal Ontology (BFO) is an upper-level ontology that is used by many ontologies, including almost the entire Open Biomedical Ontologies (OBO) Foundry. However, as Chris Mungall described in his blog post, Shadow Concepts Considered Harmful, there are many different senses in which an entity can be described, each falling under a different, mutually exclusive branch of BFO. The figure below, from Chris’s post, represents different senses in which a human heart can be described:

This problem is particularly bad in disease modeling. Here are only a few examples (of many more) that illustrate this:

* the Ontology for General Medical Science (OGMS) term for disease (OGMS:0000031), the Experimental Factor Ontology (EFO) term for disease (EFO:0000408), and the Monarch Disease Ontology (MONDO) term for disease (MONDO:0000001) are each a disposition (BFO:0000016)
* the Gender, Sex, and Sexual Orientation Ontology (GSSO) term for disease (GSSO:000486) is a process (BFO:0000015)
* the Human Disease Ontology (DOID) informally mentions that a disease is a disposition, but doesn’t make an ontological commitment to BFO
* many more controlled vocabularies, including NCIT, SNOMED-CT, and MI, have their own terms for diseases but don’t use BFO as an upper-level ontology, nor are they constructed in a way conducive to integration with other ontologies

Schultz _et al._ (2011) proposed a way to formalize the connections between the various senses for diseases in Scalable representations of diseases in biomedical ontologies. However, the OBO community has yet to resolve the long and taxing discussion on how to standardize disease modeling practices.

For semantic mappings, this becomes a problem because a reasoner will explode (i.e., derive an inconsistency) if diseases under two different BFO branches get marked as equivalent, because the BFO upper-level terms are marked as disjoint - this is a feature, not a bug. However, while useful for creating carefully constructed, logically (self-)consistent descriptions of diseases, these modeling choices can be confusing when curating or reviewing mappings. These modeling choices might not be so important in downstream applications, such as assembling a knowledge graph to support graph machine learning, where many different knowledge sources with lower levels of accuracy and precision must be merged. In practice, I have merged triples using conflicting senses for diseases in a useful way, without issue.

### Interpretation is Important

While the last few examples were cautionary tales for when things (probably) shouldn’t be mapped, the next examples are about when things (probably) should be mapped.

#### Definitions

Here are three vocabularies’ terms for proteins and their textual definitions (though many more resources contain their own term for protein):

Entity | Label | Description
---|---|---
wikidata:Q8054 | protein | biomolecule or biomolecule complex largely consisting of chains of amino acid residues
SIO:010043 | protein | A protein is an organic polymer that is composed of one or more linear polymers of amino acids.
PR:000000001 | protein | An amino acid chain that is canonically produced _de novo_ by ribosome-mediated translation of a genetically-encoded mRNA, and any derivatives thereof.
As semantic mapping curators, we have two options:

1. We can reasonably assume that the intent of all three resources was to represent the same thing, despite the definitions being quite different. This assumption can be built on our prior knowledge about what a protein is and why Wikidata, SIO, and PR exist, from which we infer the intent of each definition’s author.
2. We can make a very literal reading of the definitions and conclude that these three terms represent very different things.

I think that the latter is really unconstructive for several reasons, but I have worked with colleagues, especially those with a linguistics background, who take this approach. First, this is unconstructive because it means you’ll probably never map anything. Second, if you want to be rigorous, use an ontology formalism with proper logical definitions. For example, the Cell Ontology (CL) exhaustively defines its cells using appropriate logical axioms. However, this also has a caveat: to make mappings based on logical definitions, the different modelers have to agree on the same axioms and the same modeling paradigm. As far as I know, there aren’t any groups out there that use the same modeling paradigm that haven’t simply combined forces to work on the same resource. So we’re stuck back at option 1 either way :)

#### Context Sometimes Matters

In contrast to the discussion about mapping phenotypes and diseases, there are context-dependent reasons to make semantic mappings, which can be illustrated in biomedicine using genes and proteins. Let’s start with some definitions:

1. SO:0000704 - a gene is a region of a chromosome that encodes a transcript
2. PR:000000001 - a protein is a chain of amino acids

The biomedical literature often uses gene symbols to discuss the proteins they encode. While this isn’t precise, it’s still useful in many cases. Therefore, when reading the COVID-19 literature, you will likely see discussion of the IL6-STAT cascade, where IL6 is the HGNC gene symbol for the Interleukin 6 protein. Most of the time, the HGNC-approved gene symbol is an initialism or other abbreviation of the protein, but this isn’t always the case. Similar to the literature, many pathway databases that accumulate knowledge about the processes and reactions in which proteins take part actually use gene symbols (or other gene identifiers) to curate proteins.

The take-home message here is that genes and proteins are indeed not the same thing, but in some contexts, it’s useful to map between them. There’s also a compromise - the Relation Ontology (RO) has a predicate has gene product (RO:0002205) that explicitly models the relationship between IL6 and Interleukin 6, which can then be automatically inferred to mean a less precise mapping for certain scenarios (SeMRA implements this). Outside of biomedicine, I have also heard that context-specific mappings are very important in the digital humanities. As I better understand the use cases of colleagues in other NFDI consortia that focus on the digital humanities, I will try to update this section with alternate perspectives.

### Evidence

A key challenge that motivated the development of SSSOM as a standard was to associate high-quality metadata with semantic mappings, such as the reason the mapping was produced (e.g., manual curation, lexical matching, structural matching), who produced it (e.g., a person, algorithm, agent), when, how, and more.
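To illustrate, here is a minimal sketch (mine, not from any published mapping set) of a single SSSOM record carrying such evidence metadata, written as TSV from Python; the field names are standard SSSOM slots, and the values are illustrative.

```python
# Minimal sketch: one SSSOM mapping record with evidence metadata.
# Field names are standard SSSOM slots; the values are illustrative.
import csv
import sys

record = {
    "subject_id": "wikidata:Q47512",
    "subject_label": "acetic acid",
    "predicate_id": "skos:exactMatch",
    "object_id": "CHEBI:15366",
    "object_label": "acetic acid",
    "mapping_justification": "semapv:ManualMappingCuration",
    "author_id": "orcid:0000-0003-4423-4370",
    "mapping_date": "2026-01-20",
}

# Write a header row and the record as tab-separated values to stdout
writer = csv.DictWriter(sys.stdout, fieldnames=list(record), delimiter="\t")
writer.writeheader()
writer.writerow(record)
```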
We developed the Semantic Mapping Vocabulary (semapv) to encode different kinds of evidence, such as manual curation of mappings, lexical matching, structural matching, and others. SSSOM is well-suited towards capturing simple evidence (blue).

#### Provenance for Inferences

The purple evidence from the figure in the last section requires a more detailed data model to represent provenance for inferred semantic mappings that simply doesn’t fit in the SSSOM paradigm (and it shouldn’t be hacked in, either). I proposed a more detailed data model for capturing how inference is done in Assembly and reasoning over semantic mappings at scale for biomedical data integration and provided a reference implementation in the Semantic Mapping Reasoner and Assembler (SeMRA) Python software package. Here’s what that data model looks like, which also has a Neo4j counterpart:

### Negative Semantic Mappings

SSSOM also has first-class support for encoding _negative_ relationships, meaning that the following can be represented:

This means that SSSOM curators can keep track of non-trivial negative mappings, e.g., when curating the results of semantic mapping prediction or automated inference. In a semi-automated curation loop, this allows us to avoid re-reviewing zombie mappings over and over again. High quality, non-trivial negative mappings also enable more accurate machine learning, as opposed to using negative sampling. For example, we have been working on developing graph machine learning-based ontology matching and merging using PyKEEN (a graph machine learning package I helped develop and maintain).

An open challenge is that we have support neither from data modeling formalisms (e.g., ontologies in OWL, knowledge graphs in RDF or Neo4j) for encoding negative knowledge (in this case, negative mappings) nor from tooling. This means that when we output SSSOM to RDF, we use our own formalism, which won’t be correctly recognized by any other tooling that wasn’t developed with SSSOM in mind. I’m keeping notes about this in a separate post about negative knowledge that I update periodically.

* * *

Despite the challenges, I think that the mapping world is actually getting quite mature. I am currently working with NFDI and RDA colleagues to further unify the SSSOM and JSKOS worlds, especially given that the Cocoda mapping curation tool solved many of these problems (from the digital humanities perspective) many years ago, and we simply were unaware of it. I hope this post can continue as a living document - if I missed something, please let me know and I will update the post to include it!
cthoyt.com
January 20, 2026 at 1:25 PM
i've written a few blog posts lately on semantic mappings, SSSOM, JSKOS, and automated assembly of data and knowledge. i'm also always very proud to do this by hand, without AI

1. SSSOM and Wikidata: https://cthoyt.com/2026/01/08/sssom-to-wikidata.html
2. SSSOM and JSKOS […]
Original post on scholar.social
scholar.social
January 16, 2026 at 4:26 PM
a data modeling language that errors when _real_ examples aren't given for all fields, all structures, all everythings
January 14, 2026 at 4:03 PM
ever wanted to put semantic mappings in SSSOM into @wikidata

now you can

https://cthoyt.com/2026/01/08/sssom-to-wikidata.html
Mapping from SSSOM to Wikidata
At the 4th Ontologies4Chem Workshop in Limburg an der Lahn, I proposed an initial crosswalk between the Simple Standard for Sharing Ontological Mappings (SSSOM) and the Wikidata semantic mapping data model. This post describes the motivation for this proposal and the concrete implementation I’ve developed in `sssom-pydantic`. This work is part of the NFDI’s Ontology Harmonization and Mapping Working Group, which is interested in enabling interoperability between SSSOM and related data standards that encode semantic mappings.

The TL;DR for this post is that I implemented a mapping from SSSOM to Wikidata in `sssom-pydantic` in cthoyt/sssom-pydantic#32. One high-level entrypoint is a function that reads an SSSOM file and prepares QuickStatements, which can be reviewed in the web browser and then uploaded to Wikidata. This script can be run from Gist with `uv run https://gist.github.com/cthoyt/f38d37426a288989158a9804f74e731a#file-sssom-wikidata-demo-py`

## Semantic Mappings in SSSOM

The Simple Standard for Sharing Ontological Mappings (SSSOM) is a community-driven data standard for semantic mappings, which are necessary to support (semi-)automated data integration and knowledge integration, such as in the construction of knowledge graphs. While SSSOM is primarily a tabular data format that is best serialized as TSV, it uses LinkML to formalize the semantics of each field such that SSSOM can be serialized to and read from OWL, RDF, and JSON-LD. Here’s a brief example:

subject_id | subject_label | predicate_id | object_id | object_label | mapping_justification
---|---|---|---|---|---
wikidata:Q128700 | cell wall | skos:exactMatch | GO:0005618 | cell wall | semapv:ManualMappingCuration
wikidata:Q47512 | acetic acid | skos:exactMatch | CHEBI:15366 | acetic acid | semapv:ManualMappingCuration

## Semantic Mappings in Wikidata

Wikidata has two complementary formalisms for representing semantic mappings. The first uses the exact match (P2888) property with a URI as the object. For example, cell wall (Q128700) maps to the Gene Ontology (GO) term for cell wall by its URI `http://purl.obolibrary.org/obo/GO_0005618`. The second formalism uses semantic space-specific properties (e.g., P683 for ChEBI) with local unique identifiers as the object. For example, acetic acid (Q47512) maps to the ChEBI term for acetic acid using the P683 property for ChEBI and the local unique identifier for acetic acid (within ChEBI), `15366`.

Wikidata has a data structure that enables annotating qualifiers onto triples. Therefore, other parts of semantic mappings modeled in SSSOM can be ported:

1. Authors and reviewers can be mapped from ORCID identifiers to Wikidata identifiers, then encoded using the S50 and S4032 properties, respectively
2. A SKOS-flavored mapping predicate (i.e., exact, narrow, broad, close, related) can be encoded using the S4390 property
3. The publication date can be encoded using the S577 property
4. The license can be mapped from text to a Wikidata identifier, then encoded using the S275 property

Note that properties that normally start with a `P` when used in triples are changed to start with an `S` when used as qualifiers. Other fields in SSSOM could potentially be mapped to Wikidata later.

### Finding Wikidata Properties using the Semantic Farm

The Semantic Farm (previously called the Bioregistry) maintains mappings between prefixes that appear in compact URIs (CURIEs) and their corresponding Wikidata properties. For example, the prefix `CHEBI` maps to the Wikidata property P683.
These mappings can be accessed in several ways:

1. via the Semantic Farm’s SSSOM export (note: this requires subsetting to mappings where Wikidata properties are the object),
2. via the Semantic Farm’s live API, or
3. via the Bioregistry Python package (this will get renamed to match Semantic Farm, eventually) using the following code:

```python
import bioregistry

# get the bulk prefix -> Wikidata property map
prefix_to_property = bioregistry.get_registry_map("wikidata")

# get the property for a single resource
resource = bioregistry.get_resource("chebi")
chebi_wikidata_property_id = resource.get_mapped_prefix("wikidata")
```

## Notable Implementation Details

I’ve previously built two packages that were key to making this work:

1. `wikidata-client`, which interacts with the Wikidata SPARQL endpoint and has high-level wrappers around lookup functionality. I’m also aware of WikidataIntegrator - I’ve contributed several improvements, but working with its codebase doesn’t spark joy, and the last time I tried to use it, it was fully broken due to some of its dependencies not working on modern Python.
2. `quickstatements-client`, which implements an object model for QuickStatements v2 and an API client.

Along the way to this PR, I made improvements to `wikidata-client` in cthoyt/wikidata-client#2 to add high-level functionality for looking up multiple Wikidata records based on values for a property (e.g., to support ORCID lookup in bulk). All other changes were made in `sssom-pydantic` in cthoyt/sssom-pydantic#32.

The other key challenge was to avoid adding duplicate information to Wikidata - unlike with a simple triple store, we could accidentally end up with duplicate statements. Therefore, the `sssom-pydantic` implementation looks up all existing semantic mappings in Wikidata for entities appearing in an SSSOM file, then filters appropriately to avoid uploading duplicate mappings to Wikidata.

## Pulling it All Together

This new module in `sssom-pydantic` implements the following interactive workflows:

1. Read an SSSOM file, convert mappings to the Wikidata schema, then open a QuickStatements tab in the web browser using `read_and_open_quickstatements()`
2. Convert in-memory semantic mappings to the Wikidata schema, then open a QuickStatements tab in the web browser using `open_quickstatements()`

Here’s what the QuickStatements web interface looks like after preparing some demo mappings:

It also implements the following non-interactive workflows, which should be used with caution since they write directly to Wikidata:

1. Read an SSSOM file, convert mappings to the Wikidata schema, then post non-interactively to Wikidata via QuickStatements using `read_and_post()`
2. Convert in-memory semantic mappings to the Wikidata schema, then post non-interactively to Wikidata via QuickStatements using `post()`

* * *

I’m a bit hesitant to start uploading SSSOM content to Wikidata in bulk, because I don’t yet have a plan for how to maintain mappings that might change over time in their upstream single source of truth, e.g., mappings curated in Biomappings. Otherwise, I think this is a good proof of concept and would like to get feedback about additional qualifiers that could be added, and whether the ones I chose so far are the best.
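For readers who want to try the interactive workflow, here is a minimal sketch; only the function name `read_and_open_quickstatements()` comes from this post, while the module path and the file name are assumptions.

```python
# Hypothetical sketch of the interactive workflow described above; the module
# path and the input file name are assumptions -- only the function name
# read_and_open_quickstatements() is named in the post.
from sssom_pydantic.wikidata import read_and_open_quickstatements

# Read an SSSOM TSV, convert the mappings to the Wikidata schema, and open a
# QuickStatements batch in the browser for review before uploading.
read_and_open_quickstatements("mappings.sssom.tsv")
```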
cthoyt.com
January 8, 2026 at 5:54 PM
the Variants and Us (VUS) Podcast did an episode on biocuration last summer, with a focus on genomics and analysis of variants:

https://open.spotify.com/episode/3WphkfnZQMbS0q97wYZ3Ze?si=HyskwSw3RGig9pXMvhp2JA

def relevant for the @biocurator community
Biocuration: from Evidence to Classification
Variants and Us (VUS) Podcast · Episode
open.spotify.com
January 5, 2026 at 4:49 PM
xoxo.zone
December 20, 2025 at 11:16 AM
i've written about my experience at the @deNBI BioHackathon Germany 2025 with the @dalia, TeSS, and Bioschemas teams #bhg2025

https://cthoyt.com/2025/12/09/biohackathon-de-2025.html
Machine-Actionable Training Materials at BioHackathon Germany 2025
I recently attended the 4th BioHackathon Germany hosted by the German Network for Bioinformatics Infrastructure (de.NBI). I participated in the project _On the Path to Machine-actionable Training Materials_ in order to improve the interoperability between DALIA, TeSS, mTeSS-X, and Schema.org. This post gives a summary of the activities leading up to the hackathon and the results of our happy hacking.

## Team

Our project, On the Path to Machine-actionable Training Materials, had the following active participants throughout the week:

* Nick Juty & Phil Reed (University of Manchester)
* Leyla Jael Castro & Roman Baum (Deutsche Zentralbibliothek für Medizin; ZB Med)
* Petra Steiner (University of Darmstadt)
* Oliver Knodel & Martin Voigt (Helmholtz-Zentrum Dresden-Rossendorf; HZDR)
* Dilfuza Djamalova (Forschungszentrum Jülich; FZJ)
* Jacobo Miranda (European Molecular Biology Laboratory; EMBL)

Nick and Petra were our team leaders and Phil acted as the project’s _de facto_ secretary. On the first day of the hackathon, we were briefly joined by Alban Gaignard (Nantes University), Dimitris Panouris (SciLifeLab), and Harshita Gupta (SciLifeLab) to present their current related work. Similarly, Dominik Brilhaus (Heinrich-Heine-Universität Düsseldorf) joined on the first day to share his perspective from DataPLANT (the NFDI consortium for plants) as a training materials creator. Finally, Helena Schnitzer (FZJ) participated in some Schema.org discussions through the week.

## Goals

We categorized our work plan into three streams:

1. **Training Material Interoperability** - survey the landscape of relevant ontologies and schemas for annotating learning materials, curate mappings/crosswalks between existing data models, develop a programmatic toolbox, and begin federating between training material platforms
2. **Training Material Analysis** - analyze training materials at scale to group similar training materials, reduce redundancy, and semi-automatically construct learning paths
3. **Modeling Learning Paths** - collect use cases and develop a (meta)data model for learning paths

## Training Material Interoperability

Interoperability is the third pillar of the FAIR data principles. Metadata describing training materials may be captured and stored in one of several data models, including the DALIA Interchange Format (DIF) v1.3, the format implicitly defined by the TeSS API, and the Schema.org learning material profile. Further, metadata records conforming to these data models are filled with references to terms in other ontologies, controlled vocabularies, databases, and other resources that mint (persistent) identifiers. Our overarching goal at the hackathon was to improve interoperability on both levels.

### Indexing Ontologies and Schemas

Our first concrete goal for training material interoperability at the hackathon was to survey ontologies, controlled vocabularies, databases, and other resources that mint (persistent) identifiers that might appear in the metadata describing a learning material. For example, TeSS uses the EDAM Ontology to annotate topics onto training materials. For the same purpose, DALIA uses the Hochschulcampus Ressourcentypen (I’ll say more on how we deal with the conflicting resources in the section below on mappings). Our second concrete goal was to survey schemas that are used in modeling open educational resources and training materials, for example, Schema.org, OERSchema, and MoDALIA, which encodes the DALIA Interchange Format (DIF) v1.3.
The Semantic Farm (https://semantic.farm) is a comprehensive database of metadata about resources that mint (persistent) identifiers (e.g., ontologies, controlled vocabularies, databases, schemas), such as their preferred CURIE prefix for usage in SPARQL queries and other semantic web applications. It imports and aligns with other databases like Identifiers.org (for the life sciences) and BARTOC (for the digital humanities) to support interoperability and sustainability. It follows the open data, open code, and open infrastructure (O3) guidelines and has well-defined governance to enable community maintenance and support longevity. It’s the perfect place to index all the learning material and open educational resource-related ontologies, controlled vocabularies, databases, and schemas.

I gave a tutorial on how to search the Semantic Farm for ontologies, controlled vocabularies, and other resources that mint (persistent) identifiers, and how to contribute any that are missing. In short, they can be contributed by filling out the new prefix request template on GitHub. If you’re interested in adding a new entry, you can directly use the form, read the contribution guidelines, or watch a short YouTube tutorial.

While I had done some significant preparatory work before the hackathon by creating many new entries in the Semantic Farm, the team found and added several new and important entries to the Semantic Farm during the hackathon too. Here are two highlights:

* Martin Voigt contributed the prefix `amb` for the Allgemeines Metadatenprofil für Bildungsressourcen (General Metadata Profile for Educational Resources) in biopragmatics/bioregistry#1781. This is a metadata schema for learning materials produced by the Kompetenzzentrum Interoperable Metadaten (KIM) within the Deutsche Initiative für Netzwerkinformation e.V. that was heavily inspired by Schema.org and the Dublin Core Learning Resource Metadata Initiative (LRMI).
* Dilfuza Djamalova and Jacobo Miranda contributed the prefix `gtn` for Galaxy Training Network training materials in biopragmatics/bioregistry#1779. This resource contains multi- and cross-disciplinary training materials for using the Galaxy workflow management system. Below, I describe how we ingested and transformed the training materials from GTN into a common format such that they can be represented according to the DALIA Interchange Format (DIF) v1.3, the implicit data model expected by TeSS, and in Schema.org-compliant RDF.

Ultimately, we collated relevant ontologies, controlled vocabularies, schemas, and other resources that mint (persistent) identifiers in a collection such that they can be easily found and shared.

### Semantic Mappings and Crosswalks

I alluded to the different resources used by TeSS and DALIA to annotate disciplines. The issue of partially overlapping ontologies, controlled vocabularies, and databases is quite widespread, and can manifest in a few different ways. The figure above shows that redundancy can arise because of different focus within a domain (i.e., the chemistry example), different hierarchical specificity (i.e., the disease example), and due to massive generic resources having overlap across many domains (e.g., UMLS, MeSH, NCIT). This is problematic when integrating learning materials from different sources, e.g., TeSS and DALIA, because two learning materials may be annotated with different terms describing the same discipline. Therefore, the solution is to create semantic mappings between these terms.
I’ve worked for several years on the Simple Standard for Sharing Ontological Mappings (SSSOM) standard for storing semantic mappings, so this was naturally the target for our work. Further, I have been working on a domain-agnostic workflow for predicting semantic mappings with lexical matching and deploying a curation interface called SSSOM Curator. I gave a tutorial for using SSSOM Curator to the team based on a previous tutorial I made (that can be found on YouTube here). We prepared predicted semantic mappings between several learning material-related ontologies in biopragmatics/biomappings#204, but we didn’t prioritize semantic mapping curation during the hackathon. Here’s what they look like in the SSSOM Curator interface for Biomappings:

Whereas curating correspondences between concepts in ontologies, controlled vocabularies, and databases is often called semantic mapping, curating correspondences between schemas and the properties therein is often called crosswalking. We put a bigger emphasis on producing crosswalks between Schema.org and MoDALIA. This is actually a more complex problem because correspondences between elements in schemas can be more sophisticated (e.g., mapping two fields for first and last names to a single name field), but there are at least a few places where properties can be mapped with SSSOM.

An interesting lesson learned is that some curators find using SKOS relationships challenging because the narrow and broad match relations point in the opposite direction from what they would expect. For example, `X skos:narrowMatch Y` reads as "X has narrower match Y", meaning that Y is the narrower concept, not that X is a narrow match for Y. Many vocabularies use a verb as part of the predicate to reduce this confusion - I’m sure if it were spelled `X skos:hasNarrowerMatch Y`, then this would not have been a problem. Deep down, the real issue is that transparent identifiers (i.e., human-readable ones) are bad, since they can’t be changed over time. See the excellent article, Identifiers for the 21st century, by McMurry _et al._ (2017) for a more detailed discussion on what makes a good identifier.

### Operationalizing Crosswalks

The next step was to translate the abstract crosswalks between DALIA, TeSS, and Schema.org into a concrete implementation using a general-purpose programming language (i.e., Python).

#### The Scaling Problem

Given that we only focused on these three data models, it’s not unrealistic to produce a DALIA-TeSS crosswalk, a TeSS-Schema.org crosswalk, and a DALIA-Schema.org crosswalk. However, this approach does not scale well - in general, it requires curating and implementing $\binom{N}{2}$ crosswalks, with $N$ being the number of schemas. An alternative is to use a hub-and-spoke model, in which one data model is targeted as the intermediary used for interchange and storage. This reduces the burden on curators of crosswalks, as they only have to curate a single crosswalk from any given data model into the intermediary. Similarly, it reduces the burden on code maintainers, as only a single crosswalk has to be implemented per data model. The challenge with open educational resources and learning materials is that no existing data model is sufficient to cover the (most important) aspects of all other data models. This motivated us to implement a unified, generic data model for learning materials to serve as the interoperability hub between DALIA, TeSS, Schema.org, and other data models.
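To make the scaling argument concrete (my arithmetic, not from the post): for the $N = 7$ data models in the figure below, the all-to-all approach requires $\binom{7}{2} = 21$ pairwise crosswalks, while the hub-and-spoke approach requires only $7$, one per spoke.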
```mermaid
graph TD
    subgraph alltoall ["All-to-All (complex, burdensome)"]
        dalia[DALIA] <--> tess[TeSS]
        dalia <--> schema[Schema.org]
        dalia <--> oerschema[OERschema]
        dalia <--> amb["Allgemeines Metadatenprofil für Bildungsressourcen (AMB)"]
        dalia <--> lrmi["Learning Resource Metadata Initiative (LRMI)"]
        dalia <--> erudite[ERuDIte]
        tess <--> schema
        tess <--> oerschema
        tess <--> amb
        tess <--> lrmi
        tess <--> erudite
        schema <--> oerschema
        schema <--> amb
        schema <--> lrmi
        schema <--> erudite
        oerschema <--> amb
        oerschema <--> lrmi
        oerschema <--> erudite
        amb <--> lrmi
        amb <--> erudite
        lrmi <--> erudite
    end
    subgraph hub ["Hub-and-Spoke (maintainable, extensible)"]
        direction TB
        hubn[Unified OER Data Model] <--> daliaspoke[DALIA]
        hubn[Unified OER Data Model] <--> tessspoke[TeSS]
        hubn[Unified OER Data Model] <--> schemaspoke[Schema.org]
        hubn[Unified OER Data Model] <--> oerschemaspoke[OERschema]
        hubn[Unified OER Data Model] <--> ambspoke["Allgemeines Metadatenprofil für Bildungsressourcen (AMB)"]
        hubn[Unified OER Data Model] <--> lrmispoke["Learning Resource Metadata Initiative (LRMI)"]
        hubn[Unified OER Data Model] <--> eruditespoke[ERuDIte]
    end
    alltoall --> hub
```

The famous XKCD comic, Standards (https://xkcd.com/927), proselytizes that any proposal of a unified standard that covers everyone’s use cases is doomed to become the $(N+1)$th competing standard. While I’m doing my best to present the work done in preparation for the hackathon and at the hackathon in a linear way, the truth is that most steps also included discussion, hacking, trying, failing, and repeating. Therefore, I can confidently say that for practical reasons, implementing a new _de facto_ standard was the only realistic choice.

#### The OERbservatory Data Model

During the hackathon, we implemented the open source OERbservatory Python package. I first want to talk about three major features that it includes:

1. a unified, generic object model for open educational resources that’s effectively the union of the best parts of DALIA, TeSS, Schema.org, and a few other data models we found
2. import and export for two open educational resource and learning materials data models - DALIA and TeSS (we didn’t have time during the hackathon to implement import and export for Schema.org)
3. import from three external learning material repositories - OERhub, OERSI, and the Galaxy Training Network (GTN)

Here’s an excerpt of the object model, implemented using Pydantic. Note that Pydantic uses a combination of Python’s type system and type annotations to express constraints and rules, similarly to how SHACL does. However, we get the benefit of Python type checking and the Python runtime to check that we’ve encoded this all correctly. Finally, all Pydantic models can be serialized to and deserialized from JSON.

```python
class EducationalResource(BaseModel):
    """Represents an educational resource."""

    model_config = ConfigDict(arbitrary_types_allowed=True)

    reference: Reference | None = Field(
        None,
        description="The primary reference for this learning material",
        examples=[Reference(prefix="dalia", identifier="")],
    )
    title: InternationalizedStr = Field(..., description="The title of the learning material")
    authors: list[Author | Organization] = Field(
        default_factory=list,
        description="An ordered list of authors (i.e., persons or organizations) of the learning material",
        examples=[
            Author(name="Charles Tapley Hoyt", orcid="0000-0003-4423-4370"),
            Organization(name="NFDI", ror="05qj6w324"),
        ],
        min_len=1,
    )
    ...
```
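As a usage sketch (not from the post; the import path and the assumption that `InternationalizedStr` accepts a plain string are mine), constructing and serializing a record might look like this:

```python
# Hypothetical usage sketch; the import path and the coercion of a plain
# string into InternationalizedStr are assumptions.
from oerbservatory import Author, EducationalResource

resource = EducationalResource(
    title="Introduction to Research Data Management",
    authors=[Author(name="Charles Tapley Hoyt", orcid="0000-0003-4423-4370")],
)

# Pydantic v2 models serialize to JSON out of the box
print(resource.model_dump_json(indent=2, exclude_none=True))
```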
#### Technology Comparison (content warning: programming culture wars)

DALIA and Schema.org are built on top of semantic web principles. Records about learning materials encoded in these data models are stored in RDF and queryable via SPARQL. However, while powerful, SPARQL is a querying language that is inherently limited in its expressibility and utility. A general-purpose programming language is more suited for building data science workflows, search engines, APIs, web interfaces, and other tools on top of open educational resource and learning material data. That's why we emphasized concretizing the crosswalks between DALIA, TeSS, and Schema.org in a software implementation. We chose Python as the target language because of its ubiquity and ease of use.

When the TeSS platform was initially developed in the early 2010s, the Ruby programming language and the Ruby on Rails framework were a popular choice for developing web applications. Unfortunately for TeSS, the scientific Python stack and machine learning ecosystem have since made Python the clear winner for academics and scientists. This creates an issue: only a small number of academics are skilled in Ruby and can participate in the development of TeSS.

It was also crucial that we used Python such that our implementation was reusable. For example, the DALIA 1.0 platform was implemented using Django, which made it effectively impossible to reuse any of the underlying code outside the platform, e.g., in a data science workflow. The same issue is also true for the TeSS implementation using Ruby on Rails. While these batteries-included frameworks can get a minimal web application running quickly, they generally lead developers towards writing code that isn't reusable.

#### OERbservatory as an Interoperability Hub between DALIA and TeSS

Before we even started working on the OERbservatory, we had implemented two packages for working with data in DALIA and TeSS:

1. data-literacy-alliance/dalia-dif implements a parser for the DALIA DIF v1.3 tabular format, an internal representation of the content (also using Pydantic), and an RDF serializer (using pydantic-metamodel)
2. cthoyt/tess-downloader implements an API client for TeSS and an internal representation of the learning resource data model (using Pydantic)

Because each of these packages already implemented an internal (lossless) representation of the data models for DALIA and TeSS, respectively, we only had to write code in OERbservatory that maps the fields between them and OERbservatory’s data model. This was a **big** milestone towards interoperability. We demonstrated its potential by programmatically downloading all learning materials from the ELIXIR TeSS instance’s API and exporting them as DALIA RDF. Similarly, we converted all learning materials curated for DALIA into the TeSS JSON format. Later, I’ll describe how we took this workflow one step further to implement syncing between DALIA and TeSS.

Note that this mapping can’t simply be expressed using SSSOM, SHACL, or other declarative languages, because it relies on more sophisticated logic. For example, topics annotated with ontology terms in the DALIA data model only store the URI reference, whereas topics annotated with ontology terms in the TeSS data model require both the URI reference and the term’s label. Since we’re encoding our crosswalks using a general-purpose programming language, we have a larger toolkit available. Here, we could use PyOBO, a generic package I’ve written for working with ontologies, for looking up labels.
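For example, such a label lookup might look like the following minimal sketch; the exact PyOBO function and signature (`pyobo.get_name(prefix, identifier)`) are an assumption based on its high-level API and may differ across versions.

```python
# Minimal sketch of looking up a term's label with PyOBO.
# Assumes pyobo.get_name(prefix, identifier) is available; the exact
# signature may differ across PyOBO versions.
import pyobo

label = pyobo.get_name("go", "0005618")
print(label)  # expected: "cell wall"
```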
Unfortunately, we did not have time to implement an importer/exporter for Schema.org. We deprioritized this because Schema.org felt the least approachable due to the way its documentation is written, the complexity of its models, and its prolific use of mixins. We considered whether we could automatically generate Pydantic classes from Schema.org - and it turns out that pydantic-schemaorg has already done it! Unfortunately, the code is not compatible with modern versions of Pydantic, and the project appears abandoned. We only had so much time at the hackathon, so forking/reviving/rewriting `pydantic-schemaorg` was left as a task for later.

#### The OERbservatory as an Aggregator

Besides open educational resources and learning materials that are encoded in the DALIA, TeSS, and Schema.org formats, there are many repositories of learning materials that do not conform to a well-defined schema. Prior to the hackathon, I had already explored the Austrian OERhub and the Open Educational Resources Search Index (OERSI) and written importers into `dalia-dif`. At the hackathon, I reimplemented those importers using the newly formed OERbservatory unified, generic data model.

On the Thursday morning of the BioHackathon, I had an excellent mob programming session with Dilfuza Djamalova and Jacobo Miranda to import training materials from the Galaxy Training Network (GTN). It turns out that there are already several open educational resources and learning materials that are automatically scraped and imported by TeSS. However, those importers are limited by TeSS’s relatively rigid data model, which is bound to its database and can therefore not easily be evolved. Dilfuza and Jacobo had a few goals for our hacking:

* There are fields in GTN that aren’t yet captured by TeSS. They wanted to implement those fields in OERbservatory, demonstrate their usage, then gently nudge TeSS to evolve its data model to support their use cases.
* They wanted to index their content in DALIA, which becomes much easier if they only have to maintain one importer in OERbservatory, which can already export to DALIA.
* GTN is part of the deKCD consortium, which wants to deduplicate training material. Adding an importer here gives access to the workflows we’re building for reconciling metadata curated in different places about the same materials, identifying similar materials to reduce duplicate effort, and connecting people working on the same kinds of materials.

We implemented the GTN importer in data-literacy-alliance/oerbservatory#8, which covers tutorials in GTN and later could be extended to slide decks. Along the way, we updated the main educational resource model in OERbservatory to include a few new fields, including the status (which is also shared by TeSS - that now needs to be incorporated), the publication date, and the modified date. We did not make a complete mapping for all fields in GTN due to time constraints, so we implemented logging that summarizes fields that haven’t yet been mapped (see the PR for examples of each). For example, the way that contributor information is incorporated into the API from the frontmatter in the source is interesting - it resolves the keys in the frontmatter to entries in this YAML file in the GTN GitHub repository. We will want to think about the best way to map the authors into OERbservatory, and this also might be a time to extend the author list to include contributor role annotations.
I was very excited that Dilfuza and Jacobo were motivated to work on this and to contribute following the hackathon. We’ll see if the OERbservatory is approachable enough for future external contributions! For example, Robert Hasse of NFDI4BIOIMAGE already proactively prepared a script that exports their consortium’s training materials into the DALIA DIF v1.3 tabular format. I don’t consider this a very approachable format, and I’m sure efforts like his could have been eased by using OERbservatory as a target. The next steps are to incorporate the Swiss Digital Research Infrastructure for Arts and Humanities (DARIAH-CH) and Physical Sciences Data Infrastructure (PSDI) learning materials, which appeared on the schematic diagram for OERbservatory earlier. There are also a lot of other potential learning material repositories to scrape, like Glittr.com. If you have a suggestion, you can drop it in the OERbservatory issue tracker. Further, given that Martin Voigt was in the room during this hacking and discussion, and he is the maintainer of TeSS’s scraper code, we already started formulating plans on how we might be able to deduplicate efforts.

### Federation of Open Educational Resources and Learning Materials

The next step towards interoperability, beyond the demonstration of converting between the formats used by DALIA and TeSS, was to demonstrate actually posting the content to the live services. While we are currently in the process of implementing submission of open educational resources and learning materials in DALIA, TeSS already has a web-based interface for registering new learning materials. TeSS doesn’t have a documented API endpoint for posting learning materials, but luckily, Martin knew where it was and helped figure out the correct way to pass credentials to use it. We managed this by a combination of reading the Ruby implementation of TeSS and good ol’ trial and error. In the end, we implemented posting learning materials in the TeSS-specific Python package in cthoyt/tess-downloader#2. Then, it was only a matter of stringing together code that converts DALIA to OERbservatory, OERbservatory to TeSS, and then uploads to TeSS.

In parallel, Martin worked on improving the devops behind the PaNOSC TeSSHub to enable quickly spinning up new TeSS instances that each have their own subdomain. He created a different subdomain for each of DALIA, OERSI, GTN/deKCD, and OERhub. Finally, we wrote a script that uploaded all open educational resources and learning materials from each source to the appropriate TeSS instance in data-literacy-alliance/oerbservatory#3. The results in each space can be explored here:

Source | Domain
---|---
DALIA | https://dalia.tesshub.hzdr.de
OERhub | https://oerhub.tesshub.hzdr.de
OERSI | https://oersi.tesshub.hzdr.de
GTN/deKCD | https://kcd.tesshub.hzdr.de
PaNOSC | https://panosc.tesshub.hzdr.de

A full list of spaces can be found here.

#### European Open Science Cloud

The great specter looming over most NFDI-related projects is how to interface with the European Open Science Cloud (EOSC). At the surface, EOSC is a massive undertaking to democratize access to research infrastructure on the European level. However, having just entered the NFDI bubble at the end of the summer, I have been overwhelmed by the high pressure to participate in EOSC combined with the lack of funding and direction on how best to go about doing that.
All of that being said, Oliver Knodel spent the hackathon preparing the concept for how we could connect TeSSHub to the EOSC open educational resource and training materials registry using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). Once TeSSHub can demonstrate federating its content through this mechanism, we can use it as inspiration for a generic implementation in OERbservatory.

#### Governance and Provenance

Now that it’s possible to copy training materials from one platform to another, we have started to consider governance and provenance issues like:

* If a training material originally curated in DALIA is displayed in TeSS, how is that attributed? We will have to carefully consider how metadata records about learning resources are identified, and how those identifiers are passed around during interchange/syncing.
* If a training material originally from TeSS is enriched in the DALIA platform, should that information flow back to TeSS, and how? We will have to carefully consider how information is deduplicated and reconciled.
* How do we implement technical systems that can keep many federated platforms up-to-date with each other?

I’m sure there will be many more questions along these lines. Luckily, the mTeSS-X group has already begun discussions on a smaller scale, since they care about how to federate between many disparate TeSS instances.

## Training Material Analysis

Our team split into two for the analysis of training materials. The first team looked into algorithmic mechanisms for featurizing open educational resources and learning materials and applications of those features. The second team looked into using large language models (LLMs) for the automated construction of learning paths.

### Featurization and Application

The first team looked into two techniques for featurizing (i.e., assigning dense vectors to) open educational resources and learning materials. The first and most interpretable technique was to concatenate free text fields and labels from structured fields of a learning resource and index the entire corpus (i.e., all learning resources) using the term frequency-inverse document frequency (TF-IDF) algorithm. This does a small amount of text preprocessing, calculates a word list for the entire corpus, then scores, for each word, how characteristic it is of a given learning material relative to the entire corpus. Each learning material is then assigned a vector with values in $[0, 1]$ whose length is that of the word list. Learning materials can be compared, e.g., using the cosine similarity between their respective vectors. The second technique was to use the sentence transformers machine learning architecture, which relies on a pre-trained (not large) language model to accomplish a similar vectorization. Both methods run in less than a few minutes for the corpus of learning resources from DALIA, TeSS, OERhub, and OERSI. We also pre-calculated the all-by-all similarities and applied a cutoff of 0.7 to shorten the list. Both the TF-IDF and sentence transformers vector indexes and similarities are committed to the OERbservatory repository and are available here.

After we had embeddings, Dilfuza began to investigate some of the following:

1. Identify duplicate metadata records corresponding to the same learning material, e.g., when two different platforms scraped the same learning material
2. Semi-automatically identify similar training materials to improve suggestions to learners, to connect learning material creators, and to help de-duplicate training material creation efforts

We only got this far on the last day of the hackathon, so there is still a lot more to do here! Originally, I had planned on also using these embeddings to train classifiers for key metadata such as topic, target audience, and difficulty level, then to create a semi-automated curation workflow for enriching learning materials whose records are sparsely annotated. These will be next steps.

### Automated Construction of Learning Paths

Nick looked into using large language models (LLMs) to construct learning paths through machine-assisted dialog. This part is highly experimental, so there isn’t much to point to yet, but the idea was to take in a list of learning materials (either hard-coded or as a URL for the chat system to retrieve) and a prompt asking the LLM to collect similar materials based on objectives and keywords, then create a learning path based on difficulty (which is infrequently annotated) and suggest a title. This workflow was used to produce three learning paths, each of which was ordered and had reference links, a difficulty rating, a title, and a provider:

1. Sequencing and QC (10 items)
2. Git and Version Control (6 items)
3. Genome Annotation (8 items)

More on this in future work!

## Modeling Learning Paths

While there isn’t a clear consensus on what a learning path is, a simple definition is that a learning path is a sequence of learning materials to consume to help a learner achieve a specific level of competence on a topic. TeSS implements a data model for learning paths based on this definition, and the ELIXIR TeSS instance has eleven examples. Our team had the goal of developing an extension of Schema.org (in Bioschemas) to capture learning paths. For transparency, I didn’t actively participate in this track, but think it’s worth sharing the results, most of which are adapted from Phil’s repository in BioSchemas/LearningPath-sandbox.

### Proposed Data Model

Phil, Alban, and Leyla proposed two new Bioschemas profiles and a small change to one Bioschemas profile with the help of Nick and Roman:

* `LearningPath`: inherits from `Course`
* `LearningPathModule`: inherits from `Course`, `Syllabus`, `ListItem`, and `ItemList`
* `TrainingMaterial`: inherits from `LearningResource` and `ListItem`

Here’s a class diagram describing the proposed data model, where 🔺 is a Schema.org type, 🟩 is a Bioschemas profile, and 🔵 is a new profile:

```mermaid
classDiagram
    direction TB
    class Event["Event🔺"] {
    }
    class CourseInstance["CourseInstance🔺🟩"] {
    }
    class Course["Course🔺🟩"] {
        syllabusSections
    }
    class new_LearningPath["new:LearningPath🔵"] {
        Syllabus[] syllabusSections
    }
    class ListItem["ListItem🔺"] {
        nextItem
    }
    class Syllabus["Syllabus🔺"] {
    }
    class new_LearningPathModule["new:LearningPathModule🔵"] {
        ListItem[] itemListElement
        LearningPathTopic nextItem
    }
    class LearningResource["LearningResource🔺"] {
    }
    class bio_TrainingMaterial["bio:TrainingMaterial🟩"] {
    }
    Course <|-- new_LearningPath
    Course <|-- new_LearningPathModule
    Syllabus <|-- new_LearningPathModule
    ListItem <|-- new_LearningPathModule
    LearningResource <|-- Course
    LearningResource <|-- bio_TrainingMaterial
    LearningResource <|-- Syllabus
    Event <|-- CourseInstance
```

### Concrete Example from Galaxy Training Network

The team mocked encoding the Introduction to Galaxy and Sequence analysis learning path on TeSS in this new schema.
This learning path has the following structure:

1. **Module 1: Introduction to Galaxy**
   1. A short introduction to Galaxy
   2. Galaxy Basics for genomics
2. **Module 2: Basics of Genome Sequence Analysis**
   1. Quality Control
   2. Mapping
   3. An Introduction to Genome Assembly
   4. Chloroplast genome assembly

Here’s a mockup of how this could look in RDF:

```turtle
@prefix dct: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/> .
@prefix schema: <https://schema.org/> .

ex:GA_learning_path a schema:Course ;
    dct:conformsTo <https://bioschemas.org/profiles/LearningPath> ;
    schema:courseCode "GSA101" ;
    schema:description "This learning path aims to teach you the basics of Galaxy and analysis of sequencing data. " ;
    schema:name "Introduction to Galaxy and Sequence analysis" ;
    schema:provider ex:ExampleUniversity ;
    schema:syllabusSections ex:Module_1, ex:Module_2 .

ex:Module_1 a schema:ItemList, schema:ListItem, schema:Syllabus ;
    dct:conformsTo <https://bioschemas.org/profiles/LearningPathModule> ;
    schema:itemListElement ex:TM11, ex:TM12 ;
    schema:name "Module 1: Introduction to Galaxy" ;
    schema:nextItem ex:Module_2 ;
    schema:teaches "Learn how to create a workflow" .

ex:TM11 a schema:LearningResource, schema:ListItem ;
    dct:conformsTo <https://bioschemas.org/profiles/TrainingMaterial> ;
    schema:description "What is Galaxy" ;
    schema:name "(1.1) A short introduction to Galaxy" ;
    schema:nextItem ex:TM12 ;
    schema:url "https://tess.elixir-europe.org/materials/hands-on-for-a-short-introduction-to-galaxy-tutorial?lp=1%3A1" .
```

Here’s the same thing from a graphical perspective:

```mermaid
graph TD
    N1["Module 1: Introduction to Galaxy"]
    N2["(1.1) A short introduction to Galaxy"]
    N3["(1.2) Galaxy Basics for genomics"]
    N4["Module 2: Basics of Genome Sequence Analysis"]
    N5["(2.1) Quality Control"]
    N6["(2.2) Mapping"]
    N7["(2.3) An Introduction to Genome Assembly"]
    N8["(2.4) Chloroplast genome assembly"]
    N1 -- itemListElement --> N2
    N1 -- itemListElement --> N3
    N1 -- nextItem --> N4
    N2 -- nextItem --> N3
    N3 -- nextItem --> N5
    N4 -- itemListElement --> N5
    N4 -- itemListElement --> N6
    N4 -- itemListElement --> N7
    N4 -- itemListElement --> N8
    N5 -- nextItem --> N6
    N6 -- nextItem --> N7
    N7 -- nextItem --> N8
```

Something that I became aware of while listening to the discussions about learning paths is the way that Schema.org models lists. I wonder why they don’t use the built-in RDF notion of lists and instead implemented their own formalism. I saw that this caused a lot of confusion for the team, both during mocking and during SPARQL querying. I think the next step for learning paths is to create a concrete implementation in OERbservatory - we have the benefit that the Python programming language provides a much more ergonomic abstraction over lists and collections, as sketched below. There’s a lot of content inside the Galaxy Training Network (GTN) that could be ingested into such a learning path.
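To make the Python angle concrete, here is a minimal, hypothetical Pydantic sketch of a learning path model, populated with the Galaxy example above; the class and field names are illustrative, not the actual OERbservatory implementation.

```python
# Hypothetical sketch of a learning path data model; class and field names
# are illustrative, not the actual OERbservatory implementation.
from pydantic import BaseModel, Field


class LearningPathItem(BaseModel):
    """A single training material within a module, kept in order."""

    name: str
    url: str | None = None


class LearningPathModule(BaseModel):
    """An ordered group of training materials, e.g., a module."""

    name: str
    items: list[LearningPathItem] = Field(default_factory=list)


class LearningPath(BaseModel):
    """An ordered sequence of modules a learner works through."""

    name: str
    modules: list[LearningPathModule] = Field(default_factory=list)


path = LearningPath(
    name="Introduction to Galaxy and Sequence analysis",
    modules=[
        LearningPathModule(
            name="Module 1: Introduction to Galaxy",
            items=[
                LearningPathItem(name="(1.1) A short introduction to Galaxy"),
                LearningPathItem(name="(1.2) Galaxy Basics for genomics"),
            ],
        ),
    ],
)

# Python lists already carry the ordering that Schema.org expresses via
# itemListElement/nextItem, and the model round-trips to JSON for free.
print(path.model_dump_json(indent=2))
```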
Towards this end, I gave a quick demo of Pydantic to the learning paths team and showed them how I typically go about data modeling.

* * *

I really enjoyed the BioHackathon, and in general, I am very happy to be attending more events to network with other academics in Germany. It was totally exhausting, too, which is why I didn't manage to finish this post in the week following the event.

In other open educational resource and learning materials news, we pre-printed the first academic article describing a specific use case for DALIA on arXiv in September: Teaching RDM in a smart advanced inorganic lab course and its provision in the DALIA platform. We're currently finalizing a second article fully dedicated to describing the DALIA platform, which I hope can go up on arXiv in early January. Stay tuned!
cthoyt.com
December 15, 2025 at 5:32 PM
RE: https://scicomm.xyz/@ORCID_Org/115707294780355754

use ORCID the way they suggest. but also definitely keep using your personal email.

more thoughts on this: https://cthoyt.com/2022/02/06/use-your-personal-email.html
scicomm.xyz
December 12, 2025 at 4:44 PM
Reposted by Charles Tapley Hoyt
Me: urgh, why is Typst using _text_ for italic and *text* for bold, this is a pointless and annoying divergence from the Markdown syntax

Me after a week of using Typst: wait actually this makes so much more sense than Markdown's syntax, all hail Typst
December 8, 2025 at 4:17 PM
biomappings is a project for predicting and curating semantic mappings between biomedical vocabularies in SSSOM

i'm working in @NFDI with researchers from other disciplines, so I recently did a full refactor of the underlying code into a new project, SSSOM Curator ( […]
Original post on scholar.social
scholar.social
November 24, 2025 at 10:25 PM
I spent the entire day curating new prefixes for @bioregistry to support interoperability in the @NFDI -> nearly 100 new prefixes coming up across digital humanities, engineering, computer science, and more
November 18, 2025 at 4:06 PM
@typst it would be cool if I could cite directly using a DOI, and you took care of looking up the metadata from crossref (or wherever) for me.

rather than @mcmurry2017 I would love to be able to do @doi:10.1371/journal.pbio.2001414

this is possible in Manubot (https://manubot.org) - see docs […]
Original post on scholar.social
scholar.social
November 5, 2025 at 2:29 PM
@BeilsteinInstitut I saw you're making an open source database https://github.com/Beilstein-Institut/BChemLookup. I already added a PR to help fix some data errors, would love some feedback.
GitHub - Beilstein-Institut/BChemLookup: An Open Science Initiative for Mapping Common Chemical Abbreviations
An Open Science Initiative for Mapping Common Chemical Abbreviations - Beilstein-Institut/BChemLookup
github.com
November 5, 2025 at 2:16 PM
The EBI has recently published a preprint describing OxO2, the second major version of their ontology mapping service, now based on SSSOM: https://arxiv.org/abs/2506.04286

nice to see citation of SeMRA and reuse of the comprehensive SSSOM semantic mapping datasets we produced and archived on […]
Original post on scholar.social
scholar.social
November 4, 2025 at 10:24 AM
i am low-key offended when people paste text into my google docs that don't have spacing after paragraphs
October 22, 2025 at 3:10 PM
@julian cool to see what you're building in @encyclia. what language are you developing it in?
October 19, 2025 at 4:01 PM
in my first double blog post ever, I wrote about encoding databases as ontologies, the PyOBO software package, and the design choices and philosophy behind the HGNC (@genenames) to ontology converter.

1️⃣ background and software - […]
Original post on scholar.social
scholar.social
October 15, 2025 at 9:45 PM
Reposted by Charles Tapley Hoyt
You are a very busy, very important professor publishing very important work. Do you
a) just publish the code and data along with the paper because you know your work will survive close scrutiny and you have better things to do
b) spend your time handling individual data requests, negotiating […]
Original post on neuromatch.social
neuromatch.social
October 8, 2025 at 9:19 PM
after much ado, I have finished writing about bridging the @nfdi4culture and @NFDI4Chem@nfdi.social knowledge graphs

📖 https://cthoyt.com/2025/10/07/bridging-culture-and-chemistry.html

the demo was to link experiments in Chemotion electronic lab notebooks […]

[Original post on scholar.social]
October 8, 2025 at 12:15 PM
have you ever annotated a TypedDict onto a function's **kwargs and want Sphinx to automatically add it to the function's docstring?

class GreetingKwargs(TypedDict):
    name: Annotated[str, Doc("the name of the person to greet")]

def greet(**kwargs […]

[Original post on scholar.social]
October 3, 2025 at 3:30 PM
@ktk is there a programmatic way to get the list of prefixes/URI namespaces in a QLever instance?

I would love to add a tool to @bioregistry like the one that gets the prefix list from Virtuoso services and validates it (PR in https://github.com/biopragmatics/bioregistry/pull/1691)
Add Virtuoso prefix map validation by cthoyt · Pull Request #1691 · biopragmatics/bioregistry
Closes #1688 This can be run to validate the NFDI4Culture's endpoint $ bioregistry validate virtuoso https://nfdi4culture.de/sparql note that this doesn't work for all triple stores, just V...
github.com
September 26, 2025 at 9:01 AM
@ResearchOrgs I just read the example organization in your bulk curation form is the "University of ROR" and I think that's very funny :)
September 26, 2025 at 8:34 AM
I made a workflow to pull relations between organizations on @wikidata that have @ResearchOrgs identifiers and put them in a format that could be incorporated into ROR

📖 write-up here https://cthoyt.com/2025/09/25/enriching-ror-with-wikidata.html
Suggesting new relations in ROR from Wikidata
I was looking at the different NFDI consortia in the Research Organization Registry (ROR), and found that the only two that have a parent relation to the NFDI (`ror:05qj6w324`) are NFDI4DS (`ror:00bb4nn95`) and MaRDI (`ror:04ncnzm65`). This felt strange to me, so I started looking around Wikidata to see if I could automatically make a curation sheet to send along to them. I found that Wikidata already has detailed pages for all NFDI consortia, and that they also include relationships to the parent. This blog post is about the steps I took to write a workflow to find relationships in Wikidata that are appropriate for submission to ROR.

## Getting Wikidata

In Wikidata, an entity can be annotated with a ROR identifier via property `P6782`. I wanted to write a SPARQL query for the Wikidata Query Service to retrieve all triples for which both the subject and object have a ROR identifier.

```sparql
SELECT ?subject ?subjectROR ?subjectLabel ?predicate ?object ?objectROR ?objectLabel {
  ?subject ?predicate ?object ;
           wdt:P6782 ?subjectROR .
  ?object wdt:P6782 ?objectROR .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],mul,en". }
}
```

While I now know this query should return about 67K rows, at the time, I ran into the issue that it was too complicated and caused the Wikidata Query Service to time out. The next step in any investigation with a blasphemous `?subject ?predicate ?object` pattern is to look into the predicates and try to cut them down. I set about reformulating the query to count the frequency of appearance of each predicate.

```sparql
SELECT DISTINCT ?p ?pLabel (COUNT(?p) as ?count) {
  ?subject wdt:P6782 ?subjectROR ;
           ?predicate ?object .
  ?object wdt:P6782 ?objectROR .
  ?p wikibase:directClaim ?predicate .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],mul,en". }
}
GROUP BY ?p ?pLabel
ORDER BY DESC(?count)
```

This query uses the sneaky `wikibase:directClaim` to map between the `wd:` entity namespace and the `wdt:` direct property namespace so the query service could look up the label for the link. The problem was, this query was still too heavy and caused a timeout. Therefore, I had to simplify the query to just get the counts without the label, then use a second query and join the data externally (I also tried a nested query along the way, but it still timed out).

```sparql
SELECT DISTINCT ?predicate (COUNT(?predicate) as ?count) {
  ?subject wdt:P6782 ?subjectROR ;
           ?predicate ?object .
  ?object wdt:P6782 ?objectROR .
}
GROUP BY ?predicate
ORDER BY DESC(?count)
```

With that out of the way, I tried re-writing the original query by formatting the 147 predicates I pulled out into the `VALUES ?predicate { ... }` clause (abbreviated), like:

```sparql
SELECT ?subject ?subjectROR ?subjectLabel ?predicate ?object ?objectROR ?objectLabel {
  VALUES ?predicate { ... }
  ?subject ?predicate ?object ;
           wdt:P6782 ?subjectROR .
  ?object wdt:P6782 ?objectROR .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],mul,en". }
}
```

This still caused timeouts, so I resorted to a loop in Python, which also let me simplify the query to skip the Wikidata IDs and just pull out RORs for the subject and object (where the `{...}` gets replaced with a different property on each iteration):

```sparql
SELECT ?subjectROR ?objectROR WHERE {
  ?subjectROR ^wdt:P6782/wdt:{...}/wdt:P6782 ?objectROR .
}
```

I really like this because it uses property paths to reduce the need to specify the middle entities, which don't get used. I don't know if the SPARQL engine is able to optimize on it, but it's cool. Maybe not so readable, but cool.
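Here's a minimal sketch of what such a loop could look like, assuming the standard `requests` library against the public Wikidata Query Service; the property list and output file name are placeholders, and the real code lives in the repository mentioned below:

```python
import csv

import requests

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"

# The {prop} placeholder is filled with a different Wikidata property on each
# iteration; doubled braces become literal braces after formatting.
QUERY = """\
SELECT ?subjectROR ?objectROR WHERE {{
  ?subjectROR ^wdt:P6782/wdt:{prop}/wdt:P6782 ?objectROR .
}}
"""


def get_ror_pairs(prop: str) -> list[tuple[str, str]]:
    """Get (subject ROR, object ROR) pairs connected by the given property."""
    response = requests.get(
        WIKIDATA_SPARQL,
        params={"query": QUERY.format(prop=prop), "format": "json"},
        headers={"User-Agent": "ror-wikidata-enrichment-sketch"},
    )
    response.raise_for_status()
    return [
        (row["subjectROR"]["value"], row["objectROR"]["value"])
        for row in response.json()["results"]["bindings"]
    ]


# Placeholder subset of the ~147 predicates found by the counting query above.
predicates = ["P361", "P527", "P749"]

with open("wikidata_ror_triples.tsv", "w") as file:
    writer = csv.writer(file, delimiter="\t")
    writer.writerow(["subjectROR", "predicate", "objectROR"])
    for prop in predicates:
        for subject_ror, object_ror in get_ror_pairs(prop):
            writer.writerow([subject_ror, prop, object_ror])
```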
The loop created a super-sized TSV with the predicate and labels added back. The workflow I implemented for this lives in https://github.com/cthoyt/ror-wikidata-enrichment. The data from Wikidata is in this file, licensed under CC0.

Do you want this workflow to better reflect your organization? Check out my other blog post on how to curate data about your research organization: https://cthoyt.com/2021/01/17/organization-organization.html.

## Getting ROR

I've previously implemented a source in PyOBO that wraps downloading and structuring ROR's data dump into a readily usable format, so getting ROR's triples was as easy as:

```python
import pyobo

df = pyobo.get_relations_df("ror")
```

I also had to map the part of and has part relations from BFO to Wikidata properties. I did this by hand because it was faster than doing it the sustainable way, which would have been to pull the mappings from SSSOM-like annotations in the BFO ontology or from Wikidata itself (since I curated those into Wikidata years ago when we were preparing the (unpublished) relation ontology paper). I made an intermediate output of all of the triples here, licensed under CC0.

## Putting it all together

While I'm glossing over a few steps that you can grok by reading my Python script, it was possible to finish getting the data in the right shape to compare with tools in PyOBO and the Bioregistry. The final step was to take the difference between the Wikidata triples and the ROR triples, filter for triples that make sense within the ROR schema (which for now is just part of and has part relationships), and then dump the results out; a rough sketch of this step is shown after the table at the end of this post. There were around 67K records before filtering and around 2.8K after filtering. Here are a few examples:

subjectROR | subjectLabel | predicate | predicateLabel | objectROR | objectLabel
---|---|---|---|---|---
00k4nrj32 | Essex County Hospital | P361 | part of | 02wnqcb97 | National Health Service
022efad20 | University of Gabès | P527 | has part(s) | 01hwc7828 | Institut des Régions Arides
04p4gjp18 | Center of Excellence on Hazardous Substance Management | P361 | part of | 028wp3y58 | Chulalongkorn University
04tnv7w23 | École Supérieure Polytechnique d'Antsiranana | P361 | part of | 00pd4qq98 | Université d'Antsiranana
02f4ya153 | Barro Colorado Island | P361 | part of | 01pp8nd67 | Smithsonian Institution

## Coda

The point of all of this was to automate adding the missing NFDI consortia relationships to the parent NFDI organization in ROR, because I'm interested in creating queries over the organization landscape related to NFDI to support an upcoming section on Internationalization. And like most things in my work life, I ended up cleaning some data and making upstream contributions along the way. Let's see how receptive ROR is to this! The triples are all here, and I can easily make them a different format for submission.

* * *

Caveat: if you look into the data, you might notice that some of the entities don't have labels. I realized this is happening because I haven't updated my PyOBO importer to get the 2.0 data dump from ROR, and I'm stuck on old version 1.36. This can be fixed independently of this workflow. Here are the rows related to the NFDI consortia that need new relations, which are all missing labels until I fix this.
subjectROR | subjectLabel | predicate | predicateLabel | objectROR | objectLabel
---|---|---|---|---|---
00enhv193 | | P361 | part of | 05qj6w324 | Nationale Forschungsdateninfrastruktur
02cxb1m07 | | P361 | part of | 05qj6w324 | Nationale Forschungsdateninfrastruktur
03xrvbe74 | | P361 | part of | 05qj6w324 | Nationale Forschungsdateninfrastruktur
020tty630 | | P361 | part of | 05qj6w324 | Nationale Forschungsdateninfrastruktur
04ncnzm65 | | P361 | part of | 05qj6w324 | Nationale Forschungsdateninfrastruktur
01f5dqg10 | | P361 | part of | 05qj6w324 | Nationale Forschungsdateninfrastruktur
001jhv750 | | P361 | part of | 05qj6w324 | Nationale Forschungsdateninfrastruktur
0310v3480 | | P361 | part of | 05qj6w324 | Nationale Forschungsdateninfrastruktur
01d2qgg03 | | P361 | part of | 05qj6w324 | Nationale Forschungsdateninfrastruktur
01k9z4a50 | | P361 | part of | 05qj6w324 | Nationale Forschungsdateninfrastruktur
03a4sp974 | | P361 | part of | 05qj6w324 | Nationale Forschungsdateninfrastruktur
05wwzbv21 | | P361 | part of | 05qj6w324 | Nationale Forschungsdateninfrastruktur
0305k8y39 | | P361 | part of | 05qj6w324 | Nationale Forschungsdateninfrastruktur
0238fds33 | | P361 | part of | 05qj6w324 | Nationale Forschungsdateninfrastruktur
03f6sdf65 | | P361 | part of | 05qj6w324 | Nationale Forschungsdateninfrastruktur
0033j3009 | | P361 | part of | 05qj6w324 | Nationale Forschungsdateninfrastruktur
01vnkaz16 | | P361 | part of | 05qj6w324 | Nationale Forschungsdateninfrastruktur
01v7r4v08 | | P361 | part of | 05qj6w324 | Nationale Forschungsdateninfrastruktur
04dy2xw62 | | P361 | part of | 05qj6w324 | Nationale Forschungsdateninfrastruktur
01xptp363 | | P361 | part of | 05qj6w324 | Nationale Forschungsdateninfrastruktur
034pbpe12 | | P361 | part of | 05qj6w324 | Nationale Forschungsdateninfrastruktur
05nfk7108 | | P361 | part of | 05qj6w324 | Nationale Forschungsdateninfrastruktur
00r0qs524 | | P361 | part of | 05qj6w324 | Nationale Forschungsdateninfrastruktur
00bb4nn95 | | P361 | part of | 05qj6w324 | Nationale Forschungsdateninfrastruktur
03fqpzb44 | | P361 | part of | 05qj6w324 | Nationale Forschungsdateninfrastruktur
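As mentioned in the "Putting it all together" section above, here's a minimal, hypothetical sketch of the difference-and-filter step using pandas; the file names and column names are illustrative and don't necessarily match my actual script:

```python
import pandas as pd

# Hypothetical inputs: both tables have ROR subject/object columns and a
# Wikidata property as the predicate (ROR's relations mapped by hand from BFO).
wikidata_df = pd.read_csv("wikidata_ror_triples.tsv", sep="\t")
ror_df = pd.read_csv("ror_triples.tsv", sep="\t")

columns = ["subjectROR", "predicate", "objectROR"]

# Set difference: triples asserted in Wikidata but missing from ROR.
merged = wikidata_df[columns].merge(
    ror_df[columns], on=columns, how="left", indicator=True
)
novel = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

# Keep only relation types ROR can represent: part of (P361) / has part(s) (P527).
novel = novel[novel["predicate"].isin({"P361", "P527"})]

novel.to_csv("suggested_ror_relations.tsv", sep="\t", index=False)
```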
cthoyt.com
September 25, 2025 at 4:41 PM
new workflow: asking people to contribute to collaborative documents, while simultaneously pleading with them not to fill it with AI slop
September 25, 2025 at 6:08 AM
do what they suggested, but also definitely keep using your personal email and not your institutional email, because https://cthoyt.com/2022/02/06/use-your-personal-email.html

https://scicomm.xyz/@ORCID_Org/115248475944944266
You Should Use a Private Email on Publications
While we were recently preparing to submit a manuscript, the lead author said they looked at my last few papers and noticed I always used a private email address instead of an institutional email address. They asked, perplexed, if they should also use my private email address with our submission. The answer was a resounding _yes_; always use a private email address. Here's why.

I actually started thinking about this way back one thousand years ago in 2020 and started an interesting discussion on Twitter:

> Lesson for young researchers - don't use your institutional email address on papers.
>
> You *will* leave, you won't get to keep it, and you'll miss out on lots of people who want to talk to you because of the interesting work you did.
>
> Try @ORCID_Org instead :) #AcademicChatter
>
> — Charles Tapley Hoyt (@cthoyt) June 23, 2020

I came back to this again when submitting the Gilda manuscript at the end of 2021, then got distracted by an ankle injury and sort of lost track of all the blog posts I had been writing. Now I'm finishing it up in early February 2022. The rest of this post is an elaboration on my ideas and follow-up discussion on Twitter.

## You can't take it with you

If you're a researcher who sometimes writes and submits publications, the chances are pretty high that you currently work at some kind of institution and will not always work at that institution. Here's what might happen to your institutional email address when you leave:

1. Your old institution doesn't care about you after you leave, and deletes your account the moment you walk out the door.
2. Your old institution doesn't care about you after you leave, and says that they will continue your access for a limited amount of time after you leave to save face. Then it deletes your account.
3. Your old institution doesn't care about you after you leave, and says that they will continue to provide technical support to you and all previous employees indefinitely with their infinite money and benevolence. I'm being sarcastic; this is really, really unlikely.

Whether 1, 2, or 3, if you used your institutional email address on a paper you published, it is now a dead link into the abyss. Anyone who might want to get in touch with you to chat about your research (or, god forbid, ask you for code or data that you didn't deposit in an appropriate place before publishing) is out of luck. If you had used your personal email address, which won't go away, then you wouldn't have this problem. Alternatively, some publishers allow you to annotate your ORCID identifier onto the manuscript, in which case you could potentially maintain your current working email address through ORCID, but again, not a lot of publishers support this (yet).

## Who does this most affect?

As you become more senior, the chances of you moving institutions decrease. So this is an issue that disproportionately affects young researchers twice: first because you are harder to reach, and second because your ability to network is more crucial as a young researcher than as a veteran one.

## Do your best to disregard institutional policy

Institutions obviously want as much attribution as possible when you publish while working there, and using a prominent institutional email address is one way to achieve that goal. Therefore, many institutions have a policy that you have to use your institutional email when publishing. A few courses of action you could take:

1. Ask the editor to include multiple email addresses
2. Disregard institutional policy; it doesn't support you as a researcher or a person.
## I'm worried about getting a bunch of spam

I can't speak for all situations, but I've been using my personal Gmail address all over the internet and still haven't gotten a ton of spam. It's sitting at the bottom of this blog post if you want to prove me wrong. If you're worried about this, make a new private email account that you just use for publishing.

## Difficulties for editors

One follow-up conversation I had on Twitter was with John R. Yates III, the editor-in-chief of the Journal of Proteome Research. He gave the interesting insight that editors highly prefer institutional email addresses because they are perceived to be more trustworthy. There are several reasons why this is not true (e.g., you could use an outdated address, spoof it, etc.), but ultimately this serves to distract from adopting more sustainable ways of identifying authors and reviewers, like ORCID.

## Relevant Twitter Threads

* https://twitter.com/travisdrake/status/1577970169951010817
cthoyt.com
September 22, 2025 at 3:09 PM