Kristján Moore (Kris)🔸
kristjanmoore.bsky.social
Research at deCODE genetics: genomic ancestry, ancient DNA, and whatever else needs doing. Trying to have true beliefs. 🇬🇧🇮🇸
After all, supervised mode just fixes some rows of the Q matrix to 1s and 0s; it doesn't actually fix allele frequencies, so P still gets updated given the whole data. Projection mode does literally fix (or rather, overfit) the P matrix to the ref data, generally with poor results in my experience.
September 19, 2025 at 4:44 PM
Indeed. But the inferred allele frequencies for a component (as given in the .P file) can get pulled away from the frequencies observed in its training pop and towards the freqs in the test samples if this allows better likelihood over the whole data, which can happen if test n >> training n.
September 19, 2025 at 4:44 PM
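A toy sketch of the mechanism in the two posts above, under stated assumptions. It runs a plain EM on the binomial admixture likelihood, g_ij ~ Binomial(2, sum_k Q_ik P_kj), with the training rows of Q frozen. This is not ADMIXTURE's actual optimizer, and every name here (`supervised_em`, `simulate`, `run_demo`, the sample sizes, the simulated frequencies) is hypothetical. The only point is that P is re-estimated from all individuals, so nothing ties a component's frequencies to its training pop.

```python
import random

EPS = 1e-6  # keeps frequencies off 0 and 1 so the divisions below are safe


def supervised_em(G, Q, fixed, P, n_iter=30):
    """Plain EM for g_ij ~ Binomial(2, sum_k Q[i][k] * P[k][j]).

    Rows of Q whose index is in `fixed` are never updated (the supervised
    constraint). P is updated from ALL individuals, training and test alike.
    """
    n, J, K = len(G), len(G[0]), len(P)
    Q = [row[:] for row in Q]
    P = [row[:] for row in P]
    for _ in range(n_iter):
        num = [[0.0] * J for _ in range(K)]  # expected copies of the counted allele
        den = [[0.0] * J for _ in range(K)]  # expected copies of either allele
        new_Q = [[0.0] * K for _ in range(n)]
        for i in range(n):
            for j in range(J):
                g = G[i][j]
                f1 = sum(Q[i][k] * P[k][j] for k in range(K))
                f0 = sum(Q[i][k] * (1.0 - P[k][j]) for k in range(K))
                for k in range(K):
                    # Expected allele copies of individual i at SNP j
                    # that are assigned to component k.
                    a = g * Q[i][k] * P[k][j] / f1
                    b = (2 - g) * Q[i][k] * (1.0 - P[k][j]) / f0
                    num[k][j] += a
                    den[k][j] += a + b
                    new_Q[i][k] += (a + b) / (2.0 * J)
        for k in range(K):
            for j in range(J):
                P[k][j] = min(max(num[k][j] / den[k][j], EPS), 1.0 - EPS)
        for i in range(n):
            if i not in fixed:  # training rows stay at their 1s and 0s
                Q[i] = new_Q[i]
    return Q, P


def simulate(freqs, n, rng):
    """Draw n diploid genotypes (0, 1 or 2) at each SNP given allele frequencies."""
    return [[(rng.random() < p) + (rng.random() < p) for p in freqs]
            for _ in range(n)]


def run_demo(n_train=10, n_test=200, J=30, seed=0):
    """Two small training pops (A, B) and a large test pop unlike either."""
    rng = random.Random(seed)
    pA, pB, pC = ([rng.uniform(0.1, 0.9) for _ in range(J)] for _ in range(3))
    G = simulate(pA, n_train, rng) + simulate(pB, n_train, rng) \
        + simulate(pC, n_test, rng)
    Q = [[1.0, 0.0]] * n_train + [[0.0, 1.0]] * n_train + [[0.5, 0.5]] * n_test
    fixed = set(range(2 * n_train))
    # Observed allele frequencies in each training pop, used to initialise P.
    obs = [[sum(G[i][j] for i in rows) / (2.0 * n_train) for j in range(J)]
           for rows in (range(n_train), range(n_train, 2 * n_train))]
    P0 = [[min(max(f, EPS), 1.0 - EPS) for f in row] for row in obs]
    Q_hat, P_hat = supervised_em(G, Q, fixed, P0)
    # Mean absolute gap between component A's fitted frequencies and the
    # frequencies actually observed in training pop A.
    drift = sum(abs(P_hat[0][j] - obs[0][j]) for j in range(J)) / J
    return Q_hat, P_hat, fixed, drift
```

Calling `run_demo` with a small and then a large `n_test` and comparing the returned `drift` is one way to check how far the fitted component moves from its training pop as the test set grows. The simulated test pop is deliberately unlike both references, which is an assumption of this sketch rather than a claim about any real dataset.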
Agree that different reference data is clearly warranted here, but if test n >> training n you would still likely see this kind of behaviour.
September 16, 2025 at 1:37 PM
What I mean is that of the 5 superpopulations in 1000G, AMR has the lowest sample count. I have seen for myself that when test n >> training n, ADMIXTURE can basically give up on describing the ancestry of smaller training pops and instead use their maximised components to explain test sample variation.
September 16, 2025 at 1:37 PM
Seems they used all 3,502 1000G individuals split by superpopulation as training samples. AMR is the smallest superpop. If test samples >> training samples (perhaps the case here?), it's easiest for ADMIXTURE to sacrifice the smallest training pop's ancestry component to describe test set ancestry.
September 16, 2025 at 1:10 PM
Frustrating. Well, at least I can feel like I've done *something* to dissuade people from this line of research.
July 4, 2025 at 1:40 PM