Marco
@mcognetta.bsky.social
Language and keyboard stuff at Google + PhD student at Tokyo Institute of Technology.
I like computers and Korean and computers-and-Korean and high school CS education.
Georgia Tech → 연세대학교 → 東京工業大学.
https://theoreticallygoodwithcomputers.com/
Pinned
Marco
@mcognetta.bsky.social
· Jan 5
A lot of you followed me due to #NLP, but I like to post about #chess (especially computer chess), #programming (especially puzzles, code golf, etc), and machine learning.
And some less technical stuff like #Korean, #Esperanto, and #trains (mostly in Japan, just due to proximity).
Reposted by Marco
Really happy to have published this post that I've been working on for a few months now 🥰
Safe to say I enjoy these side quests - I'd like to think it's the first of many!
blog.owenlacey.dev/posts/are-yo...
"Are you the one?" is free money
blog.owenlacey.dev
November 10, 2025 at 2:35 PM
A side-channel attack on streaming LLMs where one can recover conversation topics while seeing only encrypted response packet streams.
arxiv.org/abs/2511.03675
Whisper Leak: A novel side-channel attack on remote language models | Microsoft Security Blog
Understand the risks of encrypted AI traffic exposure and explore practical steps users and cloud providers can take to stay secure. Learn more.
www.microsoft.com
November 10, 2025 at 6:11 AM
Reposted by Marco
I was struck with an incredible thought: The Subword Tolkienizer.
November 8, 2025 at 7:58 AM
Reposted by Marco
🎉 Congratulations to all #EMNLP2025 award winners 🎉
Starting with the ✨Best Paper award ✨:
"Infini-gram mini: Exact n-gram Search at the Internet Scale with FM-Index"
by Hao Xu, Jiacheng Liu, Yejin Choi, Noah A. Smith, and Hannaneh Hajishirzi
aclanthology.org/2025.emnlp-m...
1/n
November 7, 2025 at 10:29 PM
Reposted by Marco
Got to the part of "temperature" and I'm aware that a higher temperature == less predictable but never knew why.
Turns out it's very simple: before the "scores" for a set of tokens are turned into a probability distribution, they're divided by the temperature. Higher values "flatten" the distribution.
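The post doesn't include code, so here is a minimal sketch of that scaling (function and variable names are hypothetical, not from the post):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Divide the logits by the temperature, then apply the usual softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative logits: a low temperature sharpens the distribution,
# a high temperature flattens it toward uniform.
logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, temperature=0.5))
print(softmax_with_temperature(logits, temperature=2.0))
```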
November 6, 2025 at 5:47 PM
Reposted by Marco
Just added my book, "Theory of Computing: An Open Introduction" to OER Commons, and working on getting it listed in Canadian repositories too. One step closer to making education more open and accessible to everyone!
oercommons.org/courses/theo...
Theory of Computing: An Open Introduction
This book is suitable for courses on the theory of computing at both the undergraduate and graduate levels, and for self-study. Topics are introduced in a logical order: we begin with the simple finit...
oercommons.org
November 6, 2025 at 6:12 PM
Reposted by Marco
It’s grad school application season, and I wanted to give some public advice.
Caveats:
-*-*-*-*
> These are my opinions, based on my experiences, they are not secret tricks or guarantees
> They are general guidelines, not meant to cover a host of idiosyncrasies and special cases
November 6, 2025 at 2:55 PM
This is very high on my list of advice for PhD applicants.
I've written two SoPs (masters and PhD), and the similarities between what I wrote about in the SoPs and what I wrote my theses on end roughly at "written in English".
Mistake 3, cont'd: people worry that they narrow themselves down by proposing specific questions ("What if this is not the EXACT thing I want to work on in grad school?").
But an SoP is not a *contract*, it will not be waved in front of you when starting grad school.
November 7, 2025 at 12:20 AM
Reposted by Marco
Mistake 3, cont'd: people worry that they narrow themselves down by proposing specific questions ("What if this is not the EXACT thing I want to work on in grad school?").
But an SoP is not a *contract*, it will not be waved in front of you when starting grad school.
November 6, 2025 at 2:55 PM
Reposted by Marco
y'all seem to really like baseball bsky.social/about/blog/1...
The World Series Was Electric — So Was Bluesky - Bluesky
“How can you not be romantic about baseball?” — Moneyball 2011
bsky.social
November 6, 2025 at 9:58 PM
Reposted by Marco
Presenting today our work "Unsupervised Word-level Quality Estimation Through the Lens of Annotator (Dis)agreement" at the Machine Translation morning session (Room A301, 11:45 China time). See you there! 🤗
Paper: aclanthology.org/2025.emnlp-m...
Slides/video/poster: underline.io/events/502/s...
Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreement
Gabriele Sarti, Vilém Zouhar, Malvina Nissim, Arianna Bisazza. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025.
aclanthology.org
November 6, 2025 at 1:19 AM
Reposted by Marco
why intern at Ai2?
🐟interns own major parts of our model development, sometimes even leading whole projects
🐡we're committed to open science & actively help our interns publish their work
reach out if u wanna build open language models together 🤝
links 👇
November 5, 2025 at 11:11 PM
For more about things like this, here's an article (actually it's a series) that goes in-depth into this topic for a Canadian election.
One of my favorite articles to share.
November 4, 2025 at 10:08 AM
The For You feed is nice but it's really sensitive (?). Like every now and then my feed just explodes with some niche topic that's seemingly unrelated to anything I've interacted with.
November 4, 2025 at 10:05 AM
Reposted by Marco
One of the hardest PyTorch bugs I've had to debug was due to how logsumexp behaves with -inf masked inputs. Consider the following example: I build a vector of 3 logits, and each logit is the result of a logsumexp.
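The example itself was in an attached screenshot, but the underlying numerical pitfall can be sketched in plain Python (hypothetical function names, not the original code): the standard max-shift trick computes -inf - (-inf), which is nan, whenever an entire input is masked to -inf.

```python
import math

NEG_INF = float("-inf")

def naive_logsumexp(xs):
    # Standard max-shift trick; breaks when every input is -inf,
    # because x - m becomes -inf - (-inf) == nan.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def safe_logsumexp(xs):
    m = max(xs)
    if m == NEG_INF:  # fully masked: the sum of exps is 0, so the log is -inf
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))

print(naive_logsumexp([NEG_INF, NEG_INF]))  # nan
print(safe_logsumexp([NEG_INF, NEG_INF]))   # -inf
```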
November 4, 2025 at 9:12 AM
Reposted by Marco
I wrote a short blog post about masked softmax layers in PyTorch (i.e., when you have structural constraints that tell you some classes _must_ have probability zero).
This was based on a real bug I found in a neural chess model implementation.
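The full details are in the linked post; as a rough sketch of the general technique (not the blog's actual code), the standard fix is to push disallowed logits to -inf before normalizing, so those classes get exactly zero probability:

```python
import math

def masked_softmax(logits, allowed):
    """Softmax restricted to allowed classes; the rest get probability 0.
    Assumes at least one class is allowed."""
    masked = [z if ok else float("-inf") for z, ok in zip(logits, allowed)]
    m = max(masked)  # finite, since at least one class is allowed
    exps = [math.exp(z - m) for z in masked]  # math.exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]

# E.g., a chess-style policy head where only some moves are legal:
print(masked_softmax([1.0, 2.0, 3.0], [True, False, True]))
```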
Masked Softmax Layers in PyTorch
Correctly computing masked softmax layers.
mcognetta.github.io
November 3, 2025 at 7:39 PM
Reposted by Marco
NLP evaluation is often detached from practical applications. Today I extrinsically evaluated one WMT25 translation system on the task of getting hair done without knowing Chinese.
Yes you got 67 BLEU points but is the resulting hair slaying? 💇
See the result on one datapoint (my head) at EMNLP.
November 3, 2025 at 5:49 AM
Reposted by Marco
Let's talk about eval (automatic or human) and multilinguality at #EMNLP in Suzhou! 🇨🇳
- Efficient evaluation (Nov 5, 16:30, poster session 3)
- MT difficulty (Nov 7, 12:30, findings 3)
- COMET-poly (Nov 8, 11:00, WMT)
(DM to meet 🌿 )
October 28, 2025 at 9:45 AM
Catch me way under par in the Code Golf Masters after this change arrives.
Python SC accepted PEP 798
PEP: peps.python.org/pep-0798/
Acceptance: discuss.python.org/t/pep-798-un...
So this:
[*row for row in list_of_lists]
Will do the same thing as this:
[x for row in list_of_lists for x in row]
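Until the new syntax actually lands, the same flattening can be written today with itertools (the example data here is made up):

```python
from itertools import chain

list_of_lists = [[1, 2], [3], [4, 5]]

# Today's equivalent of the proposed [*row for row in list_of_lists]
flat = list(chain.from_iterable(list_of_lists))
print(flat)  # [1, 2, 3, 4, 5]
```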
PEP 798 – Unpacking in Comprehensions | peps.python.org
This PEP proposes extending list, set, and dictionary comprehensions, as well as generator expressions, to allow unpacking notation (* and **) at the start of the expression, providing a concise way o...
peps.python.org
November 3, 2025 at 9:42 PM
Reposted by Marco
Need to establish a norm against making the manifold chip-coloured so I'm not hungry reading papers
November 1, 2025 at 1:00 PM
Wow, Saddam Hussein had the same interior decorator as an Airbnb I went to in Jeju once.
*Excuse the awkward angle, it's a screenshot from a video.
*Excuse the awkward angle, it's a screenshot from a video.
October 31, 2025 at 7:29 PM
The Llama2 tokenizer is certainly not helping with this problem.
October 31, 2025 at 7:11 PM
Reposted by Marco
Me, when I see a building.
July 7, 2025 at 4:17 PM