Kriz Tahimic
@kriztahimic.bsky.social
Undergraduate | Interested in AI & Mechanistic Interpretability
Long way ahead 😅
December 19, 2024 at 4:51 AM
Even better (I view this as a duality): when we multiply a matrix by a vector, we're essentially translating that vector into the coordinate system defined by our matrix's basis vectors!
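A minimal numpy sketch of this reading (not from the original post; b1, b2, and v are arbitrary illustrative values): the matrix's columns act as the basis vectors, and the product M @ v is the linear combination of those columns using v's entries as coordinates.

```python
import numpy as np

# Columns of M are the basis vectors of the matrix's coordinate system.
b1 = np.array([2.0, 0.0])
b2 = np.array([1.0, 3.0])
M = np.column_stack([b1, b2])

# v's entries are read as coordinates in that basis.
v = np.array([4.0, -1.0])

# The matrix-vector product ...
Mv = M @ v
# ... equals the same linear combination of the columns.
combo = v[0] * b1 + v[1] * b2

assert np.allclose(Mv, combo)
print(Mv)  # [ 7. -3.]
```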
December 15, 2024 at 9:40 AM
🧵I’m still not 100% confident about this, but this is where I currently stand.
December 2, 2024 at 1:49 PM
🧵(2) Even though Transformers are simply built to predict the next token, as they scale it seems possible that they will continue to reliably predict the next proper token, only this time the tokens form correct proofs or code.
December 2, 2024 at 1:49 PM
🧵There are two reasons that made me switch: (1) The current performance improvements in maths (AlphaGeometry) and coding (SWE-bench) achieved just through scaling and improving the current paradigm alone.
December 2, 2024 at 1:49 PM
🧵But through my quick search before tweeting, I came to the opposite conclusion.
December 2, 2024 at 1:49 PM
🧵My thinking was that it wasn't natural for Transformers to learn to reason, unlike the CNN architecture learning classification tasks, or Transformers learning to translate or generate more words.
December 2, 2024 at 1:49 PM