Kriz Tahimic
@kriztahimic.bsky.social
Undergraduate | Interested in AI & Mechanistic Interpretability
Long way ahead 😅
December 19, 2024 at 4:51 AM
Even better (I view this as a duality): when we multiply a matrix by a vector, we're essentially translating that vector into the coordinate system defined by our matrix's basis vectors!
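A minimal numpy sketch of this reading (not from the original post; b1, b2, and v are arbitrary illustrative values): the matrix's columns act as the basis vectors, and the product M @ v is the linear combination of those columns using v's entries as coordinates.

```python
import numpy as np

# Columns of M are the basis vectors of the matrix's coordinate system.
b1 = np.array([2.0, 0.0])
b2 = np.array([1.0, 3.0])
M = np.column_stack([b1, b2])

# v's entries are read as coordinates in that basis.
v = np.array([4.0, -1.0])

# The matrix-vector product ...
Mv = M @ v
# ... equals the same linear combination of the columns.
combo = v[0] * b1 + v[1] * b2

assert np.allclose(Mv, combo)
print(Mv)  # [ 7. -3.]
```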
December 15, 2024 at 9:40 AM
🧵I’m still not 100% confident about this, but this is where I currently stand.
December 2, 2024 at 1:49 PM
🧵(2) Even though Transformers are simply built to predict the next token, as they scale it seems possible that they will continue to reliably predict the next proper token, only this time the tokens form correct proofs or code.
December 2, 2024 at 1:49 PM
🧵There are two reasons that made me switch: (1) The current performance improvements in maths (AlphaGeometry) and coding (SWE-bench) achieved just through scaling and improving the current paradigm alone.
December 2, 2024 at 1:49 PM
🧵But through my quick search before tweeting, I came to the opposite conclusion.
December 2, 2024 at 1:49 PM
🧵My thinking was that it wasn't natural for Transformers to learn to reason, unlike the CNN architecture learning classification tasks, or Transformers learning to translate or generate more words.
December 2, 2024 at 1:49 PM