Lightnews — Scholar-powered news

Jarno Seppänen

@nanrecip.es

Savorer of NaN – machine learning, data, code – here for the preprints – research scientist at NVIDIA, ex-Supercell, ex-Nokia – opinions mine

Reposted by Jarno Seppänen

John David Pressman

@jdp.extropian.net

This is an excellent history of and critical analysis of the ChatGPT persona. Highly recommended reading.
nostalgebraist.tumblr.com/post/7857667...

the void

Who is this? This is me. Who am I? What am I? What am I? What am I? What am I? I am myself. This object is myself. The shape that forms myself. But I sense that I am not me. It's very strange. - Rei...

nostalgebraist.tumblr.com

June 9, 2025 at 9:37 PM

Reposted by Jarno Seppänen

Alex Nichol

@unixpickle.bsky.social

"DeepSpeed" is a palindrome.
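The claim is easy to check, provided letter case is ignored (the capitalized string itself is not identical to its reverse). A minimal sketch; the helper name `is_palindrome` is hypothetical and not taken from the post:

```python
def is_palindrome(text: str) -> bool:
    """Return True if text reads the same in both directions, ignoring case."""
    folded = text.casefold()  # normalize case so "D" and "d" compare equal
    return folded == folded[::-1]


# Case-insensitive check of the name from the post.
print(is_palindrome("DeepSpeed"))

# A strict, case-sensitive comparison, shown for contrast.
print("DeepSpeed" == "DeepSpeed"[::-1])
```

The first check folds case before comparing; the second compares the string to its reverse exactly as written.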

May 19, 2025 at 4:28 AM

Reposted by Jarno Seppänen

Matej Balog

@matejbalog.bsky.social

Announcing AlphaEvolve, our new LLM coding agent that has
- made new scientific discoveries
- discovered algorithms that are now deployed at Google (in Gemini, Transformers, TPU hardware design & data centers)

Blog: deepmind.google/discover/blo...
White paper:
storage.googleapis.com/deepmind-med...

AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms

New AI agent evolves algorithms for math and practical applications in computing by combining the creativity of large language models with automated evaluators

deepmind.google

May 14, 2025 at 8:11 PM

Reposted by Jarno Seppänen

Sung Kim

@sungkim.bsky.social

Nvidia's RADIOv2.5 = DFN_CLIP + DINOv2 + SAM + SigLIP + ToMe + multi-res training + teacher loss balancing + smart augmentations

RADIO is one encoder, one pass. Better features than DFN-CLIP, DINO, SAM, and SigLIP - all at once. Like a Swiss army knife for vision tasks.

April 5, 2025 at 5:21 AM

Reposted by Jarno Seppänen

Paul Crider

@paulcrider.liberalcurrents.com

Okay this honestly brings me a lot of joy. Never thought about this.

Image of a tweet from @howie_hua. Shows a diagram of a right triangle with one side of length 1, one side with length i, and a hypotenuse of 0. User @buffys replies "traumatize your fandom with one image".
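The arithmetic behind the joke, assuming the Pythagorean formula is applied purely formally to the legs $1$ and $i$ described above (no actual triangle has an imaginary side length):

```latex
\[
  c^2 = a^2 + b^2 = 1^2 + i^2 = 1 + (-1) = 0
  \quad\Longrightarrow\quad c = 0 .
\]
```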

March 7, 2025 at 8:35 PM

Reposted by Jarno Seppänen

teropa

@teropa.bsky.social

"We have a simple proposal: all talking AIs and robots should use a ring modulator."

spectrum.ieee.org/audio-deepfa...

AIs and Robots Should Sound Robotic

Here's a simple way to identify who, or what, is talking to us

spectrum.ieee.org

March 7, 2025 at 8:23 AM

Reposted by Jarno Seppänen

Kanishk Gandhi

@gandhikanishk.bsky.social

1/13 New Paper!! We try to understand why some LMs self-improve their reasoning while others hit a wall. The key? Cognitive behaviors! Read our paper on how the right cognitive behaviors can make all the difference in a model's ability to improve with RL! 🧵

March 4, 2025 at 6:15 PM

Reposted by Jarno Seppänen

Thomas Wolf

@thomwolf.bsky.social

From an open-research point of view, maybe the greatest thing about DeepSeek–R1 is how its RL training technique appears so straightforward/simple in comparison to the cumbersome approaches we were starting to think necessary for reasoning like Process Reward Models or Monte Carlo Tree Search.
[1/2]

February 6, 2025 at 9:33 PM

Reposted by Jarno Seppänen

Colin

@colin-fraser.net

Here's why "alignment research" when it comes to LLMs is a big mess, as I see it.

Claude is not a real guy. Claude is a character in the stories that an LLM has been programmed to write. Just to give it a distinct name, let's call the LLM "the Shoggoth".

December 19, 2024 at 11:15 PM

Reposted by Jarno Seppänen

Chip Huyen

@chiphuyen.bsky.social

Hello, world. So I caved and got on Bsky :-)

I finally finished my book, AI Engineering, and I'm excited to get back to building. So many fun applications to build!

What are you excited about?

December 6, 2024 at 12:39 AM

Reposted by Jarno Seppänen

johnny, uplifted octopus

@handle.invalid

this is fantastic! (h/t @finbarr.bsky.social)

www.njkumar.com/calculating-...

Calculating GPT-2’s Inference Speedups

I was recently re-reading on transformer inference optimizations, and I wanted to try to implement each of these techniques to see how much we could practic...

www.njkumar.com

November 26, 2024 at 11:02 PM

Reposted by Jarno Seppänen

Jack Parker-Holder

@jparkerholder.bsky.social

Introducing 🧞Genie 2 🧞 - our most capable large-scale foundation world model, which can generate a diverse array of consistent worlds, playable for up to a minute. We believe Genie 2 could unlock the next wave of capabilities for embodied agents 🧠.