Feels like Claude 3.7 while Claude 3.7 feels like GPT-4.5.
I don't think OpenAI wants people using GPT-4.5.
A minimal GPU in Verilog optimized for learning about how GPUs work from the ground up.
Built with <15 files of fully documented Verilog, complete documentation on architecture & ISA, working matrix addition/multiplication kernels, and full support for kernel simulation & execution traces
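To make the idea concrete: a matrix-addition kernel on a GPU is just a tiny program that every thread runs, with each thread picking its element from its thread id. A minimal Python sketch of that concept (illustrative only; this is not the repo's Verilog or its ISA):

```python
# Conceptual sketch of a GPU matrix-addition kernel: one thread per output element.
def matadd_kernel(thread_id, a, b, out, n_cols):
    row, col = divmod(thread_id, n_cols)   # derive this thread's element from its id
    out[row][col] = a[row][col] + b[row][col]

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
out = [[0, 0], [0, 0]]
for tid in range(4):                       # on real hardware these run in parallel
    matadd_kernel(tid, a, b, out, n_cols=2)
print(out)                                 # [[6, 8], [10, 12]]
```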
Unlike r1, which was trained to "think" in a readable, kinda charming way, r1-zero is the self-trained reasoner that had the *aha moment* about math & produces "thoughts" that are not human readable
veRL is a flexible, efficient and production-ready RL training framework designed for large language models (LLMs).
github.com/volcengine/v...
DeepSeek R1 is just the tip of the iceberg of rapid progress.
People underestimate the long-term potential of “reasoning.”
Model: huggingface.co/deepseek-ai/...
They have also released two Janus Pro models.
Model 1B: huggingface.co/deepseek-ai/...
Model 7B: huggingface.co/deepseek-ai/...
⚡ Performance on par with OpenAI-o1
📖 Fully open-weight model & technical report
🏆 MIT licensed: Distill & commercialize freely!
🌐 Website & API are live now!
Demo: chat.deepseek.com
Models: huggingface.co/deepseek-ai
Here are my notes on the new models, plus how I ran DeepSeek-R1-Distill-Llama-8B on my Mac using Ollama and LLM
simonwillison.net/2025/Jan/20/...
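For reference, here is roughly what running the distilled model looks like from Python via the Ollama client package (the post itself uses the Ollama and LLM command-line tools; the deepseek-r1:8b tag is my assumption for the Llama-8B distill):

```python
# Minimal sketch: chat with a locally pulled DeepSeek-R1 distill through Ollama's Python client.
# Assumes the Ollama server is running and the model has already been pulled.
import ollama

response = ollama.chat(
    model="deepseek-r1:8b",  # assumed tag for DeepSeek-R1-Distill-Llama-8B
    messages=[{"role": "user", "content": "Explain chain-of-thought reasoning in two sentences."}],
)
print(response["message"]["content"])
```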
All evals should move to agentic evals in 2025 in my opinion.
We’re just leaving so much of our models’ capability on the table.
Benchmarked with smolagents: github.com/huggingface/...
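As a sketch of what an agentic eval harness can look like, this is the kind of loop smolagents gives you (adapted from its README at the time; class names may have changed since, so treat it as illustrative):

```python
# Minimal agentic setup with smolagents: the model writes and executes code,
# optionally calling a web-search tool, instead of answering in a single shot.
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=HfApiModel())
result = agent.run("How many seconds would it take a leopard at full speed to run through Pont des Arts?")
print(result)
```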
Customer Service: 💎+❤️ = Empathetic Gemma😊
Marketing: 💎+💡 = Idea Generator Gemma🚀
Coding: 💎+💻 = Code Guru Gemma👩‍💻
Multiple LoRA adapters on the same GCP endpoint!
Customize your AI and maximize your resources
medium.com/google-cloud...
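The post is about a Vertex AI endpoint on GCP, but the underlying trick is generic: keep one copy of the base Gemma weights loaded and switch LoRA adapters per request. Here is a sketch of the same idea using vLLM's multi-LoRA support (adapter names and paths are made up):

```python
# One base model, several LoRA adapters selected per request.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="google/gemma-2b", enable_lora=True)
params = SamplingParams(max_tokens=128)

# Route a customer-service prompt to the "empathetic" adapter...
support = llm.generate(
    "Write a friendly reply to a customer asking about a late refund.",
    params,
    lora_request=LoRARequest("empathetic-gemma", 1, "/adapters/empathetic"),  # hypothetical path
)

# ...and a coding prompt to the "code guru" adapter, without reloading the base weights.
coding = llm.generate(
    "Write a Python function that reverses a linked list.",
    params,
    lora_request=LoRARequest("code-guru-gemma", 2, "/adapters/code_guru"),  # hypothetical path
)
```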
Link: arxiv.org/pdf/2401.023...
#ReinforcementLearning #ICLR2025 #ACL2025 #NAACL2025 #NeurIPS2024 #ICML2025 #DeepRL #DeepReinforcementLearning
So an hour of streaming Netflix is energy-equivalent to roughly 70,000-90,000 tokens from a 65B-parameter model. arxiv.org/pdf/2310.03003
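A quick back-of-envelope check of that comparison (both constants below are my own rough assumptions for illustration, not figures taken from the linked paper):

```python
# Rough energy comparison: an hour of video streaming vs. 65B-model token generation.
netflix_kwh_per_hour = 0.08                      # assumed energy for one streaming hour
joules_per_hour = netflix_kwh_per_hour * 3.6e6   # 1 kWh = 3.6 million joules
joules_per_token = 3.5                           # assumed inference energy per token, 65B model
print(f"{joules_per_hour / joules_per_token:,.0f} tokens")  # ≈ 82,000, inside the 70-90k range
```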