Lightnews — Scholar-powered news

Johannes Hoffart

@hoffart.ai

SAP-RPT-1 is here: The foundation model for tabular data.
Predict payment delays, ticket severity, churn and more with high accuracy and minimal setup.

Try it now:
* Playground: rpt.cloud.sap
* Get open source: huggingface.co/SAP/sap-rpt-...
* Read the paper: arxiv.org/abs/2506.10707

SAP-RPT-1 Playground | Try with your own test data or SAP-provided example datasets

SAP-RPT-1 is a relational pretrained foundation model that delivers accurate predictive insights from structured business data. It uses in-context learning, allowing users to provide example records directly in API calls, to generate instant, reliable predictions without any model training.

rpt.cloud.sap

November 5, 2025 at 8:39 AM
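
To make the in-context idea in the post above concrete: the caller supplies labeled example records together with the record to predict, and no training step runs first. The sketch below is only a toy stand-in for that workflow. It uses a plain nearest-neighbour vote, not SAP-RPT-1 or its API, and the column names, the data, and the function `predict_in_context` are all hypothetical.

```python
from collections import Counter

def predict_in_context(context_rows, target, query_row, k=3):
    """Predict `target` for query_row from labeled context rows.

    Toy stand-in: a k-nearest-neighbour vote over the context rows.
    There is no fit/train step; the examples arrive with the query.
    """
    features = [col for col in query_row if col != target]

    def distance(row):
        # Number of feature columns where the context row differs from the query.
        return sum(row[col] != query_row[col] for col in features)

    nearest = sorted(context_rows, key=distance)[:k]
    votes = Counter(row[target] for row in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical example records (the "context"), labeled with the target column.
context = [
    {"segment": "SMB", "terms": "NET60", "late": "yes"},
    {"segment": "SMB", "terms": "NET60", "late": "yes"},
    {"segment": "SMB", "terms": "NET30", "late": "no"},
    {"segment": "ENT", "terms": "NET30", "late": "no"},
    {"segment": "ENT", "terms": "NET30", "late": "no"},
]

# Records to predict: same columns, target missing.
late_for_smb = predict_in_context(context, "late", {"segment": "SMB", "terms": "NET60"})
late_for_ent = predict_in_context(context, "late", {"segment": "ENT", "terms": "NET30"})
print(late_for_smb, late_for_ent)
```

A pretrained tabular model would replace the neighbour vote with a learned predictor, but the calling pattern (examples plus query in, prediction out) is the part this sketch is meant to show.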

Johannes Hoffart

@hoffart.ai

A new player enters the arena of Foundation Models on Tabular Data: www.limix.ai - novel methods for pre-training and data generation that look highly relevant. Their evaluation on selected datasets shows strong performance. Exciting times, looking forward to further in-depth comparisons!

LimiX

www.limix.ai

September 15, 2025 at 11:28 AM

Johannes Hoffart

@hoffart.ai

At #VLDB2025 London I joined a panel on Neural Relational Data. My take: LLMs solve some data management tasks, but the next wave is Foundation Models on Relational Data and Semantically Linked Tables. More on this and further trends in #AI and #DataManagement - www.hoffart.ai/vldb-2025-ai...

VLDB 2025: AI Meets Enterprise Data Management — The Tabular FM Moment – Johannes Hoffart

www.hoffart.ai

September 11, 2025 at 11:55 AM

Johannes Hoffart

@hoffart.ai

Our team developing Foundation Models on Tables & Linked Business Data is looking for a new Senior Applied Research Scientist! Excited about pushing the frontier in foundation models on tabular data? Want to have business impact and academic visibility?

Look no further: jobs.sap.com/job/Walldorf...

Senior/Principal Applied Research Scientist (f/m/d): Foundation Models on Linked Business Data

jobs.sap.com

August 1, 2025 at 7:47 AM

Reposted by Johannes Hoffart

Grace Lindsay

@neurograce.bsky.social

For the past 3 years, I've taught a course on Machine Learning for Climate Change to undergrads. At times, people have asked if the course lectures could be made available online. While I can't offer that, I have decided to start making "5 Minute Papers on AI for the Planet" videos. Hope it's useful!

5 Minute Papers on AI for the Planet

AI is more than just chatbots! Learn about how AI can be used to protect biodiversity, fight climate change, and just better understand our planet through 5-minute explainers covering academic papers ...

www.youtube.com

June 20, 2025 at 1:55 AM

Reposted by Johannes Hoffart

eleutherai.bsky.social

@eleutherai.bsky.social

Can you train a performant language model using only openly licensed text?

We are thrilled to announce the Common Pile v0.1, an 8TB dataset of openly licensed and public domain text. We train 7B models for 1T and 2T tokens and match the performance of similar models like LLaMA 1 & 2.

June 6, 2025 at 7:19 PM

Reposted by Johannes Hoffart

Simon Willison

@simonwillison.net

Here's the full workshop handout plus annotated slides from "Building software on top of Large Language Models", a three hour tutorial I presented yesterday at PyCon US #PyConUS simonwillison.net/2025/May/15/...

Building software on top of Large Language Models

I presented a three hour workshop at PyCon US yesterday titled Building software on top of Large Language Models. The goal of the workshop was to give participants everything they …

simonwillison.net

May 15, 2025 at 12:29 PM

Reposted by Johannes Hoffart

François Fleuret

@francois.fleuret.org

I asked "on the other platform" what were the most important improvements to the original 2017 transformer.

That was quite popular and here is a synthesis of the responses:

April 28, 2025 at 6:47 AM

Reposted by Johannes Hoffart

Ethan Mollick

@emollick.bsky.social

This was helpful.

Also worth noting that Bluesky remains a very fraught place for AI discussions for a variety of reasons, good & bad, but with the impact of keeping a lot of the most relevant AI news, paper discussions & biggest names on X

That might change, but it hasn’t yet. Still posting, tho.

Naomi Saphra @nsaphra.bsky.social · Apr 26

I wrote something up for AI people who want to get into bluesky and either couldn't assemble an exciting feed or gave up doomscrolling when their Following feed switched to talking politics 24/7.

The AI Researcher's Guide to a Non-Boring Bluesky Feed | Naomi Saphra

How to migrate to bsky without a boring feed.

nsaphra.net

April 26, 2025 at 2:55 AM

Reposted by Johannes Hoffart

Simon Willison

@simonwillison.net

It's been a couple of years since GPT-4 powered Bing, but with the various Deep Research products and now o3/o4-mini I'm ready to say that AI assisted search-based research actually works now simonwillison.net/2025/Apr/21/...

AI assisted search-based research actually works now

For the past two and a half years the feature I’ve most wanted from LLMs is the ability to take on search-based research tasks on my behalf. We saw the …

simonwillison.net

April 21, 2025 at 2:14 PM

Reposted by Johannes Hoffart

Sebastian Raschka (rasbt)

@sebastianraschka.com

I just shared a new article, "The State of Reasoning Models", where I am exploring 12 new research articles on improving the reasoning capabilities of LLMs (all published after the release of DeepSeek R1): magazine.sebastianraschka.com/p/state-of-l...

Happy reading!

The State of LLM Reasoning Models

Part 1: Inference-Time Compute Scaling Methods

magazine.sebastianraschka.com

March 8, 2025 at 2:37 PM

Reposted by Johannes Hoffart

Thomas Wolf

@thomwolf.bsky.social

I shared a controversial take the other day at an event and I decided to write it down in a longer format: I’m afraid AI won't give us a "compressed 21st century"

Here: thomwolf.io/blog/scienti...

It's an extension of this interview discussion from the AI summit: youtu.be/AxBd3G0lFLs?...

March 6, 2025 at 1:03 PM

Reposted by Johannes Hoffart

Eunsol Choi

@eunsol.bsky.social

When using LLM-as-a-judge, practitioners often use greedy decoding to get the most likely judgment. But we found that deriving a score from the judgment distribution (like taking the mean) works better!
❌LLM-as-a-judge with greedy decoding
😎Using the distribution of the judge’s labels

Victor Wang @victorwang37.bsky.social · Mar 6

LLM judges have become ubiquitous, but valuable signal is often ignored at inference.

We analyze design decisions for leveraging judgment distributions from LLM-as-a-judge: 🧵

(w/ Michael J.Q. Zhang, @eunsol.bsky.social)

March 6, 2025 at 10:04 PM
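
The contrast drawn in the post above, greedy decoding versus a score derived from the judgment distribution, can be illustrated with a small sketch. It assumes the judge exposes a probability for each rating on a 1-5 scale; the two distributions below are invented for illustration and are not results from the cited work.

```python
from typing import Dict

def greedy_score(dist: Dict[int, float]) -> int:
    """Rating with the highest probability (what greedy decoding returns)."""
    return max(dist, key=dist.get)

def mean_score(dist: Dict[int, float]) -> float:
    """Expected rating under the judge's distribution."""
    total = sum(dist.values())  # normalise in case probabilities do not sum to 1
    return sum(rating * p for rating, p in dist.items()) / total

# Hypothetical judge distributions over ratings 1-5 for two answers.
answer_a = {1: 0.00, 2: 0.05, 3: 0.45, 4: 0.40, 5: 0.10}
answer_b = {1: 0.30, 2: 0.15, 3: 0.35, 4: 0.15, 5: 0.05}

print(greedy_score(answer_a), greedy_score(answer_b))
print(round(mean_score(answer_a), 2), round(mean_score(answer_b), 2))
```

Both answers get the same greedy rating because 3 is the single most likely label in each case, while the mean separates them: the rest of the probability mass sits above 3 for the first answer and below 3 for the second.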

Reposted by Johannes Hoffart

ELLIS

@ellis.eu

Discover European cities ✈️ while building your career! Check out the ELLIS PhD/Postdoc Program's 2025 Winter & Summer School Schedule! Dive deep into cutting-edge #AI research, learn from top researchers & connect with peers across Europe. Learn more: bit.ly/42iow66 #PhD #machinelearning

January 13, 2025 at 12:45 PM

Reposted by Johannes Hoffart

Thomas Wolf

@thomwolf.bsky.social

Our first release of 2025: 𝙨𝙢𝙤𝙡𝙖𝙜𝙚𝙣𝙩𝙨, 𝘁𝗵𝗲 𝘀𝗶𝗺𝗽𝗹𝗲𝘀𝘁 𝗹𝗶𝗯𝗿𝗮𝗿𝘆 𝘁𝗼 𝗯𝘂𝗶𝗹𝗱 𝗮𝗴𝗲𝗻𝘁𝗶𝗰 𝘀𝘆𝘀𝘁𝗲𝗺𝘀!

💥 Main logic in ~1000 LoC
🧑‍💻 Agent writes its actions in code! LLMs are much better at writing code than current standard of writing JSON => higher perf
🌍 Any LLM support (h/t LiteLLM)
🛡️ Secure code exec (h/t E2B)

January 1, 2025 at 3:21 PM

Johannes Hoffart

@hoffart.ai

Have a look at our work on foundation models on tabular data, published today at #TRL @ #NeurIPS2024:

📜 PORTAL, an open weight and code foundation model trained on tabular data, and
📜 SALT, a real business data set containing millions of sales orders across multiple tables.

Further details 👇

December 14, 2024 at 5:56 PM

Reposted by Johannes Hoffart

Simon Willison

@simonwillison.net

Wrote up my initial impressions of the new Google Gemini 2.0 Flash model - it's really good, and the streaming mode (where you can stream video and audio to it and get audio streamed right back) is pure science-fiction simonwillison.net/2024/Dec/11/...

Gemini 2.0 Flash: An outstanding multi-modal LLM with a sci-fi streaming mode

Huge announcement from Google this morning: Introducing Gemini 2.0: our new AI model for the agentic era. There’s a ton of stuff in there (including updates on Project Astra and …

simonwillison.net

December 11, 2024 at 8:22 PM

Reposted by Johannes Hoffart

Table Representation Learning research

@trl-research.bsky.social

The 3rd Table Representation Learning (TRL) workshop at NeurIPS 2024 is approaching soon ✨

Join us Saturday 14 Dec from 8:30AM for an amazing program and discussions about all things neural models + tabular data (table-representation-learning.github.io ).

Not in Vancouver? Join online neurips.cc 😎

Table Representation Learning Workshop

TRL Workshop ---

table-representation-learning.github.io

December 9, 2024 at 6:18 PM

Johannes Hoffart

@hoffart.ai

We are growing the team building the SAP Knowledge Graph and are #hiring AI & Data Scientists, Data Engineers, Knowledge Engineers and Applied Research Scientists in Germany (Berlin, Walldorf) and India (Bangalore): jobs.sap.com/search/?crea...

Let's take GenAI to the next level with #KG!

SAP Knowledge Graph - SAP Jobs

Find SAP Knowledge Graph at SAP

jobs.sap.com

December 4, 2024 at 10:36 AM

Reposted by Johannes Hoffart

Davide Paglieri

@dpaglieri.bsky.social

Tired of saturated benchmarks? Want scope for a significant leap in capabilities?

🔥 Introducing BALROG: a Benchmark for Agentic LLM and VLM Reasoning On Games!

BALROG is a challenging benchmark for LLM agentic capabilities, designed to stay relevant for years to come.

1/🧵

November 21, 2024 at 4:24 PM

Reposted by Johannes Hoffart

Raphaël Troncy

@rtroncy.bsky.social

Great blog post from @odihq.bsky.social @esimperl.bsky.social on the current development state of #dataspaces in Europe.

theodi.org/news-and-eve...

What are data spaces and what do they do?

Learn more about data spaces: what they are, what they do and what’s next.

theodi.org

November 30, 2024 at 12:29 PM

Reposted by Johannes Hoffart

Ivan Rubachev

@puhsu.bsky.social

Tabular DL and AutoML podcast just dropped. For sure watching this

youtu.be/3qpQ-sMRafE

How AutoML Creates New Opportunities for Europe - Frank Hutter // CyberValley Podcast #5

YouTube video by Cyber Valley

youtu.be

November 26, 2024 at 6:42 PM

Johannes Hoffart

@hoffart.ai

Let me surface this again now that this place is more lively: Come join us at SAP in the US or Germany for a PhD Summer Internship in 2025 in Foundation Models on Structured Data, Table Representation Learning, LLMs and Knowledge Graphs! #MLInternships

Johannes Hoffart @hoffart.ai · Nov 18

We are looking for PhD summer interns for 2025 in the area of Foundation Models on Structured Data, Table Rep Learning, LLMs and Knowledge Graphs.
If you want to work on groundbreaking research on the richest business data available, please reach out to me or apply here: jobs.sap.com/job/Berlin-P...

PhD Intern (f/m/d) - Business AI Research

jobs.sap.com

November 26, 2024 at 8:51 PM

Reposted by Johannes Hoffart

Mark Collier

@markcollier.me

Added some more folks to the Open Source AI Starter Pack:

go.bsky.app/N8yVZdW

November 24, 2024 at 6:43 PM

Reposted by Johannes Hoffart

Gerard de Melo

@gdemelo.bsky.social

I am chairing the
AI@HPI Conference: Responsible AI

December 3-4 in Potsdam (Berlin metropolitan area)

Discussing AI with regard to bias, elections/society, trustworthiness, copyright, the EU AI Act, and best practices.

Registration:
hpi.de/en/ai-hpi-co...

Please spread the word!

November 21, 2024 at 5:36 PM
