Lightnews — Scholar-powered news

tksiia

@tksiia.bsky.social

We're excited to announce the launch of Google Developer Group AI for Science Japan!🎉

If you're interested, we’d love to have you join our community.

GDG AI for Science Japan
gdg.community.dev/gdg-ai-for-s...


November 28, 2025 at 6:21 AM

Reposted by tksiia

Johannes Ackermann

@johannesack.bsky.social

Reward models do not have the capacity to fully capture human preferences.
If they can't represent human preferences, how can we hope to use them to align a language model?

In our #COLM2025 paper "Off-Policy Corrected Reward Modeling for RLHF", we investigate this issue 🧵

July 29, 2025 at 10:22 AM

tksiia

@tksiia.bsky.social

Released bibfixer 🎉 A tiny AI tool that cleans & standardizes your BibTeX files using LLMs + web search.

No more tedious edits like fixing capitalization (ai -> AI), swapping arXiv for the conference version, or expanding "and others" into full author lists. Let bibfixer do the grunt work for you!

GitHub - takashiishida/bibfixer: A Python tool that automatically cleans, completes, and standardizes BibTeX entries using LLMs and web search.


September 29, 2025 at 12:54 PM

Reposted by tksiia

hardmaru

@hardmaru.bsky.social

EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements

Paper: pub.sakana.ai/edinet-bench/

We just released a Japanese financial benchmark designed to evaluate the performance of AI Agents on challenging financial tasks like accounting fraud detection.

June 9, 2025 at 2:02 AM

tksiia

@tksiia.bsky.social

Excited to announce EDINET-Bench, a financial LLM benchmark built from 40k annual reports in Japan!

It features accounting fraud detection, earnings forecasting, industry classification, and includes our tool edinet2dataset as a foundation for designing new tasks.

Hope researchers find it useful!

sakanaai.bsky.social @sakanaai.bsky.social · Jun 9

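Releasing the Japanese financial benchmark "EDINET-Bench"

Blog: sakana.ai/edinet-bench/
Paper: pub.sakana.ai/edinet-bench/

Using annual securities reports from EDINET, the Financial Services Agency's electronic disclosure system, we built a Japanese financial benchmark to measure how well AI can handle advanced financial tasks.

Evaluation on EDINET-Bench confirmed that simply applying current LLMs does not yield practical performance on tasks such as accounting fraud detection, while also suggesting that performance could improve with better-designed input information.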

June 9, 2025 at 10:12 AM

Reposted by tksiia

Conference on Language Modeling

@colmweb.org

Our discussion period just started. Authors, please read our instructions carefully. We require responses by June 2.

But, what you really want to hear about is stats .... right? -> 🧵

May 27, 2025 at 5:41 PM
