If you're interested, we’d love to have you join our community.
GDG AI for Science Japan
gdg.community.dev/gdg-ai-for-s...
If they can't represent human preferences, how can we hope to use them to align a language model?
In our #COLM2025 "Off-Policy Corrected Reward Modeling for RLHF", we investigate this issue 🧵
No more tedious edits like fixing capitalization (ai -> AI), swapping arXiv for the conference version, or expanding "and others" into full author lists. Let bibfixer do the grunt work for you!
Paper: pub.sakana.ai/edinet-bench/
We just released a Japanese financial benchmark designed to evaluate how AI agents perform on challenging financial tasks such as accounting fraud detection.
It features accounting fraud detection, earnings forecasting, and industry classification tasks, and includes our tool edinet2dataset as a foundation for designing new tasks.
Hope researchers find it useful!
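Blog: sakana.ai/edinet-bench/
Paper: pub.sakana.ai/edinet-bench/
Using annual securities reports from EDINET, the Financial Services Agency's electronic disclosure system, we built a Japanese financial benchmark to measure how well AI can handle advanced financial tasks.
Evaluation on EDINET-Bench confirmed that simply applying current LLMs does not yield practical performance on tasks such as accounting fraud detection, while also suggesting that performance may improve with better-designed input information.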
But what you really want to hear about is stats... right? -> 🧵