🐘
pkydrm.bsky.social
research scientist @MosaicML x @Databricks re: rlhf, humans in the loop, and figuring out what it means to have a good model 🤖🧑‍🎨✨
I'm extremely curious -- would you want digital tools that would help with this (e.g. planning, time organization) or embodied AI (e.g. physical assistance in-home, transportation)?
April 16, 2025 at 5:27 PM
and a big shout out to my collaborators: Erica Ji Yuen, Kartik Sreenivasan, Yue (Andy) Zhang, Sam Havens, Michael Carbin, Matei Zaharia, Jonathan Frankle
December 19, 2024 at 4:25 PM
3/3 🔑 Want to see how different models perform on enterprise tasks? Full analysis in the blog here: databricks.com/blog/benchma...!
Benchmarking Domain Intelligence
databricks.com
December 19, 2024 at 4:25 PM
📊 DIBS measures real enterprise needs. We tested 14 models & found:

- Academic benchmarks mask enterprise gaps
- No single model wins across all tasks
- Open models are competitive on key capabilities
- Some enterprise tasks show clear paths forward, others are more complex

2/3
December 19, 2024 at 4:25 PM
And of course a big shout out to my collaborators: Erica Ji Yuen, Kartik Sreenivasan, Yue (Andy) Zhang, Sam Havens, Michael Carbin, Matei Zaharia, and Jonathan Frankle for their help!
December 19, 2024 at 4:23 PM
would love to be added :-)
December 11, 2024 at 7:33 PM
brat tulu is amazing
December 10, 2024 at 11:52 PM
i know some labs are already starting to do this; i hope more continue to. it is challenging, complex technical work and we should think of it as a first-class contribution in the field. 5/5
November 26, 2024 at 2:09 PM