Also btw the name of the model is truly legendary.
200B -> 5T tokens (same as the current fully open SOTA pretrains like Marin/AllenAI).
Your method can also scale far further because it provides a methodology for compliant licensing.
I wonder if this would scale to making a larger model with gpt-oss-120b-level capability. It would need more tokens, but assuming similar scaling, far fewer than 2.1 million H100-hours' worth.
People love it and so there will be more. Getting users and farming engagement is a “real usecase” I’m afraid.
Now they're basically at the frontier and the OSS leader too. They clearly have the right internal setup to adopt new developments very quickly. Unusual for a big corp.
After all, lots of models and lots of users means lots of customers and lots of money…
I think the thing here will simply be a verifier for proof correctness.
Also probably a truly astronomical amount of test time tokens.
Yeah honestly a smart businessman should only build on an open model, imo.