Software Engineer, Sci-Fi lover, homelab enthusiast, and infatuated with the possibilities of ML. I remain optimistic about the future of humanity despite evidence to the contrary.
You might be interested in something like this project - github.com/weaviate/Verba - it will let you use your data with an existing base model (open or otherwise).
That is a great dataset for fine-tuning or RAG, but nowhere near enough to train a model from scratch. TinyLlama - github.com/jzhang38/Tin... - is an open model trained on an open dataset of 3T tokens (about 2T words), and even then it's not really usable.
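For a rough sense of the gap, here's a back-of-envelope comparison. The "personal archive" numbers below are made-up assumptions purely for illustration; only the 3T-token figure comes from TinyLlama.

```python
# Back-of-envelope: a hypothetical personal archive vs. TinyLlama's training corpus.
# The archive size is an illustrative assumption; 3T tokens is TinyLlama's reported corpus.

TOKENS_PER_WORD = 1.5          # matches the rough ratio above (3T tokens ~ 2T words)
tinyllama_tokens = 3e12        # ~3 trillion tokens

# Hypothetical personal archive: years of notes, email, and documents.
docs = 50_000
words_per_doc = 400
archive_tokens = docs * words_per_doc * TOKENS_PER_WORD

print(f"archive: ~{archive_tokens / 1e6:.0f}M tokens")
print(f"TinyLlama's corpus is ~{tinyllama_tokens / archive_tokens:,.0f}x larger")
```

Even with generous assumptions, a personal archive lands in the tens of millions of tokens, several orders of magnitude short of what from-scratch pretraining uses.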
2) The compute resources required to turn that data into a prediction model are the next hurdle - yes, they can be leased rather than bought, but it's still going to be expensive and time-consuming.
December 15, 2024 at 12:01 AM
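To put a very rough number on "expensive and time consuming", here's a sketch of what leasing the compute might cost. Every figure (GPU count, duration, hourly rate) is an illustrative assumption, not a quote from any provider or project.

```python
# Ballpark cost of leasing compute for a small from-scratch pretraining run.
# All numbers are assumptions for illustration; real figures vary widely by
# provider, model size, and how many runs fail along the way.

gpus = 16                # e.g. A100-class accelerators
days = 90                # continuous training time
rate_per_gpu_hour = 2.0  # USD; cloud rates are commonly in the ~$1-4/hr range

gpu_hours = gpus * days * 24
cost = gpu_hours * rate_per_gpu_hour

print(f"{gpu_hours:,} GPU-hours -> ~${cost:,.0f} before storage, networking and failed runs")
```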
1) The biggest blocker to having (and running) your own personal model is the amount of resources required - well beyond the capability of most individuals. The amount of data needed to train a model is huge, and it all has to be collected, validated, labelled, cleared of copyright, etc.
December 14, 2024 at 11:59 PM
Fine-tuning (updating the 'core' model with new content) and RAG (providing local knowledge specific to the query) are two techniques supported by both open and closed models. There are plenty of commercial and open-source tools available for both.
December 14, 2024 at 11:54 PM
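As a minimal sketch of the RAG side: embed some local notes, retrieve the ones closest to a query, and prepend them to the prompt. The embedding model name and the ask_model() stub are assumptions here, not any particular product's API - tools like Verba package this whole loop up for you.

```python
# Minimal RAG sketch: retrieve relevant local notes, then hand them to a model.
# Assumes `pip install sentence-transformers numpy`; the model name and the
# ask_model() stub are placeholders.

import numpy as np
from sentence_transformers import SentenceTransformer

notes = [
    "The NAS backup job runs every night at 02:00.",
    "Homelab VLAN 30 is reserved for IoT devices.",
    "The reverse proxy terminates TLS for all internal services.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small local embedding model
note_vecs = embedder.encode(notes, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k notes most similar to the query (cosine similarity)."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = note_vecs @ q
    return [notes[i] for i in np.argsort(scores)[::-1][:k]]

def ask_model(prompt: str) -> str:
    # Placeholder: call your local or hosted LLM of choice here.
    raise NotImplementedError

query = "Which VLAN do my smart plugs go on?"
context = "\n".join(retrieve(query))
prompt = f"Use this context to answer:\n{context}\n\nQuestion: {query}"
# answer = ask_model(prompt)
```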
I think the biggest driver of piracy is accessibility of the content - when Netflix streaming first came out, video piracy dropped dramatically, and the same thing happened with Spotify and music. Now that video is being siloed off into multiple providers (Disney, Paramount, etc.) it's on the rise again.
November 23, 2024 at 10:17 PM
I needed something for a first post, and I was indulging in an afternoon sugar top-up, so I thought - why not? Chocolate and banana slice with cream - both delicious and unhealthy :)
February 9, 2024 at 3:45 AM