sewoong79.bsky.social
This is an optimistic approach: the fear of losing the stake deters any deviation from the agreement, including adversarial post-training.
February 4, 2025 at 8:33 PM
A major challenge for community-aligned AI is adversarial post-training aimed at re-aligning the model to values other than the community's. Fingerprints can authenticate the community-owned model, so such re-alignment can be detected and punished.
We introduce a new fingerprinting technique that can add 24,576 fingerprints to a Llama-3.1-8B model --- two orders of magnitude more than existing schemes.
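As a toy illustration of the idea (not the paper's actual training method), fingerprints can be thought of as secret key-response pairs a model is fine-tuned to memorize; ownership is later verified by querying the suspect model with the keys. A minimal sketch, with all names and the lookup-table "model" purely hypothetical:

```python
import hashlib
import random

def make_fingerprints(seed: int, n: int):
    """Generate n secret (key, response) pairs; a hypothetical stand-in
    for the pairs a model would be fine-tuned to memorize."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        key = "".join(rng.choices("abcdefghijklmnopqrstuvwxyz", k=16))
        # Deterministic response derived from the key and a secret seed.
        resp = hashlib.sha256(f"{seed}:{key}".encode()).hexdigest()[:8]
        pairs.append((key, resp))
    return pairs

def verify(model_responses, pairs, threshold=0.9):
    """Declare a match if the model reproduces enough fingerprint responses."""
    hits = sum(1 for k, r in pairs if model_responses.get(k) == r)
    return hits / len(pairs) >= threshold

# Here a dict stands in for a fingerprinted model's query interface.
pairs = make_fingerprints(seed=42, n=100)
model = dict(pairs)
print(verify(model, pairs))  # True: the model retains its fingerprints
```

The threshold makes verification robust to a model forgetting a small fraction of its fingerprints.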
Each copy of the model is fingerprinted with a different set of fingerprints, so that compliance can be checked per host. We show, both empirically and theoretically, that the security of such a protocol depends on how many fingerprints we can embed in each model.
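To see why the fingerprint count matters, consider a simplified model (an illustrative independence assumption, not a claim from the paper): if adversarial post-training erases each embedded fingerprint independently with probability p, the chance that at least one survives, and the deviation is caught, grows rapidly with the number of fingerprints n:

```python
def detection_probability(n: int, p_erase: float) -> float:
    """P(at least one of n fingerprints survives), assuming each is
    erased independently with probability p_erase (toy model)."""
    return 1.0 - p_erase ** n

# Even when each individual fingerprint is very fragile (99% erased),
# embedding thousands makes detection nearly certain.
for n in (10, 100, 1000):
    print(n, detection_probability(n, p_erase=0.99))
```

Under these toy numbers, 10 fingerprints are caught less than 10% of the time, while 1,000 are caught with near certainty, which is the intuition behind packing in as many fingerprints as possible.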
We introduce a new semi-open protocol that allows a model owner (i.e., the community) to share the model weights with multiple hosts; each host signs an agreement and escrows a portion of their stake to get access to the model weights.
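The economic side of the protocol can be sketched as a simple escrow ledger: a host deposits a stake to receive the weights, and a verified fingerprint match in a leaked or re-aligned model forfeits that host's stake. This is an illustrative sketch only; all names are hypothetical:

```python
class EscrowLedger:
    """Toy ledger for the semi-open protocol: hosts escrow a stake to
    gain access to the weights; a detected violation slashes the
    offending host's stake."""

    def __init__(self):
        self.stakes = {}

    def join(self, host: str, stake: float) -> None:
        # Host signs the agreement and escrows a stake; in return,
        # the community releases a uniquely fingerprinted copy.
        self.stakes[host] = stake

    def slash(self, host: str) -> float:
        # Forfeit the stake of a host whose fingerprints were found
        # in a non-compliant model; returns the amount forfeited.
        return self.stakes.pop(host, 0.0)

ledger = EscrowLedger()
ledger.join("host-a", stake=100.0)
ledger.join("host-b", stake=100.0)
# Suppose host-a's fingerprints are found in a re-aligned model:
print(ledger.slash("host-a"))       # 100.0 forfeited
print("host-a" in ledger.stakes)    # False: host-a is ejected
```

The deterrent only works if violations are detected reliably, which is where the fingerprint count from the previous posts comes in.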
The central dilemma in community ownership: how do we ensure that the community retains ownership of the model even after deployment, i.e., after sharing the model weights?
Community-owned AIs can be used only with the approval of the community and share the economic rewards communally. Community-aligned AIs have values that are aligned with the community's. Community-controlled AIs perform functions designed by the community.
Beliefs and values in proprietary AI models are controlled by the companies that own them. We propose an alternative system for building AI that is loyal to the community, providing solutions to three critical components: ownership, alignment, and control.