Nice to read about those risks from another pov!
There is a lot of exciting work on robust audits; here are a few papers I enjoyed:
arxiv.org/abs/2402.02675
arxiv.org/abs/2504.00874
arxiv.org/abs/2502.03773
arxiv.org/abs/2305.13883
arxiv.org/abs/2410.02777
We instantiate our framework with a simple idea: just look at the accuracy of the platform's answers.
Our experiments show that this can help reduce the amount of unfairness a platform could hide.
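A minimal sketch of that check, assuming the auditor holds ground-truth labels for its query set; the function names and the tolerance `tau` are illustrative, not our exact procedure.

```python
from typing import Sequence

def audit_accuracy(answers: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of audit queries the platform answered correctly."""
    return sum(a == y for a, y in zip(answers, labels)) / len(labels)

def passes_accuracy_check(answers: Sequence[int], labels: Sequence[int],
                          expected_acc: float, tau: float = 0.05) -> bool:
    """Flag the platform if observed accuracy drops more than `tau`
    below what the auditor expects from an unmanipulated model."""
    return audit_accuracy(answers, labels) >= expected_acc - tau
```

The intuition: flipping answers to look fair costs accuracy, so a suspicious drop limits how much unfairness the platform can hide.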
🔒Crypto guarantees: the model provider is forced to commit to their model and sign every answer (toy sketch below).
📐Clever ML tricks: the auditor uses information about the model (training data, model structure, ...) to decide what counts as a "good answer".
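A toy illustration of the crypto side, with big caveats: a real deployment would use a binding commitment scheme and public-key signatures, while this sketch stands them in with a weight hash and a shared-key HMAC; all names are illustrative.

```python
import hashlib
import hmac
import pickle

def commit_model(weights) -> str:
    """Commitment the provider publishes before the audit starts (toy: a hash of the weights)."""
    return hashlib.sha256(pickle.dumps(weights)).hexdigest()

def tag_answer(key: bytes, commitment: str, query: str, answer: str) -> str:
    """Tag binding this answer to the query and to the committed model (toy: HMAC instead of a signature)."""
    msg = f"{commitment}|{query}|{answer}".encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()

def verify_answer(key: bytes, commitment: str, query: str, answer: str, tag: str) -> bool:
    """Auditor-side check that the answer was produced under the published commitment."""
    return hmac.compare_digest(tag, tag_answer(key, commitment, query, answer))
```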
Thus, nothing prevents you from manipulating the answers of your model to pass the audit.
And this is very easy! In fact, any fairness mitigation method can be transformed into an audit manipulation attack.
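A sketch of why, assuming a score-based classifier: take any post-processing mitigation (here, group-specific thresholds) and apply it only when the auditor is asking. The `from_auditor` flag and the thresholds are illustrative simplifications.

```python
def raw_decision(score: float) -> int:
    """What real users get: the platform's unmitigated model."""
    return int(score >= 0.5)

def mitigated_decision(score: float, group: str, thresholds: dict) -> int:
    """A classic post-processing mitigation: one threshold per group,
    tuned so positive rates match across groups."""
    return int(score >= thresholds[group])

def platform_answer(score: float, group: str, from_auditor: bool,
                    thresholds: dict) -> int:
    """The manipulation: use the 'fair' rule only when the auditor asks."""
    if from_auditor:
        return mitigated_decision(score, group, thresholds)
    return raw_decision(score)
```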
An audit is pretty straightforward.
1/ I, the auditor 🕵️, come up with questions to ask your model.
2/ You, the platform 😈, answer my questions.
3/ I look at your answers, compute a series of aggregate metrics, and decide whether your system abides by the law (bare-bones loop below).
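A bare-bones version of that loop, taking demographic parity as an example aggregate metric; `ask_platform` and the tolerance `epsilon` are placeholders.

```python
from collections import defaultdict

def demographic_parity_gap(answers, groups) -> float:
    """Largest difference in positive-answer rate between groups."""
    totals, positives = defaultdict(int), defaultdict(int)
    for a, g in zip(answers, groups):
        totals[g] += 1
        positives[g] += a
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)

def run_audit(queries, groups, ask_platform, epsilon: float = 0.1) -> bool:
    """Steps 1-3: ask the queries, collect answers, check the aggregate metric."""
    answers = [ask_platform(q) for q in queries]                # steps 1 and 2
    return demographic_parity_gap(answers, groups) <= epsilon  # step 3
```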