Generative Music AI’s $350 Million Problem: Compensating Creators for the Use of Copyrighted Materials in Training Sets
Josh Wagman, Kay Yan, Rafael Costa, Alex Levesque, Armita Afroushe
CUCAI 2025 Proceedings • 2025
Abstract
The rapid expansion of music AI technologies has led to the widespread use of large-scale datasets that often include copyrighted music without adequate oversight. Current legal and technical frameworks struggle to identify and quantify such copyrighted content, leaving copyright holders under-compensated and exposing intellectual property rights to potential violation. This study proposes an approach to copyright detection that combines federated learning (FL) with model fingerprinting. With FL, models are trained locally, which preserves data privacy by keeping sensitive information on local servers, and only the model updates are aggregated centrally. Model fingerprinting assigns unique digital signatures to training data outputs, enabling precise tracking and verification of copyrighted material. Using these techniques, our framework compiles a catalog of artists and counts each artist's songs present in the dataset; these counts feed into our compensation mechanism to ensure fair remuneration for copyright holders. Our solution makes data usage more transparent and benefits both AI developers and creators, encouraging a cooperative musical landscape in which AI and creativity coexist.
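To make the central aggregation step concrete, the following is a minimal sketch of sample-weighted federated averaging (in the style of FedAvg), which is one common way to combine locally trained models. It is an illustration under stated assumptions, not the aggregation rule used in this study: the function name `federated_average`, the dictionary-of-lists weight layout, and the toy client values are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Combine client weights into one model, weighting each client
    by the number of local training samples it reports.

    Only weights and sample counts reach the server; the raw audio
    stays on each client's own machine.
    """
    if not updates:
        raise ValueError("need at least one client update")
    total_samples = sum(n for _, n in updates)
    if total_samples <= 0:
        raise ValueError("total sample count must be positive")

    # Start from zeros shaped like the first client's weights.
    first_weights, _ = updates[0]
    averaged = {name: [0.0] * len(vals) for name, vals in first_weights.items()}

    for weights, n_samples in updates:
        share = n_samples / total_samples  # this client's contribution
        for name, vals in weights.items():
            for i, value in enumerate(vals):
                averaged[name][i] += share * value
    return averaged


# Toy example (assumed values): two clients holding 1 and 3 local songs.
client_updates = [
    ({"w": [1.0, 2.0]}, 1),
    ({"w": [3.0, 4.0]}, 3),
]
global_weights = federated_average(client_updates)
```

Weighting by sample count means a client holding more tracks has proportionally more influence on the shared model; an unweighted mean would be an equally valid assumption for a sketch like this.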