Zuckerberg cites YouTube in his defense amid AI copyright lawsuit

Jan 16, 2025 | Anthony

Mark Zuckerberg’s position on AI and copyright took center stage recently during a deposition in the ongoing legal battle Kadrey v. Meta. The plaintiffs claim that Meta used a pirated collection of e-books, known as LibGen, to train its AI systems, including its flagship Llama models. While defending Meta’s practices, Zuckerberg drew an analogy to YouTube’s approach to pirated content. In effect, his argument framed Meta’s use of the controversial e-book library as part of the broader licensing challenges faced by digital platforms.

Zuckerberg acknowledged in the deposition, “YouTube might host pirated content for a period of time, but they actively work to remove it, and most of its content is licensed.” His comments highlight the increasingly blurry lines tech companies face when deciding how to handle copyrighted materials in AI training. While portions of Zuckerberg’s deposition suggest he supports careful handling of such materials, he also downplayed his familiarity with LibGen and suggested that blanket bans on datasets like it might not be practical.

The lawsuit is one of many currently testing the legality of AI training practices. Authors such as Sarah Silverman and Ta-Nehisi Coates, prominent plaintiffs in this case, argue that using copyrighted works without explicit permission is not “fair use.” Meta, like other major AI companies, contends that the practice falls under the fair-use doctrine and enables innovation and progress in generative AI.

Internal Meta documents presented by the plaintiffs reveal significant internal concerns about using LibGen. Employees reportedly referred to the dataset as "pirated" and warned of potential risks, including regulatory pushback and a weaker negotiating position with publishers. Despite these concerns, the lawsuit asserts that Meta went ahead and used LibGen to train its Llama models. The allegations go further, claiming that similar tactics were applied to newer iterations, such as Llama 3 and the forthcoming Llama 4, and that data from Z-Library, another shadow library associated with piracy, was also used as recently as 2024.

Zuckerberg’s comments also offer insight into the tension between innovation and regulation. His analogy to YouTube suggested a pragmatic stance that favors careful handling over outright bans. “It’s unreasonable to prohibit datasets wholesale just because some content is problematic,” he argued during questioning. Yet he also noted that companies must proceed cautiously. “If a website clearly intends to violate rights, that deserves extra scrutiny,” he remarked, pointing to the delicate balancing act Meta appears to be attempting.

This is neither the first nor likely the last case testing whether AI firms have prioritized rapid growth over respect for intellectual property law. With large-scale models like Llama central to Meta's ability to compete with rivals such as OpenAI, the stakes are high. How the courts decide Kadrey v. Meta, and cases like it, could reshape everything from AI development pipelines to the way tech companies negotiate with rights holders.

As the legal drama continues, it raises open questions about what innovation should cost and who ultimately pays the price. Meta’s decision to train its models on widely criticized datasets like LibGen may shed new light on industry practices, but it also keeps the spotlight firmly on the unresolved question of copyright standards for AI.
