Some e-mails and others' blog entries lead me to believe that the class definition in the settlement regarding Anthropic's use of pirated material as training material for its "AI" is causing some needless confusion. Slightly rewording it, the class members who can recover are those who:
- Hold the copyright,
- in works leeched by Anthropic to "train"[1] its "AI" systems
- that were, for US works, registered with the Copyright Office
- with an effective date preceding the leeching
- and within five years after first publication.
It's those last two qualifications that are causing the confusion, because they're not about copyright law. They are, instead, about the technicalities of class certification, and specifically about the requirements that the class representatives' circumstances be typical of the class and that the common questions presented predominate. However, the definition does relate back to copyright law, in a way, too: leeching a work that falls outside these parameters still infringes that work's copyright, but recovering for it would require further litigation. The incredibly ill-advised registration system (which is inconsistent with the Berne Convention's disdain for "formalities," but for both historical and hidden-agenda reasons beyond the scope here continues to be part of US copyright law) has two provisions that are prone to abusive litigation tactics. The class definition excludes those tactics by defining otherwise valid copyright claims out of the class.
The easier of the two to understand, and the one with more validity, is that the effective date of registration[2] needs to precede the date on which the material was misappropriated by Anthropic. This isn't about copyright validity, but about the availability of certain remedies (statutory damages and attorney's fees) under § 412. Since those remedies are important parts of this class action, they've been forced in through the class definition.[3]
It's much more difficult to accept the "registered within five years" limitation. A registration can be made at any time that a work's copyright is in force. The "five years" comes from an evidentiary qualification in § 410 of the Copyright Act: A registration whose effective date is within five years of first publication is prima facie valid, but a later registration is subject to challenge more generally. Excluding the post-five-years-registered works is a litigation decision made in negotiating this settlement (and in the class allegations in the complaint), because it appears that none of the proposed class representatives falls into that group and the additional squirreliness involved in validating those registrations might theoretically impair the "common question" aspect of class certification.
Unfortunately, that last point in particular has been misinterpreted in a number of places as meaning that more than five years after first publication, it's too late to register at all (instead of just too late for this particular lawsuit). Frankly, that's what some parties here want you to think, because without registration there's no individual cause of action that can be heard by the courts (§ 411, although this is a claim-processing rule and not a jurisdictional one[4]).
The fundamental problem is that the publishing industries (some more than others) have been at best slovenly in registering copyrights, even when the publishing contract requires such registration.[5] (It was worse under the 1909 Act, when that failure to register also forfeited the copyright itself.) It is still worth doing late registrations, so long as the term hasn't expired and the registrant is even more careful than usual to proofread the application and ensure it's fully truthful and accurate. This suit, after all, is not going to resolve all questions regarding leeching of material under copyright… and the next set of class counsel to come along, or even individual lawyers, might be more aggressive. However, they can't file if there's no registration.
- [1] I'm just not going to express my contempt for this sort of deceptive misuse of language here. Although that misuse is endemic to the general discussion of "AI" and "generative AI" and "chatbots," the point of this blawg entry is misunderstanding of copyright law by affected authors (and potentially many others).
  Don't worry, you sleazebuckets. I'll deal with your intellectual dishonesty and intentionally deceptive acts and practices more directly another time. Bwahahahahaha!
- [2] Although really not relevant here, the effective date of registration is ordinarily the earlier of the date of actual application (including payment of fees) or, if that application date is 90 days or less after first publication, the date of first publication. Naturally, the "date of first publication" is defined in the Copyright Act only for "phonorecordings."
- [3] We'll pretend, for the moment, that § 505 provides the only way to recover attorney's fees. It doesn't; the rule governing class actions provides for attorney's fees (regardless of whether the cause of action otherwise provides for them), and on a far more generous basis than does the Copyright Act. Needless to say, I'm displeased with the confusion here, too.
- [4] Reed Elsevier, Inc. v. Muchnick, 559 U.S. 154 (2010). This matters because federal class actions can include claims of dubious (or even no) subject-matter jurisdiction if pursued individually.
- [5] This failure constitutes a breach of contract by the publisher. On one hand, it's probably long past the statute of limitations, since the failure to register "should be" apparent to the author not long after publication. On the other hand, creative lawyers might use such breaches, especially when part of a pattern or practice, to strike other defenses offered by publishers for other breach-of-contract claims like failure to pay royalties; this is called the doctrine of unclean hands. Of course, the hands were a lot less clean when smearing linseed-oil-based inks…