17 July 2019

I Could Never Get the Hang of I-de-olh-oh-gee

I'm going to be as gentle about this as I can: Lawyers in general, and law professors in particular, should not be allowed access to statistical tools unless and until they take a minimum grad-school-two-credit-equivalent course in experimental design taught by scientists who actually design experiments. A recent article on copyright remedies epitomizes the GIGO problem that results. The article is fundamentally wrong, fundamentally unsound, and based on fundamentally ridiculous experimental design. Some of the policy conclusions may have some validity, and should be part of the discussion… but the article's data set and analysis are orthogonal to any valid conclusion.

It's worth quoting the article's description of its "experimental design and method" in full, omitting footnotes (which do not appear to resolve any of the difficulties discussed below, let alone the more-subtle ones not explicable in a blawg entry).

The docket study presented in this Part is based on the Copyright Data Project, a publicly available database that contains docket entries, complaints, and other documents of almost one thousand copyright disputes from the period between January 1, 2005 and December 31, 2008. The list of cases was populated by a search on Bloomberg Law of all cases filed in federal courts from January 1, 2005 to December 31, 2008 for which the “Nature of Suit” is Copyright. This four-year period provides an ideal window to study statutory damages since it allows us to compare the role of statutory damages in the context of P2P file sharing on the one hand, with more commonplace copyright disputes on the other. The final list of cases in the database contains a random selection of 957 out of 17,119 cases. By randomizing the cases for analysis, the analysis is based on a representative set of cases. The docket database contains 46 coded fields and 125 different variables on each of the copyright disputes randomly selected from that period.

It is helpful to summarize a few general observations about copyright litigation. In their topography of the field, Christopher Cotropia and James Gibson observe that (1) “the Central District of California and Southern District of New York are ‘hot districts’ for copyright cases,” (2) copyright cases are “no more likely to get contentious than other civil litigation, [but] when they do get contentious, they get very contentious—resulting in significantly more docket entries, substantive rulings, and trials,” and (3) copyright dockets contain a remarkable number of (successful) small firms and “low IP” industries. Where pertinent, the analysis below will take into account these particularities of copyright disputes. Additionally, the results will distinguish between “regular” and “peer-to-peer” (P2P) or file-sharing cases. Separating both types of cases is important given the flood of P2P litigation in the 2005–2008 period.1

Go ahead. Soak that in. Now let's consider just a few of the obvious systemic flaws in this dataset (and in this instance, I lay the blame as much upon Cotropia and Gibson as on DePoorter), in the order that they become unmistakable (that is, not disclaimed in the next sentence sort of thing!) in this passage. To begin with, "docket entries, complaints, and other documents of almost one thousand copyright disputes from the period between January 1, 2005 and December 31, 2008" represents not precisely the Jurassic Period, but it's definitely the Bronze Age. It's pre-tablet, pre-Kindle, pre-Spotify, and pre-online Netflix; it's nearly pre-Etsy; it's at the height of The Pirate Bay; it's at the beginning of TOR (which had enough activity to point at defendants only in the last twelve months of the studied period); it's before widespread release and use of decoders for online video content that enabled high-quality transcoding; it's before almost-universal social media account usage, a major vector for infringements; and, most to the point, it's before widespread adoption of cellphones able to store and replay pirated content. In short, where copyright infringement is concerned, the 'net today doesn't look much like it did in 2008, let alone 2005.

Similarly, relying only on matters filed in US District Court is more than a bit problematic. IMNSHO, and consistent with both my personal experience and the general experience of authors' organizations (both specific to that period and overall since the late 1990s, when I began representing and consulting for creators and other copyright holders regarding online piracy), the statutory-damages stick waved in a takedown letter prevents a substantial proportion (at least three quarters) of lawsuit filings against nonhabitual offenders. It does so in three ways. First, it gives nonhabitual offenders substantial incentive to settle, take down, or otherwise obviate the need for suit. Second, it gives habitual offenders substantial incentive to make getting them into court procedurally difficult to impossible (not to mention expensive and time-consuming); overt pirate sites are the most obvious example, but the category also includes, say, America Online, Inc. (pay attention to who argued and won that one, and concerning what kind of infringement!), Cox Communications, and so on. Third, there's that pesky DMCA shield2 that locks many disputes right out of court, notwithstanding that an infringement using any conduit but the internet might well have been meritorious (and the interplay between demand letters asserting ineligibility for DMCA safe harbors and the corresponding availability of statutory damages can be fascinating in itself). And that's just for indisputably US-based defendants!

Randomization of cases for analysis isn't enough unless one first ensures that the random selection is drawn from a single, or at least representative, population. Then one must ensure that all of the statistical tools used for analysis are sample-based tools… and afterward perform all of the crosschecks. Table 5, however, shows that these crosschecks were not performed: Subsamples of two individual-author plaintiffs, three "software-video games" plaintiffs, three "performing arts" plaintiffs, and five "fine arts" plaintiffs are nowhere close to the minimum sample size necessary to support a statistically valid conclusion about a population of either 957 or 17,119 cases. Indeed, Tables 5 and 9 allow the inference that statutory damages, even when potentially enhanced for willful infringement, are too low to provide sufficient incentive for individual creators to file suit… although the article never considers this possibility (nor the difficulties of actually collecting such a judgment).3

The article also has a deeper, unstated analytic problem: It presumes that allegations of "willful infringement" are related directly, and only, to claims for enhanced statutory damages. Umm, not so much. Especially for smaller and independent copyright holders (authors, composers, and so on who are not operating under a collective umbrella), recovery of attorney's fees is at least as important as whether statutory damages might be enhanced. Since the willfulness of the violation is a critical factor in the Fogerty attorney's-fee analysis, of course a high proportion of plaintiffs will plead willfulness.4 In short, the "willfulness" pleading issue is not an independent variable amenable to statistical analysis or argument.

There's also a fundamental legal-landscape problem that this article neglects: The Morris problem.5 By itself, this explains why so few individual authors filed suit: They couldn't get into court in the Second Circuit during the study period. And authors of individual pieces in periodicals are precisely those most in need of statutory damages to make a lawsuit viable. The dataset thus self-selected against a substantial body of copyright holders, arguably (based upon the legislative history of the 1976 Act) the very ones the statutory damages provision was supposedly intended to benefit most. The Muchnick decision6 relaxes this a little by enabling class-action suits to be filed and settled when not all copyright holders have "satisfactory" registrations, but only a little, and only in 2010, after the end of the study period.

Perhaps most egregiously, the article attempts to draw conclusions about the entire universe of copyright-holder behavior from a nonrepresentative set. For example, as disclosed in Table 9, it analyzes more instances of motion-picture and television-program infringement (a total of 27 filed cases, all of which by definition concern a copyright holder that is not a natural-person creator) than the combined total of art and text (10 clear instances, and possibly up to 8 others categorized as "publishing" for unclear reasons). Let's just acknowledge that analyzing incentives, process, or anything else from Warner Brothers' perspective is not at all comparable to doing so from an individual author's or artist's perspective. The article also incorrectly assumes that copyright is the principal, or motivating, cause of action.7

Bluntly, this article represents what happens when someone just crunches numbers without considering where they come from, where they're going, and what they omit as much as what they reveal. This is especially distressing because the article is ideologically aligned with a meme prevalent in the law-and-economics movement (and favorable to certain interests), but is not founded on solid data or on solid analysis of the data (before even raising the "epidemiological versus empirical" monster from the depths of statistical analysis). The particular ideological meme involved, which becomes apparent only in the last six pages before the article's conclusion, is this: Because this dataset and analysis appear to show that the existing statutory-damages regime is inefficient (without using that word) from at least one perspective, that of those potentially paying the damages (that is, the wrongdoers!), the entire regime must be rejected. This meme, of course, denies that "efficiency" is itself a normative judgment, which it is precisely because it is bound to the perspective of fewer than all participants in a particular economic system. As is this data set.


  1. Ben DePoorter, Copyright Enforcement in the Digital Age: When the Remedy Is Wrong, 66 UCLA L. Rev. 400, 417–18 (2019) (footnotes omitted).
  2. 17 U.S.C. § 512 (limiting liability of "online service providers" in a manner explicitly excluding statutory damages).
  3. Cf. Sony BMG Music Entertainment v. Tenenbaum, 719 F.3d 67 (1st Cir. 2013). Conversations with counsel involved in the matter indicate that nowhere near the $675,000 in statutory damages has been collected, even though it was assessed for a long-running scheme that, given the number of warnings made, is difficult to characterize as anything but willful infringement.
  4. Compare DePoorter, supra n.1, at 428 (neglecting attorney's fees as related to willfulness).
  5. Morris v. Business Concepts, Inc., 259 F.3d 65 (2d Cir. 2001), later op., 283 F.3d 502 (2d Cir. 2002) (publisher's compilation copyright registration certificate does not act as a copyright registration certificate for individual freelance articles contained in that issue of a periodical).
  6. Reed Elsevier, Inc. v. Muchnick, 559 U.S. 154 (2010).
  7. Cf., e.g., C.E. Petit, Accio Lawsuit!, Scrivener's Error (02 Nov 2007). ETA (21 Jul 2019): The characterization that there were "two" suits by "individual authors" is wrong… because I have personal knowledge, having been consulted on all of them, of at least four that should have been included in the data set (and I am not so influential, prolific, or egotistical as to think that those were the only ones not included). Of course, three of them were adversary complaints in bankruptcy proceedings, so they probably didn't get noticed; and by definition the copyright claim, and even the statutory damages within the copyright claim, was not the primary motivation for those plaintiffs. I'm also aware of a 2005 bankruptcy proceeding (that is, comfortably inside the study period) that included dozens of authors' claims, adversary complaints, and corresponding assertions of willful misconduct by the debtor related to statutory damages as a valuation measure.