23 November 2005

More Statistics

Professor Goldman notes a "study" on § 512(c)(3) notices that has been posted to the 'net. To say the least, I'm a little bit skeptical of the study.

Before getting into the substance, I'd like to note a problem that is distressingly common in the executive summaries of most "empirical studies" that were not supervised by scientists (and that does not include economists). Although the study itself may discuss this problem, the executive summary completely ignores the problem of overlapping data. For example, the study notes both that "[t]hirty percent of notices demanded takedown for claims that presented an obvious question for a court" and that "[o]ne out of 11 included significant statutory flaws that render the notice unusable (for example, failing to adequately identify infringing material)." Although I have not done an exhaustive analysis of the entire chillingeffects.org data set, a random sample of twenty "flawed" notices showed that fourteen (70%) also "presented an obvious question for a court." That is, one cannot get an accurate picture of the prevalence of the problem by adding up the flaws; still less can one conclude, on the basis of the figures reported on the first page of the executive summary, that most (or all) notices are so flawed that the system is meaningless. That, however, is precisely what some third-party commentators have done. I would hope that the complete article makes this problem clear; the executive summary, however, does not.
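To put rough numbers on the overlap, here is a minimal sketch of the arithmetic. It rests on two assumptions of mine, neither of which comes from the study: that the "flawed" notices I sampled correspond to the study's "one out of 11" category, and that the 70% overlap in a twenty-notice sample holds across the whole data set (a sample that small cannot establish this). The function name `union_share` is purely illustrative.

```python
from fractions import Fraction


def union_share(share_a, share_b, overlap_within_b):
    """Inclusion-exclusion: share of notices showing problem A, problem B, or both.

    overlap_within_b is the fraction of B notices that also show problem A.
    """
    both = share_b * overlap_within_b
    return share_a + share_b - both


# Figures quoted from the executive summary.
question_for_court = Fraction(30, 100)  # "thirty percent of notices"
statutory_flaw = Fraction(1, 11)        # "one out of 11"

# ASSUMPTION: the 14-of-20 overlap in my small sample applies to the full data set.
overlap_in_sample = Fraction(14, 20)

naive_sum = question_for_court + statutory_flaw
combined = union_share(question_for_court, statutory_flaw, overlap_in_sample)

print(f"naive sum of the two figures: {float(naive_sum):.1%}")   # 39.1%
print(f"after removing the overlap:   {float(combined):.1%}")    # 32.7%
```

On these assumptions the two categories together cover roughly a third of the notices, not the two-fifths one gets by simple addition, and the gap widens with each further overlapping category added to the total.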

Turning to the substance, the study notes a relatively high proportion of "noncompliant" notice letters. Frankly, this is more the fault of the statute's drafters than anything else… and ignores the actual standard. (In any event, under § 512(c), an ISP is required to work with the complainant to make a complaint compliant, a startlingly rare event in practice, and one frowned upon by some courts.1) I'd like to think I know how to write a compliant § 512(c)(3) letter. I've noticed that a fair number of the letters at chillingeffects.org appear to draw on the two sample DMCA letters I've posted on my site.2

Dataset bias, though, is the main flaw in the study's basic concept. Even Google is not inclusive in providing notices to chillingeffects.org. Google's contributions reflect an extreme bias toward textual materials that seems (at least in my experience) inconsistent with actual practice. For example, there are at least half a dozen § 512(c)(3) letters that I've prepared for artists that do not appear at chillingeffects.org, although complaints to the same ISPs for textual materials somehow end up there. Then, too, there's the mistaken characterization of index problems as purely § 512(d) issues. I find it rather curious that the long-term storage of search results at Google gets no attention; in fact, in about 20% of the instances in which I assist an author in dealing with Google, the problem concerns both a search result and Google's target cache. In my judgment, Google's target cache doesn't even come close to qualifying for § 512(d), as it's nontransitory (and, in fact, frequently includes pages that disappeared months ago); that is a storage problem. However, Google has consistently characterized these complaints as attacks on its index only (although my own sample is too small to actually draw a conclusion).3

Finally, there's the § 512(i) problem, as in Ellison: A significant proportion of ISPs and sites out there that have third-party material on them simply do not register a DMCA agent. This, in turn, means that only ISPs that are already knowledgeable about (and generally resistant to) § 512 are reporting to chillingeffects.org. In short, the database is self-selecting for the skeptical.

One would hope that these, and other, objections will be discussed in the full article. I am disappointed that they are not mentioned even in passing in the executive summary; when any one of these problems would require strongly qualifying one's conclusions, the executive summary must also reflect the qualification. Their absence makes me question whether the final article will in fact do so.


  1. See, e.g., ALS-Scan v. ReMarQ Communities, Inc., 239 F.3d 619, 620–21 (4th Cir. 2001).
  2. One other problem that I'd like to note is the egregious mischaracterization of takedown notices as being irrelevant to § 512(a). The executive summary claims that:

    Second, we note that nearly half the notices in the self-reported set were sent in response to a situation where 512(a) would apply - largely situations where alleged infringers are trading files across peer-to-peer networks. In fact, 512(a) establishes a straightforward safe harbor for OSPs acting as conduits, with no notice-and-takedown procedure; further, because complained-of files reside on user machines, the OSP cannot take down the material in the first place.

    Summary at 8–9 (citations omitted). This is incorrect, as it omits a critical qualification: That the "conduit" operator is not aware of the conduit's nature as a source of infringements. The only means available to an individual copyright holder to ensure that an ISP can't deny such knowledge is to send it a notice under § 512(c)(3)—another example of poor drafting of the statute. Cf. ALS-Scan, supra (noting that the very name of the newsgroups—a communication type that some courts (Ellison) and most of the "user community" incorrectly characterize as a mere conduit—provided sufficient knowledge of infringing activity to require at least further inquiry).

  3. This also assumes that the broadest—and, as I've remarked, indefensible—interpretation of Arriba Soft is the only correct statement of the law concerning indexes and infringement. That, of course, could take up a couple of hundred footnotes by itself.