06 October 2005

Authors Guild v. Google (6): Was There a Violation?

Due to a technical glitch, this did not post properly the first time around—thus, the backdating.

Ordinarily, this would be the first question one asks: if there is no violation in the first place, there is no need to reach the fair use defense and its four-factor weighing. Rhetorically, though, I think it was necessary to show first that fair use is at best arguable, if only because so many ill-informed commentators[12] immediately jumped to that defense, since its rhetoric is easier to sell to the public.

In any event, let's take a closer look at the three violations I outlined previously.

  1. The scanning, copying, and storage of the works at the University of Michigan library
    This is probably the easiest theory to evaluate. Too many commentators conflate it with the third theory, which ignores the underlying premise of the Copyright Act: a violation is complete upon copying, absent a specific defense. The number of copies created, the profit from those copies, and the like all go to the appropriate remedy and to whether the violation should be treated as willful; but, fair use aside, they do not excuse the copying itself.
  2. The creation and dissemination of indices and/or search mechanisms from violation 1
    This is a somewhat more difficult violation to evaluate. On the one hand, an index made from a work is a derivative work that requires a license.[13] On the other hand, the purported product (which is difficult to evaluate without actually seeing it) is not an index of just one work; as described in public releases, it should function as a finding tool across a mass of works. Thus, there's an argument, not a good one in the abstract but rhetorically attractive, that this is not an index to any particular work. The problem with that argument is that what Google proposes is, at its core, an index of indices. The top-level index (the composition, in the mathematical sense, of all those indices) is a violation only in the sense that it incorporates other violations; the lower-level, single-work indices, though, are direct violations.[14] That the top-level index is a violation only derivatively does not, however, vitiate the underlying violations.

  3. The retrieval of undefined portions of the underlying works after a "hit" from violation 2 by a person or persons who have neither purchased nor otherwise obtained a license to do so
    This is by far the hardest violation to evaluate, because Google has promised to include only "limited portions" in response to each hit. The problem with this assertion, though, is that it is trivially easy to write a recursive program that targets a single work and proceeds through it, one returned selection at a time, until it has recreated the entire text.[15] Only by purposely returning false data, which is antithetical to an accurate and useful indexing system, can Google avoid this problem. On the other hand, Google's role in any such reconstruction would at most be an enablement similar to the one at issue in Sony (the Betamax case).

    On balance, evaluating this potential violation must await Google's actual implementation. Index-based retrieval simply has too many quirks for any evaluation at this stage to claim accuracy.

That leaves us with one clear violation and two potential violations, each muddied to varying degrees (and for different reasons). Muddy they may be, but they cover the ground. Thus, we do need to consider the fair use doctrine, but we can (and should) do so only in the context of the actual violations.


  12. Or, worse, those with hidden (and not-so-hidden) agendas that produce a results-oriented legal analysis. Although I represent authors, I also represent several organizations and individuals who are primarily "re-users" of material. Perhaps that colors my analysis a bit; if so, I think it does so far less than the EFF's analysis is colored, and certainly without the strident absolutism. For example, I concede that the Google program might well constitute fair use for some works under some circumstances; what I deny is that the program as a whole does so (or conceivably could do so, as we'll see down the road), whether considered as a class action or otherwise.
  13. Although this line of reasoning is far from clear under any circuit's law, it is clearest in the Second Circuit… and perhaps most obscure in the Ninth. Nonetheless, based on the reasoning in the most recent major case out of the Second Circuit, the result on this prong of the violation is predictable.
  14. For those of you who are law students just starting to study the Exclusionary Rule, this should sound a great deal like the "fruit of the poisonous tree" doctrine. The law is a web of common methods of reasoning, and this is an excellent example. It's not a seamless web; if it were, there would be no conflict between the First Amendment and defamation.
  15. I created and tested such a program against the original Amazon Search Inside the Book (SITB) offering in about two hours. (It did require an OCR side-trip, but that does not appear necessary for any system compatible with Google's current results system.) It was quick; it was dirty; it had a purely command-line user interface; it worked. (Of course, I tested it on one of my own clients' works.)