Article The New York Law Journal

Searching for Web Crawling's Legal Boundaries

22 May 2017

Webpages are a treasure-trove of useful information for financial firms and software companies that are able to capture it using web crawling (or scraping) technology. Yet, for over 20 years, courts have struggled to draw the line between the usefulness of such information and the rights of the content owners and website operators from which that content is derived. Once a niche issue, the increased use of this technology has compounded the disputes related to it.

In particular, website operators have used the Computer Fraud and Abuse Act (CFAA) to prevent crawling of their websites. While recent judicial opinions have harmonized the rules for accessing websites without authorization, the courts diverge as to whether the CFAA prohibits accessing otherwise publicly available information for an unauthorized purpose. Moreover, new web crawling techniques are testing the limits of existing case law.1

Increased Use of Web Crawling

Whether a finance firm engaged in quantitative analysis or a software company developing new search algorithms, technology-minded businesses routinely and automatically access third-party websites using variations on web crawling to gather content and information. Generally, they start with a seed list of webpages from which they will request content, including HTML, text, image, and other files. Then, they copy the files and extract either specific data or the entirety of the files for later analysis.
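The seed-list workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any particular firm's system: the URLs, the SAMPLE_PAGES dictionary standing in for the live web, the "Revenue" field, and the function names are all hypothetical.

```python
import re
from typing import Callable, Dict, Iterable, Optional

# Hypothetical stand-in for the web, so the sketch runs without network access.
SAMPLE_PAGES = {
    "https://example.com/q1": "<html><body>Revenue: 120</body></html>",
    "https://example.com/q2": "<html><body>Revenue: 135</body></html>",
}


def fetch_sample(url: str) -> Optional[str]:
    """Return the page content for a URL, or None if the page is unavailable."""
    return SAMPLE_PAGES.get(url)


def capture(seed_list: Iterable[str],
            fetch: Callable[[str], Optional[str]]) -> Dict[str, str]:
    """Request each seed page and keep a copy of whatever is returned."""
    copies: Dict[str, str] = {}
    for url in seed_list:
        content = fetch(url)
        if content is not None:
            copies[url] = content
    return copies


def extract_revenue(copies: Dict[str, str]) -> Dict[str, int]:
    """Pull one specific data point out of each copied file."""
    pattern = re.compile(r"Revenue:\s*(\d+)")
    figures: Dict[str, int] = {}
    for url, content in copies.items():
        match = pattern.search(content)
        if match:
            figures[url] = int(match.group(1))
    return figures


seeds = [
    "https://example.com/q1",
    "https://example.com/q2",
    "https://example.com/missing",
]
copies = capture(seeds, fetch_sample)   # step 1: request and copy the files
figures = extract_revenue(copies)       # step 2: extract specific data
print(figures)
```

Passing the fetch function in as a parameter keeps the copying step separate from the question of how, and whether, a given site may be accessed, which is the question the remainder of this article addresses.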

For example, search engines generally identify hyperlinks and keywords from accessed webpages, add that information to their database for later analysis to improve their search algorithm, and continue to move across the Internet looking for new sources of content. Technology-savvy businesses, however, continue to develop new uses for search technology. Thus, while early efforts may have involved creating databases of factual information or gathering contact information for marketing solicitations, all manner of uses have been developed, including follow-on and copy-cat services that repeatedly access competitors' platforms as part of their functionality. In addition to potential copyright issues not discussed here, these new services may raise concerns for website operators if they disrupt the operators' services or damage their servers.
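As a rough illustration of how a crawler might identify hyperlinks and keywords on a page it has accessed, the following sketch uses Python's standard html.parser module. The class name, sample page, and URLs are hypothetical, and real search engines are far more sophisticated than this.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndKeywordParser(HTMLParser):
    """Collects hyperlinks and a simple word count from one HTML page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links = []            # candidate pages to visit next
        self.keywords = Counter()  # word frequencies for later analysis

    def handle_starttag(self, tag, attrs):
        # Record the target of every <a href="..."> as an absolute URL.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))

    def handle_data(self, data):
        # Count the visible words, ignoring case and edge punctuation.
        for word in data.lower().split():
            word = word.strip(".,;:!?\"'()")
            if word:
                self.keywords[word] += 1


# Hypothetical page content, used so the sketch runs offline.
page = (
    "<html><body><p>Crawling news: crawling tools.</p>"
    '<a href="/about">About</a>'
    '<a href="https://other.example/">Other</a></body></html>'
)

parser = LinkAndKeywordParser("https://example.com/index.html")
parser.feed(page)
parser.close()

print(parser.links)
print(parser.keywords.most_common(3))
```

The collected links are what allow a crawler to "continue to move across the Internet": each one can be added to the list of pages to request next.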

Financial professionals (particularly "quant analysts") use similar technology in algorithmic trading and quantitative analysis. Their tools generally are more targeted, accessing only the specific files on a website that are to be used for financial analysis. In analyzing such information, their systems identify new sources of information and analyze existing resources to optimize trading by the organization and to adjust financial models, often automatically.

Legal Questions

Depending on how such web crawling is conducted, it could implicate the rights of the content owner, as well as the website operator. Early cases analyzed web crawling technology through the lens of trespass law and similar rights.2 It did not take long, however, for content owners to bring claims under copyright law and other intellectual property disciplines against those using web crawling technology.3 Today, claims have been raised under a range of legal disciplines from breach of contract to trade secrets misappropriation to other forms of unfair competition.

One legal issue that merits particular attention is the CFAA, a criminal and civil statute that "prohibits acts of computer trespass by those who are not authorized users or who exceed authorized use."4 While some have attempted to limit the CFAA's reach by referring to it as merely an "anti-hacking" statute,5 courts have found that web crawling technology potentially can violate the Act.6

All courts to have considered the issue agree that a company using web crawling technology "can run afoul of the CFAA when he or she has no permission to access a computer or when such permission has been revoked explicitly."7 Permission can be revoked in a number of ways, including issuing a cease and desist letter, implementing technological measures such as IP address blocking, or revoking login credentials.8

The courts, however, differ in their approach to those that are given some access but have "exceeded the limits of their authorization" by retrieving material for unauthorized purposes.9 As the Ninth Circuit recently reaffirmed, it, as well as the Second and Fourth Circuits, has interpreted the CFAA such that these activities are not a violation.10 The First, Fifth, Eighth, and Eleventh Circuits, by contrast, extend potential liability to access that falls outside the "purposes for which access has been given."11 While the Supreme Court recently considered application of the CFAA, it did not resolve this well-developed circuit split.12

As a result, while a web crawler that accesses websites without authorization or when authorization is revoked violates the CFAA, the courts might reach different results for a company that crawls webpages that permit public access but prohibit web crawling or other activities in which the company is engaged.

Practical Considerations

The growing interest in web crawling among financial firms and software companies suggests that disputes will continue to arise as new technologies are developed. It is therefore important for those engaged in web crawling to understand that the mere fact that content and information can be found on the Internet does not mean that all means of accessing it are permissible. Moreover, courts have held that accessing a website after authorization has been revoked is not permissible.13

That being said, while a careful web crawler might want to review the terms of use of each website it intends to capture to confirm that web crawling is permitted, the Ninth Circuit has expressed concern that requiring such careful analysis is not practical.14 Thus, it has held that "violation of the terms of use of a website cannot itself constitute access without authorization."15 The First Circuit, by contrast, has held that a "lack of authorization could be established by an explicit statement on the website restricting access," such as terms of use, but even it has cautioned that "public policy might in turn limit certain restrictions."16 Similarly, courts have considered whether use of technological measures, such as the Robots Exclusion Protocol (or robots.txt),17 might be used as a proxy for such restrictions.18
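For readers unfamiliar with how the Robots Exclusion Protocol works in practice, the following sketch shows a crawler consulting a robots.txt file before requesting a page, using Python's standard urllib.robotparser module. The robots.txt contents, the "ExampleBot" user-agent name, and the URLs are hypothetical, and honoring robots.txt is shown here only as a technical courtesy; as discussed above, its legal significance remains unsettled.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, as a website operator might publish them.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rules = RobotFileParser()
# parse() accepts the file's lines directly, so no network request is made here.
rules.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" is an assumed crawler name used only for illustration.
allowed_public = rules.can_fetch("ExampleBot", "https://example.com/listings/1")
allowed_private = rules.can_fetch("ExampleBot", "https://example.com/private/data")

print(allowed_public, allowed_private)
```

A crawler built along these lines would skip any URL for which can_fetch returns False, although whether doing so is sufficient, or required, to stay within the operator's authorization is a question for the courts.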

Conclusion

The ongoing litigations referenced in this article and those filed in the future may provide greater clarity on the bounds of legal web crawling. For now, businesses using these techniques should tread carefully lest they get caught in the CFAA's web.

Endnotes:

1. Cf. Adrianne Jeffries, "How Google eats a business whole," OUTLINE, April 17, 2017, https://theoutline.com/post/1399/how-google-ate-celebritynetworth-com.

2. eBay v. Bidder's Edge, 100 F. Supp. 2d 1058 (N.D. Cal. 2000); Am. Online v. LCGM, 46 F. Supp. 2d 444 (E.D. Va. 1998).

3. Associated Press v. Meltwater, 931 F. Supp. 2d 537 (S.D.N.Y. 2013); Ticketmaster v. Tickets.Com, No. 99 Civ. 7654, 2003 WL 21406289 (C.D. Cal. March 7, 2003).

4. Facebook v. Power Ventures, 844 F.3d 1058 (9th Cir. 2016).

5. United States v. Nosal, 844 F.3d 1024, 1049 (9th Cir. 2016) (Reinhardt, J., dissenting).

6. EF Cultural Travel BV v. Zefer, 318 F.3d 58 (1st Cir. 2003); CouponCabin v. Savings.com, No. 2:14 Civ. 39, 2017 WL 83337 (N.D. Ind. Jan. 10, 2017), 2016 WL 3181826 (N.D. Ind. June 8, 2016); Craigslist v. 3Taps, 942 F. Supp. 2d 962 (N.D. Cal. 2013); Snap-on Bus. Solutions v. O'Neil & Assocs., 708 F. Supp. 2d 669 (N.D. Ohio 2010).

7. Facebook v. Power Ventures, 844 F.3d 1058, 1067 (9th Cir. 2016). The private right of action under the CFAA also requires that the plaintiff "suffer[] damages or loss," 18 U.S.C. §1030(g), but loss has been broadly defined and may include the time that the website operator spends "analyzing, investigating, and responding to [the web crawler's] actions." Facebook, 844 F.3d at 1066.

8. Facebook, 844 F.3d at 1067; United States v. Nosal, 844 F.3d 1024, 1036 (9th Cir. 2016); CouponCabin, 2017 WL 83337, at *3.

9. Facebook, 844 F.3d at 1068.

10. Id. (discussing United States v. Nosal, 676 F.3d 854 (9th Cir. 2012)); United States v. Valle, 807 F.3d 508 (2d Cir. 2015); WEC Carolina Energy Solutions v. Miller, 687 F.3d 199 (4th Cir. 2012).

11. United States v. John, 597 F.3d 263, 272 (5th Cir. 2010); see also United States v. Teague, 646 F.3d 1119 (8th Cir. 2011); United States v. Rodriguez, 628 F.3d 1258 (11th Cir. 2010); EF Cultural Travel BV v. Explorica, 274 F.3d 577, 581-84 (1st Cir. 2001).

12. Musacchio v. United States, 136 S. Ct. 709 (2016). Two petitions for certiorari are currently pending before the Court. Power Ventures v. Facebook, No. 16-1105 (U.S.); Nosal v. United States, No. 16A840 (U.S.).

13. Facebook v. Grunin, 77 F. Supp. 3d 965 (N.D. Cal. 2015).

14. Nosal, 676 F.3d at 861.

15. Facebook, 844 F.3d at 1068.

16. EF Cultural Travel BV v. Zefer, 318 F.3d 58, 62 (1st Cir. 2003).

17. Website operators use the Protocol to tell web crawler programs what files or folders should not be visited.

18. QVC v. Resultly, 99 F. Supp. 3d 525, 540 (E.D. Pa. 2015); Healthcare Advocates v. Harding, Earley, Follmer & Frailey, 497 F. Supp. 2d 627, 648 (E.D. Pa. 2007).

Joshua L. Simmons, P.C.
