If You Think It 'Thinks,' Think Again
Three columnists analyze a recent, groundbreaking sanctions order from a Southern District of New York judge against an attorney who admitted to using fake citations generated by ChatGPT in a court filing.
ChatGPT and other generative AI applications reflect a recurring challenge for lawyers: deciding whether, when and how to adopt new technologies. Although it might be tempting to base these decisions on anecdotal reports about the new technology’s purported benefits, such reports may not address fully (or at all) the technology’s limitations or risks.
As the recent sanctions decision in Mata v. Avianca illustrates, adopting a new technology without a clear understanding of what it can and cannot do can have significant negative consequences.
To help navigate these issues, lawyers can ask some basic questions to uncover the known risks of a new AI application, using information typically available online from the service provider (although each practitioner will need to consider whether their particular uses and needs call for additional questions and information).
On June 22, 2023, U.S. District Court Judge P. Kevin Castel of the Southern District of New York issued an opinion and order in Mata that provides a detailed narrative of the conduct of Peter LoDuca, Steven A. Schwartz, and their firm, Levidow, Levidow & Oberman P.C., who represented the plaintiff.
The sanctions order offers lessons about how to use (and not use) “generative artificial intelligence” (“generative AI”).
As the order explains, Schwartz used what is commonly known as ChatGPT (which its developer more formally refers to as “GPT-4”) for legal research and analysis in preparing an affirmation (the “Affirmation”) in opposition to defendant Avianca’s motion to dismiss, having “falsely assumed [it] was like a super search engine.” (Sanctions Order at 15.)
Schwartz did not investigate how ChatGPT works, its capabilities and limitations, or any warnings from its developer to users. He also overlooked warnings that its supposed legal analysis and supporting “case citations” and “case holdings” were fictitious. As the order put it, ChatGPT responded to his requests for cases “by making them up.” (Id. at 17.) LoDuca filed the Affirmation, including arguments that misrepresented the applicable law.
Avianca’s subsequent reply asserted that its counsel could not locate many cases cited in the Affirmation, and that others did not support the propositions for which they were cited. After the court itself was unable to locate the cited cases, it ordered LoDuca to provide the Court copies of several of them. Rather than looking to more traditional online research tools, Schwartz returned to ChatGPT.
When he prompted ChatGPT to produce the court-requested cases, rather than searching for existing judicial opinions, it created material that had the characteristics of judicial opinions but in fact was not authentic. Although some of ChatGPT’s output consisted only of excerpts or incomplete opinions, Schwartz did not investigate further. Instead, he passed the ChatGPT-generated material to LoDuca, who submitted it to the court with an acknowledgment that most of the “cases” were incomplete and that one could not be located.
The sanctions order describes at length the results of Judge Castel’s review of the material LoDuca submitted, including errors in citations, case names, authors of opinions, and internal citations to other bogus “opinions.” Nevertheless, LoDuca and Schwartz “continued to stand by the fake opinions after judicial orders called their existence into question.” (Id. at 1.) The court ultimately found that the continued reliance on the ChatGPT material, compounded by untruthful statements made to the Court, constituted subjective bad faith. (Id. at 2, 29-31.)
The order suggests a number of questions lawyers should ask about capabilities and limitations, appropriate uses, validation of results, and security vulnerabilities whenever they use a new and relatively untested technology, especially if they do not themselves have any real familiarity with it.
Imagine that you have delegated the task of initial analysis, research, and drafting of a court brief to a lawyer you supervise. Your colleague proposes using ChatGPT. You ask, “What do you think it will help you do?” Your colleague says major law firms reportedly use it to speed up research and initial drafting of court filings, and some users believe it is good at correctly stating and explaining the law. But you lack any experience with it. How can you decide whether ChatGPT is an appropriate tool for your project?
A reasonable first question is: What does the developer disclose about the technology’s capabilities, limitations, known risks, and user responsibilities? As it turns out, developers’ documentation addresses such issues.
For example, the Terms of Use posted by OpenAI, ChatGPT’s developer, provide notice that ChatGPT tends to generate inaccurate information:
• Section 3(d) (“Accuracy”): “Given the probabilistic nature of machine learning, use of our Services [ChatGPT] may in some situations result in incorrect Output that does not accurately reflect real people, places, or facts. You should evaluate the accuracy of any Output as appropriate for your use case, including by using human review of the Output.”
Takeaway: It’s up to the user to determine whether Output is accurate. Counsel will need time and other tools to validate the Output (or potentially risk Rule 11 sanctions). Anticipated efficiencies may not be achieved.
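To make that takeaway concrete, here is a minimal sketch, not drawn from the Mata record, of what building “human review of the Output” into a workflow might look like: every citation the model generates is treated as unverified until someone confirms it through a traditional research service. The citation shown and the verify_in_trusted_source helper are hypothetical placeholders for that manual or tool-assisted check.

```python
# Minimal sketch: treat every AI-generated citation as unverified until a
# human confirms it through a traditional research service. The citation
# below and verify_in_trusted_source are hypothetical placeholders.

def review_output(generated_citations, verify_in_trusted_source):
    """Partition model output into confirmed and still-unverified citations."""
    confirmed, needs_review = [], []
    for cite in generated_citations:
        if verify_in_trusted_source(cite):   # human or tool-assisted lookup
            confirmed.append(cite)
        else:
            needs_review.append(cite)        # never file these as-is
    return confirmed, needs_review

confirmed, needs_review = review_output(
    ["Doe v. Roe, 123 F.3d 456 (2d Cir. 1999)"],   # hypothetical citation
    verify_in_trusted_source=lambda cite: False,   # not yet verified by anyone
)
assert confirmed == []   # unverified output never reaches the court filing
```

The design choice is the point: nothing generated by the model reaches a filing until it has been independently confirmed, which is precisely the time cost the takeaway anticipates.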
Another source of information is system cards, which explain a developer’s assessment of its technology’s performance, capabilities, and limitations, and the safeguards it has implemented to reduce the technology’s malfunctions and risks. The “GPT-4 System Card” provides additional details about ChatGPT’s tendency to generate inaccurate information, and recommends precautions a user can take to address it:
• ChatGPT is not a search engine; instead, its model is trained “using a large dataset of text from the Internet, to predict the next word.” (A brief illustration of this point appears after the takeaway below.)
• ChatGPT “has the tendency to ‘hallucinate,’ i.e., ‘produce content that is nonsensical or untruthful in relation to certain sources.’” (GPT-4 System Card, pp. 2, 6.)
• ChatGPT “maintains a tendency to make up facts, to double-down on incorrect information, and to perform tasks incorrectly. Further, it often exhibits these tendencies in ways that are more convincing and believable than earlier GPT models (e.g., due to authoritative tone …), increasing the risk of overreliance. Overreliance occurs when users excessively trust and depend on the model, potentially leading to unnoticed mistakes and inadequate oversight. This can happen in various ways: users may not be vigilant for errors due to trust in the model; … or they may utilize the model in domains where they lack expertise, making it difficult to identify mistakes. … As mistakes become harder for the average human user to detect and general trust in the model grows, users are less likely to challenge or verify the model’s responses.” (GPT-4 System Card, pp. 19-20.)
Takeaway: As the System Card warns, users must determine whether ChatGPT “makes up facts” and “doubles-down on incorrect information.” In practice, lawyers may find that checking facts and refining arguments offered up by generative AI takes more time than it saves.
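A toy illustration may help show why that warning matters. The sketch below is vastly simpler than GPT-4, and its probability table is invented for illustration, but it works on the same principle the System Card describes: it does not look anything up; it simply emits whichever word its table says is likely to come next. Fluent-sounding output from such a system says nothing about whether an underlying case, holding, or fact exists.

```python
import random

# Toy next-word predictor. The probabilities below are invented for
# illustration; real models learn them from vast amounts of internet text,
# but the generation step is the same in kind: predict the next word.
next_word_probs = {
    "The":   {"court": 0.7, "plaintiff": 0.3},
    "court": {"held": 0.8, "found": 0.2},
    "held":  {"that": 1.0},
    "found": {"that": 1.0},
    "that":  {"the": 1.0},
    "the":   {"claim": 0.6, "statute": 0.4},
}

def generate(start, max_words=5):
    words = [start]
    for _ in range(max_words):
        options = next_word_probs.get(words[-1])
        if not options:
            break  # no retrieval, no sources; generation simply stops
        choices, weights = zip(*options.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("The"))  # e.g., "The court held that the claim"
# Plausible-sounding text, but nothing here checked whether any such
# holding, or any such case, actually exists.
```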
In using any digital technology, a lawyer also must take precautions to ensure that cyber threat actors do not put client confidential information at risk or cause an inadvertent waiver of the attorney-client and attorney work product privileges. OpenAI’s Terms of Use shift significant responsibility for ChatGPT cybersecurity to the user:
• Section 5(b): “You must implement reasonable and appropriate measures designed to help secure your access to and use of the Services. If you discover any vulnerabilities or breaches related to your use of the Services, you must promptly contact OpenAI and provide details of the vulnerability or breach.”
Takeaway: OpenAI makes no representation that ChatGPT is cybersecure, or that safeguards have been implemented to detect when threat actors might be operating within it or exploiting a user’s connection to gain access to a user’s confidential electronic records. There is no representation that OpenAI will alert its users to cyber breaches of ChatGPT so that users can disengage from it to protect their networks and data storage media. Users are on notice that they are responsible for cyber risks and may have incomplete information about risks to their data and systems.
OpenAI’s documentation also discloses that any confidential information a user puts into a “prompt” and uploads to ChatGPT will cease to be confidential. For example, its “Commonly Asked Questions” highlight the potential risks to client confidential information:
5. Who can view my conversations?
• “[W]e review conversations to improve our systems and to ensure the content complies with our policies and safety requirements.”
6. Will you use my conversations for training?
• “Your conversations may be reviewed by our AI trainers to improve our systems.”
Takeaway: If a lawyer (or staff member) puts any client confidential information into a prompt and uploads it to ChatGPT, the information can be accessed by the developer’s personnel and probably should be presumed to be no longer confidential.
The ChatGPT Terms of Use also reveal another way confidential information may be compromised: OpenAI specifically reserves the right to use “Content” to improve its Services. It is unclear, however, precisely what portion of information put into a “prompt” may be used. Compare the January and March 2023 versions of Section 3(c), addressing OpenAI’s use of user-entered information:
January 2023: “To help OpenAI provide and maintain the Services, you agree and instruct that we may use Content to develop and improve the Services.”
March 2023: “We do not use Content that you provide to or receive from our API (“API Content”) to develop or improve our Services. We may use Content from Services other than our API (“Non-API Content”) to help develop and improve our Services. You can read more here about how Non-API Content may be used to improve model performance.”
Takeaway: The Terms of Use neither define nor explain OpenAI’s API. For that, users have to dig further into the website. Beyond the API functionality, however, OpenAI may still have access to and “review” (but not “use”) user content. Leaving content open to access and review deprives the content of its confidentiality.
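For lawyers who nevertheless experiment, the practical distinction is between content typed into the consumer chat interface (“Non-API Content” under the March 2023 terms quoted above) and content sent programmatically through the API (“API Content”). The sketch below assumes the OpenAI Python client (the openai package) and a hypothetical redact helper; it illustrates a cautious posture, not a representation about how OpenAI actually handles any particular request, and the Terms, not the code, govern what may be reviewed or used.

```python
# Sketch assuming the OpenAI Python client (pip install openai).
# Per the March 2023 Terms quoted above, "API Content" is not used to
# develop or improve the Services, while "Non-API Content" (e.g., the
# consumer chat interface) may be. That is a contract term to verify,
# not a technical guarantee enforced by anything in this code.
from openai import OpenAI

def redact(text: str) -> str:
    """Hypothetical helper: strip client names, matter numbers, and other
    identifying details before anything leaves the firm's systems."""
    return text.replace("ClientCo", "[PARTY A]")  # placeholder logic only

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": redact("Summarize the general standard for a motion "
                          "to dismiss, without naming specific cases."),
    }],
)
print(response.choices[0].message.content)
# Whatever comes back still requires the human verification discussed above.
```

Even on this more cautious path, the confidentiality analysis above still applies; the redaction step simply illustrates one way to keep client-identifying details out of prompts altogether.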
Lawyers also should bear in mind a caution explained in the System Card: as ChatGPT improves, users will likely perceive its inaccuracies to be less frequent, trust it more, rely on it too much, and be “less likely to challenge or verify the model’s responses.” As a result, OpenAI warns that ChatGPT may put users at increased risks over time of missing errors and inaccuracies.
Experimenting with new technology, testing its limitations, and considering the value it may bring to legal practice is consistent with our professional responsibility to keep abreast of technological developments. New technology does not, however, obviate a lawyer’s obligation to be competent, truthful, and trustworthy.
For example, none of Schwartz’s prompts for legal arguments and supporting cases requested contrary authority or later authorities citing the produced “cases.” And the court found that Schwartz “did not have the full text of any ‘decision’ generated by ChatGPT [when he prepared the Affirmation]. … He cited and quoted only from excerpts generated by the chatbot.” (Sanctions Order at 17.) Even taking generative AI out of the picture, this would not reflect best practices in researching a legal issue.
Further, the negative consequences of overreliance on generative AI can be more subtle. Asking a generative AI application to produce a first draft deprives the lawyer of the opportunity to exercise their skills in crafting arguments and outlining the issues to be addressed.
OpenAI admits in its website description of ChatGPT that “[t]he model is often excessively verbose and overuses certain phrases …” (Nov. 30, 2022, https://openai.com/blog/chatgpt.) Writing is a skill that requires continuous practice to sustain and improve. If writing is delegated to generative AI, the skills of experienced lawyers may atrophy, and newer lawyers will have fewer opportunities to exercise and develop their advocacy and writing skills.
Although generative AI may offer lawyers a potentially powerful tool, the developer-provided information suggests that at the very least, lawyers need to carefully investigate its capabilities and verify its output before accepting it as part of their broader legal toolkit.