

On May 13, 2025, a federal court filing changed the risk balance for every organization that relies on ChatGPT.

This review was prepared by Dmitry Baraishuk, a partner and Chief Innovation Officer (CINO) at the software development company Belitsoft (a Noventiq company).

U.S. Magistrate Judge Ona Wang signed a preservation order directing OpenAI to retain all ChatGPT conversations – regular chats, “temporary” sessions, items users had already deleted, and every request that passes through OpenAI’s enterprise API – until the court instructs otherwise.

The mandate is a forward-looking command that freezes the entire flow of conversational data in place. By definition, it overrides every automated deletion routine that OpenAI had implemented for consumer privacy, model training hygiene, and contract compliance.

The order surprised many litigators because, in ordinary copyright cases, judges preserve only the specific artifacts that plaintiffs already know exist. Here, Judge Wang concluded that plaintiffs could never know what exists unless everything is first saved.

The Theory That Convinced the Judge

The plaintiffs – The New York Times and a coalition of other news publishers – argued that verbatim or near-verbatim copies of paywalled articles regularly appear in ChatGPT answers when prompted in certain ways. They further argued that users who request entire articles have every reason to delete those chats as soon as they finish reading, because the dialogue itself is evidence of infringement. From the publishers’ perspective, every deleted chat is a potentially valuable admission that their content is being distributed without a license. When OpenAI produced only opt-in log samples – material from users who had explicitly allowed their conversations to be reviewed – the plaintiffs claimed the company was selecting the very population least likely to contain infringing behavior. Judge Wang accepted that logic, remarking in a January hearing that if a reader bypassed a paywall and then clicked “delete,” the key proof would vanish forever unless the court intervened.

OpenAI’s Procedural Objection and Its Limits

OpenAI responded that the order was rushed and issued ex parte, meaning the company had no time to brief or argue. In most civil cases, such a move would be grounds for an immediate motion to vacate. Courts, however, have broad latitude to preserve evidence when they believe delay risks irreparable loss. Here, Judge Wang characterized deletion as ongoing spoliation (continuing destruction of evidence). Even if an appellate court ultimately narrows the scope, OpenAI must comply in the interim. That creates immediate operational obligations long before any relief from a higher court is possible.

The Immediate Technical Re-Architecture

Until mid-May, OpenAI’s retention tiers were simple. Free-tier and ChatGPT Plus users could toggle “chat history & training.” Deleted items were hard-deleted on a schedule. “Temporary Chat” never reached long-term storage. Full account deletion triggered a 30-day purge, and most enterprise API traffic expired within days. The order forces OpenAI to build a parallel evidence vault that ingests every byte of prompt and response data in real time, stamps it immutably, encrypts it, and stores it under strict chain-of-custody controls. Because the vault cannot feed back into model training without contaminating the litigation record, OpenAI must split data flows that were never designed to be separated. Engineers now face a trade-off. They must migrate quickly and risk bugs that leak sensitive data or move carefully and risk contempt if the vault is not complete. Either path consumes scarce senior engineering capacity just as the company is scaling new multimodal products.
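The "stamps it immutably" requirement is typically met with a hash chain: each archived record embeds the hash of the record before it, so any after-the-fact alteration is detectable during an audit. The sketch below illustrates that idea with a minimal in-memory log; it is an assumption-laden toy, not OpenAI's actual vault design, and the class and field names are invented for illustration.

```python
import hashlib
import json
import time


class EvidenceVault:
    """Illustrative append-only log with hash chaining (a sketch of the
    'immutable stamping' idea, not OpenAI's actual architecture)."""

    def __init__(self):
        self.records = []
        self.last_hash = "0" * 64  # genesis value before any record exists

    def append(self, conversation_id: str, prompt: str, response: str) -> dict:
        record = {
            "conversation_id": conversation_id,
            "prompt": prompt,
            "response": response,
            "timestamp": time.time(),
            "prev_hash": self.last_hash,  # chains this record to the previous one
        }
        # Canonical JSON so the hash is reproducible during an audit.
        digest = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()
        ).hexdigest()
        record["hash"] = digest
        self.records.append(record)
        self.last_hash = digest
        return record

    def verify(self) -> bool:
        """Recompute every hash; returns False if any record was altered."""
        prev = "0" * 64
        for rec in self.records:
            if rec["prev_hash"] != prev:
                return False
            body = {k: v for k, v in rec.items() if k != "hash"}
            digest = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()
            ).hexdigest()
            if digest != rec["hash"]:
                return False
            prev = digest
        return True
```

A production vault would add encryption at rest, write-once storage, and external timestamping, but the chain-of-custody property rests on this same tamper-evidence primitive.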

Collision with Privacy Promises and Commercial Contracts

OpenAI built market share on the assurance that users control their own data. For consumers, the “trash can” icon meant erasure. For enterprise customers, the API documentation promised no retention, no training use, and location-bound storage. The preservation order nullifies those promises overnight. In the enterprise channel, data processing agreements and HIPAA business associate contracts often contain explicit service-level objectives for deletion. If OpenAI now holds data indefinitely, counterparties can argue breach of contract, triggering indemnification clauses or the right to terminate. For consumer trust, the symbolism is equally stark: a feature labeled “delete” now simply changes an icon in the interface, while the underlying record persists under judicial seal.

Regulatory Headwinds, Especially in Europe

The EU’s General Data Protection Regulation allows retention when “necessary for the establishment, exercise, or defense of legal claims.” However, GDPR still requires data minimization, purpose limitation, and geographic safeguards. If an American court requires a U.S. company to store EU residents’ conversations in a U.S. data center and to disclose them to a third-country court, the company faces a classic conflict-of-laws scenario similar to the Schrems cases. Irish and French regulators have already shown a willingness to fine U.S. technology firms up to four percent of global revenue for lesser incompatibilities. Unless OpenAI builds a sovereign EU evidence vault or secures an explicit derogation, supervisory authorities could open parallel investigations. That would add layers of cost and delay for every customer with European traffic.

From a corporate customer’s viewpoint, the new archive is a discoverable trove that plaintiffs in other disputes can subpoena. Imagine a trade secret lawsuit where opposing counsel claims an engineer typed confidential design notes into ChatGPT. Under U.S. rules, that log is presumptively within OpenAI’s “possession, custody, or control,” and therefore discoverable. Judges are often sympathetic to requests for third-party cloud logs when a litigant shows reasonable need. The result is that confidential internal brainstorming – once thought ephemeral – may surface years later in unrelated litigation. Risk managers therefore have to treat each ChatGPT prompt as if it were an email stored on a corporate server with a seven-year retention policy.

Financial and Engineering Cost to OpenAI

OpenAI told the court that compliance will require “months” of work. Industry veterans translate that to tens of millions of dollars in storage, key management, and audit logging costs – expenses that do not improve model accuracy or latency but instead insure against contempt citations. In a capital-intensive artificial intelligence race, every unplanned dollar allocated to compliance leaves less funding for GPU leases and research hires. If margins shrink, OpenAI may revisit pricing, especially for high-volume enterprise customers who generate most of the storage burden.

Early Signs of Customer Flight and Competitive Positioning

Within days of the order, security leaders at several Fortune 500 companies reported that they had blocked chat.openai.com and paused deployments pending legal review. Consultancy briefs circulated recommending that clients treat any external large language model as a public cloud for purposes of privilege and data classification. Rival providers seized the moment. Mistral emphasized its French jurisdiction and optional on-premises deployment. DeepSeek promoted open-weight checkpoints that remain entirely inside a customer’s firewall. Google highlighted Gemini’s region-locking in certain cloud regions. Even Microsoft, OpenAI’s closest partner, began fielding urgent questions about whether Azure OpenAI instances are covered by Judge Wang’s order. Until a court clarifies that point, legal departments will assume the broadest possible exposure.

Reopening the Copyright Economics Debate

Behind the discovery dispute lies the unresolved business question of who pays for training data. Publishers assert that OpenAI built a multibillion-dollar model on unlicensed text and now profits by providing subscription access to derivative outputs. OpenAI and its supporters counter that large-scale text analysis is transformative, qualifies as fair use, and is socially beneficial. The preservation order does not resolve that question, but it does shift leverage toward publishers. The more evidence they can extract from preserved logs, the larger the potential damages calculation under statutory multipliers for willful infringement. That threat could push OpenAI and other model vendors to pursue retrospective licensing deals, reshaping the economics of generative artificial intelligence.

International Political Fallout and Data Sovereignty Rifts

European policymakers view Judge Wang’s order as another example of extraterritorial reach by U.S. courts. Some members of the European Parliament have already cited the case while arguing for stricter “AI data sovereignty zones” in forthcoming AI Act implementation guidelines. Meanwhile, U.S. legislators sympathetic to local newsrooms frame the order as a necessary counterweight to unlicensed scraping. The trans-Atlantic privacy struggle that began with Safe Harbor, continued through Privacy Shield, and now operates under the Data Privacy Framework has a new front in generative artificial intelligence logs. Companies operating on both continents must prepare for a scenario in which regional data silos are not just compliance options but regulatory mandates.

Practical Mitigation Pathways Organizations Are Considering

Legal and engineering leaders now discuss several concrete mitigation measures. Encrypt-and-escrow, where a court holds the only key, preserves evidence without granting OpenAI personnel routine access but complicates key rotation and disaster recovery. Statistical sampling preserves a defensible subset but may satisfy neither side. Plaintiffs worry about missing rare events, and privacy authorities dislike even random indefinite retention. Geofencing EU data requires duplicate infrastructure and can cause model quality to drift if training corpora diverge. Local deployment of open-weight models solves the retention issue at the expense of higher compute costs and slower update cycles. No option is painless, so boards must weigh litigation certainty against operational efficiency.
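The statistical-sampling option has a defensibility wrinkle worth making concrete: if the sample must be auditable, selection should be deterministic rather than drawn from a random number generator, so an auditor can re-derive exactly which conversations were kept. A common way to do that is to hash each conversation ID into a bucket. The sketch below is illustrative only; the function name and the 5% rate are assumptions, not anything proposed in the case.

```python
import hashlib


def in_preservation_sample(conversation_id: str, rate: float = 0.05) -> bool:
    """Deterministic hash-based sampling: map the ID into [0, 1) and keep
    it if it falls under the target rate. The same ID always yields the
    same answer, so an auditor can reproduce the sample without a stored
    list of selected IDs. (Illustrative sketch; rate is an assumption.)"""
    digest = hashlib.sha256(conversation_id.encode()).digest()
    # Interpret the first 8 bytes as an unsigned integer mapped to [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate
```

This is why sampling "may satisfy neither side": the scheme is reproducible and rate-limited, but a rare infringing conversation still has a 95% chance of never being preserved at all.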

Short- to Medium-Term Outlook

Expect OpenAI to petition for a narrower order within the next quarter, perhaps proposing a hybrid approach of sampling plus encryption. Plaintiffs will resist, citing the ongoing risk of opaque deletion. If the order remains in effect through year-end, OpenAI is likely to introduce a premium “litigation-safe” service tier whose higher price covers storage and audit overhead. Enterprise adoption of on-premises large language models will accelerate, driving a secondary market in inference-optimized hardware and managed service wrappers. Parallel GDPR investigations could begin in 2026, forcing separate European preservation regimes. In the long run, whichever settlement or judgment emerges will set an industry precedent. Either courts will confirm that full conversational logs are discoverable assets, or they will accept that some balance – sampling, anonymization, regionalization – better aligns evidence collection with privacy principles.

Strategic Conclusion

For C-suite executives, the message is clear. From May 13, 2025, forward, any interaction your personnel have with ChatGPT or an OpenAI-powered tool is presumed to be a permanent, discoverable business record stored in a repository you do not control. That reality overrides user-interface features labeled “delete,” supersedes standard API retention language, and inserts U.S. litigation risk into every conversation your brand conducts with an external large language model. Boards that respond early – by auditing data flows, renegotiating contracts, and piloting alternative model stacks – will experience the least disruption when the next subpoena or regulatory letter arrives.

How Enterprise AI Teams Can Benefit

Enterprise risk teams increasingly demand assurance that their data remains fully under their control. Vendors such as Belitsoft that offer on-premise, single-tenant, or sovereign-cloud LLM deployments are well positioned to meet this need. Available solutions include:

  • Full on-prem LLM packages that operate on customer-owned GPUs or within private-cloud environments
  • Privacy-first platforms with built-in encryption, audit logging, and jurisdictional data fencing
  • Hybrid architectures that keep sensitive prompts local while routing non-sensitive workloads to the cloud for scalability and cost savings
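The hybrid approach hinges on a classification step that decides, per prompt, whether the request may leave the network. A minimal sketch of that routing decision is below; the regex rules stand in for a real DLP classifier, and the function name and patterns are assumptions for illustration.

```python
import re

# Simple patterns standing in for a real DLP classifier (illustrative only).
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),           # US SSN-shaped numbers
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),     # email addresses
    re.compile(r"(?i)\b(confidential|trade secret)\b"),
]


def route_prompt(prompt: str) -> str:
    """Return 'local' for prompts that trip a sensitivity rule, otherwise
    'cloud'. A real deployment would invoke a DLP service and then an
    inference client here; these rules are assumptions, not a product."""
    if any(p.search(prompt) for p in SENSITIVE_PATTERNS):
        return "local"   # keep the prompt inside the corporate firewall
    return "cloud"       # acceptable to send to a hosted model
```

The design choice to fail toward "local" matters: a false positive costs some GPU time, while a false negative puts a sensitive prompt into a discoverable external archive.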

Migration Services

For large-scale enterprise clients, tailored migration services are also available:

  • Consultants can rewrite prompts, chain-of-thought logic, and embeddings to be compatible with open-source models like Llama or Mistral
  • Integration toolkits enable seamless replacement of OpenAI endpoints with alternative solutions – without disrupting downstream systems or workflows
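Endpoint replacement is often less invasive than it sounds, because several self-hosted serving stacks (vLLM and Ollama, for example) expose an OpenAI-compatible chat-completions API, so migrating can reduce to changing a base URL and a model name. The sketch below builds the request in both configurations; the internal hostname and model name are hypothetical.

```python
import json


def build_chat_request(base_url: str, model: str, messages: list) -> dict:
    """Build an OpenAI-style chat-completions request. Many self-hosted
    servers accept this same shape, so swapping providers often means
    changing only base_url and model. (Sketch; no request is sent.)"""
    return {
        "url": f"{base_url}/v1/chat/completions",
        "body": json.dumps({"model": model, "messages": messages}),
    }


# Hosted endpoint today:
hosted = build_chat_request(
    "https://api.openai.com", "gpt-4o",
    [{"role": "user", "content": "Hello"}],
)
# Same call against a hypothetical self-hosted, OpenAI-compatible server:
local = build_chat_request(
    "http://llm.internal:8000", "llama-3-70b",
    [{"role": "user", "content": "Hello"}],
)
```

Because the payload shape is unchanged, downstream systems that parse the response need no modification, which is what makes this migration path attractive under preservation-order risk.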
