In my last post, I walked through the consumer-versus-commercial divide in how major LLM providers handle data — and why that divide carries real legal consequences after the Southern District of New York's decision in United States v. Heppner. The takeaway was that consumer AI products operate under terms that were not designed with legal privilege, confidentiality, or regulatory compliance in mind.
A reasonable follow-up question is: What about the API?
If the consumer chatbot is the problem, the thinking goes, then switching to API access should be the solution. And there is something to that. API tiers offered by OpenAI, Anthropic, and Google operate under fundamentally different data-handling regimes than their consumer counterparts — regimes that are, by almost every measure, more protective of user data. But "more protective" is not the same thing as "compliant," and the distinction matters more than many organizations seem to realize.
What the API actually changes
The previous post compared consumer and commercial tiers in detail for Anthropic's Claude. The same structural divide exists across providers, and the API sits squarely on the commercial side. Here is what that means in practice.
Anthropic's commercial API does not train on customer inputs and outputs by default and deletes them after a limited retention window, with zero data retention available by agreement for qualifying customers. OpenAI's API retains data for up to 30 days for abuse monitoring, does not use it for model training, and offers Zero Data Retention for eligible endpoints. Google's Vertex AI operates under a Cloud Data Processing Addendum with contractually defined retention and no training on customer data. In each case, the API provider acts as a data processor rather than a data controller: the customer, not the provider, determines the purposes and means of processing.
These are meaningful differences. A consumer chatbot conversation may be retained for months or years, used to train future models, and governed by a privacy policy the user never read. An API call, properly configured, may leave no trace on the provider's systems at all. For anyone whose data-handling concerns begin and end with "I don't want my inputs in someone else's training set," the API is a substantial improvement.
But regulatory compliance does not begin and end there.
Why the API is not enough
Every major regulatory framework governing sensitive data — FERPA, HIPAA, state student-privacy laws, professional-conduct rules — imposes obligations that go well beyond what the API's data-handling defaults can address. The API solves one problem (provider-side data retention and training) while leaving most of the compliance architecture untouched.
Consider what a framework like HIPAA actually requires. A covered entity processing protected health information through an API must execute a Business Associate Agreement with the provider. That BAA must specify permissible uses and disclosures, require the provider to implement administrative, physical, and technical safeguards, and establish breach-notification obligations. The API's zero-retention default is a helpful technical control, but it does not substitute for the BAA itself. And the BAA, once signed, typically imposes configuration requirements — specific endpoints, disabled features, audit logging — that the organization must affirmatively implement and maintain.
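To make the audit-logging point concrete, here is a minimal sketch of an application-level audit trail that records who called the API and when without ever writing the payload itself to the log. Everything here is hypothetical: the helper name `record_api_call`, the logger name, and the endpoint string are illustrative, not drawn from any provider's SDK or any particular BAA's terms.

```python
import hashlib
import json
import logging
from datetime import datetime, timezone

# Hypothetical audit logger for an organization's own infrastructure.
audit_log = logging.getLogger("phi_audit")
audit_log.setLevel(logging.INFO)
audit_log.addHandler(logging.StreamHandler())

def record_api_call(user_id: str, endpoint: str, payload: str) -> dict:
    """Log who called which endpoint and when, plus a digest of the
    payload for integrity checks -- never the payload contents."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user_id,
        "endpoint": endpoint,
        # A hash lets auditors correlate a request with a record
        # later without the log itself ever storing PHI.
        "payload_sha256": hashlib.sha256(payload.encode()).hexdigest(),
        "payload_chars": len(payload),
    }
    audit_log.info(json.dumps(entry))
    return entry

record_api_call("clinician-042", "/v1/messages", "Patient presents with ...")
```

The design choice worth noting is that the log captures metadata and a digest, not content: the audit obligation is satisfied on the organization's side without creating a second, unregulated store of protected health information.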
FERPA presents a parallel structure. An educational institution using an API to process student education records must establish that the provider qualifies under the "school official" exception, which requires a written agreement specifying the provider's function, its relationship to the institution's use of the data, and the institution's direct control over the data's use. The API's default against training on customer data is necessary but not sufficient — the institution still needs the agreement, the access controls, and the governance to ensure that student records do not flow into the API in ways the agreement does not contemplate.
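The "governance to ensure that student records do not flow into the API" can be as simple as a pre-flight guard that refuses to forward anything without an explicit data classification. This is a sketch under assumed conventions: the classification labels, the `guard_outbound` helper, and the exception type are all invented for illustration, and a real institution's taxonomy would be set by its written agreement, not by code.

```python
# Hypothetical classification labels cleared for external model APIs.
ALLOWED_CLASSIFICATIONS = {"public", "de-identified"}

class ClassificationError(ValueError):
    """Raised when data is not cleared to leave institutional systems."""

def guard_outbound(text: str, classification: str) -> str:
    """Refuse to forward anything not explicitly cleared for the API.
    Data labeled e.g. 'education-record' stays inside institutional
    systems unless the written agreement covers that specific flow."""
    if classification not in ALLOWED_CLASSIFICATIONS:
        raise ClassificationError(
            f"data classified {classification!r} may not be sent to the API"
        )
    return text

guard_outbound("course catalog description", "public")  # passes through
```

The guard does nothing clever; its value is that the default answer is no, and every exception has to be made visible and deliberate.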
The pattern repeats across regulatory contexts. State biometric-privacy statutes require informed consent and retention schedules that no API default can satisfy. Professional-conduct rules governing lawyer confidentiality — sharpened considerably by Heppner — demand not just favorable vendor terms but documented due diligence, competence in evaluating the technology, and ongoing supervisory obligations. An API key does not discharge any of those duties.
The architectural gap
There is a subtler problem that the "just use the API" approach tends to obscure. When an organization integrates an LLM through an API, the API handles the model-inference layer: data goes in, a response comes back, and the provider's data-handling policies govern what happens on their end. But most real-world deployments involve considerably more than a single API call.
Data passes through preprocessing pipelines, prompt templates, logging systems, vector databases, retrieval-augmented generation stores, and output caches — all of which sit on the customer's side of the line. The API provider's zero-retention commitment says nothing about what happens in those layers. An organization can use a zero-retention API and still retain every input and output indefinitely in its own infrastructure, expose sensitive data through poorly secured retrieval stores, or inadvertently log protected information in application-level monitoring.
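One of those customer-side layers, application logging, is easy to illustrate. The sketch below scrubs two obvious identifier patterns before text reaches logs, caches, or monitoring. The patterns are deliberately minimal and assumed for illustration; a production redactor would need far broader coverage (names, medical record numbers, student IDs, free-text identifiers) and would not rely on regexes alone.

```python
import re

# Illustrative patterns only -- real deployments need much more coverage.
PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
]

def redact(text: str) -> str:
    """Scrub obvious identifiers before text touches the customer-side
    layers (logs, caches, monitoring) that the provider's retention
    policy never reaches."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text

print(redact("Reach patient at jane@example.org, SSN 123-45-6789."))
# prints: Reach patient at [EMAIL], SSN [SSN].
```

The point is not that this redactor is adequate; it is that the control has to exist at all, and it has to exist on the organization's side of the API boundary.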
This is the architectural gap that a provider-side compliance posture cannot close. The API governs data handling at the model layer. Regulatory compliance governs data handling end to end.
What "more protective" actually means
None of this is an argument against using the API. The data-handling improvements are real, and for many use cases they represent the minimum viable starting point for responsible deployment. An organization that uses the consumer chatbot for work involving sensitive data has a serious problem. An organization that uses the API has a less serious problem — but it still has a problem if the API is the beginning and end of its compliance strategy.
The useful framing is not "consumer versus API" as a binary compliance decision. It is "API as a necessary but insufficient component of a compliance architecture." The API provides a defensible data-handling posture at the provider layer. Everything else — the agreements, the access controls, the internal data governance, the training, the monitoring, the documentation — remains the organization's responsibility.
For institutions and professionals operating under regulatory constraints, the practical question is not whether to use the API. It is whether you have built the rest of the compliance architecture around it — and whether you can demonstrate that you have if someone asks.
Provider-specific data-handling policies referenced in this post draw on the same sources cited in the previous post, supplemented by Anthropic's Privacy Center, OpenAI's API data usage documentation, and Google's Vertex AI data governance documentation. This post is not legal advice. Compliance obligations vary by jurisdiction, regulatory framework, and organizational context. Consult qualified counsel for guidance specific to your situation.