Who Owns AI Generated Content?

What Your Vendor Contract Actually Says

by

Teddy Ellison

AI & Data Agreements

Summary

Contractual ownership, copyright protection, and IP indemnity are three different things. Your vendor contract addresses the first. The second depends on how much human judgment went into the output. The third has carve-outs specific enough that most production AI products fall outside the coverage in at least one place. Conflating them is the most common and most expensive mistake founders make with AI vendor agreements.

AI vendor contracts have changed substantially over the past few years. OpenAI, Google, Microsoft, and AWS have all moved toward broader output assignment and more explicit IP indemnity, and the contractual position for paying customers is better than it was back in 2023. 

The problem is that the contractual position is only one of three things that matter. Too many founders are signing enterprise AI agreements without a clear picture of what the contract actually covers and where it stops. As a result, a meaningful number of them are already in breach of the competing-model restrictions in the agreements they are building on, even as they make IP representations to customers and raise capital.

Working out where your actual IP exposure sits starts with keeping contractual ownership, copyright protection, and IP indemnity separate. The vendor assigns the first. The second depends on how your product was built. The third often looks like protection until you read what it excludes.

Let's unpack each of these, along with how founders can best position themselves before signing their next AI vendor contract.

What most contracts say you own

The output assignment language in the major vendor agreements is now fairly consistent. OpenAI's Services Agreement assigns to the customer all of OpenAI's right, title, and interest in outputs. Anthropic's Commercial Terms do the same, with the additional commitment that Anthropic may not train models on customer content from paid services. Google's Gemini API, Azure OpenAI, and AWS Bedrock all follow similar structures for paying customers.

This assignment means your vendor is not in a position to claim ownership of what your product generates. But this does not mean that you have copyright protection in those outputs, or that the vendor will defend you if the outputs turn out to infringe something else.

The clause founders miss most often is the competing-model restriction, not the assignment clause. As of June 2026, OpenAI's terms of service prohibit using outputs to develop AI models that compete with OpenAI's products and services, with a Permitted Exception for internal embeddings and classifiers and for fine-tuning OpenAI's own models. Building a smaller proprietary model on synthetic data generated by GPT sits outside that exception.

Anthropic goes further, prohibiting any access to its services to build a competing product, not just the use of outputs. Common examples of architectures that run into these restrictions include using frontier model outputs as synthetic training data for a smaller proprietary model and building distillation pipelines on top of GPT or Claude to reduce inference costs at scale. Both are breaches, although neither looks like one on the surface. A lot of founders are building AI products this way right now, and many of them are unwittingly in breach of one or both contracts.

A second category that routinely surprises founders involves fine-tuned models. When you fine-tune a model on OpenAI's or Anthropic's infrastructure, you do not receive the weights. You receive a vendor-hosted inference endpoint, so the model is not portable. You own the training data and the prompts, but the deployed artifact lives on someone else's infrastructure and cannot be transferred. A founder whose pitch deck describes a "proprietary fine-tuned model" as the core IP asset should expect this question in Series A diligence, because a thorough investor's counsel will ask it.

What copyright law actually protects

The contract question and the copyright question are separate, and conflating them is the most consequential error founders make when assessing their AI-generated IP.

US copyright law requires human authorship. The D.C. Circuit's decision in Thaler v. Perlmutter (2025), left standing after the Supreme Court declined review in March 2026, established that purely machine-generated outputs are not copyrightable. The US Copyright Office's January 2025 report confirmed that using AI as a tool does not forfeit copyright protection, but the human contribution has to be perceptible and creative. Typing a prompt and accepting the output does not meet that bar.

That legal baseline has been further complicated by what is happening in training-data litigation. The June 2025 summary judgment ruling in Bartz v. Anthropic held that training on lawfully purchased books is "quintessentially transformative" fair use, but downloading pirated copies from sites like LibGen is not. Anthropic's subsequent $1.5 billion proposed settlement resolved the resulting damages claims. The takeaway is that the vendor your product is built on carries its own exposure, and founders are increasingly expected to have an answer for it. A vendor with documented, lawful training-data provenance is a fundamentally different diligence conversation than one without.

In practice, this means that both layers of the copyright question (i.e. your own outputs and your vendor's training data) require an affirmative answer before a capital raise. For your own outputs, a fully automated pipeline with no human selection or editorial input produces work that no one owns. This means anyone can copy it. Your vendor's contractual assignment gives you rights to use it, but it cannot assign copyright that does not exist.

The best fix is proper documentation. Prompt iteration, output-selection criteria, substantive human editing, and human-authored structure surrounding AI-generated portions are what create registrable, protectable work. A prompt log and an editorial review process are not expensive to build, but an IP rep that falls apart under scrutiny is. 
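As a rough illustration of what a prompt log could capture, here is a minimal Python sketch. Every name in it (`AuthorshipRecord`, `append_record`, the individual fields) is hypothetical and chosen for this example, and the completeness check is a bookkeeping aid, not a legal test of authorship.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class AuthorshipRecord:
    """One entry in a hypothetical prompt log for a single published asset."""
    asset_id: str
    prompts: list[str]               # every prompt iteration, in order
    candidates_generated: int        # how many outputs were produced
    selection_criteria: str          # why the chosen output was picked
    human_edits: str                 # summary of substantive edits made
    human_authored_parts: list[str]  # structure or sections written by a person
    editor: str
    logged_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def has_human_contribution(self) -> bool:
        """Rough completeness check only: selection reasoning plus edits or
        human-authored parts must be recorded. Not a legal standard."""
        return bool(
            self.selection_criteria.strip()
            and (self.human_edits.strip() or self.human_authored_parts)
        )


def append_record(path: str, record: AuthorshipRecord) -> None:
    """Append the record as one JSON line so the log only ever grows."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(record)) + "\n")
```

A record with an empty `selection_criteria` and no edits is the fully automated pipeline described above: it fails the check, which is a useful signal that nothing in the log supports a claim of human authorship for that asset.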

Where IP indemnity starts and stops

IP indemnity is the clause founders point to when an enterprise customer asks whether the outputs are safe to use. It is worth knowing exactly what it covers before that conversation happens. 

The major vendors have expanded IP indemnity significantly since 2023. OpenAI's Copyright Shield, Anthropic's commercial indemnity, Microsoft's Customer Copyright Commitment, and Google's generated-output indemnity all cover infringement claims arising from model outputs. The coverage is real, and it is meaningfully better than what existed two years ago.

The carve-outs are where most founders' actual situations fall outside the coverage. OpenAI's and Anthropic's indemnities do not apply where safety or filtering features were disabled, where the output was modified or combined with non-vendor materials, where the inputs or fine-tuning files were not rights-cleared, or where the claim is for trademark use in commerce. Anthropic explicitly excludes patent claims, and Azure voids its Customer Copyright Commitment if specific mitigations like Prompt Shield, protected material detection, and abuse monitoring are disabled.

Most founders building production AI products are operating in conditions that touch at least one of these carve-outs. Perhaps the output was modified, the fine-tuning data provenance is uncertain, or certain filters were adjusted to reduce latency. Under these conditions, the indemnity that exists on paper may not apply to the specific product being shipped.
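One way to make that self-check concrete is to write the carve-outs down as explicit conditions. The Python sketch below is illustrative only: `DeploymentFacts` and `triggered_carve_outs` are hypothetical names, and the four conditions are simply the shared carve-outs listed above, not the full language of any vendor's agreement.

```python
from dataclasses import dataclass


@dataclass
class DeploymentFacts:
    """Hypothetical facts about how a product uses a vendor model."""
    safety_filters_disabled: bool
    output_modified_or_combined: bool
    inputs_rights_cleared: bool
    claim_type: str  # "copyright" or "trademark" in this sketch


def triggered_carve_outs(facts: DeploymentFacts) -> list[str]:
    """List which of the carve-outs named in the text these facts touch."""
    hits = []
    if facts.safety_filters_disabled:
        hits.append("safety or filtering features disabled")
    if facts.output_modified_or_combined:
        hits.append("output modified or combined with non-vendor materials")
    if not facts.inputs_rights_cleared:
        hits.append("inputs or fine-tuning files not rights-cleared")
    if facts.claim_type == "trademark":
        hits.append("trademark use in commerce")
    return hits


# Example: filters left on, inputs cleared, but outputs are post-processed.
facts = DeploymentFacts(False, True, True, "copyright")
print(triggered_carve_outs(facts))
```

In this example the only condition touched is output modification, and that alone is enough to put the indemnity in question. An empty list is the only result that matches the coverage as it reads on paper.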

The gap this creates is most visible when a founder tries to pass that indemnity downstream. Enterprise customers increasingly require IP warranties in master service agreements, and a founder who makes that warranty assuming their vendor's indemnity backs it is carrying liability their vendor may not pick up. A sophisticated buyer's counsel will ask for the specific indemnity language. What the vendor covers and what the founder warranted are rarely identical, and that difference becomes the founder's legal exposure. 

Vendor tier matters here too. Below roughly $100,000 in annual vendor spend, the indemnity terms are generally non-negotiable. Above that threshold, the indemnity scope, zero-data-retention provisions, and audit rights become live negotiation points in enterprise agreements. AWS Bedrock layers its own uncapped indemnity over selected Amazon-owned models, which produces a different risk stack than using the same underlying model through a direct vendor API.

A framework for founders

The gap between the IP structure a founder has been describing and what the contracts actually say tends to surface unexpectedly, further down the line.

Knowing which decisions require a lawyer, which require careful self-review, and which are genuinely straightforward is what keeps this gap from opening in the first place. Our Tech Founder's DIY Legal Guide offers a general framework for how to approach these decisions, which we've translated into AI vendor-contract guidance below.

What founders can handle themselves

You don't need a lawyer to read through your vendor agreements and build proper documentation processes around your AI workflows. 

Start with the three clauses that matter most: the output assignment provision, the no-training commitment, and the competing-model restriction. In most major vendor agreements these appear in the intellectual property and restrictions sections. Also, confirm you're on a paid API or business tier for any workflow touching customer data. Consumer-tier tools have materially different training defaults and don't belong in production workflows.

The copyright piece is also self-serve, even though most founders skip it. Keep records of how you prompt, how you select and edit outputs, and how human-authored content surrounds the AI-generated portions. That documentation is what creates a registrable copyright position. Without it, the vendor's assignment gives you contractual rights to outputs that nobody legally owns.

Finally, write a one-page internal policy prohibiting employees from putting customer data into consumer-tier tools. Most companies haven't done it, and the consumer-tier training defaults mean it's only a matter of time before sensitive data ends up somewhere it shouldn't.

Where it gets complicated

Legal support becomes important when the decisions you're making have compounding consequences that aren't visible at the time you're making them.

Fine-tuning on vendor infrastructure is a good example. You can set it up yourself, but before you pitch it as proprietary technology to an investor, understand that what you have is a hosted endpoint on someone else's infrastructure, not a transferable IP asset. Those are different things, and experienced investors will know the difference.

Founders can also get into trouble when they give IP warranties to enterprise customers. Map what you're promising against your vendor's actual indemnity coverage before the contract is signed. The carve-outs covered earlier apply to most production configurations, and warranting something your vendor won't back is a liability you're taking on personally. 
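The mapping exercise is essentially a set difference: what you warranted, minus what the vendor covers for your configuration. A minimal Python sketch follows, with a hypothetical function name and made-up claim categories used purely for illustration.

```python
def unbacked_warranties(warranted: set[str], vendor_covered: set[str]) -> set[str]:
    """Return what was promised to the customer that the vendor does not cover."""
    return warranted - vendor_covered


# Hypothetical example: the customer contract warrants against three claim
# types, while the vendor indemnity (after carve-outs) covers only one.
warranted = {"copyright", "trademark", "patent"}
vendor_covered = {"copyright"}

gap = unbacked_warranties(warranted, vendor_covered)
print(sorted(gap))  # ['patent', 'trademark']
```

Whatever lands in `gap` is exposure that sits with you rather than your vendor, so it is worth seeing that list before the customer contract is signed.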

If your product involves training a smaller model on outputs from a frontier model, find the competing-model restriction in every vendor agreement you're building on before the pipeline is built. Every major vendor has one, the exceptions are narrower than they look, and rearchitecting after the fact is expensive. 

While not applicable to every AI builder, it’s also worth calling out EU AI Act exposure. If you have EU customers, the Act applies to your product regardless of where your company is incorporated. The first question to answer is whether your system qualifies as high-risk under Annex III, because the answer determines the documentation and transparency obligations you're subject to. 

Where expert counsel becomes mandatory

Outside counsel earns its place when the legal decisions you're making are specific enough that getting them wrong has a defined, irreversible cost.

Raising a Series A is the most common trigger. Investors and their counsel now ask specific questions about vendor tier, output ownership structure, indemnity carve-outs, and EU AI Act status as a matter of course. Having a memo that addresses each of those before the process starts means you're answering from a position of clarity rather than reconstructing your IP stack under deadline.

M&A is similar. AI-specific reps and warranties covering training-data provenance, model licensing, and output ownership are now standard in acquisition diligence. Getting those reps right requires someone who can read your vendor agreements and your customer contracts at the same time and identify where they don't align. That is not a solo exercise.

Enterprise contracting with customers in financial services, healthcare, or government adds another layer. The IP warranty, indemnity, and AI Act conformity obligations in those agreements interact with each other in ways that are easy to miss if you're negotiating them separately. Serotonin Legal works through exactly this kind of stack for companies at this stage.

Final Thoughts

AI vendor terms get amended quickly and quietly. The major vendors changed their agreements multiple times between 2023 and 2026, and the litigation repricing what indemnity actually covers is still moving. Founders who have a clear picture of their vendor stack, their copyright posture, and what they have warranted downstream can close enterprise deals and raise capital without the legal review becoming the bottleneck.

If any of the situations in the framework above are live for your company, getting specific about which ones and what the exposure looks like is worth doing before the next major conversation, not during it.

FAQs

Who owns AI generated content?

The answer has three layers that must be addressed separately. Contractually, every major AI vendor (OpenAI, Anthropic, Google, Azure, AWS) now assigns outputs to paying customers in their business or API tier agreements. As a copyright matter, outputs are only protectable if there is sufficient human authorship. This means pure machine outputs are not copyrightable under current US law regardless of what the contract says. Regarding indemnity, vendors cover infringement claims arising from outputs, but subject to carve-outs (disabled filters, modified outputs, uncleared inputs, trademark claims) that apply to many real-world product configurations. Treating the ownership clause in an AI vendor contract as the complete answer is a very common and expensive mistake.

Can you copyright AI generated content?

It depends on how much human judgment actually shaped the output. Pure machine outputs aren't copyrightable under current US law, regardless of what your vendor contract says. The Copyright Office's January 2025 guidance confirmed that using AI as a tool doesn't forfeit protection, but typing a prompt and accepting the result doesn't meet the human authorship bar either. What does meet it is documented prompt development, substantive editing, selection among outputs, and human-authored structure around the AI-generated portions. A fully automated pipeline with no editorial layer produces work nobody owns. Build the documentation process before you need to prove it exists.

What happens if my AI vendor uses my data for training?

If they do, your proprietary prompts, outputs, and any customer data you've passed through the API could end up influencing a model that your competitors use too. That's the actual exposure. But in practice, the major vendors don't train on paid API or business-tier data by default. Anthropic's commercial terms contain a flat prohibition. OpenAI requires an explicit opt-in. Google's paid Gemini API defaults to no training. The real risk is not the enterprise API but employees putting sensitive or customer data into consumer-tier tools such as the free tiers of ChatGPT and Claude.ai. Those tiers have materially different defaults and don't belong in any production workflow.

Who owns a fine-tuned AI model?

Not you, at least not in the way most founders assume. When you fine-tune on OpenAI's or Anthropic's hosted infrastructure, you get a vendor-hosted inference endpoint, not the weights. The training data and prompts are yours. The deployed model lives on their servers and can't be transferred or self-hosted. If portability matters to your product or your IP story, build on open-weight models like Llama or Mistral that you can run on your own infrastructure. This comes up almost every time a pitch deck describes a proprietary fine-tuned model as a core IP asset.

What should I look for in an AI vendor contract IP clause?

There are five sections worth reading before you sign. The output assignment clause tells you whether outputs are actually assigned to you unconditionally. The no-training provision tells you whether your data is protected by default or requires an opt-in. The competing-model restriction is where distillation and synthetic-data strategies can become a contract breach. The indemnity carve-outs tell you which conditions void the coverage you're counting on. The fine-tuned model provision tells you whether you get weights or a hosted endpoint. Most founders read the first one and stop. The other four are where the actual IP position lives.

