Table of contents
Most teams that adopt AI in healthcare, legal, or financial services pick one provider — usually whichever one their counsel could get a BAA signed with first — and build everything against it. It’s the path of least resistance: one vendor security review, one contract, one integration. The trade‑off is invisible until the day it isn’t.
The invisible trade‑off has three parts. First: when your single provider has an outage, the AI features your team depends on stop working — and your team is now stuck mid‑task with no fallback. Second: when a different provider ships a meaningfully better model six months later (which they will), you’ve architected yourself into a corner — switching means a new BAA, a new security review, a new audit story to tell. Third: a provider that knows you can’t leave has no reason to keep its pricing honest, and no reason to let you match each job to the cheapest model that can do it.
All three are solvable. They’re solvable by not running your own multi‑provider rotation — and instead consuming AI through a layer that already does.
Provider outages are not hypothetical
Anthropic, OpenAI, and Google all run public status pages. The reason they run public status pages is that they have incidents. Anthropic’s history and OpenAI’s history are both readily browsable, and both show a steady cadence of degraded performance, elevated error rates, and full outages. In a typical month each provider has at least one incident that would visibly affect a customer relying on them for production traffic.[1]
For a consumer asking ChatGPT a question, an outage is annoying — they try again in twenty minutes. For a clinical operations team running prior‑authorization drafting through an AI workflow during business hours, an outage is a workflow stoppage. The intake team can’t summarize charts. The billing team can’t pull clinical justification language. The work piles up until the provider recovers.
The fix is conceptually simple: when your default model is unavailable, route to a different one. The problem is that “a different one” almost always means “a different provider’s model.” Claude Sonnet and Claude Haiku ride on the same Anthropic infrastructure; when Anthropic has a major incident they’re both affected. Real failover means cross‑provider failover.
Cross‑provider failover is a compliance problem if you DIY it
If you wanted to run your own multi‑provider rotation today, here’s what you’d need:
- A Business Associate Agreement with Anthropic, signed and current.
- A Business Associate Agreement with OpenAI (via Azure or directly through their enterprise API), signed and current.
- A BAA with Google Cloud if you wanted Gemini in the mix.
- A way to route requests at runtime based on which provider is healthy right now — and the audit log to prove which provider handled which request.
- A documented incident‑response procedure for “provider X had an outage so PHI for request Y went to provider Z instead,” because the auditor is going to ask.
- A way to keep all of those agreements in sync as each provider updates their data processing terms (which they do, usually with 30 days’ notice).
Most regulated organizations don’t have a single BAA signed with one frontier AI provider yet. The number that have BAAs signed with all three, and have built the audit infrastructure to route between them safely, is vanishingly small. However, the move toward multi-model diversity is already visible in the broader market: 37% of enterprises now use five or more AI models in production to avoid the exact lock-in risks described above.[2] It’s not that it’s impossible — it’s that the compliance, legal, and engineering work to make it work is more than the AI workflows themselves are worth.
This is the gap that a compliance platform fills. When HASP signs one BAA with the customer, our agreement covers the entire provider rotation underneath. The customer doesn’t need to hold a BAA with Anthropic or OpenAI directly — they hold one with us, and our BAA names every inference sub‑processor we route to. Cross‑provider failover happens under the same compliance umbrella that covered the primary request, with the same audit chain, the same PHI handling, the same access controls.
The compliance point isn’t that fallback itself is a compliance feature. It isn’t — fallback is a resilience feature. The compliance point is that for most teams, fallback is only available if a compliance platform makes it available, because DIY multi‑BAA juggling is not a serious option.
Picking your default — and changing it — should be a setting, not a project
The other half of single‑provider lock‑in is harder to see day to day but matters more over the long run. The frontier of model quality moves. A year ago the answer for clinical summarization was clearly one model; six months from now it will clearly be another. If your AI stack is built directly against one provider’s API, swapping defaults means a vendor change — new contract, new security review, new integration testing, new compliance documentation.
If your AI stack is built against a compliance platform, swapping defaults is a dropdown. The HASP admin who decides “we want our team on the new GPT‑5 instead of Sonnet starting Monday” changes one setting in /settings/workspace/ai-controls and every surface — Assistant, Studio, Public API, Agent SDK — picks it up on the next call. The provider underneath changes; the BAA, the audit chain, the access controls, the PHI gateway, and the billing relationship don’t.
That dropdown is what model‑agnostic actually means in practice. It isn’t a marketing claim — it’s a property of where the gateway sits in the stack. The provider is a routing detail, not the product.
When a provider raises prices, you are not trapped
Lock‑in is not only a quality problem. It is a pricing problem. A provider that knows your whole stack is built against its API has every reason to let prices drift the wrong way at renewal — and a single‑provider team has no answer but to pay.
Routing every model through one platform changes that. Inference in HASP is billed in a single normalized credit unit, the same unit whether the request lands on Anthropic, OpenAI, or a hosted model. Because the unit doesn’t change with the provider, moving a workload to a better‑priced model is a policy decision, not a re‑architecture project. You change a setting; the billing math stays consistent.
It also means provider pricing improvements reach you. When a provider drops its rates — as seen in 2024 when OpenAI cut the price of GPT-4o by 50% compared to its predecessor — that improvement flows through the credit multiplier to every customer on that model — without anyone renegotiating a contract.[3] The leverage of being able to move work toward the cheaper option is exactly the leverage a single‑provider deployment gives away.
Run a cheap model and a strong model — at the same time
Picking a default is one decision. The other is realizing you do not need the same model for every job.
Providers lead at different tasks, and within a single provider the tiers trade cost against capability. A lightweight model is more than enough for high‑volume, low‑judgment work — classifying messages, extracting fields from a document, tagging records. Spending a frontier‑model rate on that work is waste. Hard reasoning — drafting a clinical justification, analyzing a contract — is where the high‑capability model earns its cost. The productivity impact of choosing correctly is substantial: workers using AI effectively for these tasks report saving between 40 and 60 minutes per day.[4]
A single‑provider, single‑model deployment forces one choice for everything: either overpay on the easy work or under‑resource the hard work. Routing through HASP lets each workload pick its own model — a cheap model for the bulk, a strong model for the reasoning — and all of it still lands on one bill, one audit chain, one BAA.
What “fallback” should look like to an admin
A few decisions worth making explicit, because we made them in HASP and they shape the product.
Fallback should be on by default. The default behavior for a new org is “if your default model is unavailable, route to another one.” That choice doesn’t require any configuration; resilience is the floor, not a feature you remember to turn on.
Admins shouldn’t have to rank models. Forcing every admin to manually order three or four fallback models is a decision burden disguised as a control. We pick a sensible default order — cross‑provider before same‑provider, since same‑provider models go down together — and let power users override it in an advanced panel they almost never need to open.
The audit trail captures every failover event. Every time HASP routes a request to a fallback model instead of the customer’s default, the audit log records both the attempted model and the resolved model, plus the reason (circuit breaker open, upstream status page red). When the compliance officer asks “did we send PHI to OpenAI when we usually send it to Anthropic, and if so why,” there’s a record to answer with.
HASP owns provider health, not the customer. HASP polls each provider’s public status page and runs its own per‑provider circuit breaker on real request outcomes. When a provider degrades, fallback kicks in automatically — the customer’s admin doesn’t have to be paged, doesn’t have to flip a switch, doesn’t have to be awake. Provider health is operational concern for HASP, not a thing every customer team learns to monitor.
The thing most teams underestimate
The framing that gets this wrong is “we’ll just pick a good provider and stick with it.” It’s the framing that worked for picking a CRM in 2015. The frontier of AI quality moves on a timescale that makes that framing obsolete; the rate of provider incidents makes “stick with one” a reliability bet on a market that hasn’t stabilized.
The framing that works is “we’ll pick a platform that handles the provider question for us.” The platform is what stays constant — the BAA, the audit chain, the integration. The provider underneath is what changes, and changes a lot. HASP is built for that — not because we have a strong opinion about which provider should win, but because we don’t think anyone does yet, and an architecture that bets on it is the wrong architecture for regulated AI.
If your team is currently single‑provider — Anthropic, OpenAI, or otherwise — the next outage will tell you whether your stack is built to absorb it or to be stuck with it. The deployments that survive that day without operational pain are the ones that already chose to consume AI through a layer that owns the provider question on their behalf.
Sources
IsDown. AI Status History (2025). Tracking over 290 incidents for OpenAI and 320 for Anthropic since early 2025. isdown.app
Andreessen Horowitz. How 100 Enterprise CIOs Are Building and Buying GenAI in 2025 (2025). Survey finding 37% of enterprises use 5+ models in production. a16z.com
OpenAI. “New models and developer settings,” May 2024. Announcement of GPT-4o pricing and capability updates. openai.com
OpenAI. The State of Enterprise AI 2025 (2024). Survey of 9,000 employees finding 40-60 minutes in average daily time savings. openai.com