This is the extended analysis behind the Pressure Test piece “Same Technology. Different Incentive. Opposite Outcome.” If you haven’t read that one, it covers the structural comparison between Anterior and the CMS WISeR pilot. This piece goes deeper on the incentive mechanics, the regulatory landscape, and the evaluation framework that separates AI prior authorization that works from AI prior authorization that destroys trust.
The setup
Two numbers define the current state of AI prior authorization.
155 seconds. That’s how long Anterior’s platform takes to approve a cancer care authorization at Geisinger. 99.24% clinical accuracy, validated by KLAS. 50 million lives covered. Staff satisfaction above 90%.
15 to 20 days. That’s how long providers in Washington state are waiting for authorization decisions under the CMS WISeR (Wasteful and Inappropriate Services Reduction) Model, a pilot launched January 1, 2026, across six states. UW Medical System reports nearly 100 patients waiting for epidural steroid injections. Procedures that took two weeks now take four to eight weeks.
The technology category is the same. The outcomes are opposite. In the Pressure Test version, I identified three design variables that explain the gap: incentive structure, workflow ownership, and validation transparency. Here I want to go further into each one, map the full competitive landscape, and build the evaluation framework that should exist before any health system or investor touches this category again.
How the money actually works
This is the section that matters most and gets discussed least, because the incentive mechanics are buried in contract structures that don’t make it into press releases.
Anterior’s model: Anterior charges health plans for platform deployment and adoption. Their revenue is tied to the number of lives covered and the volume of authorizations processed. When a claim is approved in 155 seconds, that’s the system working as designed. Anterior’s compensation does not increase when a claim is denied. The economic incentive and the clinical incentive point in the same direction: process accurately, process fast, keep providers satisfied so they keep using the system.
The WISeR model: Under the WISeR pilot, CMS contracts with third-party administrators to review Medicare Part B services. The contractors are compensated based on what CMS calls a share of “averted expenditures.” In practice, this means the contractor receives a percentage of the dollar value of services that are denied or not performed after review. The more claims that do not proceed, the higher the contractor’s compensation.
This is not an obscure detail. It is the structural core of the program. When a contractor’s revenue model rewards denial, every downstream decision — the AI model’s tuning, the physician reviewer’s incentives, the portal’s workflow design, the transparency of the rationale — is shaped by that economic gravity.
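To make that gravity concrete, here is a toy model of the two compensation structures in Python. Every number in it (the per-life fee, the averted-expenditure share, the claim values) is an illustrative assumption, not a term from any actual contract.

```python
# Toy model of the two compensation structures described above.
# All parameters are illustrative assumptions, not actual contract terms.

def platform_revenue(lives_covered: int, per_life_fee: float) -> float:
    """Anterior-style model: revenue tracks deployment and adoption.
    Approving or denying an individual claim does not change revenue."""
    return lives_covered * per_life_fee

def averted_expenditure_revenue(denied_claim_dollars: float, share: float) -> float:
    """WISeR-style model: the contractor keeps a share of the dollar
    value of services denied or not performed after review."""
    return denied_claim_dollars * share

# Under the platform model, the marginal revenue of one more denial is $0.
# Under the averted-expenditure model, every additional denied dollar
# pays the contractor directly.
claims_reviewed = 10_000
avg_claim_value = 2_500.0  # assumed average Part B service cost
for denial_rate in (0.05, 0.10, 0.20):
    denied_dollars = claims_reviewed * avg_claim_value * denial_rate
    rev = averted_expenditure_revenue(denied_dollars, share=0.10)  # assumed 10% share
    print(f"denial rate {denial_rate:.0%}: contractor revenue ${rev:,.0f}")
# Doubling the denial rate doubles revenue, which is the economic
# gravity the surrounding text describes.
```

The specific numbers are beside the point. The shape of the function is the point: one model's revenue is flat in denials, the other's is linear in them.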
One detail makes this sharper. Under WISeR, the AI can only affirm a request. All non-affirmations are reviewed and decided by a board-certified physician. There is a human in the loop on every denial. And yet the Cantwell report documents 15-to-20-day delays, denials inconsistent with clinical criteria, and no clear rationale provided to providers. The human review layer did not prevent those outcomes. The incentive architecture was strong enough to produce them anyway.
This is worth sitting with. The argument for human-in-the-loop review in AI-assisted clinical decisions is that the human provides a check on the AI’s errors. But if the human reviewer is operating inside a system where the organization’s revenue increases with denials, the human-in-the-loop argument collapses. The physician is not checking the AI. The physician and the AI are both responding to the same incentive structure. The check that actually binds is structural, not procedural, and the structure rewards denial.
Cohere Health’s model: For comparison, Cohere Health ($200M raised, $90M Series C led by Temasek in May 2025) processes over 12 million prior authorization requests annually and auto-approves up to 90% of them. Their positioning statement is explicit: the technology is designed to accelerate approvals, not to deny care, and denial decisions always remain with a human clinician. Geisinger Health Plan reported a 63% reduction in PA denials and a 15% reduction in total medical expenses after deploying Cohere. That’s a model where reducing denials is the value proposition, not reducing approvals.
EviCore and Carelon (legacy UM): Traditional utilization management firms have operated on shared savings and per-review fee models for decades. The shared savings variant has the same structural problem as WISeR: the vendor’s compensation is tied to the volume of “avoided” spending. The per-review fee model is neutral on outcomes but creates an incentive to maximize review volume. Neither model is new. What is new is applying AI to accelerate the throughput of a system whose incentive structure already rewards denial. AI doesn’t change the incentive. It scales it.
The pattern: The question for any AI prior authorization program is not “does the AI work?” It’s “what does the money reward?” If the money rewards approval speed and clinical accuracy, the AI gets tuned for approval speed and clinical accuracy. If the money rewards averted expenditures, the AI gets tuned to maximize the volume of cases routed to denial. The technology is agnostic. The incentive is not.
The regulatory landscape nobody is reading together
There are at least five regulatory threads touching AI prior authorization right now, and I have not seen anyone read them as a single picture. They should be.
Thread one: CMS-0057-F, the Interoperability and Prior Authorization Final Rule. Released January 17, 2024, with key compliance dates beginning January 1, 2026 (some delayed to January 2027). This rule requires impacted payers (Medicare Advantage organizations, Medicaid managed care plans, CHIP entities, and QHP issuers on the federal exchanges) to implement FHIR-based Prior Authorization APIs, publicly report prior authorization metrics including approval rates and turnaround times, and respond to standard prior auth requests within seven calendar days and urgent requests within 72 hours.
This is the transparency infrastructure that WISeR conspicuously lacks. CMS-0057-F was designed to make prior authorization decisions visible, auditable, and comparable. WISeR operates outside this framework because it applies to traditional Medicare Part B, not to the payer categories covered by CMS-0057-F. The result is that the newest AI prior authorization program in CMS’s portfolio has less transparency than the regulatory standard CMS itself finalized 18 months earlier.
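For a sense of what that transparency infrastructure looks like in practice, here is a minimal sketch of a FHIR R4 prior authorization submission of the kind CMS-0057-F’s Prior Authorization API contemplates. It follows the Da Vinci PAS pattern (a Claim resource with use set to "preauthorization", submitted via the $submit operation), but the endpoint, identifiers, and codes below are placeholders, and the real implementation guide wraps the Claim in a request Bundle with supporting resources that this sketch simplifies away.

```python
import json
import urllib.request

BASE_URL = "https://payer.example.com/fhir"  # hypothetical FHIR endpoint

# A FHIR R4 Claim with use="preauthorization" is what distinguishes a
# prior auth request from a claim for payment. All IDs are placeholders.
claim = {
    "resourceType": "Claim",
    "status": "active",
    "type": {"coding": [{
        "system": "http://terminology.hl7.org/CodeSystem/claim-type",
        "code": "professional"}]},
    "use": "preauthorization",
    "patient": {"reference": "Patient/example"},
    "created": "2026-04-01",
    "provider": {"reference": "Organization/example-clinic"},
    "priority": {"coding": [{"code": "normal"}]},
    "insurance": [{"sequence": 1, "focal": True,
                   "coverage": {"reference": "Coverage/example"}}],
    "item": [{"sequence": 1,
              "productOrService": {"coding": [{
                  "system": "http://www.ama-assn.org/go/cpt",
                  "code": "62323"}]}}],  # epidural steroid injection, illustrative
}

# Da Vinci PAS defines a Claim/$submit operation; the full guide wraps
# the Claim in a request Bundle, which is simplified away here.
req = urllib.request.Request(
    f"{BASE_URL}/Claim/$submit",
    data=json.dumps(claim).encode(),
    headers={"Content-Type": "application/fhir+json"},
    method="POST",
)
# urllib.request.urlopen(req) would return a ClaimResponse carrying the
# decision and machine-readable reason codes: the rationale transparency
# that, per the Cantwell report, WISeR's process does not surface.
```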
Thread two: The Texas Gold Card law (HB 3459, 2021, amended by HB 3812, 2025). Texas created a prior authorization exemption for physicians who achieve a 90% or higher approval rate on PA requests for a given service over a 12-month evaluation period. The physician earns a “gold card” — an exemption from prior authorization for that service. The law applies to state-regulated health plans (roughly 20% of the Texas market).
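Mechanically, the eligibility rule reduces to a per-service threshold check over a rolling window. The 90% threshold and 12-month window come from the statute as described above; the minimum-request floor in this sketch is my assumption, added so that low-volume services don’t qualify on tiny denominators.

```python
# Sketch of a Texas-style gold card eligibility check: a physician is
# exempted from PA for a service if their approval rate on that service
# is >= 90% over the evaluation period. min_requests is an assumed
# guard, not statutory text.
from collections import defaultdict

def gold_card_eligible(requests: list[dict], threshold: float = 0.90,
                       min_requests: int = 5) -> dict[str, bool]:
    """requests: [{'service': str, 'approved': bool}, ...] for one
    physician over the evaluation period. Returns per-service eligibility."""
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # [approved, total]
    for r in requests:
        totals[r["service"]][1] += 1
        if r["approved"]:
            totals[r["service"]][0] += 1
    return {svc: total >= min_requests and approved / total >= threshold
            for svc, (approved, total) in totals.items()}

# 19 approvals, 1 denial on the same service code: 95% approval rate
history = [{"service": "62323", "approved": True}] * 19 + \
          [{"service": "62323", "approved": False}]
print(gold_card_eligible(history))  # {'62323': True}
```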
Texas is one of the six WISeR states. That means Texas physicians who have earned gold card exemptions from state-regulated plans are simultaneously subject to new CMS prior authorization requirements for the same procedures under traditional Medicare. The state moved to reduce prior authorization burden. The federal pilot moved to add it. Nobody seems to be tracking this conflict publicly. A health system running an innovation function in Texas needs to understand that its prior auth environment just became more complicated, not less, and that the two policy directions are structurally opposed.
Thread three: The Improving Seniors’ Timely Access to Care Act. This bipartisan legislation has been reintroduced multiple times and would require Medicare Advantage plans to streamline prior authorization, establish electronic prior authorization processes, and make approval criteria publicly available. It has broad support from provider organizations. It has not passed. But its requirements — transparency, electronic processing, public criteria — describe the exact capabilities that are absent from WISeR.
Thread four: State-level AI regulation in healthcare. At least a dozen states have introduced legislation governing AI use in healthcare decisions, including prior authorization. Colorado’s SB 169 (2024) requires disclosure when AI is used in insurance decisions. California, New York, and Illinois have active legislative proposals. The regulatory environment for AI prior auth is fragmenting at the state level at the same time CMS is rolling out a federal pilot with minimal transparency provisions.
Thread five: WISeR itself, and its political status. The pilot launched January 1, 2026, in six states: Arizona, New Jersey, Ohio, Oklahoma, Texas, and Washington. Senator Cantwell’s April 2026 snapshot report, based on Washington State Hospital Association survey data, documents care delays, administrative burden increases, and denials without clear clinical rationale. The pilot’s future is politically contested. Legislative opposition is building. CMS faces pressure from both supporters who see the pilot as necessary cost containment and critics who see it as an AI-driven barrier to care.
Reading these together: CMS is simultaneously (a) requiring payers to make prior authorization more transparent, faster, and electronically accessible under CMS-0057-F, and (b) running a pilot under WISeR that has no public transparency mechanism, produces 15-to-20-day delays, and compensates contractors based on denial volume. Whether this contradiction is intentional or a byproduct of different CMS divisions operating on different timelines, it creates a confusing environment for health systems trying to build a coherent AI prior authorization strategy. The systems that track all five threads and design their evaluation frameworks accordingly will be better positioned than the ones that track any single thread in isolation.
One caveat on the evidence base: the most detailed public data on WISeR outcomes comes from Washington state, via the Cantwell report and the Washington State Hospital Association survey. The other five states may show different patterns — different contractors, different procedure mixes, different provider responses. The structural argument (incentive design shapes outcomes regardless of the technology) holds across states because the compensation model is the same. But the specific outcome data — the 15-to-20-day delays, the 100 patients waiting for epidural injections — is documented in one state so far. As more data surfaces from Arizona, New Jersey, Ohio, Oklahoma, and Texas, the picture will either confirm or complicate what Washington is showing.
The Olive AI parallel
Olive AI raised $902 million and reached a $4 billion valuation selling prior authorization automation to health plans. The product was marketed as AI-powered. In practice, it relied heavily on RPA screen-scraping bots and required more human intervention than the pitch acknowledged. Revenue projections were later exposed as fabricated. The company wound down in October 2023 and sold its prior authorization assets to Humata Health.
Three parallels to WISeR are worth noting.
First, the technology description gap. Olive marketed RPA as AI. CMS describes WISeR as a program that uses “AI,” but the AI can only affirm — every denial goes through a human physician. In both cases, the label “AI” is doing more work than the technology. The marketed description overstates the role of automation in the actual decision process.
Second, the validation gap. Olive’s accuracy and cost projections were not independently verified until KLAS and Axios investigated. WISeR has no published accuracy metric for its clinical determinations, no third-party validation, and no transparency mechanism for providers to understand the basis for a given decision. In both cases, the claims went unchallenged longer than they should have because the evaluation processes in place did not include independent verification.
Third, the incentive structure. Olive’s shared savings model with health plan clients created pressure to demonstrate cost reduction. WISeR’s averted expenditure model creates pressure to maximize denial volume. Different mechanisms, same structural problem: the vendor’s financial incentive diverges from the patient’s clinical interest.
I advised a company building voice-based prior authorization earlier this year. The technology worked. The design questions that determined whether it would survive enterprise procurement were the same ones visible in both the Olive failure and the WISeR pilot: how does the money flow, who validates the claims, and what happens when the system gets it wrong?
The comparative framework
The prior authorization AI category is not a single market. It is at least five distinct deployment models, each with different incentive structures, transparency mechanisms, and failure modes.
Model 1: Payer-side approval acceleration (Anterior, Cohere Health). The AI is deployed by the health plan to speed up approvals, reduce administrative burden, and improve provider satisfaction. Revenue is tied to deployment and adoption, not to denial volume. Clinical accuracy is validated by third parties (KLAS). Transparency is a competitive advantage because providers need to trust the system to use it.
Model 2: Government-contracted cost containment (WISeR / Virtix Health). The AI is deployed by a CMS contractor to reduce Medicare spending. Revenue is tied to averted expenditures. Transparency is not a competitive requirement because the contractor’s buyer is CMS, not the provider. The provider is the subject of the system, not the customer.
Model 3: Legacy utilization management (EviCore, Carelon). Human reviewers using rules-based systems and clinical guidelines. Shared savings or per-review fee models. AI is being added incrementally, but the core process remains human-driven. The incentive structure varies by contract but shared savings models have the same structural problem as WISeR at lower throughput.
Model 4: Provider-side exemption (Texas Gold Card, Humana Gold Card program). The approach exempts high-performing providers from PA requirements entirely, based on historical approval rates. No AI is involved in the exemption decision — it’s a performance-based waiver. The incentive structure rewards clinical accuracy over time. The limitation is that it only works for providers with sufficient volume and history, and it only covers state-regulated or MA plans, not traditional Medicare.
Model 5: Platform integration (Epic, Oracle Health). EHR vendors building prior authorization workflow tools directly into the clinical system. The AI component varies. The value proposition is workflow integration, not clinical decision-making. The incentive structure is neutral on outcomes because the EHR vendor is paid for the platform, not for the authorization result.
Each model has a different answer to the three design variables I identified in the Pressure Test piece. The evaluation mistake most health systems make is treating “AI prior authorization” as a single category and evaluating all entrants against the same criteria. The five models above require five different evaluation frameworks.
Where to start depends on where you sit. If your institution is in a WISeR state, the immediate move is twofold: begin Model 4 evaluation (can your high-approval-rate physicians qualify for state-level gold card exemptions that reduce your PA burden on the commercial side?) while simultaneously running Model 1 vendor outreach (who can offer a payer-side alternative that your MA and managed care partners would adopt?). If you are not in a WISeR state, your first move is the criteria definition exercise from the CINO section below, because it applies regardless of which model you eventually evaluate, and having it documented before CMS expands the pilot gives you months of advantage.
The budget math matters here. I wrote in an earlier Pressure Test piece about the Sage Growth data showing that 51% of health system C-suite leaders now require 110% or better ROI within 18 months. That compressed payback window applies to AI prior auth evaluation too. Model 1 vendors (Anterior, Cohere) can demonstrate ROI in that window because reduced PA turnaround time, lower denial rates, and decreased administrative staffing needs all translate to dollar values a CFO recognizes. Model 2 (government-contracted, WISeR-style) is imposed, not purchased, so the ROI question is moot — the health system bears the cost without choosing the vendor. Model 4 (Gold Card exemption) has the fastest payback because the cost is near zero and the savings are immediate, but coverage is limited to state-regulated plans. Model 5 (EHR platform integration) may already be in your existing contract and budgeted, making the incremental cost conversation simpler. Mapping the five models against your CFO’s 18-month threshold before the first vendor call is the difference between running an evaluation and running a budget exercise that happens to involve vendors.
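A back-of-envelope version of that CFO math, with every dollar figure a placeholder rather than vendor pricing, looks like this:

```python
# Does a Model 1 deployment clear a 110% ROI bar inside 18 months?
# All dollar figures below are illustrative assumptions chosen to show
# the shape of the calculation, not actual vendor or staffing costs.

def roi_pct(total_benefit: float, total_cost: float) -> float:
    return 100.0 * total_benefit / total_cost

months = 18
platform_cost = 40_000.0 * months             # assumed monthly platform fee
admin_fte_savings = 6 * 7_000.0 * months      # assumed 6 FTEs redeployed
denial_rework_savings = 350 * 118.0 * months  # assumed appeals avoided per
                                              # month x rework cost per appeal
total_benefit = admin_fte_savings + denial_rework_savings

print(f"18-month cost:    ${platform_cost:,.0f}")
print(f"18-month benefit: ${total_benefit:,.0f}")
print(f"ROI: {roi_pct(total_benefit, platform_cost):.0f}% (threshold: 110%)")
```

The useful part of the exercise is not the output; it is forcing each benefit line to carry an assumption the CFO can challenge before the first vendor call.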
For founders building in this category: the WISeR backlash is creating a positioning window, but it won’t last. The health systems evaluating AI prior auth vendors right now are doing so with WISeR as a negative reference point. The founders who walk into the next enterprise conversation with a pre-built brief — here is our compensation model, here is why it does not reward denial, here is our KLAS-validated accuracy, here is how we compare to each of the five models above — have an advantage this quarter. If you are building in Model 1 and can demonstrate the anti-WISeR case with data, your positioning is stronger right now than it will be in six months when the backlash normalizes and the category comparison becomes routine.
Six diligence questions
These apply whether you are a health system evaluating a vendor, an investor evaluating a company, or a policy team evaluating a program.
1. How is the vendor or contractor compensated, and does the compensation model create an incentive to deny, delay, or affirm?
Map the money flow. If the vendor’s revenue increases when claims are denied or “averted,” the system will produce more denials over time regardless of the technology’s capability. This is not a prediction. It is a description of how incentive structures work.
2. What is the AI’s actual decision authority — can it deny, or only affirm — and what is the human review process for non-affirmations?
The answer “a human reviews every denial” is not sufficient. Ask who the human works for, what their compensation structure rewards, how many cases they review per hour, and what happens when they disagree with the AI’s routing. A human-in-the-loop who is reviewing 40 cases per hour inside an organization compensated for averted expenditures is not providing meaningful clinical oversight.
3. What transparency mechanism exists for providers to understand the rationale behind a given decision?
If the answer is “the provider can call a phone number” or “the provider can log into a portal,” ask what information is available through those channels. A portal that shows the decision but not the rationale is not a transparency mechanism. It is a notification system.
4. What is the average time from submission to final determination, broken out by procedure type, and how does that compare to the pre-program baseline?
Averages conceal distribution. Ask for the median and the 90th percentile. A program that resolves 80% of cases in 24 hours and takes 30 days on the remaining 20% will report a favorable average that conceals a serious access problem.
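The 80/20 example works out like this (a minimal sketch; the distribution is stylized, not WISeR data):

```python
# 80% of cases resolved in 24 hours, 20% stuck for 30 days. The mean
# looks acceptable; the 90th percentile tells the real story.
import statistics

turnaround_hours = [24.0] * 80 + [720.0] * 20  # 720 hours = 30 days

mean = statistics.mean(turnaround_hours)
median = statistics.median(turnaround_hours)
p90 = statistics.quantiles(turnaround_hours, n=10)[-1]  # 90th percentile

print(f"mean:   {mean:.1f} hours (~{mean / 24:.1f} days)")  # ~6.8 days
print(f"median: {median:.1f} hours")                        # 24 hours
print(f"p90:    {p90:.1f} hours (~{p90 / 24:.0f} days)")    # 30 days
# A program could report the ~6.8-day mean as meeting a 7-day standard
# while one patient in five waits a month.
```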
5. What validation data exists on the accuracy of the program’s clinical determinations, and who conducted the validation?
Self-reported accuracy metrics from the vendor are starting points, not evidence. Ask whether KLAS, an academic institution, or an independent auditor has validated the clinical accuracy claims. If no independent validation exists, ask why and what the timeline is. Anterior and Cohere both submit to KLAS validation. If a vendor in this category does not, that tells you something.
6. What is the appeal process, what is the appeal success rate, and what is the average time to resolution on appeal?
A high appeal success rate is not good news. It means the initial decision process is producing incorrect denials that are only caught when providers invest the time and cost to appeal. The appeal success rate is a measure of the system’s error rate, not its quality.
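The arithmetic here is worth making explicit. The rates in this sketch are illustrative, not drawn from any published program data:

```python
# A high appeal success rate is a lower bound on the initial error rate,
# because only appealed denials can ever be overturned.

def implied_error_floor(denial_rate: float, appeal_rate: float,
                        appeal_success_rate: float) -> float:
    """Fraction of ALL requests that were wrongly denied AND caught on
    appeal. Unappealed wrong denials are invisible, so the true error
    rate is at least this high."""
    return denial_rate * appeal_rate * appeal_success_rate

# e.g. 15% of requests denied, 40% of denials appealed, 75% of appeals won:
floor = implied_error_floor(0.15, 0.40, 0.75)
print(f"at least {floor:.1%} of all requests were wrongly denied")  # 4.5%
# If the unappealed 60% of denials had similar merit, the true
# wrongful-denial share of all requests approaches 0.15 * 0.75 = 11.25%.
```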
What this looks like from the CINO seat
If I were running the innovation function at a health system right now, AI prior authorization would be on my board agenda this quarter for three reasons.
First, it’s coming whether you choose it or not. Between CMS-0057-F’s payer requirements, the WISeR pilot, MA plan prior auth automation, and EHR platform features, AI will be involved in your prior authorization workflow within 18 months. The question is whether you define the terms or respond to someone else’s terms.
Second, the WISeR experience is contaminating trust in AI broadly. When clinical staff see AI-assisted prior auth delay pain management for seven weeks, that skepticism doesn’t stay in the prior auth category. It bleeds into every AI evaluation conversation. Your board will ask whether the AI vendor you brought in last year could produce a similar outcome. Having a clear answer — and a clear framework for why your vendors are structurally different from WISeR — is a defensive necessity.
Third, there is a positioning opportunity. The health systems that define their own evaluation criteria, run a competitive process for AI prior auth vendors using the six diligence questions above, and publish their results will set the standard other institutions adopt. That’s a thought leadership position that matters when you’re competing for talent, grant funding, and industry partnerships.
The innovation team that treats this as a compliance exercise will build a committee. The innovation team that treats it as a strategic priority will build an evaluation framework that applies across the entire AI vendor portfolio, not just prior authorization, and will use the framework to earn budget authority from the CFO. Those are different outcomes from the same trigger.
The question underneath
The CMS WISeR pilot, Anterior’s deployment, Cohere’s platform, the Gold Card laws, CMS-0057-F — these are all different answers to the same question: who should decide whether a patient receives a medical service, and what should that decision be based on?
The traditional answer was “a payer, based on medical necessity criteria reviewed by a physician.” The AI-era version of that answer introduces two new variables: the speed of the decision and the incentive structure of the decision-maker.
When the incentive structure rewards denial and the AI scales throughput, you get WISeR. When the incentive structure rewards accuracy and the AI speeds approval, you get Anterior. The technology is not the variable. The design is.
The founders and health system leaders who understand that will shape how this category develops. The ones who treat “AI prior authorization” as a single thing will be shaped by it.
If you’re building in AI prior authorization, evaluating vendors in this category, or running an innovation function at a health system in one of the six WISeR states, I’d be interested in hearing what you’re seeing on the ground. What does your evaluation process actually look like? And for investors: how are you assessing incentive structure risk in companies building in this category? Reply to this email or reach out on LinkedIn.
Sources: Cantwell Senate snapshot report (April 20, 2026), CMS WISeR Model documentation, Virtix Health FAQ, Anterior press releases, Fierce Healthcare, MedCity News, AlleyWatch, CMS-0057-F Final Rule (January 17, 2024), Texas Insurance Code Chapter 4201 Subchapter N (HB 3459, HB 3812), Cohere Health press releases and case studies, KLAS Research, Washington State Hospital Association survey data.
PS: I do product and technical diligence on healthcare AI companies for PE/VC firms, including companies in the prior authorization category. If you’re evaluating a target in this space and want an independent assessment of their incentive structure, validation evidence, and competitive positioning, reach out on LinkedIn or reply here.