The honest read on virtual try-on in beauty was, for most of the last five years, that it was a nice piece of customer experience that did not change much commercially. It was a feature on the product detail page (PDP) with a decent engagement rate, a marginal lift on conversion, no real effect on returns at category level, and an implementation cost that was hard to justify beyond brand-experience credit.
That read is no longer accurate.
Two things changed in the last 18 months. Perfect Corp and Revieve, the two dominant infrastructure providers, embedded themselves inside the AI shopping interfaces that Ulta and Sephora now route customers through. And the returns data on shade-driven makeup categories, where try-on is deployed well, started moving meaningfully. Not the marginal one or two percent improvement that justified marketing budget. Returns reductions in the 8 to 14 percent range on foundation, concealer and lip categories. That is structural margin.
The combination matters. The discovery surface and the conversion infrastructure are now the same thing. The customer in the AI shopping flow tries a shade on, the system surfaces the match, the customer buys with higher confidence, the return rate drops. The brands inside that loop are compounding. The brands outside that loop are losing share inside the moment that used to belong to swatches on a PDP.
For a makeup brand at £500k-£5m, the question is no longer whether virtual try-on works. It is whether the brand's product data is structured so the try-on layer can surface its products. Most brands are still answering that question with "we are on Perfect Corp" or "we have the Sephora app integration" and stopping there. That is not enough.
What "data structured for try-on" actually means
Three layers determine whether a makeup brand appears inside the new visual AI search flow.
The first layer is shade data with structured undertones and references. Not a PDF shade card. Not an image of swatches the customer has to interpret. A structured catalogue entry per shade with a named undertone (warm, neutral, cool), a named depth band, a hex value, a comparable shade from at least one reference brand the AI is trained on, and an explicit "best for skin tones in this range" descriptor. The AI uses all of those signals to match the customer's complexion scan against your shade range. If your shade data is unstructured, the system cannot include you in the match.
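As a rough illustration of what one structured entry per shade could look like, the Python sketch below models the fields named above and lists whichever ones are missing or malformed. The field names, the depth-band labels and the example shade are assumptions made for this sketch, not the schema of Perfect Corp, Revieve or any retailer.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Undertone labels taken from the article; everything else is illustrative.
UNDERTONES = {"warm", "neutral", "cool"}
HEX_PATTERN = re.compile(r"^#[0-9A-Fa-f]{6}$")


@dataclass
class ShadeRecord:
    """One catalogue entry per shade (hypothetical field names)."""
    sku: str
    name: str
    undertone: Optional[str] = None        # "warm", "neutral" or "cool"
    depth_band: Optional[str] = None       # e.g. "light", "medium", "deep"
    hex_value: Optional[str] = None        # e.g. "#C99A6B"
    reference_shade: Optional[str] = None  # comparable shade at a reference brand
    best_for: Optional[str] = None         # "best for skin tones in this range"


def missing_fields(shade: ShadeRecord) -> list[str]:
    """Return the names of fields that are absent or not in a structured form."""
    problems = []
    if shade.undertone not in UNDERTONES:
        problems.append("undertone")
    if not shade.depth_band:
        problems.append("depth_band")
    if not (shade.hex_value and HEX_PATTERN.match(shade.hex_value)):
        problems.append("hex_value")
    if not shade.reference_shade:
        problems.append("reference_shade")
    if not shade.best_for:
        problems.append("best_for")
    return problems


# Example: a shade that exists only as a name and a swatch colour.
incomplete = ShadeRecord(sku="F-240", name="Sand", hex_value="#C99A6B")
print(missing_fields(incomplete))
```

Running a check like this across a full complexion range gives a quick count of how many shades are matchable today and which fields are holding the rest back.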
The second layer is pigment formulation data the AI can render accurately. Try-on works by simulating how the product looks on the customer's actual skin. That simulation needs to know whether the product is sheer, medium or full coverage, whether it has a matte, satin or dewy finish, how it shifts colour against different undertones, and how it interacts with the other products customers commonly layer with it. Brands with this data structured cleanly get a more accurate rendering, which converts better, which earns them more surface time in the algorithm. Brands with marketing copy alone ("buttery soft finish") get a rougher rendering, which converts worse.
The third layer is use-case copy that maps to how customers describe their problem. The AI is asked questions like "what foundation works for combination skin with a hint of redness," "what blush gives a flushed look without orange undertones," "what lipstick works for cool undertones in autumn." Brands whose PDPs explicitly answer those use-case questions in body copy get pulled into the matches. Brands whose copy describes the product without naming the customer situation it solves get filtered out.
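One way to sanity-check the third layer is a crude keyword pass over PDP body copy. The sketch below is only a proxy: plain substring matching is not how any AI shopping agent ranks products, and the term lists are illustrative assumptions that a brand would replace with the situations its own customers describe.

```python
import re

# Hypothetical customer-situation terms, grouped by the kind of question asked.
USE_CASE_TERMS = {
    "skin type": ["combination skin", "oily skin", "dry skin"],
    "concern": ["redness", "dark circles", "uneven tone"],
    "undertone": ["cool undertone", "warm undertone", "neutral undertone"],
}


def use_case_coverage(pdp_copy: str) -> dict[str, list[str]]:
    """Return, per group, the use-case terms that appear in the PDP copy."""
    # Lowercase and collapse whitespace so line breaks do not hide a phrase.
    text = re.sub(r"\s+", " ", pdp_copy.lower())
    return {
        group: [term for term in terms if term in text]
        for group, terms in USE_CASE_TERMS.items()
    }


brand_poetry = "A buttery soft veil of luminous colour."
use_case_copy = (
    "Medium-coverage foundation for combination skin. "
    "Tones down redness and suits cool undertones."
)

print(use_case_coverage(brand_poetry))
print(use_case_coverage(use_case_copy))
```

Copy that returns empty lists in every group describes the product without naming the customer situation it solves, which is the pattern the paragraph above warns about.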
Why returns reduction is the metric that matters
The temptation when evaluating virtual try-on is to look at conversion lift first. That is the wrong metric to lead with.
Conversion lift on a try-on-enabled SKU is real but moderate. The commercial story is returns reduction. Foundation returns in particular are catastrophic for a makeup brand's contribution margin. Returned foundation is usually opened, often used, frequently not resellable, and the customer is typically frustrated enough by the wrong-shade experience that the brand also loses the second-purchase opportunity.
A meaningful returns reduction on foundation, concealer, and shade-matched lip is a structural margin improvement that funds the rest of the operating model. Brands deploying try-on well are reporting reductions in the 8 to 14 percent range on those categories. That delta is large enough to change the unit economics of the channel and to justify investment in the data work above.
The brands that are not yet seeing returns reduction at that level are usually deploying the try-on widget without restructuring the shade data behind it. The customer tries on a shade, the rendering is imprecise because the underlying data is incomplete, the match feels wrong, the customer buys the shade the influencer recommended instead, and the return shows up two weeks later anyway. The try-on layer needs the data behind it to deliver the returns improvement that justifies it.
The strategic mistake colour brands keep making
The single recurring mistake we see in colour brands evaluating virtual try-on is treating it as a customer experience initiative rather than a discoverability initiative.
The CX framing puts the project inside the brand or digital team, with a marketing budget and a brand-experience metric. The team picks the prettiest implementation, integrates it on the PDP, runs a launch campaign, reports the engagement rate, and moves on. The data work behind the try-on never gets done because it is not what the project was scoped to do.
The discoverability framing puts the project inside the commercial team, with operational ownership and a returns-and-conversion metric. The team starts with the data audit (shade structure, pigment formulation data, use-case copy), restructures the catalogue, integrates the try-on layer second, and treats the result as a permanent operating capability that needs maintenance the way SEO content does.
The brands compounding on virtual try-on in 2026 are the ones running the second project. The brands stuck reporting flat engagement numbers are running the first.
The practical move this quarter
For a colour brand evaluating its posture, the three-question audit is simple.
First, can you pull a structured shade-data export for your foundation or complexion range that includes undertone, depth, hex, comparable shade from at least one major reference brand, and best-for descriptor per shade? If the answer is no or "kind of," the data layer is the bottleneck regardless of which try-on provider you use.
Second, can a basic text crawl of your top complexion PDPs extract the use-case copy that maps to the questions a customer would actually ask an AI agent? If the copy reads as brand poetry rather than customer use-cases, the AI cannot rank you against the prompt.
Third, what is your current foundation returns rate, and is it being measured cleanly enough that an 8 to 14 percent reduction would show up in the contribution margin report? If you cannot see the metric clearly today, you will not be able to justify the investment in the data work that drives it.
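To see whether a reduction of that size would be visible in a margin report, a back-of-envelope calculation is usually enough. The sketch below assumes the 8 to 14 percent figure is a relative reduction in the number of returns (not a drop in percentage points), and every input in the example is a made-up placeholder rather than a benchmark.

```python
def returns_saving(
    units_sold: int,
    return_rate: float,
    relative_reduction: float,
    cost_per_return: float,
) -> float:
    """Estimate the cost avoided when returns fall by a relative percentage.

    return_rate and relative_reduction are fractions (0.20 means 20 percent).
    cost_per_return is the all-in loss per returned unit, in pounds.
    """
    if not 0 <= return_rate <= 1:
        raise ValueError("return_rate must be between 0 and 1")
    if not 0 <= relative_reduction <= 1:
        raise ValueError("relative_reduction must be between 0 and 1")
    baseline_returns = units_sold * return_rate
    avoided_returns = baseline_returns * relative_reduction
    return avoided_returns * cost_per_return


# Placeholder inputs: 10,000 foundation units, 20% returned, £12 lost per return.
low = returns_saving(10_000, 0.20, 0.08, 12.0)
high = returns_saving(10_000, 0.20, 0.14, 12.0)
print(f"Estimated saving: £{low:,.0f} to £{high:,.0f}")
```

With those placeholder inputs the estimate lands at roughly £1,920 to £3,360. If the real figure for your range would be lost in the noise of the contribution margin report, the measurement needs fixing before the data work can be justified.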
The brands that build this discipline now are the brands that surface in the next 24 months of makeup AI search. The brands that keep treating try-on as a PDP gimmick will keep wondering why their share inside the Ulta and Sephora apps is drifting toward competitors with weaker products but stronger data.