Generative AI for product copy: what's working in 2026, and what's still hand-rewritten

A practical look at where retailers have replaced human copywriters with LLMs, where they've quietly reverted, and what the unit-economics actually look like.

Two years after every retailer with a PIM and an OpenAI key announced they were "AI-first on content," the dust has settled enough to see what actually stuck. The short version: LLMs own the long tail. Hero SKUs are still being touched by humans, and a growing minority of merchandisers we spoke to are pulling generated copy back into review queues that they had quietly disbanded in 2024.

This is not the failure story the skeptics wanted, and it's not the clean win the platform vendors keep selling. It's a more boring middle: AI copy works where the cost of being wrong is low, and falls apart where it isn't.

Where it's working: the unglamorous middle of the catalog

The clearest wins are in catalogs above roughly 50,000 SKUs where the long tail had been running on supplier-provided copy or - worse - nothing. A category lead at a mid-market home goods retailer put it bluntly: "We had 180,000 SKUs and four copywriters. Generated copy on the bottom 80% of the assortment isn't competing with human writing. It's competing with empty fields."

The unit economics there are not subtle. Operators we spoke to are landing somewhere between $0.01 and $0.04 per SKU for a full pass (title, bullets, description, meta) when they batch through a frontier model with prompt caching turned on. A human copywriter at $35/hour writing 8-12 SKUs an hour comes out 100-300x more expensive. Once you accept that the alternative is "no copy," the ROI conversation ends.
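The first-pass comparison can be reproduced from the figures in this paragraph alone: a $35/hour writer at 8-12 SKUs an hour, against $0.01-$0.04 per generated SKU. This is a minimal sketch, and the helper name is ours for illustration.

```python
# First-pass per-SKU cost comparison using only the figures quoted in the article.
# The helper name is illustrative, not a standard tool.

HOURLY_RATE = 35.0               # human copywriter, $/hour
SKUS_PER_HOUR = (8, 12)          # human throughput range
LLM_COST_PER_SKU = (0.01, 0.04)  # full pass with prompt caching, $/SKU


def human_cost_per_sku(rate: float, skus_per_hour: float) -> float:
    """Dollars of writer time spent per SKU."""
    return rate / skus_per_hour


# The slowest writer is the most expensive per SKU; the fastest is the cheapest.
human_high = human_cost_per_sku(HOURLY_RATE, SKUS_PER_HOUR[0])  # 4.375
human_low = human_cost_per_sku(HOURLY_RATE, SKUS_PER_HOUR[1])   # about 2.917

# Widest spread of the ratio: cheapest human vs. priciest generated pass,
# and priciest human vs. cheapest generated pass.
ratio_min = human_low / LLM_COST_PER_SKU[1]
ratio_max = human_high / LLM_COST_PER_SKU[0]

print(f"human: ${human_low:.2f}-${human_high:.2f} per SKU")
print(f"human-to-generated cost ratio: {ratio_min:.0f}x-{ratio_max:.0f}x")
```

Pairing the extremes of both ranges gives roughly 73x to 438x. The 100-300x figure is therefore a fair midrange rather than a hard bound.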

Three patterns have emerged as table stakes:

Attribute-grounded prompting: feeding the model the PIM record, not just the product name, cuts hallucination rates dramatically. Retailers who skipped this step in 2024 are the ones now running cleanup projects.
Category-specific prompt libraries: a generic "write product copy" prompt produces generic product copy. Operators report measurable conversion lift only after building per-category prompt templates with tone, length, and feature-priority rules baked in.
Human-in-the-loop only at thresholds: review queues triggered by margin, expected traffic, or brand sensitivity - not by SKU count. One merchant told us they auto-publish below a $40 AOV and route everything above into review. That single rule cut their copy backlog by 70%.
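Here is a minimal sketch of how the three patterns can fit together, assuming a PIM record arrives as a flat dict of attributes. Everything named here is hypothetical: the `CATEGORY_TEMPLATES` and `FEATURE_PRIORITY` tables, the `Sku` fields, and the margin and traffic thresholds. The only figure taken from the article is the $40 cutoff. Following the merchant's phrasing, that cutoff is applied to an `aov` field on each SKU. No model is called, because the prompt string and the routing decision are the parts this section describes.

```python
from dataclasses import dataclass

# Hypothetical per-category template: tone, length, and feature-priority rules
# live in the template instead of a generic "write product copy" prompt.
CATEGORY_TEMPLATES = {
    "cookware": (
        "Write product copy for a {category} item in a warm, practical tone.\n"
        "Lead with: {priority}.\n"
        "Title under 80 characters, 4 bullets, description under 120 words.\n"
        "Use ONLY the attributes below; do not invent specifications.\n"
        "{attributes}"
    ),
}

# Hypothetical feature-priority rules, most important first.
FEATURE_PRIORITY = {
    "cookware": ["material", "induction_compatible", "dishwasher_safe"],
}


@dataclass
class Sku:
    sku_id: str
    category: str
    attributes: dict                # the full PIM record, not just the name
    aov: float                      # value the $40 auto-publish cutoff applies to
    margin_pct: float = 0.0         # hypothetical field
    expected_weekly_views: int = 0  # hypothetical field
    brand_sensitive: bool = False   # hypothetical field


def build_prompt(sku: Sku) -> str:
    """Attribute-grounded, category-specific prompt for one SKU."""
    template = CATEGORY_TEMPLATES[sku.category]
    # Keep only priority features this SKU actually has, in priority order.
    priority = ", ".join(
        name for name in FEATURE_PRIORITY.get(sku.category, [])
        if name in sku.attributes
    ) or "the most distinctive attribute"
    attributes = "\n".join(
        f"- {name}: {value}" for name, value in sorted(sku.attributes.items())
    )
    return template.format(
        category=sku.category, priority=priority, attributes=attributes
    )


def needs_review(sku: Sku, aov_cutoff: float = 40.0,
                 margin_cutoff: float = 0.5,
                 traffic_cutoff: int = 10_000) -> bool:
    """Route on value, margin, traffic, or brand sensitivity, never on SKU count.

    Only the $40 cutoff comes from the article; the margin and traffic
    thresholds are placeholders.
    """
    return (
        sku.aov >= aov_cutoff
        or sku.margin_pct >= margin_cutoff
        or sku.expected_weekly_views >= traffic_cutoff
        or sku.brand_sensitive
    )


pan = Sku("SKU-123", "cookware",
          {"material": "carbon steel", "diameter_in": 10,
           "induction_compatible": True},
          aov=29.0)
print(build_prompt(pan))
print(needs_review(pan))  # False: under $40 and no other trigger fires
```

The review decision is a pure function of per-SKU fields. That keeps the thresholds easy to audit and tune without touching the prompts.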

Where retailers are reverting

The reversions are concentrated in three places, and they're consistent across the operators we interviewed.

The first is the PDP hero copy on top-100 SKUs. The math, according to one head of e-commerce we interviewed at a specialty apparel chain, is straightforward: "Our top 100 SKUs do something like 35% of revenue. A human writer costs us maybe 12 hours of work a month to keep those tight. The downside of letting the model drift on those is not worth the saved labor."

The second is regulated and claim-heavy categories - supplements, cosmetics with efficacy language, baby and infant goods, anything with FDA or equivalent oversight. Several retailers have rebuilt manual workflows here after compliance reviews flagged generated language that crossed into claim territory. The model wasn't lying; it was paraphrasing supplier copy in ways that lost the carefully lawyered hedging.

The third, and least expected, is brand voice on owned-brand and private-label lines. Multiple operators told us they initially generated owned-brand copy alongside everything else, then walked it back after merchandising leadership noticed the house brand had started sounding like every other brand on the site. As one VP of merchandising told us: "The whole point of the private label is that it doesn't sound like the rest of the catalog. We were undoing our own differentiation."
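One way to make these three carve-outs explicit is a policy check that runs before generation, so the affected SKUs never enter the automated pipeline. This is a sketch under stated assumptions. The category names in `REGULATED_CATEGORIES`, the `is_owned_brand` flag, and the use of a revenue ranking to pick the top 100 are hypothetical.

```python
# Hypothetical set of regulated / claim-heavy categories.
REGULATED_CATEGORIES = {"supplements", "cosmetics_efficacy", "baby_infant"}


def top_n_by_revenue(revenue_by_sku: dict[str, float], n: int = 100) -> set[str]:
    """IDs of the n highest-revenue SKUs (ties broken by ID for stability)."""
    ranked = sorted(revenue_by_sku.items(), key=lambda kv: (-kv[1], kv[0]))
    return {sku_id for sku_id, _ in ranked[:n]}


def copy_owner(sku_id: str, category: str, is_owned_brand: bool,
               hero_skus: set[str]) -> str:
    """Return 'human' for the three carve-outs, otherwise 'generated'."""
    if sku_id in hero_skus:
        return "human"   # PDP hero copy on top-revenue SKUs
    if category in REGULATED_CATEGORIES:
        return "human"   # claim-heavy copy keeps its lawyered hedging
    if is_owned_brand:
        return "human"   # private label keeps its own voice
    return "generated"


revenue = {"A": 900.0, "B": 50.0, "C": 700.0}
hero = top_n_by_revenue(revenue, n=2)                 # {"A", "C"}
print(copy_owner("B", "supplements", False, hero))    # human
print(copy_owner("B", "cookware", False, hero))       # generated
```

Because the hero set is recomputed from revenue, SKUs move in and out of the human-written tier as the top 100 changes, rather than being fixed once.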

What the unit economics actually look like

The headline number - one to four cents per SKU - is real but incomplete. Operators running this at scale report total cost of ownership closer to $0.08 to $0.15 per SKU once you load in:

Prompt engineering and template maintenance (treat it like a product, not a project)
Eval infrastructure - sampling generated copy for quality drift, especially after model upgrades
Translation and localization layered on top, which adds roughly the original per-SKU cost again for every additional locale
Review labor for the 10-20% of SKUs that route to human eyes
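For the eval line item, one common approach is a fixed-size random sample from every category, drawn after each model upgrade and sent to reviewers. This is our assumption, not a method the operators described in detail. Below is a minimal sketch; `sample_for_drift_review` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def sample_for_drift_review(skus, per_category=25, seed=None):
    """Stratified sample: up to `per_category` SKU IDs from each category.

    `skus` is an iterable of (sku_id, category) pairs. Sampling each category
    separately, rather than the catalog as a whole, keeps small categories
    from being skipped entirely after a model upgrade.
    """
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    by_category = defaultdict(list)
    for sku_id, category in skus:
        by_category[category].append(sku_id)

    sample = {}
    for category, ids in sorted(by_category.items()):
        sample[category] = rng.sample(ids, min(per_category, len(ids)))
    return sample


catalog = ([(f"HOME-{i}", "home") for i in range(1000)]
           + [(f"BABY-{i}", "baby") for i in range(10)])
picked = sample_for_drift_review(catalog, per_category=25, seed=2026)
print({cat: len(ids) for cat, ids in picked.items()})  # {'baby': 10, 'home': 25}
```

None of this shows up in the per-call price. That is part of why the loaded figure lands at $0.08 to $0.15 per SKU rather than a few cents.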

That's still more than an order of magnitude cheaper than human-only, but it's not the "free copy" pitch that platform vendors led with in 2024.
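The gap can be checked by running the human-cost arithmetic against the loaded figures: $0.08-$0.15 per SKU versus $35/hour at 8-12 SKUs an hour. Only figures quoted in the article are used, and the variable names are illustrative.

```python
HOURLY_RATE = 35.0           # human copywriter, $/hour
SKUS_PER_HOUR = (8, 12)      # human throughput range
TCO_PER_SKU = (0.08, 0.15)   # loaded cost reported at scale, $/SKU

human_low = HOURLY_RATE / SKUS_PER_HOUR[1]   # fastest writer, about $2.92/SKU
human_high = HOURLY_RATE / SKUS_PER_HOUR[0]  # slowest writer, $4.375/SKU

# Narrowest and widest gaps between human-only and loaded generated cost.
gap_min = human_low / TCO_PER_SKU[1]
gap_max = human_high / TCO_PER_SKU[0]

print(f"human-only costs {gap_min:.0f}x to {gap_max:.0f}x the loaded AI cost")
```

That works out to roughly 19x to 55x. The order-of-magnitude claim holds, but the gap is far narrower than the first-pass comparison suggests.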

The honest read

The retailers who are happiest with generative copy in 2026 are the ones who stopped framing it as "replacing copywriters" and started framing it as "filling the long tail and freeing your best writers to work on the SKUs that move the business." The ones still complaining are usually the ones who fired the copy team and now can't figure out why their hero PDPs read like everything else on the internet.

The technology works. The org design is what most retailers got wrong.

Retail to See

Virtual try-on is finally getting good enough to matter for fashion e-commerce

Retail media networks: the margin story behind the hype

Computer vision for shrink: Q1 pilot results are in, and the cost-benefit is messy