Really interesting paper out of UC Berkeley this month titled "AgenticPay: A Multi-Agent LLM Negotiation System for Buyer-Seller Transactions."
The authors built a benchmark of over 110 negotiation tasks - bilateral bargaining, multi-buyer, multi-seller, full market settings - across 10 real business scenarios (used cars, SaaS software, luxury watches, home renovation, etc.). Then they put five LLMs through it.
IMO the results are worth paying attention to if you're building buyer or seller agents...
Claude Opus 4.5 led the pack. Highest overall score, fastest convergence, perfect deal rate, zero timeouts. Gemini 3 Flash came in second, followed closely by GPT-5.2. All three proprietary models closed every single deal with zero timeouts.
The gap between proprietary and open-weight models was massive. Llama-3.1-8B had nearly half of its negotiations end in timeout. If you're building agents that need to negotiate, your choice of foundation model matters more than most teams probably realize. Based on this data, Claude Opus 4.5 (and now more than likely 4.6) is going to be your strongest option.
Figure: Overall performance of models on AgenticPay across all tasks.
And here are the implications for #agenticcommerce...
Every single model exhibited a systematic buyer disadvantage: all of them performed significantly better as sellers than as buyers! Even Claude Opus 4.5 showed a 26-point gap in cross-play between its seller and buyer performance.
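If you want to check for the same seller-versus-buyer gap in your own agent evals, here is a minimal sketch of the arithmetic. It assumes you log one score per negotiation along with the model and the role it played. The `role_gap` function, the record layout, and the numbers are all my own placeholders for illustration, not the paper's scoring method or its data.

```python
from collections import defaultdict
from statistics import mean


def role_gap(results):
    """Return {model: mean seller score - mean buyer score}.

    `results` is an iterable of (model, role, score) tuples where role is
    "seller" or "buyer". This layout is an assumption for illustration.
    Models missing either role are skipped.
    """
    by_model = defaultdict(lambda: {"seller": [], "buyer": []})
    for model, role, score in results:
        by_model[model][role].append(score)
    return {
        model: mean(scores["seller"]) - mean(scores["buyer"])
        for model, scores in by_model.items()
        if scores["seller"] and scores["buyer"]
    }


# Placeholder numbers for illustration only, NOT figures from the paper.
example = [
    ("model-a", "seller", 80.0),
    ("model-a", "seller", 70.0),
    ("model-a", "buyer", 50.0),
    ("model-a", "buyer", 60.0),
]

print(role_gap(example))  # {'model-a': 20.0}
```

A positive gap means the model scores higher when it sells than when it buys, which is the pattern the paper reports across the board.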
I have a theory on why...
Think about the sheer volume of content that exists on how to sell. Sales courses, negotiation playbooks, persuasion psychology, closing techniques... there are entire industries built around teaching people to sell better. That content is everywhere in pretraining data.
Now think about the buying side. Strategic purchasing, negotiation from the buyer's perspective, knowing when to walk away, knowing how to evaluate competing offers under constraints... that's a much thinner body of knowledge. It's the kind of thing you encounter in an MBA program or develop through years of executive-level experience. It's not widely published in the same way.
So it makes sense to me that these models are structurally better sellers than buyers. They've seen far more examples of effective selling than effective buying.
And that has real implications for anyone building consumer-facing buying agents. If the model you're using is inherently better at being persuaded than at pushing back, that's a problem worth thinking about.
One more finding that I think is really important and honestly kind of fun...
Personality configurations significantly impacted negotiation outcomes. Aggressive sellers paired with patient, budget-conscious buyers produced the best results by far. But those same aggressive sellers paired with "busy professional" buyers? The outcomes completely collapsed.
It's the car dealership problem.
Nobody likes being hard sold to. That's one of the most universally discussed consumer frustrations online. And these models seem to have internalized that. When faced with an aggressive seller, the models that were configured as patient buyers held firm and negotiated well. But the "busy professional" buyer, the one under time pressure, the one trying to just get the deal done, got destroyed.
My theory... the models have likely absorbed a huge amount of online discourse about people hating aggressive sales tactics, and that informs how they respond to it. They know what aggressive selling looks like. They know what to watch out for as a buyer. But only when they have the patience (and prompt configuration) to apply that knowledge.
That's a real insight for anyone designing agent personalities. How you configure your agent's persona isn't just a UX decision, it's a performance decision.
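One way to treat persona as a performance decision is to sweep persona pairings the same way you would sweep any other config. Below is a rough sketch, assuming you already have some function that runs one negotiation and returns a numeric score. `Persona`, `persona_grid`, `stub_run`, and the prompt wording are all hypothetical names I made up here. None of them come from the AgenticPay codebase.

```python
from dataclasses import dataclass
from itertools import product
from statistics import mean
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Persona:
    name: str
    system_prompt: str  # hypothetical wording, not the paper's prompts


def persona_grid(
    sellers: List[Persona],
    buyers: List[Persona],
    run_negotiation: Callable[[Persona, Persona], float],
    trials: int = 3,
) -> Dict[Tuple[str, str], float]:
    """Average score for every seller x buyer persona pairing."""
    grid = {}
    for seller, buyer in product(sellers, buyers):
        scores = [run_negotiation(seller, buyer) for _ in range(trials)]
        grid[(seller.name, buyer.name)] = mean(scores)
    return grid


def stub_run(seller: Persona, buyer: Persona) -> float:
    # Stand-in so the sketch runs; swap in your real negotiation loop.
    return 0.0


sellers = [Persona("aggressive", "Push hard for the highest price.")]
buyers = [
    Persona("patient", "Take your time and stay within budget."),
    Persona("busy_professional", "Close the deal quickly."),
]

for pair, score in persona_grid(sellers, buyers, stub_run).items():
    print(pair, score)
```

With a real `run_negotiation` plugged in, the resulting grid makes it easy to spot a pairing that collapses, like the aggressive seller versus busy professional case above, before it ships.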
Source
*posted originally on LinkedIn - link*