From Rethinking AI Evaluation to a $1.7B Valuation
In a world where trust is becoming the defining currency of the AI era, the founders behind Berkeley-born LMArena are pioneering a new way forward – placing millions of users at the center of AI’s evolution through real-time use and feedback.
Challenging the Status Quo
LMArena – a platform that lets users publicly rank AI models based on the quality of results – began as an experiment at UC Berkeley’s Sky Computing Lab.
Originally called Chatbot Arena, the project was co-founded by Anastasios Angelopoulos and Wei-Lin Chiang, who set out to challenge whether traditional AI evaluation methods were truly the best way to shape technology’s future.
What if AI models, like ChatGPT and Gemini, were assessed not by opaque benchmarks, but by the people who actually use them – openly and at scale?
Little did they know, that question would supercharge a billion-dollar business.

The Power of Curiosity
For Angelopoulos, the idea was rooted in years of academic work. With a PhD in electrical engineering and computer science, he had long argued that automated benchmarks often failed to capture whether AI systems were actually useful to people in real-world settings. Human judgment, he believed, wasn’t noise – it was the signal.

Paving the Way for Global Scale
But turning that belief into something that could operate at global scale required a complementary skill set. Angelopoulos teamed up with Chiang, whose deep expertise in AI systems and tooling helped turn a research project into a resilient platform that now allows millions of users to compare AI models in real time.
The Human Touch AI Needed
Today, LMArena serves more than 5 million monthly users across 150 countries, generating roughly 60 million head-to-head AI model comparisons every month.
Those comparisons power a trusted AI leaderboard that ranks models on usefulness and reliability across text, code generation, vision, and text-to-image tasks.
Unlike static benchmarks, this human-in-the-loop approach captures the nuanced preferences of real users – how people actually experience AI, rather than how models perform in isolation.
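The article doesn’t spell out how those head-to-head votes become a leaderboard, but pairwise preference data of this kind is commonly aggregated with an Elo-style (or Bradley–Terry) rating. Below is a minimal sketch of that idea in Python; the model names, K-factor, and starting rating are hypothetical, and this is an illustration of the general technique rather than LMArena’s actual pipeline.

```python
# Minimal sketch: turning pairwise votes into a ranked leaderboard with an
# Elo-style update. Illustrative only; names and constants are assumptions.
from collections import defaultdict

K = 32          # assumed update step size
BASE = 1000.0   # assumed starting rating for every model

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def rank_models(votes):
    """votes: iterable of (model_a, model_b, winner) tuples,
    where winner is 'a', 'b', or 'tie'. Returns models sorted by rating."""
    ratings = defaultdict(lambda: BASE)
    for a, b, winner in votes:
        exp_a = expected_score(ratings[a], ratings[b])
        score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
        ratings[a] += K * (score_a - exp_a)
        ratings[b] += K * ((1.0 - score_a) - (1.0 - exp_a))
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    sample_votes = [
        ("model-x", "model-y", "a"),
        ("model-y", "model-z", "tie"),
        ("model-x", "model-z", "a"),
    ]
    for model, rating in rank_models(sample_votes):
        print(f"{model}: {rating:.1f}")
```

The appeal of this kind of aggregation is that no single vote decides a ranking: a model’s score only moves with the accumulated weight of many human judgments, which is what makes a crowd-sourced leaderboard meaningful at the scale the article describes.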

Before long, that demand extended beyond individual users. LMArena’s human-driven approach proved just as valuable to companies looking for credible, real-world evaluation, leading to the launch of AI Evaluations, its paid product for enterprise model testing.
Within months, the service reached nearly $30 million in annualized revenue, underscoring a growing appetite for independent, human-centered AI assessment.
Where Ideas Break Free – and Possibility Reigns
LMArena’s rise suggests that the next generation of AI infrastructure may not be built behind closed dashboards, but in public, shaped by human judgment and collective feedback.
For founders, the lesson is deceptively simple – always question the assumptions others accept.
Because at the end of the day, LMArena didn’t invent better benchmarks. It asked whether benchmarks were the right answer at all.





