Claude tops GPT 4 in blind comparisons

source link: https://birchtree.me/blog/claude-tops-gpt-4-in-blind-comparisons/


Benj Edwards, writing for Ars Technica: “The King Is Dead”—Claude 3 Surpasses GPT-4 on Chatbot Arena for the First Time

On Tuesday, Anthropic's Claude 3 Opus large language model (LLM) surpassed OpenAI's GPT-4 (which powers ChatGPT) for the first time on Chatbot Arena, a popular crowdsourced leaderboard used by AI researchers to gauge the relative capabilities of AI language models.

There are all sorts of chatbot leaderboards on the web, and I don’t take many of them very seriously. I guess it’s academically interesting how well each model does on standardized tests, but I don’t personally find that a super compelling data point that helps me figure out which model is going to be most useful for the sorts of things I do with them.

But Hugging Face’s leaderboard is more interesting, and you can contribute votes yourself by using their testing page to ask your own questions and compare the responses from two unnamed models. I played with this for a bit today and my big takeaways were:

  1. I also preferred Claude’s responses almost every single time it was in the comparison.
  2. There’s something called “command-r” that lost in every single comparison it appeared in. Woof.

Oh, and you can cheat if you really want to 😉

[Image: CleanShot-2024-03-29-at-09.16.32.png]
