Crowdsourced AI benchmarks have serious flaws, some experts say

AI labs are increasingly relying on crowdsourced benchmarking platforms such as Chatbot Arena to probe the…

Debates over AI benchmarking have reached Pokémon

Not even Pokémon is safe from AI benchmarking controversy. Last week, a post on X went…

OpenAI launches program to design new ‘domain-specific’ AI benchmarks

OpenAI, like many AI labs, thinks AI benchmarks are broken. It says it wants to fix…

Meta’s benchmarks for its new AI models are a bit misleading

One of the new flagship AI models Meta released on Saturday, Maverick, ranks second on LM…

XSMO: A Momentum Fund Outperforming Small-Cap Benchmarks

People are using Super Mario to benchmark AI now

Thought Pokémon was a tough benchmark for AI? One group of researchers argues that Super Mario…

Did xAI lie about Grok 3’s benchmarks?

Debates over AI benchmarks — and how they’re reported by AI labs — are spilling out…

This Week in AI: Maybe we should ignore AI benchmarks for now

Welcome to TechCrunch’s regular AI newsletter! We’re going on hiatus for a bit, but you can…

DeepSeek claims its ‘reasoning’ model beats OpenAI’s o1 on certain benchmarks

Chinese AI lab DeepSeek has released an open version of DeepSeek-R1, its so-called reasoning model, that…

AI isn’t very good at history, new paper finds

AI might excel at certain tasks like coding or generating a podcast. But it struggles to…

AI researcher François Chollet is co-founding a nonprofit to build benchmarks for AGI

Former Google engineer and influential AI researcher François Chollet is co-founding a nonprofit to help develop…

Will Smith eating spaghetti and other weird AI benchmarks that took off in 2024

When a company releases a new AI video generator, it’s not long before someone uses it…

Factor Portfolios and Cap-Weighted Benchmarks: Bridging the Tracking Error Gap

Despite a brief return to normalcy in 2022, equity factor strategies have experienced performance challenges relative…

Navigating Net-Zero Investing Benchmarks, Incentives, and Time Horizons

Many asset owners are adopting net-zero objectives to manage their investment exposure to climate change risk.…