ASI or Bust

If you pay as much attention to the world of AI as I do, it’s become impossible to ignore the hype surrounding the imminence of AGI (artificial general intelligence). A lot of researchers at the big AI labs are convinced they already know how to “solve” AGI, and that it’s only a matter of time (and not very much time at that).

The newest big news comes from China, where DeepSeek has released a model named R1, which can be thought of as an open-source replication of OpenAI’s o1.

This comes shortly after OpenAI’s announcement of o1’s successor, o3, which achieved new state-of-the-art performance on a couple of very challenging benchmarks.

So, does this mean the hype is real? Is Skynet about to take over the world? Personally, I remain skeptical of most of the hyperbolic claims we see on a daily basis these days.

For starters, consider Goodhart’s Law. As an ML researcher, if you focus on a particular benchmark, you will often find that mastering it is (relatively speaking) easy, but that a model tuned to that benchmark rarely holds up out-of-distribution. So even though OpenAI’s newest “reasoning” models post impressive SOTA results, there’s no guarantee these models would do as well on a completely new dataset of similar difficulty.
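To see that failure mode in miniature, here’s a toy sketch (my own illustration, with entirely synthetic data, not anything from a real eval): a linear model fit to a “benchmark” slice of a cubic curve scores well in-distribution, then collapses on inputs just outside it.

```python
# Toy illustration of benchmark overfitting. The data, model, and
# numbers here are synthetic assumptions for illustration only.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# "Benchmark" distribution: inputs in [0, 1], where the true curve x**3
# happens to look nearly linear, so a linear model scores well.
X_bench = rng.uniform(0, 1, size=(500, 1))
y_bench = X_bench.ravel() ** 3
model = LinearRegression().fit(X_bench, y_bench)
print("benchmark R^2:", model.score(X_bench, y_bench))  # high, ~0.9

# Out-of-distribution inputs in [2, 3]: same underlying task, but the
# benchmark-fit model extrapolates badly and the score collapses.
X_ood = rng.uniform(2, 3, size=(500, 1))
y_ood = X_ood.ravel() ** 3
print("OOD R^2:", model.score(X_ood, y_ood))  # large and negative
```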

Furthermore, there now appear to have been some shenanigans regarding OpenAI’s results on FrontierMath.

ASI > AGI

I’ve come around to the view that no benchmark, at least in the way we think about benchmarks today, will ever tell us when we’ve achieved AGI. Smart AI researchers will continue to come up with newer and more challenging benchmarks, and other researchers will continue to master them, even with models that clearly aren’t AGI. Eventually, AGI will be achieved, but no standard benchmark will be able to tell us that it has arrived.

So what are we left with? My belief now is that we will have to produce ASI (artificial superintelligence) before we can actually be sure we’ve reached AGI. Only in retrospect will we be able to know with certainty that someone has produced AGI.

This may seem counterintuitive, as ASI must reach higher levels of intelligence than AGI. But I think that testing for ASI will actually be much easier than testing for AGI. How can we do that?

I’ll borrow an analogy from theoretical computer science: P versus NP.

It’s believed that P != NP, but no one has actually been able to prove it. Problems in NP are, by definition, easy to verify: given a proposed solution, you can check it in polynomial time. If the conjecture is true, though, the hardest of those problems are intractable to solve. So if someone much smarter than you (an ASI, for example) gives you, a dumb human, the answer to a problem you couldn’t solve, you can still verify the solution is correct.
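To make that asymmetry concrete, here’s a minimal sketch using SUBSET-SUM, a classic NP-complete problem (the instance numbers are made up for illustration): the brute-force solver blows up exponentially with input size, but checking a handed-to-us answer takes one cheap pass.

```python
# Verify-vs-solve asymmetry, illustrated with SUBSET-SUM.
from itertools import combinations

def verify(numbers: list[int], target: int, certificate: list[int]) -> bool:
    """Polynomial-time check: the certificate must be drawn from
    `numbers` (respecting multiplicity) and sum to `target`."""
    remaining = list(numbers)
    for x in certificate:
        if x not in remaining:
            return False
        remaining.remove(x)
    return sum(certificate) == target

def solve(numbers: list[int], target: int) -> list[int] | None:
    """Brute-force search over all subsets: exponential in len(numbers).
    No polynomial-time algorithm is known (and none exists if P != NP)."""
    for r in range(1, len(numbers) + 1):
        for subset in combinations(numbers, r):
            if sum(subset) == target:
                return list(subset)
    return None

numbers = [267, 961, 1153, 1000, 1922, 493]
target = 2381
answer = solve(numbers, target)                  # expensive in general
print(answer, verify(numbers, target, answer))   # cheap check: True
```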

So now we just need to define a set of problems that we believe are solvable, but which no one, not even the smartest among us, has managed to solve. All we then have to do is verify that the ASI-generated solution is correct.

And you might’ve already guessed that P vs NP is one of those problems, probably the one I’d put at the top of the list. Here’s a list of other candidate problems I asked ChatGPT to come up with.

My best guess is that we’ll be able to solve these problems with ASI within the next 10 years, though 15 is probably a safer bet. Even that’s not certain, though, and I think we’ll see AIs that give the appearance of AGI as part of everyday life before then, even if they aren’t truly intelligent.