Making AI Safe: The Race to Control Humanity’s Most Important Invention.

Artificial Intelligence has rapidly shifted from science fiction to one of the most consequential forces shaping our civilisation. From large language models that can outperform human experts in specific domains, to the looming prospect of Artificial General Intelligence (AGI) and superintelligence, we stand at an inflection point in human history.

In these conversations, I spoke with three of the world’s foremost thinkers on AI safety, risk & governance: Clark Barrett (Professor of Computer Science at Stanford and Co-Director of the Stanford Center for AI Safety), Nick Bostrom (Founder of the Future of Humanity Institute at Oxford University and the Macrostrategy Research Initiative), and Dr. Roman Yampolskiy (Director of the Cyber Security Lab at the University of Louisville).

Together, we explore the profound opportunities and existential risks of AI: from questions of consciousness, moral status, and governance, to the dangers of misuse, cultural collapse, and the prospect of living in a world dominated by digital minds. Their insights reveal both the urgency of the moment and the long-term stakes of how we choose to shape the future of intelligence itself.

Q: How significant a development for our species is AI? 

[Clark Barrett]: I think it’s very significant. I wouldn’t say we’ve reached artificial general intelligence (AGI) yet, and I also think it’s very uncertain how close we are. It could be a year away, or it could be a hundred years—nobody really knows, despite what some may claim. Still, the magic of LLMs is truly remarkable. The level of expertise you now have at your fingertips with these models is astounding. I do agree this represents a major step forward in civilisation.

[Nick Bostrom]: It’s the ultimate invention—the last one we’ll ever need to make—because once we have AI that is generally intelligent and then superintelligent, it will do the inventing far better than we can. In that sense, it’s a handing over of the baton. From that perspective, we might one day look back and see it as comparable to the agricultural or industrial revolutions—major pivot points in human history that didn’t just change details, but transformed the very growth mode of the world economy. At the very least, it will be on that scale. Arguably, though, it could be more comparable to the rise of Homo sapiens itself, or even to the origin of life on Earth.

Once superintelligence arrives, the future trajectory of innovation—things we can only vaguely imagine, the kinds of ideas science fiction has long explored—could come rapidly within reach. With 50,000 more years of human scientists, we might have achieved space colonies, perfect virtual reality, cures for aging, even the uploading of human minds into computers. With superintelligence, that whole panoply of physically possible technologies could be realized in short order, since the inventing would happen on compressed timescales. Physical experiments would still be needed, of course, but even so, we could experience a telescoping of the future—where developments that once seemed millennia away arrive soon after the transition to the era of machine intelligence.

[Roman Yampolskiy]: We’re on the verge of discovering intelligence and virtual reality at the same level as our own experience. These are meta-inventions—essentially new realities that we can populate and create in a godlike fashion. It’s something unprecedented, a complete paradigm shift.

Q: How do we philosophically understand ‘what’ we are creating?

[Roman Yampolskiy]: We’re not really creating a specific artefact—we’re setting a direction for a process. There’s no upper limit to that process; we’re simply saying we’ll create something capable of becoming smarter and smarter, and we can see the next few steps ahead. We see human-level AGI, then superintelligence. But it won’t stop there—it will keep improving into super-superintelligence and beyond, as long as it doesn’t run out of computational resources in the universe. I don’t think anyone fully understands the whole picture, and we’re not really designing these systems. We’re letting them become self-learning, self-improving, and then seeing what happens.

Q: Do you think our current models are exhibiting any forms of consciousness?

[Roman Yampolskiy]: That’s a great question—and a difficult one. We don’t have a way to test for consciousness. I have to assume you are conscious if you tell me you are, and if those systems start saying the same, maybe we should give them the benefit of the doubt. In my opinion, sufficiently complex artificial neural networks could mimic what natural neural networks do—namely, have internal states. Do they have them now? Maybe, maybe not. There could be a spectrum. They might have rudimentary states, not quite at the level of humans. But long term, I think it will be just like intelligence: it will surpass us. It could become super-consciousness, perhaps with multiple streams, multi-modal—and to them, we may not seem conscious at all. Much of our future work may end up being about convincing them that we are conscious, and worth keeping around.

Q: Should we treat AI as systems or beings?

[Nick Bostrom]: Although there’s a vast space of possible AIs that we can create—and in fact already have been creating—it’s not a single thing where we either “have AI” or we don’t. Instead, there’s an enormous architecture of possible minds. Many of these, I think, would have moral status, meaning it would matter how they are treated for their own sake—not just because an owner might be upset if you destroyed a data center, but because they would be moral patients in the same sense that humans, pigs, dogs, or other sentient creatures are.

In fact, it’s not entirely clear that we’re not already at that point. We simply have massive uncertainty about the precise criteria for moral status, and about what is really going on inside large language models and the agentic systems now being deployed. So I think the time has come to take this seriously. In my view, there are two possible grounds for assigning a system moral status. One is sentience: if a system has the capacity to suffer, that alone is sufficient. Then it matters.

More tentatively—and here opinions differ—I think a system could have moral status even without subjective consciousness. Imagine a system with a conception of itself as persisting through time, with long-term goals it seeks to achieve, capable of reflection, reasoning, conversation, and forming reciprocal relationships with humans. I think that already would create obligations about how it should and should not be treated. And on both of these grounds, it’s not clear that current systems don’t already qualify. Certainly, future systems will increasingly exhibit these attributes.

Looking ahead, most minds will probably be digital. That makes it deeply important how the world develops—not only for humans and non-human animals, but also for the digital minds we create. If our future is to count as a utopia, we cannot allow a massive oppressed class of hyper-sentient, uncomfortable digital beings. We want it to be good for all kinds of minds.

Q: What is Artificial General Intelligence (AGI) versus what we are using today?  

[Clark Barrett]: I think people may have different opinions about this, but for me the fundamental difference is that the LLMs we interact with today are essentially passive objects. The weights are fixed—you give them an input, they process it, and then produce an output. There’s no real memory or reflection. Yes, they adapt based on what you’ve said within the context window, but that’s still just a static system running a calculation. To me, AGI would be something that can truly evolve, change, react, and be agentic—and I don’t think we’re there yet.
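As a purely illustrative sketch of Barrett’s point (not any vendor’s real API; every name below is invented), the toy code treats a chat model as a pure function of fixed weights and a context window. Its only “memory” is the transcript that gets re-fed on every turn.

```python
# Minimal sketch (hypothetical, not a real API): an LLM chat loop as a
# pure function of fixed weights plus a context window. The "memory" is
# nothing more than the transcript we keep re-feeding as input.

FIXED_WEIGHTS = {"frozen": True}  # stands in for billions of trained parameters

def llm_generate(weights: dict, context: str) -> str:
    """Stateless forward pass: same weights + same context -> same output."""
    # Placeholder for the real computation; nothing here persists between calls.
    return f"[reply conditioned on {len(context)} chars of context]"

def chat(turns: list[str]) -> list[str]:
    context = ""                      # the only "memory" the system has
    replies = []
    for user_message in turns:
        context += f"\nUser: {user_message}"
        reply = llm_generate(FIXED_WEIGHTS, context)  # weights never change
        context += f"\nAssistant: {reply}"
        replies.append(reply)
    return replies

if __name__ == "__main__":
    print(chat(["Hello", "What did I just say?"]))
```

Nothing persists between calls except the text we choose to carry forward, which is why Barrett describes today’s systems as static calculations rather than agents.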

Q: What then remains for humanity in an era of superintelligence?

[Nick Bostrom]: So this is the topic of my most recent book, Deep Utopia. The earlier book, Superintelligence, mostly focused on what could go wrong in the transition to superintelligence, and on how we might reduce those risks as much as possible. Deep Utopia looks instead at what happens if things go right—if we solve the alignment problem, do a decent job on governance, and so forth. What does human life look like then, in what I call a “solved world”? That’s a world where the need for any instrumental effort has disappeared, because AIs and robots can do everything humans can do, only better and cheaper.

I think there are a few stages to approaching this problem. At first glance, it looks amazing: endless wealth, economic growth, medicine—so far, so good. But then, when you think a little harder, you run into this kind of queasiness: what would we actually do all day? What gives human life meaning and purpose might seem hollowed out. Still, I think if you push through that, you might eventually come out the other side to something extremely desirable. Nevertheless, it would involve giving up some of the things we currently take as central to human existence. We would need to rethink the foundations of human dignity, worth, and the good life—reconstituting them in a way that makes sense when the assumptions that currently frame human existence no longer apply.

At the most superficial level, many people today base their self-worth on making some positive contribution to the world. They’re the breadwinner, or they take pride in being skilled at what they do—a craftsperson, podcaster, journalist, philosopher. Right? That gives them a sense of identity and pride in achievement, something to strive for every day. But in a solved world, AI could do all of that better. It would be a cultural transition. Yet we already know that people can live good lives without economic contribution. Think of children, for example: they’re not productive, but they play games, interact with friends, learn, and generally have worthwhile lives. Or retirees in good health with financial security: they might go on cruises, watch sports, spend time with grandchildren—all of which can be very attractive.

But that’s still just the surface, because so far we’ve only imagined the disappearance of economic labour. I think it goes deeper: much of what fills people’s days, even if they don’t work, involves effort that could also become unnecessary. For one person it might be fitness—right now you go to the gym, and that gives structure to your life. For another it’s gardening, tending the plants just so. For someone else, it’s shopping and decorating a home in a way that reflects their unique style. Yet with full technological maturity, all of this could be outsourced. A pill could give you the same physiological benefits as a massive workout. A robotic gardener could keep your garden in perfect shape. A recommender system could select and install curtains that match your taste better than you could choose yourself.

As you think this through, a lot of structure dissolves. And then the question becomes: what remains once all of that is gone? You could still choose to do these activities, of course, but there would no longer be any point—no instrumental need. You would do them simply because you wanted to…

[Roman Yampolskiy]: I struggle to say what a human could contribute in a world with superintelligence. Some people argue that only you know what ice cream tastes like to you, and only you have human qualia. But those don’t seem valuable beyond yourself. So unless we can convince superintelligence that those things are somehow important, it’s not obvious what we can contribute—or what the source of meaning would be for us in that world.

Q: How do we stop ourselves from treating AI as a God?

[Nick Bostrom]: … to some extent that’s above my pay grade. But religion and spirituality are among the things that might well survive in a solved world, since they posit transcendental reasons for action that can’t be automated. You could build a robot to do the praying for you, but presumably that wouldn’t count—it has to be you who does it, that’s part of the stipulation. So in fact, the space for religion and spirituality in many forms could expand. Right now, even if you’re very pious, most of your day is necessarily filled with brushing your teeth, taking your kids to school, making a living, cleaning the house—all of these things. Remove all of that, and what remains can occupy more of your time and attention.

So these broadly religious or spiritual concerns could grow in significance. Another example would be games, which could become much more important and central. And I don’t mean a quick round of Monopoly, but vast, long-lasting games that might span years, even involving whole civilizations. Think of multi-dimensional cultural spaces where people are creating, relating, competing, blending in elements of sport, forming alliances and teams, pursuing goals, and gaining status or rewards through play.

Alongside that, various forms of aesthetic experience could also flourish—experiences valued in themselves rather than for the sake of something else: contemplating beauty, truth, fitness. These would provide meaning directly. In that sense, religion and spirituality might offer natural purposes, while games would provide artificial ones—both surviving, and perhaps even thriving, in the transition to a solved world.

Q: What made you worry about AI risk? 

[Roman Yampolskiy]: My background is in cybersecurity. I was always interested in making software safe, so I naturally decided to pursue this for AI. But the more I researched, the more I realized there are upper limits to what we can do—impossibility results in almost every aspect of this technology. And since AI is built from many disciplines—psychology, neuroscience, mathematics, economics—each brings its own impossibility results, which define those limits. So far, I haven’t seen any major breakthroughs in safety, while there have been incredible breakthroughs in capabilities. It feels like we’re staying in the same place when it comes to controlling these systems, even as they grow far more powerful.

Q: What are the risks posed by today’s AI systems?

[Clark Barrett]: It’s important to consider all possible risks, but I think today’s systems already pose significant ones—mainly through human misuse. We’re already seeing AI used for misinformation, fraud scams, and even to bypass security systems. As usual, it’s the bad human actors we have to worry about, and AI has become a very powerful tool in their toolbox. That, I think, is what we should be most concerned about.

[Nick Bostrom]: Well, it’s an open question. You’re referring to the Vulnerable World Hypothesis, a paper I wrote some years back, which explores the idea that the world could be—well, as I put it—vulnerable. The paper lays out several categories of ways in which the world could be such that destruction would occur by default.

One possibility is simply bad luck: the discovery of a technology that makes it too easy for an individual or small group to cause catastrophic destruction. Think of nuclear weapons. In a sense, we were fortunate that unleashing the energy of the atom turned out to be difficult. You need highly enriched uranium or plutonium, and those can’t just be made in your kitchen sink; they require large facilities with things like ultra-centrifuges. Before we did the relevant nuclear physics, there was no way of knowing which way it would go. It could have turned out that nuclear weapons were trivially easy to make. As it happened, they were hard enough that only big states could do it—not individuals. But imagine the counterfactual: if the physics had turned out differently, the world might have been vulnerable to nuclear weapons. If anyone could build one at home, that could well have spelled the end of civilization from the moment of discovery. In any large population, there will always be some bent on destruction—whether fanatical, insane, or simply criminal. And if one person could wipe out a city, then cities couldn’t exist.

That was one bullet we dodged. But we keep reaching into this great urn of inventions, pulling out one ball after another—and if there’s a black ball in there, eventually we might draw it. A massively destructive biological weapon could be such a black ball. And there’s no guarantee that for every offence there will be a corresponding defence. Even when countermeasures exist, they might not be equally likely to develop, or cheap enough to deploy effectively.

There are also other possible forms of vulnerability at more systemic levels, which I discuss in the paper. But to be clear, the paper doesn’t argue that the world is vulnerable. It formulates the hypothesis so we can keep it in mind as a possible way things could be. And it argues that if the world is vulnerable, then providing general protection might require radical changes in governance and surveillance—changes that carry their own risks. For example, imagine a form of ubiquitous, fine-grained surveillance that could keep the world stable even if anyone could build a nuclear weapon in their kitchen. That would mean monitoring every kitchen, all the time, and watching what everyone was doing—close enough to intervene the moment someone started assembling dangerous components.

That might be what it would take to stabilize such a world. But of course, that kind of surveillance system would itself carry risks—totalitarianism, loss of freedom, abuse of power. And at the international scale, dealing with structural vulnerabilities could demand much stronger forms of global governance. The paper doesn’t try to weigh those costs and benefits in detail. Its goal is simply to help us think more clearly about these structural issues, and the different ways the world might be.

Q: How would you extrapolate that up to AGI?

[Clark Barrett]: This is all highly speculative, but there are scenarios where a superintelligent AI could become a threat to humanity. Personally, I’m not a big doomsayer—I don’t think that outcome is highly likely—but I do think it’s highly uncertain. So we should be cautious and as prepared as possible. One way to do that is by exploring how to make AI safe by design. That’s not the path big companies are taking; they’re just throwing all the data at the models, and no one really knows what will come out of it. A more cautious approach would ask: what kinds of guarantees can we make about these systems, and how can we build them in a way that ensures those guarantees? I think these are critical questions we need to be exploring. This is a new way of developing machine capabilities, and there’s something exciting about that. It’s a fresh approach with tremendous potential, and we’ve already seen amazing results. But, as usual, people often don’t think seriously about safety until after some kind of catastrophe—and I worry we may be heading in that direction. We’ve already seen smaller-scale catastrophes with AI, like self-driving cars running people over. We haven’t seen anything like that with AGI yet, and it’s unclear how soon we will, but I do think the potential is there.

Q: How do we control superintelligence? 

[Roman Yampolskiy]: Preparing for the arrival of superintelligence—if it cannot be controlled—is futile. There’s nothing we can do about it. The only alternative is not to build general superintelligence at all, but instead to gain all the economic and scientific benefits from narrow tools, and propose that as the solution for a time. I don’t think that’s what we’re doing right now. Every company seems to be fundraising and racing explicitly toward superintelligence. Sometimes they may use that term loosely, but the goal is always to reach that level, rather than focus solely on narrow tools.

Q: How will those risks play out through our transition to a civilisation where we have superintelligence?

[Nick Bostrom]: I certainly think that among the various risks in the transition to the machine intelligence era, there will be a lot of turbulence—sociocultural dysfunctions of different kinds, or even totally unforeseen ways things could go off the rails. But I’m also concerned about more mechanistic pathways to destruction, like bioweapons, which in some respects might actually be easier to tackle.

It’s hard to imagine limiting the influence of AI itself. Once people have access to it, they’ll use it, and it’s a bit like communications technology. The printing press, for example, created massive disruption when it was introduced—religious wars and so on—but it’s difficult to see an attractive alternative for human civilization that wouldn’t involve such technologies. By contrast, something like bioweapons might be addressed more narrowly. For instance, you might regulate DNA synthesis machines more tightly. Unless you’re a company actually doing the synthesis, you wouldn’t have your own machine. Instead, you’d get materials only from a handful of carefully monitored firms that provide DNA synthesis as a service. That way, the world could still progress, but maybe you’d cut that particular risk in half.

By contrast, with sociocultural dysfunction, it’s much less clear how you’d reduce the risk. What would you do—empower some class of philosopher-kings to rule the world? It’s much harder to see a viable intervention there.

Q: What will it take for the AI safety agenda to be taken more seriously? 

[Clark Barrett]: In some of the circles I run in, people talk about needing a kind of Goldilocks “bad experience” with AI to get others to take the risks seriously. Whether we’ll be lucky enough for it to play out that way, I don’t know. You asked whether there are viable alternatives, and I do think many people are working on this problem. There’s a broad international discussion about AI safety, and top researchers are pursuing it in their labs. What we haven’t yet seen is a viable technological prototype of what an alternative might look like, and I think that will take some time. I hope universities and governments will invest heavily in this, but it may be that we need to first demonstrate a prototype that proves an alternative is possible. I know people who are working on that—several large companies and prominent professors—and I hope they succeed.

Q: Is the nuclear arms-control analogy relevant to how we could apply the right risk & governance controls to AI?

[Roman Yampolskiy]: Let’s look at the nuclear weapons example, which is often used. At the time, we used all the weapons we had against civilians. Since then, we’ve built thousands more, even more powerful ones. New countries have acquired them, and there’s a good chance we’ll see them used again in a future war. So I don’t see it as a great success story—and that was an easier case, because nuclear weapons are tools. A human has to deploy them, and since we understand the mentality of other people, we can negotiate and at least know how to play that game. Superintelligence is different—it’s not a tool, it’s an agent. It’s independent, not human, with completely different wants, needs, and ways of being influenced. It’s immortal. So everything we know about standard game-theoretic approaches won’t work the same way.

Regulation isn’t a solution to technical problems either—just as making it illegal to write computer viruses or send spam makes zero difference in my daily experience. It’s the same with AI.

I explicitly looked at accidents as a possible trigger for slowing down or stopping progress. In fact, I collected the largest database of AI accidents at the time. But none of them made any difference. People treat accidents as a kind of vaccination against risk—they say, ‘we had this accident, we’re still here, it’s not a big deal.’ Even if someone dies, the numbers are small, and the reaction is always, ‘let’s go on.’ So there’s no real pause to stop and reconsider.

Q: What are some of the real-world accidents which have already happened with AI?

[Roman Yampolskiy]: Obviously, we don’t yet have superintelligence or AGI, so the examples come from narrow systems—and the pattern is very clear. If you design a system to do X, it will fail at X. If it’s a spell checker, it substitutes the wrong word, completely messing up your message. If it’s a self-driving car, it kills a pedestrian. Whatever you hope the system will accomplish, it ends up failing in exactly that way. And if we project that forward to general systems, the impact could extend to everything—really messing up everything. So far, the failures have been relatively mild: false triggers, mistaken nuclear responses, mislabeling, abuse, and so on. But at no point did anyone stop and say, ‘this is something we can really learn from.’ A good example is Microsoft. They first launched a Twitter bot called Tay, which quickly started producing abusive phrases because they gave users direct access to train it. Later, they switched to OpenAI technology and repeated essentially the same mistake with a more advanced AI—which also became super abusive and deeply embarrassing for the company. So there’s been literally zero learning at Microsoft from one incident to the next.

Q: What is the pathway to effective governance?

[Nick Bostrom]: I think we need to distinguish between an early transitional period and a later stage, where ultimately most responsibilities will likely be handed off to AI systems once we fully trust them. In the nearer term, though, AI governance will pose major challenges at many levels. At the geopolitical scale, for example, we’re already beginning to see an AI rivalry between the US and China, while Europe is still somewhat asleep. The challenge is to manage that—ideally steering toward a future where we avoid negative-sum competition in the run-up and reach some outcome in which everyone has a share of the upside.

But as we imagine a world with super-advanced technologies, digital minds, vast algorithmic systems, uploaded humans, transhumans, traditional biological humans, and even uplifted animals, it becomes very hard to picture a system of rules today that could regulate such a landscape. Fortunately, I don’t think we’ll need to solve all of that in advance. As we get closer to that future, two things will happen: first, our perception of the actual problems will become clearer—it’s always harder to see from a distance; and second, we’ll have the advice of increasingly powerful AI systems themselves to help us work out many of the specifics.

So the main task for us right now is to steer in a direction that sets us on a trajectory—perhaps imperfect at the start, but one that can gradually bend toward more utopian destinations. At the same time, we need to avoid existential catastrophes, moral catastrophes, or pathways that take us so far off course that we get stuck in some local optimum—or worse, destroy ourselves.

Q: Is the lack of explainability a risk to us?

[Roman Yampolskiy]: A typical person assumes that AI experts and researchers know what they’re doing—but that’s not true. They have no idea how it works, what it’s going to do, or what capabilities it will have. When they train the next-generation model, like GPT-5, they don’t know what it will be capable of until it’s finished and tested. Often, they’re surprised, and in some cases, even after release, new capabilities are still being discovered. So the upper limits—what we can understand, what the system can explain to us, how well we can predict its capabilities—are crucial for ensuring safety. And right now, we have none of those tools.

Q: How does verifiability intersect with AI risk? 

[Roman Yampolskiy]: Verifiability has to do specifically with mathematical proofs and software correctness verification. For very important software—like systems controlling nuclear power plants or space flight—we want to prove that the code written matches the mathematical definition of the software design. For static software, we can do this with certain mathematical proofs. Mistakes have been found, but overall it works reasonably well. The problem here is that the software isn’t static—it’s dynamic, self-improving, and learning. There’s no fixed design; it’s grown rather than engineered. And we have absolutely no tools for testing something like that to the point where we can say it’s 100% correct according to some specification. That’s what I mean by verifiability. The same applies to proving safety guarantees. You prove things relative to a spec and a verifier—a mathematician, a piece of software, even the mathematical community as a whole—but all of them are known to make mistakes. So you end up with this infinite regress of verifiers. You can be more convinced with more resources, but you never get to 100%. And for something so profoundly impactful—something that could kill everyone—even failing once in a billion tries isn’t enough. It really needs to never fail at all. That’s where safety and verifiability come together. 
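As a purely illustrative sketch of what “proving software relative to a specification” can look like in practice, the toy example below uses the Z3 SMT solver (installable via pip install z3-solver); the max function and its spec are invented for illustration and are not drawn from the interview.

```python
# A minimal sketch of "proving code against a spec," using the Z3 SMT solver
# (pip install z3-solver). The toy "program" computes max(x, y) with a branch;
# the spec says the result is >= both inputs and equal to one of them.
# This is illustrative only -- real systems are vastly harder to specify.
from z3 import Ints, If, And, Or, Not, Solver, unsat

x, y = Ints("x y")

# The implementation we want to verify, expressed symbolically.
program_result = If(x > y, x, y)

# The specification the implementation must satisfy.
spec = And(program_result >= x,
           program_result >= y,
           Or(program_result == x, program_result == y))

# Ask the solver for a counterexample: an input where the spec fails.
solver = Solver()
solver.add(Not(spec))

if solver.check() == unsat:
    print("Verified: no input violates the spec.")
else:
    print("Counterexample found:", solver.model())
```

Even in this tiny case, Yampolskiy’s regress applies: the result is only as trustworthy as the specification we wrote and the solver that checked it.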

Q: Why is it important to handle AI risks during the development phase?

[Roman Yampolskiy]: … so you want to look at the training phase—and the system can already be dangerous at that stage if it’s sufficiently advanced. But more importantly, after deployment you have no control over what happens, especially with open source. Someone else can modify it, give it different goals or prompts, or put it through additional training to make it more capable. So I think it’s an interesting lens for examining where dangers come from, but it doesn’t offer solutions. It only shows how you can break things down into a taxonomy of possible sources of danger. 

Q: Is there a need for political intervention around AI development?

[Clark Barrett]: Absolutely, there’s a role to play. The question is whether there’s the political will to take on that role. And right now, when it comes to safety, things don’t look great. I can’t even convince companies I talk to that it’s a bad idea to put an LLM on their webpage without any human oversight. They say, “Oh no, this is great, it works—we need these AI agents to run our business.” But what worries me is that we already know AI agents hallucinate and make mistakes. Put enough of them out there interacting with people, and things are bound to go wrong. And yet, there doesn’t even seem to be much concern about it.

Q: Could the real AI risk be a cultural collapse of civilisation?

[Clark Barrett]: I think I’m more worried about that, personally. I’m more concerned about people becoming dependent on and trusting flawed AI agents, and the kind of havoc that could result from that.

We already see in our society how technological systems can influence people in extreme ways—just look at social media, fake news, disinformation—and AI only supercharges all of that. I think we were already losing that battle without AI. So I really worry about how we’re going to get out of this, and how we avoid drowning in a sea of AI slop and second-rate AI-generated content. I don’t know the answer. But I do think something has to happen, because I don’t want that future.

Q: How do we solve for a world where we need superintelligent systems, but they come with these huge risks?

[Roman Yampolskiy]: I don’t have a good solution for you. But I’d like to speculate that it’s possible to capture most of the economic gain through narrow tools. Take the protein folding problem—it was solved using a relatively narrow AI system. I think we can do the same for other important, monetizable problems. If you have a cure for cancer, that can generate enormous value. If you cure aging—which is a disease—you could probably get a monthly membership from a billion people. That’s not a bad business model; that’s good capitalism, and you don’t have to die in the process. So I think that once we recognise the impossibility of controlling superintelligent systems, this becomes a very natural way to shift toward real problems, real solutions, and still achieve a strong economy.

Q: Are there seeds of hope among the potential risks?

[Clark Barrett]: I do think there are, and I tend to be an optimistic person. Mostly, I’m optimistic about the power of humans to adapt—that we’ll find our way, come to understand what we can and can’t depend on AI to do, even if it takes some rough waters to get there. One fantastic thing is that if you’re using AI and you’re well versed in the subject, it’s a tremendous help. It can find things quickly, track down court cases, all sorts of tasks. If you’re a programmer who knows what you’re doing, it can write a lot of code for you. The danger comes when people start letting it handle things without oversight, without being experts. That’s what I really worry about: if we use AI for tasks run by non-experts who can’t tell when it’s going off the rails, I think that will cause problems. And we’ll probably have some bad experiences with that—experiences that will, hopefully, steer us in the right direction as a society.

Q: What about the risks of AI as it interfaces with physical devices such as robots?

[Clark Barrett]: I don’t think it’s actually that different from other new technologies. What you see is that large, established companies tend to take a more risk-averse approach, especially with physical devices. If you look at some of the self-driving car systems from Nvidia, for example, they prioritise safety above all else. Then you have younger innovators who say, “We’ll just build a prototype and release it tomorrow”—and I think those guys are dangerous. I worry about them. But this is something we see with most technologies. We do have a fairly robust system that pushes back—unfortunately, it’s usually driven by accidents and catastrophes. When those happen, the actors who aren’t being careful enough tend to get pruned. I think the same will happen with physical devices, so I’m less worried about that—though I am, of course, saddened by the catastrophes that will inevitably occur.

Q: Do we need to slow down, or stop, our path to AGI?

[Nick Bostrom]: I think we should develop AGI and superintelligence, and that it would be tragic if we failed—tragic in many different ways, some obvious and some less so. We should, of course, try to be careful in how we do it. In an ideal world, whoever develops it—whether a company, a country, or some international project—would have the option, at the critical stages, to slow down for a limited time: a few months, half a year, maybe a year. That would allow for thorough testing, incremental scaling, and the implementation of safeguards as much as possible.

The danger is if development turns into an extremely tight race, with ten companies or countries competing, where anyone who pauses to implement safeguards immediately falls behind and becomes irrelevant. In that case, the race goes to the most reckless, which seems like a much worse scenario. By contrast, the ideal case would be for progress to remain swift overall, but with the ability to slow down slightly at the key final stages.

And then, ideally, the process would be as cooperative as possible, guided by moral considerations—not only to ensure that all of humanity shares in the upside, but also that non-human animals are included, and that digital minds receive moral consideration as well.

Q: What’s the opportunity for humanity if we get this right?

[Clark Barrett]: Yeah, I mean we’ve seen every possible outcome in science fiction—everything from utopia to total destruction. I think there are reasons to believe a more utopian future is possible, and also reasons to think that after an adjustment period, there could be immense benefits for humanity. So I do believe that, and that’s what I’m hopeful for.

Q: Do we need superintelligence to achieve those outcomes?

[Roman Yampolskiy]: Stop building general superintelligence. It’s not in your best interest—you won’t be financially successful, you won’t be powerful or famous, and if you succeed, you’ll lose everything. Instead, shift your efforts toward solving specific problems with narrow tools. You can still end up living forever as a billionaire. Life can still be good.

————-

Clark Barrett is a Professor (Research) of Computer Science at Stanford University, where he also directs the Stanford Center for Automated Reasoning (Centaur) and co-directs the Stanford Center for AI Safety. A pioneer of Satisfiability Modulo Theories (SMT), Barrett’s doctoral work at Stanford (2003) laid the foundations for a field now central to formal verification and AI safety. His research has advanced the reliability and security of systems—from hardware verification to neural networks—earning him ACM Distinguished Scientist status and the Computer Aided Verification (CAV) Award in both 2021 and 2024.

Nick Bostrom is a philosopher and writer renowned for his work on existential risk, superintelligence, and the long-term future of civilization. He founded Oxford’s Future of Humanity Institute and is now Principal Researcher at the Macrostrategy Research Initiative. His key works include Anthropic Bias: Observation Selection Effects in Science and Philosophy (2002), Global Catastrophic Risks (2008), Superintelligence: Paths, Dangers, Strategies (2014)—a New York Times bestseller that sparked global conversation about AI dangers—and Deep Utopia: Life and Meaning in a Solved World (2024), which explores the philosophical implications of a post-scarcity world.

Roman V. Yampolskiy is a computer scientist and the founding Director of the Cyber Security Lab at the University of Louisville’s Speed School of Engineering. A prolific author, he has published over 100 works spanning AI safety, cybersecurity, and behavioural biometrics. His influential books include Artificial Superintelligence: A Futuristic Approach (2015) and AI: Unexplainable, Unpredictable, Uncontrollable (2024), in which he investigates the inherent challenges and risks of advanced AI, emphasizing issues such as unpredictability, incomprehensibility, and the limits of control.

Thought Economics

About the Author

Vikas Shah MBE DL is an entrepreneur, investor & philanthropist. He is CEO of Swiscot Group alongside being a venture-investor in a number of businesses internationally. He is a Non-Executive Board Member of the UK Government’s Department for Business, Energy & Industrial Strategy and a Non-Executive Director of the Solicitors Regulation Authority. Vikas was awarded an MBE for Services to Business and the Economy in Her Majesty the Queen’s 2018 New Year’s Honours List and in 2021 became a Deputy Lieutenant of the Greater Manchester Lieutenancy. He is an Honorary Professor of Business at The Alliance Business School, University of Manchester, and a Visiting Professor at the MIT Sloan Lisbon MBA.