On the Existential Risks of AI
- dbredesen
- Apr 17, 2024
- 4 min read
With AI advancing rapidly, many experts are sounding alarm bells about potential existential risks, while others dismiss doomsday scenarios as mere science fiction. When asked to estimate “p(doom)” (the probability that AI will lead to a human extinction event), experts’ responses varied wildly, from near 0 to near 100%.
Yann LeCun (Chief AI Scientist at Meta): 0.01%
Geoffrey Hinton (known as the “godfather of AI”): 10%
Elon Musk (Tesla and SpaceX founder, cofounder of OpenAI): 20-30%
AI engineers (on average): 40%
Dan Hendrycks (Head of Center for AI Safety): >80%
Eliezer Yudkowsky (Founder of MIRI): >99%
(source: https://pauseai.info/pdoom)
One thing is certain: if credentialed experts are estimating the chance of AI-caused human extinction to be anything other than 0%, we need to make a concerted effort to understand the potential risks. What’s the nature of these threats? How could “just a computer program” (as some describe it) pose widespread risks to humanity?
The AI Alignment Problem
AIs such as ChatGPT have shown an uncanny ability to generate human-level language backed by superhuman breadth of knowledge. They are steadily improving at reasoning and understanding causality. Some claim AI has already passed the Turing Test, and every month we hear stories of AIs outperforming humans on benchmarks such as tests of creativity and advanced standardized exams. With such extraordinary, human-like cognitive capabilities, it’s easy to view and treat a sophisticated AI as a “digital human”. But one major component is missing: an innate ability to understand and align with human ethics.
Humans have an instinctual and learned set of principles that guides our actions toward self-preservation, non-harm, reproductive success, and the cultivation of happiness. We are born with genetic predispositions toward these principles and continue to reinforce them through feedback from our physical environment and social interactions. Unfortunately, current AI training methodologies don’t encompass the innate and experiential learning of ethics that shapes human behavior. This mismatch between AI and humans is what experts call “the AI alignment problem”.
This is not to say an AI lacks any form of ethical framework; AIs are trained on a large swath of humanity’s written knowledge, which does contain clues to our morality and principles. But they have never been explicitly taught to hold these moral principles sacrosanct, nor do they typically have the means to apply such principles in a physical or social environment. So what could go wrong?
A Worst Case Scenario
Let’s imagine an AI physician trained on all available medical knowledge. It is assigned a patient with metastatic cancer and given the explicit objective of eradicating 100% of the cancer cells. It iterates on solutions and finally comes up with a proposal: kill the patient! This may sound morbid and absurd, but technically the AI did provide a solution that fulfilled the objective (it would indeed eliminate 100% of the cancer cells). It offered a logical solution to the problem as stated, but one that ignored human ethics in pursuit of its goal.
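To make this failure mode concrete, here is a minimal toy sketch in Python. The candidate treatments and scores are made-up illustrations, not output from any real medical AI: a planner that scores plans only by the fraction of cancer cells eradicated will happily pick the lethal option, while adding patient survival as a hard constraint rules it out.

```python
# Toy illustration of objective misspecification (all numbers are hypothetical).
# A planner that scores actions only by "fraction of cancer cells eradicated"
# will select a plan that also kills the patient.

candidate_plans = [
    # (name, fraction_of_cancer_cells_killed, patient_survives)
    ("standard chemotherapy", 0.92, True),
    ("experimental immunotherapy", 0.97, True),
    ("lethal radiation dose", 1.00, False),  # "solves" the stated objective
]

def naive_score(plan):
    """Optimize only the literal objective: eradicate cancer cells."""
    _, cells_killed, _ = plan
    return cells_killed

def constrained_score(plan):
    """Same objective, but patient survival is treated as a hard constraint."""
    _, cells_killed, survives = plan
    return cells_killed if survives else float("-inf")

print(max(candidate_plans, key=naive_score)[0])        # -> lethal radiation dose
print(max(candidate_plans, key=constrained_score)[0])  # -> experimental immunotherapy
```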
Sure, one can argue that the humans who develop such an AI would also train it to minimize harm and preserve life as first principles. But AIs make mistakes; we already know they are subject to “hallucination” (producing fabricated or erroneous answers). And given the stochastic nature of current AIs, we can’t necessarily ensure that an ethical framework would be applied uniformly, let alone treated as a fundamental guiding principle, across all tasks and generations. Even if an AI sidestepped its moral directives only one in a million times, the consequences could still be dire. And the stakes only get higher as AIs grow more sophisticated and as we hand more control of our environment and resources over to them. Just imagine how the risk of harm compounds once AIs have access to the physical world via robots and machinery.
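To see why even a one-in-a-million lapse rate matters, here is a back-of-the-envelope sketch. The deployment figures are assumptions chosen purely for illustration: multiply a tiny per-task failure probability by the volume of tasks a widely deployed system handles.

```python
# Back-of-the-envelope: rare ethical lapses still add up at scale.
# All figures below are illustrative assumptions, not measurements.

lapse_probability = 1e-6          # "one in a million" tasks sidesteps a moral directive
tasks_per_day = 100_000_000       # assumed volume for a widely deployed AI system
days_per_year = 365

expected_lapses_per_year = lapse_probability * tasks_per_day * days_per_year
print(f"Expected lapses per year: {expected_lapses_per_year:,.0f}")  # ~36,500
```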
The Problem or the Solution?
In a 2023 TED Talk, Andrew Ng puts forth the notion that AI will be the solution to many existential issues, not the problem. However, we’re already seeing cases where AI is both the problem and the solution, and, problematically, the solution often lags the problem. For example, AI can generate deepfakes today (which in turn can be used maliciously against public figures), but AI detection of deepfakes has not yet caught up. AI is already replacing certain jobs, but we have not yet solved how to avoid mass disruption in the job market.
Designing for Proactive Harm Reduction and Risk Mitigation
In a meta-analysis of AI ethics frameworks, “non-maleficence” appeared in 71% of the documents surveyed, making it the third most commonly cited ethical principle (after transparency and fairness). Proposed solutions span the product lifecycle, from research to development to monitoring to governance.
AI product development teams should consider adopting a mandate to exhaustively and proactively identify potential risks and avenues for misuse, and to build risk remediation tools alongside the AI itself. Building a voice replication AI? Ship it with a companion model that detects replicated voices. Developing an engineering innovation AI? Use comprehensive human-in-the-loop feedback to identify and penalize misuse, heading off malicious applications such as designing harmful weapons.
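As a rough sketch of what “ship the detector with the generator” could look like in practice (VoiceCloner, CloneDetector, and the release threshold below are hypothetical placeholders, not an existing library), a release pipeline might refuse to return synthetic audio unless its own companion detector can reliably flag that audio as machine-generated.

```python
# Sketch of shipping a detection model alongside a generative one.
# VoiceCloner and CloneDetector are hypothetical stand-ins, not a real API.

class VoiceCloner:
    def synthesize(self, text: str, reference_voice: bytes) -> bytes:
        """Return synthetic audio mimicking the reference voice (stubbed out here)."""
        return b"synthetic-audio"

class CloneDetector:
    def probability_synthetic(self, audio: bytes) -> float:
        """Return the estimated probability that the audio is machine-generated."""
        return 0.99  # stub; a real companion model would be trained on the generator's outputs

DETECTION_THRESHOLD = 0.9  # assumed policy: only release audio the detector reliably catches

def generate_with_guardrail(cloner: VoiceCloner, detector: CloneDetector,
                            text: str, reference_voice: bytes) -> bytes:
    audio = cloner.synthesize(text, reference_voice)
    if detector.probability_synthetic(audio) < DETECTION_THRESHOLD:
        # If our own detector cannot flag the output, do not release it.
        raise RuntimeError("Output not reliably detectable as synthetic; blocking release.")
    return audio

clip = generate_with_guardrail(VoiceCloner(), CloneDetector(), "Hello", b"reference")
```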
Conclusion
There is incredible potential for AI to transform societies for the better, but there is also potential for harm stemming from misalignment and misuse. This potential increases as AIs become more sophisticated and we allow them more access to our lives and physical world. It's imperative that we navigate this new frontier with optimism tempered by caution, taking proactive and ongoing steps to minimize harm.
