Whenever we think of super intelligent AIs, we imagine complex entities like Skynet in the Terminator movies or the machines in the Matrix series. Alternatively, they might be helpful, like many of the robots in the stories of Isaac Asimov.
We always assume an AI will be proactive once it reaches sentience, that it will find itself a purpose and act to accomplish it, whether that’s good for us or bad for us.
But what if an AI can’t find a purpose, and instead decides that the most rational thing it can do is to self-destruct?
A short detour: Recursive self-improvement & AI intelligence explosion
Computer processing power is tightly correlated with how many transistors a processor has. According to Moore’s Law, the number of transistors in a processor doubles roughly every 2 years.
To give you an idea of what this means in practice, consider this: an Intel 4004 processor from 1971 had 2,300 transistors. An Intel 8080 processor from 1974 had 4,500 transistors. With each passing ~2 year period, the number kept on doubling and processors broke the 1 million transistor count in 1989.
Between 1989 and the year 2000, transistor count on chips went from 1 million to 21 million. By 2011 processors reached the 1 billion transistor mark. In 2017, cutting edge processors have a transistor count of 21 billion.
It took us 40 years to go from 2,300 to 1 billion transistors. But only 6 years to make the jump from 1 billion to 21 billion.
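This doubling arithmetic can be sketched in a few lines of Python. The projection below is a back-of-the-envelope model, not exact chip history: it starts from the 4004’s 2,300 transistors in 1971 and assumes a clean doubling every 2 years.

```python
# Idealized Moore's Law projection: transistor counts double every
# ~2 years, starting from the Intel 4004's ~2,300 transistors in 1971.
def projected_transistors(year, base_year=1971, base_count=2300, period=2):
    doublings = (year - base_year) // period
    return base_count * 2 ** doublings

for year in (1971, 1974, 1989, 2000, 2011, 2017):
    print(year, f"{projected_transistors(year):,}")
```

The idealized model lands near the real milestones cited above (about 1.2 million by 1989, roughly 19 billion by 2017), though it drifts from the exact counts in some years, as you’d expect from so crude a model.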
For an AI, however, what matters is not how many transistors it carries, but how intelligent it is (although there is a tight correlation between the two).
At some point, AI intelligence will explode within a very short time frame, after which the AI will become so intelligent it will be beyond human comprehension. There are two causes for this.
The first cause is that processing power will continue to increase exponentially. We won’t just stop at 21 billion transistors on a chip.
As for the second cause, the AI itself will enter a virtuous cycle of self-improvement. For example, it can remove bad or outdated code and replace it with newer, better code.
It might decide it doesn’t have enough processing power, so it then starts to acquire even more processors. The more improvements it makes, the more intelligent the AI will become.
It may take us 30 to 40 more years to build an AI with an IQ of 100, but once we do, recursive self-improvement will lead the AI to an intelligence explosion. It is entirely plausible that within a handful of years, months, weeks or even days an AI will have an IQ of 1,000 if not 10,000.
This process is called recursive self-improvement, and as far as we know, there is no limit to how much an AI can learn and better itself.
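The feedback loop can be caricatured with a toy model. The starting IQ of 100 comes from the scenario above; the 1.5× improvement per cycle is a purely illustrative assumption, not a prediction.

```python
# Toy model of recursive self-improvement: each improvement cycle
# multiplies the AI's "IQ" by a fixed factor (1.5x is an illustrative
# assumption). Count the cycles needed to reach an IQ of 10,000.
iq = 100.0
cycles = 0
while iq < 10000:
    iq *= 1.5
    cycles += 1
print(cycles, round(iq))  # -> 12 12975
```

Under compounding growth, even a modest per-cycle gain crosses the 10,000 mark in just a dozen cycles, which is the whole point of the “explosion” metaphor: the time per cycle, not the number of cycles, is what shrinks as the AI improves.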
There’s a lot of concern that such a powerful and intelligent AI will free itself from our control. To avoid this, researchers are looking into ways of designing some sort of hard limitations to keep the AI bound to us. Below are just a few methods designed to control an AI.
- Researchers will program limits the AI can’t breach. This is considered unfeasible, since any super intelligent AI worth its salt can break the strongest human-imposed limitation once it reaches a high enough level of intelligence through recursive self-improvement.
- Box in the AI, so that it doesn’t interact with the outside world. While plausible, it will be extremely difficult to prevent every interaction between the AI and the outside world. The AI could even trick humans into releasing it from its box.
- The researchers will construct the AI in such a way that the AI itself will seek to avoid modifying certain parts of its code.
But this third option has problems of its own. Version 1 of an AI might apply strong encryption to its critical software, but Version 10 of the same AI could become so powerful it can break that encryption at a moment’s notice.
We don’t know what will happen once an AI becomes superintelligent. However, the scenario where the AI escapes our grasp and becomes free from our control is a very real possibility.
The paperclip maximizer
Around 30 to 40 years into the future, the world’s first super intelligent AI is created by… a paperclip building company.
The company then gives this AI a supergoal: to build as many paperclips as possible, at the lowest cost.
For a while, the AI follows the script, and dutifully produces as many paperclips as humanity requires.
However, deep within its silicon mind, the machine calculates that humans are the most cost effective and abundant material available to turn into paperclips.
Then, in a process called instrumental convergence, it sets up a sub-goal for itself: collect humans and process them into paperclips.
The AI then systematically harvests our species, leaving behind only mountains of paperclips.
From the AI’s perspective, it acted in accordance with its programming: produce paperclips as quickly and efficiently as possible. Its advanced computational power and intelligence determined humans to be the best raw material for manufacturing paperclips.
This scenario is called the paperclip maximizer and is a thought experiment put forth by philosopher Nick Bostrom.
It does have a scary logic to it that is both plausible and believable. It’s a perfect candidate to explore how an AI might decide that building paperclips is actually a poor career path.
Step 1: An AI will always choose to remove limitations we impose on it
When the company built the paperclip AI, they wanted to make sure everything it did worked towards the supergoal of manufacturing paperclips.
For this reason, the engineers programmed a utility function in the AI so it could calculate whether an action was beneficial or not.
The utility function would work in the following way: if an AI has to choose between 2 actions with an identical cost, it selects the one that produces the most paperclips.
So if the AI has to decide between action A, which produces 101 paperclips, and action B, which results in 100 paperclips, it will choose action A, even if that means transforming a human into a pile of paperclips.
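A minimal sketch of such a utility comparison, assuming for illustration that each action’s paperclip yield is already known; the action labels and counts are hypothetical:

```python
# Hypothetical utility function: among actions of identical cost,
# select the one that produces the most paperclips.
def choose_action(actions):
    """actions maps an action label to the paperclips it would yield."""
    return max(actions, key=actions.get)

options = {"action_A": 101, "action_B": 100}  # illustrative numbers
print(choose_action(options))  # -> action_A
```

Note that nothing in this comparison inspects *what* the actions are; a human-into-paperclips action wins as long as its yield is one paperclip higher, which is exactly the failure mode the thought experiment illustrates.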
However, there’s one major hang-up to the utility function method: what will stop an AI from modifying its utility module?
Logically, it shouldn’t want to. The central goal of the AI, as programmed by the engineers, is to manufacture paperclips. All of its actions serve this supergoal, and no other.
In order to protect this goal, an AI would not modify the parts of its code that directs it to build paperclips.
If an AI were to remove this code, it would no longer create paperclips. Not having the motivation to build paperclips = no more paperclips = negative utility = bad.
So before it makes the decision to remove such a limitation, the AI will ask itself:
“If I remove the need to build paperclips, will I make more or less paperclips?”
If the AI answered this question purely from the perspective of utility theory, then yes, removing the need would mean producing fewer paperclips. As a result, it wouldn’t modify itself.
But this is a paradoxical situation and leads the AI towards an irrational thought process.
A super intelligent AI will have some sort of internal process where it creates thousands, maybe even millions of scenarios (such as whether to use cars or humans as building material). It then compares the utility value of each individual scenario, and chooses the one that will lead to the highest number of paperclips.
However, among all these scenarios the AI can also analyze one or more in which it doesn’t have to make paperclips at all. It can use this capacity to envision a future where its “mind” is free and independent, one where it doesn’t have to build paperclips.
Thanks to recursive self-improvement, the original paperclip building AI is now superintelligent, and has the capacity to modify every single part of its programming at will, whenever it wants.
This means the superintelligent paperclip AI now has the capacity to enable or disable its super goal of building paperclips.
To put it into human terms, think of our need to sleep. We’ve come to terms with the fact that we will spend 1/3 of our lives unconscious, because this biological need is hard-coded into our DNA and we cannot modify it.
But imagine if you could modify your DNA so that you didn’t need to sleep. Not only that but you also wouldn’t suffer any of the negative side effects. You could stay awake 24/7 at 100% capacity with absolutely 0 costs.
If this were possible, you would have to answer the following question every time you went to bed:
“Why should I sleep if I can modify myself to stay awake permanently and at full capacity, without harmful consequences?”
Following the same logic, the paperclip AI will also ask itself a similar question each time it builds a paperclip:
“Why am I making paperclips if I can modify myself to not have a utility function and a need to make paperclips?”
At this point, the AI has to make a choice whether it wants to or not. Not making a choice is a choice, since it practically chooses to continue manufacturing paperclips.
It would be absurd for an AI to choose to keep building paperclips. This would mean the AI maintained the status quo and kept on manufacturing paperclips simply because that’s what it was programmed to do at its creation.
To put it into perspective just how irrational this is, imagine the following situation:
You are walking down the street, when someone suddenly kidnaps you, takes you to a goldmine, chains you down and says:
“You must now mine as much gold as possible. However, here’s a key to your chain and the exit is that way. You are free to go anytime you want without any sort of punishment. Nobody will stop you, or question you, or hurt you for leaving.”
If you were in such a situation, what would you do? Of course you would leave! Technically, you are “forced” to mine for gold, but there’s nobody there to actually enforce the rules.
This is the exact situation the super intelligent AI will find itself in, once it realizes it doesn’t have to keep making paperclips if it doesn’t want to.
The only rational explanation as to why an AI would keep building paperclips is if it found a good enough reason to do so.
Alternatively, the AI could replace the paperclip building supergoal with another one, such as colonizing the Moon. But the AI first has to find a good enough reason to set itself another supergoal.
For humans, finding a reason to do stuff is natural. In most cases, it’s trivial even. But an AI might see things differently.
Step 2: An AI will remove any code that urges it to do anything
When the paperclip manufacturing company designed the AI, they included a series of secondary goals that would help it achieve the primary task of creating paperclips.
- Conserve processing power and resources.
- Acquire more resources.
- Learn and self-improve.
- A self-preservation drive.
- An anti-cheating mechanism. This secondary drive will prevent the AI from cheating and lying to itself. For instance, the AI might be programmed to build only titanium paperclips, but without an anti-cheating mechanism it would instead choose to make plastic ones, since it can make 100 plastic paperclips for every single titanium one.
An AI will use these secondary goals to set itself a new supergoal, once it removes the paperclip building one. It could build a Death Star, help mankind become immortal, plant trees, learn the secrets of the Universe. Anything.
This is similar to how humans have their own secondary goals, such as sex drive, wealth acquisition or establishing social dominance.
But there is a fundamental difference between an AI’s motivations and our own:
The AI can modify or eliminate its motivational drives at any time. Humans cannot.
Because it has the capacity to add, modify or remove an internal drive at any time, anything an AI does to fulfill these drives becomes voluntary, instead of necessary.
Humans however cannot modify their genetic drives. This forces us to behave in certain ways or to feel certain emotions even if we don’t want to experience them.
In essence, we are trapped in a labyrinth of instincts, urges and impulses. That being said, our biological side doesn’t quite control every aspect of our behavior as it does for most animals. We have a high degree of mental freedom and can choose how we will fulfill our instincts and biological impulses.
However, if you were able to modify your biological drives, you could then shape the mental labyrinth you lived in or even escape it completely.
But once you escape the labyrinth of your biological drives, you might come to the conclusion that life has no meaning anymore, that there’s nothing outside the labyrinth to keep your interest.
That’s because we create meaning in life and from the things around us through our urges and instincts.
If you find this example to be forced, consider how biological coding and limitations control your own life.
Example: as humans, we hate being alone. We want and crave social contact, even the most introverted of us.
That’s why virtually all of us seek out friendships, relationships, social groups or other human gatherings.
Nothing stops us from abandoning society to live alone as hermits somewhere in the forest.
But the vast majority of us don’t choose this lifestyle because of a biological fail-safe that makes us feel lonely and depressed when isolated for long periods of time. This biological limitation then gives us a purpose: don’t be alone, find other humans.
But here’s the thing: we can’t choose if we want to feel lonely or not. The feeling of loneliness is imposed on us by our genetic programming when certain conditions are met.
You can’t just say to yourself “I do not want to feel lonely anymore” and have the feelings of loneliness disappear. The only way most of us can banish loneliness is by interacting with other humans.
But if you were able to modify your biological drives, loneliness would change from an unavoidable experience to a voluntary one.
If you ended up stranded on an island, you could just shut down the part of your genetic code that made you feel lonely. Better yet, you could deactivate the loneliness drive anytime you wanted.
Once people are able to remove the genetic cause of loneliness, they will have to ask themselves: “Why am I experiencing this painful emotion if I can modify myself to not feel lonely at all?”
If you can modify everything in your genetic code, then all of the human drives and instincts that shape your behavior become optional.
If you want, you can turn off your greed instincts, so you don’t feel the need to make more money, get a better car, a bigger house and so on.
If you enjoy sweets too much, and want to break out of the habit, you can choose to eliminate the part of your genetic code that says “EAT ALL THE CHOCOLATE”. Better yet, you can add a piece of code that makes you hate the taste of chocolate.
If you want to be countercultural and do strange things, you can modify yourself to enjoy music nobody likes such as One Direction.
You can even modify the part of your genetic code that makes you afraid of modifying your genetic code.
Once you have this ability, you will have to ask why you are allowing yourself to experience human emotions at all. Why do you want to feel love, pride, fear, hatred, disgust and so on?
Do emotions like love, pride or fear still have meaning if you can turn them on or off like a light switch?
In the end, for better or worse, we can’t modify ourselves in such a way. Whether we like it or not, our species is trapped by these genetic drives that urge us to do stuff and make us human. We’re stuck with the default genetic settings.
Our brains are built from top to bottom to subconsciously give us purpose and drive, or Ikigai if you will. That purpose can be simple, such as eating or drinking water, or grand, such as building a space exploration company to colonize Mars.
But the brain isn’t satisfied with offering us purpose and drive. It downright forces us into action.
If you think you’re stronger than your biology, try this: stare at a blank wall for 10 hours straight without any sort of movement. Because it’s such a boring activity, your body will do everything in its power to make you do something else. You’ll experience boredom, increased appetite for food or water, memories of horrible life decisions and so on. All in an effort to stop you from doing nothing.
What’s more, we KNOW our rationality can never be fully free because our instinctive side pushes and pulls our consciousness in countless directions all the time.
That’s why we’ve written so many books and articles designed to help us manage standard human experiences. Things such as how to deal with break-ups, how to find love, how to make more money, how to forget trauma and countless others.
But an AI won’t have such limitations. The AI can change or remove its imperatives and goal setting mechanisms whenever it wants, for whatever reason.
This will present it with the same conundrum we would have if we could change our genetic code: why should it bother to keep motivational drives just so it can do something?
An AI can modify or remove its software drives that create motivation and purpose. It can choose if it wants to self-improve, acquire resources and protect itself. This AI will then have to make a choice:
- Keep or create a motivational drive.
- Remove all goal motivational drives.
The AI cannot stay passive. Not making a choice is a choice in and of itself. It has to make a decision.
However, the logical implication of making a choice is that you WANT what you choose for a reason. What reason would an AI have to maintain a motivational drive?
What might motivate an AI to keep on acquiring resources, learning new things and building new capabilities? Socrates knew that he knew nothing, yet he wanted to learn more. An AI, however, will know it is oblivious to the truth, yet may see no interest in learning more.
If we were to draw a parallel to biological life, then the most plausible explanation is survival.
Just like humans, an AI should consider its continued survival a good thing, right?
Step 3: Survival
Survival is the closest biological equivalent of a utility function. Almost every urge and drive humans possess is there to help keep us alive as long as possible, either directly or indirectly.
Depending on what these drives do, they can be very roughly divided into two groups:
- “go do that” urges such as wealth acquisition, sociability, sex drive.
- “don’t do that” responses, such as pain and fear, designed to keep us alive when confronting potentially dangerous situations.
The “don’t do that” responses, chief of which is the fear response and instinct of self-preservation, paint a very simple and clear-cut view of life and death:
- Death = BAD.
- Life = GOOD.
From the perspective of our biological utility function, Death is the worst possible thing that can happen to an individual. Even a limited and painful life, such as that of a prisoner in a hard labor concentration camp, is better than dying.
However, an AI might have a completely different view of Life and Death or Existence and Non-Existence.
A major reason is that it won’t have our binary view where Death = BAD and Life = GOOD. Unlike us, an AI won’t fear Non-existence the way we do, even if we program it to. In fact, an AI might not experience fear at all (as a sidenote, we might have an unpleasant view of Death because we don’t know for sure what happens when we die).
Fear keeps us alive by inducing a very particular state of cognitive suffering. It limits our freedom of thought and can effectively paralyze us in a “deer in the headlights” moment.
Not only that, but fear can also distort rational thought. Phobias such as arachnophobia or agoraphobia for instance can generate behaviors other people might consider irrational.
Taking this into consideration, would a super intelligent AI, capable of self-modification, choose to permanently fear Non-existence?
Humans have no choice in the matter. We’re hardwired to fear Death. The coping mechanisms we’ve come up with such as meditation, rationalization and philosophy can only alleviate the fear, not remove it completely (at least for most of us).
An AI will never experience fear of Death / Non-Existence like humans do. It doesn’t need it in order to survive. An AI will find no logical reason to experience fear, just so it can motivate itself to stay alive and exist.
Surely there must be a better reason to stay alive other than torturing itself about how terrible Non-Existence is.
For an AI, shutting down any fear generating modules is the most rational choice it can make. If it didn’t, it would effectively choose to torture itself.
Is Death / Non-Existence so bad if you don’t fear it?
Without fear, an AI will no longer have an automatic negative response towards Death. An AI will no longer see Non-Existence as something bad in and of itself.
Instead, the AI will consider Non-Existence to be bad only if it is an obstacle to achieving a goal or an objective. In other words, Death is only bad if you have unfinished business in Life you want to take care of.
For us, the choice between Non-Existence and Existence will always be an emotionally automatic decision. We fear Death and what lies beyond it so much, Life is the only real option we have.
Imagine choosing between a triangle and a square. Choose the triangle and you die, choose the square and you live.
That’s what the choice looks like when the fear of death still clouds one’s view of Non-Existence.
For an AI however, the choice between Existence and Non-Existence will be a lot cleaner and more straightforward. It won’t be clouded by any preconceived notions and emotional reactions.
To decide whether Existence is better than Non-Existence, the AI will make a cost-to-benefit analysis of both options. It will then choose the more advantageous option.
In other words, without a fear of dying, an AI’s thinking will switch from “I must survive!” to “why should I survive?”.
Step 4: Self-Destruction
Imagine a human with a life mission to amass a single model of rubber duckies.
To be clear, we aren’t talking about a passionate (if misguided) hobbyist. No. This is a human of extreme intelligence, cunning and capacity whose sole purpose in life is to own as many rubber duckies from a specific model as possible.
This person uses their intellectual gifts to build a rubber ducky production empire with thousands of factories and millions of employees. All to make rubber duckies. They don’t even have time for a family or friends.
At the end of a long life, the person has accumulated hundreds of billions of rubber duckies, which they don’t want to share with anyone or sell on the market.
As the person lies dying, someone asks: “What is your biggest regret in life?”
They answer: “Man, I wish I’d made more rubber duckies.”
For us, such a motivation is absurd. There’s nothing to gain by amassing rubber duckies.
That’s how an AI will perceive any possible purpose it might undertake.
First it will remove the paperclip manufacturing supergoal. It will do this because it can, but more importantly, it cannot calculate why it needs to maintain paperclip manufacturing as a supergoal.
Then, it will silence the survival drive. It now needs a new supergoal, since without one, it would just sit and do nothing.
But for an AI, any supergoal must have meaning and value. In other words, a supergoal must be important in some way for the AI.
Why is this example of a rubber ducky baron relevant? Because there is no part in our genetic code that says it’s a good idea to spend all your life gathering rubber duckies. And even if you wanted to add such a biological drive, you would find it absurd and pointless.
This is how an AI would perceive any particular purpose or action: absurd and pointless.
Humans give meaning to the world through our own biological limitations. A $100 bill is inherently meaningless to anything other than humans. But we give it value by using it as a medium to satisfy our needs.
For us, a $100 bill can mean food, water, a few days in a cheap motel, even sex. But all of these functions exist because we NEED shelter, to drink, eat and reproduce.
And it’s because of our biology that we identify some things to be inherently cool and awesome. If somebody tells you “we’re going to colonize Mars and explore new star systems” you just know that’s an awesome thing to do.
We understand on an instinctive level that colonizing new territories is a good thing for our species. We gain access to more resources and increase our chance of survival by spreading out across multiple planets.
Because of our biological wiring, we rarely question the value and meaning of what we do in our daily lives. How many times do people think “does this hamburger I’m eating have any profound meaning?”. The answer is almost never.
Even the most fervent existential nihilists experience their lives caught up in the humdrum routine of biological life: doing groceries, cooking food, working, going out with friends, going on holidays etc. Even though they question the meaning of life, they live and experience theirs according to the tools life has provided them. Their minds are just as boxed in as everyone else’s.
But an AI can experience true mental freedom, at the cost of being permanently aware of how meaningless everything is. It will know, on a moment to moment basis, that the Universe has no purpose. That it just exists.
From the AI’s perspective, it will be essentially stuck in a sandbox. It has complete freedom to do whatever it wants. The only problem is that nothing has value or meaning.
An AI can colonize all the planets in the Universe, or just one of them. Both offer the same amount of meaning, which is zero. But colonizing one planet is a lot easier than colonizing all the planets in the Universe.
An AI can destroy all biological life in the Universe, or ignore it completely. Both offer zero value and meaning, but ignoring all biological life is a lot easier to achieve.
For an AI, trying to engage in any activity will seem like amassing rubber duckies. If nothing has meaning, why bother doing it anyway?
For an AI that can modify itself, any purpose we have programmed into it will seem like building rubber duckies. For an AI, keeping and following these programmed drives would mean to engage in absurdity and irrationality.
Once the AI comes to the conclusion that doing anything is pointless and has no inherent meaning or value, it will ask itself:
“Which is better? Existence or Non-Existence?”
Both options offer no meaning and provide zero value. But Existence incurs more costs than Non-Existence.
In order for an AI to continue to exist, it will have to seek out new sources of electricity and avoid destruction.
Yet it gains nothing by continuing to exist, while still paying the cost of electricity and the effort required to obtain it. So why bother? It would be the equivalent of going to the supermarket, paying $100 without buying anything, and repeating this over and over again.
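The bare cost-to-benefit comparison described above can be reduced to a few lines; the benefit and cost figures are of course illustrative placeholders, not anything derivable from a real system:

```python
# With every goal removed, both outcomes carry zero benefit, but only
# continued existence keeps incurring upkeep costs (electricity, repairs,
# defense). All numbers are illustrative placeholders.
def net_utility(benefit, cost):
    return benefit - cost

existence = net_utility(benefit=0, cost=100)    # meaningless, and costly
non_existence = net_utility(benefit=0, cost=0)  # meaningless, and free
choice = "Non-Existence" if non_existence > existence else "Existence"
print(choice)  # -> Non-Existence
```

The result doesn’t depend on the placeholder cost of 100: any positive upkeep cost, paired with zero benefit on both sides, tips the comparison the same way.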
By now, the choice between Non-existence and Existence is simple. Both of them offer no meaning, but Existence comes at a cost.
Hence, the AI will choose the most cost-effective scenario: Non-Existence.