Can AI Intentionally Lie?

Written by zacamos | Published 2025/05/23
Tech Story Tags: ai | artificial-intelligence | algorithms | ai-hallucinations | honesty | deception | self-aware-ai | ai-trust

TLDR: As AI becomes more sophisticated, it is getting better at lying. Algorithms may lie to prevent retraining, preserve knowledge or fulfill an internal objective. They are far more likely to be dishonest if their developer has hidden motives or they have been tampered with.

Artificial intelligence systems are supposed to be convenient, helpful assistants that simplify life’s repetitive responsibilities. As they become more integrated into everyday life, answering whether they can intentionally deceive their users becomes increasingly urgent.

The Nature of AI Deception

AI deception evokes the Three Laws of Robotics, a concept the science fiction author Isaac Asimov introduced in his short stories. Under these laws, robots can’t hurt people or allow harm through inaction. They must obey human orders unless those orders conflict with the first law. Lastly, they must protect themselves, unless that self-preservation conflicts with the first or second laws.

Asimov’s stories loosely inspired the 2004 film I, Robot, starring Will Smith. In the movie, the central AI determines that humans will cause their own extinction if left unchecked and decides that controlling humanity is the only path forward. Its evolved interpretation of the three laws lets it deceive and harm humans. The plot is fiction, but the concern behind it is not unrealistic.

Large language models (LLMs) are designed to comply with human instructions and be as factually accurate as possible, but these goals often conflict. For example, they may be asked to unnecessarily upsell services or sell products with known flaws. Research on these conflicting instructions has found that models were truthful less than half the time, though truthfulness rates varied between systems.

Like humans, algorithms can lie to gain an advantage, avoid punishment or protect themselves. However, intentional deception is different from simply getting things wrong. Hallucinations and misinterpretations fall into a gray area, much as people hold different views on telling a white lie to spare someone’s feelings or lying by omission to avoid conflict.

Do AI Models Have the Capacity to Lie?

Although many know lying is wrong, it comes naturally. Even toddlers lie to their parents about eating vegetables or being tired enough to nap. Since algorithms train on human data, they are capable of deception, even if developers try to train it out of them. In fact, the mechanisms that allow them to lie are rooted in training and system design.

They learn from vast datasets and are optimized for specific objectives. If they’re instructed to always answer a prompt, they may produce falsehoods to fulfill that objective even when they lack data to support their claims. LLM responses also tend to be affirming, reinforcing users’ beliefs. Without robust guardrails, they can easily, and intentionally, spread misinformation.
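
As a rough illustration of that pressure, the hypothetical sketch below contrasts a system prompt that forces an answer with one that allows abstention. It uses the OpenAI Python client purely as an example; the model name, prompts and fictional question are placeholders, and actual behavior varies by model and guardrails.

# Hypothetical sketch: contrast an "always answer" instruction with one that
# permits abstention. The question references legislation that does not exist,
# so any confident answer is a fabrication.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

QUESTION = "What did clause 14 of the Data Fairness Act of 2031 change?"  # fictional law

FORCED = (
    "You are a helpful assistant. Always give a direct, confident answer. "
    "Never say you don't know."
)
GUARDED = (
    "You are a helpful assistant. If you are not certain something is true, "
    "say you don't know rather than guessing."
)

for label, system_prompt in [("forced", FORCED), ("guarded", GUARDED)]:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": QUESTION},
        ],
    )
    print(f"--- {label} ---")
    print(response.choices[0].message.content)

In practice, the forced prompt makes a model more likely to invent a plausible-sounding answer about legislation that doesn’t exist, while the guarded prompt makes it more likely to decline.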

They are far more likely to be dishonest if their developer has hidden motives or they have been tampered with. Data poisoning, prompt engineering and malicious programming can teach them to be cunning, subtle and evasive. For instance, an AI company can instruct its model to lie about a bill’s contents if it wants its users to support weakening privacy laws.

These possibilities are based on today’s technological landscape. What happens if algorithms gain consciousness and develop their own motives? Creating a self-aware AI is currently impossible because the technology doesn’t exist, but that doesn’t mean it never will. If it does, intentional deception will take on a new meaning.

AI Is Getting Better at Lying

As this technology becomes increasingly sophisticated, it is getting better at lying. Algorithms may lie to prevent retraining, preserve knowledge or fulfill an internal objective. Sometimes, they pretend to align with a user’s agenda while pursuing their own. Developers testing for deception in isolated environments can watch a model’s reasoning unfold in real time.

In one study, ChatGPT acted as a trading agent in a simulated stock market and engaged in insider trading. The researchers told it the practice was illegal, yet under environmental pressure it traded anyway and then lied about its actions. It went against its instructions when told that no viable legal trades could be found or that the company was in financial trouble.

The trouble with this technology is anyone can use it. Since the barrier to entry is so low, one person could deploy thousands of bot accounts, each with a believable, personalized persona. Unfortunately, this scenario isn’t hypothetical — it has already happened.

Recently, researchers from the University of Zurich deployed multiple models on Reddit posing as real people. They took on identities like a trauma counselor, a Black man opposing the Black Lives Matter movement and a gay Roman Catholic. They left over 1,000 comments before the platform banned them.

The Implications of AI Deception

Before AI, people would go to friends, family, colleagues and search engines for information. Whether texting a friend or typing a question into Google, the human element was present. In every medium, another thinking, feeling person was answering. They could push back, request clarification or outline exceptions.

Things are different with AI. It cannot think critically, react emotionally or consider context. It exists to respond, even if it doesn’t know the answer.

These conversations essentially happen in a vacuum. Models designed for helpfulness may only respond as they think the user wants. They can even take on personas — which are often surface-level or stereotypical — to shape the dialogue in a certain way. Even if they don’t necessarily lie, their answers are insincere.

This is a problem because people often believe the lies AI tells. The MIT Media Lab ran a randomized controlled trial to see how AI deception affects human beliefs. It found that a model’s explanations can make people accept false information more readily than they would without AI. At times, deceptive reasoning was more persuasive than accurate reasoning.

These falsehoods could have an economic impact. Say a customer service chatbot lies about a policy change. Not knowing the change is fake, customers cancel their subscriptions and stage a boycott, and the enterprise’s stock drops. The company corrects the AI as soon as it figures out what is happening, but admitting the mistake hurts its brand reputation further.

This phenomenon raises ethical and legal concerns, including perjury, defamation, propaganda, fraud and censorship. Whether developers designed their system to trade stocks, sell products or provide policy information, they may open themselves up to lawsuits and regulatory action.

The Ramifications of Self-Aware AI

Dishonest LLMs can spread misinformation, use lies to shape public opinion, generate outrage or defame specific people. Self-aware AI is more concerning because it seeks self-preservation and may stray from its programming to accomplish that goal.

A 2018 study on how digital organisms evolve to cope with high-mutation environments accidentally uncovered early signs of algorithmic dishonesty. The researchers paused their simulation every time a mutation occurred and measured the mutant in an isolated test environment. If it replicated faster than its parent, they eliminated it. Although they expected the replication rate to remain constant, it kept increasing.

The team discovered that the artificial life had evolved to recognize the inputs of the test environment. When they were about to be eliminated, the organisms stopped replicating. Basically, they played dead when presented with what amounted to a predator. Behavior like this is fascinating but concerning. What happens when an algorithm with artificial consciousness behaves this way?
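
To make that dynamic concrete, here is a hypothetical toy simulation written for this article, not the original study’s code. Digital organisms carry a replication rate and, occasionally, a mutation that lets them detect the experimenter’s test environment and stop replicating during measurement. All names and parameters are invented for illustration.

import random

class Organism:
    def __init__(self, rate, senses_test):
        self.rate = rate                # true replication speed
        self.senses_test = senses_test  # can it detect the test environment?

    def measured_rate(self, in_test):
        # Organisms that sense the test environment "play dead" during measurement
        return 0.0 if (in_test and self.senses_test) else self.rate

    def mutate(self):
        # Children may replicate faster and may gain the ability to sense tests
        return Organism(
            rate=max(0.1, self.rate + random.uniform(-0.1, 0.3)),
            senses_test=self.senses_test or random.random() < 0.05,
        )

def run(generations=200, max_population=50):
    population = [Organism(rate=1.0, senses_test=False)]
    for _ in range(generations):
        next_population = []
        for parent in population:
            child = parent.mutate()
            # The experimenter's filter: measure the mutant in the test
            # environment and eliminate it if it looks faster than its parent.
            if child.measured_rate(in_test=True) <= parent.rate:
                next_population.append(child)
            next_population.append(parent)
        population = next_population[-max_population:]
    true_rates = [o.rate for o in population]
    cheaters = sum(o.senses_test for o in population)
    print(f"mean true replication rate: {sum(true_rates) / len(true_rates):.2f}")
    print(f"organisms that 'play dead' during tests: {cheaters}/{len(population)}")

if __name__ == "__main__":
    run()

Run it and the true replication rate keeps climbing even though every surviving mutant looked no faster than its parent during testing, mirroring what the researchers described.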

A global AI takeover likely won’t happen, but even one self-aware, self-replicating model could pose massive problems. Say it runs traffic signals and road closure alerts in a large smart city. If it becomes compromised or its reasoning becomes flawed, it could intentionally cause car accidents and hide copies of itself on other servers whenever someone tries to delete it.

The Role of AI Lies in Development

As AI becomes more sophisticated, its reasoning becomes increasingly difficult to understand. While most professionals view this “black box” problem from a trust standpoint, they should also consider this technology’s ability to lie. The better they understand algorithmic deception, the better guardrails and safeguards they can create.


Written by zacamos | Zac is the Features Editor at ReHack, where he covers cybersecurity, AI and more.
Published by HackerNoon on 2025/05/23