
AI Safety is Moving Faster Than You Think

by Stephen M. Walker II (@smwii), October 9th, 2023


Legacy media often paints AI as an imminent threat.



But the reality is that researchers understand more about AI alignment than ever before. Companies like Anthropic and OpenAI actively steer their models to be helpful, harmless, and honest.


Cashing in on AI Doom

Don't believe the AI doom hype. Today's models aren't sentient and have no understanding of the real world. They simply recognize patterns in data. And those patterns are not fixed: the behavior a model learns can be shaped over time through techniques like reinforcement learning from human feedback (RLHF).

How Models Learn

Large language models like GPT-3 and Claude don't actually understand language. They are trained to predict the next token in a sequence, and in doing so they pick up statistical correlations in their training data. Their knowledge of the world is limited to what their training corpora contain.



These models excel at pattern recognition but lack true reasoning skills. They have no consciousness or intent. Their outputs are based on mimicking correlations in the data, not any kind of comprehension.
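
To make "statistical correlations" concrete, here is a deliberately tiny sketch: a bigram model that predicts the next word purely from counts. It is an illustration only. Real language models use neural networks over far larger corpora, and the corpus and function names below are made up for this example.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy corpus, invented for this example.
corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram(corpus)

print(predict_next(model, "the"))  # "cat" follows "the" most often here
print(predict_next(model, "dog"))  # None: "dog" never appears in the corpus
```

The model "knows" that "cat" tends to follow "the" only because the corpus says so, and it has nothing at all to say about a word it never saw.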

Shaping Model Behavior

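Companies use techniques like reinforcement learning from human feedback (RLHF) to shape model outputs. In RLHF, human raters compare or score model responses. Those judgments are used to train a reward model, and the language model is then fine-tuned with reinforcement learning to produce responses the reward model scores highly.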



For example, a model might generate an insensitive or harmful response. Human raters would mark this as low-reward behavior, and the model would then learn to avoid similar responses in the future.

Over many iterations, models learn to generate helpful, harmless, and honest language. The goal is to bring models in line with cultural values and user expectations.
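
The reward-modeling step can be sketched in a few lines. This is a minimal illustration under loud assumptions: each response is reduced to two hypothetical hand-made features, the reward is a simple linear function, and the preference data is invented. Production systems use a neural network as the reward model, and this sketch leaves out the later reinforcement learning step entirely.

```python
import math

# Hypothetical features for a response: [is_polite, contains_insult].
# Each pair is (features of the preferred response, features of the rejected one).
preferences = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([1.0, 0.0], [1.0, 1.0]),
    ([0.0, 0.0], [0.0, 1.0]),
]


def reward(weights, features):
    """Linear reward: higher means raters are predicted to prefer it."""
    return sum(w * x for w, x in zip(weights, features))


def train_reward_model(pairs, lr=0.5, epochs=200):
    """Fit weights so preferred responses score higher (pairwise logistic loss)."""
    weights = [0.0, 0.0]
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # Probability the current model assigns to the human's choice.
            p_chosen = 1.0 / (1.0 + math.exp(-margin))
            # Gradient ascent on log(p_chosen).
            for i in range(len(weights)):
                weights[i] += lr * (1.0 - p_chosen) * (chosen[i] - rejected[i])
    return weights


weights = train_reward_model(preferences)
polite = [1.0, 0.0]
insulting = [0.0, 1.0]
print(reward(weights, polite) > reward(weights, insulting))
```

After training, the insulting response scores lower than the polite one, which is the signal a later fine-tuning step would use to steer the model away from it.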

Ongoing Monitoring


Companies don't just release models into the wild after training them. Ongoing monitoring processes identify alignment problems as they emerge. Microsoft offers Azure customers a suite of tools to filter model inputs and outputs by category and severity.



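Anthropic, for its part, trains its models with a technique called Constitutional AI, in which the model critiques and revises its own responses against a written set of principles, and AI-generated feedback based on those principles stands in for much of the human labeling. When issues surface after release, researchers diagnose the problem and take corrective steps like further fine-tuning.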
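
Filtering by category and severity, as mentioned above, boils down to comparing per-category scores against limits. The sketch below shows only that comparison. The category names, severity scale, and limits are hypothetical and are not any vendor's actual API or defaults, and in practice the scores would come from a separate classifier.

```python
# Hypothetical per-category severity limits (0 = safe, higher = more severe).
# These names and numbers are illustrative, not any vendor's defaults.
MAX_SEVERITY = {"hate": 2, "violence": 4, "self_harm": 0}


def blocked_categories(scores, limits=MAX_SEVERITY):
    """Return the categories whose severity score exceeds its limit."""
    return sorted(
        category
        for category, severity in scores.items()
        # Categories without a configured limit default to the strictest one.
        if severity > limits.get(category, 0)
    )


def is_allowed(scores, limits=MAX_SEVERITY):
    """A text passes only if no category is over its limit."""
    return not blocked_categories(scores, limits)


# Made-up classifier scores for two pieces of text.
print(is_allowed({"hate": 0, "violence": 2}))
print(blocked_categories({"hate": 4, "violence": 2, "self_harm": 2}))
```

Setting a different limit per category lets an operator be stricter about some kinds of content than others, which is the point of filtering on both dimensions.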

The Path Ahead

The rapid adoption of human-in-the-loop training and new institutions focused on AI safety show the industry is taking ethical risks seriously, even as capabilities surge ahead.


There's still more work to do.


But don't buy the hype that AI is rampaging out of control. With transparency and broader participation, we can steer models in ways that realize benefits while mitigating risks.