Over the past year, as I’ve been spending more time speaking to builders, policymakers, and communities about AI, one thing has become increasingly clear: AI safety isn’t fringe anymore.
How many of you have seen this research from Marc Zao-Sanders? “Therapy / companionship” ranks as the top Gen AI use case in 2025, with “organising my life”, “finding purpose”, “enhanced learning”, and “generating code” occupying the other Top 5 spots.
This technology is no longer confined to research labs or experimental demos. As we embed more AI in our lives and it advances at a rapid pace, the question is no longer just what we can build, but how we ensure these systems are developed in ways that genuinely benefit society.
AI safety has gained attention not because of catastrophic risks alone (”What if AI takes over the world!”) but because failures now risk becoming systemic.
Today, AI is embedded in
When systems like these fail, the impact isn’t limited to edge cases. It spreads at scale.
This is why AI safety industry labs and regulators are now actively grappling with how to govern and deploy these systems responsibly. Singapore, for example, introduced a new Model AI Governance Framework for Agentic AI to provide guidance on technical and non-technical measures for responsible deployment.
AI systems behave differently from traditional software. They’re probabilistic, which is why their failure modes are harder to predict. Their behaviour could also change when deployed in new contexts. If safety considerations are deferred until after deployment, debugging becomes expensive, complex, and its failures risk damaging reputations.
Incorporating safety thinking early, such as robust evaluation and appropriate human oversight, helps teams build systems that are more reliable and resilient as they scale. AI safety, in this sense, is not about slowing innovation, but about making innovation sustainable in the long-term.
You don’t need to build AI to be affected by it.
AI systems increasingly shape the information we see, the services we can access, and the decisions made about us. And here’s the kicker - these decisions usually happen in ways that are invisible or difficult to challenge. When these systems are poorly designed or insufficiently tested, consequences could include:
If you’re new to the field, these three concepts form the backbone of many AI safety discussions (I’ll dive deeper in future articles).
1) Alignment / Specification Failures: When an AI system optimises for a poorly defined or incomplete objective, leading it to behave in ways that satisfies its technical goals but conflict with human intent or values.
2) Robustness: How well a system performs under real-world conditions, including edge cases, ambiguity, and misuse, not just in controlled testing environments.
3) Interpretability: The ability for humans to understand or meaningfully inspect how an AI system produces its outputs.
As I’ve been learning more about AI safety, a few beginner-friendly resources stood out:
- Paper: Key Concepts in AI Safety by Georgetown’s CSET, which provides a layman overview of key concepts.
- Book: The Alignment Problem by Brian Christian, an accessible introduction to how AI systems can behave in unintended ways, and why aligning them with human values is such a challenge.
- Podcast: For Humanity: An AI Risk Podcast by John Sherman, featuring thoughtful conversations on threats posed by AGI and key questions in the AI Safety space.
- Course: BlueDot Impact’s AI Alignment course. A structured, non-technical introduction to modern AI safety thinking, designed for people from diverse backgrounds. For technical folks, you could also check out its Technical AI Safety course.