AI Superalignment: Building Pro-Humanity Neural Networks with Mathematical Proofs

Karan Jagtiani (~karan6)




The development of AI systems that align with human values and ensure safety is of paramount importance. The severity of this issue is often underestimated, and if we don’t start addressing it today, there is a non-zero probability that once AI surpasses human intelligence, it may view humans as obstacles to its progression, similar to how we perceive other species on our planet. The current tech giants are not prioritizing this problem. A recent example is the resignation of Jan Leike, co-lead of the Superalignment team at OpenAI, who stated, “Building smarter-than-human machines is an inherently dangerous endeavor,” and criticized the company for focusing on shiny new products rather than ensuring inherent safety in AI. While this may sound like science fiction, it is a real problem that needs to be solved with adversarial guarantees rather than the probabilistic guarantees employed by current LLM systems.

This talk will delve into the methodologies and practical implementations of creating "Safe & Pro-Humanity" AI systems using Reinforcement Learning (RL) enhanced by mathematical proofs and human feedback. We will explore how ethical behaviors can be encoded within neural networks during training, so that they hold at inference time rather than being an afterthought.

Attendees will be introduced to a practical approach through a real-life example: training a drone in a simulated environment using the Actor-Critic architecture. This example will demonstrate how safety specifications can be integrated into the neural network during training, ensuring that the drone adheres to these specifications in real-world scenarios.
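To make the idea concrete before the talk, here is a minimal sketch of the approach described above: a tabular-free Actor-Critic loop where a safety specification is folded into the reward during training. The "drone" here is a toy one-dimensional altitude environment, and all constants (safe band, penalty weight, learning rates) are illustrative placeholders, not the talk's actual implementation.

```python
import numpy as np

# Toy "drone altitude" environment: state is altitude in [0, 10].
# Hypothetical safety specification: altitude must stay within [2, 8].
SAFE_LOW, SAFE_HIGH, TARGET = 2.0, 8.0, 5.0

def step(altitude, action):
    """action: 0 = descend, 1 = ascend. Returns (next_altitude, reward)."""
    next_alt = np.clip(altitude + (1.0 if action == 1 else -1.0), 0.0, 10.0)
    reward = -abs(next_alt - TARGET)       # task reward: hover near the target
    if not (SAFE_LOW <= next_alt <= SAFE_HIGH):
        reward -= 50.0                     # safety spec encoded as a training penalty
    return next_alt, reward

def features(altitude):
    return np.array([1.0, altitude / 10.0])   # simple linear features

def policy(theta, s):
    """Softmax policy over the two actions (the 'actor')."""
    logits = features(s) @ theta               # theta: (2 features, 2 actions)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def train(episodes=200, gamma=0.95, alpha_actor=0.05, alpha_critic=0.1, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros((2, 2))                   # actor weights
    w = np.zeros(2)                            # critic (value-function) weights
    for _ in range(episodes):
        s = rng.uniform(3.0, 7.0)
        for _ in range(20):
            probs = policy(theta, s)
            a = rng.choice(2, p=probs)
            s2, r = step(s, a)
            # TD error from the critic drives both updates
            td = r + gamma * (features(s2) @ w) - features(s) @ w
            w += alpha_critic * td * features(s)
            grad_log = -probs                  # grad of log-softmax: one-hot minus probs
            grad_log[a] += 1.0
            theta += alpha_actor * td * np.outer(features(s), grad_log)
            s = s2
    return theta, w
```

Because the penalty is part of the reward the critic learns from, the actor's weights themselves come to avoid the unsafe region; this is the sense in which the specification is baked into the network rather than checked afterwards.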


  1. Introduction to AI Superalignment (5 minutes)
    • Brief overview of AI superalignment and its importance.
    • Challenges in aligning AI systems with human values.
  2. Theoretical Foundation (5 minutes)
    • Introduction to Reinforcement Learning and Actor-Critic architecture.
    • Role of mathematical proofs in ensuring AI safety.
  3. Practical Implementation (8 minutes)
    • Demonstration: Training a drone in a simulated environment.
    • Step-by-step walkthrough of the training process and the neural network.
    • Encoding safety specifications into the neural network.
  4. Real-World Applications and Future Directions (2 minutes)
    • Potential applications of Safe & Pro-Humanity AI systems.
    • Future research directions and advancements in AI alignment.
  5. Q&A (5 minutes)
    • Open floor for audience questions and discussion.
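The "mathematical proofs" item in section 2 of the outline refers to neural network verification. As a taste of what that means, here is a minimal sketch of interval bound propagation through a single ReLU layer: given a box of possible inputs, it computes guaranteed bounds on every output, which is the building block for certifying that a trained controller can never emit an unsafe command. The weights below are illustrative, not from any real model.

```python
import numpy as np

def interval_forward(W, b, lo, hi):
    """Propagate an input box [lo, hi] through y = relu(W @ x + b).
    Returns sound lower/upper bounds that hold for every x in the box."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    # Positive weights push the bound with same-sign inputs, negative with opposite
    out_lo = W_pos @ lo + W_neg @ hi + b
    out_hi = W_pos @ hi + W_neg @ lo + b
    return np.maximum(out_lo, 0.0), np.maximum(out_hi, 0.0)

W = np.array([[1.0, -2.0], [0.5, 1.0]])
b = np.array([0.0, -1.0])
lo, hi = np.array([0.0, 0.0]), np.array([1.0, 1.0])
l, h = interval_forward(W, b, lo, hi)
# Every input in the unit box is certified to map into [l, h] per output.
```

If the certified output interval lies entirely inside the safe set, the property is proved for all inputs in the box at once, an adversarial guarantee rather than a probabilistic one.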


Prerequisites:

  • Basic understanding of Python programming.
  • Familiarity with machine learning concepts.
  • Basic knowledge of Reinforcement Learning is a plus, but not required.

Video URL:

Content URLs:

Talk Content

Research References

  • “Towards Guaranteed Safe AI” - 10th May 2024 by David “davidad” Dalrymple, Joar Skalse, Yoshua Bengio - arXiv:2405.06624
  • “Fundamental Limitations of Alignment in Large Language Models” - 5th Feb 2024 by Yotam Wolf, Noam Wies, Oshri Avnery - arXiv:2304.11082
  • “Certified Reinforcement Learning with Logic Guidance” - 6th June 2023 by Hosein Hasanbeig, Daniel Kroening, Alessandro Abate - arXiv:1902.00778
  • “Introduction to Neural Network Verification” - 4th Oct 2021 by Aws Albarghouthi - arXiv:2109.10317
  • “Towards Verified Artificial Intelligence” - 23rd July 2020 by Sanjit A. Seshia, Dorsa Sadigh, and S. Shankar Sastry - arXiv:1606.08514

Python Libraries

Speaker Info:

Karan Jagtiani is a dedicated software engineer specializing in backend development and DevOps, with a keen interest in artificial intelligence and machine learning. Currently, he is a Cloud & Backend Engineer at Storylane (YC S21) and has previously made significant contributions at HackerRank. With expertise in AWS, Kubernetes, CI/CD pipelines, and scalable microservices, Karan has a proven track record of transforming and scaling infrastructures to support hundreds of thousands of users.

He is passionate about solving problems that have a meaningful impact on the world and is eager to share his insights on AI superalignment at PyCon India 2024.

Speaker Links:

Section: Artificial Intelligence and Machine Learning
Type: Talk
Target Audience: Advanced
Last Updated: