Game On: Smart AI Agents with Reinforcement Learning
To understand AI better, I believe one of the best ways is to practice it through a metaphysical way of thinking about knowledge. If we examine the quote below through the lens of metaphysics, it tells us about the nature of knowledge and understanding.
“I hear and I forget. I see and I remember. I do and I understand.” — Confucius (Kong Qiu)
In its most familiar reading, this quote conveys that real understanding comes from doing things yourself. Just hearing about something or seeing it isn’t enough; you need to experience it and be actively involved to truly know and understand it. In another reading, it reinforces walking the talk over just talking the talk :)
Applying this quote to AI, I interpret it as highlighting the stages of typical machine learning and understanding. Hearing could represent a machine receiving raw data inputs. Seeing might mean the machine processing and analyzing this data to recognize patterns. Doing signifies the machine actively applying this knowledge to perform tasks or make decisions.
To understand the intricacies, I decided to delve into the world of Reinforcement Learning through the DOOM video game.
Why video games? Because video games have fascinated me from my childhood to the present day, and I still miss playing my old cartridge-based games. Through video games, we can seamlessly apply the concepts and principles of Reinforcement Learning (RL) to build a smart AI agent that not only plays a video game but also demonstrates enough intelligence to outsmart human-level gameplay.
DOOM is a first-person shooter made by id Software, which is now owned by Microsoft :) It was released for DOS in 1993 and is the first game in the DOOM series.
id Software >> acquired by ZeniMax Media >> acquired by Microsoft for Xbox Game Studios :)
Tracing Reinforcement Learning (RL) History
Reinforcement Learning (RL) has been applied in various fields for quite some time and remains a topic of discussion among researchers worldwide in their respective areas of expertise. It is one of the most widely studied areas of Machine Learning (ML), and here we will take a brief glimpse at its history.
Reinforcement Learning (RL) has a long history: the terms “reinforcement” and “reinforcement learning” first appeared in the machine learning and engineering literature in the 1960s.
Richard S. Sutton is considered one of the founders of modern computational Reinforcement Learning (RL).
Machine Learning (ML) is commonly divided into several types, each with distinct approaches and applications. To keep it simple, the image below shows three popular types: Supervised Learning, Unsupervised Learning, and Reinforcement Learning.
Potential of Reinforcement Learning (RL): What Can it Achieve?
RL holds phenomenal potential across various fields by enabling machines to learn from experience, much like we humans learn through trial and error.
A Few Illustrations for Better Understanding
In technology, for example, RL helps self-driving cars master complex road scenarios, enables robots to autonomously perform complex, precise, and systematic tasks, and gives modern games advanced AI capabilities.
In business, RL can be leveraged for Next Best Action (NBA), which is commonly used across industries to enhance customer interactions, optimize processes, and improve outcomes.
Next Best Action (NBA) refers to a decision-making strategy that determines the most effective action or recommendation to take next, based on the current context and objectives.
Defining Reinforcement Learning
“Reinforcement Learning (RL) is an area of Machine Learning (ML) in which an agent learns to make decisions by interacting with an environment. The agent learns through trial and error, receiving rewards based on its actions. The goal of the agent is to maximize cumulative reward over time.”
At its core, RL involves an agent that learns to achieve a goal by interacting with its environment. The agent makes decisions, takes actions, and receives feedback in the form of rewards or penalties. Over time, the agent aims to maximize its total reward.
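To make "maximize cumulative reward over time" concrete: RL agents usually maximize a discounted return, where a reward received k steps in the future is weighted by γ^k, with the discount factor γ between 0 and 1. The minimal sketch below computes that return for one recorded episode; the function names and the example reward values are illustrative assumptions, not taken from any particular library.

```python
def discounted_return(rewards, gamma=0.99):
    """Return G_0 = r_0 + gamma * r_1 + gamma**2 * r_2 + ... for one episode."""
    g = 0.0
    # Walk backwards so each step reuses the return of the step after it:
    # G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def returns_to_go(rewards, gamma=0.99):
    """Return [G_0, G_1, ..., G_{T-1}]: the discounted return from every time step."""
    out = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    out.reverse()
    return out


if __name__ == "__main__":
    # Hypothetical episode: two costly steps, then a large reward for defeating an enemy.
    episode = [-1.0, -1.0, 100.0]
    print(discounted_return(episode, gamma=0.99))
    print(returns_to_go([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

A γ close to 1 lets a late reward (such as defeating an enemy at the end of an episode) count almost fully, while a smaller γ makes the agent more short-sighted.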
Main Components of Reinforcement Learning
Agent: The decision-making entity that interacts with the environment. It learns and adapts its behavior based on the feedback it receives. The agent uses a policy as its brain (which has to be built, i.e. learned): a strategy that defines the action it takes in each state. In practical terms, this could be a software system or a robot programmed to perform specific tasks. In our case, it is the DOOM agent.
Environment: The external system with which the agent interacts. It includes everything that influences, and is influenced by, the agent. In our case, it’s the DOOM video game, which provides the context and challenges the agent must navigate.
Action: The set of possible moves or decisions the agent can make. Actions are performed by the agent and directly impact the state of the environment. In the DOOM video game, these actions could include moving (left, right, forward), shooting, or jumping.
State: A representation of the environment at a specific point in time. It captures the current situation and context the agent is in, providing the information necessary for decision-making. In the DOOM video game, this might include the agent’s position, visible enemies, and other game dynamics.
Reward: The feedback signal from the environment in response to the agent’s actions. Rewards can be positive or negative and serve as guidance for the agent: positive rewards reinforce desirable behaviors, while negative rewards discourage undesirable ones. For instance, in the DOOM video game, scoring points or defeating enemies yields positive rewards, while taking damage or being hit by a bullet could result in negative rewards.
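The five components above map naturally onto code. The sketch below is a hypothetical, heavily simplified stand-in for a DOOM scenario (it is not ViZDoom and renders nothing): the state is the agent's and the monster's positions in a one-dimensional corridor, the actions are move left, move right, and attack, and the reward values are illustrative assumptions. Its `reset` and `step` methods loosely follow the Gymnasium convention, in which `step` returns the next observation, the reward, a `terminated` flag, a `truncated` flag, and an info dictionary.

```python
import random
from dataclasses import dataclass

# Action: a hypothetical discrete action set, loosely modelled on a simple DOOM scenario
MOVE_LEFT, MOVE_RIGHT, ATTACK = 0, 1, 2


@dataclass(frozen=True)
class State:
    agent_x: int    # agent's position in a 1-D corridor
    monster_x: int  # monster's position on the opposite wall


class ToyCorridorEnv:
    """Environment: owns the state and hands out (illustrative) rewards."""

    def __init__(self, width=5, max_steps=30):
        self.width = width
        self.max_steps = max_steps
        self.rng = random.Random()
        self.state = None
        self.steps = 0

    def reset(self, seed=None):
        if seed is not None:
            self.rng.seed(seed)
        self.state = State(agent_x=self.width // 2,
                           monster_x=self.rng.randrange(self.width))
        self.steps = 0
        return self.state, {}

    def step(self, action):
        self.steps += 1
        s = self.state
        terminated = False
        if action == ATTACK:
            if s.agent_x == s.monster_x:
                reward, terminated = 100.0, True  # Reward: enemy defeated
            else:
                reward = -5.0                     # Reward: wasted ammo
        elif action in (MOVE_LEFT, MOVE_RIGHT):
            dx = -1 if action == MOVE_LEFT else 1
            new_x = min(max(s.agent_x + dx, 0), self.width - 1)  # walls block movement
            self.state = State(new_x, s.monster_x)
            reward = -1.0                         # Reward: small time penalty
        else:
            raise ValueError(f"unknown action {action}")
        # Episode is cut off (not failed) once the step budget runs out
        truncated = not terminated and self.steps >= self.max_steps
        return self.state, reward, terminated, truncated, {}


def random_agent(state, rng):
    """Agent with a trivial policy: ignore the state and act at random."""
    return rng.choice([MOVE_LEFT, MOVE_RIGHT, ATTACK])


if __name__ == "__main__":
    env = ToyCorridorEnv()
    rng = random.Random(0)
    state, _ = env.reset(seed=0)
    total, done = 0.0, False
    while not done:
        action = random_agent(state, rng)
        state, reward, terminated, truncated, _ = env.step(action)
        total += reward
        done = terminated or truncated
    print("episode return:", total)
```

Replacing `random_agent` with a learned policy is exactly what training is about; everything else in the loop stays the same.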
What makes Reinforcement Learning so Powerful (and Magical)?
Over the last 10–12 years, research teams at Google DeepMind and OpenAI have significantly advanced the state of the art in Reinforcement Learning, each contributing unique perspectives and innovations to the field.
Needless to say, their collective efforts continue to drive the progress and adoption of RL techniques in diverse applications across industries.
I would summarize their contributions through a few landmark papers:
(Google) DeepMind
- “Playing Atari with Deep Reinforcement Learning” (2013)
- “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm” (2017)
- “Model-Based Reinforcement Learning for Atari” (2019)
- * “Gemma: Open Models Based on Gemini Research and Technology” (2024)
OpenAI
- “Proximal Policy Optimization Algorithms” (2017)
- “Solving Rubik’s Cube with a Robot Hand” (2019)
- “Dota 2 with Large Scale Deep Reinforcement Learning” (2019)
- * “Training language models to follow instructions with human feedback” (2022)
*One key thing to note from these papers: both Reinforcement Learning and Reinforcement Learning from Human Feedback (RLHF) are actively researched fields with the overlapping goal of enhancing agents’ learning abilities.
However, they differ in approaches and priorities, relying on different procedures and methodologies: classic RL emphasizes the agent’s interactions with its environment, while RLHF emphasizes feedback from human instructors.
Exploring Reinforcement Learning with a DOOM Video Game
Setting Up the Environment
To create an RL agent capable of playing DOOM, I began by setting up the environment using Gymnasium. Gymnasium provides a comprehensive, standardized interface for RL experiments, allowing easy integration of various environments.
The DOOM environment was integrated through ViZDoom, which gives the agent access to the game’s visual and sensory inputs. This integration allowed the DOOM RL agent to perceive the game’s graphics and gather essential information about its surroundings, including objects, enemies, and obstacles.
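Agents that learn from pixels usually preprocess each frame before it reaches the policy network. A common recipe, assumed here rather than taken from ViZDoom itself, is to downsample the screen, convert it to grayscale, scale values to [0, 1], and stack the last few frames so the agent can perceive motion. The standard-library-only sketch below represents a frame as a nested list of `(r, g, b)` tuples laid out as height × width; a real pipeline would typically do the same with NumPy arrays or an image library, and all names here are illustrative.

```python
from collections import deque


def downsample(frame, factor):
    """Keep every `factor`-th row and column (simple strided subsampling)."""
    return [row[::factor] for row in frame[::factor]]


def to_grayscale(frame):
    """Map each (r, g, b) pixel with 0-255 channels to one luminance value."""
    # ITU-R BT.601 luma weights, a common choice for RGB -> grayscale
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in frame]


def normalize(gray):
    """Scale 0-255 intensities into the range [0, 1]."""
    return [[v / 255.0 for v in row] for row in gray]


def preprocess(frame, factor=2):
    """Downsample first so the per-pixel work runs on fewer pixels."""
    return normalize(to_grayscale(downsample(frame, factor)))


class FrameStack:
    """Keeps the last k preprocessed frames so the agent can perceive motion."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_frame):
        self.frames.clear()
        for _ in range(self.k):      # pad the stack with copies of the first frame
            self.frames.append(first_frame)
        return list(self.frames)

    def push(self, frame):
        self.frames.append(frame)    # deque(maxlen=k) drops the oldest frame
        return list(self.frames)


if __name__ == "__main__":
    # A hypothetical 4 x 4 all-red frame standing in for a real screen buffer.
    red = [[(255, 0, 0)] * 4 for _ in range(4)]
    stack = FrameStack(k=4)
    obs = stack.reset(preprocess(red))
    print(len(obs), "frames of", len(obs[0]), "x", len(obs[0][0]))
```

Shrinking and graying the input cuts the amount of data the network must process per step, which matters when training runs for many thousands of episodes.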
Choosing an Algorithm
The Proximal Policy Optimization (PPO) algorithm can be selected for training the agent. PPO is a popular policy gradient method known for its robustness and efficiency. It strikes a balance between performance and stability by limiting how far each update can move the policy away from the previous one, which makes it well-suited for complex environments like DOOM. Combined with a neural network policy, PPO can also handle the large state and action spaces typical of DOOM scenarios.
Several algorithms are commonly used to build RL-based agents. Refer to their categorization.
PPO (Proximal Policy Optimization) was introduced by OpenAI researchers. The primary authors credited with inventing PPO are John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Their work was detailed in the 2017 paper “Proximal Policy Optimization Algorithms”, as highlighted in the section above (refer: What makes Reinforcement Learning so Powerful (and Magical)?).
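PPO's stability comes from the clipped surrogate objective of that 2017 paper. With the probability ratio r_t = π_new(a_t|s_t) / π_old(a_t|s_t) and an advantage estimate A_t (how much better the action turned out than expected), PPO maximizes the batch average of min(r_t·A_t, clip(r_t, 1-ε, 1+ε)·A_t), so an update gains nothing by pushing the ratio outside [1-ε, 1+ε]. The plain-Python sketch below computes that objective plus an entropy bonus on lists of floats. It is illustrative only: a real implementation also trains a value function, estimates the advantages (for example with GAE), and back-propagates through a neural network, none of which appears here. The function names and the defaults ε = 0.2 and entropy coefficient 0.01 are assumptions of this sketch.

```python
import math


def probability_ratio(logp_new, logp_old):
    """r_t = pi_new(a_t|s_t) / pi_old(a_t|s_t), computed from log-probabilities."""
    return math.exp(logp_new - logp_old)


def clipped_surrogate(ratios, advantages, clip_eps=0.2):
    """Batch mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A); PPO maximizes this."""
    if not ratios or len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must be non-empty and equally long")
    total = 0.0
    for r, adv in zip(ratios, advantages):
        clipped_r = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps)
        # The min keeps the pessimistic estimate, so moving r outside
        # [1 - eps, 1 + eps] earns no extra objective.
        total += min(r * adv, clipped_r * adv)
    return total / len(ratios)


def entropy(probs):
    """Shannon entropy (nats) of one state's action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def ppo_policy_loss(ratios, advantages, action_dists, clip_eps=0.2, ent_coef=0.01):
    """Loss to minimize: negated surrogate minus an entropy bonus that encourages exploration.

    The value-function loss of full PPO is omitted in this sketch.
    """
    mean_entropy = sum(entropy(d) for d in action_dists) / len(action_dists)
    return -(clipped_surrogate(ratios, advantages, clip_eps) + ent_coef * mean_entropy)


if __name__ == "__main__":
    ratios = [probability_ratio(-0.4, -0.9), probability_ratio(-1.2, -0.5)]
    advantages = [1.0, -1.0]
    dists = [[0.7, 0.2, 0.1], [0.4, 0.4, 0.2]]  # hypothetical 3-action policies
    print(ppo_policy_loss(ratios, advantages, dists))
```

The clipping is what lets PPO reuse the same batch of experience for several gradient steps without the policy drifting too far in one go.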
Illustrating Our AI Agent’s Interactions with the Environment
Here is our DOOM RL agent’s performance before and after training:
Before the Training
Before training, the DOOM agent’s performance in the setup for basic.cfg was poor. The agent often got stuck and could not kill the enemy (a monster in this case) quickly, which means it spent more time and consumed a lot of ammo. This results in low scores. The agent’s actions were random and unoptimized, lacking any strategy or ability to learn from past experience.
After the Training
After training, the DOOM RL agent in the same setup (basic.cfg) showed significant improvement. The agent effectively navigates the environment and accurately engages the enemy (a monster in this case). By learning from the environment and its rewards, it developed optimized strategies, achieving higher scores (by killing the monster) in a short span of time. The agent’s actions became more purposeful and adaptive, reflecting its enhanced decision-making capabilities.
Training the Reinforcement Learning Agent
Training an RL agent involves several critical components, including but not limited to:
- Reward Structure: Rewards need to be designed to encourage the agent to engage in desirable behaviors, such as eliminating enemies and completing objectives, while penalizing harmful outcomes like taking damage. This structured reward system guides the agent’s learning process.
- Exploration and Exploitation: A balance between exploration (trying new actions) and exploitation (using known successful actions) needs to be maintained to optimize learning. Techniques such as epsilon-greedy strategies (common in value-based methods like DQN) or entropy regularization (common in policy gradient methods like PPO) can be employed to manage this balance effectively.
- Training Parameters: Key hyperparameters, including learning rate, batch size, and discount factor, can be tuned to enhance the training process. These parameters influence the efficiency and stability of learning.
- Training Episodes: The agent needs to be trained across many episodes, potentially tens of thousands for simple environments and millions for highly complex ones, with each episode providing new experiences and feedback to refine its policy iteratively. Simply running a few hundred episodes is usually not sufficient to make an RL agent smart.
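The four ingredients above can be seen working together in a deliberately tiny example. The sketch below is hypothetical: a five-cell corridor where the agent must line up with a monster and attack, with illustrative rewards (+100 for a kill, -5 for a wasted shot, -1 per move), epsilon-greedy exploration, a learning rate α and a discount factor γ, and a loop over many training episodes. It swaps in tabular Q-learning rather than PPO because a lookup table keeps the code short and dependency-free, so the episode count that works here says nothing about how many episodes a pixel-based DOOM agent needs.

```python
import random

LEFT, RIGHT, ATTACK = 0, 1, 2
N_ACTIONS = 3
WIDTH = 5        # hypothetical corridor width
MAX_STEPS = 30   # hypothetical timeout per episode


def greedy(qs):
    """Index of the highest-valued action (ties go to the lowest index)."""
    return max(range(N_ACTIONS), key=lambda a: qs[a])


def run_episode(q, rng, epsilon, alpha=0.1, gamma=0.95, learn=True):
    """Play one episode, optionally applying Q-learning updates. Returns True on a kill."""
    agent_x, monster_x = WIDTH // 2, rng.randrange(WIDTH)
    for _ in range(MAX_STEPS):
        s = agent_x - monster_x                    # state: signed offset to the monster
        qs = q.setdefault(s, [0.0] * N_ACTIONS)
        # Exploration vs. exploitation: epsilon-greedy action choice
        a = rng.randrange(N_ACTIONS) if rng.random() < epsilon else greedy(qs)
        killed = a == ATTACK and s == 0
        if a == ATTACK:
            r = 100.0 if killed else -5.0          # reward structure: kill vs. wasted shot
        else:
            agent_x = min(max(agent_x + (-1 if a == LEFT else 1), 0), WIDTH - 1)
            r = -1.0                               # reward structure: time penalty
        if learn:
            # Q-learning target r + gamma * max_a' Q(s', a'); no bootstrap after a kill
            nxt = 0.0 if killed else max(q.setdefault(agent_x - monster_x, [0.0] * N_ACTIONS))
            qs[a] += alpha * (r + gamma * nxt - qs[a])
        if killed:
            return True
    return False


def train(episodes, seed=0, epsilon=0.1):
    """Training episodes: repeat many episodes, refining the table each time."""
    rng, q = random.Random(seed), {}
    for _ in range(episodes):
        run_episode(q, rng, epsilon)
    return q


def success_rate(q, episodes=100, seed=1):
    """Fraction of evaluation episodes won by the greedy policy (no exploration, no learning)."""
    rng = random.Random(seed)
    return sum(run_episode(q, rng, epsilon=0.0, learn=False) for _ in range(episodes)) / episodes


if __name__ == "__main__":
    print("untrained:", success_rate({}))
    print("trained:  ", success_rate(train(2000)))
```

Before training, the all-zero table makes the greedy agent walk left forever and never fire; after a couple of thousand episodes in this toy task, the greedy policy heads toward the monster and attacks once aligned.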
Closing Thoughts
Reinforcement Learning (RL) represents a significant advancement in Artificial Intelligence, demonstrating its potential to solve complex problems across various domains. By simulating decision-making processes and learning from interactions with dynamic environments, RL offers a powerful framework for developing intelligent systems. This technology holds the promise of revolutionizing numerous industries and driving innovation in technology.
Furthermore, video games, with their highly interactive and complex environments, serve as an effective training ground for RL agents. The skills and strategies learned in these virtual worlds can translate to real-world applications, enabling RL-based solutions to tackle practical challenges and maximize the potential of Artificial Intelligence. As RL continues to evolve, its integration into diverse technology and business contexts is opening doors to new opportunities, driving efficiency and innovation across multiple sectors.