Game-Changer: Google’s Embodied AI Journey Through Virtual Worlds as Sandboxes

Hemant Juyal
8 min read · Apr 4, 2024


In the quest for artificial general intelligence (AGI), Google is leveraging the immersive world of video games to advance the development of embodied AI.

This creative approach harnesses video game settings, which offer complex, realistic, diverse, and controlled environments. These are ideal for experimenting with and training AI models: they can generate very large amounts of data, offer reproducibility, scalability, and immediate feedback, and simulate real-world-like scenarios that challenge AI models to adapt and learn various skills.

“Embodied AI is a type of artificial intelligence that interacts with and understands its environment through physical embodiment, similar to how humans and animals do. This means AI systems are not just processing information from sensors; they are also physically present in the world, enabling them to perceive, learn, and act in real-world situations.” (A Wise Technologist)

What is SIMA?

According to Google DeepMind, the Scalable, Instructable, Multiworld Agent (SIMA) project aims to build a system that can follow arbitrary language instructions to act in any virtual 3D environment via keyboard-and-mouse actions, from custom-built research environments to a broad range of commercial video games.

To put it more simply:

The SIMA project aims to develop an instructable embodied agent that can accomplish anything a human can do in any simulated 3D environment or virtual world.
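To make the "language instruction in, keyboard-and-mouse actions out" framing concrete, here is a minimal Python sketch of such an agent interface. Everything in it is hypothetical: the Observation and Action types, the KeywordPolicy stand-in, and the key bindings (for example "e" to pick something up) are illustrative assumptions, not SIMA's actual design. A real agent would put a learned model where the keyword rules are.

```python
# Hypothetical interface sketch; not SIMA's actual API.
from dataclasses import dataclass, field
from typing import List, Protocol

@dataclass
class Observation:
    instruction: str  # free-form language instruction, e.g. "walk forward"
    frame: bytes      # screen pixels from the game (placeholder type)

@dataclass
class Action:
    keys: List[str] = field(default_factory=list)  # keys pressed this step
    mouse_dx: float = 0.0  # horizontal mouse movement
    mouse_dy: float = 0.0  # vertical mouse movement
    click: bool = False    # left mouse button

class Policy(Protocol):
    def act(self, obs: Observation) -> Action: ...

class KeywordPolicy:
    """Hypothetical stand-in for a learned model: a few hard-coded rules."""

    def act(self, obs: Observation) -> Action:
        text = obs.instruction.lower()
        if "walk" in text or "forward" in text:
            return Action(keys=["w"])      # assumed movement key binding
        if "look left" in text:
            return Action(mouse_dx=-10.0)  # turn the camera left
        if "pick up" in text:
            return Action(keys=["e"])      # assumed interact key
        return Action()                    # no-op for unknown instructions

def run_episode(policy: Policy, instruction: str, frames: List[bytes]) -> List[Action]:
    """Pair the same instruction with each new frame and collect the actions."""
    return [policy.act(Observation(instruction, frame)) for frame in frames]

actions = run_episode(KeywordPolicy(), "Walk forward", [b"frame0", b"frame1"])
print([a.keys for a in actions])  # [['w'], ['w']]
```

Because the policy only ever sees an instruction plus a screen frame and only ever emits keys and mouse movement, the same interface could in principle be reused across very different games.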

Through this project, Google is probing the limits of AI to tackle fundamental challenges in cutting-edge AGI research. The goal is to bring the capabilities of advanced language models into environments where an AI agent can interact with its surroundings, physically or virtually. This involves combining the understanding and generation abilities of large language models (LLMs) with real-world interaction, advancing fields such as human-robot interaction and virtual assistants.

So what is the rationale behind using simulated 3D virtual worlds from video games? Let’s find the answers through the following questions.

Are Video Games Complex Systems?

As an avid gamer, I can say that video games are undoubtedly one of the best demonstrations of creativity in navigating complexity.

Seen holistically, video games provide large-scale virtual worlds filled with diverse environments, interactive elements, and non-player characters (NPCs) that show lifelike behaviors. From vast open-world landscapes to highly complicated urban settings, video games present manifold challenges and opportunities for a player to explore.

The richness of gaming environments lies not only in their visual fidelity but also in their dynamic nature, with weather patterns, day-night cycles, and emergent events shaping the player’s experience.

Likewise, interactions within video games are multifaceted, encompassing a spectrum of actions such as movement, navigation, object manipulation, social interaction through verbal and non-verbal communication, and decision-making.

Players often face open-ended scenarios that require adaptability and creativity to overcome obstacles and achieve objectives. Whether exploring terrain, engaging in strategic combat, or solving puzzles, the gameplay complexity of video games mirrors the intricacies of real-world tasks.

Why Are Video Games Ideal Training Grounds?

Video games offer a wide variety of challenges that span a broad spectrum of cognitive skills and abilities. At their core, they require players to navigate complex virtual environments, building spatial awareness and sharpening spatial navigation skills, which demand keen attention to detail and the ability to adapt to changing surroundings.

Many video games incorporate elements of resource management, requiring players to carefully allocate and use the available resources to achieve their objectives. This adds a layer of strategic depth, as players must weigh their decisions and prioritize tasks based on what is available.

Additionally, video games often feature social interaction mechanics, where players must communicate and collaborate with others, either directly or by observing and interpreting social cues such as body language, facial expressions, and tone of voice. This develops skills in empathy, cooperation, and understanding social dynamics, all of which are essential for effective social interaction.

Furthermore, strategic planning is a key component of many video games: players must devise long-term strategies and adapt their tactics in real time to overcome obstacles and achieve their goals. This challenges players to think critically, anticipate consequences, and make decisions under pressure.

Key Takeaways

Overall, the diversity of challenges in video games makes them an ideal training ground for developing a wide range of cognitive skills and abilities, preparing AI agents for the complexities of real-world environments.

By exposing agents to varied tasks and environments, video games enable the acquisition of versatile knowledge and skills that can be applied across diverse domains.

Through techniques such as reinforcement learning and imitation learning, AI agents can steadily improve their performance over time. This iterative process resembles human cognitive development, in which individuals gain skills through repeated practice and exposure to diverse experiences.
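As a minimal illustration of imitation learning, the sketch below implements behavioral cloning in its simplest tabular form: it records which action a human demonstrator took in each state and copies the most frequent one. The state and action names are hypothetical, and practical agents typically learn a neural-network policy over raw observations rather than a lookup table; this is not a description of how SIMA itself is trained.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Tuple

def behavioral_cloning(demos: Iterable[Tuple[Hashable, str]]) -> Dict[Hashable, str]:
    """Tabular behavioral cloning: copy the demonstrator's most frequent action per state."""
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for state, action in demos:
        counts[state][action] += 1
    # most_common(1) returns [(action, count)]; ties keep the action seen first
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}

# Hypothetical human demonstrations as (state, action) pairs
demos = [
    ("door_closed", "open"),
    ("door_closed", "open"),
    ("door_closed", "wait"),
    ("door_open", "walk"),
]
policy = behavioral_cloning(demos)
print(policy)  # {'door_closed': 'open', 'door_open': 'walk'}
```

Adding more demonstrations refines the table, a crude analogue of the gradual improvement described above. Reinforcement learning, by contrast, would adjust the policy using reward signals from the environment rather than copying a demonstrator.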

Using Embodied Environments to Train the SIMA Agent

The SIMA project uses more than ten 3D environments, including both commercial video games and specialized research environments. The SIMA agent is trained on all of these environments except Hydroneer and Wobbly Life (covered below), which are used for qualitative zero-shot evaluation.
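The training and evaluation split described above can be written down as a small configuration sketch. The environment names are the ones listed in this article; the variable and function names (such as split_environments) are hypothetical and do not come from any SIMA codebase.

```python
# Environment names as listed in this article; all identifiers are hypothetical.
COMMERCIAL_GAMES = [
    "Goat Simulator 3", "Hydroneer", "No Man's Sky", "Satisfactory",
    "Teardown", "Valheim", "Wobbly Life",
]
RESEARCH_ENVS = ["Playhouse", "ProcTHOR", "WorldLab", "Construction Lab"]
HELD_OUT = {"Hydroneer", "Wobbly Life"}  # used only for zero-shot evaluation

def split_environments(envs, held_out):
    """Return (training environments, held-out zero-shot environments)."""
    train = [env for env in envs if env not in held_out]
    zero_shot = [env for env in envs if env in held_out]
    return train, zero_shot

train_envs, zero_shot_envs = split_environments(COMMERCIAL_GAMES + RESEARCH_ENVS, HELD_OUT)
print(len(train_envs), zero_shot_envs)  # 9 ['Hydroneer', 'Wobbly Life']
```

Keeping the held-out games out of training is what makes their evaluation zero-shot: the agent has never seen those worlds before it is asked to act in them.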

These 3D environments offer a broad diversity of worlds, rich visual fidelity, and distinct challenges and interactions that go beyond the skill set of typical embodied research environments, allowing agents to develop a wide range of skills.

Commercial Video Games selected for this purpose

Goat Simulator 3

Wreak havoc as a goat with exaggerated physics :-)

Hydroneer

Build and manage a mining operation to turn a profit.

No Man’s Sky

Explore a procedurally generated galaxy filled with diverse planets.

Satisfactory

Construct a space elevator on an alien planet by building complex production chains.

Teardown

Plan and execute high-risk heists in a destructible voxel world.

Valheim

Survive and thrive in a Norse mythology-inspired world.

Wobbly Life

Complete various jobs and unlock secrets in an open-world sandbox.

For research environments, SIMA agent training draws on several prior research works, such as Playhouse, ProcTHOR, and WorldLab, as well as a newly developed custom environment, Construction Lab.

These allow researchers to focus on specific skills, tasks, and scenarios relevant to their study objectives.

Initial Evaluation Results of SIMA agent

Below are some of the initial evaluations of the SIMA agent, showing its performance, capabilities, and areas for improvement across dimensions such as environment, skill category, and comparison with human performance.

Environment Success Rate

Across the various environments, the SIMA agent achieves notable success rates in completing tasks. While performance varies with the complexity of the environment, the agent shows commendable proficiency in navigating and interacting within both the research and the commercial video game environments listed above.

The figure below shows the agent’s success rates, which vary by environment.

Notably, the simpler research environments yield higher success rates than the commercial video games, which are considerably more complex and varied.
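A per-environment (or per-skill-category) success rate is simply the fraction of evaluation episodes in which the agent completed the instructed task. A minimal sketch of that aggregation follows; the episode records and group names are made-up placeholders, not SIMA results, and the real evaluation protocol may weight or filter tasks differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def success_rates(episodes: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return the fraction of successful episodes for each group label."""
    totals: Dict[str, int] = defaultdict(int)
    successes: Dict[str, int] = defaultdict(int)
    for group, succeeded in episodes:
        totals[group] += 1
        successes[group] += int(succeeded)
    return {group: successes[group] / totals[group] for group in totals}

# Placeholder outcomes only; labels could be environments or skill categories
episodes = [
    ("research_env", True), ("research_env", True), ("research_env", False),
    ("commercial_game", True), ("commercial_game", False),
]
rates = success_rates(episodes)
print(rates)  # {'research_env': 0.6666666666666666, 'commercial_game': 0.5}
```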

Skill Category Success Rate

Evaluation across different skill categories (tasks grouped into color clusters) reveals varying degrees of performance. While some skills are executed reliably, others pose challenges, particularly those requiring precise actions or spatial understanding.

Put simply, tasks that seem easy at first, such as “look” or “walk”, can actually involve complex actions in a video game, because these interactions vary with the game mechanics and environment. Tasks like “combat”, “use tools”, or “build” are harder for the agent, because they require deeper understanding and precise actions.

Human Comparison

The SIMA agent has been evaluated against expert human performance in No Man’s Sky. The game includes a large amount of visual diversity, which poses significant challenges for agent perception, and it demands a rich set of interactions and skills.

How Complex Is No Man’s Sky?
No Man’s Sky has one of the largest maps of any video game. The game revolves around five core elements: exploration, survival, combat, trading, and base building. Players can interact with a vast, procedurally generated open-world universe comprising more than 18 quintillion planets. At a hypothetical rate of one planet per second, it would take a player roughly 585 billion years to visit every planet :-)
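The one-planet-per-second figure is easy to check. The sketch below assumes the commonly quoted planet count of exactly 2^64 (about 18.4 quintillion) and a 365.25-day year; with those assumptions it works out to roughly 585 billion years.

```python
# Assumption: the often-quoted planet count is exactly 2**64 (~1.8e19).
PLANETS = 2**64
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # Julian year, in seconds

# Visiting one planet per second means one second per planet.
years = PLANETS / SECONDS_PER_YEAR
print(f"{years / 1e9:.0f} billion years")  # 585 billion years
```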

Despite these challenges, the SIMA agent achieved a 34% success rate, while expert human players achieved only 60% on the same evaluation, a sign of how demanding these tasks are.
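Expressed relative to the human baseline, the agent's score works out to a little over half of expert human performance on these tasks:

```latex
\frac{34\%}{60\%} \approx 0.57
```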

Closing Thoughts

While there are numerous challenges on the path to AGI, the SIMA project represents a significant step forward in the quest to develop embodied AI.

By harnessing the complexity of video games and leveraging advanced simulation techniques, Google DeepMind is pushing the boundaries of AI research and paving the way for a future in which embodied intelligent agents seamlessly interact with the world around them. The fusion of virtual environments and AI technology holds immense promise for unlocking new possibilities.


Written by Hemant Juyal

I am a technologist with a passion for innovation, and I love staying on top of the latest advancements and developments in technology.