Google DeepMind has opened access to Project Genie, an experimental tool that transforms simple text descriptions or images into explorable virtual environments. As of this week, the prototype is available to Google AI Ultra subscribers in the United States who are 18 or older, a significant expansion beyond the limited testing phase that began last summer.
Project Genie operates as a web application powered by three advanced AI systems working together: Genie 3, Nano Banana Pro, and Gemini. The platform enables users to describe environments ranging from natural landscapes like forests and deserts to fantastical settings, then navigate through them as if exploring a video game world. Unlike static images or traditional video generation, the system creates interactive spaces that respond in real time as users move through them.
How the World-Building Process Works
The experience centers on three main capabilities that guide users from concept to exploration. World Sketching allows users to input text prompts describing both the environment and a character, then preview the generated image before entering the virtual space. Users can specify movement options such as walking, flying, or driving, and choose between first-person, third-person, or isometric camera perspectives.
World Exploration transforms these static images into navigable environments. As users move their characters through the space, Genie 3 generates new portions of the world ahead in real time, simulating physics and object interactions. The system maintains consistency by remembering what it has already generated, allowing users to return to previously explored areas and find them largely unchanged.
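The consistency mechanism described above can be illustrated with a toy sketch. Everything here is hypothetical and purely conceptual (it is an analogy, not Genie 3's actual implementation): a generator that caches each region the first time it is produced, so a user who returns to an explored area finds the same content.

```python
import random

class ToyWorld:
    """Conceptual illustration only: generate world regions on demand and
    cache them, so revisited areas look the same. Not DeepMind's code."""

    def __init__(self, seed=42):
        self.rng = random.Random(seed)
        self.regions = {}  # (x, y) coordinate -> generated tile

    def region_at(self, x, y):
        key = (x, y)
        if key not in self.regions:
            # "Generate" a region only on the first visit, then remember it.
            self.regions[key] = self.rng.choice(["forest", "desert", "river"])
        return self.regions[key]

world = ToyWorld()
first_visit = world.region_at(3, 7)
world.region_at(0, 0)  # wander somewhere else
# Returning to (3, 7) yields the same region as before.
assert world.region_at(3, 7) == first_visit
```

The real system works on video frames rather than discrete tiles, but the principle is similar: remembering what was already generated is what keeps the world stable as the user moves through it.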
World Remixing enables users to modify existing creations by building on top of their original prompts. The platform includes a gallery of curated worlds and a randomizer feature for inspiration. Once finished, users can download videos of their explorations to share or save.
Technology Behind the Experience
Genie 3 represents what Google calls a general-purpose world model, a type of AI system that creates internal representations of environments and predicts how they evolve based on user actions. DeepMind CEO Demis Hassabis described it as the world’s most advanced world model, capable of simulating diverse real-world scenarios for applications ranging from robotics training to animation and historical recreations.
The auto-regressive architecture means the model generates each frame based on the frames that came before it, so every user session requires its own dedicated computing resources. That constraint drives many of the prototype's current limitations.
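The frame-by-frame idea can be sketched in a few lines. This is a minimal stand-in with hypothetical names, not the actual model: each new "frame" is conditioned on everything generated so far plus the user's latest action, which is why the loop cannot be parallelized across a session.

```python
def next_frame(history, action):
    """Stand-in for a learned model: produce the next 'frame' from
    all frames generated so far plus the user's latest action."""
    return f"frame{len(history)}:{action}"

def rollout(actions):
    # Auto-regressive loop: each step depends on all previous steps,
    # so a session ties up its own compute for the whole rollout.
    frames = []
    for action in actions:
        frames.append(next_frame(frames, action))
    return frames

print(rollout(["walk", "walk", "jump"]))
# -> ['frame0:walk', 'frame1:walk', 'frame2:jump']
```

In the real system, `next_frame` is a large neural network generating video at 20 to 24 frames per second, which is what makes per-session compute so expensive.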
Current Limitations and Capabilities
Google has been transparent about the prototype's experimental nature and existing constraints. Generated worlds are currently limited to 60 seconds of exploration time, with output capped at 720p resolution and 20 to 24 frames per second. The company says the time cap reflects both compute budget constraints and the diminishing returns of extended sessions at this testing stage.
The system performs best with artistic prompts like claymation, watercolor, anime, or cartoon aesthetics. However, photorealistic and cinematic worlds often fall short, appearing more like video games than actual real-world settings. When users upload real photographs as starting points, the model sometimes rearranges elements rather than faithfully recreating the original layout.
Character control can experience latency issues, with keyboard navigation sometimes becoming unresponsive or sending characters in unintended directions. Physical interactions remain imperfect, with characters occasionally walking through walls or solid objects. Generated worlds may not always adhere closely to the original prompts or real-world physics.
Safety Measures and Future Development
Safety guardrails prevent the generation of inappropriate content, including nudity and copyrighted material. This comes after Disney issued a cease-and-desist letter to Google in December, accusing the company of training AI models on Disney characters and intellectual property without authorization.
Google plans to enhance realism and improve interaction capabilities based on user feedback. Future updates may include promptable events that change the world during exploration, extended generation times beyond 60 seconds, and more responsive character controls. The development team views world models as crucial stepping stones toward artificial general intelligence, with nearer-term applications in video games, entertainment, and training robots in simulated environments.
Subscription Requirements and Market Context
Access requires a Google AI Ultra subscription, which costs $250 per month. The company plans to expand availability beyond the United States in the future.
The release positions Google competitively in the emerging world model space. World Labs, founded by AI researcher Fei-Fei Li, released its first commercial product called Marble late last year. AI video-generation startup Runway has also launched a world model, and AMI Labs, started by former Meta chief scientist Yann LeCun, focuses on developing similar technology.
Potential applications extend beyond entertainment. DeepMind researchers envision uses in quickly prototyping video game concepts, visualizing scenes for filmmaking, and bringing educational ideas to life. One highlighted example was letting students experience different professions virtually, such as disaster recovery work, making such simulations accessible without specialized training.
