Our prompting pipeline rewrites user input into structured signals the world model actually understands: seed images, synthetic video, and aligned controls. It started as a safety filter. It became the other half of the product.

GANs are tough to train. Diffusion is simple. Diffusion VAEs are the next step in controllable HD reconstruction.

In a race to maximize visual fidelity, the fun factor of world models has suffered. We've arrived in an era where a full second of latency is considered playable, 4 fps is considered real-time, and a rack of $50,000 GPUs is considered accessible. We want to fix this, and Project Genie reminds us why it matters.

Today we're releasing Waypoint-1, the first real-time diffusion world model optimized for consumer GPUs.

We needed a lot of annotated game data, we were in a hurry, and we made it happen as fast as we possibly could. It was tricky and we broke a lot of things along the way. Here's what we did, what worked, and what we broke.

At its heart, the post tackles a critical bottleneck in large-scale transformer-based generative models: the KV cache. During inference, this cache stores the context from previous steps, but it can grow very large, consuming memory and saturating bandwidth as the GPU reads it over and over. In this blog post, we detail how we address this via quantization and other optimization techniques.
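To make the quantization idea concrete, here is a minimal sketch of per-tensor int8 quantization as it might apply to a cached key/value vector: store one byte per element plus a single scale, and dequantize on read. This is an illustrative toy, not Waypoint-1's actual cache implementation.

```python
def quantize_int8(values):
    """Map floats to int8 codes with one shared scale; returns (codes, scale)."""
    amax = max(abs(v) for v in values) or 1.0
    scale = amax / 127.0
    return [round(v / scale) for v in values], scale

def dequantize_int8(codes, scale):
    """Recover approximate floats from int8 codes."""
    return [q * scale for q in codes]

# A toy "cached key" vector: 1 byte/element instead of 4 (fp32) or 2 (fp16).
keys = [0.8, -1.2, 0.05, 0.0]
codes, scale = quantize_int8(keys)
recovered = dequantize_int8(codes, scale)

# Round-to-nearest bounds the error by half a quantization step.
max_err = max(abs(a - b) for a, b in zip(keys, recovered))
assert max_err <= scale / 2 + 1e-12
```

In a real cache the scale would typically be tracked per head or per channel rather than per vector, trading a little extra metadata for lower quantization error.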

We're releasing OWL Eval, the first open-source evaluation platform built specifically for studying how humans perceive AI-generated videos. After running studies with hundreds of participants, we've learned that human evaluation reveals critical model failures that automated metrics completely miss. Our platform makes it dead simple to run these studies at scale.

We show how applying ODE regression drastically reduces the depth of our diffusion decoder, leading to a 40x speedup!
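The core idea behind ODE regression can be shown on a toy problem: fit a cheap one-step "student" to the endpoints of a many-step teacher ODE solve, so one student evaluation replaces the whole integration. The setup below (a linear ODE, a linear student) is a hypothetical illustration, not the actual decoder.

```python
def teacher_ode_solve(x0, steps=32):
    """Integrate dx/dt = -x from t=0 to t=1 with explicit Euler."""
    x, dt = x0, 1.0 / steps
    for _ in range(steps):
        x += -x * dt
    return x

# The student is a single linear map x -> w*x; regress w on teacher endpoints.
xs = [0.5, 1.0, 2.0, -1.5]
targets = [teacher_ode_solve(x) for x in xs]

# Closed-form least-squares solution for the scalar weight w.
w = sum(x * y for x, y in zip(xs, targets)) / sum(x * x for x in xs)

# One student step now reproduces the 32-step teacher trajectory endpoint.
for x, y in zip(xs, targets):
    assert abs(w * x - y) < 1e-9
```

The toy is exact because the teacher is linear; for a deep diffusion decoder the student is a shallower network trained by regression on sampled trajectories, which is where the depth reduction (and the speedup) comes from.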

In this blog post, we illustrate a paper that leverages multiple specialist models, incorporating their individual expertise by having them influence diffusion sampling at inference time. We also provide code examples, visualizations, and intuitions!
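A minimal sketch of the composition idea: at each sampling step, every specialist proposes a denoising direction, and the sampler follows their weighted sum. The "experts" below are scalar toys standing in for per-domain noise predictors; the names and weights are illustrative.

```python
def combined_eps(x, experts, weights):
    """Blend per-expert predicted directions into one update direction."""
    return sum(w * e(x) for e, w in zip(experts, weights))

# Toy experts: each pulls the sample toward its own preferred target.
def make_expert(target):
    return lambda x: x - target

experts = [make_expert(0.0), make_expert(2.0)]
weights = [0.5, 0.5]

# A simple iterative "sampling" loop following the blended direction.
x = 5.0
for _ in range(100):
    x -= 0.1 * combined_eps(x, experts, weights)

# With equal weights, the sample settles between the two experts' targets.
assert abs(x - 1.0) < 1e-3
```

The appeal of doing this at inference time is that no expert needs retraining: composing their outputs in the sampler is enough to mix their expertise.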

We trained an autoencoder with depth maps in the latent space, which resulted in far better depth consistency in downstream generations. Next we're training with optical flow as well, and tackling the KV cache problem.
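A minimal sketch of the training objective this implies, assuming a generic autoencoder: reconstruct both RGB and the depth map, with a weight balancing the two terms. The function names and the weight are illustrative assumptions, not the actual recipe.

```python
def mse(a, b):
    """Mean squared error between two flat lists of values."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def joint_loss(recon_rgb, rgb, recon_depth, depth, depth_weight=0.5):
    """RGB reconstruction plus a weighted depth-reconstruction term."""
    return mse(recon_rgb, rgb) + depth_weight * mse(recon_depth, depth)

# Toy flattened targets: a perfect reconstruction scores zero.
rgb, depth = [0.2, 0.4, 0.6], [1.0, 2.0, 3.0]
assert joint_loss(rgb, rgb, depth, depth) == 0.0

# Degrading the RGB reconstruction strictly increases the loss.
assert joint_loss([0.0, 0.0, 0.0], rgb, depth, depth) > 0.0
```

Because depth must be decodable from the latent under this objective, the latent is forced to carry geometric structure, which is one plausible reading of why downstream generations become more depth-consistent.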

The generation vs reconstruction trade-off gets weird when you push compression. Learn more about how we're managing it in this blog post!


Join us as we try to figure out how to make a good custom autoencoder for our World Model.

This week we set our sights on taming unlabeled internet data for World Model training.

Today we are marking the start of our journey towards a general purpose open source video game world model.
