World Models on Pre-trained Visual Features enable Zero-shot Planning
A method that lets a robot learn a “world model” during training and then use that model at test time (when it is actually operating) to figure out which actions to take, without any extra training or rewards for the new task (zero-shot planning). The work is a step toward agents that are not limited to one specific task but can instead adapt to a variety of tasks using the same underlying model of the world.
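To make “planning at test time without rewards” concrete, here is a minimal sketch of the general idea, assuming a trained one-step world model and an image encoder that returns a flat feature vector. The function names, the random-shooting/CEM-style search, and all hyperparameters below are illustrative placeholders, not the project’s actual code:

```python
import torch

def plan_actions(world_model, encode, obs, goal_obs,
                 horizon=10, n_samples=256, action_dim=2, n_iters=5):
    """Sketch of zero-shot planning: sample candidate action sequences,
    roll them out through the learned world model, and keep the ones whose
    predicted final latent lands closest to the goal's latent.
    `world_model(z, a) -> z_next` and `encode(image) -> (feat_dim,)` are
    assumed interfaces, not the paper's API."""
    z, z_goal = encode(obs), encode(goal_obs)
    mean = torch.zeros(horizon, action_dim)
    std = torch.ones(horizon, action_dim)
    for _ in range(n_iters):
        # Sample action sequences around the current search distribution.
        actions = mean + std * torch.randn(n_samples, horizon, action_dim)
        z_pred = z.expand(n_samples, -1)
        for t in range(horizon):
            z_pred = world_model(z_pred, actions[:, t])  # predict next latent
        # Cost: distance between predicted and goal latents (no reward needed).
        cost = ((z_pred - z_goal) ** 2).sum(dim=-1)
        elite = actions[cost.topk(k=32, largest=False).indices]
        mean, std = elite.mean(0), elite.std(0)  # CEM-style refinement
    return mean  # execute the first action, then replan (MPC)
```

Because the cost is just the distance to the goal image’s encoding, a new task can be specified with a single goal image and no reward function.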
It builds on an image recognition model (DINOv2) that has already been trained on a huge number of images. This is an example of how machine learning research from different directions can be combined to create more powerful systems (another example is combining language models with reinforcement learning to create reasoning models).
Instead of working with raw images (which are huge and complex), the method works in a “latent space”: a compact, abstract representation of the world produced by the pre-trained encoder.
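Structurally, this means the pre-trained encoder stays frozen and only a small dynamics predictor is trained to forecast how the latent features change under an action. Below is a minimal sketch of that setup; the class name, dimensions, and MLP predictor are illustrative assumptions (DINO-WM itself uses a transformer predictor over DINOv2 patch features):

```python
import torch
import torch.nn as nn

class LatentDynamics(nn.Module):
    """Illustrative latent world model: a frozen pre-trained encoder maps
    images to compact features, and only a small predictor is trained to
    forecast the next feature vector given the current one and an action."""

    def __init__(self, encoder, feat_dim=384, action_dim=2, hidden=512):
        super().__init__()
        self.encoder = encoder.eval()      # pre-trained, never updated
        for p in self.encoder.parameters():
            p.requires_grad_(False)
        self.predictor = nn.Sequential(    # the only trained component
            nn.Linear(feat_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, image, action):
        with torch.no_grad():
            z = self.encoder(image)        # latent instead of raw pixels
        return self.predictor(torch.cat([z, action], dim=-1))

# Training minimizes next-step prediction error in latent space, e.g.
#   loss = ||model(image_t, a_t) - encode(image_{t+1})||^2
```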
Project site: https://dino-wm.github.io/