New AI Model Can Simulate ‘Super Mario Bros.’ After Watching Game Footage
Last month, Google’s GameNGen AI model showed that generalized image diffusion techniques can be used to create a passable, playable version of Doom. Now, researchers are using some of the same techniques with a model called MarioVGG to see if AI can generate plausible video of Super Mario Bros. in response to user inputs.
The results of the MarioVGG model—available as a preprint published by the crypto-adjacent AI company Virtuals Protocol—still show plenty of obvious glitches, and generation is far too slow for anything close to real-time gameplay. But the results show that even a limited model can infer some impressive physics and gameplay dynamics just from studying a bit of video and input data.
The researchers hope this is a first step toward “producing and demonstrating a reliable and controllable video game generator,” or perhaps even “completely replacing game development and game engines with video generation models” in the future.
Watching 737,000 frames of Mario
To train their model, the MarioVGG researchers (GitHub users Erniechew and Brian Lam are listed as contributors) started with a public dataset of Super Mario Bros. gameplay containing 280 “levels” worth of input and image data arranged for machine learning purposes (level 1-1 was removed from the training data so that images from it could be used in evaluation). The 737,000+ individual frames in that dataset were “preprocessed” into 35-frame chunks so the model could begin to learn what the immediate results of various inputs generally look like.
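To make that chunking step concrete, here’s a minimal Python sketch of how 737,000 frames might be sliced into 35-frame training windows. The array names and helper function are illustrative assumptions, not the researchers’ actual preprocessing code:

```python
import numpy as np

CHUNK_LEN = 35  # each training sample spans 35 consecutive frames

def make_chunks(frames: np.ndarray, inputs: list) -> list:
    """Slice a level's frames into fixed-length windows paired with their inputs."""
    chunks = []
    for start in range(0, len(frames) - CHUNK_LEN + 1, CHUNK_LEN):
        window = frames[start : start + CHUNK_LEN]          # 35 video frames
        window_inputs = inputs[start : start + CHUNK_LEN]   # matching controller inputs
        chunks.append((window, window_inputs))
    return chunks

# Sliced without overlap, a 737,000-frame dataset yields roughly
# 737_000 // 35, or about 21,000, training windows.
```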
To “simplify the gameplay scenario,” the researchers decided to focus on just two potential inputs in the dataset: “run right” and “run right and jump.” Even this limited set of movements presented some challenges for the machine learning system, since the preprocessor had to look back a few frames before each jump to determine whether and when a “run” had started. Any jumps that included mid-air adjustments (i.e., pressing the “left” button) had to be discarded because “this would introduce noise into the training dataset,” the researchers write.
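Those filtering rules are simple to state but fiddly in practice. Here’s a rough Python sketch of the kind of logic involved; the button names, the five-frame lookback, and the window format are all assumptions for illustration rather than details from the paper:

```python
RUN_RIGHT = {"B", "right"}  # assumed NES buttons for a rightward run
JUMP = "A"

def classify_window(window_inputs):
    """Label a 35-frame window of per-frame button sets, or return None to discard it."""
    jump_frames = [i for i, btns in enumerate(window_inputs) if JUMP in btns]
    if not jump_frames:
        # No jump: keep the window only if Mario runs right the whole time.
        return "run right" if all(RUN_RIGHT <= btns for btns in window_inputs) else None
    first_jump = jump_frames[0]
    # Look back a few frames before the jump to confirm a run was in progress.
    lookback = window_inputs[max(0, first_jump - 5) : first_jump]
    if not all(RUN_RIGHT <= btns for btns in lookback):
        return None
    # Discard jumps with mid-air adjustments (e.g., a "left" press),
    # which would introduce noise into the training dataset.
    if any("left" in btns for btns in window_inputs[first_jump:]):
        return None
    return "run right and jump"
```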
After preprocessing (and about 48 hours of training on a single RTX 4090 graphics card), the researchers used a standard convolution and denoising process to generate new video frames from a static starting game image and a text input (either “run” or “jump” in this limited case). While these generated sequences are only a few frames long, the last frame of one sequence can be used as the first frame of a new one, making it possible to generate gameplay videos of any length that still show “coherent and consistent gameplay,” according to the researchers.
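That chaining trick is what turns a short-clip generator into an open-ended one. In sketch form, the loop might look like the following, where `model.generate` stands in for MarioVGG’s actual diffusion pipeline (its name and signature are assumptions, not the published interface):

```python
def generate_long_video(model, start_frame, actions):
    """Chain fixed-length clips: the last frame of each clip seeds the next one."""
    video = [start_frame]
    frame = start_frame
    for action in actions:  # e.g., ["run", "jump", "run", ...]
        clip = model.generate(first_frame=frame, action_text=action)
        video.extend(clip[1:])  # drop the repeated seed frame
        frame = clip[-1]        # the final frame becomes the next seed
    return video
```

One likely cost of this design is that each clip is conditioned on a single seed frame, so any small error in that frame can compound from one segment to the next.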
Super Mario 0.5
Even with all that preparation, MarioVGG still doesn’t produce smooth video that’s indistinguishable from real NES gameplay. For efficiency, the researchers downscaled the output video from the NES’s native 256×240 resolution to a mere 64×48. They also condensed 35 frames’ worth of video time into just seven generated frames distributed “at evenly spaced intervals,” resulting in “gameplay” video that looks much rougher than actual game output.
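For a sense of how aggressive those reductions are, here’s an illustrative snippet that reduces a 35-frame, 256×240 clip the same way; the nearest-neighbor downscale and array shapes are our assumptions, not the paper’s exact method:

```python
import numpy as np

def downsample_clip(clip: np.ndarray) -> np.ndarray:
    """clip: (35, 240, 256) grayscale frames -> (7, 48, 64)."""
    # Keep 7 frame indices spread evenly across the 35-frame window.
    keep = np.linspace(0, clip.shape[0] - 1, 7).astype(int)  # [0, 5, 11, 17, 22, 28, 34]
    # Crude nearest-neighbor spatial downscale: 240/5 = 48 rows, 256/4 = 64 columns.
    return clip[keep][:, ::5, ::4]
```

That works out to a roughly 20-fold reduction in pixels per frame and a five-fold reduction in frame count, which is why the output looks so much rougher than real NES footage.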
Even with those compromises, the MarioVGG model can’t come close to real-time video generation at this point. The single RTX 4090 the researchers used took six full seconds to generate a six-frame video sequence, representing just over half a second of video even at the extremely limited frame rate. The researchers admit this is “not realistic or friendly to interactive video games,” but hope that future optimizations such as weight quantization (and perhaps more computing power) could improve this speed.
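Some quick arithmetic, using the figures above, shows the size of that gap (our back-of-the-envelope numbers, not the paper’s):

```python
gen_time_s = 6.0          # seconds to generate one sequence on an RTX 4090
covered_time_s = 35 / 60  # each sequence spans 35 NES frames at 60 fps (~0.58 s)

slowdown = gen_time_s / covered_time_s
print(f"~{slowdown:.0f}x slower than real time")  # ~10x
```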
Still, even within those limitations, MarioVGG manages to produce some pretty believable videos of Mario running and jumping from a static starting image, similar to Google’s Genie game maker. The model was even able to “learn the physics of the game entirely from the video frames in the training data, without any explicitly encoded hard rules,” the researchers write. This includes inferring behaviors like Mario falling when he runs off the edge of a cliff (with reliable gravity) and (usually) stopping Mario’s forward motion when he runs into an obstacle.
While MarioVGG focused on simulating Mario’s movements, the researchers found that the system could effectively hallucinate new obstacles for Mario as the video scrolled through an imaginary level. These obstacles “fit the graphical language of the game,” the researchers write, but they can’t currently be influenced by user prompts (you can’t, for example, place a pit in front of Mario and force him to jump over it).
Just do it
Like all probabilistic AI models, however, MarioVGG has an annoying tendency to occasionally produce completely useless results. Sometimes that means simply ignoring user input prompts (“we observed that the input action text was not always respected,” the researchers write). Other times, it means hallucinating obvious visual glitches: Mario sometimes falls into obstacles, runs through obstacles and enemies, flashes in different colors, shrinks or grows from frame to frame, or disappears entirely for multiple frames before popping back into existence.
One particularly absurd video shared by the researchers shows Mario falling through a bridge, turning into a Cheep-Cheep, then flying back up through the bridge and turning back into Mario. That’s the kind of thing we’d expect to see from a Wonder Flower, not an AI video of the original Super Mario Bros.
The researchers suggest that training for longer on “more diverse game data” could help address these major issues and let their model simulate more than just relentlessly running and jumping to the right. Still, MarioVGG stands as an interesting proof of concept that even limited training data and algorithms can produce some credible starting models of basic games.
This story originally appeared on Ars Technica.