Many digital artists, architects, engineers, and game developers today rely on 3D models. However, creating these digital objects is often a time-consuming, involved process. New artificial intelligence (AI) models may provide a solution.
AI-generated art has attracted a great deal of attention lately, though mostly in the form of 2D images. Now, several companies have announced machine learning software that goes a step further, turning reference text or pictures into 3D designs.
Generative AI Today
In September 2022, Google unveiled a text-to-3D model called DreamFusion. This algorithm builds on a previous one called Dream Fields, released in 2021, for which researchers trained on a library of text-labeled 3D models. DreamFusion, however, doesn’t need existing 3D models to understand your requests, making it far more practical.
Two months later, graphics card giant Nvidia released a similar model. Its software, dubbed Magic3D, looks almost identical from the outside: you type in a description of the 3D model you want, and the algorithm renders one. However, Nvidia claims its solution is twice as fast.
The third major 3D generative AI you’ll find today comes from OpenAI, the makers of ChatGPT and Dall-E. This model, Point-E, also creates 3D renderings from text and can do so in as little as one to two minutes on a single GPU.
“Point-E creates 3D renderings from text in as little as one to two minutes on a single GPU.”
How 3D Generative Models Work
While all three big 3D model-generating AI solutions today have unique advantages and specific approaches, they follow the same general process. Here’s a closer look at how these algorithms work.
Training the AI on References
Early approaches to this kind of AI, like Dream Fields, were trained on 3D models and their text labels. However, relatively few labeled 3D models exist, leaving little training data and limiting their scope. That’s why newer models learn to generate 3D models from labeled 2D images instead.
Today’s 3D model-generating AI starts as text-to-image algorithms. Consequently, the first stage in training one is feeding it labeled 2D images, like a picture of a dog with the accompanying text “dog.” This data is far more accessible, with ImageNet alone hosting more than 14 million labeled images, so it’s a better way to train the AI.
Before long, you should have a model that can associate 2D images with text descriptions fairly accurately. You can then move on to teaching it to turn that 2D output into 3D renderings.
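What “associating 2D images with text” means can be sketched in a few lines. The snippet below is a toy illustration, not any real model: in a trained system like CLIP, the feature vectors come from learned image and text encoders, whereas here they are hand-made for demonstration. The matching step itself, though, is the same idea: project both into a shared space and compare with cosine similarity.

```python
import numpy as np

# Toy "embeddings": in a real system these come from trained image and
# text encoders; here they are hand-made vectors for illustration only.
image_features = np.array([
    [0.9, 0.1, 0.0],   # image of a dog
    [0.1, 0.9, 0.0],   # image of a cat
])
text_features = np.array([
    [1.0, 0.0, 0.0],   # caption "dog"
    [0.0, 1.0, 0.0],   # caption "cat"
])

def match_text_to_images(img, txt):
    # Normalize so the dot product becomes cosine similarity.
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    similarity = img @ txt.T          # rows: images, cols: captions
    return similarity.argmax(axis=1)  # best caption for each image

print(match_text_to_images(image_features, text_features))  # [0 1]
```

Each image correctly picks out its own caption. Training a real model consists of adjusting the encoders until matched image-text pairs land close together in this shared space.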
“3D model-generating AI starts as text-to-image algorithms.”
Interpolating 2D Images Into 3D
The next step in generating 3D models with AI is interpolation: the process of combining multiple 2D images of the same subject from different angles to produce a 3D version.
The underlying technology that enables this process is a neural radiance field (NeRF). NeRFs are neural networks that take multiple views of an object and learn where everything in the scene sits in 3D space. They can then piece the views together, smoothing out the areas where different views overlap, to produce a cohesive 3D model.
Traditionally, NeRFs work using photos of an object from multiple angles. In a text-to-3D model, however, they generate their own 2D images from various angles before combining them. As you might expect, this is a remarkably complex process, but recent advances have made it much faster.
Optimizing 3D Models
The product of a single pass through one of these NeRFs will likely be low-resolution and may contain errors. Consequently, it’s important to clean up and optimize any 3D model that comes out of the interpolation process.
Some AI solutions today, like Google’s DreamFusion, will pass the rendering through several interpolation processes to remove noise and improve resolution. Nvidia’s Magic3D uses a second diffusion model that reduces noise and refines the rendering according to the original 2D images to raise its resolution.
Even after this optimization, you may have to clean up the models. That’s why these solutions present them as an adjustable file that you can edit to change their resolution, shape, color, lighting, and other factors.
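The coarse-to-fine idea behind this clean-up stage can be illustrated with a toy example. The snippet below only loosely mirrors what DreamFusion and Magic3D do, since their refinement uses learned diffusion models rather than simple filters, but it shows the general pattern: repeatedly raise the resolution of a rough surface, then suppress noise at the new scale.

```python
import numpy as np

def refine(height_map, passes=2):
    """Toy coarse-to-fine clean-up of a rough surface.

    Real systems use diffusion models here; this sketch just shows the
    overall pattern: upsample, then smooth out noise at each step.
    """
    for _ in range(passes):
        # Double the resolution by repeating each cell (nearest-neighbor).
        height_map = height_map.repeat(2, axis=0).repeat(2, axis=1)
        # Suppress noise with a small box blur (3x3 average).
        padded = np.pad(height_map, 1, mode="edge")
        height_map = sum(
            padded[i:i + height_map.shape[0], j:j + height_map.shape[1]]
            for i in range(3) for j in range(3)
        ) / 9.0
    return height_map

coarse = np.random.default_rng(0).random((4, 4))  # noisy low-res surface
fine = refine(coarse)
print(coarse.shape, "->", fine.shape)  # (4, 4) -> (16, 16)
```

After two passes, the surface has sixteen times as many cells and far less high-frequency noise, at the cost of some fine detail, which is why a final manual editing pass is still often needed.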
Limitations and Possibilities
Automating 3D image generation can streamline many workflows. Game developers and filmmakers could build digital scenes much faster, as artists wouldn’t spend as much time on model creation. Construction timelines could also shorten as architects generate 3D blueprints in less time.
However, these algorithms still carry some concerns. AI-generated art as a whole has come under fire because some artists’ work has appeared in training datasets without their permission, opening the door to copyright and ethical complications. Others fear that these tools may threaten employment and payment for human artists.
As AI art grows, the companies that build and use it will have to consider these complications. With a thoughtful, human-centric approach, though, these models could be revolutionary tools to help artists work, not replace them.
“Automating 3D image generation can streamline many workflows.”
Artificial Intelligence Could Revolutionize 3D Rendering
AI moved from generating 2D images to rendering 3D models in a relatively short period. This step forward opens the door to an impressive range of possibilities as long as data scientists and end users approach the technology carefully.
While still in its early stages, AI 3D model generation could revolutionize digital art and design. Industries from architecture to filmmaking could become more efficient as a result.