AI-Driven 3D Content Generation

Methods to get from a prompt to a 3D geometry


3D from prompt or image, using AI for 3D content generation

AI is rapidly evolving from text, sound, imagery and video to 3D


This white paper delves into the recent advancements in the field of 3D object
generation by using prompts, a discipline that sits at the intersection of AI-driven
language models and computer graphics.

With the increasing accessibility of technologies like Midjourney and DALL-E, the
leap from text-to-image to text-to-3D has opened new horizons for creators and
engineers alike.

Let’s explore the nuances of this transition, providing insights into both
commercial platforms and open-source endeavors that are shaping the future of
digital creation and what it may bring to our industrial tables soon.

If you cannot wait, just go to LumaAI’s Genie, which was released on the 10th of January.
Author's Note
As an enthusiastic participant in the world of technology, my exploration of 3D
object generation is driven by deep-seated curiosity and a background rooted
in traditional industry. This not-so-white paper is crafted for technology
aficionados and professionals who are navigating the rapidly evolving landscape
of AI and 3D modeling. I spent about 100 hours and created hundreds of
models, so your feedback is invaluable – should there be any discrepancies or
omissions, please reach out and I will add or change content. Gerd Schwaderer

(Image by Midjourney)

By Gerd Schwaderer | LinkedIn


Table of contents, prompt or image to 3D

• Solutions summarized
• Hands-on evaluation of individual solutions
  • Available solutions: prompt to 3D
  • Available solutions: image to 3D
  • 3D objects by altering a category object
• Discord introduction
• Hugging Face introduction
• Executive summary
(Image by Midjourney)



Solutions: 3D from prompt and/or image
No judgement, this is strictly alphabetical

Available Solutions Prompt to 3D
• Alpha3D
• CSM (Common Sense Machines)
• Dreamfusion
• Lucid Dreamer
• Luma AI
• Masterpiece Studio
• MeshGPT
• Meshy
• Point-E
• Sudo
• Tripo

Available Solutions Image to 3D
• Kaedim
• StabilityAI
• Voxcraft

Classic CAD modelling based on 2D image by GS & Spaceclaim



3D objects by altering a category object
This is a method to quickly derive variants of existing objects; it often gets named alongside the others, although it is something different…

Category to 3D
• 3DFY
• Sloyd

These are often referred to as 'prompt or image to 3D', but
it seems the authors may not have actually tried these solutions.
They are very good if you need sofas, chairs or sneakers.
If not, back to the other ones…

Set of chairs by Midjourney



Links to all solutions, alphabetical order
• 3DFY
• Alpha3D
• CSM (Common Sense Machines)
• Dreamfusion
• Kaedim
• Lucid Dreamer
• Luma AI
• Masterpiece Studio
• MeshGPT
• Meshy
• Point-E by OpenAI
• Sloyd
• StabilityAI
• Sudo
• Tripo
• Voxcraft



Prompt to 3D
Alpha3D – prompt to 3D & category to 3D
Summary
Alpha3D is a true prompt to 3D solution, also offering image to
3D based on categories for sneakers and sofas (alpha).

The calculation time is very fast, just a few seconds.

The results for text to 3D are very… coarse, but seem 3D-printable.
I can somehow recognize my electric sheep and the ghetto blaster,
but it will need some work to get this more usable.

The category-based 3D model of the sneaker actually worked pretty
well, taking into consideration that it got just one image.

There are 50 free models to create, making it worth a try.

The resulting mesh directly opens in Blender and any tool that
handles *.glb files (even PowerPoint). As Blender is free and can
convert the mesh to STL, you are all set.
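
As an illustration of that last step, here is a minimal Blender Python sketch for
converting a downloaded GLB into an STL. The file names are placeholders, and the
operator names are from Blender 3.x; newer releases rename the STL exporter.

    import bpy

    # Start from an empty scene so only the imported model gets exported
    bpy.ops.wm.read_factory_settings(use_empty=True)

    # Hypothetical file names – replace with your own paths
    bpy.ops.import_scene.gltf(filepath="alpha3d_model.glb")
    bpy.ops.object.select_all(action="SELECT")
    bpy.ops.export_mesh.stl(filepath="alpha3d_model.stl", use_selection=True)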



CSM (1), image, text and video to 3D

Summary
CSM (Common Sense Machines) is also enabling the
path from almost anything to 3D, with the unique
capability to even mix inputs. More on that on the next page.
After uploading an image, the system tries to come up with
4 views of the object and then creates a 3D object from it.
The first version is coarse, but it can be refined. While the
coarse one is done in minutes, the refined one takes hours.
I tried with the Volkswagen Beetle image here; the refined
result was quite reasonable, as you can see. As the system
has to guess what is on the back side, it will hardly be
perfect.
The video to 3D option and additional features like faster
calculation times are only available with the 'maker' plan or
higher.



CSM (2), mixing image, sketch and text to 3D

Summary
Now that's a little different and worth a separate page.
"Real-time sketch to 3D", even including a start image!
With their Cube app you can mix an image with a sketch and a
prompt, which influences the result, thus generating a more
constrained version of the output.
Here is an upload of a crazy shoe with flames, and the text input
"Hovercraft". Some scribbling influences the 3D output in rather
unpredictable ways, but the result is amazing!
It resembles a spaceship by the famous artist Chris Foss.
It is by far the most amazing piece of 3D art I have seen yet, even
though it is totally unpredictable. But hey, isn't that creativity?
The low-quality version takes a few minutes, the refined one takes hours.
Worth the wait, worth testing!



Dreamfusion by Google – prompt to 3D using 2D diffusion
Summary
Dreamfusion was announced at the end of 2022 and never got a site
to play with, leaving it to the more savvy programmer to install
it from GitHub. But it was one of the first approaches to create
3D from 2D. In this case, it creates 3D from 2D even without a
large 3D data repository such as Objaverse.

Instead, the approach creates many images of an object from different
directions, using synthetic image data from Google's Imagen
technology (something like DALL-E or Midjourney).
Then Nvidia's Instant NeRF (neural radiance fields)
technology (somewhat like ultra-fast photogrammetry, but
based on the density of light radiating off the object) is
used to create a 3D point cloud from the artificial images.
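
The core trick behind this is what the DreamFusion paper calls Score Distillation
Sampling (SDS). The following is only a conceptual PyTorch-style sketch of that idea,
not DreamFusion's actual code; sample_random_camera, render and diffusion_eps are
hypothetical placeholders.

    import torch

    def sds_step(nerf, diffusion_eps, text_embedding, optimizer, alphas_cumprod):
        camera = sample_random_camera()            # hypothetical helper
        image = render(nerf, camera)               # differentiable rendering of the NeRF

        t = torch.randint(20, 980, (1,))           # random diffusion timestep
        eps = torch.randn_like(image)
        a_t = alphas_cumprod[t].view(1, 1, 1, 1)
        noisy = a_t.sqrt() * image + (1 - a_t).sqrt() * eps   # noise the rendering

        with torch.no_grad():                      # the 2D diffusion prior stays frozen
            eps_hat = diffusion_eps(noisy, t, text_embedding)

        grad = (1 - a_t) * (eps_hat - eps)         # SDS gradient w.r.t. the rendered image
        loss = (grad.detach() * image).sum()       # so that d(loss)/d(image) == grad

        optimizer.zero_grad()
        loss.backward()                            # pushes the gradient into the NeRF weights
        optimizer.step()

Repeating this step with many random cameras gradually shapes the NeRF so that every
rendered view looks plausible to the text-conditioned 2D diffusion model.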

Interested in giving it a try? Here are a Dreamfusion installation
tutorial and a Dreamfusion installation video.



Lucid Dreamer, a prompt based 3D builder on Hugging Face
Summary
• The paper "LucidDreamer: Towards High-Fidelity Text-to-
3D Generation via Interval Score Matching" explores a
new method for generating detailed 3D models from text
descriptions.
• The technique, called Interval Score Matching (ISM),
addresses issues found in previous methods which often
resulted in overly smooth and less detailed models.
• ISM uses a process that retains details and texture
quality, leading to more realistic and high-quality 3D
models.
• The research shows that this method not only improves
the fidelity of the generated models but also does so
efficiently, promising advancements in the field of 3D
content creation from text prompts.
• Full paper
• Github page
• Huggingface test environment



Luma AI, a true prompt to 3D solution

LumaAI actually offers much more than just prompt to 3D, working
similarly to Midjourney and running on Discord, like Meshy, 3DFY,
Spline, and Pika.
Please check out the page about Discord if that's something new to you.
It seems that their main endeavor is actually photogrammetry-style
creation of 3D from film, which in the end is nothing but many
images. Their cutting-edge AI technology, capable of transforming
both text prompts and live video feeds into realistic 3D
representations, appears to be highly advanced.
Take the basketball player, created in T-pose for rigging. It's an
amazing quick start to continue with.
The boombox is perfectly printable with just a few spikes (analyzed
in Geomagic Wrap; see also the sketch at the end of this page). Yet – no
sharp corners or clever polygon distribution.
The company's /genie feature stands out, working exactly like
Midjourney's /imagine, based on a specialized language model for
fast 3D model creation from simple text input.
There are options to redo and refine the version you prefer most.
Output is a pretty good mesh file in various formats (fbx, obj, glb,
usdz, blend, stl) with 50,000 polygons through the LumaAI Genie app.

(Imagery: Geomagic Wrap integrity analysis, GLB file exported to 3D Builder,
LumaAI Genie export interface, LumaAI in Discord)
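
If you do not have Geomagic Wrap at hand, a quick watertightness check of such an
export can be scripted with the open-source trimesh library – a rough stand-in for,
not a replacement of, a full integrity analysis. The file name is a placeholder.

    import trimesh

    scene = trimesh.load("genie_boombox.glb")     # hypothetical Genie export
    # GLB files load as a scene; merge all geometry into a single mesh
    mesh = scene.dump(concatenate=True) if isinstance(scene, trimesh.Scene) else scene

    print("faces:     ", len(mesh.faces))
    print("watertight:", mesh.is_watertight)       # closed surface -> printable
    print("winding ok:", mesh.is_winding_consistent)

    if not mesh.is_watertight:
        trimesh.repair.fill_holes(mesh)            # simple hole fill before slicing
        print("after repair:", mesh.is_watertight)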
Luma AI, breaking news 11 Jan 2024

LumaAI reveals Genie in the browser and on iOS

No need for Discord anymore

High res in just a couple of minutes (it depends, though)
3D-print ready, I tested it!
Web: https://lumalabs.ai/genie
iOS: https://www.lumalabs.ai/iOS



Masterpiece Studio (1), a prompt based 3D builder
Summary
Masterpiece Studio (or Masterpiece X) is a quite versatile 3D
builder with the possibility to individualize the output. Based in
Canada and cooperating with Nvidia, it is a text to 3D solution
that only needs to know the category.
You can furthermore edit the results in VR in the Meta Quest
environment using Masterpiece X. This is a truly unique method
for modeling an object in 3D, using only the Quest controllers to
manipulate parts. Watch it!
Categories are objects, animals and humans. Then you specify
your object further with up to 5 words, adding the style as a
3rd step. It takes only 2-8 minutes.

(Imagery: Geomagic Wrap integrity analysis)
Output is an almost printable 30k mesh as a *.glb file, with
texture looking really good. The riggable cowboy in the corner is
a very good example of the potential use cases.
No install needed, enough free credits to start playing with it.



MasterpieceX (2), a prompt based 3D builder – the VR editor
Summary
I wanted to add a second page here because it is hard to imagine
what the VR-based editing allows the user to do. Take the asset, put
on your VR glasses and start manipulating it using the controllers.

So here are just 2 screenshots from the MasterpieceX Meta Quest
video, which I suggest you watch in addition.

It allows rigging, texturing and modelling.



Meshy, an image & text to 3D solution on Discord

Summary
Meshy originates from Silicon Valley. Their partners come from across
entertainment, gaming, AR/VR, and rendering engines.
Like a few other great solutions, it currently runs on Discord.
There is a full portfolio with image and text to 3D, and text to texture. Just select
the corresponding channel after reading the instructions.
It's always a good idea to look at showcases (for any solution), so you can see
what's possible and which prompts were used.
/create for text to 3D or /img3d for images kicks off the process.
Look at that spaceship, shepherd or the rocket engine – they are printable or easy
enough to fix (checked with industrial software); the results are impressive!

The image to 3D still has some trouble guessing the back sides; I used a self-
drawn portrait from my grandfather and downloaded a few community
samples. It is of course good enough to go from sketch to 3D, import into
Blender and then have a reference to work with – not to use it directly.
Personally I think it is currently one of the best, along with LumaAI.
There are enough free tokens to start with; then you need to purchase a plan.



Point-E by OpenAI, a prompt based 3D builder
Summary
OpenAI – that name rings a bell for all of us. The Point-E solution was
released already at the end of 2022 and is only available on Hugging
Face. More about Hugging Face later in this paper.
However, the output, referred to as 'Point-E', where the 'E' denotes
efficiency, is merely a point cloud. Now, we engineers know how
tricky it can be to get a closed model from a point cloud (see the
sketch at the end of this page). Point clouds are
easier to calculate but harder to process – the key limitation here.
So Point-E actually consists of a text to image model and an image
to point cloud model. After training the models on a dataset of
"several million" 3D objects and associated metadata, Point-E can
produce colored point clouds that usually fit the prompt, as the
OpenAI researchers say. While it's not perfect, it's already
considered 'ancient' in this fast-evolving field.
To fully test it, you need to do some coding based on the mentioned GitHub
repository. The Hugging Face test environment only delivers very coarse
results.
Paper
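
As a sketch of the point-cloud-to-mesh step mentioned above (not part of Point-E
itself), Poisson surface reconstruction with the open-source Open3D library usually
gets you a closed mesh; the file names are placeholders.

    import open3d as o3d

    pcd = o3d.io.read_point_cloud("pointe_output.ply")   # hypothetical Point-E export
    pcd.estimate_normals()                                # Poisson needs oriented normals
    pcd.orient_normals_consistent_tangent_plane(30)

    mesh, _densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
        pcd, depth=8)                                     # higher depth = finer but slower
    mesh.remove_degenerate_triangles()
    mesh.compute_triangle_normals()
    o3d.io.write_triangle_mesh("pointe_mesh.stl", mesh)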



Sudo AI (One-2-3-45++), image and text to 3D
Summary
The successor of One-2-3-45. It has moved out of the Hugging Face
environment into its own user interface.
This sample for image to 3D is a football player, from a synthetic
picture. The 3D result is right in its dimensions, just
geometrically not too detailed. Still very good for post-
processing.
My first trial with a flying turtle – prompt to 3D – showed some
natural problems. While it looks perfect from the view direction,
the flippers were attached wrongly. But the mesh is good. The
comic-style train is a little squashed but nice to look at; I guess
the view created beforehand would better be aligned with the axes
and not isometric – which, funnily enough, it doesn't do by itself.
It even creates multiple objects which you need to crop manually.
So I think it also uses the text as an intermediate step to create
several views, from which it then tries to derive a 3D model.
There are a few free trials, so be careful not to waste them.
Overall very promising.



One-2-3-45, image to 3D project on Hugging Face
Summary
Predecessor of Sudo (One-2-3-45++)
Single image to 3D object functionality in 45 seconds
Just drop an image and wait a little
Make sure the image contains an isolated, single object
To create an isolated image in Midjourney, for example, start
with /imagine <prompt> isolated on white background.
Or use a real image and get rid of the background;
remove.bg does a great job (see the sketch at the end of this page).
I used an artificial image of Lionel Messi here, which is
kind of tricky because its lighting is not exactly natural.
Full paper
Github page
Huggingface test environment
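
The background removal can also be scripted; here is a small sketch using the
open-source rembg package as an alternative to remove.bg (file names are placeholders).

    from rembg import remove

    # Hypothetical input photo; the output is a PNG with a transparent background
    with open("messi_photo.jpg", "rb") as f:
        cutout = remove(f.read())

    with open("messi_isolated.png", "wb") as f:
        f.write(cutout)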



Tripo AI, prompt and image to 3D
Summary
Tripo AI is another option to create ready-to-use 3D
models based on text or images.
It doesn't work instantly, but the speed is reasonably fast.
The text-based 3D generation also produces nicely textured
models with just a few mesh errors, as you can see with the ski
boot and the amazing jacket here.
I was really fascinated by the robot I sent in as a synthetic
image and the 3D model I got back. It really makes sense and
includes the same 5 correctly positioned legs as seen in the
image.
Interestingly, I asked the image generator for 6 legs, but
that's a different story.
So well done, Tripo, this is on a par with the other leaders of
the pack.



MeshGPT, a research paper worth mentioning
Summary
MeshGPT is a project from the Technical University of Munich, the Politecnico di
Torino and Audi.
I wanted to mention MeshGPT, although I cannot test it, as the results are,
unlike the ones in this not-so-white paper, clean and clever meshes with true
edges and an intelligent subdivision. If so, that would be a fantastic direction to
go. Those assets would be much more lightweight and engineering-like than all
the other ones.
According to the authors, MeshGPT creates triangle meshes by autoregressively
sampling from a transformer model that has been trained to produce tokens from a
learned geometric vocabulary. These tokens can then be decoded into the faces
of a triangle mesh. That method, so they say, generates clean, coherent, and
compact meshes, characterized by sharp edges and high fidelity.
In simple words: by analyzing numerous examples, MeshGPT learns where fewer
blocks of information create better geometry. That learning is quite similar to
learning a language with its grammar, but for shapes (see the conceptual sketch
at the end of this page).
All of that, I have to admit, I cannot test or confirm.
github
Paper
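
To make the description above more concrete, here is a purely conceptual sketch of
autoregressive mesh generation. It is not MeshGPT's actual code; the transformer, the
codebook and the face decoder are hypothetical placeholders, and the tokens-per-face
count is made up for illustration.

    import torch

    BOS, EOS, TOKENS_PER_FACE = 0, 1, 9            # illustrative values only

    def generate_mesh(transformer, codebook, decode_face, max_tokens=900):
        tokens = [BOS]
        while len(tokens) < max_tokens:
            logits = transformer(torch.tensor(tokens)[None])[0, -1]
            nxt = torch.multinomial(logits.softmax(-1), 1).item()
            if nxt == EOS:
                break
            tokens.append(nxt)                     # sample from the learned vocabulary

        # look each token up in the geometric codebook and decode groups into triangles
        faces = []
        embeddings = codebook[torch.tensor(tokens[1:])]
        for group in embeddings.split(TOKENS_PER_FACE):
            faces.append(decode_face(group))       # -> three xyz vertices per face
        return faces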



Image to 3D
Kaedim 3D, an image to 3D solution

Summary
Kaedim says on Discord: we are part of NVIDIA's Inception Program for AI
startups, a CDL AI Stream alumnus, and an Epic MegaGrant recipient.

But there is little traffic on the site.

Kaedim seems to be a powerful image to 3D tool when looking at some
of the samples from Discord – if I could only get it to work.

The plans are kind of expensive, and the Indie plan doesn't let me, for
some reason, work with the displayed VW Beetle, which is nicely placed
on a white background.

Kaedim recently added an option to go from text to 3D, but reading
through it, I think it will first create a 2D image and then go into the
regular workflow. A shame I couldn't test it better for you.

These ones are from Discord and look amazing!

So it might be powerful, but I couldn't test it.



StabilityAI, 3D generation from images

Summary
Can't test it, but:

Stability AI is an open-source generative AI company that focuses on
open-access AI models with minimal resource requirements to create
imaging, language, code and audio. They started in 2019 and have
millions of users who have already created hundreds of millions of assets.

Stable Zero123 is the successor of various earlier versions and
generates novel views of an object in order to then derive a 3D
object, using a model trained on the Objaverse datasets.

It's necessary to do some installation; check out their Hugging Face and
GitHub pages.

Check out the project page: Project Page

Having said that – there are other solutions based on this model that
can be tested, which will be on the next slides.



VoxCraft, image & category to 3D, text to texture on Discord

Summary
Voxcraft supplies a variety of solutions, as mentioned in the header.
Prompt to 3D
Just input: /text3D_alpha CATEGORY TEXTURE
It only allows 4 categories right now: airplane, chair, car and table.
Image to 3D
/img3D PROMPT IMAGE VERSION lets you create a model from an image, which
takes up to 30 minutes though.
The prompt to texture, which adds texture to an existing model, is hosted on
another channel. The link is in the "getting started" chat. The results look pretty
good, but it was not my focus to try that.
Once subscribed to the Discord bot (see the Discord page for what this is about),
follow the instructions for the possible and necessary prompts and have a go.
It is free for a few models a day, which is nice.
The text to 3D is quite limited and coarse right now (see the race car, which was
supposed to be a Beetle). It takes a few minutes.
The image to 3D takes about half an hour.



Category to 3D
3DFY – parametric content creation by category

Summary
3DFY is more of a parametric content creation tool.

Instead of allowing arbitrary text input, it is focused on creating
variants of existing objects.

Runs in the cloud, no install needed.

This can be very handy as long as your category exists.

So if sofas are not what you need, wait for the next version.

If you require a solid wooden table with a Swiss edge, it can actually
create one that is exportable as a *.glb, *.fbx, and Blender file.

Not free; for downloads you need to purchase a plan.



Sloyd – parametric content creation
Summary
Sloyd is a 3D model database with extensive options to
parametrically alter the look and feel of a model, but not text or
image to 3D. Although it feels much like a parametric CAD system, of
course the output is a meshed object.

There is a larger base of categories than in 3DFY, and the modelling
options are partially CAD-style, partially almost voxel-like.

You can bend, stretch and alter the model in many ways, and add or
leave out accessories (like the exhaust, roof, etc.).

If you create a squashed tractor like mine, *.glb and *.obj are
exportable.

Free for just 1 model; for more downloads you need to purchase a
plan.

The resulting mesh is not directly printable, but an assembly of
meshed objects.



Discord – hosting Midjourney, LumaAI, Meshy and others

Discord is a communication platform originally designed for gamers
that allows users to communicate via voice calls, video calls, text
messaging, media, and files in private chats or as part of
communities.
It has since expanded its reach to a broader audience, including
communities interested in a variety of topics beyond gaming.
It is also very easy to access, extremely scalable, and allows real-time
interaction with AI tools.
Discord supports the use of “Bots” that can automate tasks, handle
commands and interact with users, which is crucial for services like
Midjourney and LumaAI that operate based on user inputs.

You can run Discord easily in the browser, no need to install anything.
LumaAI doesn’t need a subscription (yet).



Hugging Face
Many AI solutions are hosted here
Hugging Face is an open-source platform where the machine
learning community collaborates on models, datasets, and
applications.
The community has a focus on Natural Language Processing (NLP)
and hosts a wide array of NLP models and tools that developers can
utilize to tackle various language processing tasks.

Solutions you can try on Hugging Face:
• Point-E
• One-2-3-45
• Lucid Dreamer, project page
• Stable Zero123
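
Many of these demos are Gradio apps, so besides clicking through the browser UI they
can often be called programmatically with the gradio_client package. The Space id,
prompt and endpoint name below are placeholders – check the Space's "Use via API"
panel for the real ones.

    from gradio_client import Client

    client = Client("some-org/some-text-to-3d-space")    # hypothetical Space id
    result = client.predict("a ceramic teapot, studio photo",
                            api_name="/generate")        # hypothetical endpoint
    print(result)                                        # typically a path to the generated file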

(Image by Midjourney)



Executive Summary

What we observe is a rapidly evolving technology that enables the creation of 3D
content through text prompts or a single image. The emphasis is on "content" though.

While Mr. Zuckerberg will find it extremely valuable to be able to create assets for the
Metaverse in seconds, most solutions so far do not point directly into the engineering
world.

Not yet.

The resulting geometry consists of textured meshes, based on point clouds or voxel
models derived in various ways using new AI-based technology.

But what’s next? This is only the result of more or less 1 year, and the spike of papers
only in the last 3 months is incredible. Especially the option to mix prompt, sketch and
image is impressive and shows that it is only the beginning.

Will there be more engineering-quality output soon? It is no problem to mill polygons
or to 3D-print them.

Can constraints be included, like true edges and an intelligent polygon structure?
MeshGPT looks promising.

We might take a completely different route and automatically reverse-engineer those
meshes with constraints, using an LLM fed with CAD data, to create manufacturable
models with AI. What about an LLM that supports GD&T tolerancing?

(Images: GS & Midjourney, DALL-E)

Let me know, drop me a line and let's have a discussion. The author [Gerd Schwaderer]

