a movie monster made of vines and plastic vivid colors and 8K 3D
a sorcerer
an octopus
poster art of a lion made of cardboard and vines
poster art of a man
Any Others I Missed?
Do you know of any other colabs and/or github Text-to-Image systems I have missed? Let me know and I will see if I can convert them to work with Visions of Chaos for a future release. If you know of any public Discords with other colabs being shared let me know too.
EVERY PYTHON DEVELOPER OUT THERE! ALL OF YOU! PLEASE ADD EXACT VERSION NUMBERS TO YOUR REQUIREMENTS.TXT FILES!
Did I convince you? Are you adding in version numbers right now? If not, read on.
It has been a while since I was annoyed enough to add a new blog post in the annoyances category, but this is it.
Python Packages
Python. A constant love/hate relationship with me. Love for the most part as it allows me to add many awesome new machine learning systems to Visions of Chaos. The hate (for now) comes from version hell with packages. This is more a developer issue (all of us) and not a problem with Python itself.
When most developers create a new Python program/script/system they usually provide a requirements.txt file listing the Python packages their code needs to run. Packages are like extra libraries of code that give the script more functionality. They allow the Python code to have more commands and they make it easier for devs to code. These packages are installed with the Python pip command.
Here are a few lines from a typical requirements.txt file
The first line specifies an exact version number. That means that gradio 3.33.1 will be installed. This is good.
The next 3 lines do not specify any version numbers. This is bad. By default, when a version number is not specified, the latest available version is used. If the same script and requirements are being used soon after the script is released then this is probably not an issue, as the developer most likely used the same current versions. The problem arises as more time elapses between the release date and the user install date. numpy here is a good example. numpy has deprecated (made obsolete and unsupported) many commands and syntax over the versions. If pip installs the latest version every time, the chances are that a new version is going to break existing code. When this happens the poor end user (or me) has to go and try to work out which library broke and how to fix it (if possible).
The last line specifies a >= version number. This is just as bad as no version. This also shows an issue I had only recently (one of the reasons I wrote this blog post). Pillow now has a v10 release that breaks some of the v9.5 code. If the author had specified 9.5.0 as an exact version then there would be no problems. Pillow could advance to v136.56.3 and it would not matter as the script in question would still know to install v9.5.0.
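To make the three cases concrete, here is a minimal Python sketch that sorts requirement lines into pinned, unversioned and range. The sample lines are illustrative only: gradio 3.33.1 and numpy come from the description above, while the Pillow lower bound of 9.5.0 and the helper names classify and unpinned are assumptions made for this example. The pattern is deliberately simplified and does not cover every form pip accepts (URLs, -r includes and so on).

```python
import re

# Simplified pattern: package name, optional [extras], optional operator, optional version.
_REQ = re.compile(
    r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(\[[^\]]*\])?\s*"
    r"(==|>=|<=|~=|!=|>|<)?\s*([^\s;#,]*)"
)


def classify(line):
    """Return (name, kind) for one requirements.txt line, or None for blanks/comments.

    kind is "pinned" (exact ==version), "unversioned" (no version at all),
    "range" (>=, ~=, wildcard and similar) or "unparsed".
    """
    text = line.split("#", 1)[0].strip()
    if not text:
        return None
    match = _REQ.match(text)
    if not match:
        return (text, "unparsed")
    name, _extras, op, version = match.groups()
    if op is None:
        return (name, "unversioned")
    if op == "==" and version and "*" not in version:
        return (name, "pinned")
    return (name, "range")


def unpinned(lines):
    """Names of all packages that are not pinned to one exact version."""
    results = [classify(line) for line in lines]
    return [name for name, kind in (r for r in results if r) if kind != "pinned"]


# Illustrative sample, not the original file.
sample = ["gradio==3.33.1", "numpy", "Pillow>=9.5.0"]
print(unpinned(sample))
```

Running something like this over a requirements.txt before release would list every package that could silently change version on a later install.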
Python Virtual Environments
When I first started adding machine learning systems to Visions of Chaos I quickly encountered version hell. Firstly if you are going to add a lot of different Python scripts you are going to run into version conflicts. Some scripts need v1 of a certain package, some need v2. To get around this you can use Python environments. Environments keep a certain set of packages and versions isolated from others. When you want to run a certain script, you activate its environment first so you know you have the right packages. Within the environments in Visions of Chaos I always specify exact version numbers for the packages. Life was good, back to work on more interesting things. No such luck.
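As a rough illustration of the idea, the sketch below creates an isolated environment with Python's standard venv module and builds a pip command that installs exact versions into it. This is an assumption about tooling, not a description of how Visions of Chaos manages its environments; the function names and the example directory are hypothetical, and the two pinned versions are simply the ones mentioned earlier in this post.

```python
import sys
import venv
from pathlib import Path


def env_python(env_dir):
    """Path to the Python executable inside a virtual environment directory."""
    if sys.platform == "win32":
        return Path(env_dir) / "Scripts" / "python.exe"
    return Path(env_dir) / "bin" / "python"


def create_env(env_dir):
    """Create an isolated environment (with pip) and return its Python path."""
    venv.EnvBuilder(with_pip=True).create(env_dir)
    return env_python(env_dir)


def pinned_install_command(python_path, pins):
    """Build (but do not run) a pip command that installs exact versions only."""
    specs = [f"{name}=={version}" for name, version in pins.items()]
    return [str(python_path), "-m", "pip", "install", *specs]


if __name__ == "__main__":
    # Hypothetical environment location; adjust to suit.
    python_path = create_env("example_env")
    command = pinned_install_command(
        python_path, {"gradio": "3.33.1", "Pillow": "9.5.0"}
    )
    # Pass the list to subprocess.run(command, check=True) to actually install.
    print(" ".join(command))
```

Calling the environment's own Python with -m pip keeps the install inside that environment, so other scripts keep whatever versions they were set up with.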
The Real Problem
Now we really get to the annoyance. Unless EVERY developer out there specifies exact version numbers in their required packages lists any updates could cause version hell.
Visions of Chaos supports using the GPU for calculations. Without the GPU support these machine learning systems run orders of magnitude slower on the CPU. Because of different versions of pytorch with GPU support, most devs do not include GPU supported pytorch with their requirements. Makes sense to avoid lots of “it doesn’t install for me” complaints. I know what specific version of pytorch Visions of Chaos uses so what I do is, at the end of any environment setup, I will uninstall any existing CPU pytorch versions and install the GPU version I know works.
This all worked smoothly until a week ago. I had reports that a lot of the modes in Visions of Chaos were not working. Scripts that ran happily for months would fail when new users installed them (great first impression for a new user to find a lot of the features do not work). Time to test. I reset the environments in question and ran the scripts. Sure enough, the same errors.
What happened this time was Python package requirements without version numbers being updated outside my control.
Firstly it was pytorch. My environment setups usually end with this
That gets rid of any auto-installed CPU pytorch and installs a versioned GPU version.
BUT, when pytorch installs it also installs a bunch of its own prerequisite packages. And when it does this it does NOT specify version numbers. So even though every line and package I install has exact versions, a dependency from pytorch does not, and that causes my scripts to fail. pytorch pulled in the latest typing_extensions package, which caused script errors.
Same thing happened with the latest Pillow v10 release around the same time. Changes in v10 broke code that was written for v9.5.0.
Both of those issues could be fixed by adding these next lines to the end of the environment setups.
But that is only a temp fix. Any day now a non-versioned requirement of a package I install could cause this same madness all over again. I have spent 5 days now tediously reinstalling environments and debugging and fixing code, all because someone somewhere did not spend 2 minutes to put version numbers into their requirements.txt. A quick update months later: this still causes frequent problems and bug reports. I keep having to tweak my environments to remove incompatible packages and force an older version of a package to fix an issue in some other package somewhere out there.
To get an idea of how often this issue causes me problems, open the Visions of Chaos revision history. Every entry there that starts with “Fixed install issue” was a perfectly working script that failed due to an unversioned Python package update.
The Fix?
If it was up to me I would change pip to enforce that a version must be specified. No version, pip errors out with “You didn’t specify a version you bozo! Don’t you know how much of a hassle this can cause!” If each package specifies a version it installs fine.
All a dev has to do before uploading their working new script is run a quick pip list command to show the packages and versions. Then they just copy those versions into their requirements.txt file. If every dev did this (and Python forced them to) this version hell would be fixed (maybe not a 100% fix, but much better than what we have now). Maybe an enforced law of version numbers is needed?
Maybe this post can also help explain to users who just see Visions of Chaos “not work” why it happens and why it is outside my control.
One of the recent additions to Visions of Chaos was AnimateDiff. Git repository is here if you want to see the code or more info. AnimateDiff generates short 2 second movies at 8 frames per second, 16 frames total. This is due to the model being trained on a bunch of movies that were only 2 seconds long. The generation process takes around 1m13s per movie on a 4090 and uses around 15.3 GB of GPU VRAM.
You can see some cherry picked results here. Not every result you try will be that good. I include a batch button in Visions of Chaos so you can run the same prompt over multiple random seeds to generate a bunch of outputs on the current prompt. That way you can come back after a while and look for the best result.
The first question people ask (or at least everyone who tried it in Visions of Chaos did) was “how do I make longer and larger sized videos”?
AnimateDiff Prompt Travel
In comes AnimateDiff Prompt Travel. The dev worked out how to merge the shorter 2 second clips into longer movies and it handles larger resolutions too. For the simplest usage you give a list of frame numbers and text prompts and the script does the rest. This script takes around 12.8 GB VRAM when running.
The settings for both of those movies are included with Visions of Chaos so you can create them on your own PC and tweak the prompts to anything else.
Art Games
On the Softology Discord there is the art-games channel. The purpose of the channel is for users to take another user’s Text-to-Image output, change up to two words and post the new image. This continues on and slowly evolves new subjects of the images. I took a bunch of these prompts and ran them through AnimateDiff Prompt Travel. This is the result.
These are the prompts that change every 4 seconds of movie time.
Laughing watching collapsing scifi insanity
Laughing dog watching collapsing house scifi insanity
Laughing astronaut dog watching collapsing house planet scifi insanity
Laughing astronaut above collapsing planet scifi insanity
tranquil astronaut above futuristic planet scifi insanity
tranquil dragon above futuristic planet steampunk insanity
Tranquil dragon above futuristic planet
xenomorph butterfly above futuristic city
garden airships above futuristic city
Garden spheres above futuristic city
psychedelic_spheres_above_futuristic_sky
glass spheres above a sky
glass sphere containing a galaxy
glass spheres containing a galaxy
glass spheres containing a cute creature
glass cage containing a cute spider
glass nicholas cage containing a cute spider monkey
glass wonderland landscape containing a cute spider monkey
glass wonderland landscape containing a cute spider astronaut
wonderland landscape containing a cute alien robot
Wonderland landscape containing a cute quokka
Wonderland landscape burning a cute scarecrow
large crow burning a cute scarecrow
large crow burning a cute scarecrow on halloween
large dragon burning a cute castle on halloween
Large dragon burning a Spring castle on grass
Large dragon eating a Spring roll on grass
pixar dragon eating a Spring roll on cgsociety
pixar mouse eating a Spring salad on cgsociety
beksinski mouse eating a decaying salad on cgsociety
disney mouse eating a decaying franchise on cgsociety
disney mouse driving a decaying car on cgsociety
humanoid mouse driving a cyberpunk car on cgsociety
humanoid tree driving a cyberpunk car on mars
Humanoid tree driving a green car on asphalt
bonsai tree in a green car on asphalt
rainbow tree in a green pot on asphalt
rainbow unicorn melting in a green pot on asphalt
rainbow unicorn marshmallow melting in a green pot on asphalt instagram
rainbow robot, marshmallow smoking in a green pot on instagram
Handsome robot, marshmallow bouncing in a green pot on instagram
Rainbow Robot wearing a jingasa smoking a green pot, Artwork, Golden Hour, High Contrast, 3D, Feng Shui, volumetric Light, Iridescent, Brushed Aluminum
Handsome marshmallow robot bouncing in a green suit on a piano
Bionic marshmallow robot in a green suit stomping on a piano
Bionic elephant robot in a green suit stomping on a bridge
Bionic elephant robot in a green tuxedo dancing on a bridge
bulbuous elephant male in a green tuxedo dancing on a bridge
bulbous knight male in a green armor dancing on a bridge
bulbous knight male in green armor dancing on a tank
warcraft knight male in green armor driving on a tank
warcraft knight male in green armor driving on a tesla
warcraft knight male in green armor driving on a tesla
warcraft pig male in shiny armor driving on a tesla
warcraft pig male in shiny armor flying on a dragon
warcraft pig male in shiny armor fighting a dragon
warcraft pig female in shiny armor fighting a big dragon
warcraft pig male in shiny armor fighting a dragon
lego pig female in shiny armor fighting a big pumpkin
lego cat female in black armor fighting a big pumpkin
weird cat female in black armor inside a big pumpkin
steampunk cat female in black armor inside a big warehouse
steampunk cat female in black armor inside a rich bank
fat cat aristocrat in black armor inside a rich bank
fat cat aristocrat in black armor inside a rich bank
fat cat aristocrat wombat in matte black armor inside a rich bank
fat cat aristocrat wombat in matte black suit inside a robbed bank
fat cat aristocrat wombat in matte black suit driving a robbed Cadillac
fat cat shooting rat in matte black suit driving a robbed Cadillac
fat cat shooting rat in matte black suit driving a futuristic Cadillac
fat cat rat in matte black space suit driving a futuristic Cadillac
fat cat in matte black space suit driving a futuristic Cadillac in disney
fat cat in matte black space suit driving a futuristic Cadillac book in disney
fat cat in matte black space suit driving a futuristic book in disney France
fat cat in matte black space suit driving a book in renaissance france
cat in matte black hat driving a helicopter in renaissance france
cat in matte black hat robe driving cleaning a spaceship in renaissance Mars
cat in matte black robe watering a garden in renaissance Mars
cat in matte black robe watering a garden in Arizona desert renaissance Mars
cat in matte black robe eating a cactus in Arizona desert
cat in fluffy robe eating a cactus in Arizona desert
zombie in fluffy robe eating a brain in Arizona desert
zombie in fluffy robe eating a donut in Chicago desert
zombie in fluffy bikini eating a donut in chicago street
Zombie in fluffy city eating donut in street
Zombie in city eating pizza in street
zombie in city partying in street
zombie in Rome partying in museum
Zombie in Rome studying in party
shark in Rome feasting in party
dapper shark in Rome feasting in situ
dapper koala in Sydney feasting in situ
dapper axolotl in Chinatown feasting in situ
satanic axolotl in Chinatown meditating in situ
satanic sheep in cafe meditating in situ
satanic sheep in cafe meditating in space
giant blob in cafe meditating in space
giant viking in a cafe meditating in space
scary clowns in a cafe meditating in space
hairy clowns in a car meditating in space
hairy starfish in a fishbowl meditating in space
lairy starfish in a fishbowl cogitating in space
alien starfish in a helmet cogitating in space
alien creature in a helmet cogitating in space moebius
alien monk in a kasaya cogitating in space moebius
alien monk in a kasaya cogitating in labyrinth esher
dumfounded alien child in a kasaya cogitating in labyrinth esher
alien creature in a helmet cogitating in space moebius
dumbfounded alien child in a kayak cogitating in labyrinth escher
dumbfounded alien puppet in a kayak conflagrating in labyrinth escher
dumbfounded alien puppet in a kayak conflagrating in labyrinth escher, damien hirst
disturbing alien puppet in a bubble conflagrating in labyrinth escher, damien hirst
Cute alien puppet in a bubble conflagrating in labyrinth escher, damien hirst
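A schedule like this one (a new prompt every 4 seconds) can be turned into the list of frame numbers and prompts that Prompt Travel takes. The sketch below is only an illustration: it assumes the movie runs at AnimateDiff's 8 frames per second, the function name prompt_frame_map is made up, and the plain dictionary it returns is not necessarily the exact format the script's settings file uses.

```python
def prompt_frame_map(prompts, seconds_per_prompt=4, fps=8):
    """Map the starting frame number of each prompt to the prompt text.

    fps=8 is an assumption taken from AnimateDiff's default frame rate.
    """
    frames_per_prompt = int(seconds_per_prompt * fps)
    return {index * frames_per_prompt: prompt for index, prompt in enumerate(prompts)}


# The first three prompts from the list above.
schedule = prompt_frame_map([
    "Laughing watching collapsing scifi insanity",
    "Laughing dog watching collapsing house scifi insanity",
    "Laughing astronaut dog watching collapsing house planet scifi insanity",
])
for frame, prompt in schedule.items():
    print(frame, prompt)
```

With these assumptions each prompt starts 32 frames after the previous one.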
Tutorials
The Future of AI Movies
AI movie creation is advancing quickly like image generation did before it. It won’t be long before everyone can generate their own movies at home with finer control of the imagery produced.
A quick post showing some steps to get NeRF going in Visions of Chaos to help first time users.
Step 1 – Training
1. Create a new empty directory for your trained data eg D:\Nerf Test\
2. Create a directory under that called images eg D:\Nerf Test\images\
3. If you have a series of images you know will work for training, put them under images. Otherwise, you can copy the images from C:\Users\YourUserName\AppData\Roaming\Visions of Chaos\Examples\MachineLearning\Instant Neural Graphics Primitives\data\nerf\fox\images\.
4. Start Visions of Chaos and select Mode->Machine Learning->Mesh Generation->Instant Neural Graphics Primitives
5. Set the source to be D:\Nerf Test and click Train.
6. Wait for the training to finish. For the fox images on a 3090 it took around 3 minutes.
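The directory setup in steps 1 to 3 can also be scripted. This is a minimal sketch, not part of Visions of Chaos: the function name prepare_scene and the set of image file extensions are assumptions, and the paths in the usage comment are the example paths from the steps above.

```python
import shutil
from pathlib import Path

# Assumed set of image types; extend it if your photos use another format.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def prepare_scene(scene_dir, source_images_dir):
    """Create scene_dir/images and copy the training images into it.

    Returns the names of the files that were copied, in sorted order.
    """
    images_dir = Path(scene_dir) / "images"
    images_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(Path(source_images_dir).iterdir()):
        if src.is_file() and src.suffix.lower() in IMAGE_SUFFIXES:
            shutil.copy2(src, images_dir / src.name)
            copied.append(src.name)
    return copied


# Example usage with the paths from the steps above:
# prepare_scene(r"D:\Nerf Test", r"C:\path\to\your\photos")
```

After this, point the Source at the scene directory and click Train as in step 5.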
Step 2 – Viewing
With the Source location still pointing to D:\Nerf Test you can now click View to start the viewer GUI.
If you used the fox images you will see the point cloud of the trained data like the following. Middle mouse button click and drag to slide the model around. Left click and drag to rotate.
Step 3 – Creating a Movie
Lastly you can now create a movie of a virtual camera moving around the 3D point object.
1. Let the points accumulate enough to see a reasonable image that is not too noisy.
2. Scroll down in the settings dialog and expand Snapshot.
3. Click Save.
Now to make the camera path. By default the path dialog is hidden behind the main dialog, so click and drag the main dialog out of the way.
When you have the Camera Path dialog showing, move the camera (middle click and drag, left click and drag) to the position you want your movie to start at.
1. Click Add from cam to add that point.
2. Rotate and zoom to another location and once again click Add from cam.
3. Do this another few times to create the camera key frames.
4. Once you have added all the points, click Save to save the path.
5. You can now close the GUI.
6. With the Source directory still set to D:\Nerf Test click Movie.
By default it will create a 15 second movie at 30 fps at a size of 1280×720. You can change these settings if you wish.
The movie frames will be created …
…and the movie will play when finished.
The movie is saved under your specified Scene directory.
Train your own images
See the fox images as an idea of images to use. You want a series of images rotating around the subject showing it from all sides you want to see in the final movie.
You can also train from a movie of your subject rotating. The movie frames will be extracted for you and then trained as normal.
a bronze sculpture of Robert DeNiro rendered in unreal engine and trending on Flickr
a chinese painting of a peacock by Agnes Lawrence Pelton and Bob Thompson
a cute girl 4K HD realism and 8K 3D
a fine art painting of a palace made of mist
a green tree frog
a lion
a storybook illustration of the Australian outback
ballpoint pen art of Frankenstein
Brad Pitt by Rhea Carmi and Robert Bechtle
beauty, 4K, 8K, HD, hyper detailed, high detail, surrealism
an oil painting by Picasso and van Gogh, 4K, 8K, HD, hyper detailed, high detail, surrealism
Name: Stable Diffusion v2
Author: Original script by Robin Rombach et al
Original script: https://github.com/Stability-AI/stablediffusion
Time for 768×768 on a 3090: 42 seconds
Maximum resolution on a 24 GB 3090: 1664×704
Maximum resolution on an 8GB 2080: Unable to run on an 8GB GPU.
Description: Uses a newly trained version of the Stable Diffusion model that renders natively at 768×768. The following examples show 768×768 sized output.
a cave
a detailed painting of fear IMAX and Flickr
a digital rendering of a human made of chrome and gold
a mansion
a portrait of a sad clown
a spooky forest
a storybook illustration of a lush rainforest for sale on Facebook Marketplace and #film
a pastel of Big Bird by John Blair and Christoph Ludwig Agricola CryEngine and 4K HD realism
a tributary
a watercolor painting of a western town trending on ArtStation and Tri-X 400 TX
a werewolf
an abstract painting of Gandalf
an alien forest IMAX and vivid colors
an engraving of a cute girl
a hyperrealistic matte painting of melting color, 4K, 8K, HD, high detail, hyper detailed
a hyperrealistic matte painting of a lush rainforest, 4K, 8K, HD, high detail, hyper detailed
a hyperrealistic matte painting of a magical glowing mushroom forest at night, 4K, 8K, HD, high detail, hyper detailed
Name: Kandinsky v2.1
Author: Original script by AI Forever
Original script: https://github.com/ai-forever/Kandinsky-2
Time for 768×768 on a 3090: 1 minute 14 seconds
Maximum resolution on a 24 GB 3090: 1664×704
Maximum resolution on an 8GB 2080: Unable to run on an 8GB GPU.
Description: A new alternative script to Stable Diffusion and other models. Definitely worth a try.
a cabin
a fireman
a hyperrealistic painting of an ocean by Ella Guru and Walter Emerson Baum rendered in unreal engine and photorealistic
a lineart illustration of goldfish 4K photo and vivid colors
a portrait of a beautiful young girl in a garden at dusk
a robot
an impressionist painting of a happy family
an ugly monster
Harry Potter
Spiderman
Name: DeepFloyd IF
Author: Original script by DeepFloyd AI Research Band
Original script: https://github.com/deep-floyd/IF
Time for 1024×1024 on a 3090: 1 minute 17 seconds
Maximum resolution on a 24 GB 3090: 1024×1024 only
Maximum resolution on an 8GB 2080: Unable to run on an 8GB GPU.
Description: A new alternative script to Stable Diffusion and other models. 1024×1024 native resolution is nice.
Click to see these samples in 1024×1024 resolution.
a babbling brook
a cathedral
a collage painting of a vast city lens flare and 8K 3D
a cove
a mountain cabin
a still life of a mountain path
a teddy bear
a worried man made of bones and wire
an allegory of Charmander
gorillas
Name: Kandinsky v2.2
Author: Original script by AI Forever
Original script: https://github.com/ai-forever/Kandinsky-2
Time for 1024×1024 on a 3090: 1 minute 3 seconds
Maximum resolution on a 24 GB 3090: 1664×704
Maximum resolution on an 8GB 2080: Unable to run on an 8GB GPU.
Description: An update to Kandinsky. Handles 1024×1024 resolution. Superb fast results.
a digital rendering of frogs 8K 3D and CryEngine
a gouache of a happy person made of liquid metal and metal by Emma Lampert Cooper and Zha Shibiao
a lounge room
a lush rainforest 8K 3D and for sale on Facebook Marketplace
a watercolor painting of an ugly person made of chrome and chrome
an attractive woman
an ugly face
computer graphics of fear ZBrush and filmic
impressionist of New York City trending on ArtStation and trending on Flickr
a kitten wearing pajamas and sunglasses in times square
A mystical forest filled with glowing mushrooms and iridescent butterflies, where a wise old owl perched on a branch watches over a group of playful fairies as they dance under the moonlight.
a new york city street in the rain
a painting of the amazon rainforest
a portrait of a beautiful young girl in a garden at dusk
scratchboard art of a crying person Flickr and 8K 3D
a photo of a beautiful young girl in a summer garden at dusk
a storybook illustration of a cozy den
New York City
Name: Latent Majesty Diffusion v1.3
Authors: Original script by Dango233 and multimodalart
Original script: https://colab.research.google.com/github/multimodalart/MajestyDiffusion/blob/main/latent.ipynb
Time for 512×512 on a 3090: 2 minutes 24 seconds
Maximum resolution on a 24 GB 3090: 512×512 (when using GFPGAN upscaling)
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Starts with a smaller resolution image (usually 256×256 pixels), upscales it with GFPGAN, and then does a few more diffusion passes. GFPGAN can really help get better coherency in faces.
a hyperrealistic painting of a cute creature
a hyperrealistic painting of an evil clown
a picture of a tree
a surrealist painting of kittens
an engraving of an angry woman made of voxels
an oil painting of an attractive woman by Eileen Aldridge
an ultrafine detailed painting of Bruce Willis 4K HD realism
a collage painting of a lush rainforest by Doc Hammer and Alexander Ivanov hyperrealistic and CryEngine
a cubist painting of a lion and a sunset CryEngine and trending on pixiv
a fine art painting of a zombie
a gulf by I Ketut Soki and Alfons von Czibulka
a monastery trending on Flickr and #film
a morning landscape
a prairie CGSociety and CryEngine
a werewolf
ballpoint pen art of a monument
cyberpunk art of heaven filmic and CryEngine
Name: CLIP Guided k-diffusion
Author: Original script by Katherine Crowson
Original script: https://colab.research.google.com/drive/1w0HQqxOKCk37orHATPxV8qb0wb4v-qa0
Time for 512×512 on a 3090: 6 minutes 56 seconds
Maximum resolution on a 24 GB 3090: Fixed to 512×512 resolution.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: A new script by Katherine. Seems to generate more abstract results and these example images needed a long run of random prompts to select from.
a jigsaw puzzle of paranoia by Petr Brandl and Sasha Putrya
a landscape vivid colors
a pastel of Cookie Monster by Ren Bonian and Ángel Botello for sale on Facebook Marketplace and CryEngine
a reef
a renaissance painting of Al Pacino
a statue of a submarine made of metal and crystals by James Sessions American painter and Elfriede Lohse-Wächtler
an airbrush painting of a nightmare creature vivid colors and rendered in Cinema4D
an oil painting of a cephalopod made of paper and mist
an ugly person and an area 4K HD realism and trending on pixiv
a collage painting of a tiger vivid colors and photorealistic
a cove 4K photo and CryEngine
a cute creature
a glacier
a space nebula
a townhouse
a valley
an oil painting of a peacock by Wu Hong and Eve Ryder
Cthulhu
digital art of a wetland made of cheese and timber by Jacob Duck and Jacob Gerritsz Cuyp
Name: Latent Diffusion LAION_400M v2
Author: Original script by pesser
Original script: https://github.com/pesser/stable-diffusion
Time for 16 256×256 images on a 3090: 49 seconds
Maximum resolution on a 24 GB 3090: 512×512
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Renders multiple images quickly. Coherency is best at 256×256 so these example images are 2×2 tiled results. Each took 35 seconds on a 3090.
a babbling brook
a colorful parrot
a fine art painting of a castle
a matte painting of a rose
a pencil sketch of a cave 4K photo and hyperrealistic
a photorealistic painting of Cthulhu for sale on Facebook Marketplace and Flickr
a surrealist painting of a cloudy sunset
a surrealist painting of a monkey
an illustration of of a tiger by Stanley Twardowicz and Antoni Pitxot
an impressionist painting of a cottage
Name: Stable Diffusion
Author: Original script by pesser
Original script: https://github.com/CompVis/stable-diffusion
Time for 512×512 on a 3090: 34 seconds
Maximum resolution on a 24 GB 3090: 1280×640
Maximum resolution on an 8GB 2080: 640×576
Description: Incredible. Latest and greatest. Beats all previous Text-to-Image systems. If you only use one, use this one.
a black and white photo of puppies
a cathedral rendered in unreal engine and super detailed
a city made of mist trending on ArtStation and trending on Flickr
a detailed matte painting of a lush rainforest made of crystals and feathers
a king
a polaroid photo of a clown vivid colors and 8K 3D
an airbrush painting of the Terminator CryEngine and for sale on Facebook Marketplace
an ambient occlusion render of a wetland by William Forsyth and Victorine Foot trending on pixiv and CryEngine
poster art of a farm by Frederic Leighton and Yang Borun rendered in unreal engine and 8K 3D
a portrait of a young boy by Hendrick Cornelisz. van Vliet
a tree by Philips Wouwerman
a western town
a zombie
Han Solo psychedelic
vector art of the Amazon Rainforest
Name: Hypertron v2
Author: Philipuss
Original script: https://colab.research.google.com/drive/10fa8X6EsfZfda1dfhJ_BtfPZ7Te1WGoX
Time for 512×512 on a 3090: 1 minute 57 seconds
Maximum resolution on a 24 GB 3090: 1120×480.
Maximum resolution on an 8GB 2080: 256×256 2 minutes 18 seconds
Description: Version 2 of Hypertron. More models, more flavors. Works OK. Can give the “image in a sea of purple/grey” that previous MSE based scripts suffered from. Can give good results if you let it run a large random batch overnight.
a bronze sculpture of a spooky forest by Herb Aach
a diamond made of flowers
a gouache of an android by Wu Bin
a photo of a kitchen
a photorealistic painting of a cemetery
a sketch of a haunted house
a tattoo of Squirtle made of clay
an art deco painting of a human by Nicolas Lancret 8K 3D
Name: GLID-3-XL
Author: Jack Qiao
Original script: https://github.com/Jack000/glid-3-xl
Time for 512×512 on a 3090: 1 minute 04 seconds
Maximum resolution on a 24 GB 3090: 512×512.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM.
Description: Improved/updated version of GLID-3. Uses CLIP for better accuracy. Great textures and lighting. Poor image coherency when over 256×256.
a demon
a detailed matte painting of a bouquet of flowers
a kitchen
a photorealistic painting of a movie monster hyperrealistic
a picture of The Incredible Hulk by Kazimir Malevich
a pop art painting of an angry woman
a spooky forest
an abbey
New York City by Marie Courtois
poster art of Gandalf vivid colors
Name: ruDALL-E Aspect Ratio
Author: Alex Shonenkov
Original script: https://github.com/shonenkov-AI/rudalle-aspect-ratio
Time for 512×512 on a 3090: N/A
Maximum resolution on a 24 GB 3090: N/A
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM.
Description: Version of ruDALL-E that generates wide and/or tall aspect ratio images. The shorter side is limited to 256 pixels. Results can be very nice. Will generate multiple images at once, so these sample images have 4 results per prompt.
a black and white photo of a werewolf
a cartoon of a swamp
a large waterfall made of metal
a lounge room
a matte painting of a townhouse
a palace made of mist
a photo of an ugly woman
a tropical beach
an evil clown
dense woodland
This post continues listing the Text-to-Image scripts included with Visions of Chaos and some example outputs from each script.
Name: Multi-Perceptor CLIP Guided Diffusion Secondary Model Method
Author: SOMNAI
Original script: https://colab.research.google.com/drive/1Pf5F84FzWe9iAKNbiPaEo_v4hvQZ9SqS
Time for 512×512 on a 3090: 7 minutes 23 seconds
Maximum resolution on a 24 GB 3090: 1792×768 or 2048×640.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: The winner for the longest name so far. Needs tweaking as the addition of the secondary model here reduces the usual excellent quality of the Multi-Perceptor CLIP Guided Diffusion. Still shows a lot of potential.
a 3D render of Robocop
a futuristic city IMAX
a matte painting of trypophobia
a renaissance painting of a cloudy sunset trending on ArtStation
a woman 4K photo
an evil clown Flickr
an oil painting of a nightmare creature by Louis Janmot
a matte painting of New York City by Robin Guthrie
a portrait of a young girl
a rough seascape
a sea monster
a teddy bear
a werewolf
an airbrush painting of an angry woman
an attractive woman
Name: Looking Glass
Author: bearsharktopus
Original script: https://colab.research.google.com/drive/11vdS9dpcZz2Q2efkOjcwyax4oob6N40G
Time for 256×256 on a 3090: 1 minute 19 seconds
Maximum resolution on a 24 GB 3090: Locked to 256×256.
Maximum resolution on an 8GB 2080: 256×256 2 minutes 03 seconds
Description: A variation on ruDALL-E that added support for training the output with a single image or directory of images. It does seem to create better results than the raw ruDALL-E scripts (starting from a single image of random Perlin noise).
a minimalist painting of a castle in the mountains
a photocopy of a monkey vivid colors
a spooky forest by Laura Muntz Lyall
a teddy bear made of wrought iron
dense woodland
God
Name: GLIDE
Author: Unknown
Original script: https://colab.research.google.com/github/openai/glide-text2im/blob/main/notebooks/text2im.ipynb
Time for 256×256 on a 3090: 23 seconds
Maximum resolution on a 24 GB 3090: Locked to 256×256
Maximum resolution on an 8GB 2080: Locked to 256×256
Description: Images are rendered tiny at 64×64 and then upscaled internally within the script to 256×256 for output. The model has been “trimmed” so it cannot do anything human related and only does well for subjects it knows about. Hopefully they release the full model and/or train a larger resolution model in the future. Nothing to get excited about yet.
a cathedral
a color pencil sketch of a fire breathing dragon by Erwin Bowien
a gorilla
a library
a mosaic of monkeys
a painting of a cabin next to a stream in a secluded forest
an elephant
dinosaurs
goldfish
the Sydney Harbour Bridge lens flare
Name: Disco Diffusion
Author: @Somnai
Original script: https://colab.research.google.com/drive/1bItz4NdhAPHg5-u87KcH-MmJZjK-XqHN
Time for 512×512 on a 3090: 3 minutes 18 seconds
Maximum resolution on a 24 GB 3090: 2496×1088 11 minutes 50 seconds
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Diffusion script that includes all the latest features. Capable of rendering some very nice large resolution images (it may even do better at larger sized images than smaller resolutions like these samples).
a cute creature
a detailed matte painting of a morning landscape
a peacock made of mist by Reinier Nooms
a Pokemon character by William Etty
a polaroid photo of an angry woman
a rough seascape
a watercolor painting of a mountain path by Mark A Brennan rendered in Cinema4D
an attractive woman
computer rendering of a desert oasis rendered in unreal engine
a digital painting of Chewbacca by Willem van de Velde the Elder
a sad person
a skull
a storybook illustration of a happy clown by Gwen Barnard
a tree by Colin Gill
Bugs Bunny
fireworks by Károly Lotz
The Grand Canyon
Yoda
Name: ruDOLPH
Author: SBER AI
Original script: https://github.com/sberbank-ai/ru-dolph
Time for 128×128 on a 3090: 1 minute 15 seconds
Maximum resolution on a 24 GB 3090: Locked to 128×128
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Another ruDALL-E variation script. Locked to a tiny 128×128 resolution for now until they train the larger models. These examples were 4x upscaled with Real-ESRGAN.
a castle
a colorful parrot
a fine art painting of an ugly woman
a kitchen
a pastel of spirals made of plastic
a photorealistic painting of a cityscape
a portrait of a woman
a sad person by Ramon Casas i Carbó
kittens
vector art of a woman
Name: CLIP Guided Deep Image Prior
Author: Daniel Russell
Original script: https://colab.research.google.com/drive/1_oqIK8A67EgtJDdfsuJojc5ukNzirdle
Time for 512×512 on a 3090: 1 minute 45 seconds
Maximum resolution on a 24 GB 3090: 1024×1024 or 1680×720
Maximum resolution on an 8GB 2080: 512×512 (5 minutes 7 seconds) or 640×360
Description: Interesting script that has decent coherency. If only the output was slightly sharper and the colors slightly richer it would be a winner. Still good for unique outputs that the other methods cannot achieve.
a flemish baroque of a shrine
a statue of a tardigrade made of clay
a surrealist painting of a Pixar character
a surrealist painting of an evening landscape 4K photo
an abstract sculpture of an evil clown by Han Gan
an ambient occlusion render of Bugs Bunny made of wood
Cookie Monster
Jabba The Hutt by Shūbun Tenshō
tentacles by Johanna Marie Fosie
vector art of heaven
Name: FourierVisions
Author: Unknown
Original script: https://colab.research.google.com/drive/1nGNBjhbYnDHSumGPjpFHjDOsaZFAqGgF
Time for 512×512 on a 3090: 1 minute 40 seconds
Maximum resolution on a 24 GB 3090: Huge. 4096×4096 and beyond.
Maximum resolution on an 8GB 2080: 1024×1024 4 minutes 07 seconds
Description: Detailed images. The default script generates washed out pastel images, but with some gamma and brightness tweaks they can be improved (still not ideal, but better). Allows very large resolution images.
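As a rough illustration of the kind of gamma and brightness tweak mentioned above, here is a minimal sketch in plain Python. The function name and the gamma/brightness values are assumptions for illustration only; the actual adjustments applied to the FourierVisions output are not given here.

```python
def adjust_pixel(value, gamma=1.5, brightness=1.0):
    """Apply a gamma curve and a brightness multiplier to one 0-255 channel value.

    Hypothetical helper: a gamma above 1.0 darkens the midtones, which is one
    way to counter a washed out look. The defaults are placeholders.
    """
    normalized = value / 255.0             # scale to 0.0-1.0
    curved = normalized ** gamma           # gamma curve
    scaled = curved * brightness           # brightness multiplier
    clamped = min(max(scaled, 0.0), 1.0)   # keep inside the valid range
    return round(clamped * 255)


def adjust_image(pixels, gamma=1.5, brightness=1.0):
    """Apply adjust_pixel to every channel of every (r, g, b) tuple in a list."""
    return [
        tuple(adjust_pixel(channel, gamma, brightness) for channel in pixel)
        for pixel in pixels
    ]


# A pale pastel pink becomes darker and more saturated looking.
print(adjust_image([(230, 200, 210)]))
```

Black and white stay fixed under the gamma curve, so only the midtones shift.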
a cathedral
a charcoal drawing of zombies
a detailed painting of a sunset by Thomas Cantrell Dugdale
a ghost made of mist
a kitchen
a movie monster
a pencil sketch of a sad clown
a werewolf
an evil clown by Viktor Oliva
an ink drawing of an ugly monster
Name: PyramidVisions
Author: Unknown
Original script: https://colab.research.google.com/drive/1dpAS_wK34y7c6s-CatAFmBtbkjGT_erM
Time for 512×512 on a 3090: 3 minutes 08 seconds
Maximum resolution on a 24 GB 3090: Huge. 4096×4096 and beyond.
Maximum resolution on an 8GB 2080: 1024×1024 10 minutes 48 seconds
Description: Very detailed images. Not the fastest script, but gives some very nice results. Lower VRAM requirements so good for lesser spec GPUs. Definitely one of the better scripts worth exploring.
a desert oasis
a lush rainforest
a marble sculpture of an angry person
a minimalist painting of the Amazon Rainforest
a nightmare creature
a pastel of a computer made of paper
an abstract sculpture of a sad clown
an acrylic painting of an alien forest | vivid colors
Medusa
vector art of an ugly woman
Name: Visions of AI v1
Author: Jason Rampe
Original script: Included with Visions of Chaos. No colab.
Time for 512×512 on a 3090: 1 minute 32 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480.
Maximum resolution on an 8GB 2080: 256×256 1 minute 33 seconds
Description: My first attempt at actually creating a Text-to-Image script. Based on the excellent example from Jonathan Whitaker's AIAIArt Lesson 3 tutorial. Gives some very nice fine detail in some areas, but suffers from the same lack of coherence as other scripts, in that it creates multiple copies of the subject throughout the image. After actually trying to write my own script I only have more respect for those who can do this. Hopefully I can improve these results for a version 2. In the meantime, here are some samples from the current Visions of AI script.
a cartoon of the human condition by Judy Takács
a cubist painting of an evening landscape
a digital rendering of frogs
a fire breathing dragon
a hyperrealistic painting of a movie monster
a morning landscape
a shark
a woodcut of an ugly man
an airbrush painting of C-3PO
Frankenstein
Name: Visions of AI v2
Author: Jason Rampe
Original script: Included with Visions of Chaos. No colab.
Time for 512×512 on a 3090: 2 minutes 35 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480.
Maximum resolution on an 8GB 2080: 256×256 2 minutes 36 seconds
Description: An attempt to improve the coherency of the previous script. The first 30 iterations zoom into the image every 10 frames. This results in larger shapes/blobs for the rest of the script to work from. The idea is that it will give larger subjects compared to the v1 script. Kind of works. Gives blurrier results. To be fixed in the next version?
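The zoom schedule described above can be sketched roughly as follows. This is a toy version working on a 2D list rather than an image tensor. The function names, the zoom factor, and the assumption that the zooms land on iterations 10, 20 and 30 are all illustrative guesses, not the actual Visions of AI v2 code.

```python
def center_zoom(image, factor=1.25):
    """Crop the centre of a 2D grid and stretch it back to full size.

    Uses nearest neighbour sampling. The factor is a placeholder value.
    """
    height, width = len(image), len(image[0])
    crop_h = max(1, int(height / factor))
    crop_w = max(1, int(width / factor))
    top = (height - crop_h) // 2
    left = (width - crop_w) // 2
    return [
        [
            image[top + (y * crop_h) // height][left + (x * crop_w) // width]
            for x in range(width)
        ]
        for y in range(height)
    ]


def should_zoom(iteration, zoom_until=30, zoom_every=10):
    """True on every zoom_every-th iteration up to and including zoom_until."""
    return 0 < iteration <= zoom_until and iteration % zoom_every == 0


# Which of the first 100 iterations would trigger a zoom under these assumptions.
zoom_steps = [i for i in range(1, 101) if should_zoom(i)]
print(zoom_steps)  # [10, 20, 30]
```

Each zoom enlarges whatever shapes have formed so far, which is the idea behind getting larger subjects than the v1 script. Repeated resampling is also a plausible source of the blurrier results.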
a morning landscape by William Gear
a raytraced image of a nightclub lens flare
a tentacle monster by Carlo Crivelli
a woodcut of a worried woman by Li Keran
an illustration of a cave made of cheese
Cthulhu
cyberpunk art of a futuristic city
goldfish
reflective spheres
the Australian outback
Name: Multi-Perceptor CLIP Guided Diffusion
Author: Varkarrus
Original script: https://colab.research.google.com/drive/1y3Vt39A5KSNFRa6Z2bCqDHxteZSVH9NC
Time for 512×512 on a 3090: 3 minutes 08 seconds
Maximum resolution on a 24 GB 3090: 896×512 or 1152×384 (dimensions must be divisible by 128).
Maximum resolution on an 8GB 2080: 128×128 1 minute 56 seconds
Description: Builds upon previous CLIP Guided Diffusion scripts. Like the previous script by Dango233 it uses three CLIP models simultaneously to “rate” the generated images, and I have added options to use up to six different CLIP models. The resulting image accuracy compared to the prompt, and the resulting image coherence seem to be much better than previous CLIP Guided Diffusion scripts that could almost have random outputs sometimes. This script is superb and highly recommended. Great lighting, textures and brushstrokes. Normally with these blog posts I do a batch run of random prompts overnight and then pick the best 10 images. In this case I had nearly 50 images in my “good” folder after going through the batch results. So, for this script I am showing 20 sample images.
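Since this script needs image dimensions that are divisible by 128, a small helper like the one below can snap a requested size down to the nearest valid one. This is only a sketch; the function names are made up for illustration and are not part of the original colab.

```python
def snap_to_multiple(value, multiple=128):
    """Round value down to the nearest multiple, never going below one multiple."""
    if value < multiple:
        return multiple
    return (value // multiple) * multiple


def snap_resolution(width, height, multiple=128):
    """Snap both dimensions of a requested resolution to valid sizes."""
    return snap_to_multiple(width, multiple), snap_to_multiple(height, multiple)


print(snap_resolution(900, 520))   # (896, 512)
print(snap_resolution(1152, 384))  # (1152, 384)
```

Rounding down rather than up keeps the result from exceeding the requested size, which matters when the request is already close to the VRAM limit.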
a cute creature | TriX 400 TX
a digital painting of Frankenstein by Kanzan Shimomura
a morning landscape by János SaxonSzász
a nightmare creature
a photorealistic painting of a teddy bear
a portrait of a young girl
a space nebula | IMAX
a worried man
a zombie by Nathaniel Hone
an acrylic painting of a spider by Abram Arkhipov
an airbrush painting of a monkey by Jeremy Henderson
an alien landscape
an ugly creature made of insects
an ultrafine detailed painting of a sad person | ZBrush
Arnold Schwarzenegger | trending on ArtStation
concept art of Robocop
dinosaurs
Dracula | CGSociety
flesh made of insects
God by William Simpson
Name: Pixel MultiColors
Author: Remi Durant
Original script: https://colab.research.google.com/drive/17c-13cl_VQKpHq2rDrnFVi6ZT-CHeZNn
Time for 512×512 on a 3090: 0 minutes 44 seconds
Maximum resolution on a 24 GB 3090: 4096×4096.
Maximum resolution on an 8GB 2080: 2048×2048 7 minutes 45 seconds
Description: Very noisy/pixelated/abstract results. The default script gives dark images, which some tweaks to brightness and contrast can help with. Maybe a little bit of blur could help too in a future revision. It is fast though, and can support huge image sizes.
a charcoal drawing of a cute creature made of metal
a matte painting of halloween by Carlos Trillo Name
a photorealistic painting of an alien landscape by Jacob Ochtervelt
a rough seascape filmic
a sea monster
a woodcut of a skull by Gu Hongzhong trending on ArtStation
Cthulhu
trypophobia
Name: Multi-Perceptor VQGAN+CLIP
Author: Remi Durant
Original script: https://colab.research.google.com/drive/1peZ98vBihDD9A1v7JdH5VvHDUuW5tcRK
Time for 512×512 on a 3090: 2 minutes 30 seconds
Maximum resolution on a 24 GB 3090: 1120×480.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: As with the previous Multi-Perceptor CLIP Guided Diffusion scripts this one allows two different CLIP models to be used to rate the VQGAN output images. VQGAN is not going to beat diffusion for image coherence, but this script can give some very nice lighting and fine details in images.
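To give a feel for how more than one CLIP model can "rate" the same image, here is a toy sketch. It assumes each perceptor produces an embedding for the image and an embedding for the prompt, and that the ratings are combined as a weighted mean of (1 - cosine similarity). The hand-made vectors, the function names and the averaging rule are all assumptions; the real scripts use actual CLIP models in PyTorch and may combine the losses differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def combined_loss(image_embeddings, text_embeddings, weights=None):
    """Weighted mean of (1 - cosine similarity) across all perceptors.

    Each list holds one embedding per perceptor. Lower means a better match.
    """
    weights = weights or [1.0] * len(image_embeddings)
    total = sum(
        w * (1.0 - cosine_similarity(img, txt))
        for w, img, txt in zip(weights, image_embeddings, text_embeddings)
    )
    return total / sum(weights)


# Two pretend perceptors: the first thinks the image matches the prompt
# perfectly, the second thinks it does not match at all.
image_embs = [[1.0, 0.0], [0.0, 1.0]]
text_embs = [[1.0, 0.0], [1.0, 0.0]]
print(combined_loss(image_embs, text_embs))  # 0.5
```

An image only scores well when every perceptor agrees it matches the prompt, which is one plausible reason the multi-perceptor scripts drift less from the prompt.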
a bronze sculpture of an evil clown made of clay by Dionisio Baixeras Verdaguer
a fantasy land by Shigeru Aoki
a hyperrealistic painting of puppies
a midnineteenth century engraving of the Sydney Opera House
a statue of reflective spheres
a surrealist painting of a tropical beach
an alien city CGSociety
an oil painting of a fire breathing dragon
computer rendering of a well kept garden by Norman Garstin ZBrush
war CryEngine
Name: Hypertron
Author: Philipuss
Original script: https://colab.research.google.com/drive/10fa8X6EsfZfda1dfhJ_BtfPZ7Te1WGoX
Time for 512×512 on a 3090: 2 minutes 00 seconds
Maximum resolution on a 24 GB 3090: 1120×480.
Maximum resolution on an 8GB 2080: 256×256 1 minute 35 seconds
Description: Another VQGAN based script. Has various “flavors” to give different results. Works OK. Can give the “image in a sea of purple/grey” that previous MSE based scripts suffered from. Still worth a try.
a black and white photo of a fireman
a cute monster by Józef Mehoffer
a matte painting of a forest clearing
a pop art painting of a human
a renaissance painting of a ghost by Jan van de Cappelle film
a sea monster made of metal
a tattoo of a zombie
a watercolor painting of a dragon Flickr
an art deco painting of a haunted house by Mary Cameron
concept art of a mountainscape by Maximilian Cercha
Name: CLIP Guided Diffusion Secondary Model Method
Author: Katherine Crowson
Original script: https://colab.research.google.com/drive/1mpkrhOjoyzPeSWy2r7T8EYRaU7amYOOi
Time for 512×512 on a 3090: 2 minutes 28 seconds
Maximum resolution on a 24 GB 3090: 1792×768 or 2048×640.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: A new diffusion based script from Katherine Crowson including a new “secondary model” she trained. Capable of some unique results with good textures and lighting.
a detailed painting of Fozzy Bear by LeConte Stewart
a flemish baroque of a happy person trending on pixiv
a flock of birds
a Ghostbuster CGSociety
a kitchen made of cheese
a nightmare creature
a photorealistic painting of The Grinch
a portrait of a woman
an art deco painting of a sad clown
an oil painting of a nightmare
This post continues listing the Text-to-Image scripts included with Visions of Chaos and some example outputs from each script.
Name: CLIP Guided Diffusion v4
Author: Katherine Crowson
Original script: https://colab.research.google.com/drive/1V66mUeJbXrTuQITvJunvnWVn96FEbSI3
Time for 512×512 on a 3090: 3 minutes 05 seconds
Maximum resolution on a 24 GB 3090: Locked to 512×512
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Another CLIP Guided Diffusion script. Locked to 512×512 resolution. Like the other CLIP Diffusion scripts, some of the results can be very detailed and interesting, but a lot of the time it is hit and miss to get a result that reliably matches the input phrase. When it gets a “hit” it can create very detailed impressive results, but the number of “misses” stops it from getting a great rating. Still worth a try if you have the patience to run a large batch of images waiting for the best results. The following samples were hand picked from a large batch run of random prompt phrases.
a forest clearing
a storybook illustration of a nightmare
an impressionist painting of a cemetery
Harry Potter in the style of Rembrandt
a detailed painting of a witch
a babbling brook
a desert oasis
a hyperrealistic painting of an android
eyeballs
a cross stitch of Buzz Lightyear
Name: CLIP Guided Decision Transformer
Author: Katherine Crowson
Original script: https://colab.research.google.com/drive/1V66mUeJbXrTuQITvJunvnWVn96FEbSI3
Time for 512×512 on a 3090: 1 minute 13 seconds
Maximum resolution on a 24 GB 3090: Locked to 384×384
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Another one from Katherine Crowson. Some of the results can be very detailed and interesting, but a lot of the time it is hit and miss to get a result that reliably matches the input phrase. When it gets a “hit” it can create very detailed impressive results, but the number of “misses” stops it from getting a great rating. The following samples were hand picked from a large batch run of random prompt phrases.
Another good point for CLIP Guided Decision Transformer is that it will generate a batch of images from each run. So rather than a single image for the prompt text you can specify (for example) 8 images to be generated from the prompt. This allows a much larger set of images to be quickly generated, giving you more to search through for those great outputs.
For these images I have enhanced the resolution 4x using Real-ESRGAN (the thumbnails are the original output images and the clicked images are resized 4x).
a detailed painting of a palace by Thomas Kinkade
a drawing of Chewbacca
a forest path
a renaissance painting of a mountain range
a rough seascape
a rough seascape
a spooky forest
an oil on canvas painting of a western town
Frankenstein
The Grand Canyon
Name: CLIPIT
Author: dribnet
Original script: https://github.com/dribnet/clipit
Time for 512×512 on a 3090: 2 minutes 38 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Another GAN+CLIP script. Gives nice results that tend to match the prompt text more closely. This one is heavy on VRAM usage.
Name: Quick CLIP Guided Diffusion
Author: Daniel Russell
Original script: https://colab.research.google.com/drive/1FuOobQOmDJuG7rGsMWfQa883A9r4HxEO
Time for 512×512 on a 3090: 43 seconds
Maximum resolution on a 24 GB 3090: 512×512
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Modified version of CLIP Guided Diffusion that gets results quicker. Option for 256×256 or 512×512 sized images. Still very hit and miss when getting images that resemble the input prompt. The following samples came from a large overnight batch run of random prompts.
Do you know of any other colabs and/or github Text-to-Image systems I have missed? Let me know and I will see if I can convert them to work with Visions of Chaos for a future release. If you know of any public Discords with other colabs being shared let me know too.