Generating Images With Midjourney
Image generation has gotten incredibly powerful. Combine it with large language models to generate your prompts, and you have a remarkably effective AI pipeline.
I only recently started exploring image generation with AI models such as Midjourney. I had followed the emergence of generative image models like DALL-E and Stable Diffusion, but I didn't think much of them, largely because these models initially did not create very convincing images, much like GPT before version 3.
With the latest releases, some of these models have become incredibly powerful. I've explored Midjourney in recent weeks and I’m deeply impressed, especially with the latest version. We may not be at the stage of truly photorealistic images yet, but it is very close and simply exceptional for illustrations.
In this post, I’ll briefly explain some of the core features of Midjourney and how to use them.
Getting started
Using Midjourney requires a paid subscription. Even if you're not planning on using it in the long term, I highly recommend signing up for the cheapest plan for one month, which will cost you $10. It’s the most fun I’ve had for $10 in a long time.
Perhaps the most “annoying” thing about Midjourney is that its interface is via Discord, a social messaging platform. Thus, you will need a free Discord account first, which you can create at discord.com. Discord communities are organized by server, so once you have your Discord account, you can join the Midjourney server (I recommend following this handy quickstart guide by Midjourney). Once that’s done, you can use any of the channels to start your prompts.
Getting ideas
One of the biggest hurdles for image generation beginners is generating ideas for prompts. The Midjourney system works by starting with /imagine, followed by a prompt. If you're already full of ideas, go ahead, but I've found one of the most powerful ways to generate ideas is to actually ask ChatGPT for prompts. What I do is explain to ChatGPT what I am looking for and instruct it to give me several ideas along with their prompts.
Let’s say I am giving a talk about image generation with AI 😎, and I’d like to underline my message with an image. I have some vague ideas about robots painting, but I’m not really a visual artist, so I’m heading over to ChatGPT to describe my situation with the following prompt:
I’m creating a slide about AI-driven image generation, and I’d like to generate an image (with Midjourney) that conveys the idea of AI generating visual imagery. I have this idea about robots painting. Can you expand on that idea and generate a few more? Please provide the corresponding prompts with each idea.
Here’s the response of ChatGPT (using GPT-4):
Of course, I'd be happy to help brainstorm and provide prompts for AI-generated images centered around the theme of AI generating visual imagery.
Robots Painting:
Prompt: "A robot sitting at an easel under a cherry blossom tree, delicately painting a vibrant landscape."
Digital Brush Strokes:
Prompt: "Glowing 3D lines of code transforming into strokes of a digital paintbrush on a canvas, creating a stunning sunset."
Machine Learning Mosaic:
Prompt: "A mosaic composed of tiny images, each one a piece of AI-generated art, forming a larger image of a computer chip."
Art Gallery of the Future:
Prompt: "An art gallery with paintings that change and morph, each piece of art being created in real-time by an AI."
Sculpting with Light:
Prompt: "A 3D printer-like robot that sculpts with beams of light, creating AI-driven holographic sculptures."
Abstract Algorithm Art:
Prompt: "A mathematical algorithm morphing into a captivating and intricate piece of abstract art."
Synthetic and Organic Fusion:
Prompt: "A robot hand and a human hand each holding a paintbrush, working in unison to create a vibrant blend of synthetic and organic art."
Conductive Creativity:
Prompt: "Circuitry pathways on a motherboard transforming into a beautiful, sprawling cityscape."
AI Dreamscapes:
Prompt: "A complex neural network visualized as a stunning, surreal dreamscape with floating islands and vibrant galaxies."
Pixel Painters:
Prompt: "Miniature robots converting pixels on a digital display into a stunning visual art."
Remember that the effectiveness of AI-generated images can depend on the specific capabilities of the AI system being used, so results may vary. I hope these ideas help spark your creativity and effectively illustrate the idea of AI-driven image generation.
These are fun ideas, and I’d like to play with some of them. The last one strikes me as closest to my idea, so I’m going to go ahead and prompt Midjourney correspondingly:
Depending on the server load, it can take some time before generation starts. In this case, after a few seconds, I see the first outlines of the four images Midjourney generates:
It’s quite fascinating to watch the pictures emerge! After about one minute, I have these four proposals:
Oh wow, that’s quite impressive! I love the one at the bottom right, even though it’s a little dark. I can now either try again by clicking on the circled arrow, upscale an image (by clicking on one of the U buttons), or make four new variants based on any of the four images (by clicking on one of the V buttons). The numbering is 1 for top left, 2 for top right, 3 for bottom left, and 4 for bottom right.
Since I want to explore the bottom right picture, I’ll hit V4.
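If you find the button numbering hard to remember, here is a minimal Python sketch of the mapping described above. The names `GRID_POSITIONS` and `describe_button` are my own, purely for illustration; they are not part of Midjourney or Discord.

```python
# Hypothetical helper, not part of Midjourney: maps a U/V button label
# to the image it acts on in the 2x2 preview grid.
GRID_POSITIONS = {
    1: "top left",
    2: "top right",
    3: "bottom left",
    4: "bottom right",
}

ACTIONS = {
    "U": "upscale",
    "V": "make variations of",
}


def describe_button(label: str) -> str:
    """Turn a button label such as 'V4' into a plain-language description."""
    action = ACTIONS[label[0].upper()]
    position = GRID_POSITIONS[int(label[1:])]
    return f"{action} the {position} image"


print(describe_button("V4"))  # make variations of the bottom right image
```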
A minute later, I have four new images:
Ok, let me check the first one in more detail. I’m going to upscale it by hitting U1, which gives me:
From here, I can continue by creating more variations or zooming out, or simply download the image by hitting the Web button, which takes me to a downloadable link outside of Discord.
In this case, I decide the theme is a little too dark for my taste, so I’ll go back to the prompt. Actually, looking at the image, I think what I’d want are tiny little robots carrying around pixels, building a mosaic. So I’ll enter this as a prompt, and shortly thereafter, I have four suggestions:
This is better. The robots are cute! But I wanted multiple robots, so perhaps I’ll need to spell that out? And this is still a little dark for me, so I’ll clarify that they’re building a colorful mosaic. It’s probably also hard to understand what a pixel is, so I’ll fix that too. My prompt now is:
Dozens of tiny little hyperrealistic robots carrying around small pieces of color, building a colorful mosaic. The mood should be light and optimistic.
Here’s the result:
Nice! Image 2 is what I am looking for, so I’ll upscale that (U2) and then say “vary subtle”, which gives me:
It’s really impressive how the images vary subtly, including the shapes of the pixels, and the robots themselves. I think the guys in image 1 are cute, so I’ll upscale that image (U1) and then hit “zoom out 2x”:
Great, I think I’m done here. I’ll upscale image 2 and download it:
As you can imagine, I could go on and on, try new prompts, go back to ChatGPT and ask for more refined ideas.
It can be very insightful to browse other people’s prompts for inspiration. I've noticed that when you instruct Midjourney to create icons, it generates a whole series of them. Recently, as I was considering updating the logo for this newsletter, I had a thought: wouldn't it be nice to have letters that resemble robots? So, I proceeded to provide the following prompt to Midjourney:
icons of all alphabetical letters in robot like font, no background
The results, with some variations, were nothing short of mind-boggling to me:
I can’t emphasize enough how remarkable this is. It’s entirely possible that this just speaks to me, visually, and none of my readers. But I find these images so delightful that I am thinking about printing them on canvas and hanging them up in my office.
These letters allow me to spell out “prompt engineering”, but I’d also like to replace my little robot guy in the logo with a realistic looking, cute robot. Here’s the prompt I’m using:
little hyperrealistic robot, smiling, helpful, waving, no background
After a few versions, I obtain this cute little fellow:
(When my 10-year-old daughter saw him, her reaction was, "Aaaaaawwwwwww - so cute! Can I have it?" I take that as a sign of approval.)
Combining it with the robot letters gives me my new logo:
Not too shabby for an hour of prompting!
Parameters
Midjourney also allows you to specify parameters. You can find a complete list here. One of the most important ones, which I now use all the time, is --aspect, or --ar (note the double dash), which allows you to set the aspect ratio of the image. This can be quite handy, because at times you will need a specific aspect ratio, for example for slides.
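If you know the pixel dimensions you need but not the ratio, a few lines of Python can work it out and append the parameter to your prompt text. This is only a sketch: `aspect_ratio` and `with_aspect_ratio` are hypothetical helper names of my own, and the code does not check whether Midjourney accepts a particular ratio.

```python
from math import gcd


def aspect_ratio(width: int, height: int) -> str:
    """Reduce pixel dimensions to the smallest whole-number ratio, e.g. 16:9."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive integers")
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"


def with_aspect_ratio(prompt: str, width: int, height: int) -> str:
    """Append the --ar parameter (note the double dash) to the prompt text."""
    return f"{prompt.strip()} --ar {aspect_ratio(width, height)}"


# A 1920 x 1080 slide reduces to 16:9.
print(with_aspect_ratio("Flat, vector, on white background.", 1920, 1080))
# Flat, vector, on white background. --ar 16:9
```

The resulting text is what you would paste after /imagine in Discord.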
Here’s a prompt I recently used to spice up the opening slide when I give a presentation about digital epidemiology (first part of prompt brought to you by ChatGPT):
An epidemiological curve transforming into a stream of binary data, conveying the idea of digital epidemiology. Flat, vector, on white background. --ar 16:9
This was one of the results:
In addition, for another newsletter, I needed a slightly flatter curve for a logo with dimensions 1100 x 200, so I picked another variant, and then used the Zoom out functionality to create a longer curve:
which I was then able to crop down to the desired format (same content, but the dimensions of the image are now 1100 x 200):
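The cropping step is simple arithmetic. As a rough sketch, here is how one could compute the largest centered crop with the same proportions as 1100 x 200. The function name `center_crop_box` and the source size of 2200 x 1200 are made up for illustration, and a centered crop is an assumption; in practice you may prefer to position the crop by eye.

```python
def center_crop_box(src_w: int, src_h: int, target_w: int, target_h: int):
    """Return (left, top, right, bottom) of the largest centered crop
    that has the same aspect ratio as target_w x target_h."""
    # Compare the two ratios by cross-multiplication to stay in integers.
    if src_w * target_h > src_h * target_w:
        # Source is wider than the target: keep full height, trim the sides.
        crop_h = src_h
        crop_w = src_h * target_w // target_h
    else:
        # Source is taller than (or equal to) the target: keep full width,
        # trim top and bottom.
        crop_w = src_w
        crop_h = src_w * target_h // target_w
    left = (src_w - crop_w) // 2
    top = (src_h - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


# Made-up source size of 2200 x 1200, target logo format of 1100 x 200.
print(center_crop_box(2200, 1200, 1100, 200))  # (0, 400, 2200, 800)
```

The resulting 2200 x 400 region has the same 11:2 proportions as the logo, so it can be scaled down to 1100 x 200 without distortion. The tuple follows the (left, upper, right, lower) order that image libraries such as Pillow use for crop boxes.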
In conclusion, I hope this has sparked your interest in exploring Midjourney. I find it incredibly entertaining and also useful. A significant part of my job these days is communication, and having powerful, good-looking visuals is a key component. Needless to say, one needs to be mindful about the use of generative imagery, but as a source of illustration, I can see only advantages.