“Midjourney is an independent research lab exploring new mediums of thought and expanding the imaginative powers of the human species.”
This is the self-description of the Midjourney research laboratory. It sees itself not just as a technical tool but rather as a pioneer for an expanded human imagination. With over 20 million users in November 2024, it is clear that Midjourney's approach has met with broad approval. After all, when the first image is generated using AI, it can be a pretty magical experience. But as with any new technology, it takes time and practice to use its full potential.
This guide looks at how Midjourney works, how it compares to other image generators and, of course, what good prompts look like to create mesmerizing images.
Midjourney is an AI-based image generator that automatically creates digital images from text commands (prompts). Developed by the independent research lab of the same name, the platform uses machine learning algorithms to generate detailed and “creative” visual representations. This makes it possible to generate complex images without requiring in-depth graphic design knowledge on the part of the user.
Midjourney was first made available to the public as an open beta version in July 2022 and quickly attracted great interest. Since then, several new versions have been released, bringing faster, higher-resolution image generation, improved customization options and more intuitive user interfaces. The latest version, 6.1, was released in July 2024. It offers improved image quality, faster generation times (around 25% faster than version 6) and refined details, especially for complex textures and fine features such as eyes and other facial details. In addition, 6.1 includes new upscale options (“Subtle” and “Creative”) that allow you to create higher-resolution images with improved detail.
The exact way Midjourney works remains a well-kept secret, but like other image generators, the technology is based on two key machine learning approaches: Large Language Models (LLM) and Diffusion Models (DM).
The large language model enables the AI to capture the meaning of the prompt, i.e. a text-based description, and convert it into a vector that serves as a numerical representation of the description. This vector then guides the next step, the diffusion. A diffusion model is trained by adding noise to the images in its training dataset and then learning to remove that noise step by step to restore the original image.
Thus, by removing noise from a randomly generated image, Midjourney can generate new images that match the description entered by the user. It usually only takes a minute from entering the prompt to the finished image – a fast journey from idea to visual result.
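Since Midjourney's internals are not public, the noising and denoising idea can only be illustrated with a generic textbook formulation. The following is a minimal sketch under that assumption: a toy "image" of four brightness values is blended with Gaussian noise, and the blend is then inverted. The function names and the `alpha_bar` mixing value are illustrative, and the sketch cheats by reusing the true noise where a trained model would have to estimate it.

```python
import math
import random


def forward_noise(x0, alpha_bar, rng):
    """Blend a clean signal with Gaussian noise (the 'adding noise' direction)."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
          for x, e in zip(x0, eps)]
    return xt, eps


def remove_noise(xt, eps_pred, alpha_bar):
    """Invert the blend, given a prediction of the noise that was added."""
    return [(x - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
            for x, e in zip(xt, eps_pred)]


rng = random.Random(0)            # fixed seed so the run is repeatable
pixels = [0.1, 0.5, 0.9, 0.3]     # toy "image": four brightness values
alpha_bar = 0.2                   # small value means the result is mostly noise

noisy, true_noise = forward_noise(pixels, alpha_bar, rng)

# Assumption: a trained model would estimate the noise from the noisy image
# and the prompt vector. Here the true noise is reused to show the principle.
restored = remove_noise(noisy, true_noise, alpha_bar)
```

With a perfect noise estimate, `restored` matches `pixels` up to floating-point rounding. A real image generator starts from pure noise and repeats many small denoising steps, each one steered by the prompt.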
Midjourney faces strong competition in the image generator market. Each tool has its advantages and disadvantages, and they sometimes differ greatly from one another. For a better overview, here is a comparison of the three largest current providers.
Characteristic | Midjourney (V6.1) | DALL-E (3) | Stable Diffusion (3) |
---|---|---|---|
Quality & realism | High image quality, realistic representations, good level of detail; strengths in photorealism and atmospheric lighting | Highly stylized and detailed images; particularly strong in graphics and illustrations | Realistic scenes, high quality with complex compositions, but sometimes less detail |
Prompt accuracy | High fidelity, especially with simple to moderately complex prompts | Good accuracy, especially with simple to complex texts | Strong fidelity, especially with relational and complex prompts |
Adjustment options | Many options for style, variation and reference images | Inpainting and interactive editing possible | Supports custom models and adjustments for specific styles |
Platform access | Access required via Discord | Access via ChatGPT web platform and via Bing | Open source and can be installed locally, flexibly accessible via API |
Pricing | Subscription required; no free version | Integrated into the paid version of ChatGPT or free via Bing | Free in the basic version; higher prices for customized models |
Field of application | High precision in creative, commercial and artistic projects | Particularly suitable for stylized and detailed images | Versatile; especially for users who need custom and versatile images |
Midjourney stands out for its artistic focus, which makes it possible to create not only photorealistic but also stylized and sophisticated images. The customization options and image fidelity are very good, but access and cost are limiting for some. DALL-E scores with its ease of use and is ideal for beginners as it is integrated into ChatGPT. Its images are easy to edit, but it offers less artistic freedom and texture than Midjourney. Stable Diffusion is particularly attractive for advanced users who like to customize local models. Its open source availability and flexibility make it ideal for specific style and model customizations, but technical knowledge is required for optimal use.
A Discord account is required to use Midjourney, as all interaction takes place via the Discord platform. Any device that supports Discord can be used for this. The setup is quick:
Create a Discord account (if you don't already have one).
Use the link https://discord.gg/midjourney and join the official Midjourney Discord server.
There is currently no free trial period, so a subscription must be taken out right away. To test Midjourney, the Basic Plan for $10 is the best choice. To subscribe, type the command /subscribe on the Midjourney Discord server. A personal link for the membership will then be generated.
Once the subscription is set up, prompts can be entered in the special channels for newcomers (Newcomer Rooms), and the AI then generates the corresponding images.
Midjourney offers various paid subscriptions to be able to use the services to their full extent. There is no free version and the test phase is currently suspended because the number of users is too high. You can choose from the following subscriptions:
More detailed information about membership options can be found under Midjourney subscription.
Midjourney uses prompts to create visuals from text descriptions. Creating a good prompt is the key to high-quality images, as it determines the content, style and composition.
In Midjourney, prompts are entered via the Discord interface. It starts with the command /imagine followed by the description.
Example: /imagine a futuristic cityscape at sunset, vibrant colors, ultra-realistic
The AI interprets these inputs and creates an image based on the description. A good prompt depends on how clear and detailed the description is. For example:
/imagine a young cat on a cushion in a pencil sketch style
Here are some examples of different styles:
Different eras with their typical style characteristics are also possible:
Midjourney offers a number of parameters that can be used to further control the output, for example:
Prompt example:
/imagine a futuristic cityscape at sunset, vibrant colors, flying cars, skyscrapers made of glass, ultra-realistic --ar 16:9 --q 2
If you enter this prompt, the following output appears:
When the generation is complete, four image variants are displayed, which can now be edited further:
For example, if the color scheme of the fourth image is particularly appealing, but the image does not fully meet your expectations, you can click on “V4” to generate a series of four variants:
In this example, we choose the fourth image again and create an upscaled image by clicking on “U4”:
It is possible to edit the resulting image further:
Once the desired result is achieved, the last thing to do is to download it – and the image of a futuristic city with flying cars is ready.
Sometimes you have a clear idea of the image you want, but finding the right prompt is difficult, and despite numerous adjustments the result falls short of expectations. In such cases, the command /describe can help: it analyzes an existing image that is similar to the desired one, which shows you how Midjourney interprets it.
After entering the command, a drag-and-drop box appears to upload the image:
Once the appropriate image has been uploaded, Midjourney will display four options that describe it. Simply select the most suitable option and refine it further to create the perfect prompt for your project. The /describe command can be a valuable tool when Midjourney's outputs don't align with your desired results.
With the basic principles of image generation in Midjourney covered, the following compact cheat sheet lists important commands and parameters. It helps to refine prompts and make the image output more precise and controllable.
Instruction | Notation | Example in the prompt | Function and application |
---|---|---|---|
Image generation | /imagine | /imagine a sunset over mountains | Basic command to start image generation. Always followed by a text description of the desired image. |
Image description | /describe | /describe [upload image] | Describes an uploaded image in four text variants. |
Load images from URL | Insert image URL | /imagine [image URL] a sunset over mountains | Allows you to use image URLs as a starting point for image generation and combine them with textual instructions. |
Aspect ratio | --ar | /imagine a sunset over mountains --ar 16:9 | Defines the aspect ratio of the image. Default is 1:1. It is possible to change it to, e.g., --ar 16:9 for wider images or --ar 9:16 for portrait formats. |
Custom sizes | --w, --h | /imagine a sunset over mountains --w 1920 --h 1080 | Sets a custom image width (--w) and height (--h) in pixels to obtain a specific resolution. |
Image quality | --q | /imagine a sunset over mountains --q 2 | Increases image quality. Standard is --q 1. Higher values (up to --q 2) increase the level of detail but extend processing time. |
Select version | --v | /imagine a sunset over mountains --v 5 | Selects a specific version of the Midjourney engine, for example --v 5, --v 6 or --v 6.1. |
Stylization | --stylize (--s) | /imagine a sunset over mountains --stylize 1000 | Determines how strongly Midjourney's own artistic style is applied to the images. Values from 0 (close to the prompt) to 1000 (extremely stylized) are possible. |
Chaos factor | --chaos | /imagine a sunset over mountains --chaos 80 | Increases the randomness of the image, with values from 0 to 100; higher values lead to unpredictable and creative results. |
Sharpening | --hd | /imagine a sunset over mountains --hd | Activates "HD" mode for sharper and detailed images. |
Number of variations | --n | /imagine a sunset over mountains --n 3 | Generates a fixed number of variations (between 1 and 4). By default, four images are generated. |
Exclude certain objects in the image | --no [object/feature] | /imagine a sunset over mountains --no trees | Excludes certain elements from the image; in this case, no trees are generated in the image. |
Prioritize details | --details | /imagine a sunset over mountains --details | Increases the level of detail in the image. Works well for images with a lot of elements that need more detail. |
Color balance | --color | /imagine a sunset over mountains --color warm | Determines the color tone of the image, e.g., warm, cold or vibrant. |
Shadow effects | --shadow | /imagine a sunset over mountains --shadow | Adds deeper shadows and realistic lighting, especially in darker scenes. |
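Prompts with several parameters are easy to mistype, so it can help to assemble them programmatically before pasting them into Discord. The following is a minimal sketch that covers only three parameters from the cheat sheet (--ar, --chaos and --no). The helper `build_prompt` is hypothetical and not part of any Midjourney tooling; it only builds the text and does not communicate with Midjourney.

```python
def build_prompt(description, ar=None, chaos=None, no=None):
    """Assemble an /imagine prompt string from a description and optional parameters."""
    if not description.strip():
        raise ValueError("description must not be empty")
    parts = ["/imagine", description.strip()]
    if ar is not None:
        width, height = ar                      # e.g. (16, 9)
        parts.append(f"--ar {width}:{height}")
    if chaos is not None:
        if not 0 <= chaos <= 100:               # range given in the cheat sheet
            raise ValueError("chaos must be between 0 and 100")
        parts.append(f"--chaos {chaos}")
    if no:
        parts.append("--no " + ", ".join(no))   # elements to exclude
    return " ".join(parts)


prompt = build_prompt("a sunset over mountains", ar=(16, 9), chaos=80, no=["trees"])
print(prompt)
# /imagine a sunset over mountains --ar 16:9 --chaos 80 --no trees
```

Parameters are appended after the description because Midjourney expects them at the end of the prompt, as in the examples in the table.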
To conclude, a selection of prompt examples accompanied by images is provided to showcase the possibilities Midjourney offers.
/imagine Elephant made of glass, Kintsugi, orange sunset, national geographic, scenic landscape --ar 16:9
/imagine minimalistic photography of black parrot, eating oranges, white background --ar 4:3
/imagine a large-format picture with a figure leaning on an orange (HEX #FF792B) tower PC in the foreground, in the background a wide wooded valley, with gently sloping cliffs and a bright, cloudy sky, low horizon, in the style of Pumpkin and Fruits by Yayoi Kusama, --ar 1600:1000
/imagine a cowboy sitting at a table in a tavern, playing poker, cowboy hat pulled low on his face, some cards in his hands, a superior laugh on his lips, frontal view, half-length figure, cards in orange (HEX #FF792B), --ar 16:9
/imagine a personification of language melting under an orange (HEX #FF792B) sun, surrealism in the style of Salvador Dalí, --ar 8:5
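The examples above write the aspect ratio both as pixel dimensions (--ar 1600:1000) and in reduced form (--ar 8:5). Both describe the same shape. As a small sketch, the reduced form can be computed from any pixel size with the greatest common divisor; the helper name `aspect_ratio` is illustrative.

```python
from math import gcd


def aspect_ratio(width, height):
    """Reduce pixel dimensions to the smallest whole-number ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"


print(aspect_ratio(1600, 1000))  # 8:5
print(aspect_ratio(1920, 1080))  # 16:9
```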