Stable Diffusion Guide

A starter guide to Stable Diffusion image generation

OctoAI provides a highly customizable Stable Diffusion endpoint with best-in-class inference speed. You can mix & match Stable Diffusion assets, including checkpoints, Low-Rank Adaptations (LoRAs), and textual inversions, to customize your generated images. There are several image dimension options if you want to create images larger than 512x512.

Stable Diffusion Primer
Stable Diffusion has several asset types, which can be combined within a single image generation.

  • Checkpoints are custom versions of the Stable Diffusion model that can represent, for example, a specific style or custom subject. Styles range from general purpose to realistic photography or even cartoons.
  • LoRAs are additional custom weights applied to a base checkpoint. Like checkpoints, LoRAs can represent a specific style or custom subject, but they are much smaller in size.
  • Textual inversions are embeddings that represent custom subjects. You can also use negative embeddings to avoid undesirable content, like poor quality hands and lighting.
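Why are LoRAs so much smaller than checkpoints? In the usual LoRA formulation, a dense weight matrix of shape d x k is left frozen, and the LoRA stores only a low-rank update B·A, where B is d x r and A is r x k for a small rank r. The sketch below counts the stored parameters for one layer. The 768 x 768 layer size and rank 4 are illustrative assumptions, not values from any specific checkpoint or LoRA.

```python
def full_params(d: int, k: int) -> int:
    """Number of entries in a dense d x k weight matrix (what a checkpoint stores)."""
    return d * k


def lora_params(d: int, k: int, rank: int) -> int:
    """Entries in a rank-r LoRA update: B is d x rank, A is rank x k."""
    return rank * (d + k)


if __name__ == "__main__":
    # Hypothetical layer size and LoRA rank, chosen only for illustration.
    d, k, r = 768, 768, 4
    print(full_params(d, k))     # 589824
    print(lora_params(d, k, r))  # 6144
```

For this hypothetical layer, the LoRA stores roughly 96 times fewer numbers than the full matrix, and the saving repeats across every adapted layer.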

The web UI is an easy way to experiment by combining assets, and the equivalent API call is displayed in the cURL example. Looking for more Stable Diffusion details? Check out our beginner’s guide to fine-tuning Stable Diffusion.

Prompt weighting

You can emphasize, or de-emphasize, specific words or phrases of the image generation prompt using weighting. To use prompt weighting, format your prompt using parentheses: prompt = "A cat with (long whiskers)"

This emphasizes the phrase “long whiskers” with a weight of 1.1. Each additional pair of parentheses, as in "(((long whiskers)))", multiplies the weight by another 1.1, so three pairs give a weight of 1.1 × 1.1 × 1.1 ≈ 1.33. You can also specify an exact weight in the form: prompt = "A cat with (long whiskers: 0.8)"

This weights all words in the parentheses by a factor of 0.8. Notably, weights do not have to be greater than 1: a weight of less than 1 de-emphasizes the contained words.
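To make the weighting rules concrete, here is a minimal sketch of a parser that follows the convention described above: each plain pair of parentheses multiplies the enclosed text's weight by 1.1, and "(text: w)" multiplies it by w instead. This is a hypothetical illustration of the rules, not OctoAI's actual prompt parser; the function name parse_weights and its handling of unmatched parentheses (they are simply ignored) are assumptions.

```python
import re

# Tokens: "(" | ":<number>)" (explicit weight that closes a group) | ")" |
# a run of plain text | a stray ":" that is not followed by a number and ")".
TOKEN = re.compile(r"\(|:\s*(\d+(?:\.\d+)?)\s*\)|\)|[^():]+|:")


def parse_weights(prompt: str, step: float = 1.1) -> list[tuple[str, float]]:
    """Split a prompt into (text, weight) pairs (hypothetical helper).

    Plain "(...)" multiplies the enclosed weight by `step`; "(...: w)"
    multiplies it by w. Unmatched parentheses are ignored.
    """
    segments: list[list] = []    # mutable [text, weight] pairs
    open_groups: list[int] = []  # index in `segments` where each open "(" begins
    for match in TOKEN.finditer(prompt):
        token = match.group(0)
        if token == "(":
            open_groups.append(len(segments))
        elif match.group(1) is not None and open_groups:
            # Explicit weight: scale everything since the innermost "(" by w.
            factor = float(match.group(1))
            for seg in segments[open_groups.pop():]:
                seg[1] *= factor
        elif token == ")" and open_groups:
            # Plain closing parenthesis: scale the innermost group by `step`.
            for seg in segments[open_groups.pop():]:
                seg[1] *= step
        elif token not in ("(", ")"):
            # Plain text (or an explicit weight with no open group) is kept as-is.
            segments.append([token, 1.0])
    # Drop whitespace-only pieces and trim the rest.
    return [(text.strip(), weight) for text, weight in segments if text.strip()]


if __name__ == "__main__":
    print(parse_weights("A cat with (long whiskers)"))
    print(parse_weights("A cat with (((long whiskers)))"))
    print(parse_weights("A cat with (long whiskers: 0.8)"))
```

The same rules apply to negative prompts: under this sketch, "(distorted hands: 1.5)" yields the single pair ("distorted hands", 1.5).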

Using weights in negative prompts can also be helpful. For example, you can avoid distorted hands: negative_prompt = "(distorted hands: 1.5)"

Checkpoints

A diverse set of checkpoints, with varying styles, are available to customize your images:

  • General purpose: stable-diffusion-v1-5, Deliberate, anything-v4.0, and anything-v5
  • Photography style: Realistic_Vision, reliberate_v10, and icbinp (I can’t believe it’s not photography)
  • Cartoon or video game style: toonyou_beta3, dark-sushi-mix, and zovya_rpg_v3
  • Architecture: ArchitectureRealMix
  • Midjourney style: DreamShaper and openjourney

Results from different checkpoints can differ significantly, even with the same prompt. Using the simple prompt "A medieval knight holding a shield", you can see the results from icbinp (left) and toonyou_beta3 (right).

LoRAs

LoRAs can further customize your images, with styles ranging from steampunk schematics to low-light conditions. You can include multiple LoRAs in a single image generation and provide a weight for each LoRA; a greater weight value gives that LoRA more influence on the generated image.
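In common LoRA implementations, a LoRA's weight acts as a scale on its low-rank update, so combining several LoRAs amounts to W' = W + Σ wᵢ·Bᵢ·Aᵢ over the selected LoRAs. The sketch below applies that formula to toy 2 x 2 matrices in plain Python. It illustrates the general technique under that assumption; it is not OctoAI's implementation, and the function names are hypothetical.

```python
Matrix = list[list[float]]


def matmul(left: Matrix, right: Matrix) -> Matrix:
    """Plain-Python matrix product of left (n x m) and right (m x p)."""
    return [
        [sum(left[i][t] * right[t][j] for t in range(len(right))) for j in range(len(right[0]))]
        for i in range(len(left))
    ]


def apply_loras(base: Matrix, loras: list[tuple[Matrix, Matrix, float]]) -> Matrix:
    """Return base + sum(weight * B @ A) for each (B, A, weight) LoRA.

    The base matrix is copied, so the original checkpoint weights stay unchanged.
    """
    result = [row[:] for row in base]
    for b, a, weight in loras:
        delta = matmul(b, a)  # low-rank update for this LoRA
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                result[i][j] += weight * value  # larger weight -> more influence
    return result


if __name__ == "__main__":
    base = [[1.0, 0.0], [0.0, 1.0]]
    lora_a = ([[1.0], [0.0]], [[0.0, 1.0]], 0.5)  # rank-1 LoRA, weight 0.5
    lora_b = ([[0.0], [1.0]], [[1.0, 0.0]], 2.0)  # rank-1 LoRA, weight 2.0
    print(apply_loras(base, [lora_a, lora_b]))  # [[1.0, 0.5], [2.0, 1.0]]
```

Because each update is scaled independently, doubling one LoRA's weight doubles only that LoRA's contribution while leaving the others untouched.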

You can also create your own LoRA assets on OctoAI via fine-tuning, which is currently in Private Preview.

Textual Inversions

Textual inversions represent a custom subject or concept within the embeddings of Stable Diffusion. The name of the textual inversion acts as a specific trigger word, which must be included in the prompt. Similar to prompt weighting, you can increase the weight of a textual inversion using the format (textual-inversion:weight).
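Because a textual inversion has no effect unless its trigger word appears in the prompt, it can help to check prompts before sending a request. The sketch below is a hypothetical client-side helper, not part of any OctoAI API: missing_triggers reports selected inversions whose name does not appear as a standalone word, and weighted formats a trigger in the (textual-inversion:weight) form.

```python
import re


def missing_triggers(prompt: str, inversion_names: list[str]) -> list[str]:
    """Return the selected textual inversions whose trigger word is absent.

    A trigger counts as present only when it is not glued to other word
    characters, so "charturnerv2x" does not satisfy "charturnerv2".
    """
    missing = []
    for name in inversion_names:
        pattern = rf"(?<![\w-]){re.escape(name)}(?![\w-])"
        if not re.search(pattern, prompt):
            missing.append(name)
    return missing


def weighted(name: str, weight: float) -> str:
    """Format a trigger word with an explicit weight, e.g. (charturnerv2:1.5)."""
    return f"({name}:{weight})"


if __name__ == "__main__":
    prompt = f"A medieval knight holding a shield, front view, back view, {weighted('charturnerv2', 1.5)}"
    print(prompt)
    print(missing_triggers(prompt, ["charturnerv2", "badhandv4"]))  # ['badhandv4']
```

The same check works for negative embeddings: run it against the negative prompt instead of the prompt.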

charturnerv2 can help you create multiple views of a subject. For example, the prompt "A medieval knight holding a shield, front view, back view, (charturnerv2:1.5)" generates multiple views of the knight.

Textual inversions can also represent negative embeddings, which are trained on undesirable content like bad quality hands. You can use these in your negative prompt to improve your images:

  • badhandv4: avoids bad quality hands
  • ng_deepnegative_v1_75t: avoids unnatural positions and upside down structures
  • easynegative, BadDream, and FastNegativeEmbedding: general-purpose negative embeddings that help avoid low-quality results

Start creating images

Ready to start creating images? Navigate to the OctoAI Stable Diffusion endpoint to get started.