Guided Stable Diffusion: TextToImage Template on Python SDK

Canny-Guided Stable Diffusion and MLSD-Guided Stable Diffusion on Python SDK

For examples of the types of outputs you can expect, please visit the Canny Demo or the MLSD Demo at OctoAI.

This guide shows how to run inferences that generate an image from a prompt plus guidance taken from another image: either an edge map extracted with Canny or straight-line guidance extracted with MLSD. Our first example shows how to use an image created by following our TextToImage Template SDK Inferences guide.

As a next step, guided stable diffusion is an excellent candidate for Asynchronous Inference Using Python SDK.
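To give a sense of what that looks like, here is a minimal sketch, assuming the infer_async, is_future_ready, and get_future_result client methods described in that guide; the endpoint URL and inputs below are placeholders for any of the endpoints and payloads used later in this document.

import time

from octoai.client import Client

client = Client()  # Picks up your token from the environment if set

# Hypothetical placeholders; substitute any endpoint and payload from this guide.
endpoint_url = "https://your-endpoint.octoai.run/predict"
inputs = {"prompt": "A high-quality photograph of a poodle"}

# Kick off the inference without blocking, then poll until the result is ready.
future = client.infer_async(endpoint_url=endpoint_url, inputs=inputs)
while not client.is_future_ready(future):
    time.sleep(1)
response = client.get_future_result(future)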

Requirements

Please follow How to create an OctoAI API token if you don't have one already.
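If you'd rather not paste the token directly into your script, one option, sketched below on the assumption that you've exported OCTOAI_API_TOKEN in your shell, is to read it from the environment:

import os

from octoai.client import Client

# OCTOAI_API_TOKEN is an assumed variable name; export it in your shell first.
client = Client(token=os.environ["OCTOAI_API_TOKEN"])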

Canny: Guided TextToImage with Canny Edge Maps

Canny takes an image as a base64 string, a prompt, and a number of de-noising steps (1-500). Our first step is converting an image to base64, and the SDK provides a library for this. Let's go ahead and use Stable Diffusion TextToImage to generate an image for Canny to work from. Please verify the URLs are correct when running this code sample yourself.
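For example, converting a local file to a base64 string takes one line with the SDK's Image type (the filename here is just a placeholder):

from octoai.types import Image

# "my_photo.png" is a hypothetical local file; Image handles the base64 encoding.
image_b64 = Image.from_file("my_photo.png").image_b64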

Please reference the Overview: QuickStart Templates on SDK to Run Inferences for details on finding endpoint URLs for QuickStart and cloned templates, and the SDK Reference for more information on classes. The same inference and health-check endpoints are available on your cloned endpoints as well, allowing you to build custom applications on our QuickStart templates without rate limiting.

from typing import Optional

from octoai.client import Client
from octoai.types import Image

# Verify the URLs by checking the QuickStart templates prior to running these examples
sd_health_check = "https://stable-diffusion-demo-kk0powt97tmb.octoai.run/healthcheck"
canny_health_check = "https://controlnet-canny-demo-kk0powt97tmb.octoai.run/healthcheck"
stable_diffusion_url = "https://stable-diffusion-demo-kk0powt97tmb.octoai.run/predict"
canny_url = "https://controlnet-canny-demo-kk0powt97tmb.octoai.run/predict"  

OCTOAI_API_TOKEN = "API Token goes here from guide on creating OctoAI API token"
# The client will also pick up OCTOAI_TOKEN if it's set as an environment variable
client = Client(token=OCTOAI_API_TOKEN)

Alright, we have all our imports and URLs set. Let's run an inference against the Stable Diffusion QuickStart Template, then pass the resulting image through Canny. For more information about using Stable Diffusion, check the TextToImage SDK QuickStart Template Guide.

def stable_diffusion_poodle() -> Optional[str]:
    """Create and save an image of a realistic poodle.

    :return: Image of a poodle as a base64 string, or None if the endpoint is unhealthy
    :rtype: Optional[str]
    """
    # By checking the QuickStart templates, we see the inputs Stable Diffusion accepts.
    sd_inputs = {
        "prompt": "A high-quality, realistic photograph of a a poodle",
        "negative_prompt": "Blurry photo, distortion, low-res, bad quality",
        "guidance_scale": 7.5,
        "num_inference_steps": 40,
        "scheduler": "DPMSolverMultistep"
    }
    # First, let's make sure the endpoint is available for inferences.
    if client.health_check(sd_health_check) != 200:
        return
    # Stable Diffusion, as a TextToImage model, returns an image in base64.
    sd_image_base64 = client.infer(endpoint_url=stable_diffusion_url, inputs=sd_inputs).get("image_0")
    # Saves the image.
    Image.from_base64(sd_image_base64).to_file("sd_poodle.png")
    return sd_image_base64
  
Let's go check out what the AI generated for us. Awww, what a sweet face!

Great! Let's take him on a vacation to the mountains using Canny.

def canny_poodle(image: str) -> Optional[str]:
    """Use an image to generate a modified image, such as changing the background.

    :param image: Image Canny uses as a base, in this case a poodle, as a base64 string
    :type image: str
    :return: Modified image as a base64 string, or None if the endpoint is unhealthy
    :rtype: Optional[str]
    """

    canny_inputs = {
      	# "image" and "prompt" are both required fields for Canny.
        "image": image,  # Our base64 image from Stable Diffusion
        "prompt": "A high-quality photograph of a poodle dog.  In the background are snow-covered mountains.",
        "num_steps": 10  # Number of de-noising steps, 1-500, optional
    }
    # Let's make sure the canny endpoint is healthy too.
    if client.health_check(canny_health_check) != 200:
        return
    # Canny returns a dictionary with the prediction time and an array of images in base64.
    canny_response = client.infer(endpoint_url=canny_url, inputs=canny_inputs)
    # In the "images" array, the 0 index shows the edges used as a guide.
    # Indices 1-3 contain different levels of processed images, with later indices being more processed.
    canny_image_base64 = canny_response.get("images")[1]
    # Our Image library can easily save to a file.
    Image.from_base64(canny_image_base64).to_file("canny_poodle.jpeg")
    return canny_image_base64
  
# Let's run it!
sd_poodle = stable_diffusion_poodle()
canny_poodle(sd_poodle)

Vacation poodle!

MLSD: Guided TextToImage Generation with Straight Line Detection

Let's move on from dogs to some scenery generation using straight-line detection with MLSD-controlled image generation. MLSD takes an image (to guide the straight lines), a prompt for the image to be generated, a negative prompt of things we don't want to see, a number of de-noising steps from 1-500, and a classifier-free guidance scale from 0.1-30.

Using Stable Diffusion again, I generated an image of a luxury apartment to serve as the line base for our MLSD-generated image. It has lots of good straight edges to guide the MLSD image generation.
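If you'd like to generate a similar base image yourself, here's a quick sketch reusing the Stable Diffusion endpoint and input schema from the Canny example above; the prompt is only an illustration:

from octoai.client import Client
from octoai.types import Image

client = Client()
stable_diffusion_url = "https://stable-diffusion-demo-kk0powt97tmb.octoai.run/predict"

sd_inputs = {
    "prompt": "A high-quality photograph of a luxury apartment interior with clean, straight lines",
    "negative_prompt": "Blurry photo, distortion, low-res, bad quality",
    "guidance_scale": 7.5,
    "num_inference_steps": 40,
    "scheduler": "DPMSolverMultistep",
}
apartment_b64 = client.infer(endpoint_url=stable_diffusion_url, inputs=sd_inputs).get("image_0")
Image.from_base64(apartment_b64).to_file("luxury_apartment.png")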

from octoai.client import Client
from octoai.types import Image

client = Client()  # Pass token= here if OCTOAI_TOKEN isn't set as an environment variable.
# We can use our types module to easily load a file in as a base64 string
image = Image.from_file("luxury_apartment.png").image_b64

# Please verify the URLs on the demo page for MLSD
mlsd_url = "https://controlnet-mlsd-demo-kk0powt97tmb.octoai.run/predict"
mlsd_health_check = "https://controlnet-mlsd-demo-kk0powt97tmb.octoai.run/healthcheck"

mlsd_inputs = {
  # "image" and "prompt" are required fields for MLSD
  "image": image,  # Our base64 image from Stable Diffusion
  "prompt": "A realistic photograph a medieval castle interior. The hearth is aflame.  The furniture is made of wood and covered in animal skins.",
  "negative_prompt": "longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality",
  "num_steps": 20,  # Number of de-noising steps, 1-500, optional
  "guidance_scale": 20,  # Classifier-free guidance scale from 0.1-30
}
# Let's make sure the MLSD endpoint is healthy too.
if client.health_check(mlsd_health_check) == 200:
  # MLSD returns a dictionary with the prediction time and an array of images in base64.
  mlsd_response = client.infer(endpoint_url=mlsd_url, inputs=mlsd_inputs)
  # In the "images" array, the image at index 0 will show you the lines used as a guide.
  # Index 1 contains our generated image.
  mlsd_image_base64 = mlsd_response.get("images")[1]
  # Our Image library can easily save to a file.
  Image.from_base64(mlsd_image_base64).to_file("hearth.jpeg")
  

When you run this small program, you should get a result similar to the one below, saved as "hearth.jpeg".

Canny and MLSD Outputs

Different levels of processing from Canny and MLSD may match your needs. In the images array of the JSON returned by these endpoints, element 0 is the edge or line map as a base64 string, while element 3 is generally the most processed image.

{
  "prediction_time_ms": 2709.765884,
  "images": [
    "element 0 is the edge map used to generate the rest of the images, in base64",
    "elements 1-3 are base64 images at different stages of processing"
  ]
}
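
If you want to compare the stages yourself, a small helper like the sketch below (save_all_stages is a hypothetical name) saves every element of the images array to disk:

from octoai.types import Image

def save_all_stages(response: dict, prefix: str = "guided_stage") -> None:
    """Save every image in a Canny or MLSD response dictionary to disk."""
    for i, image_b64 in enumerate(response.get("images", [])):
        # Element 0 is the edge or line map; later elements are generated images.
        Image.from_base64(image_b64).to_file(f"{prefix}_{i}.png")

# Example: save_all_stages(canny_response), using the response from the Canny example above.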