Guided Stable Diffusion on TypeScript SDK

Canny Edge Maps on OctoAI TypeScript SDK

For examples of the types of outputs you can expect, please visit the Canny Demo or the MLSD Demo at OctoAI.

This guide shows how to run inferences that generate an image from a prompt plus guidance taken from another image: either an edge map produced by Canny or straight-line guidance produced by MLSD. Our first example uses an image created in our Stable Diffusion - TextToImage TypeScript SDK Inferences guide.

As a next step, Asynchronous Inference Using the TypeScript SDK works very well with guided stable diffusion.

Requirements

Canny: Guided TextToImage with Canny Edge Maps

Canny takes an image as base64, a prompt, and a number of de-noising steps (1-500). Our first step is converting an image to base64. For this example, you can use the Stable Diffusion - TextToImage TypeScript SDK Inferences guide to generate an image if you don't have one you'd like to use.

Let's go check out what AI generated for us. Awww, what a sweet face!

Another StableDiffusion generated poodle on a laptop.

Please verify the URLs are correct when running this code sample if you run into any unexpected errors.

Please see the TypeScript SDK Reference for more information on classes and methods. The same endpoints for inferences and health checks are also available on your cloned endpoints, allowing you to build custom applications from our QuickStart templates without rate limiting.

import { Client } from "@octoai/client";
import * as fs from "fs";

/**
 * This method will create and save an image given inputs including a prompt and an image in base64.
 *
 * @param inputs: includes the "image" as a base64 string and "prompt" as a string.
 * @param fileName: File you would like to write the canny outputs to.
 */
async function saveCannyImage(inputs: Record<string, any>, fileName: string) {
    // The client will also identify if OCTOAI_TOKEN is set as an environment variable
    // If you don't have one set, you can use the code snippet below, editing the token, to send a token to the client.
    // const OCTOAI_TOKEN = "API Token goes here from guide on creating OctoAI API token";
    // const client = new Client(OCTOAI_TOKEN);
    const client = new Client();

    // Verify the URLs by checking the QuickStart templates prior to running these examples
    const cannyHealthCheck = "https://controlnet-canny-demo-kk0powt97tmb.octoai.run/healthcheck";
    const cannyEndpoint = "https://controlnet-canny-demo-kk0powt97tmb.octoai.run/predict";

    // First, let's make sure the endpoint is available for inferences.
    if (await client.healthCheck(cannyHealthCheck) === 200) {
        // Canny returns a dictionary with the prediction time and an array of images in base64.
        const outputs: any = await client.infer(cannyEndpoint, inputs);
        // In the "images" array, the 0 index shows the edges used as a guide.
        // Index 1-3 contains our generated images.
        // Then let's save the image and see what we got!
        let buffer = Buffer.from(outputs.images[3], "base64");
        fs.writeFileSync(fileName, buffer);
    }
}
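
The function above checks the endpoint's health once and silently does nothing if the endpoint is not ready. If you'd rather wait for an endpoint that is still starting up, here is a minimal sketch of a retry loop. Note that `waitForHealthy`, `StatusCheck`, and the default attempt count and delay are hypothetical names and values chosen for illustration; they are not part of the OctoAI SDK.

```typescript
// Hypothetical helper (not part of the OctoAI SDK): polls a health check
// until it reports HTTP 200 or the attempts run out.
type StatusCheck = () => Promise<number>;

async function waitForHealthy(
    check: StatusCheck,
    maxAttempts: number = 5,   // assumed default, tune for your endpoint
    delayMs: number = 1000     // assumed default, tune for your endpoint
): Promise<boolean> {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            if ((await check()) === 200) {
                return true;
            }
        } catch {
            // Treat a thrown error (for example, a network failure) as "not ready yet".
        }
        if (attempt < maxAttempts) {
            // Pause before the next attempt.
            await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
        }
    }
    return false;
}
```

Inside `saveCannyImage`, this could stand in for the single check, for example `if (await waitForHealthy(() => client.healthCheck(cannyHealthCheck))) { ... }`, assuming `client.healthCheck` resolves to the status code as it does in the sample above.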

Alright, we have all our imports and URLs set, a basic method for running inference that first verifies the endpoint is healthy, and a way to write our file. Let's go ahead and run an image in base64 format through and prepare ourselves for more poodles!

const cutePoodle = fs.readFileSync("./cute_poodle.png", {encoding: "base64",});
// By checking the QuickStart templates, we see the inputs Canny accepts.
const snowyPoodleInputs = {
    // "image" and "prompt" are both required fields for Canny.
    "image": cutePoodle,  // Our base64 image from Stable Diffusion guide
    "prompt": "A high-quality photograph of a poodle puppy on a wooden chair. In the background are snow-covered mountains.",
    "num_steps": 10  //  Number of de-noising steps, 1-500, optional
};
// Prepare yourself for cute!
saveCannyImage(snowyPoodleInputs, "snowy_poodle.png").then();

Great! Let's take him on a vacation to the mountains using Canny.

Vacation poodle!

MLSD: Guided TextToImage Generation with Straight Line Detection

Let's move on from dogs into some scenery generation using straight line detection with MLSD controlled image generation. MLSD takes an image (to guide the straight lines), a prompt for the image to be generated, a negative prompt of things we don't want to see, a number of de-noising steps from 1-500, and a classifier-free guidance scale from 0.1-30.
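
Since MLSD has a few numeric ranges to keep track of, it can help to check inputs before sending them. The sketch below is one way to do that. `MLSDInputs` and `validateMLSDInputs` are hypothetical names, and treating `negative_prompt`, `num_steps`, and `guidance_scale` as optional, and `num_steps` as a whole number, are assumptions; please confirm the accepted inputs against the QuickStart template.

```typescript
// Hypothetical input shape, based on the fields this guide describes.
interface MLSDInputs {
    image: string;            // base64-encoded guide image
    prompt: string;
    negative_prompt?: string; // assumed optional
    num_steps?: number;       // de-noising steps, 1-500
    guidance_scale?: number;  // classifier-free guidance scale, 0.1-30
}

// Returns a list of problems; an empty list means the inputs look valid.
function validateMLSDInputs(inputs: MLSDInputs): string[] {
    const problems: string[] = [];
    if (inputs.image.length === 0) {
        problems.push("image must be a non-empty base64 string");
    }
    if (inputs.prompt.trim().length === 0) {
        problems.push("prompt must not be empty");
    }
    const steps = inputs.num_steps;
    // Assumption: the step count is a whole number.
    if (steps !== undefined && (Math.floor(steps) !== steps || steps < 1 || steps > 500)) {
        problems.push("num_steps must be a whole number from 1 to 500");
    }
    const scale = inputs.guidance_scale;
    if (scale !== undefined && !(scale >= 0.1 && scale <= 30)) {
        problems.push("guidance_scale must be between 0.1 and 30");
    }
    return problems;
}
```

Returning a list instead of throwing on the first problem lets you report every issue with a set of inputs at once.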

Using Stable Diffusion again, I generated an image of a luxury apartment to serve as the line base for our MLSD-generated image. This has lots of good straight edges to guide the MLSD image generation.

import { Client } from "@octoai/client";
import * as fs from "fs";

/**
 * This method will create and save an image given inputs including a prompt and an image in base64.
 *
 * @param inputs: includes the "image" as a base64 string and "prompt" as a string.
 * @param fileName: File you would like to write the MLSD outputs to.
 */
async function saveMLSDImage(inputs: Record<string, any>, fileName: string) {
    const client = new Client();

    // Verify the URLs by checking the QuickStart templates prior to running these examples
    const MLSDHealthCheck = "https://controlnet-mlsd-demo-kk0powt97tmb.octoai.run/healthcheck";
    const MLSDEndpoint = "https://controlnet-mlsd-demo-kk0powt97tmb.octoai.run/predict";

    // First, let's make sure the endpoint is available for inferences.
    if (await client.healthCheck(MLSDHealthCheck) === 200) {
        // MLSD returns a dictionary with the prediction time and an array of images in base64.
        const outputs: any = await client.infer(MLSDEndpoint, inputs);
        // In the "images" array, the 0 index shows the lines used as a guide.
        // Index 1-3 contains our generated images.
        // Then let's save the image and see what we got!
        let buffer = Buffer.from(outputs.images[3], "base64");
        fs.writeFileSync(fileName, buffer);
    }
}

Let's go ahead and run this program to regenerate the apartment with ryokan-style decor.


const luxuryApartment = fs.readFileSync("./luxury_apartment.png", {encoding: "base64",});
const luxInputs = {
    // "image" and "prompt" are required fields for MLSD
    "image": luxuryApartment,  // Our base64 image from Stable Diffusion
    "prompt": "A high quality photo of an apartment with Japanese ryokan decor",
    "negative_prompt": "lowres, cropped, worst quality, low quality",
    "num_steps": 20,  // Number of de-noising steps, 1-500, optional
    "guidance_scale": 20,  // Classifier-free guidance scale from 0.1-30
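};
// Save the MLSD result. The output file name here is only an example.
saveMLSDImage(luxInputs, "ryokan_apartment.png").then();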

As you can see, MLSD guides the image generation based on the lines in the provided image for the input, but all the stylings have shifted to the stylings of a traditional Japanese hotel.

Canny and MLSD Outputs

Different levels of processing from Canny and MLSD may match your needs. In the JSON returned as outputs, the element at index 0 of the images array is the edge map as a base64 string, while the element at index 3 is generally the most processed image.
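
If you'd like to compare every stage rather than keep only the last image, one option is to give each entry in the images array its own file name. The sketch below does only the naming step. `NamedImage`, `nameOutputImages`, and the file name pattern are hypothetical choices made for illustration.

```typescript
// Hypothetical helper: pairs each base64 image in the response with a file name.
interface NamedImage {
    fileName: string;
    base64: string;
}

function nameOutputImages(images: string[], baseName: string): NamedImage[] {
    return images.map((base64, index) => ({
        // Index 0 is the edge (or line) map; later indexes are generated images.
        fileName: index === 0 ? `${baseName}_edges.png` : `${baseName}_${index}.png`,
        base64,
    }));
}
```

Each entry can then be written the same way the functions above write a single image, with `fs.writeFileSync(entry.fileName, Buffer.from(entry.base64, "base64"))`.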

Using interfaces around expected outputs can be helpful in TypeScript, but a full treatment is outside the scope of this guide.

{
  prediction_time_ms: 2709.765884,
  images: [
    "0 element is the edge maps used to generate the rest of the images in Base64",
    "1 - 3 elements are base64 images at different stages."
  ]
}
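
As a starting point only, here is a minimal sketch of what an interface for this output might look like, along with a runtime check. `ControlNetOutputs` and `isControlNetOutputs` are hypothetical names, and the shape is inferred from the sample output in this guide, not taken from the SDK's own type definitions.

```typescript
// Assumed response shape, inferred from the sample output in this guide.
interface ControlNetOutputs {
    prediction_time_ms: number;
    images: string[];  // index 0: edge or line map; indexes 1-3: generated images
}

// Runtime check, useful because the examples type the inference result as `any`.
function isControlNetOutputs(value: unknown): value is ControlNetOutputs {
    if (typeof value !== "object" || value === null) {
        return false;
    }
    const candidate = value as Record<string, unknown>;
    const images = candidate["images"];
    return (
        typeof candidate["prediction_time_ms"] === "number" &&
        Array.isArray(images) &&
        images.every((item: unknown) => typeof item === "string")
    );
}
```

With a guard like this, the result of `client.infer` can be narrowed before indexing into `images`, so an unexpected response surfaces as a clear failure and not as a bad file on disk.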