Image Gen API Docs

Image generation requests go to https://image.octoai.run/generate/{engine_id}, where engine_id is one of:

  1. sdxl: SDXL v1.0
  2. sd: SD v1.5
  3. controlnet-sdxl: ControlNet SDXL

The endpoint covers text2img, img2img, and inpainting.
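As a minimal sketch of how the URL is assembled from an engine_id, the helper below validates the ID against the three values listed above before building the URL. The names build_generate_url, ENGINE_IDS, and BASE_URL are illustrative and not part of the API.

```python
# Engine IDs as listed above; the helper and constant names are hypothetical.
ENGINE_IDS = ("sdxl", "sd", "controlnet-sdxl")
BASE_URL = "https://image.octoai.run/generate"


def build_generate_url(engine_id: str) -> str:
    """Return the generation URL for a known engine_id, or raise ValueError."""
    if engine_id not in ENGINE_IDS:
        raise ValueError(f"unknown engine_id {engine_id!r}; expected one of {ENGINE_IDS}")
    return f"{BASE_URL}/{engine_id}"


print(build_generate_url("sdxl"))
```

Rejecting unknown IDs locally avoids spending a request on a URL that cannot resolve.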

cURL Example for text2img

curl -H 'Content-Type: application/json' -H "Authorization: Bearer $OCTOAI_TOKEN" -X POST "https://image.octoai.run/generate/sdxl" \
    -d '{
        "prompt": "The angel of death Hyperrealistic, splash art, concept art, mid shot, intricately detailed, color depth, dramatic, 2/3 face angle, side light, colorful background",
        "negative_prompt": "ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, disfigured, deformed, body out of frame, blurry, bad anatomy, blurred, watermark, grainy, signature, cut off, draft",
        "sampler": "DDIM",
        "cfg_scale": 11,
        "height": 1024,
        "width": 1024,
        "seed": 2748252853,
        "steps": 20,
        "num_images": 1,
        "high_noise_frac": 0.7,
        "strength": 0.92,
        "use_refiner": true,
        "style_preset": "3d-model"
    }' > response.json
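
The cURL command above writes the raw JSON reply to response.json. The sketch below shows one way to turn that reply into image files, assuming the reply has the same shape the Python examples in this document read (an "images" list whose items carry an "image_b64" field) and that the decoded bytes are JPEG data. The function name save_images is hypothetical.

```python
import base64
from pathlib import Path


def save_images(response: dict, out_dir: str = ".", prefix: str = "result_image") -> list:
    """Decode each base64 image in the reply and write it to disk."""
    paths = []
    for i, img_info in enumerate(response["images"]):
        # Assumption: the decoded bytes are already an encoded image file (JPEG).
        path = Path(out_dir) / f"{prefix}{i}.jpg"
        path.write_bytes(base64.b64decode(img_info["image_b64"]))
        paths.append(path)
    return paths
```

To use it on the file produced by cURL, load it with json.load(open("response.json")) and pass the resulting dictionary to save_images.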

Python Example for text2img

import requests
import os
import base64
import io
import PIL.Image


def _process_test(url):
    
    OCTOAI_TOKEN = os.environ.get("OCTOAI_TOKEN")

    payload = {
        "prompt": "Face of a yellow cat, high resolution, sitting on a park bench",
        "negative_prompt": "Blurry photo, distortion, low-res, bad quality",
        "steps": 30,
        "width": 1024,
        "height": 1024,
    }

    headers = {
        "Authorization": f"Bearer {OCTOAI_TOKEN}",
        "Content-Type": "application/json",
    }

    response = requests.post(url, headers=headers, json=payload)

    if response.status_code != 200:
        # Stop here: an error reply has no "images" field to decode
        print(response.text)
        return

    img_list = response.json()["images"]

    for i, img_info in enumerate(img_list):
        img_bytes = base64.b64decode(img_info["image_b64"])
        img = PIL.Image.open(io.BytesIO(img_bytes))
        img.load()
        img.save(f"result_image{i}.jpg")


if __name__ == "__main__":
    _process_test("https://image.octoai.run/generate/sdxl")

Image Generation Arguments

  • prompt: A string describing the image to generate.
    • We currently have a 77-token limit on prompts for SDXL and a 231-token limit for SD 1.5.
    • You can use prompt weighting, e.g. (A tall (beautiful:1.5) woman:1.0) (some other prompt with weight:0.8). A token's weight is the product of the weights of all brackets that enclose it. The brackets, colons, and weights do not count towards the number of tokens.
  • prompt_2: This only applies to SDXL. When only prompt is set, its value is copied to prompt_2. When prompt and prompt_2 are both set, they serve different purposes.
    • prompt is used for "word salad" style control of the image. This is the type of prompting you are likely familiar with from SD 1.5. Prompts like the following work well:
      prompt = "photorealistic, high definition, masterpiece, sharp lines"
      
      prompt_2 is meant for more human-readable descriptions of the desired image. For example:
      prompt_2 = "A portrait of a handsome cat wearing a little hat. The cat is in front of a colorful background."
      
  • negative_prompt Optional: A string indicating a prompt for guidance to steer away from. Unused when not provided.
  • negative_prompt_2: This only applies to SDXL. This prompt is meant for human-readable descriptions of what you don't want in the image, e.g. you would say "Low resolution" in negative_prompt, then "Bad hands" in negative_prompt_2.
  • sampler Optional: A string specifying which scheduler to use when generating an image. Defaults to "DDIM".
    • Premium samplers (2x price)
      1. DPM_2
      2. DPM_2_ANCESTRAL
      3. DPM_PLUS_PLUS_SDE_KARRAS
      4. HEUN
      5. KLMS
    • Regular samplers
      1. DDIM
      2. DDPM
      3. DPM_PLUS_PLUS_2M_KARRAS
      4. DPM_SINGLE
      5. DPM_SOLVER_MULTISTEP
      6. K_EULER
      7. K_EULER_ANCESTRAL
      8. PNDM
      9. UNI_PC
  • height Optional: An integer specifying the height of the output image. Defaults to 1024 for SDXL and 512 for SD 1.5.
  • width Optional: An integer specifying the width of the output image. Defaults to 1024 for SDXL and 512 for SD 1.5.
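
The sampler list above splits into regular and premium (2x price) groups. A small lookup like the following sketch can flag a premium sampler before a request is sent. The sampler names are copied from the list; the function name and the idea of returning a price multiplier are illustrative assumptions, not part of the API.

```python
# Sampler names copied from the list above.
PREMIUM_SAMPLERS = {"DPM_2", "DPM_2_ANCESTRAL", "DPM_PLUS_PLUS_SDE_KARRAS", "HEUN", "KLMS"}
REGULAR_SAMPLERS = {
    "DDIM", "DDPM", "DPM_PLUS_PLUS_2M_KARRAS", "DPM_SINGLE", "DPM_SOLVER_MULTISTEP",
    "K_EULER", "K_EULER_ANCESTRAL", "PNDM", "UNI_PC",
}


def sampler_price_multiplier(sampler: str = "DDIM") -> int:
    """Return 2 for premium samplers, 1 for regular ones (hypothetical helper)."""
    if sampler in PREMIUM_SAMPLERS:
        return 2
    if sampler in REGULAR_SAMPLERS:
        return 1
    raise ValueError(f"unknown sampler {sampler!r}")


print(sampler_price_multiplier("HEUN"))
```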

Supported Output Resolutions (Width x Height) are as follows:

For SDXL:

(1024, 1024),
(896, 1152),
(832, 1216),
(768, 1344),
(640, 1536),
(1536, 640),
(1344, 768),
(1216, 832),
(1152, 896)

For SD 1.5:

(512, 512),
(640, 512),
(768, 512),
(512, 704),
(512, 768),
(576, 768),
(640, 768),
(576, 1024),
(1024, 576)

Note: init_image and mask_image will be resized to the specified resolution before applying img2img or inpainting.
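
Because only the resolutions listed above are supported, it can help to check a requested size up front, or to snap an arbitrary size to the supported one with the closest aspect ratio. The sketch below does both. The helper names are hypothetical, and picking by aspect ratio is just one reasonable choice, not something the API prescribes.

```python
import math

# (width, height) pairs copied from the lists above.
SUPPORTED_RESOLUTIONS = {
    "sdxl": [(1024, 1024), (896, 1152), (832, 1216), (768, 1344), (640, 1536),
             (1536, 640), (1344, 768), (1216, 832), (1152, 896)],
    "sd": [(512, 512), (640, 512), (768, 512), (512, 704), (512, 768),
           (576, 768), (640, 768), (576, 1024), (1024, 576)],
}


def is_supported(engine_id: str, width: int, height: int) -> bool:
    """True if (width, height) is one of the listed resolutions for the engine."""
    return (width, height) in SUPPORTED_RESOLUTIONS[engine_id]


def closest_resolution(engine_id: str, width: int, height: int) -> tuple:
    """Pick the supported resolution whose aspect ratio is nearest on a log scale."""
    target = math.log(width / height)
    return min(
        SUPPORTED_RESOLUTIONS[engine_id],
        key=lambda wh: abs(math.log(wh[0] / wh[1]) - target),
    )


print(closest_resolution("sdxl", 1920, 1080))  # (1344, 768)
```

Comparing log ratios treats landscape and portrait deviations symmetrically, so a 16:9 source maps to (1344, 768) and a 9:16 source maps to (768, 1344).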

  • cfg_scale Optional: How strictly the diffusion process adheres to the prompt text (higher values keep your image closer to your prompt). Defaults to 12 when not set.
  • steps Optional: How many steps of diffusion to perform. More steps give a clearer image but increase the runtime proportionally. Defaults to 30 when not set.
  • num_images: An integer describing the number of images to generate.
  • seed Optional: An integer that fixes the random noise of the model. Using the same seed with the same arguments produces the same output image, which can be useful for testing or replication. Use null to select a random seed.
  • use_refiner: This only applies to SDXL. A boolean (true or false) that determines whether to use the refiner.
  • high_noise_frac Optional: This only applies to SDXL. A floating point or integer determining how much of the denoising is done by the base model vs. the refiner. A value of 0.8 applies the base model for 80% and the refiner for 20%. Defaults to 0.8 when not set.
  • checkpoint: Here you can specify a checkpoint either from the OctoAI asset library or your private asset library. Note that using a custom asset increases generation time.
  • loras: Here you can specify LoRAs, in name-weight pairs, either from the OctoAI asset library or your private asset library. Note that using a custom asset increases generation time.
  • textual_inversions: Here you can specify textual inversions and their corresponding trigger words. Note that using a custom asset increases generation time.
  • vae: Here you can specify variational autoencoders. Note that using a custom asset increases generation time.
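
As a rough illustration of how these arguments fit together, the sketch below assembles a request body in which only explicitly passed arguments appear, so the documented defaults apply to everything else. The helper name build_payload is hypothetical. Note that Python's None serializes to JSON null, which is the value the seed argument uses to request a random seed.

```python
import json


def build_payload(prompt: str, seed=None, **optional_args) -> dict:
    """Assemble a request body; only arguments passed explicitly are included."""
    num_images = optional_args.get("num_images", 1)
    if not isinstance(num_images, int) or num_images < 1:
        raise ValueError("num_images must be a positive integer")
    payload = {"prompt": prompt, "seed": seed}
    payload.update(optional_args)
    return payload


# seed=None is serialized as JSON null, which asks for a random seed.
random_body = json.dumps(build_payload("A red fox in the snow", steps=20))
# A fixed seed makes the request repeatable.
fixed_body = json.dumps(build_payload("A red fox in the snow", seed=1234, cfg_scale=7.5))
print(random_body)
print(fixed_body)
```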

Here's an example of how to mix assets (checkpoint, loras, textual_inversions, and vae) in the same API request. Note that assets from the OctoAI asset library require an "octoai:" prefix, while your private assets do not. Asset names need to be unique per account.

payload = {
    # ... other arguments ...
    "checkpoint": "octoai:realcartoon",
    "loras": {
        "octoai:crayon-style": 0.7,
        "octoai:paint-splash": 0.3,
    },
    "textual_inversions": {
        "octoai:NegativeXL": "negativeXL_D",
    },
    "vae": "your_vae_name",
    # ... other arguments ...
}
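
The prefix rule described above can be captured in a small helper so that public and private assets are not mixed up by hand. This is a sketch only: asset_ref and the private LoRA name "my-private-lora" are hypothetical, while the other asset names come from the example above.

```python
OCTOAI_PREFIX = "octoai:"


def asset_ref(name: str, public: bool = False) -> str:
    """Prefix OctoAI library assets with 'octoai:'; private names are used as-is."""
    return f"{OCTOAI_PREFIX}{name}" if public else name


asset_args = {
    "checkpoint": asset_ref("realcartoon", public=True),
    "loras": {
        asset_ref("crayon-style", public=True): 0.7,
        asset_ref("my-private-lora"): 0.3,  # hypothetical private asset name
    },
    "vae": asset_ref("your_vae_name"),
}
print(asset_args)
```

The resulting dictionary can be merged into a request payload alongside prompt and the other generation arguments.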

  • style_preset Optional: This only applies to SDXL. Used to guide the output image towards a particular style. Defaults to None. Supported values:
"base"
"3d-model"
"analog-film"
"anime"
"cinematic"
"comic-book"
"Craft Clay"
"modeling-compound"
"digital-art"
"enhance"
"fantasy-art"
"isometric"
"line-art"
"low-poly"
"neon-punk"
"origami"
"photographic"
"pixel-art"
"tile-texture"
"Advertising"
"Food Photography"
"Real Estate"
"Abstract"
"Cubist"
"Graffiti"
"Hyperrealism"
"Impressionist"
"Pointillism"
"Pop Art"
"Psychedelic"
"Renaissance"
"Steampunk"
"Surrealist"
"Typography"
"Watercolor"
"Fighting Game"
"GTA"
"Super Mario"
"Minecraft"
"Pokémon"
"Retro Arcade"
"Retro Game"
"RPG Fantasy Game"
"Strategy Game"
"Street Fighter"
"Legend of Zelda"
"Architectural"
"Disco"
"Dreamscape"
"Dystopian"
"Fairy Tale"
"Gothic"
"Grunge"
"Horror"
"Minimalist"
"Monochrome"
"Nautical"
"Space"
"Stained Glass"
"Techwear Fashion"
"Tribal"
"Zentangle"
"Collage"
"Flat Papercut"
"Kirigami"
"Paper Mache"
"Paper Quilling"
"Papercut Collage"
"Papercut Shadow Box"
"Stacked Papercut"
"Thick Layered Papercut"
"Alien"
"Film Noir"
"HDR"
"Long Exposure"
"Neon Noir"
"Silhouette"
"Tilt-Shift"
  • init_image Optional: Only applicable for img2img and inpainting use cases, i.e. to use an image as a starting point for image generation. Takes an image encoded as a base64 string.
    • Use .jpg format to ensure best latency
  • strength Optional: Only applicable for img2img use cases. A floating point or integer determining how much noise is applied to the input image. A value of 0.0 keeps the input image as-is; values approaching 1.0 allow high variation, up to ignoring the input entirely, and produce images that are less semantically consistent with the input. Defaults to 0.8 when not set.
  • mask_image Optional: Only applicable for inpainting use cases, i.e. to specify which area of the picture to paint onto. Takes an image encoded as a base64 string.
    • Use .jpg format to ensure best latency
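
For img2img, the init_image and strength arguments described above combine into a payload along these lines. This is a minimal sketch: the helper names are hypothetical, the 0.0 to 1.0 range check is inferred from the description of strength, and the file is assumed to already be a .jpg so its bytes can be base64-encoded directly.

```python
import base64


def encode_image_file(path: str) -> str:
    """Read an image file (ideally a .jpg) and return its bytes as a base64 string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


def build_img2img_payload(prompt: str, init_image_b64: str, strength: float = 0.8) -> dict:
    """Build an img2img request body; strength is assumed to lie in [0.0, 1.0]."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    return {"prompt": prompt, "init_image": init_image_b64, "strength": strength}
```

A typical call would be build_img2img_payload("A watercolor painting of a dog", encode_image_file("dog.jpg"), strength=0.6), sent with the same headers as the text2img example.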

Prompt weighting

You can emphasize, or de-emphasize, specific words or phrases of the image generation prompt using weighting. To use prompt weighting, format your prompt using parentheses: prompt = "A cat with (long whiskers)"

This emphasizes the phrase "long whiskers" with a weight of 1.1. Each additional pair of parentheses, as in "(((long whiskers)))", multiplies the weight by 1.1 again, so three pairs give a weight of 1.1 × 1.1 × 1.1 ≈ 1.33. More specific weights can also be specified in the form: prompt = "A cat with (long whiskers: 0.8)"

This weights all words in the parentheses by a factor of 0.8. Notably, weights do not have to be greater than one. Using a weight of less than 1 will de-emphasize the contained words.

Using weights in negative prompts can also be helpful. For example, you can avoid distorted hands: negative_prompt = "(distorted hands: 1.5)"
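
To make the arithmetic concrete, the sketch below computes the effective weight of each piece of a prompt under the rules described in this document: plain parentheses multiply by 1.1, an explicit ":weight" suffix multiplies by that weight, and nested brackets multiply together. It illustrates the rules only and is not the service's actual parser; the function name parse_weights is hypothetical.

```python
import re

# Matches a trailing ":<number>" inside a bracket, e.g. ": 0.8" or ":1.5".
_WEIGHT_SUFFIX = re.compile(r":\s*([0-9]*\.?[0-9]+)\s*$")


def parse_weights(prompt: str, default: float = 1.1) -> list:
    """Return (text, weight) segments for a parenthesised prompt."""
    stack = [[]]  # one list of segments per open bracket level
    buf = []      # characters of the text run currently being read

    def flush():
        if buf:
            stack[-1].append(("".join(buf), 1.0))
            buf.clear()

    for ch in prompt:
        if ch == "(":
            flush()
            stack.append([])
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            tail = "".join(buf)
            buf.clear()
            match = _WEIGHT_SUFFIX.search(tail)
            weight = float(match.group(1)) if match else default
            if match:
                tail = tail[:match.start()]
            group = stack.pop()
            if tail:
                group.append((tail, 1.0))
            # Everything inside this bracket is scaled by the bracket's weight.
            stack[-1].extend((text, w * weight) for text, w in group)
        else:
            buf.append(ch)
    flush()
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


print(parse_weights("A cat with (long whiskers: 0.8)"))
```

For "(((long whiskers)))" the single segment comes out with a weight of about 1.331, matching the three-pair example above.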

Python example for inpainting

import requests
import os
import base64
import io
import PIL.Image

def _process_test(url):
    image = PIL.Image.open("dog.jpg")
    mask = PIL.Image.open("dogmask.jpg")

    # Create a BytesIO buffer to hold the image data
    image_buffer = io.BytesIO()
    image.save(image_buffer, format='JPEG')
    image_bytes = image_buffer.getvalue()
    encoded_image = base64.b64encode(image_bytes).decode('utf-8')

    # Create a BytesIO buffer to hold the mask data
    mask_buffer = io.BytesIO()
    mask.save(mask_buffer, format='JPEG')
    mask_bytes = mask_buffer.getvalue()
    encoded_mask = base64.b64encode(mask_bytes).decode('utf-8')

    OCTOAI_TOKEN = os.environ.get("OCTOAI_TOKEN")

    payload = {
        "prompt": "Face of a yellow cat, high resolution, sitting on a park bench",
        "negative_prompt": "Blurry photo, distortion, low-res, bad quality",
        "steps": 30,
        "width": 1024,
        "height": 1024,
        "init_image": encoded_image,
        "mask_image": encoded_mask
    }
    headers = {
        "Authorization": f"Bearer {OCTOAI_TOKEN}",
        "Content-Type": "application/json",
    }

    response = requests.post(url, headers=headers, json=payload)

    if response.status_code != 200:
        # Stop here: an error reply has no "images" field to decode
        print(response.text)
        return

    img_list = response.json()["images"]

    for i, img_info in enumerate(img_list):
        img_bytes = base64.b64decode(img_info["image_b64"])
        img = PIL.Image.open(io.BytesIO(img_bytes))
        img.load()
        img.save(f"result_image{i}.jpg")


if __name__ == "__main__":
    _process_test("https://image.octoai.run/generate/sdxl")

ControlNet

  • The default is the ControlNet Canny checkpoint. You can also upload other ControlNet checkpoints to the OctoAI Asset Library and select them at generation time with the checkpoint parameter.
  • Provide your ControlNet mask, e.g. a Canny edge map or a depth map, in the controlnet_image parameter.

Python Example for ControlNet Canny (the default)

import base64
import io
import os

import cv2
import PIL.Image
import requests


def _process_test(endpoint_url):
    image_path = "cat.jpeg"
    img = cv2.imread(image_path)
    img = cv2.resize(img, (1024, 1024)) # Resize to a resolution supported by OctoAI SDXL

    edges = cv2.Canny(img,100,200) # 100 and 200 are thresholds for determining canny edges

    height, width = edges.shape

    # Convert Canny edge map to PIL Image
    edges_image = PIL.Image.fromarray(edges)

    # Create a BytesIO buffer to hold the image data
    image_buffer = io.BytesIO()
    edges_image.save(image_buffer, format='JPEG')
    image_bytes = image_buffer.getvalue()
    encoded_image = base64.b64encode(image_bytes).decode('utf-8')

    model_request = {
        "controlnet_image": encoded_image,
        "prompt": (
            "A photo of a cute tiger astronaut in space"
        ),
        "negative_prompt": "low quality, bad quality, sketches, unnatural",
        "steps": 20, 
        "num_images": 1,
        "seed": 768072361,
        "height": height,
        "width": width,
    }

    prod_token = os.environ.get("OCTOAI_TOKEN")  # noqa
    
    reply = requests.post(
        f"{endpoint_url}",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {prod_token}",
        },
        json=model_request,
    )

    if reply.status_code != 200:
        print(reply.text)
        exit(-1)

    img_list = reply.json()["images"]
    print(img_list)

    for i, idict in enumerate(img_list):
        ibytes = idict['image_b64']
        img_bytes = base64.b64decode(ibytes)
        img = PIL.Image.open(io.BytesIO(img_bytes))
        img.load()
        img.save(f"result_image{i}.jpg")


if __name__ == "__main__":
    endpoint = "https://image.octoai.run/generate/controlnet-sdxl"
    _process_test(endpoint)

Python Example for ControlNet Depth

First install the extra dependency for generating depth maps:

  • pip install controlnet-aux==0.0.7

import base64
import io
import os

import PIL.Image
import requests
from controlnet_aux import MidasDetector


def _process_test(endpoint_url):
    image_path = "cat.jpeg"
    img = PIL.Image.open(image_path).convert("RGB")
    img = img.resize((1024, 1024))  # Resize to a resolution supported by OctoAI SDXL

    # Estimate a depth map with MiDaS from controlnet-aux (an assumption: any depth
    # estimator works here). The annotator weights are downloaded on first use.
    midas = MidasDetector.from_pretrained("lllyasviel/Annotators")
    depth_image = midas(img, detect_resolution=1024, image_resolution=1024)

    width, height = depth_image.size

    # Create a BytesIO buffer to hold the depth map data
    image_buffer = io.BytesIO()
    depth_image.save(image_buffer, format='JPEG')
    image_bytes = image_buffer.getvalue()
    encoded_image = base64.b64encode(image_bytes).decode('utf-8')

    model_request = {
        "controlnet_image": encoded_image,
        # Placeholder name: the default checkpoint is Canny, so point this at a depth
        # ControlNet checkpoint that you have uploaded to the OctoAI Asset Library
        "checkpoint": "your_depth_controlnet_checkpoint",
        "prompt": (
            "A photo of a cute tiger astronaut in space"
        ),
        "negative_prompt": "low quality, bad quality, sketches, unnatural",
        "steps": 20,
        "num_images": 1,
        "seed": 768072361,
        "height": height,
        "width": width,
    }

    prod_token = os.environ.get("OCTOAI_TOKEN")  # noqa

    reply = requests.post(
        f"{endpoint_url}",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {prod_token}",
        },
        json=model_request,
    )

    if reply.status_code != 200:
        print(reply.text)
        exit(-1)

    img_list = reply.json()["images"]

    for i, idict in enumerate(img_list):
        ibytes = idict['image_b64']
        img_bytes = base64.b64decode(ibytes)
        img = PIL.Image.open(io.BytesIO(img_bytes))
        img.load()
        img.save(f"result_image{i}.jpg")


if __name__ == "__main__":
    endpoint = "https://image.octoai.run/generate/controlnet-sdxl"
    _process_test(endpoint)