Image Gen API Docs
Our URL for image generations is https://image.octoai.run/generate/{engine_id}, where engine_id is one of:
- sdxl: SDXL v1.0
- sd: SD v1.5
- controlnet-sdxl: ControlNet SDXL
This includes text2img, img2img, and inpainting.
cURL Example for text2img
curl -H 'Content-Type: application/json' -H "Authorization: Bearer $OCTOAI_TOKEN" -X POST "https://image.octoai.run/generate/sdxl" \
    -d '{
        "prompt": "The angel of death Hyperrealistic, splash art, concept art, mid shot, intricately detailed, color depth, dramatic, 2/3 face angle, side light, colorful background",
        "negative_prompt": "ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, disfigured, deformed, body out of frame, blurry, bad anatomy, blurred, watermark, grainy, signature, cut off, draft",
        "sampler": "DDIM",
        "cfg_scale": 11,
        "height": 1024,
        "width": 1024,
        "seed": 2748252853,
        "steps": 20,
        "num_images": 1,
        "high_noise_frac": 0.7,
        "strength": 0.92,
        "use_refiner": true,
        "style_preset": "3d-model"
    }' > response.json
Python Example for text2img
import requests
import os
import base64
import io
import PIL.Image


def _process_test(url):
    OCTOAI_TOKEN = os.environ.get("OCTOAI_TOKEN")
    payload = {
        "prompt": "Face of a yellow cat, high resolution, sitting on a park bench",
        "negative_prompt": "Blurry photo, distortion, low-res, bad quality",
        "steps": 30,
        "width": 1024,
        "height": 1024,
    }
    headers = {
        "Authorization": f"Bearer {OCTOAI_TOKEN}",
        "Content-Type": "application/json",
    }
    response = requests.post(url, headers=headers, json=payload)
    if response.status_code != 200:
        print(response.text)
        return
    img_list = response.json()["images"]
    for i, img_info in enumerate(img_list):
        img_bytes = base64.b64decode(img_info["image_b64"])
        img = PIL.Image.open(io.BytesIO(img_bytes))
        img.load()
        img.save(f"result_image{i}.jpg")


if __name__ == "__main__":
    _process_test("https://image.octoai.run/generate/sdxl")
Image Generation Arguments
prompt: A string describing the image to generate.
- We currently have a 77 token limit on prompts for SDXL and 231 for SD 1.5.
- You can use prompt weighting, e.g. (A tall (beautiful:1.5) woman:1.0) (some other prompt with weight:0.8). The weight will be the product of all brackets a token is a member of. The brackets, colons, and weights do not count towards the number of tokens.
prompt_2: This only applies to SDXL. By default, setting only prompt copies the input to both prompt and prompt_2. When prompt and prompt_2 are both set, they have very different functionality:
- prompt is used for "word salad" style control of the image. This is the type of prompting you are likely familiar with from SD 1.5. Prompts like the following work well: prompt = "photorealistic, high definition, masterpiece, sharp lines"
- prompt_2 is meant for more human readable descriptions of the desired image, for example: prompt_2 = "A portrait of a handsome cat wearing a little hat. The cat is in front of a colorful background."
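For instance, a payload setting both prompts might look like the following sketch (field values are illustrative):

payload = {
    # "word salad" style keywords
    "prompt": "photorealistic, high definition, masterpiece, sharp lines",
    # natural-language description of the desired image
    "prompt_2": (
        "A portrait of a handsome cat wearing a little hat. "
        "The cat is in front of a colorful background."
    ),
    "steps": 30,
}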
negative_prompt (Optional): A string indicating a prompt for guidance to steer away from. Unused when not provided.
negative_prompt_2: This only applies to SDXL. This prompt is meant for human readable descriptions of what you don't want in the image, e.g. you would say "Low resolution" in negative_prompt and "Bad hands" in negative_prompt_2.
sampler (Optional): A string specifying which scheduler to use when generating an image. Defaults to "DDIM".
- Regular samplers
  - DDIM
  - DDPM
  - DPM_PLUS_PLUS_2M_KARRAS
  - DPM_SINGLE
  - DPM_SOLVER_MULTISTEP
  - K_EULER
  - K_EULER_ANCESTRAL
  - PNDM
  - UNI_PC
- Premium samplers (2x price)
  - DPM_2
  - DPM_2_ANCESTRAL
  - DPM_PLUS_PLUS_SDE_KARRAS
  - HEUN
  - KLMS
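For example, to select one of the premium samplers above, set the sampler field in your request payload (a minimal sketch; other fields omitted):

payload = {
    "prompt": "An oil painting of a lighthouse at dusk",
    "sampler": "DPM_PLUS_PLUS_SDE_KARRAS",  # premium sampler, billed at 2x
    "steps": 20,
}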
height (Optional): An integer specifying the height of the output image. Defaults to 1024 for SDXL and 512 for SD 1.5.
width (Optional): An integer specifying the width of the output image. Defaults to 1024 for SDXL and 512 for SD 1.5.
Supported output resolutions (width x height) are as follows:
For SDXL: (1024, 1024), (896, 1152), (832, 1216), (768, 1344), (640, 1536), (1536, 640), (1344, 768), (1216, 832), (1152, 896)
For SD 1.5: (512, 512), (640, 512), (768, 512), (512, 704), (512, 768), (576, 768), (640, 768), (576, 1024), (1024, 576)
Note: init_image and mask_image will be resized to the specified resolution before applying img2img or inpainting.
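If your source image has a different aspect ratio, a small helper like the sketch below (not part of the API; closest_sdxl_resolution is a hypothetical name) can snap it to the nearest supported SDXL resolution before resizing:

# Hypothetical helper, not part of the OctoAI API: pick the supported SDXL
# resolution whose aspect ratio is closest to the source image's.
SDXL_RESOLUTIONS = [
    (1024, 1024), (896, 1152), (832, 1216), (768, 1344), (640, 1536),
    (1536, 640), (1344, 768), (1216, 832), (1152, 896),
]

def closest_sdxl_resolution(src_width, src_height):
    target = src_width / src_height
    return min(SDXL_RESOLUTIONS, key=lambda wh: abs(wh[0] / wh[1] - target))

print(closest_sdxl_resolution(1920, 1080))  # -> (1344, 768)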
cfg_scale (Optional): How strictly the diffusion process adheres to the prompt text (higher values keep your image closer to your prompt). Defaults to 12 when not set.
steps (Optional): How many steps of diffusion to perform. Higher values increase image clarity but proportionally increase the runtime. Defaults to 30 when not set.
num_images: An integer describing the number of images to generate.
seed (Optional): An integer that fixes the random noise of the model. Using the same seed guarantees the same output image, which can be useful for testing or replication. Use null to select a random seed.
use_refiner: This only applies to SDXL. A boolean true or false that determines whether to use the refiner.
high_noise_frac (Optional): This only applies to SDXL. A floating point or integer determining how much noise should be applied using the base model vs. the refiner. A value of 0.8 will apply the base model at 80% and the refiner at 20%. Defaults to 0.8 when not set.
checkpoint: A checkpoint from either the OctoAI asset library or your private asset library. Note that using a custom asset increases generation time.
loras: LoRAs, specified as name-weight pairs, from either the OctoAI asset library or your private asset library. Note that using a custom asset increases generation time.
textual_inversions: Textual inversions and their corresponding trigger words. Note that using a custom asset increases generation time.
vae: A variational autoencoder. Note that using a custom asset increases generation time.
Here's an example of how to mix OctoAI assets (checkpoints, loras, and textual_inversions) in the same API request. Note that OctoAI public assets require an "octoai:" prefix, while your private assets do not. Asset names must be unique per account.
payload = {
    ...
    "checkpoint": "octoai:realcartoon",
    "loras": {
        "octoai:crayon-style": 0.7,
        "octoai:paint-splash": 0.3
    },
    "textual_inversions": {
        "octoai:NegativeXL": "negativeXL_D",
    },
    "vae": "your_vae_name",
    ...
}
style_preset (Optional): This only applies to SDXL. Used to guide the output image towards a particular style. Defaults to None. Supported values:
"base"
"3d-model"
"analog-film"
"anime"
"cinematic"
"comic-book"
"Craft Clay"
"modeling-compound"
"digital-art"
"enhance"
"fantasy-art"
"isometric"
"line-art"
"low-poly"
"neon-punk"
"origami"
"photographic"
"pixel-art"
"tile-texture"
"Advertising"
"Food Photography"
"Real Estate"
"Abstract"
"Cubist"
"Graffiti"
"Hyperrealism"
"Impressionist"
"Pointillism"
"Pop Art"
"Psychedelic"
"Renaissance"
"Steampunk"
"Surrealist"
"Typography"
"Watercolor"
"Fighting Game"
"GTA"
"Super Mario"
"Minecraft"
"Pokémon"
"Retro Arcade"
"Retro Game"
"RPG Fantasy Game"
"Strategy Game"
"Street Fighter"
"Legend of Zelda"
"Architectural"
"Disco"
"Dreamscape"
"Dystopian"
"Fairy Tale"
"Gothic"
"Grunge"
"Horror"
"Minimalist"
"Monochrome"
"Nautical"
"Space"
"Stained Glass"
"Techwear Fashion"
"Tribal"
"Zentangle"
"Collage"
"Flat Papercut"
"Kirigami"
"Paper Mache"
"Paper Quilling"
"Papercut Collage"
"Papercut Shadow Box"
"Stacked Papercut"
"Thick Layered Papercut"
"Alien"
"Film Noir"
"HDR"
"Long Exposure"
"Neon Noir"
"Silhouette"
"Tilt-Shift"
init_image (Optional): Only applicable for img2img and inpainting use cases, i.e. to use an image as a starting point for image generation. Takes an image encoded as a base64 string. Use .jpg format to ensure the best latency.
strength (Optional): Only applicable for img2img use cases. A floating point or integer that determines how much noise should be applied. Values approaching 1.0 allow for high variation, i.e. ignoring the input image entirely, but can produce images that are not semantically consistent with the input; 0.0 keeps the input image as-is. Defaults to 0.8 when not set.
mask_image (Optional): Only applicable for inpainting use cases, i.e. to specify which area of the picture to paint onto. Takes an image encoded as a base64 string. Use .jpg format to ensure the best latency.
Prompt weighting
You can emphasize or de-emphasize specific words or phrases of the image generation prompt using weighting. To use prompt weighting, format your prompt using parentheses:
prompt = "A cat with (long whiskers)"
This emphasizes the phrase "long whiskers" with a weight of 1.1. Adding additional parentheses, such as "(((long whiskers)))", multiplies the weight by 1.1 per set, so three sets of parentheses give a weight of about 1.33. More specific weights can also be specified in the form:
prompt = "A cat with (long whiskers: 0.8)"
This will weigh all words in the parentheses by a factor of 0.8. Notably, weights do not have to be greater than one; a weight of less than 1 will de-emphasize the contained words.
Using weights in negative prompts can also be helpful. For example, you can steer away from distorted hands:
negative_prompt = "(distorted hands: 1.5)"
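Combined in a request, weighted prompts are passed like any other prompt string (a minimal sketch; field values are illustrative):

payload = {
    "prompt": "A cat with (long whiskers: 1.3) sitting on a (red: 0.8) blanket",
    "negative_prompt": "(distorted hands: 1.5), blurry, low quality",
    "steps": 30,
    "width": 1024,
    "height": 1024,
}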
Python example for inpainting
import requests
import os
import base64
import io
import PIL.Image


def _process_test(url):
    image = PIL.Image.open("dog.jpg")
    mask = PIL.Image.open("dogmask.jpg")

    # Create a BytesIO buffer to hold the image data
    image_buffer = io.BytesIO()
    image.save(image_buffer, format='JPEG')
    image_bytes = image_buffer.getvalue()
    encoded_image = base64.b64encode(image_bytes).decode('utf-8')

    # Create a BytesIO buffer to hold the mask data
    mask_buffer = io.BytesIO()
    mask.save(mask_buffer, format='JPEG')
    mask_bytes = mask_buffer.getvalue()
    encoded_mask = base64.b64encode(mask_bytes).decode('utf-8')

    OCTOAI_TOKEN = os.environ.get("OCTOAI_TOKEN")
    payload = {
        "prompt": "Face of a yellow cat, high resolution, sitting on a park bench",
        "negative_prompt": "Blurry photo, distortion, low-res, bad quality",
        "steps": 30,
        "width": 1024,
        "height": 1024,
        "init_image": encoded_image,
        "mask_image": encoded_mask,
    }
    headers = {
        "Authorization": f"Bearer {OCTOAI_TOKEN}",
        "Content-Type": "application/json",
    }
    response = requests.post(url, headers=headers, json=payload)
    if response.status_code != 200:
        print(response.text)
        return
    img_list = response.json()["images"]
    for i, img_info in enumerate(img_list):
        img_bytes = base64.b64decode(img_info["image_b64"])
        img = PIL.Image.open(io.BytesIO(img_bytes))
        img.load()
        img.save(f"result_image{i}.jpg")


if __name__ == "__main__":
    _process_test("https://image.octoai.run/generate/sdxl")
ControlNet
- We default to the ControlNet Canny checkpoint, but you can upload other ControlNet checkpoints into the OctoAI Asset Library and then use those checkpoints at generation time via the checkpoint parameter.
- Make sure to provide your ControlNet mask, e.g. a canny mask or depth mask, in the controlnet_image parameter.
Python Example for ControlNet Canny (the default)
import base64
import io
import os

import cv2
import PIL.Image
import requests


def _process_test(endpoint_url):
    image_path = "cat.jpeg"
    img = cv2.imread(image_path)
    img = cv2.resize(img, (1024, 1024))  # Resize to a resolution supported by OctoAI SDXL
    edges = cv2.Canny(img, 100, 200)  # 100 and 200 are thresholds for determining canny edges
    height, width = edges.shape

    # Convert Canny edge map to PIL Image
    edges_image = PIL.Image.fromarray(edges)

    # Create a BytesIO buffer to hold the image data
    image_buffer = io.BytesIO()
    edges_image.save(image_buffer, format='JPEG')
    image_bytes = image_buffer.getvalue()
    encoded_image = base64.b64encode(image_bytes).decode('utf-8')

    model_request = {
        "controlnet_image": encoded_image,
        "prompt": "A photo of a cute tiger astronaut in space",
        "negative_prompt": "low quality, bad quality, sketches, unnatural",
        "steps": 20,
        "num_images": 1,
        "seed": 768072361,
        "height": height,
        "width": width,
    }
    prod_token = os.environ.get("OCTOAI_TOKEN")
    reply = requests.post(
        endpoint_url,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {prod_token}",
        },
        json=model_request,
    )
    if reply.status_code != 200:
        print(reply.text)
        exit(-1)
    img_list = reply.json()["images"]
    for i, idict in enumerate(img_list):
        img_bytes = base64.b64decode(idict["image_b64"])
        img = PIL.Image.open(io.BytesIO(img_bytes))
        img.load()
        img.save(f"result_image{i}.jpg")


if __name__ == "__main__":
    _process_test("https://image.octoai.run/generate/controlnet-sdxl")
Python Example for ControlNet Depth
First install the extra dependency for generating depth maps:
pip install controlnet-aux==0.0.7
The sketch below generates a depth map with the MiDaS annotator from controlnet-aux (the detector API may vary slightly between versions), and passes a placeholder checkpoint name standing in for a ControlNet depth checkpoint uploaded to your Asset Library:

import base64
import io
import os

import PIL.Image
import requests
from controlnet_aux import MidasDetector


def _process_test(endpoint_url):
    image_path = "cat.jpeg"
    img = PIL.Image.open(image_path).convert("RGB")

    # Generate a depth map with the MiDaS annotator from controlnet-aux
    midas = MidasDetector.from_pretrained("lllyasviel/Annotators")
    depth_image = midas(img)
    depth_image = depth_image.resize((1024, 1024))  # Resize to a resolution supported by OctoAI SDXL
    width, height = depth_image.size

    # Create a BytesIO buffer to hold the image data
    image_buffer = io.BytesIO()
    depth_image.save(image_buffer, format='JPEG')
    image_bytes = image_buffer.getvalue()
    encoded_image = base64.b64encode(image_bytes).decode('utf-8')

    model_request = {
        "controlnet_image": encoded_image,
        # The endpoint defaults to the ControlNet Canny checkpoint; point
        # "checkpoint" at a ControlNet depth checkpoint from your asset
        # library (the name below is a placeholder).
        "checkpoint": "your_controlnet_depth_checkpoint",
        "prompt": "A photo of a cute tiger astronaut in space",
        "negative_prompt": "low quality, bad quality, sketches, unnatural",
        "steps": 20,
        "num_images": 1,
        "seed": 768072361,
        "height": height,
        "width": width,
    }
    prod_token = os.environ.get("OCTOAI_TOKEN")
    reply = requests.post(
        endpoint_url,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {prod_token}",
        },
        json=model_request,
    )
    if reply.status_code != 200:
        print(reply.text)
        exit(-1)
    img_list = reply.json()["images"]
    for i, idict in enumerate(img_list):
        img_bytes = base64.b64decode(idict["image_b64"])
        img = PIL.Image.open(io.BytesIO(img_bytes))
        img.load()
        img.save(f"result_image{i}.jpg")


if __name__ == "__main__":
    _process_test("https://image.octoai.run/generate/controlnet-sdxl")