You can immediately start prototyping your app using open-source models that have been pre-optimized by OctoAI: click either (1) or (2) on the home page to access the quickstart templates. You can try these out directly from your browser without installing anything.
These templates are available for a wide range of use cases, such as image generation and text generation.
Let’s try out the Stable Diffusion template for image generation. Click the Generate button labeled (1) below.
Voila! We just generated an image!
If you would like to run an inference programmatically, scroll down to find an example curl command that works on a Mac. All of our endpoints, including template ones, speak simple JSON over HTTP.
The full API spec for each template can be found at the /docs route for the endpoint. For example, the Stable Diffusion template is hosted at https://stable-diffusion-demo-kk0powt97tmb.octoai.run/predict, so the corresponding API docs can be found at https://stable-diffusion-demo-kk0powt97tmb.octoai.run/docs. You will need to enter your auth token in the "Password" field when navigating to this link; leave the Username field empty.
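As a sketch of what such a request looks like, here is a curl command against the template endpoint above. The request body is an assumption for illustration (the `prompt` field may not match the real schema); check the endpoint's /docs route for the actual input format, and note that template endpoints are rate-limited.

```shell
# POST a JSON body to the template endpoint's /predict route.
# The "prompt" field below is a hypothetical example -- consult the
# endpoint's /docs route for the real request schema.
curl -X POST "https://stable-diffusion-demo-kk0powt97tmb.octoai.run/predict" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OCTOAI_TOKEN" \
  -d '{"prompt": "an astronaut riding a horse, photorealistic"}'
```

The `$OCTOAI_TOKEN` environment variable is assumed to hold your API token.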
Note that our template endpoints are per-IP rate-limited and meant for testing and experimentation only during the public beta. If you want to use the same APIs without a rate limit, click the Clone button (labeled 1 below) to create a copy of the endpoint that belongs to you.
Configure your name, autoscaling, and privacy settings for your endpoint, and then click Clone.
- The name of the endpoint will be part of the URL you'll later run inferences against.
- The minimum replicas defaults to 0, which means we autoscale down to 0 whenever your endpoint is not receiving requests from your users and the timeout period has passed (this keeps your costs down). Set minimum replicas to a higher number if you want to maximize uptime for your users and avoid cold starts.
- The maximum number of replicas should be set based on the maximum number of simultaneous inferences you expect in production. For example, if you expect to handle up to 250 inferences per minute, and each inference takes 1 second for your model on a GPU, then you should set maximum replicas to 5. This is because 250 inferences per minute translates to 250 / 60 = 4.167 inferences per second, and one GPU can only handle one request at a time in this case.
- The idle timeout should be set to the number of seconds you’d like our servers to wait while no inference requests are received before autoscaling down to your min replicas. We default to 3600 seconds (one hour) in order to optimize cold start for more compute-intensive AI applications. Set idle timeout to a lower number (e.g. 60 seconds) if your container is small or you want to make sure to control costs more aggressively.
- The privacy setting controls whether this endpoint requires an API token. If set to public, anyone can run inferences against this endpoint.
- Finally, click Clone to proceed.
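The replica-sizing arithmetic above can be written as a rule of thumb: maximum replicas = ceil(peak inferences per minute × seconds per inference ÷ 60). A minimal sketch in shell, using the numbers from the example:

```shell
# Estimate max replicas: ceil(rate_per_min * seconds_per_inference / 60),
# assuming each GPU serves one request at a time.
RATE_PER_MIN=250
SECONDS_PER_INFERENCE=1
# Integer ceiling division: add (divisor - 1) before dividing.
MAX_REPLICAS=$(( (RATE_PER_MIN * SECONDS_PER_INFERENCE + 59) / 60 ))
echo "$MAX_REPLICAS"   # prints 5
```

If your model takes longer per inference, or traffic is burstier than the average rate suggests, size upward accordingly.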
Now go ahead and click Clone, and you'll see the Stable Diffusion endpoint in your own account, with full control over its settings.
Below is an example curl command that works on a Mac. Note that you need to edit the command to use your own API token. Read How to create an OctoAI API token to learn how to get a token and store it in your local environment before running the command against a private endpoint.
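As a sketch, a request against your cloned endpoint looks like the following. The endpoint hostname placeholder and the `prompt` field are assumptions for illustration; substitute the URL shown on your endpoint's page and consult its /docs route for the real request schema.

```shell
# Store your API token in the environment (see: How to create an OctoAI API token).
export OCTOAI_TOKEN="<paste your API token here>"

# Run an inference against your cloned endpoint. Replace the hostname with
# your endpoint's actual URL; the JSON body below is hypothetical.
curl -X POST "https://<your-endpoint-name>.octoai.run/predict" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OCTOAI_TOKEN" \
  -d '{"prompt": "an example prompt"}'
```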