OctoAI today supports two types of endpoints:
- A set of templates that are already running, with no cold start.
- A set of templates that require users to clone them to their own account before use; these incur a cold start on first use.
You can read more about them in Quickstart Template Endpoints.
See Endpoint Inference for details on how to use the Python SDK for inference.
You can also create your own endpoints, in two ways.
If you're starting from Python code and would like help turning it into an OctoAI endpoint, see this guide for how our SDK and CLI can support you.
Or, if you already have a container or would like to build one yourself, we can run any container with an HTTP server written in any language, as long as the container is built for a GPU and comes with a declarative configuration of which port is exposed for inference.
If you already have a container in hand, check out our guide for deploying an already-prepared container to OctoAI.
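To illustrate the container requirements above, here is a minimal sketch of an HTTP inference server that could run inside such a container. The `/predict` route, the JSON request/response shape, and the toy `predict` function are hypothetical placeholders, not OctoAI's required protocol; the key points are that the server speaks HTTP and listens on a single port that your container configuration would declare as exposed.

```python
# Minimal sketch of a containerizable HTTP inference server, using only
# the Python standard library. The route, payload shape, and "model"
# below are illustrative assumptions, not an OctoAI-mandated protocol.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

PORT = 8080  # the port your container config would declare as exposed


def predict(inputs):
    # Placeholder for real model inference (e.g. a GPU-backed model call).
    return {"outputs": [x * 2 for x in inputs]}


class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps(predict(payload["inputs"])).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", PORT), InferenceHandler).serve_forever()
```

A container built from an image like this would send inference requests as `POST /predict` with a JSON body, and the only deployment-specific detail to declare is the exposed port.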