Cold Start

Quickstart templates have minimal cold start

OctoAI keeps at least one replica warm for our quickstart templates (such as Stable Diffusion and Whisper), so you're less likely to experience cold start latency. If you clone these templates with minimum replicas set to 1, you should expect the same for your cloned templates. However, if you clone these templates with minimum replicas set to 0, you should expect around 30 seconds of cold start for your cloned templates. Cold start latency occurs whenever a new replica is scaled up, whether from 0 to 1, 1 to 2, and so on.
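If you run with minimum replicas set to 0, client code may need to tolerate a request that times out or is refused while a replica starts. The following is a minimal sketch of one way to do that with exponential backoff. It is not part of any OctoAI SDK: `call_with_cold_start_retry`, its parameters, and the `send` callable are hypothetical names, and the sketch assumes your own `send` function raises `TimeoutError` or `ConnectionError` when the endpoint is not ready yet.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_cold_start_retry(
    send: Callable[[], T],
    cold_start_budget_s: float = 60.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call send() and retry with exponential backoff until it succeeds
    or the cold start budget is used up.

    send is a hypothetical, caller-supplied function that performs one
    inference request. It is assumed to raise TimeoutError or
    ConnectionError while a replica is still starting.
    """
    deadline = clock() + cold_start_budget_s
    delay = initial_delay_s
    while True:
        try:
            return send()
        except (TimeoutError, ConnectionError):
            remaining = deadline - clock()
            if remaining <= 0:
                # Budget exhausted: surface the last error to the caller.
                raise
            # Never sleep past the deadline or beyond the backoff cap.
            sleep(min(delay, max_delay_s, remaining))
            delay *= 2
```

The default budget of 60 seconds is an arbitrary choice that leaves headroom over the roughly 30 seconds mentioned above; adjust it to the cold start you actually observe. The `sleep` and `clock` parameters exist only so the helper can be exercised without real waiting.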

Cold start on Custom Containers

We are working hard to get cold start on custom containers down to about 30 seconds. Larger containers may experience longer cold starts because they require more resources and therefore more time to initialize before running inference. If cold start is currently too long for your use case, please ping us in Discord or through the chat bubble in the bottom right corner of the UI, so we can onboard you to Volumes, a feature for reducing cold start.
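Before reaching out, it can help to have a rough number for the cold start you are seeing. The sketch below estimates it by timing one request against an endpoint that has scaled to zero and subtracting the median latency of a few warm requests that follow. The function name `estimate_cold_start_s` and the `send` callable are hypothetical; the sketch assumes `send` performs one blocking inference request against your endpoint and that no replica was warm before the first call.

```python
import time
from statistics import median
from typing import Callable


def estimate_cold_start_s(
    send: Callable[[], object],
    warm_samples: int = 5,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Estimate cold start as (first request latency) minus
    (median latency of the warm requests that follow).

    send is a hypothetical, caller-supplied function that performs one
    blocking inference request.
    """
    if warm_samples < 1:
        raise ValueError("warm_samples must be at least 1")

    # First request: assumed to hit an endpoint with no warm replica.
    start = clock()
    send()
    first_latency = clock() - start

    # Follow-up requests: the replica should now be warm.
    warm_latencies = []
    for _ in range(warm_samples):
        start = clock()
        send()
        warm_latencies.append(clock() - start)

    # Clamp at zero in case the first request was already warm.
    return max(0.0, first_latency - median(warm_latencies))
```

This is only an approximation: network jitter and queueing also affect the first request, so treat the result as a ballpark figure.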