3-tier compute-based pricing, with a generous amount of free compute credits upfront

OctoAI’s pre-built Quickstart templates are free to use as you prototype your app

You can run inferences on OctoAI’s quickstart templates for free, as you prototype your app (see Welcome to the OctoAI compute service! 🐙). We offer quickstart templates for a wide range of use cases, such as image generation and text generation. These templates are rate-limited and are meant for prototyping use only, so make sure to clone them into an endpoint in your own account for production use. When cloning a template, you can customize the autoscaling and privacy settings of your endpoint.

3-tier Compute Pricing Structure

Starting June 14, 2023, we will bill for usage of 3 tiers of compute:

  1. Large: this maps to an A100 GPU with 80GB memory and is priced at $0.00145 per second (~$5.20 per hour)
  2. Medium: this maps to an A10 GPU with 24GB memory and is priced at $0.00032 per second (~$1.15 per hour)
  3. Small: this maps to a T4 GPU with 16GB memory and is priced at $0.00011 per second (~$0.40 per hour)
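
As a quick check on the hourly figures above, the per-second prices can be converted to hourly rates. This is only an illustrative sketch: the dictionary and function names are hypothetical and are not part of any OctoAI SDK or API.

```python
# Per-second prices in USD, copied from the tier list above.
PRICE_PER_SECOND = {"large": 0.00145, "medium": 0.00032, "small": 0.00011}

SECONDS_PER_HOUR = 3600


def hourly_rate(tier: str) -> float:
    """Return the cost in USD of one full hour of compute on the given tier."""
    return PRICE_PER_SECOND[tier] * SECONDS_PER_HOUR


if __name__ == "__main__":
    for tier in PRICE_PER_SECOND:
        print(f"{tier}: ${hourly_rate(tier):.3f} per hour")
```

The results come to roughly $5.22, $1.15, and $0.40 per hour, consistent with the rounded figures in the list.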

We bill by the second of compute usage, starting at the time when the endpoint is ready for inferences.

  • In other words, we bill for the total of inference duration and timeout duration.
  • We do NOT bill for the duration of cold start.
  • The time when the endpoint is ready for inferences is when the healthcheck on your endpoint starts to return a 200; or in the case where your endpoint has no healthcheck, the time when you see the "Replica is running" log line in your Events tab in the UI.
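
To make the billing rule concrete, here is a minimal sketch of how a charge could be estimated from the durations described above. It assumes you already know the cold start, inference, and timeout durations in seconds. The function and parameter names are hypothetical, and the actual invoice is computed by OctoAI, not by this code.

```python
# Per-second prices in USD, copied from the tier list above.
PRICE_PER_SECOND = {"large": 0.00145, "medium": 0.00032, "small": 0.00011}


def billed_cost(tier: str, cold_start_s: float, inference_s: float, timeout_s: float) -> float:
    """Estimate the charge in USD for one period of endpoint activity.

    cold_start_s is accepted only to make the rule explicit: it is NOT billed.
    """
    billable_seconds = inference_s + timeout_s  # cold start is excluded
    return billable_seconds * PRICE_PER_SECOND[tier]


if __name__ == "__main__":
    # Hypothetical example: 90 s cold start, 30 s of inference, 300 s of timeout.
    cost = billed_cost("medium", cold_start_s=90, inference_s=30, timeout_s=300)
    print(f"Estimated charge: ${cost:.4f}")
```

In this hypothetical example, only the 330 seconds of inference and timeout are charged. The 90 seconds of cold start do not affect the result.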

Cloned templates already have a pre-set hardware/pricing tier.

If you instead create a custom endpoint, you can choose among the three tiers.

We are giving users $11.00 worth of free compute credits upon first use of our service. That is equivalent to 2+ hours of compute on our large tier hardware, 9+ hours of compute on our medium tier hardware, or 27+ hours of compute on our small tier hardware. The credits expire in about one month and do not renew.
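
The hours quoted above follow from dividing the credit amount by each tier's hourly price. The sketch below reproduces that arithmetic. The names are hypothetical, and it assumes the full $11.00 is spent on a single tier.

```python
# Per-second prices in USD, copied from the tier list above.
PRICE_PER_SECOND = {"large": 0.00145, "medium": 0.00032, "small": 0.00011}

FREE_CREDITS_USD = 11.00


def hours_covered(credits_usd: float, tier: str) -> float:
    """Return how many hours of compute the credits cover on a single tier."""
    seconds = credits_usd / PRICE_PER_SECOND[tier]
    return seconds / 3600


if __name__ == "__main__":
    for tier in PRICE_PER_SECOND:
        print(f"{tier}: about {hours_covered(FREE_CREDITS_USD, tier):.1f} hours")
```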

When you’re about to run out of free credits, we will prompt you to add a credit card, which you can do in your Account Usage page.

  • You can always check how many credits you have remaining in the top-level banner in the UI (see screenshot below).
  • If you do not add a valid credit card before your credits run out, we will suspend your account and automatically terminate all your endpoints.

After you run out of free credits, we will start billing your credit card for compute seconds on a monthly basis.

You can view your usage to date, add a credit card, and view invoices in your Account Usage page anytime.

Contact Sales for enterprise tier pricing

For enterprise users, we offer additional features, such as:

  • Inferences run in your private environment
  • Our team of compiler experts optimizes your model to run faster and cheaper
  • Hardware is reserved for your account over a time period
  • You receive priority access to our Customer Experience and Engineering teams
  • Potential discounts for high-volume, committed spend