Added

June 12, 2023

  • Join us for the OctoAI compute service public beta launch this Wednesday, June 14th! Register here.
  • With the launch of our service, changes will be made to our billing. You can find pricing plans and hardware options here. Changes and new-user incentives taking immediate effect are noted below:
    • Tomorrow, June 13th, any existing endpoints will be set to min replicas=0 so that you are not billed for an instance unintentionally left running. Be prepared for a cold start before your first inference, and reset to min replicas=1 if you prefer to keep the instance warm.
    • Every user who logs in during public beta will receive credits for 2 free compute hrs on A100 (or 10+ hrs on A10!) to use in their first two weeks.
    • The first 500 users to create a new endpoint will receive credits for 12 free compute hrs on A100 (or 50+ hrs on A10!) to use within their first month.
  • You now have two options to integrate OctoAI endpoints into your application:
    • Our new Python client (supports synchronous inference). Read more about it here.
    • Our HTTP REST API now supports both synchronous and asynchronous calls, allowing users to request an inference without holding a connection open, poll for status, and retrieve the completed prediction data. This is most effective for longer-running requests. Read more about it here.
  • We’ve updated our Whisper model to be much faster - don't worry, the input / output schema is the same!
  • We've also added MPT 7B and Vicuña 7B as new quickstart templates as better alternatives to Dolly, which will be removed soon.
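The asynchronous REST flow described above (submit a request, poll for status, fetch the result) can be sketched as a small polling helper. This is a minimal illustration of the pattern only: the `state` values, field names, and callback shape below are assumptions for the example, not the actual OctoAI response schema.

```python
import time
from typing import Any, Callable, Dict


def poll_for_prediction(
    get_status: Callable[[], Dict[str, Any]],
    interval_s: float = 1.0,
    timeout_s: float = 300.0,
) -> Any:
    """Poll an async inference endpoint until the prediction completes.

    `get_status` stands in for an HTTP GET against the status URL returned
    when the inference request was submitted. The "pending"/"completed"/
    "failed" states and the "output" field are illustrative, not the real
    API contract.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()
        if status.get("state") == "completed":
            return status["output"]          # finished: return prediction data
        if status.get("state") == "failed":
            raise RuntimeError(f"inference failed: {status}")
        time.sleep(interval_s)               # still running: wait, then re-poll
    raise TimeoutError("prediction did not complete before the deadline")


# Example with a stubbed status source (no network needed):
responses = iter([
    {"state": "pending"},
    {"state": "completed", "output": {"text": "hello"}},
])
result = poll_for_prediction(lambda: next(responses), interval_s=0.0)
```

Because the poller only needs a "fetch current status" callable, the same loop works whether the status comes from `requests.get(...).json()` in production or a stub in tests.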