Health Check Path in Custom Containers

The health check path is the server route in your container that indicates when the server is ready to receive requests, that is, when the model has finished setting up and all the weights and assets needed for inference have been loaded.
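As an illustration of the idea, here is a minimal sketch using only Python's standard library. It assumes port 8080, a 503 status while the model is still loading, and a placeholder `load_model` function that stands in for real weight loading; these names and choices are hypothetical and are not taken from any specific container.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Set once the (hypothetical) model has finished loading; the health check reads it.
MODEL_READY = threading.Event()


def load_model() -> None:
    """Placeholder for real setup work such as loading weights from disk."""
    time.sleep(5)  # simulate slow weight loading
    MODEL_READY.set()


class Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path == "/healthcheck":
            # 200 only once the model can serve inferences; 503 (assumed) while loading.
            ready = MODEL_READY.is_set()
            status, body = (200, b"ok") if ready else (503, b"loading")
        else:
            status, body = 404, b"not found"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging from frequent health check probes


def main(port: int = 8080) -> None:
    # Load the model in the background so /healthcheck can answer 503 meanwhile.
    threading.Thread(target=load_model, daemon=True).start()
    ThreadingHTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

The container's entrypoint would call `main()`. A real server would also register its inference route on the same handler, so that the route only starts succeeding at the moment `/healthcheck` starts returning 200.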

  • We strongly recommend configuring a health check path in your container; otherwise, any inference request sent before your server is ready will fail. The only time you can skip it is when your server is ready for inferences immediately, for example because your model is tiny (this is rarely the case for AI use cases).
  • If you define a health check, your endpoint has 5 minutes from the time the image is pulled to return a 200 OK response. That response marks the endpoint as available, and the same criteria apply to additional replicas. If 3 consecutive calls to the health check path return a status other than 200 OK, the replica is restarted.
  • For an example, see the Flan T5 container in Advanced: Build a Container from Scratch in Python, which exposes its health check path at /healthcheck. After the endpoint is created, requests to https://<endpoint-name>-<account-id> should return a 200 response whenever the server is healthy and ready.
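To check readiness from the client side, for example in a deployment script, you could poll the health check until it returns 200 OK. The sketch below is an assumption-laden illustration, not the platform's own logic: `http_probe` and `wait_until_healthy` are hypothetical helpers, network errors are treated as "not ready", and the 300-second default simply mirrors the 5-minute window described above.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout_s: float = 5.0) -> int:
    """Return the HTTP status of one GET to the health check URL (0 on network error)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # non-2xx responses raise HTTPError; keep their status code
    except OSError:
        return 0  # connection refused, DNS failure, timeout, ...


def wait_until_healthy(
    probe: Callable[[], int],
    timeout_s: float = 300.0,  # assumption: mirrors the 5-minute readiness window
    interval_s: float = 5.0,
) -> bool:
    """Poll `probe` until it reports 200 OK or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe() == 200:
            return True
        if time.monotonic() + interval_s > deadline:
            return False
        time.sleep(interval_s)
```

For example, `wait_until_healthy(lambda: http_probe(health_url))`, where `health_url` is your endpoint's full health check URL. Passing the probe as a callable keeps the polling loop testable without a live endpoint.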