runtime

To perform baseline evaluations and stress tests, Efemarai needs to run your models. To do that, you provide two Python functions in your model repository: one to load the model and one to make predictions.

Example

Here is a quick example defining the runtime for an object detection model:

  runtime:
    image: pytorch/pytorch:1.9.1-cuda11.1-cudnn8-runtime
    device: gpu
    load:
      entrypoint: inference:get_model  # inference.py is in model repository
      inputs:
        - name: params_url
          value: ${model.files.params.url}
      output:
        name: model
    predict:
      entrypoint: inference:predict
      inputs:
        - name: model
          value: ${model.runtime.load.output.model}
        - name: datapoints
          value: ${datapoints}
      output:
        name: predictions
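
Both entrypoints resolve to ordinary Python functions inside the model repository. As a rough illustration, inference.py for the configuration above might look like the following; the torchvision detector, the weight-loading calls, and the datapoint format are assumptions made for the sake of the example, only the function signatures mirror the runtime spec:

  # inference.py -- illustrative sketch, not the only valid shape
  import torch
  import torchvision

  def get_model(params_url, device):
      # params_url is interpolated from ${model.files.params.url};
      # device is always passed in by the runtime (see Properties below).
      torch_device = torch.device("cuda" if device == "gpu" else "cpu")
      model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False)
      state_dict = torch.hub.load_state_dict_from_url(params_url, map_location=torch_device)
      model.load_state_dict(state_dict)
      return model.to(torch_device).eval()

  def predict(model, datapoints, device):
      # Assumes each datapoint is (convertible to) a CHW image tensor.
      torch_device = torch.device("cuda" if device == "gpu" else "cpu")
      with torch.no_grad():
          inputs = [torch.as_tensor(dp).to(torch_device) for dp in datapoints]
          return model(inputs)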

Properties

  • image: Docker image on top of which the runtime environment for your model will be built

  • device: the device to run inference on. The device is always passed as an input argument to both load and predict. Supported devices:

    • cpu - this is the default device

    • gpu - a GPU is provisioned to the runtime environment, and it is up to the user to move the loaded model and input/output data to/from the specified device (see the sketch after this list).

  • batch (optional): batching-related information. If not provided, batching is not performed.

  • load: specifies the user function that loads the model

  • predict: specifies the user function that makes predictions with the model
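
For the gpu device in particular, Efemarai only provisions the hardware; the predict function itself has to push inputs onto the GPU and pull predictions back off it before returning them. Here is a minimal sketch of that to/from movement, assuming PyTorch, that the device argument arrives as the configured string, and that the model returns torchvision-style dictionaries of tensors:

  import torch

  def predict(model, datapoints, device):
      # "gpu" in the runtime config maps to CUDA in PyTorch; this string
      # mapping is an assumption about what the runtime passes in.
      torch_device = torch.device("cuda" if device == "gpu" else "cpu")
      with torch.no_grad():
          # To the device: inputs must live where the model lives.
          inputs = [torch.as_tensor(dp).to(torch_device) for dp in datapoints]
          outputs = model(inputs)
      # From the device: move predictions back to the CPU before returning.
      return [{k: v.cpu() for k, v in out.items()} for out in outputs]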