# runtime
To run baseline evaluations and stress tests, Efemarai needs to execute your models. To make this possible, you must provide two Python functions in your model repository: one that loads the model and one that makes predictions.
## Example
Here is a quick example defining the runtime for an object detection model:
```yaml
runtime:
  image: pytorch/pytorch:1.9.1-cuda11.1-cudnn8-runtime
  device: gpu
  load:
    entrypoint: inference:get_model # inference.py is in the model repository
    inputs:
      - name: params_url
        value: ${model.files.params.url}
    output:
      name: model
  predict:
    entrypoint: inference:predict
    inputs:
      - name: model
        value: ${model.runtime.load.output.model}
      - name: datapoints
        value: ${datapoints}
    output:
      name: predictions
```
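To make the `entrypoint` values concrete, here is a minimal sketch of what an `inference.py` could look like. The function names `get_model` and `predict` match the entrypoints in the example config; the model itself is a hypothetical stand-in (a plain callable) rather than a real network, so the sketch stays framework-agnostic.

```python
# Hypothetical inference.py for the runtime config above.
# `get_model` and `predict` match the `load` and `predict` entrypoints;
# the model here is a stand-in callable, not an actual PyTorch network.

def get_model(params_url, device):
    """Load the model from `params_url` for the given device.

    In a real repository this would download the weights and build a
    network; here it returns a trivial callable for illustration.
    """
    def model(datapoint):
        # Dummy "inference" result: echo the input with a fixed score.
        return {"input": datapoint, "score": 1.0, "device": device}
    return model


def predict(model, datapoints, device):
    """Run the loaded model over a list of datapoints."""
    return [model(dp) for dp in datapoints]
```

Note that both functions receive `device` as an argument, matching the behaviour described under the `device` property below: Efemarai always passes it to `load` and `predict`.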
## Properties
`image`
: Docker image on top of which the runtime environment for your model will be built.

`device`
: the device to run inference on. The `device` is always passed as an input argument to both `load` and `predict`. Supported devices:

    - `cpu` - this is the default device
    - `gpu` - a GPU is provisioned to the runtime environment, and it is up to the user to move the loaded model and input/output data to/from the specified device.

`batch`
: (optional) batching related information. If not provided, batching is not performed.

`load`
: specifies the user function that loads the model.

`predict`
: specifies the user function that makes predictions with the model.
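Since the `gpu` device leaves data movement to the user, here is a minimal sketch of how `get_model` and `predict` could honour the `device` argument with PyTorch. The mapping of `"gpu"` to `"cuda"` and the `torch.nn.Linear` stand-in (in place of loading real weights from `params_url`) are assumptions for illustration, not part of the Efemarai API.

```python
import torch

def get_model(params_url, device):
    # Assumed mapping from the runtime's device name to a torch device:
    # "gpu" -> "cuda", anything else -> "cpu".
    torch_device = torch.device("cuda" if device == "gpu" else "cpu")
    # Stand-in for loading real weights from `params_url`.
    model = torch.nn.Linear(4, 2)
    return model.to(torch_device).eval()


def predict(model, datapoints, device):
    torch_device = torch.device("cuda" if device == "gpu" else "cpu")
    # Move the inputs onto the same device as the model.
    batch = torch.as_tensor(datapoints, dtype=torch.float32).to(torch_device)
    with torch.no_grad():
        out = model(batch)
    # Move the results back to the CPU before returning them.
    return out.cpu().tolist()
```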