In this tutorial we will show you how to improve your model robustness using the Efemarai platform. All you need is an installed version of the efemarai package and an uploaded project with a domain, a model, and a dataset!
If you have missed the installation steps, refer to our previous tutorial!
Step 1: Understanding model robustness
When we talk about improving model robustness, we mean minimizing the model’s sensitivity to external factors. Put differently, our models should be able to work in a variety of environments and conditions.
One way to specify those conditions is to build a domain. Within the domain we can define and sequence the transformations we expect the model to handle. Using the sliders we can change the severity of each transformation, which in turn affects how the model performs at that level.
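To make the link between transformation severity and model performance concrete, here is a minimal sketch in plain Python. It does not use the Efemarai platform or SDK: the shift transformation, the toy model, and the sample values are all hypothetical and only illustrate how accuracy can be measured at several severity levels.

```python
from typing import Dict, List, Tuple

Sample = Tuple[float, int]  # (feature value, ground-truth label)


def shift(value: float, severity: float) -> float:
    """Hypothetical transformation: add an offset equal to the severity."""
    return value + severity


def model(value: float) -> int:
    """Toy classifier (an assumption): predicts 1 when the value exceeds 0.5."""
    return 1 if value > 0.5 else 0


def accuracy_at(samples: List[Sample], severity: float) -> float:
    """Accuracy of the toy model on samples transformed at one severity level."""
    correct = sum(model(shift(value, severity)) == label for value, label in samples)
    return correct / len(samples)


# Illustrative data only.
samples: List[Sample] = [(0.1, 0), (0.3, 0), (0.45, 0), (0.6, 1), (0.8, 1)]

# Sweep the "slider" over a few severity levels.
sweep: Dict[float, float] = {s: accuracy_at(samples, s) for s in (0.0, 0.1, 0.3, 0.5)}

for severity, accuracy in sweep.items():
    print(f"severity={severity:.1f} accuracy={accuracy:.2f}")
```

In this toy setup, accuracy drops as the severity grows, which is the kind of sensitivity a domain is meant to expose.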
After building the right domain we can do several things:
Evaluate initial robustness of your model - Maybe your model is already great!
Identify under-performing regions - You could collect additional data following those specifications.
Keep track of the improvements that are made to the model at every update cycle.
Use the data that makes your model misbehave to improve its robustness straightaway.
In this tutorial, we’ll look at the last part.
Step 2: Kickstarting the initial stress test
First, let’s run a stress test (if unsure how to do this, check out the Getting started tutorial). Click on Stress Test, choose the optimal parameters from the drop-down menu on the Test button, and start the test.
Alternatively, from the CLI you can run efemarai test run test.yaml, with test.yaml holding the specifications for the test. And for the SDK call you can look into
You should now see a new test run appear, with the State of the run going through:
Starting - We are kicking off things on our side to make sure we can perform inference on your model.
Evaluating - We are evaluating the model against the dataset you have chosen. We do this to establish a baseline behaviour.
Testing - We are generating rare samples and evaluating the model’s performance on them.
GeneratingReport - Crunching all of the numbers.
Finished - The stress test has ended successfully.
Failed - Something has gone wrong. Hover to see the error message.
Stopped - The stress test was manually stopped.
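If you track runs from your own scripts, the states above can be modelled as a small enumeration. The sketch below is an assumption for illustration, not part of the Efemarai SDK: RunState, TERMINAL_STATES, is_terminal, and first_terminal_state are hypothetical names, and only the state labels come from the list above.

```python
from enum import Enum
from typing import Iterable


class RunState(Enum):
    """State labels as listed in the tutorial."""
    STARTING = "Starting"
    EVALUATING = "Evaluating"
    TESTING = "Testing"
    GENERATING_REPORT = "GeneratingReport"
    FINISHED = "Finished"
    FAILED = "Failed"
    STOPPED = "Stopped"


# States after which the run no longer progresses.
TERMINAL_STATES = {RunState.FINISHED, RunState.FAILED, RunState.STOPPED}


def is_terminal(label: str) -> bool:
    """Return True if the given state label ends the run."""
    return RunState(label) in TERMINAL_STATES


def first_terminal_state(labels: Iterable[str]) -> RunState:
    """Walk through observed state labels and return the first terminal one."""
    for label in labels:
        state = RunState(label)
        if state in TERMINAL_STATES:
            return state
    raise RuntimeError("The run has not reached a terminal state yet")


observed = ["Starting", "Evaluating", "Testing", "GeneratingReport", "Finished"]
print(first_terminal_state(observed).value)
```

How the state labels are obtained (for example, by polling) is left out on purpose, since it depends on the interface you use.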
Now, let’s wait for the stress test to finish - you will see a green label
Step 3: Download generated dataset
If you navigate to the Download data button on the right side, you will see a pop-up window with several options:
Dataset Format - the format in which to export the dataset.
Minimum Score - the failure score threshold above which samples are downloaded. If you want to download only the more critical samples, you can specify a higher value in the range [-1, 1]. Higher values mean the model performs worse on these examples (we have seen that those contribute more to a new model).
Include original dataset - whether to include the original dataset or not.
After that, click on Export. This will start the download of your new dataset!
So let’s now download the original dataset with all of the samples above
that have a negative impact on the model.
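The effect of the Minimum Score option can be pictured as a simple filter. The sketch below is plain Python under stated assumptions: GeneratedSample and select_samples are hypothetical names, the scores are made up, and the strict "greater than" comparison is our reading of "above"; the platform’s exact behaviour at the threshold may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GeneratedSample:
    name: str
    failure_score: float  # assumed to lie in [-1, 1]; higher means worse model performance


def select_samples(samples: List[GeneratedSample], minimum_score: float) -> List[GeneratedSample]:
    """Keep only the samples whose failure score is above the threshold."""
    if not -1.0 <= minimum_score <= 1.0:
        raise ValueError("minimum_score must be in the range [-1, 1]")
    return [s for s in samples if s.failure_score > minimum_score]


# Illustrative data only.
generated = [
    GeneratedSample("sample_a", -0.4),
    GeneratedSample("sample_b", 0.1),
    GeneratedSample("sample_c", 0.7),
]

# A higher threshold keeps fewer, more critical samples.
print([s.name for s in select_samples(generated, 0.0)])
print([s.name for s in select_samples(generated, 0.5)])
```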
Step 4: Retrain your model
After fetching the enriched dataset, let’s retrain the model. Kick off a standard model retraining with this new dataset. Then update the efemarai.yaml configuration to have the new model name and weights path and re-upload it (e.g. efemarai model create efemarai.yaml), or alternatively, use the SDK to upload the model directly from your code with
Voilà! Your new model is uploaded.
Feel free to edit the details of the model:
Name - [required] the name of your new model (must be the same as the name in
Description - [optional] description of the model
Version - [optional] version of the model
Repository URL - [required] the link to the model repo in GitHub
Repository Branch - [optional] the name of the branch in the repo
Repository Access Token - [required] an access token for the repo
Step 5: Comparing robustness with a new model
Once you have uploaded the new model, which has been trained on the new data, let’s see how it performs!
We need to start another stress test and compare the results to the original. Go to Stress Tests and choose the new model and the same parameters as before. Once the run has finished, open the stress test by clicking on its name. This will open a window with performance metrics and graphs of your model. You can click on each of the various sections:
Failure Score - How much does the domain impact the model performance?
Vulnerability Score - How does the score of a particular sample change after it is altered by the domain?
mAP - Includes standard COCO mAP and mAR metrics.
Confusion Matrix - A confusion matrix showing how the model mixes up different classes.
Attributes (only in Stress Tests) - Violin plots showing model susceptibility and model aggravation depending on the values of each axis of the domain.
Classification Metrics - Shows standard classification metrics like accuracy, F1 score, precision, recall.
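As a reminder of how the standard classification metrics relate to a confusion matrix, here is a minimal sketch using the textbook definitions. It is not the platform’s implementation: classification_metrics is a hypothetical helper, the numbers are made up, and we assume rows are true classes and columns are predicted classes.

```python
from typing import Dict, List


def classification_metrics(confusion: List[List[int]]) -> Dict[str, object]:
    """Compute accuracy and per-class precision, recall, and F1.

    Assumes confusion[true_class][predicted_class] holds sample counts.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[i][i] for i in range(n)) / total

    per_class = []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n)) - tp  # predicted c, but another class
        fn = sum(confusion[c]) - tp                        # class c, but predicted otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class.append({"precision": precision, "recall": recall, "f1": f1})

    return {"accuracy": accuracy, "per_class": per_class}


# Illustrative two-class matrix.
metrics = classification_metrics([[8, 2], [1, 9]])
print(metrics["accuracy"])
print(metrics["per_class"][0])
```

Running the same computation on the confusion matrices of the original and the retrained model is one simple way to sanity-check the numbers shown in the comparison view.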
You can analyze the results for each model from this window. You can also compare several stress tests and baseline runs at once by selecting them from the drop-down menu to the left of the
You can click on each sample and investigate individually what is contributing to that drop in performance through our interactive tool.