Quick Start
Vocabulary
Before diving into the details of the API, let's first define some vocabulary that we will use consistently throughout this documentation:

- by framework: we refer to the library that was used to train a neural network. Currently, only Caffe and TensorFlow are supported.
- by neural network or network: we refer to a trained deep convolutional neural network together with all the preprocessing information needed to perform inference (i.e. computation) on any input image. At this point, there is no semantics attached to the output of the network; for that, you need to couple it with a recognition model.
- by recognition model or model: we refer to the information needed to interpret the output of the neural network: its output labels, the threshold at which a detection is considered valid, the NMS threshold, etc. A recognition model is made of a specification, which defines the output of the model, and a version, which implements that specification. A specification can have multiple versions implementing it. Specifications can currently describe classification, tagging and detection models:
  - by classification: a model that is able to recognise the main content of the image among a set of N possible exclusive labels.
  - by tagging (also referred to as multi-label classification): a variant of classification where multiple labels, also called tags, can be assigned to the same image.
  - by detection: a model that is able to predict the position of multiple object instances in an image, given a set of N possible object labels. Each object instance is localised by a bounding box, **often shortened to bbox** below, which is a rectangle that delimits the predicted extent of the object in the image.
Test pre-trained models
When following the link to the API below, you will be asked to log in. Simply use the same email address and password as for your Deepomatic Studio account.
List public models
The first thing you may want to do is to try our pre-trained demo image recognition models. There are currently six of them:
- imagenet-inception-v1: A generalist content classifier trained on ImageNet with 1000 output classes.
- real-estate-v2: A real estate tagging model that automatically annotates images from a property ad with the room type, the context of the photo and some typical objects appearing in the photo.
- fashion-v4: A detector that is able to localise fashion items in images.
- furniture-v1: A detector that is able to localise furniture in images.
To get the list of public recognition models, run the following script:
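For instance, in Python with the `requests` library (a minimal sketch: the base URL and the `X-APP-ID` / `X-API-KEY` headers are assumptions, replace them with the values from the API reference and your account):

```python
import requests

# Assumed base URL and authentication headers; replace with your own values.
API_URL = "https://api.deepomatic.com/v0.7"
HEADERS = {"X-APP-ID": "your_app_id", "X-API-KEY": "your_api_key"}

# List the publicly available recognition models.
response = requests.get(API_URL + "/recognition/public", headers=HEADERS)
response.raise_for_status()
print(response.json())
```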
List model labels
To access the list of labels of a classifier, visit the API endpoint below, replacing `{:model_name}` with one of the model names above.
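Assuming the v0.7 public recognition route (please check the API reference for the exact URL), the endpoint looks like:

```
https://api.deepomatic.com/v0.7/recognition/public/{:model_name}
```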
Please refer to the inference section for a complete description of the returned data.
List model specifications
To access the specifications of a given model, including its output labels, run the following script:
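Here is a minimal Python sketch, reusing the assumed base URL and headers from above; the model name is one of the public models listed earlier:

```python
import requests

API_URL = "https://api.deepomatic.com/v0.7"
HEADERS = {"X-APP-ID": "your_app_id", "X-API-KEY": "your_api_key"}

model_name = "fashion-v4"  # any of the public model names above

# Retrieve the recognition specification of the model; the returned JSON
# describes its outputs, including the output labels.
response = requests.get(API_URL + "/recognition/public/" + model_name, headers=HEADERS)
response.raise_for_status()
print(response.json())
```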
Test a model
You can run a recognition query on a test image from a URL, a file path, binary data, or base64-encoded data. As the API is asynchronous, the inference endpoint returns a task ID. If you are trying the shell example, you might have to wait a second for the task to complete before running the second curl command.
You can try your first recognition query from a URL by running:
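Below is a minimal Python sketch of such a query. The endpoint path, the request body and the task-polling route are assumptions consistent with the asynchronous behaviour described above; please check the API reference for the exact schema:

```python
import time
import requests

API_URL = "https://api.deepomatic.com/v0.7"
HEADERS = {"X-APP-ID": "your_app_id", "X-API-KEY": "your_api_key"}

model_name = "imagenet-inception-v1"
image_url = "https://example.com/some_image.jpg"  # hypothetical test image

# Submit the inference request; the API is asynchronous and returns a task ID.
# The body schema and response keys below are assumptions.
response = requests.post(
    API_URL + "/recognition/public/" + model_name + "/inference",
    headers=HEADERS,
    json={"inputs": [{"image": {"source": image_url}}], "show_discarded": False},
)
response.raise_for_status()
task_id = response.json()["task_id"]

# Poll the task until it completes, then print its result.
while True:
    task = requests.get(API_URL + "/tasks/{}".format(task_id), headers=HEADERS).json()["task"]
    if task["status"] != "pending":  # assumed status value
        break
    time.sleep(1)
print(task["data"])
```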
The result of this command is a JSON dictionary `result` with one `outputs` field. This field has a single element, as our public networks only have one interesting output tensor, of type `labels`. For most networks you will want to look at the value of `result['outputs'][0]['labels']['predicted']`, which is a list of objects with the following fields:
- `label_name`: the name of the detected object.
- `label_id`: the numeric ID of the label corresponding to `label_name`.
- `roi`: an object containing a bounding box `bbox` localising the position of the object in the image. The coordinates are normalised and the origin is the top-left corner. Please refer to the documentation for a description of `region_id`.
- `score`: the "confidence" score output by the softmax layer.
- `threshold`: the threshold above which the confidence score is considered high enough to produce an output.

If you set `show_discarded` to `True` in the query, you will also get in `result['outputs'][0]['labels']['discarded']` a list of object candidates that did not pass the threshold.
Below is a typical output:
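The structure follows the fields described above; the values are purely illustrative and do not come from a real API call:

```python
# Hypothetical result for a detection model; all values are made up.
result = {
    "outputs": [{
        "labels": {
            "predicted": [{
                "label_name": "sweater",
                "label_id": 3,
                "roi": {
                    "region_id": 1,
                    "bbox": {"xmin": 0.12, "ymin": 0.34, "xmax": 0.56, "ymax": 0.78},
                },
                "score": 0.93,
                "threshold": 0.8,
            }],
            "discarded": [],
        }
    }]
}
```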
Pre-processing examples
Please refer to the documentation for an example of how to upload a network. This operation involves defining how input images should be preprocessed via the `preprocessing` field. We give some examples below:
Caffe classification
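As a rough, hypothetical sketch only (every field name and value below is an assumption about the `preprocessing` schema for a typical BGR, 224x224 Caffe classifier; please check the API reference for the exact format):

```python
# Hypothetical preprocessing for a Caffe classifier; all field names and
# values are assumptions to be checked against the API reference.
preprocessing = {
    "inputs": [{
        "tensor_name": "data",
        "image": {
            "dimension_order": "NCHW",        # Caffe uses channel-first tensors
            "color_channels": "BGR",          # Caffe models are usually trained on BGR images
            "target_size": "224x224",         # resize the input to the network's expected size
            "resize_type": "SQUASH",
            "mean_file": "mean.binaryproto",  # mean subtracted from the input
            "pixel_scaling": 255.0,
            "data_type": "float32",
        },
    }],
    "batched_output": True,
}
```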
Caffe faster-RCNN
TensorFlow inception v3
The mean file `unitary_mean.npy` can be built with:
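For instance, with numpy (a sketch assuming a per-channel mean of 1.0 that broadcasts over the input; adapt the shape if your network expects a full-size mean image):

```python
import numpy as np

# Build a "unitary" mean: a value of 1.0 for each of the 3 colour channels.
# Shape (1, 1, 3) broadcasts over any image size; adjust if a full-size mean is required.
mean = np.ones((1, 1, 3), dtype=np.float32)
np.save("unitary_mean.npy", mean)
```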
TensorFlow detection
Specification examples
Please refer to the documentation for an example of how to create a recognition specification. This operation involves defining the `outputs` of your algorithm. We give some examples of this field below:
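As a hypothetical sketch, here is what the `outputs` field could look like for an exclusive two-class classifier (field names are assumptions; please check the API reference):

```python
# Hypothetical "outputs" field for an exclusive two-class classifier.
outputs = [{
    "labels": {
        "roi": "NONE",       # no localisation: plain classification
        "exclusive": True,   # exactly one label per image
        "labels": [
            {"id": 0, "name": "good"},
            {"id": 1, "name": "bad"},
        ],
    },
}]
```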
Post-processing examples
Please refer to the documentation for an example of how to create a recognition version. This operation involves the `post_processings` field, which defines how the output of the network should be handled.
In the post-processings proposed below, we omit the `thresholds` field on purpose: it will be set by default. The default values are:
- for classification: 0.025, with `exclusive == True` and `roi == "NONE"`.
- for tagging: 0.5, with `exclusive == False` and `roi == "NONE"`.
- for detection: 0.8, with `roi == "BBOX"`.
Classification
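A hypothetical sketch of the `post_processings` field for a classification model (field names, including the tensor name, are assumptions; `thresholds` is omitted so that the default above applies):

```python
# Hypothetical post-processing for a classification output; "thresholds" is
# omitted on purpose so the default value (0.025) applies.
post_processings = [{
    "classification": {
        "output_tensor": "prob",  # hypothetical name of the network's softmax output tensor
    },
}]
```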
Anchored detection
Direct output detection
Yolo detection