Quick Start

Vocabulary

Before diving into the details of the API, let's first define some vocabulary that we will use consistently throughout this documentation:

  • by framework: we refer to the library that was used to train a neural network. Currently only Caffe and TensorFlow are supported.

  • by neural network or network: we refer to a trained deep convolutional neural network and all the necessary preprocessing information to be able to perform inference (i.e. computation) on any input image. At this point, there are no semantics attached to the output of the network. For that, you need to couple it with a recognition model.

  • by recognition model or model: we refer to the necessary information to interpret the output of the neural network. It might be its output labels, the threshold at which it makes sense to consider a detection valid, NMS threshold, etc. A recognition model is made of a specification part that properly defines the output of the model and a version that implements this specification. A specification can have multiple versions that implement it. Specifications can currently describe classification, tagging and detection models:

  • by classification: a model that is able to recognise the main content of the image among a set of N possible exclusive labels.

  • by tagging (also referred to as multi-label classification): a variant of classification where multiple labels, also called tags, can be assigned to the same image.

  • by detection: a model that is able to predict the position of multiple instances of objects in an image, given a set of N possible object labels. Each object instance is localised thanks to a bounding box, often shortened to bbox below, which is a rectangle that delimits the predicted extent of the object in the image.

Test pre-trained models

When following the link to the API below, you will be asked to log in. Simply use the same email address and password as for your Deepomatic Studio account.

List public models

The first thing you may want to do is to try our pre-trained demo image recognition models. They currently include:

  • imagenet-inception-v1: A generalist content classifier trained on ImageNet with 1000 output classes.

  • real-estate-v2: A real estate tagging model that automatically annotates images coming from a property ad with the room type, the context of the photo and some typical objects appearing in the photo.

  • fashion-v4: A detector that is able to localise fashion items in images.

  • furniture-v1: A detector that is able to localise furniture in images.

To get the list of public recognition models, run the following script:

curl https://api.deepomatic.com/v0.7/recognition/public \
-H "X-API-KEY: ${DEEPOMATIC_API_KEY}"

List model labels

To access the list of labels for a given model, visit the API endpoint below, replacing {:model_name} with one of the model names above.

Labels API endpoint
# Generic label endpoint
https://api.deepomatic.com/v0.7/recognition/public/{:model_name}

# Example with fashion-v4
https://api.deepomatic.com/v0.7/recognition/public/fashion-v4

Please refer to the inference section for a complete description of the returned data.

List model specifications

To access the specifications of a given model, including its output labels, run the following script:

MODEL_NAME="fashion-v4"
curl https://api.deepomatic.com/v0.7/recognition/public/${MODEL_NAME} \
-H "X-API-KEY: ${DEEPOMATIC_API_KEY}

Test a model

You can run a recognition query on a test image from a URL, a file path, binary data, or base64-encoded data. As the API is asynchronous, the inference endpoint returns a task ID. If you are trying the shell example, you might have to wait a second before running the second curl command so that the task has time to complete.

You can try your first recognition query from a URL by running:

MODEL_NAME="fashion-v4"
TASK=`curl https://api.deepomatic.com/v0.7/recognition/public/${MODEL_NAME}/inference \
-H "X-API-KEY: ${DEEPOMATIC_API_KEY}" \
-d '{"inputs": [{"image": {"source": "https://static.deepomatic.com/resources/demos/api-clients/dog2.jpg"}}], "show_discarded": false}' \
-H "Content-Type: application/json"`

# The curl result will return a task ID that we use to actually get the result
echo ${TASK}
TASK=$(echo ${TASK} | sed "s/[^0-9]*\([0-9]*\)[^0-9]*/\1/")
sleep 1
curl https://api.deepomatic.com/v0.7/tasks/${TASK} \
-H "X-API-KEY: ${DEEPOMATIC_API_KEY}"

The result of this command is a JSON dictionary result with one outputs field. This field has a single element, as our public networks only have one interesting output tensor, of type labels. For most networks you will want to look at the value of result['outputs'][0]['labels']['predicted'], which is a list of objects with the following fields:

  • label_name: the name of the detected object.

  • label_id: the numeric ID of the label corresponding to label_name.

  • roi: an object containing a bounding box bbox localising the position of the object in the image. The coordinates are normalised and the origin is in the top-left corner. Please refer to the documentation for a description of region_id.

  • score: the "confidence" score output of the softmax layer.

  • threshold: the threshold above which the confidence score is considered high enough to produce an output. If you set show_discarded to true in the query, you will also get in result['outputs'][0]['labels']['discarded'] a list of object candidates that did not pass the threshold.

Below is a typical output:

JSON
{
    "outputs": [{
        "labels": {
            "predicted": [{
                "label_name": "sunglasses",
                "label_id": 9,
                "roi": {
                    "region_id": 1,
                    "bbox": {
                        "xmin": 0.312604159,
                        "ymin": 0.366485775,
                        "ymax": 0.5318923,
                        "xmax": 0.666821837
                    }
                },
                "score": 0.990304172,
                "threshold": 0.347
            }],
            "discarded": []
        }
    }]
}
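
As an illustration, here is a small sketch that iterates over the predicted objects of such a result and converts the normalised bounding boxes back to pixel coordinates. The image size used below is hypothetical, and result is simply the parsed JSON dictionary shown above:

Python
# `result` is the parsed JSON dictionary from the example above
result = {
    "outputs": [{
        "labels": {
            "predicted": [{
                "label_name": "sunglasses",
                "label_id": 9,
                "roi": {
                    "region_id": 1,
                    "bbox": {"xmin": 0.312604159, "ymin": 0.366485775,
                             "ymax": 0.5318923, "xmax": 0.666821837},
                },
                "score": 0.990304172,
                "threshold": 0.347,
            }],
            "discarded": [],
        }
    }]
}

image_width, image_height = 800, 600  # hypothetical image size in pixels

for obj in result["outputs"][0]["labels"]["predicted"]:
    bbox = obj["roi"]["bbox"]
    # Coordinates are normalised with the origin in the top-left corner,
    # so multiply by the image size to recover pixel coordinates.
    x0, y0 = bbox["xmin"] * image_width, bbox["ymin"] * image_height
    x1, y1 = bbox["xmax"] * image_width, bbox["ymax"] * image_height
    print(f"{obj['label_name']} (score {obj['score']:.2f}): "
          f"({x0:.0f}, {y0:.0f}) -> ({x1:.0f}, {y1:.0f})")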

Pre-processing examples

Please refer to the documentation for an example of how to upload a network. This operation involves defining how input images should be preprocessed via the preprocessing field. We give some examples below:

Caffe classification

JSON
{
    "inputs": [
        {
            "tensor_name": "data",
            "image": {
                "resize_type": "SQUASH",
                "data_type": "FLOAT32",
                "dimension_order": "NCHW",
                "pixel_scaling": 255.0,
                "mean_file": "imagenet_mean.binaryproto",
                "target_size": "224x224",
                "color_channels": "BGR"
            }
        }
    ],
    "batched_output": true
}

Caffe faster-RCNN

JSON
{
    "inputs": [
        {
            "tensor_name": "data",
            "image": {
                "resize_type": "NETWORK",
                "data_type": "FLOAT32",
                "dimension_order": "NCHW",
                "pixel_scaling": 255.0,
                "mean_file": "imagenet_mean.binaryproto",
                "target_size": "800",
                "color_channels": "BGR"
            }
        },
        {
            "tensor_name": "im_info",
            "constant": {
                "shape": [
                    3
                ],
                "data": [
                    "data.1",
                    "data.2",
                    1.0
                ]
            }
        }
    ],
    "batched_output": false
}

TensorFlow inception v3

JSON
{
    "inputs": [
        {
            "tensor_name": "map/TensorArrayStack/TensorArrayGatherV3:0",
            "image": {
                "resize_type": "CROP",
                "data_type": "FLOAT32",
                "dimension_order": "NHWC",
                "pixel_scaling": 2.0,
                "mean_file": "unitary_mean.npy",
                "target_size": "299x299",
                "color_channels": "BGR"
            }
        }
    ],
    "batched_output": true
}

The mean file unitary_mean.npy (an all-ones mean which, together with the pixel_scaling of 2.0 above, maps pixel values to the [-1, 1] range) can be built with:

Python
import numpy as np
mean = np.ones((1, 1, 3))  # HWC setup with H = 1, W = 1 and C = 3

with open('unitary_mean.npy', 'wb') as f:
    np.save(f, mean, allow_pickle=False)

TensorFlow detection

JSON
{
    "inputs": [
        {
            "tensor_name": "image_tensor:0",
            "image": {
                "resize_type": "NETWORK",
                "data_type": "UINT8",
                "dimension_order": "NHWC",
                "pixel_scaling": 255.0,
                "mean_file": "",
                "target_size": "500",
                "color_channels": "RGB"
            }
        }
    ],
    "batched_output": true
}

Specification examples

Please refer to the documentation for an example of how to create a recognition specification. This operation involves defining the outputs of your algorithm. We give some examples of this field below:

Python
def generate_outputs(labels, algo):
    return [{
        "labels": {
            "roi": "BBOX" if algo == "detection" else "NONE",
            "exclusive": algo != "tagging",
            "labels": [{"id": i, "name": l} for (i, l) in enumerate(labels)]
        }
    }]

# This generates `outputs` for classification (exclusive labels, softmax output)
outputs = generate_outputs(['hot-dog', 'not hot-dog'], 'classification')

# This generates `outputs` for tagging (non-exclusive labels, sigmoid output)
outputs = generate_outputs(['is_reptile', 'is_lezard'], 'tagging')

# This generates `outputs` for detection (exclusive labels)
outputs = generate_outputs(['car'], 'detection')
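
For instance, the detection call above evaluates to the following outputs value, derived directly from the function body:

Python
# generate_outputs(['car'], 'detection') from the snippet above returns:
[{
    "labels": {
        "roi": "BBOX",        # detection models localise objects with bounding boxes
        "exclusive": True,    # the labels are exclusive (not a tagging model)
        "labels": [{"id": 0, "name": "car"}]
    }
}]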

Post-processing examples

Please refer to the documentation for an example of how to create a recognition version. This operation involves the post_processings field, which defines how the output of the network should be handled.

In the post-processings proposed below, we omit the thresholds field on purpose: default values will be used. The defaults are:

  • for classification: 0.025, with exclusive == True and roi == "NONE".

  • for tagging: 0.5, with exclusive == False and roi == "NONE".

  • for detection: 0.8, with roi == "BBOX".

Classification

JSON
{
    "classification": {
        "output_tensor": "inception_v3/logits/predictions"
    }
}

Anchored detection

JSON
{
    "detection": {
        "anchored_output": {
            "anchors_tensor": "rois",
            "scores_tensor": "cls_prob",
            "offsets_tensor": "bbox_pred"
        },
        "discard_threshold": 0.025,
        "nms_threshold": 0.3,
        "normalize_wrt_tensor": "im_info"
    }
}

Direct output detection

JSON
{
    "detection": {
        "direct_output": {
            "boxes_tensor": "detection_boxes:0",
            "scores_tensor": "detection_scores:0",
            "classes_tensor": "detection_classes:0"
        },
        "discard_threshold": 0.025,
        "nms_threshold": 0.3,
        "normalize_wrt_tensor": ""
    }
}

Yolo detection

JSON
{
    "detection": {
        "yolo_output": {
            "output_tensor": "import/output:0",
            "anchors": [1.3221, 1.73145, 3.19275, 4.00944, 5.05587, 8.09892, 9.47112, 4.84053, 11.2364, 10.0071]
        },
        "discard_threshold": 0.025,
        "nms_threshold": 0.3,
        "normalize_wrt_tensor": ""
    }
}
