Dataset options

Class Balancing

This option helps balance the representation of underrepresented concepts in your dataset.

Each region in a dataset can have:

  • One label in classification and detection tasks

  • Multiple labels in tagging tasks

The goal of class balancing is to achieve a more uniform distribution of labels across all regions. The algorithm follows a loss-based approach to do this:

  1. It first assesses the initial label distribution (e.g., {'cat': 54, 'dog': 13, 'horse': 79}).

  2. It calculates an entropy value to measure how uniform the distribution is.

  3. Each data point is assigned a score based on its individual loss, which reflects how prevalent its associated labels are in the dataset.

  4. The algorithm ranks samples by prevalence and selects the least represented ones to add to the dataset. It then updates the loss values accordingly.

  5. This process repeats until adding more samples no longer improves entropy, meaning the dataset has reached its best possible balance.

To ensure efficient balancing, the algorithm follows these rules:

  • You can define a maximum expansion ratio, which limits how much larger the new dataset can be compared to the original.

  • To avoid overfitting, the same samples are not reused repeatedly.

  • A perfectly balanced dataset is not always achievable. For example, if a label appears in only two samples ({A -> 55, B -> 41, C -> 2}), it cannot be duplicated excessively to match other labels. Balancing stops when no further improvements in entropy can be made.
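As a concrete illustration, here is a minimal Python sketch of this loss-based balancing loop. It is not Deepomatic's actual implementation (the platform applies balancing automatically); the function names, the greedy selection strategy, and the 1.5 default expansion ratio are assumptions made for illustration only.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy of the label distribution; higher means more uniform."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c)

def balance(samples, max_expansion_ratio=1.5):
    """Greedily duplicate samples carrying rare labels until entropy stops improving.

    `samples` holds one label list per region: a single label for
    classification/detection, possibly several for tagging.
    """
    counts = Counter(label for labels in samples for label in labels)
    dataset = list(samples)
    max_size = int(len(samples) * max_expansion_ratio)  # expansion ratio cap
    usage = [0] * len(samples)  # avoid reusing the same samples repeatedly

    while len(dataset) < max_size:
        # Score samples by how prevalent their labels are; among equally
        # rare samples, prefer the least-reused one.
        j = min(range(len(samples)),
                key=lambda k: (sum(counts[l] for l in samples[k]), usage[k]))
        candidate = counts + Counter(samples[j])
        if entropy(candidate) <= entropy(counts):
            break  # no further entropy gain: best achievable balance
        dataset.append(samples[j])
        counts = candidate
        usage[j] += 1
    return dataset

samples = [['cat']] * 54 + [['dog']] * 13 + [['horse']] * 79
print(Counter(l for s in balance(samples) for l in s))
```

On the example distribution from step 1, the loop mostly duplicates 'dog' regions until their prevalence catches up with the other labels or the expansion cap is reached.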

Third party dataset

This option allows you to enrich your training data by integrating external datasets. It is available for tagging and detection models.

You can select a third-party dataset, such as COCO, and define the number of records to include in your training set. This helps increase dataset diversity and improve model generalization by introducing additional labelled samples. By leveraging external data, you can compensate for underrepresented concepts in your dataset, improve robustness, and reduce biases.
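Conceptually, the option amounts to sampling a fixed number of records from the external dataset and appending them to your own training set. A hypothetical sketch of that idea (the platform does this for you once you select the dataset and the record count; the function below is illustrative, not part of any Deepomatic API):

```python
import random

def mix_in_external(own_records, external_records, n_external, seed=0):
    """Append n_external randomly sampled third-party records (e.g. from
    COCO) to the training set to increase label diversity."""
    rng = random.Random(seed)  # fixed seed for a reproducible training set
    extra = rng.sample(external_records, min(n_external, len(external_records)))
    return own_records + extra
```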

Crop Margin

This option applies to sub-views of detection views.

You can train on a dataset whose images are regions, or crops, of a parent "Detection" view; this sub-view can be any kind of task. By default, when training on a view whose parent is a detection view, each region is cropped out of the original image using the coordinates predicted by the parent view. The crop margin parameter expands this crop into the original image, enlarging the sample region by the given percentage.

This can be useful when important contextual elements sit just outside the bounding box.

Example: you are tasked with detecting animals. You create a first view to detect any animal, and a second one to classify the kind of animal.

The parent view correctly detects each animal instance inside a bounding box. The second view must then predict which kind of animal it is. For that task the tail could be a useful cue, but unfortunately the bounding box does not include the animal's tail.

Adding a crop margin will allow the training engine to take a larger crop from the image and include the tail.
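A minimal sketch of the geometry involved, assuming the margin is a percentage applied to each side of the box (the exact convention used by the training engine may differ):

```python
def expand_crop(box, margin, image_width, image_height):
    """Expand a parent-view bounding box by `margin` (e.g. 0.2 for 20%)
    on every side, clipping to the original image so the enlarged crop
    can capture nearby context such as the animal's tail."""
    xmin, ymin, xmax, ymax = box
    dx = (xmax - xmin) * margin
    dy = (ymax - ymin) * margin
    return (max(0, xmin - dx), max(0, ymin - dy),
            min(image_width, xmax + dx), min(image_height, ymax + dy))

# A 100x60 detection box expanded by 20% inside a 640x480 image:
print(expand_crop((200, 150, 300, 210), 0.2, 640, 480))
# -> (180.0, 138.0, 320.0, 222.0)
```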