Workflow performance evaluation
The perf command calculates and displays performance scores per task. It compares tasks from the app version in production with the same tasks run locally. Only work orders tagged "ai_performance_review" and "review completed" are selected for the analysis.
The command generates a JSON file (perf_results_data.json) in the missions/your_project/workflow_v2/perf_tests directory. The structure of the JSON file looks like this:
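The exact layout depends on your workflow; the snippet below is only an illustrative sketch that assumes one entry per task, keyed by task name, with the two fields described next (task names and values are placeholders):

```json
{
  "tasks": {
    "detect_defects": {
      "task_groundtruth": "defect_present",
      "task_value": "defect_present"
    },
    "read_serial_number": {
      "task_groundtruth": "SN-12345",
      "task_value": "SN-12845"
    }
  }
}
```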
In the output example:
- "task_groundtruth" corresponds to the analyzed or corrected value, when such a value exists in production.
- "task_value" is the result of the task executed locally.
Command Usage:
An example usage:
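The invocation below is a sketch only: the organization, site ID, app version ID, and paths are placeholder values, and the command may need to be prefixed with your CLI entry point if perf is exposed as a subcommand.

```bash
perf \
  --org my_org \
  --site_id my_site_id \
  --app_version_id my_app_version_id \
  --cwd ./missions/your_project/workflow_v2 \
  --env ./missions/your_project/.env \
  --worker-nb 3
```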
Parameters:
- --org: Organization name.
- --site_id: ID of the site.
- --app_version_id: The version ID of the app in production on which you want to run the analysis.
- --cwd: The working directory path for your local project, where the specs.yaml file is located.
- --env: Path to the environment file containing project-specific environment variables.
- --worker-nb: Number of workers (threads) to use for local analyses. Defaults to 3.
- --pdb: Enables the Python debugger (optional, if debugging is needed).
As a result, you'll get two scores, computed as described below.
Score computing:
Preview Score
The percentage of correct results on the production site. A result is correct if analyzed_value matches corrected_value, or if corrected_value is empty. The score is (correct results / total count) * 100.
Local Score
The percentage of correct results from the locally executed workflow. A result is correct if it matches corrected_value or, when that is empty, analyzed_value. The score is (correct local results / total count) * 100.
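To make the two formulas concrete, here is a minimal Python sketch. It assumes a flat list of records carrying analyzed_value, corrected_value, and task_value fields; the actual structure of perf_results_data.json may differ, so treat the field access as illustrative.

```python
def preview_score(records):
    # A production result is correct when the analyzed value matches the
    # corrected value, or when no correction was made (empty corrected value).
    correct = sum(
        1 for r in records
        if not r["corrected_value"] or r["analyzed_value"] == r["corrected_value"]
    )
    return correct / len(records) * 100

def local_score(records):
    # A local result is correct when it matches the corrected value,
    # or the analyzed value when no correction exists.
    correct = sum(
        1 for r in records
        if r["task_value"] == (r["corrected_value"] or r["analyzed_value"])
    )
    return correct / len(records) * 100

records = [
    {"analyzed_value": "ok", "corrected_value": "", "task_value": "ok"},
    {"analyzed_value": "ok", "corrected_value": "nok", "task_value": "nok"},
]
print(preview_score(records), local_score(records))  # 50.0 100.0
```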