Turning Video into Traffic Data Part Two

In Turning Video into Traffic Data Part One, I wrote about Miovision's systematic method for processing the large amount of video that is uploaded to our system. I detailed our three-step process for video configuration, quality assurance, and data validation, and explained how computer vision is used to detect vehicle movements in video. If you haven't yet read Part One, I would recommend you start there.

In this second and final post, I will dive into the details of data accuracy: how we define it, how we account for error, how we develop our best-in-class algorithm, and how all of this lets our customers rely on the quality of Miovision data for projects of any size.

Additional content credits to Justin Eichel, PhD, Miovision Technical Director and Computer Vision Architect, and James Barr, Miovision Product Manager.

Deconstructing a Frame of Video into Spatial Regions for Counting

When video is uploaded to Miovision, the cardinal direction and number of lanes are required inputs, because each video is split into segments that are processed individually.

Each video segment is defined by spatial region, lane, and approach. Segments are then distributed across a number of processes on a cloud computing service and queued for a computer vision task.
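
The splitting step can be sketched in Python. This is a minimal illustration, not Miovision's implementation: the `Segment` type, the `split_into_segments` and `enqueue_for_vision` functions, and the 15-minute time window are all hypothetical, and an in-process `queue.Queue` stands in for the cloud work queue.

```python
from __future__ import annotations

from dataclasses import dataclass
from queue import Queue


@dataclass(frozen=True)
class Segment:
    approach: str   # cardinal direction of the intersection leg, e.g. "N"
    lane: int       # 1-based lane index on that approach
    start_min: int  # start of the time window, in minutes from the start of the video
    end_min: int    # end of the time window (exclusive)


def split_into_segments(lanes_per_approach: dict[str, int],
                        duration_min: int,
                        window_min: int = 15) -> list[Segment]:
    """Split one video into independent (approach, lane, time window) segments."""
    segments = []
    for approach, n_lanes in lanes_per_approach.items():
        for lane in range(1, n_lanes + 1):
            for start in range(0, duration_min, window_min):
                end = min(start + window_min, duration_min)  # last window may be short
                segments.append(Segment(approach, lane, start, end))
    return segments


def enqueue_for_vision(segments: list[Segment]) -> Queue:
    """Put every segment on a work queue for (hypothetical) computer vision workers."""
    work: Queue = Queue()
    for seg in segments:
        work.put(seg)
    return work


segments = split_into_segments({"N": 2, "S": 2, "E": 1, "W": 1}, duration_min=60)
work_queue = enqueue_for_vision(segments)
print(len(segments), work_queue.qsize())  # 24 24
```

For a four-leg intersection with two lanes on the north and south approaches and one lane on the east and west approaches, a one-hour video yields 6 lanes × 4 windows = 24 independent segments, each of which can be processed in parallel.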

When the computer vision tasks are complete, each video segment is queued for human review and verification. Human reviewers manually count a 12% cross-section of each hour of video to confirm that the computer vision algorithm is producing correct counts and that the data is accurate.
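
One way to draw a 12% cross-section from each hour is to pick a random contiguous window of 0.12 × 3600 = 432 seconds (7.2 minutes) inside every hour. The sketch below assumes that sampling scheme; the post does not say whether the sample is one contiguous window or several shorter ones, and `human_sample_windows` is a hypothetical name.

```python
from __future__ import annotations

import random


def human_sample_windows(duration_s: int, fraction: float = 0.12,
                         seed: int | None = None) -> list[tuple[int, int]]:
    """Return one (start_s, end_s) window per hour covering `fraction` of that hour."""
    rng = random.Random(seed)
    windows = []
    for hour_start in range(0, duration_s, 3600):
        hour_len = min(3600, duration_s - hour_start)  # the last hour may be partial
        sample_len = round(hour_len * fraction)
        offset = rng.randint(0, hour_len - sample_len)  # random position inside the hour
        windows.append((hour_start + offset, hour_start + offset + sample_len))
    return windows


# A two-hour video gets one 432-second (7.2-minute) window in each hour.
for start, end in human_sample_windows(2 * 3600, seed=7):
    print(start, end, end - start)
```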


Screenshot: When the customer uploads a video to the Miovision Platform, they are required to annotate each leg of the intersection (or other road facility) and denote the camera position in relation to vehicle movements. This ensures proper configuration, and this metadata is stored with the count data and video for posterity. The annotation provided in this step is then broken down further at Miovision into counting tasks.

How Miovision Defines Data Accuracy: ±5 / 95%

  • For volumes of up to 100 vehicles within a 15 minute period, the data will be accurate to ±5 vehicles.
  • For volumes greater than 100 vehicles within a 15 minute period, the data will be accurate within 5%.
  • Accuracy is guaranteed with proper setup of the Scout Video Collection Unit or other video devices.
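
Expressed as code, the accuracy definition above is a single tolerance check. This sketch assumes the regime is chosen by the reference (human-verified) count for a single 15-minute bin; `within_tolerance` is a hypothetical helper, not part of any Miovision API.

```python
def within_tolerance(reported: int, truth: int) -> bool:
    """Check a 15-minute bin count against the ±5 / 5% accuracy definition."""
    diff = abs(reported - truth)
    if truth <= 100:
        return diff <= 5              # low volumes: absolute tolerance of ±5 vehicles
    return 100 * diff <= 5 * truth    # higher volumes: within 5%, in exact integer arithmetic


print(within_tolerance(96, 100))     # True: off by 4 at a volume of 100
print(within_tolerance(1147, 1159))  # True: off by 12, about 1%
print(within_tolerance(180, 200))    # False: off by 20, which is 10%
```

The two rules meet at exactly 100 vehicles, where ±5 vehicles and 5% are the same tolerance, so the check does not jump at the boundary. Comparing `100 * diff` with `5 * truth` in integers avoids floating-point rounding at the exact 5% limit.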

Proper Set-up: The stop bars of all four approaches are within the frame and the total intersection diameter is approximately 150′. The entry and exit of every vehicle movement will be captured. All crosswalks are entirely in frame.


Improper Set-up: The camera is shifted too far to the left. The view is missing one approach and contains only part of another. Entire vehicle movements would not be captured, and the right-turn movement at the bottom right is missed entirely.

Why not 100%?

Producing data that is verified and reconciled to be 100% accurate is time consuming. With multiple measurements, we can eventually converge on ground truth; however, doing so carries high overhead and forces a trade-off between cost, turnaround time, and an acceptable accuracy threshold.

At Miovision, we verify all computer vision counts with a 12% human verification overlap that is divided across every hour of video. Over the 1.5M hours of video we have processed, 12% has proven to be the right balance between adequately verifying the computer vision algorithm and keeping overhead contained.

When computer vision detection produces an error, it does so logically and consistently. Humans, however, are quite good at making random one-off errors. For this reason, we check that the two datasets agree to within 5% and, when they do, publish the computer vision data as the truth data. In areas of low confidence, a human makes a more comprehensive review of the video segment.
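
The comparison between the computer vision counts and the human overlap can be sketched as a reconciliation step: movements whose counts agree are published from the computer vision data, and the rest are flagged for a more comprehensive review. This is a hypothetical sketch; it assumes the ±5 / 5% rule from the accuracy definition is the agreement test, and the movement names and counts in the usage example are illustrative only.

```python
from __future__ import annotations


def agrees(cv: int, human: int) -> bool:
    """Assumed agreement test: ±5 vehicles up to 100, otherwise within 5%."""
    diff = abs(cv - human)
    if human <= 100:
        return diff <= 5
    return 100 * diff <= 5 * human


def reconcile(overlap: dict[str, tuple[int, int]]) -> tuple[list[str], list[str]]:
    """Split movements into (publish, review) lists.

    `overlap` maps a movement name to its (computer vision, human) counts
    over the human-verified sample.
    """
    publish: list[str] = []
    review: list[str] = []
    for movement, (cv, human) in overlap.items():
        if agrees(cv, human):
            publish.append(movement)  # publish the computer vision count as truth
        else:
            review.append(movement)   # send for comprehensive human review
    return publish, review


publish, review = reconcile({
    "NB Thru": (412, 405),
    "SB Thru": (240, 301),
    "EB Left": (37, 38),
})
print(publish, review)  # ['NB Thru', 'EB Left'] ['SB Thru']
```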


Screenshot: This table displays four movements that have been processed using computer vision algorithms, then verified by a human. Most movements are accurate; however, the Southbound 'Thru' movement shows a material computer vision detection error at 9:16am and 9:24am. This is an example of a movement that needs to be verified in more detail by a Data Services Technician.

Why ±5 in low volumes?

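At volumes below 100 vehicles of a single classification within a 15-minute bin, 5% amounts to fewer than five vehicles, and below 20 vehicles it is less than a single whole vehicle. A percentage threshold is therefore not a meaningful measure, and a fixed tolerance of ±5 vehicles applies instead.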

For example, the computer vision algorithm counts 1147 cars and ten articulated trucks in 15 minutes; our human overlap counts 1159 cars (1% error) but only nine articulated trucks (10% error).

At this point, we have a choice to delay turnaround and perform additional verification measures to converge on 100% accuracy, or publish the data for our customers immediately at the consolidated accuracy of 99%.
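
The arithmetic in this example can be checked directly. The sketch below reproduces the figures above when each class's error is expressed relative to the computer vision count (an assumption, since the post does not state the denominator); the helper names are hypothetical.

```python
from __future__ import annotations

# (computer vision count, human overlap count) for one 15-minute bin, from the example above
EXAMPLE = {
    "cars": (1147, 1159),
    "articulated trucks": (10, 9),
}


def per_class_error(counts: dict[str, tuple[int, int]]) -> dict[str, tuple[int, float]]:
    """Absolute difference and percentage error per class, relative to the CV count."""
    result = {}
    for cls, (cv, human) in counts.items():
        diff = abs(cv - human)
        result[cls] = (diff, 100 * diff / cv)
    return result


def consolidated_accuracy(counts: dict[str, tuple[int, int]]) -> float:
    """Accuracy (%) of the summed CV count against the summed human count."""
    cv_total = sum(cv for cv, _ in counts.values())
    human_total = sum(human for _, human in counts.values())
    return 100 * (1 - abs(cv_total - human_total) / cv_total)


for cls, (diff, pct) in per_class_error(EXAMPLE).items():
    print(f"{cls}: off by {diff} ({pct:.1f}%)")
print(f"consolidated accuracy: {consolidated_accuracy(EXAMPLE):.0f}%")
# cars: off by 12 (1.0%)
# articulated trucks: off by 1 (10.0%)
# consolidated accuracy: 99%
```

The articulated-truck error looks large as a percentage, but it is a single vehicle, well inside the ±5 tolerance that applies to low volumes. Combined, the two classes differ by 11 of 1157 vehicles, which rounds to 99% accuracy.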

In our experience, transportation professionals prefer to receive their data quickly and cost-effectively rather than wait while we search for a possible single missed vehicle in a low-volume classification.

How We Account for Error

Computer vision doesn't always produce perfect counts, and there are circumstances, such as blizzards, hurricanes, or intense sun glare, where the visual scene is outside of our algorithm's scope. Every video is different, and weather, wind, and time of day can all affect the computer vision algorithm's performance. That is why we must account for detection errors.

When the algorithm detects that a video segment is outside of its scope and cannot be counted automatically, that segment is immediately distributed to a Data Services Technician to be manually reviewed and counted. Miovision can perform large-scale human correction of any video segment where the computer vision algorithm has low confidence in its detection and reporting.

When a human is required to intervene due to low computer vision confidence, the manually processed video segment is incorporated into the training process used to iterate on and improve our algorithms.
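
The routing and feedback loop described above might look like the following. This is a sketch under stated assumptions: the `Pipeline` class, its method names, and the 0.8 confidence cut-off are hypothetical, and real training would consume the video and annotations rather than just a segment ID and a count.

```python
from __future__ import annotations

from dataclasses import dataclass, field

CONFIDENCE_THRESHOLD = 0.8  # hypothetical cut-off, not a published Miovision value


@dataclass
class Pipeline:
    auto_counted: list[str] = field(default_factory=list)
    technician_queue: list[str] = field(default_factory=list)
    training_set: list[tuple[str, int]] = field(default_factory=list)

    def route(self, segment_id: str, confidence: float) -> str:
        """Keep confident segments automatic; send the rest to a technician."""
        if confidence >= CONFIDENCE_THRESHOLD:
            self.auto_counted.append(segment_id)
            return "auto"
        self.technician_queue.append(segment_id)
        return "manual"

    def record_manual_count(self, segment_id: str, count: int) -> None:
        """Store a technician's count and add the segment to the training set."""
        self.technician_queue.remove(segment_id)
        self.training_set.append((segment_id, count))


pipeline = Pipeline()
pipeline.route("seg-001", confidence=0.95)  # clear, well set-up video
pipeline.route("seg-002", confidence=0.35)  # heavy sun glare
pipeline.record_manual_count("seg-002", count=57)
print(pipeline.training_set)  # [('seg-002', 57)]
```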


Screenshot from Video: This image is clear and well set-up for counting. All approaches are visible and all movements can be seen. This video segment is within the scope of the computer vision detection algorithm.


Screenshot from Video: At this specific point in the day, the setting sun combined with a dirty lens causes significant glare that is outside of computer vision detection scope. This video segment would be manually enumerated.

Train in Order to Count, Count in Order to Train

The continual addition of new video segments to our training data ensures that similar video segments have a better chance of being counted automatically by future iterations of the algorithm. We employ a team of computer vision scientists and engineers with specialities in statistical modelling, machine learning, and image processing, whose work continues to evolve our detection algorithm. Algorithm release candidates are continually introduced into the product testing environment and compared against the existing algorithm and manually observed ground truth data.

Before a release candidate replaces the production algorithm, several members of our Data Services Team rigorously validate randomly sampled video segments for true positive and true negative computer vision detections. Once our Data Services Team has observed that the release candidate meets our quality standards, it is released to production and the product is seamlessly updated.
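
A release gate of the kind described here can be sketched as follows: draw a random sample of segments, require the candidate to meet the ±5 / 5% tolerance against manually observed ground truth on every sampled segment, and require it to be no worse overall than the production algorithm. The function names and the exact promotion criteria are assumptions for illustration, and the sketch compares counts rather than individual true positive and true negative detections.

```python
from __future__ import annotations

import random


def within_tolerance(count: int, truth: int) -> bool:
    """±5 vehicles up to 100, otherwise within 5% (the stated accuracy rule)."""
    diff = abs(count - truth)
    return diff <= 5 if truth <= 100 else 100 * diff <= 5 * truth


def sample_segments(segment_ids: list[str], k: int, seed: int | None = None) -> list[str]:
    """Randomly choose k distinct segments for manual validation."""
    return random.Random(seed).sample(segment_ids, k)


def should_promote(candidate: dict[str, int], production: dict[str, int],
                   truth: dict[str, int], sampled: list[str]) -> bool:
    """Promote only if the candidate meets tolerance everywhere and is no worse overall."""
    if not all(within_tolerance(candidate[s], truth[s]) for s in sampled):
        return False
    cand_err = sum(abs(candidate[s] - truth[s]) for s in sampled)
    prod_err = sum(abs(production[s] - truth[s]) for s in sampled)
    return cand_err <= prod_err


truth = {"a": 100, "b": 40, "c": 250}      # manually observed ground truth (illustrative)
production = {"a": 96, "b": 47, "c": 240}  # existing algorithm
candidate = {"a": 99, "b": 41, "c": 246}   # release candidate
sampled = sample_segments(list(truth), k=3, seed=0)
print(should_promote(candidate, production, truth, sampled))  # True
```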

Our goal is to make computer vision handle as much video volume as possible to reduce turnaround times and manual interventions, while preserving accuracy.

Quality Data is our Brand

The accuracy and quality of our data is of paramount importance. It's more than our product; it's our brand. We work hard to ensure that Miovision customers are getting the best quality data on the market, and the customer service they need to have their operations up and running 24/7.