How to Build Successful Computer Vision Applications at Scale
Navigating Computer Vision Projects From Prototype to Production
Computer vision is quickly gaining traction across industries as the availability of image data grows and artificial intelligence (AI) becomes increasingly important to companies worldwide. Computer vision, or CV, is a form of machine learning (ML) that helps computers see and interpret images much as the human eye does. By classifying images and the objects within them, computers can react to what they see and deliver better predictions, customer experiences, and security, depending on the use case.
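To make the idea concrete, here's a minimal sketch of image classification using an off-the-shelf pretrained model (torchvision's ResNet-50 in this case; the image file name is a placeholder):

```python
# A minimal sketch of image classification with a pretrained model, assuming
# PyTorch/torchvision are installed and "photo.jpg" exists locally.
import torch
from PIL import Image
from torchvision.models import resnet50, ResNet50_Weights

weights = ResNet50_Weights.DEFAULT          # ImageNet-pretrained weights
model = resnet50(weights=weights).eval()    # inference mode
preprocess = weights.transforms()           # matching resize/normalize pipeline

image = Image.open("photo.jpg").convert("RGB")
batch = preprocess(image).unsqueeze(0)      # add a batch dimension

with torch.no_grad():
    probs = model(batch).softmax(dim=1)[0]

top_prob, top_idx = probs.max(dim=0)
print(f"{weights.meta['categories'][top_idx.item()]}: {top_prob.item():.1%}")
```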
AI offers many computer vision applications, and their usage is expected to grow exponentially over time. CV in healthcare, for instance, is expected to grow from about $400 million in 2019 to $1.3 billion by the end of 2025, while 30% of retailers will have up-to-date CV technology in place within the next 12 months. The CV market as a whole is projected to be worth $18.24 billion in 2025, a sizable chunk of the global AI market (which is projected to reach $68 billion by 2026).
Despite the rapid growth of computer vision projects, many companies still struggle to deploy them with confidence, due mainly to a lack of high-quality data and a limited understanding of how to build automated AI pipelines. Unlocking business value requires overcoming these challenges, and doing so in a scalable way.
What Are Some Successful Computer Vision Applications?
Many organizations have already found success with their computer vision applications, unlocking business value. These case studies highlight successes across various industries:
E-COMMERCE
Shotzr provides an image database for marketing professionals comprising over 70 million images. They sought us out for high-quality training data to help create a more personalized and localized search experience for marketers. Leveraging image classification, Shotzr used a diverse crowd to tag numerous images with relevant categories, such as fashion, nature, and lifestyle. These images were then fed into the search algorithm for their platform, improving the recommendation and search experience. Engagement increased by 20% because marketers were able to find more relevant images and content.
RETAIL
Robotics is an exciting area of AI that relies on CV. In retail, companies are placing robots on their store floors to track inventory and identify which items are low-stock or out-of-stock. Given that out-of-stock items cost $448 billion in revenue globally per year, there’s a potential for enormous cost-savings for major retailers.
The robots use object detection, trained with image annotation, to identify whether a product is out of stock, along with optical character recognition, supported by image transcription, to scan barcodes and output the product name and price.
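As a rough illustration of how those two pieces might fit together, here's a simplified sketch pairing a generic pretrained detector with an OCR library; a production system would use models trained on retail shelf imagery, and the file name and crop coordinates below are placeholders:

```python
# A minimal sketch combining object detection and OCR for shelf monitoring,
# assuming torchvision and pytesseract are installed; "shelf.jpg" and the
# price-label crop coordinates are illustrative placeholders.
import torch
import pytesseract
from PIL import Image
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn,
    FasterRCNN_ResNet50_FPN_Weights,
)
from torchvision.transforms.functional import to_tensor

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
detector = fasterrcnn_resnet50_fpn(weights=weights).eval()

shelf = Image.open("shelf.jpg").convert("RGB")
with torch.no_grad():
    detections = detector([to_tensor(shelf)])[0]   # dict of boxes, labels, scores

# Count confident detections; an unexpectedly low count for a shelf section
# can flag a potential out-of-stock slot for review.
confident = (detections["scores"] > 0.7).sum().item()
print(f"{confident} items detected in this shelf section")

# Transcribe a cropped price-label region with OCR (coordinates are illustrative).
label_crop = shelf.crop((100, 400, 300, 460))
print(pytesseract.image_to_string(label_crop))
```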
AGRICULTURE
John Deere is shaping pesticide use by applying computer vision algorithms to identify weeds on farms. With pixel-level image segmentation, the AI is trained to differentiate which part of an image is a crop and which part is a weed. That way, farmers can use drones to spray pesticides only on the weeds, leading to a potential 90% reduction in pesticide costs.
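To show what pixel-level segmentation looks like in code, the sketch below produces a per-pixel crop/weed mask with a generic segmentation architecture; it assumes a model fine-tuned on two classes, and the file names are placeholders rather than John Deere's actual system:

```python
# A minimal sketch of pixel-level crop/weed segmentation. It assumes a network
# already fine-tuned on two classes (0 = crop, 1 = weed); torchvision's
# DeepLabV3 with a 2-class head is used here as a stand-in.
import torch
from PIL import Image
from torchvision.models.segmentation import deeplabv3_resnet50
from torchvision.transforms.functional import resize, to_tensor

model = deeplabv3_resnet50(weights=None, num_classes=2).eval()
# In practice you would load your fine-tuned weights, e.g.:
# model.load_state_dict(torch.load("crop_weed_deeplab.pt"))

field = Image.open("field_tile.jpg").convert("RGB")
batch = to_tensor(resize(field, [520, 520])).unsqueeze(0)

with torch.no_grad():
    logits = model(batch)["out"]          # shape: (1, 2, H, W)
mask = logits.argmax(dim=1)[0]            # per-pixel class: 0 = crop, 1 = weed

weed_fraction = (mask == 1).float().mean().item()
print(f"Weed coverage in this tile: {weed_fraction:.1%}")
# The resulting mask can drive targeted spraying instead of blanket application.
```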
AUTOMOTIVE
HERE is a company that creates accurate maps for many industries by leveraging video, image, and text data. Their street sign detection algorithm relies on ML-assisted video object tracking, and their platform can identify businesses using an optical character recognition algorithm with bounding boxes on commercial signage. HERE uses pixel-level semantic segmentation on satellite imagery to annotate buildings for pedestrian entrances, floor counts, and more.
The company also uses video annotation to track cars, other vehicles, and pedestrians. Our tools provide heightened machine assistance, with the model able to track each object's movement, making human annotation of that object much more manageable.
These examples demonstrate CV’s power to unlock critical cost-savings for companies across significant industries while also emphasizing the value of training data in their success.
How to Approach Computer Vision Projects
The key to approaching computer vision projects is building a scalable, automated model pipeline. The following steps will guide you through the process, using the example of self-driving cars.
1. Business Problem
Define a clear business problem that will provide value to your organization. Recognize the critical stakeholders involved in executing a solution and receive their sign-off and understanding of the project. Be sure to evaluate the endeavor’s priority and the level of investment your organization is willing to make.
In the case of developing a self-driving car, the business value may be greater revenue or a desire to gain a competitive edge.
2. Data
Preparing training data involves numerous steps, including collecting, cleaning, segmenting, annotating, processing, and analyzing. You’ll also want to have data governance procedures in place to monitor security concerns. (See more about the importance of training data in the next section.)
In our self-driving car example, synchronized sensor data from cameras, LiDAR, and RADAR is collected from a car and ported to a central storage unit. Automotive manufacturers may also choose to leverage relevant sensor data from open-source or off-the-shelf datasets. The data may be annotated using various methods; for example, point cloud video object tracking is a CV annotation technique that can track objects in 3D space (helpful for seeing how a car interacts with other objects).
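To make the annotation step more concrete, here's a hypothetical, simplified record for a single LiDAR frame; the field names are illustrative and not tied to any particular dataset's schema:

```python
# A hypothetical, simplified annotation record for one LiDAR frame, showing how
# 3D cuboid labels and persistent track IDs might be stored for point cloud
# object tracking; real datasets use richer, standardized schemas.
from dataclasses import dataclass

@dataclass
class Cuboid3D:
    track_id: str      # stays constant across frames so objects can be tracked
    category: str      # "car", "pedestrian", "cyclist", ...
    center: tuple      # (x, y, z) in the ego-vehicle frame, meters
    size: tuple        # (length, width, height), meters
    yaw: float         # heading angle, radians

frame_annotation = {
    "frame_id": "000042",
    "timestamp_us": 1650000000000000,
    "lidar_file": "sweeps/000042.bin",
    "camera_files": {"front": "images/front/000042.jpg"},
    "cuboids": [
        Cuboid3D("veh_017", "car", (12.4, -3.1, 0.8), (4.5, 1.9, 1.6), 0.12),
        Cuboid3D("ped_003", "pedestrian", (6.0, 2.2, 0.9), (0.6, 0.6, 1.7), 1.57),
    ],
}
```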
3. Model Build
The model build phase requires training your algorithm using prepared data and hyperparameters, optimizing feature extraction, analyzing the output, and retraining until the model achieves the desired accuracy threshold. You can use the champion-challenger approach for testing, where an initial model acts as the one to beat. You present another model, the challenger, and run A/B testing on both; whichever performs better becomes the champion model. You may have to iterate through this process hundreds or thousands of times before you have the model you want.
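In code, the champion-challenger comparison can be as simple as the sketch below; the metric, models, and held-out set are placeholders, and a real pipeline would add statistical significance checks to the A/B results:

```python
# A minimal champion-challenger sketch: both models score the same held-out set
# and the better one is promoted. The model objects and their predict() method
# are placeholders for whatever interface your pipeline uses.
def accuracy(model, holdout_set):
    correct = sum(1 for image, label in holdout_set if model.predict(image) == label)
    return correct / len(holdout_set)

def select_champion(champion, challenger, holdout_set):
    champ_score = accuracy(champion, holdout_set)
    chall_score = accuracy(challenger, holdout_set)
    # Promote the challenger only if it beats the current champion.
    return challenger if chall_score > champ_score else champion
```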
With a self-driving car, you may need to train several different models that require diverse data collection or annotation processes and fuse them together to create your final model. As you test models, you add complexity through iterations (e.g., temperature, what's happening above the skyline, or other factors relevant to driving). Self-driving cars also require in-situ testing in real environments to ensure the vehicle can perform in varying conditions.
4. Deployment
Once you have a champion model, evaluate whether your solution addresses the business problem you defined at the start and whether it will provide the intended business value. If not, go back through the process to make adjustments. If your model is ready, integrate it with your existing business processes, and deploy. Have tooling in place to continue to measure the model’s performance post-deployment.
Deployment can mean a lot of things. With self-driving cars, there's a physical component: most likely dedicated hardware that needs to be installed on the vehicle itself.
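As one illustration of the software side of deployment, the sketch below exports a trained PyTorch model to TorchScript so it can run outside the training environment, for example on an in-vehicle or edge device; the model here is a stand-in for your actual champion:

```python
# A minimal sketch of packaging a trained model for deployment by exporting it
# to TorchScript. "champion_model" is a stand-in; you would use your own
# trained network and a representative example input.
import torch
from torchvision.models import resnet50

champion_model = resnet50(num_classes=10).eval()   # stand-in for your trained model
example_input = torch.randn(1, 3, 224, 224)        # dummy input with production shape

scripted = torch.jit.trace(champion_model, example_input)
scripted.save("champion_model.pt")

# The exported artifact can be loaded and run without the original Python classes:
loaded = torch.jit.load("champion_model.pt")
with torch.no_grad():
    output = loaded(example_input)
print(output.shape)
```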
5. Active Learning and Tuning
After your model is deployed, you're not done. You've now entered model maintenance mode, which requires continuous monitoring and updates. Use a human-in-the-loop approach to provide ground truth and monitor success, with the aim of mitigating model drift. Also, continue to check for biases in your model's predictions and provide feedback to the model as needed.
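A human-in-the-loop triage step might look something like the sketch below, where low-confidence predictions are routed to human annotators and confirmed labels feed the next retraining cycle; the threshold and data structures are purely illustrative:

```python
# A minimal human-in-the-loop sketch: uncertain predictions go to a human review
# queue for ground-truth labeling, while confident ones are logged for periodic
# spot checks and future retraining. All names and the threshold are illustrative.
LOW_CONFIDENCE = 0.6

def triage_prediction(image_id, predicted_label, confidence, review_queue, retrain_log):
    if confidence < LOW_CONFIDENCE:
        # Send uncertain cases to humans for ground-truth labeling.
        review_queue.append({"image_id": image_id, "model_guess": predicted_label})
    else:
        # Confident predictions are still logged so they can be spot-checked
        # for bias and drift and folded into later retraining sets.
        retrain_log.append({"image_id": image_id, "label": predicted_label})
```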
As road conditions evolve worldwide, updates will need to be made continuously to the AI in a self-driving car.
Training Data: The Core of Computer Vision Projects
An equally accurate heading would be that training data is the core of all machine learning projects. Without quality training data, the AI model will struggle to make accurate, high-confidence predictions that serve the end user well. When building AI, this is a component you have to get right to be successful. So what should you consider about your data? The following questions will help you create an effective data management strategy:
Goals and Project Priorities
What are your quality goals?
How do you plan to train and tune your model?
What are your data requirements?
Data Collection
How much data do you need?
Where are you sourcing your data from?
Is your data diverse enough to avoid overfitting?
How will you move your data around?
How will you keep collecting it post-deployment?
Data Labeling
What type of data labeling do you need?
What labeling tools best suit your needs?
Who is labeling your data? Do you need specific skills or languages?
Data Pipeline and Scaling
How do you plan to automate with an AI data pipeline?
Will you incorporate a human-in-the-loop?
How will you provide your model with continuous training?
While these questions are by no means exhaustive, they’ll help you explore the needed pathways to preparing high-quality training data and building and maintaining a successful model.
Optimizing for the Future
Building efficient, high-performing CV models is a matter of optimizing your data and model pipelines and avoiding common errors. You’ll want to tackle data drift and the issue of stale models by building a continuous learning loop to keep retraining and challenging your champion model. You’ll want to design your models to scale by setting up repeatable, automated workflows. You’ll also want to create a comprehensive data governance framework to facilitate high-quality training data preparation. These actions will collectively help advance you out of the pilot phase and into deployment, production, and beyond.
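One way to wire up that continuous learning loop is sketched below: a simple drift signal (a drop in average prediction confidence against a baseline) triggers a retraining job, after which the champion-challenger comparison runs again; all names and thresholds are illustrative:

```python
# A minimal sketch of a drift-triggered retraining loop. The baseline,
# tolerance, and launch_retraining_job callable are placeholders for whatever
# monitoring and orchestration tooling you use.
BASELINE_CONFIDENCE = 0.85
DRIFT_TOLERANCE = 0.10

def check_and_retrain(recent_confidences, launch_retraining_job):
    avg_confidence = sum(recent_confidences) / len(recent_confidences)
    if BASELINE_CONFIDENCE - avg_confidence > DRIFT_TOLERANCE:
        # Confidence has degraded enough to suggest drift: retrain and challenge
        # the current champion with the refreshed model.
        launch_retraining_job()
        return True
    return False
```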
How We Can Help
At Appen, we have over 20 years of expertise, including collecting and annotating the data necessary to build deep learning systems and neural networks for computer vision. This high-quality image annotation data is tailored to each computer vision project's specific training needs.
Learn more about how our annotation capabilities can support a broad range of computer vision tooling, including object tracking, pixel-level semantic segmentation, and image transcription.