Like many educators, I’m proud to be inured to the unexpected question. I only needed to be asked “Are all French dogs boys?” once to learn that most valuable of lessons for teachers: “Assume nothing.” So, when I had the opportunity to expand my role at Azavea, in part to provide training for remote workers, I was confident that I’d heard just enough of it all to gauge how much (or how little) I knew when preparing documentation. Collaborating with a data labeling team of Cloud Workers in Kathmandu, however, showed me how many lessons I’ve yet to learn.
Azavea and machine learning
Some of us here have been involved in Artificial Intelligence (AI) since a degree in it was considered as useful as a degree in basket-weaving. As a company, however, Azavea first began exploring deep learning after receiving a Small Business Innovation Research (SBIR) grant in 2016. Our work using deep learning for semantic segmentation of aerial imagery led to the development of Raster Vision, an open-source framework for deep learning projects.
Our Research and Development (R&D) team continued development of Raster Vision the following year, experimenting with multi-label image classification for the Understanding the Amazon from Space Kaggle competition. In 2018, Azavea collaborated with the Inter-American Development Bank (IDB) on a machine learning pilot. Using OpenStreetMap (OSM) building footprints as labels, we used Raster Vision to build a model that predicts the location of buildings in South American cities. Excited by the work we’ve done with and for our partners, as well as by the potential applications of neural networks, we saw further investment in machine learning projects as the logical next step for R&D.
In early 2019, we partnered with CloudFactory to provide data labeling for machine learning projects. I supported the R&D team throughout the process. One job was creating instructional materials to train Cloud Workers. The task was two-fold: train them in the use of our in-house data labeling tool as well as in the parameters of particular use cases. With no public examples to model my materials on, I relied on my best judgment. Some of the calls I made were correct, while others proved less accurate. As the materials I developed evolved, I learned quite a bit. Here are five key takeaways.
Show, don’t tell
I wrote the first piece of documentation for our team with the help of a colleague. While thorough, the result, a Google Doc, was…verbose. My initial answer to the question “How do I explain fine detail when I’m not in the same room as the people I’m working with?” was: words, lots and lots of words. The problem? The result was visually overwhelming and didn’t succeed in communicating what I had hoped.
The next iterations of the training document involved adding more and bigger screenshots. The screenshots made the document more legible and provided a second way for learners to access the information it contained. While it’s always important to consider different learning styles, images are particularly important when you and your team members have different first languages. In my experience, a clear, properly contextualized image communicated more, and more precisely, than an entire paragraph of text.
The real breakthrough, however, was upgrading from screenshots to screen captures. Our Nepalese team leads suggested using Loom. The use of video necessitated a switch to Google Slides, which made the material easier to digest. Our Cloud Workers supported this switch, reporting, “It’s clear and understandable as a form of…”
Screen captures also made it easier to give feedback. Reviewers captured video as they corrected errors, which allowed them to spread information quickly and clearly. Likewise, labelers captured confusing instances and flagged them for my review. Our Cloud Workers proactively provided us with visual documentation, which deepened our understanding of our data and their needs.
Speak often and honestly
Open lines of communication are fundamental in creating quality data for your machine learning project and providing appropriate support for your team. Timely and thorough conversation prevents obstacles in your workflow and provides opportunities for retraining and relearning.
Workflows developed by CloudFactory on this project proved indispensable. Their staff worked with us to schedule an appropriate number of check-ins with our Team Leads, and their messaging platform allowed us to chat easily with our team in Kathmandu. In addition, our Cloud Workers took the lead in developing and sharing other communication tools with us. Shared spreadsheets made it easy to track questions, and detailed notes on daily work enabled asynchronous conversation and alerted us to any issues.
You should also leverage your team’s knowledge by making sure they feel comfortable offering feedback. Whether they are requesting additional documentation, noticing a bug, or experiencing a more mundane problem, they should feel free to come to you. When there are nearly ten hours and 8,000 miles between you and your colleagues, even the need to move a meeting can be the difference between dinner with your family and dinner at the office.
When our team leads noticed that conversation in weekly meetings was slowing down, they suggested we meet every two weeks instead. Worthwhile conversations were happening, but they were happening at nearly 10 p.m. for our team members.
Embrace the edge (cases)
No matter how well defined your classes are, you and your data labeling team will consistently encounter objects that test their boundaries. While it may feel disheartening to have your carefully crafted tool riddled with questions almost as soon as you share it, those edge cases are critical.
In one use-case, we asked Cloud Workers to identify, label, and classify crosswalks in images of New York. It was quickly clear that far more thought was needed for the definition of “crosswalk.” Questions we hadn’t considered included:
- Does a crosswalk have to be striped?
- Does a crosswalk have to be a certain color?
- Is a path in a parking lot a crosswalk?
- What about a path on the grounds of a school? A bike path?
- Are the “islands” connecting crosswalks on wider streets considered a part of the crosswalk?
- If, as in the above case, two or more portions are significantly offset from each other, do they count as one crosswalk or two?
Edge cases force you to evaluate and reevaluate your priorities and goals. Again and again.
Alegion, a data labeling platform provider, recently conducted a survey that revealed that 96% of companies engaged in machine learning run into problems due to low-quality labels. Without trained and accurate data analysts to annotate your imagery, your deep learning project can stall. The proven benefits of a managed workforce led us to choose CloudFactory, and the choice paid off.
One of the most surprising ways our partnership with CloudFactory proved valuable was in their assistance with improving our in-house labeling tool. As the first outside users, they’ve helped us shape a more intuitive and user-friendly tool. Simple changes such as the ability to hide an annotation made it easier to create accurate labels.
Our team leads also advocated for a “dashboard” that would allow them to track productivity. The dashboard now features an insightful “Collaborators” section that tracks key metrics such as labeling speed. CloudFactory’s expertise has so enhanced our tool that we may decide to repackage it for public use at some later date.
Challenge your cultural assumptions
As machine learning becomes a more prominent segment of the AI field, many are working to ensure that the ethical implications of such work are considered.
What’s data labeling got to do with it?
While training documents might seem an odd place for cultural exchange, it’s surprising how many cultural assumptions are implicit in even the most granular of documents. Consider the truck. One challenge was teaching our Cloud Workers to classify vehicles into three categories, including “Passenger Vehicle” and “Truck.” While the difference between a pick-up truck and, you know, a “truck truck” was clear to those of us in the room when we selected those terms, it most certainly was not to those whom we were training.
The cultural gap need not be as great as that between the U.S. and Nepal to cause issues, either. While it might seem strange for a former Southern Californian, I’m not a driver. In fact, I failed the only driving test I’ve ever taken within three blocks (I still maintain that I was tricked!). In any case, this was not something I expected to come up in my machine learning work until I needed to distinguish crosswalks from, say, speed bumps or gore points. I’ll be honest, I didn’t even know what a gore point was. Understanding your own cultural viewpoints is an important step in creating useful documentation that both respects your teammates and ensures you achieve the results you desire.
In creating documentation to train computer vision workers, I’m sure that I learned as much as I taught. Working carefully and thoughtfully, with judicious revision, is vital in ensuring that you are feeding accurate labels into your machine learning model. Most importantly, you need to trust and respect your labeling team, and your training materials should reflect that. Whether you’re sizing imagery, determining the best mode of presentation, or deciding what exactly is a crosswalk, a user focus and a collaborative spirit are two of your best tools.