The Finovate Debuts series introduces new Finovate alums. CrowdFlower won Best of Show in its Finovate debut last September at FinovateFall 2014. The company’s platform automates the management of online data workers, making it easier and faster for data scientists to maintain quality control over both the process and the result.
CrowdFlower is a data-enrichment platform that enables data scientists to easily and accurately collect, clean, and label data from an online workforce.
- Founded in 2009
- Headquartered in San Francisco, California
- Raised $29 million in funding
- Operates with more than 80 employees and more than 5 million contributors
- Backed by investors including Bessemer Venture Partners and Trinity Ventures
- Customers include Bloomberg, eBay, Intuit, LinkedIn, and Microsoft
- Lukas Biewald is founder and CEO
The task of collecting, cleaning, and labeling the enormous amounts of data generated every day may seem like the kind of thankless task that rarely receives proper recognition.
So credit the attendees at September’s FinovateFall for awarding CrowdFlower Best of Show honors.
Writing about CrowdFlower’s win, Jon Ogden of Money Summit focused on the “crowd” part of CrowdFlower’s innovation, saying the platform “represents the first time this many people have been put to work on a single crowdsourcing platform … This means that tasks that used to be impossibly expensive (or just plain impossible) are now manageable.”
CrowdFlower is a platform that helps institutions and organizations manage online workforces more easily and accurately. Importantly, with CrowdFlower’s “people-powered data” approach, human beings are very much a part of the data enrichment process. In those instances where algorithms are not yet capable of discerning subtle details in certain data – such as automobiles in satellite photos of shopping mall parking lots – human data workers remain crucial.
Historically, though, the challenge has been finding ways to manage these workers and their work that are not cumbersome. This is the problem that CrowdFlower solves.
“There are a few reasons why humans are still necessary,” Tatiana Josephy, VP of Product, explained. “There is the problem of low and high confidence data, for example. It’s better for humans to handle low confidence data – a fuzzy image, for example – to fill in where computers fall down.”
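The split Josephy describes between low and high confidence data can be pictured with a short sketch. This is only an illustration, not CrowdFlower's implementation: the `Prediction` type, the `route` function, the sample items, and the 0.8 cutoff are all hypothetical and do not come from the company.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the article gives no actual threshold.
CONFIDENCE_THRESHOLD = 0.8


@dataclass
class Prediction:
    item_id: str
    label: str
    confidence: float  # the algorithm's confidence, between 0 and 1


def route(predictions, threshold=CONFIDENCE_THRESHOLD):
    """Split predictions into machine-accepted items and items for human review."""
    accepted, needs_human = [], []
    for p in predictions:
        if p.confidence >= threshold:
            accepted.append(p)      # high confidence: keep the machine label
        else:
            needs_human.append(p)   # low confidence: send to a human worker
    return accepted, needs_human


# Made-up sample data for illustration only.
preds = [
    Prediction("img-1", "car", 0.97),
    Prediction("img-2", "car", 0.41),  # e.g. a fuzzy image
    Prediction("img-3", "no_car", 0.88),
]
accepted, needs_human = route(preds)
print([p.item_id for p in needs_human])
```

In this sketch only the fuzzy image would be handed to a person, while the two confident labels pass through untouched.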
The data workers involved are typically stay-at-home moms or students, mostly citizens of the United States, India, the U.K., or Europe. And CrowdFlower gives companies the ability to reach these workers, wherever they are, and put their talents and abilities to use.
“What would you do with 100s of millions of workers on demand?” she asked.
CrowdFlower was founded by data scientists who had worked with “messy data” at Yahoo for years and were looking for ways to outsource the labor. They found that the messiness of the data meant that upwards of 80% of their time was spent in the manual work of just labeling the data.
“Big data is nothing if you don’t have clean data,” Josephy said.
Rich data is how CrowdFlower conceptualizes what businesses really need. And the CrowdFlower platform lets organizations and institutions manage everything from task development and worker procurement to quality control and performance evaluation with a single, integrated solution.
The use cases for CrowdFlower are fascinating. One company uses the technology to handle the data derived from its satellite imaging of everything from ships docked in ports to oil in Saudi oil tanks in order to predict market price moves. Another company leverages the CrowdFlower platform to extract data from SEC documents – something algorithms still do unevenly. In both cases, all that the companies had to do was build the job on the CrowdFlower platform and then launch it to CrowdFlower’s online workforce.
One last use case. A client of CrowdFlower uses sentiment analysis on Twitter to conduct market intelligence. Within 24 hours of the announcement of Apple’s iWatch, CrowdFlower’s online workforce of 1,400 analyzed 27,000 tweets at a cost of $280 to the customer.
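For a sense of scale, the figures in that example can be worked through in a few lines. The sketch assumes each tweet was reviewed once and that the work was spread evenly across contributors; the article states neither, so the per-worker numbers are rough averages only.

```python
# Figures quoted in the article.
tweets = 27_000
workers = 1_400
total_cost_usd = 280

# Derived averages (assumes one review per tweet, evenly distributed work).
cost_per_tweet = total_cost_usd / tweets
tweets_per_worker = tweets / workers
cost_per_worker = total_cost_usd / workers

print(f"Cost per tweet:    ${cost_per_tweet:.4f}")
print(f"Tweets per worker: {tweets_per_worker:.1f}")
print(f"Cost per worker:   ${cost_per_worker:.2f}")
```

Under those assumptions the job works out to roughly one cent per tweet and about 19 tweets per contributor.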
“Big data is millions of pixels and images. Rich data is the number of cars in parking lots,” Josephy explained from the Finovate stage. “It’s clean and complete data that you can actually use.”
Financial use cases include reading the information on credit card statements (which is often incomplete or written in cryptic abbreviations), collecting and verifying merchant data, collecting data from SEC filings, and, as the company showed from the stage last fall, analyzing satellite imagery for business intelligence or market analysis.
And importantly, according to Josephy, the average business user is likely capable of using CrowdFlower. “You don’t have to be an engineer,” she said. “If you can understand Excel, then you have all the knowledge you need.”
The kind of work CrowdFlower handles is typically outsourced. But the company believes there are significant issues with outsourcing that make it a poor choice for many companies: it is expensive and time-consuming. “Every time you need to collect new data, you reach out to your outsourcing vendor and you engage in weeks of back and forth about the job,” Josephy said. “CrowdFlower is so much easier.”
CrowdFlower is revamping the tool that allows clients to build the initial job, as well as improving its quality control technology to support a broader set of use cases. The goal is also to continue developing the platform so that it can complete more complex, “longer form” tasks, such as transcribing a half-hour video or long tax documents.
(Above: Tatiana Josephy, VP of Product, and Seth Teicher, Head of Content and Business Development)
Meanwhile the company recently announced support for eight new languages – Arabic, Chinese, Hindi, Indonesian, Italian, Russian, Turkish, and Vietnamese – and enhanced support for four others (French, German, Portuguese, and Spanish). Lukas Biewald, founder and CEO, said the new “Language Crowds” will “make it even faster for customers to get the high quality data they need.”
While so many innovations at the intersection of human labor and technology seem fraught with problems (see the debates over technologies like Kensho), CrowdFlower shows how critical human work is when collecting data, as well as how technology can help organizations manage these new workforces.
“By connecting companies to an online, scalable, fully-vetted workforce,” Josephy concluded from the Finovate stage in September, “CrowdFlower turns big data into rich data faster, cheaper, and easier than any alternative on the market.” And while it is hard to say just how much the Finovate audience knew about the alchemy of turning big data into rich data before awarding CrowdFlower Best of Show, it is clear that they recognized the value of true innovation in the space when they saw it.