How do you quickly turn texts into labels for your machine learning models?
Your manager and clients want you to use machine learning to predict an outcome. You know how to tackle the structured data you have, but what do you do with a column of text data, such as customer comments or technician notes? What if the outcome itself needs to be coded based on the text? You need to code documents into categories quickly and accurately, but you don’t have the time or resources to read and categorize thousands of documents.
QuickCode will help you classify documents quickly and accurately by suggesting keywords to build precise queries.
A data scientist used QuickCode to create a query in less than 15 minutes that classified more than 5,500 SMS messages as spam or not spam. This classifier had 95% accuracy relative to human coders. Furthermore, the messages classified by the QuickCode-built query helped train a machine-learning model to predict future spam texts. The model performed similarly to one trained with human-coded labels, but was trained in a fraction of the time. The data scientist found it valuable because:
1) They could use the query to explain to their managers what words and numbers were commonly found in spam text.
2) When spammers changed their patterns, the data scientist iterated again with QuickCode to update the query and prediction models, which helped the team stay abreast of changes in spam tactics.
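To make the idea concrete, here is a minimal Python sketch of labeling messages with a keyword query and checking agreement with human coders. It is an illustration only, not QuickCode's implementation: the function names, the sample messages, and the keywords are all hypothetical, and the toy data is far too small to say anything about real-world accuracy.

```python
import re

def matches_query(text, keywords):
    """Return True if any query keyword appears as a whole token in the text."""
    tokens = set(re.findall(r"[a-z0-9']+", text.lower()))
    return any(keyword in tokens for keyword in keywords)

def label_with_query(texts, keywords):
    """Label each text 'spam' if it matches the query, otherwise 'not spam'."""
    return ["spam" if matches_query(t, keywords) else "not spam" for t in texts]

def accuracy(predicted, human):
    """Share of items where the query label agrees with the human-coded label."""
    agree = sum(p == h for p, h in zip(predicted, human))
    return agree / len(human)

# Hypothetical toy data; a real project would use thousands of messages.
messages = [
    "WIN a FREE prize now, call 0800 123",
    "Are we still on for lunch today?",
    "Claim your free ringtone, txt WIN to 80085",
    "Can you call me when you get home?",
]
human_labels = ["spam", "not spam", "spam", "not spam"]

# Hypothetical query; in practice the keywords come from iterating on suggestions.
query = {"free", "win", "prize", "txt"}

query_labels = label_with_query(messages, query)
score = accuracy(query_labels, human_labels)
print(query_labels, score)
```

Because the query is just a set of words, it doubles as the explanation: anyone can read the keywords and see why a message was labeled spam. Updating the labels when spam tactics change means editing that set and rerunning the labeling.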
How do you quickly search and label vast quantities of data for new insights into healthcare, while at the same time explaining your labeling decisions to others?
You’ve identified a factor that could dramatically affect your patients’ well-being and have thousands of records that might help test your hypothesis.
But there are too many documents to review and label on your own. Even if you could, it would take time to explain your categorization system to others or change it as new hypotheses emerge.
QuickCode helps you identify healthcare documents relevant to your interests quickly and accurately, and explain your data labeling approach to peers.
A data scientist used QuickCode to rapidly identify a subset of medical patients and transparently explain their labeling process. The data scientist began with a hypothesis that familial or social support improved patient outcomes. They then searched 50,000 discharge summaries made by healthcare providers with a single-word query: “social.” Using QuickCode, the scientist then selected 66 recommended labels, leading to the discovery of 33,210 relevant documents in less than 15 minutes.
QuickCode also allowed the data scientist to demonstrate the validity of their data labeling to other subject matter experts, and to select more precise labels based on their input.
How do you train and refine a machine learning model to identify complaints from your customers?
You want to use a machine learning application to correctly identify and route messages from your customers to the relevant departments. But how do you teach your model to sort customer messages appropriately? Even if you are able to build a classification system, how can you easily explain your labeling criteria to supervisors and others?
Use QuickCode to create labeled training data that can be used to train machine learning models, while also providing the transparency needed to discuss and refine your work with others.
A user wanted to train a model to recognize complaints of cyber theft. The user started with a single-word query, “hack,” which they used to search 160,000 customer messages collected from over 3,000 financial services companies. Using QuickCode, they iterated through the recommended labels and, in less than 10 minutes, expanded their training dataset to more than 50 times as many complaints. The expanded dataset also covered more than 10 times as many affected financial institutions as the original set, providing a robust selection of training data with which to build predictive models. The labels also provided transparency, allowing the user to discuss their labeling decisions with supervisors and adjust based on their input.
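The workflow of starting from one seed word and growing the query can be sketched in a few lines of Python. This is not how QuickCode generates its recommendations, which this page does not describe; as a stand-in, the sketch ranks candidate terms by how many seed-matched messages contain them. The function names, stopword list, sample messages, and accepted terms are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = {"a", "an", "and", "are", "is", "my", "of", "on",
             "the", "was", "were", "please"}

def tokenize(text):
    """Lowercase a message and return its set of distinct word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def matching_docs(docs, query):
    """Return indices of documents containing at least one query term."""
    return [i for i, doc in enumerate(docs) if tokenize(doc) & query]

def suggest_terms(docs, query, top_n=5):
    """Rank non-query terms by how many query-matched documents contain them."""
    counts = Counter()
    for i in matching_docs(docs, query):
        counts.update(tokenize(docs[i]) - query - STOPWORDS)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts, key=lambda term: (-counts[term], term))
    return ranked[:top_n]

# Hypothetical customer messages.
messages = [
    "hack on my account, funds stolen",
    "account hack, unauthorized transfer of funds",
    "unauthorized transfer, my funds were stolen",
    "my card was stolen and funds are gone",
    "please update my mailing address",
    "the mobile app is slow",
]

seed = {"hack"}
before = matching_docs(messages, seed)
suggestions = suggest_terms(messages, seed)

# A human reviewer decides which suggestions to keep (hypothetical choice).
accepted = {"unauthorized", "stolen"}
after = matching_docs(messages, seed | accepted)
print(suggestions, len(before), len(after))
```

The reviewer stays in the loop: suggestions are only candidates, and the accepted terms form a readable record of the labeling criteria that can be discussed with supervisors and revised.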
Are agencies making mission-critical decisions with incomplete data?
Government data scientists, analysts, lawyers, and researchers share many of the same challenges as their private sector counterparts. But some of their needs are different. Their analyses, models, and predictions shape policies that affect citizens’ lives and inform national security. When the stakes are this high, you need the most complete dataset possible.
QuickCode helps your agency’s data experts quickly and transparently curate datasets for analysis, modeling, and prediction. And they can use QuickCode with your data in your cloud. Better words mean better data, and better data means better predictions.
Examples of How Thresher’s QuickCode Supports Agencies’ Missions
1) Finding codewords to better understand sensitive online conversations
2) Categorizing writings about suicide bombings and domestic violence
3) Labeling foreign language texts by dialect for better sentiment analysis
4) Creating labels from the slang used to talk about drugs and human trafficking online
Thresher's QuickCode was built from the ground up with security in mind. We work in the most sensitive environments across intelligence, defense, and civilian government agencies.
· Install QuickCode on-premises behind your firewall
· Leverage QuickCode in the cloud through Amazon GovCloud
· Compliant with FISMA and NIST standard protocols
We are proud recipients of contracts from the DARPA-sponsored Small Business Innovation Research (SBIR) program. Their support is an important part of our broader commitment to continuous innovation and rigorous testing of our core technologies.
Working With Government Agencies
Thresher is a U.S. Small Business Administration (SBA)-certified small business with a robust federal partnering ecosystem. Contact us today to get started with a proof of concept or pilot program.