How do you quickly turn text into labels for machine learning?
Your manager and clients want you to use machine learning to predict an outcome. You know how to handle your structured data, but what should you do with the column of free-text data (e.g., customer comments or technician notes)? And what if the outcome itself must be coded from that text? You need to code documents into categories quickly and accurately, but you don't have the time or resources to read and code thousands of documents by hand.
Thresher helps you code documents quickly by suggesting keywords you can use to build accurate, precise queries that classify documents.
A data scientist used Thresher's Quickcode mode to build, in less than 15 minutes, a query that classified more than 5,500 SMS messages as spam or not spam. The query's labels agreed with those of human coders 95% of the time. The messages labeled by the query were then used to train a machine learning model to predict future spam texts, and that model performed similarly to one trained on human-coded labels. The Quickcode approach was fast and accurate, but the data scientist also valued it for two other reasons:
1) They could use the query to show their managers which words and numbers commonly appear in spam messages.
2) When spammers changed tactics, the data scientist ran another Quickcode iteration to update the query and the prediction model, which helped the team keep pace with new spam tactics.
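To make the idea of "a query as a classifier" concrete, here is a minimal Python sketch that labels messages with a keyword query and measures its agreement with human labels as accuracy. The query terms, messages, and labels are hypothetical examples, not Thresher's suggestions or the data from this case study, and the sketch does not use any Thresher API.

```python
import re

# Hypothetical query terms for illustration only; not Thresher's actual suggestions.
SPAM_TERMS = ["free", "winner", "claim", "prize", "txt", "urgent"]

# One case-insensitive pattern that matches any query term as a whole word,
# so "free" matches "FREE" but not "freebie".
SPAM_QUERY = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in SPAM_TERMS) + r")\b",
    re.IGNORECASE,
)


def label_with_query(text: str) -> str:
    """Label a message 'spam' if any query term appears, otherwise 'ham'."""
    return "spam" if SPAM_QUERY.search(text) else "ham"


def accuracy(predicted: list[str], reference: list[str]) -> float:
    """Share of messages where the query label matches the human label."""
    if not reference or len(predicted) != len(reference):
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


if __name__ == "__main__":
    # Toy messages with hypothetical human-coded labels.
    messages = [
        "WINNER! Claim your free prize now",
        "Are we still meeting for lunch?",
        "URGENT: txt STOP to end",
        "Congratulations, you have been selected for a cash reward",
    ]
    human_labels = ["spam", "ham", "spam", "spam"]

    query_labels = [label_with_query(m) for m in messages]
    print(query_labels)  # ['spam', 'ham', 'spam', 'ham']
    print(accuracy(query_labels, human_labels))  # 0.75
```

The last message shows the trade-off: a short query misses spam phrased in words it does not contain, which is why iterating on suggested keywords matters. Because the query is just a readable list of terms, it is also easy to explain to a manager.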
How do you quickly search and label vast quantities of healthcare data for new insights while also explaining your labeling decisions to others?
You've identified a factor that could dramatically affect your patients' well-being, and you have thousands of records that could help test your hypothesis.
But there are too many documents for you to review and label on your own. Even if you could, it would take time to explain your categorization system to others or to change it on the fly if new hypotheses emerge.
Quickcode helps you rapidly find the healthcare documents relevant to your question and explain your labeling approach to your peers.
A data scientist used Quickcode to rapidly identify a subset of patients and transparently explain their labeling process to others. Starting from the hypothesis that familial or social support improves patient outcomes, they searched 50,000 discharge summaries written by healthcare providers using a single-word query: "social." From the labels Quickcode recommended, the scientist selected 66, which surfaced 33,210 relevant documents in less than 15 minutes. Quickcode then allowed the data scientist to discuss the validity of their labeling decisions with subject matter experts and supervisors, and to select more precise labels based on their input.
How do you train and refine a machine learning model to identify complaints from your customers?
You want to use a machine learning application to identify customer messages and route them to the right departments. But how do you teach your model to sort those messages correctly? And even if you can build a classification system, how can you easily explain your labeling criteria to supervisors and others?
Use Quickcode to create labeled training data for machine learning models, with the transparency you need to discuss and refine your work with others.
A user wanted to train a model to recognize complaints of cyber theft. They started with a single-word query, "hack," and used it to search 160,000 customer messages about more than 3,000 financial institutions. Using Quickcode, they then iterated through the recommended labels and, in less than 10 minutes, expanded their training set to more than 50 times the original number of complaints. The expanded set also covered more than 10 times as many affected financial institutions as the original set, providing a robust selection of training data for building predictive models. The labels also provided transparency, allowing the user to discuss their labeling decisions with supervisors and adjust them based on their input.
Are agencies making mission-critical decisions with incomplete data?
Government data scientists, analysts, lawyers, and researchers share many of the same challenges as their private sector counterparts. But some of their needs are different. Their analysis, models, and predictions shape policies that affect citizens’ lives and inform national security. When the stakes are this high, you need to have the most complete data set possible.
Thresher helps your agency's data experts quickly and transparently curate data sets for analysis, modeling, and prediction, and they can run Quickcode on their own data in their own cloud environment. Better words mean better data, and better data mean better predictions.
Examples of How Thresher Supports Agency Missions
1) Finding codewords to better understand sensitive online conversations
2) Categorizing writings about suicide bombings and domestic violence
3) Labeling foreign-language texts by dialect for better sentiment analysis
4) Creating labels from the slang used to talk about drugs and human trafficking online
Thresher was built from the ground up with security in mind. We work in the most sensitive environments across intelligence, defense, and civilian government agencies.
· Install Quickcode on-premises behind your firewall
· Run Quickcode in the cloud through AWS GovCloud (US)
· Comply with FISMA and NIST standards
We are proud recipients of Small Business Innovation Research (SBIR) contracts from DARPA. DARPA's support is an important part of our broader commitment to continuous innovation and rigorous testing of our core technologies.
Working With Government Agencies
Thresher is a U.S. Small Business Administration (SBA)-certified small business with a robust federal partnering ecosystem. Contact us today to get started with a proof of concept or pilot program.