How can a character from Sesame Street help you increase customer sales and satisfaction?
Out of the millions of events happening every day, only a handful are relevant to your company. By combining current state-of-the-art NLP research with our AI-driven sales intelligence platform, we at Mito.ai have figured out a way to filter out the noise.
Software Engineer at Mito.ai
Take this example: one company wins a new contract, while another announces a massive scale-up. If you are a bank, you may want to offer the first company a loan; if you are an insurance company, the second might be a chance to land a new customer on your latest insurance deal.
Signals like these are spread all over the internet, hidden in everything from news articles and social media to public announcements. At Mito.ai we process an enormous number of these sources, in all languages, every day to help our customers, typically large enterprises, stay on top of their existing customer portfolio and potentially expand it.
An important step in understanding the semantics of these signals is textual categorization. In our machine learning pipeline, this means we have to do the following:
- Tag the textual contents of documents, e.g. “finance” or “sport”
- Identify and categorize events, e.g. “company won a contract” or “hired new CEO”
Furthermore, we’re interested in doing this in a language-agnostic manner. This means that if we train a model to classify a news article as “new contract”, we want to train it in such a way that it can solve this task in any language, regardless of whether it was actually trained on that language. This is an extremely complicated task, and traditional feature-engineering methods such as bag-of-words, which require large amounts of training data for each classifier, are simply not going to cut it for a startup with scarce resources.
In this blog post, we’ll shed some light on how we’ve achieved this using BERT embeddings and neural networks. BERT is a new method for language representation from Google’s research lab, and it has revolutionized the way we translate text into numbers.
Note to readers: the next part may be a bit technical.
A recurring problem when doing AI on text is processing the text itself. It is hard for computers to use text as input since computers only truly understand math, and text is an entirely different animal. As a result, methods for transforming the text into numbers have been developed. Word embeddings are one such method where each word in the vocabulary is mapped to its own numerical vector.
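To make the idea concrete, here is a minimal sketch of a word embedding as a plain lookup table. The vocabulary, the three-dimensional vectors and the `embed` helper are all invented for illustration; a real model learns vectors with hundreds of dimensions from large amounts of text.

```python
# Toy vocabulary: every word maps to exactly one fixed vector.
# The numbers are invented for illustration, not learned from data.
EMBEDDINGS = {
    "this": [0.1, 0.0, 0.2],
    "bank": [0.7, 0.3, 0.5],
    "announced": [0.2, 0.9, 0.1],
    "layoffs": [0.4, 0.8, 0.0],
}
UNKNOWN = [0.0, 0.0, 0.0]  # fallback for words outside the vocabulary


def embed(sentence):
    """Return one vector per whitespace-separated token, via table lookup."""
    return [EMBEDDINGS.get(tok, UNKNOWN) for tok in sentence.lower().split()]


vectors = embed("This bank announced layoffs")
print(len(vectors), "tokens,", len(vectors[0]), "dimensions each")
```

Note that the lookup depends only on the word itself, never on the words around it.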
BERT embeddings to the rescue
But there’s a problem: what if words are homographs, that is, words that are spelled the same, but with a different meaning?
Perhaps we could solve this by using the information in the surrounding words to determine which sense of the word we are talking about! This is where BERT comes into the picture. A BERT model uses the context of each word to generate an embedding which encapsulates the context and the meaning of the word at the same time.
This difference can be illustrated by fetching embeddings for the following sentences: “This bank just announced massive layoffs” and “There’s an alligator down on the river bank”. An “ordinary” word embedding model would just walk through each sentence and fetch one embedding for each of the given words.
But there’s a pitfall: the word “bank” is used in both sentences. While humans can easily tell that these are different types of banks, a simple word embedding model would give both occurrences exactly the same vector, since it has no way to differentiate between them.
But BERT is different! This is because it tries to imitate what humans do when we read a sentence. In this case, it sees the word “bank” in each sentence, but it also looks at the surrounding words. It sees that the first sentence includes the word “layoffs” and adjusts the embedding for “bank” accordingly. A simple word embedding, by contrast, is caught in limbo between the “financial bank” meaning and the “river bank” meaning when constructing its vector, and therefore loses valuable information.
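The contrast can be sketched with a toy example. To be clear, this is not how BERT works internally: BERT uses a deep neural network, while the hypothetical `contextual_vector` below simply blends a word’s vector with the average of its neighbours. The two-dimensional vectors are invented, with the first axis loosely standing for “finance” and the second for “nature”.

```python
# Invented 2-d vectors: axis 0 ~ "finance", axis 1 ~ "nature".
STATIC = {
    "bank": [0.5, 0.5],  # stuck halfway between both meanings
    "layoffs": [1.0, 0.0],
    "announced": [0.8, 0.1],
    "alligator": [0.0, 1.0],
    "river": [0.1, 0.9],
}


def static_vector(tokens, index):
    """Plain lookup: the surrounding words are ignored."""
    return STATIC.get(tokens[index], [0.0, 0.0])


def contextual_vector(tokens, index, weight=0.5):
    """Blend a word's vector with the mean of its known neighbours.

    A crude stand-in for context sensitivity, NOT how BERT computes it.
    """
    own = static_vector(tokens, index)
    others = [STATIC[t] for i, t in enumerate(tokens) if i != index and t in STATIC]
    if not others:
        return own
    mean = [sum(v[d] for v in others) / len(others) for d in range(len(own))]
    return [(1 - weight) * o + weight * m for o, m in zip(own, mean)]


s1 = "this bank just announced massive layoffs".split()
s2 = "there's an alligator down on the river bank".split()
i1, i2 = s1.index("bank"), s2.index("bank")

print(static_vector(s1, i1) == static_vector(s2, i2))  # identical lookup in both sentences
print(contextual_vector(s1, i1))  # pulled towards the finance axis
print(contextual_vector(s2, i2))  # pulled towards the nature axis
```

Even in this crude form, the same word ends up with two different vectors depending on the sentence it appears in, which is the property that matters for the classifier.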
Neural network classifier
With BERT, we have transformed the text into vectors that a computer can process. However, the task of making sense of these vectors for our categorization problem remains.
One possible way to solve this is to build a simple neural network that takes these vectors as input and produces predictions. Combining this neural network with BERT’s embeddings means that we can build a language-agnostic categorizer trained with limited amounts of training data!
This is also great news from a business point of view, as the reduced requirement for large piles of training data makes it possible to deliver tailored models for each of your customers, even though they may have completely different needs.
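As a rough sketch of the classifier idea, the snippet below trains the smallest possible neural network, a single sigmoid unit, on a handful of vectors. Everything here is an assumption made for illustration: the three-dimensional training vectors are invented stand-ins for real BERT embeddings, the labels are made up, and a production model would likely have more layers and far more data.

```python
import math

# Invented stand-ins for sentence embeddings; in practice these would come
# from a BERT model. Label 1 = "new contract", label 0 = anything else.
TRAIN = [
    ([0.9, 0.1, 0.2], 1),
    ([0.8, 0.2, 0.1], 1),
    ([0.1, 0.9, 0.7], 0),
    ([0.2, 0.8, 0.9], 0),
]


def predict(weights, bias, vector):
    """Single sigmoid unit: probability that the vector belongs to class 1."""
    z = sum(w * x for w, x in zip(weights, vector)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train(data, epochs=500, lr=0.5):
    """Fit weights with plain stochastic gradient descent on log loss."""
    weights, bias = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for vector, label in data:
            error = predict(weights, bias, vector) - label  # gradient of log loss w.r.t. z
            weights = [w - lr * error * x for w, x in zip(weights, vector)]
            bias -= lr * error
    return weights, bias


weights, bias = train(TRAIN)
print(round(predict(weights, bias, [0.85, 0.15, 0.2]), 2))  # input near the label-1 examples
print(round(predict(weights, bias, [0.15, 0.85, 0.8]), 2))  # input near the label-0 examples
```

The classifier only ever sees vectors, never the text itself. If the embeddings place sentences with the same meaning close together regardless of language, the same small network can be reused across languages.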
Finally, a big shout-out to the great people at Google Research Labs who make research models like BERT open to the public. We at Mito.ai strongly believe that the competitive advantage should lie in putting the models to use, not in the methodology itself.
Want to work with BERT embeddings and deep learning? We are hiring!
Shoot us an email at firstname.lastname@example.org.