Could pre-trained neural networks show off their innate talent? Zero-shot classification in NLP
Introduction
One of the main obstacles in building machine learning models is the lack of sufficient data. Training these models requires a lot of data, and collecting that much data for a given subject means paying the high cost of having people label it. The cost is sometimes so high that a project never starts at all due to lack of data. In recent years, a technique called Transfer Learning has enabled AI models to reach higher quality despite this.
In Transfer Learning, instead of training from scratch, we start from a model that has already been trained on a similar or related problem. We then retrain this model on the target problem (a step called fine-tuning), which lets us reach higher quality with less data. For an intuitive analogy, consider learning general athletic fundamentals before taking up a specific sport, instead of starting that sport directly: the athlete coming from the fundamentals acquires the new skills in less time. Transfer Learning works the same way: by starting from a model trained on a related problem, we get better results with less data.
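To make this concrete, here is a minimal sketch of the pattern using the Hugging Face transformers library; the library choice, the checkpoint name, and the toy batch are illustrative assumptions of mine, not something the article prescribes. We load weights pre-trained on a large general corpus and continue training them on the target task.

```python
# Minimal transfer-learning sketch with Hugging Face transformers.
# The checkpoint name and the toy batch are illustrative assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "bert-base-uncased"  # any pre-trained encoder could be used
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# The encoder weights come from pre-training; only the small
# classification head on top is initialized from scratch.
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=3)

# Fine-tuning then continues training on the (small) target dataset.
# A single forward/backward step is shown for illustration.
batch = tokenizer(["Messi scored twice last night."], return_tensors="pt")
labels = torch.tensor([2])  # e.g. the index of a hypothetical "sports" class
loss = model(**batch, labels=labels).loss
loss.backward()  # gradients update the pre-trained weights as well
```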
In recent years, the increase in computing power, the growth of data on the Internet, and the design of new model architectures led to the emergence of pre-trained models in natural language processing, which repeatedly broke records and raised the best quality achieved on different datasets. These models have most often been pre-trained on a task called language modeling. A language model, in simple terms, estimates how likely a sentence is to occur in the language, or how an incomplete sentence should be completed. For this purpose, text can be collected from across the Internet, including blogs, Wikipedia, news sites, and so on, to create the datasets needed for pre-training. For the model to perform well on this task, it has to absorb a great deal of information about all kinds of concepts in order to understand the language. This means, for example, that if we give the model the phrase “Iran is located in …” to complete, it must have stored geographic information in some form to fill in the blank properly. Once pre-trained, the model already encodes information about the world and its concepts, and it can be used as a starting point for solving other natural language processing problems, yielding higher quality than training from scratch.

That said, this method has its critics: leaving important and sensitive tasks to such models is controversial, since they offer little interpretability and may produce bizarre results that are unacceptable for such problems. “GPT-3, Bloviator: OpenAI’s language generator has no idea what it’s talking about” explains some examples of this strange behavior in the GPT-3 model.
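The fill-in-the-blank example above can be tried directly with a masked language model. The following is a minimal sketch using the Hugging Face fill-mask pipeline; the pipeline and checkpoint name are my assumptions for illustration.

```python
from transformers import pipeline

# Masked language modeling: the model proposes words for the blank.
# "bert-base-uncased" is an illustrative checkpoint; it uses [MASK]
# as its mask token.
fill = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill("Iran is located in [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```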
The next step in this direction is to expect the model, after it has absorbed information about the world’s topics and concepts, to make predictions without any additional labeled data in the target domain; zero-shot classification pursues exactly this goal. This idea, proposed since around 2016, can be realized in several ways. One approach is to compare the vector of the input sentence with the word vectors of the candidate categories using cosine similarity, and a few extra refinements along these lines can increase quality further (see the sketch below). One of the more recently introduced methods for zero-shot classification reformulates it as a Natural Language Inference problem, which has led to breakthroughs in results.
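The embedding-based variant can be sketched as follows, assuming the sentence-transformers library and an illustrative checkpoint: the sentence vector is compared with a vector for each category name, and the most similar category wins.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Illustrative checkpoint; any sentence/word embedding model could be used.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentence = "Messi and Ronaldo have played 35 games against each other so far."
labels = ["entertainment", "politics", "sports"]

# Embed the sentence and the category names in the same vector space.
vectors = model.encode([sentence] + labels)
sentence_vec, label_vecs = vectors[0], vectors[1:]

# Cosine similarity between the sentence vector and each label vector.
similarities = label_vecs @ sentence_vec / (
    np.linalg.norm(label_vecs, axis=1) * np.linalg.norm(sentence_vec)
)
print(labels[int(np.argmax(similarities))])  # most similar category
```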
Natural Language Inference
This problem is defined over two input sentences: a first sentence, the premise, and a second sentence, the hypothesis. The model must decide whether, given the premise, the hypothesis is true (entailment), false (contradiction), or undetermined (neutral).
For example, if the premise is “Joe plays professional tennis.” and the hypothesis is “Joe is an athlete.”, the inference holds (entailment). But for the hypothesis “Joe does not know tennis.” with the same premise, the inference is a contradiction. And if the hypothesis is “Joe is a bank employee.”, the relation to the premise is neutral.
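These three examples can be reproduced with any model fine-tuned on an NLI dataset. The following is a minimal sketch; the roberta-large-mnli checkpoint is my assumption for illustration.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Illustrative checkpoint: a model fine-tuned on the MNLI dataset.
checkpoint = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

premise = "Joe plays professional tennis."
for hypothesis in ("Joe is an athlete.",
                   "Joe does not know tennis.",
                   "Joe is a bank employee."):
    # NLI models take the premise and hypothesis as a sentence pair.
    inputs = tokenizer(premise, hypothesis, return_tensors="pt")
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1)[0]
    # The label order (contradiction / neutral / entailment) is stored
    # in the model config, so we read it from there.
    print(hypothesis, "->", model.config.id2label[int(probs.argmax())])
```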
Classification as Natural Language Inference
Natural Language Inference tells us the relationship between two sentences; classification asks which topic a sentence belongs to. Suppose our topics are “entertainment”, “politics”, and “sports”, and the sentence to classify is “Messi and Ronaldo have played 35 games against each other so far.” We can treat this sentence as the premise and define the hypothesis as “This text is about sports.”. Feeding these two sentences to a natural language inference model gives us the probability that this pair forms a correct entailment. We then compare that probability with the ones obtained from the hypotheses “This text is about politics.” and “This text is about entertainment.”, and whichever hypothesis has the highest probability is taken as the classification for the text.
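These steps map directly onto the Hugging Face zero-shot-classification pipeline (covered by the “Zero-Shot Text Classification with Hugging Face” link under Further Reads). A minimal sketch, with the checkpoint name as an illustrative choice:

```python
from transformers import pipeline

# NLI-based zero-shot classification; the checkpoint is illustrative
# (any model fine-tuned on an NLI dataset would work).
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "Messi and Ronaldo have played 35 games against each other so far.",
    candidate_labels=["entertainment", "politics", "sports"],
    # Each label is slotted into this template to form a hypothesis.
    hypothesis_template="This text is about {}.",
)
print(result["labels"][0])  # topic with the highest entailment probability
```

The hypothesis_template argument corresponds to the “This text is about …” formulation described above, so the end user only has to supply the label names.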
Conclusions
The lack of labeled data slows the progress of artificial intelligence applications. With zero-shot classification, we can classify text by topic without any labeled data. Its quality has not yet reached that of models trained on sufficient data, but it is an evolving research topic. When the required amount of data is not available, this solution can give organizations a quick start. One of its strengths is its high flexibility compared to previous methods: the end user can simply declare the names of the topics, and classification follows the user’s needs, instead of high costs being spent on labeling data. One possible application is categorizing emails, social media feeds, or mobile SMS: the user receives the content already organized just by defining the topic names. Currently, platforms offer only predefined content classes that they themselves specify for users, and this degree of personalization and user-defined classes is not available on popular platforms. To conclude, we can look forward to more exciting machine learning developments in the future of content-based platforms.
Further Reads
Live demo of Zero-Shot Classification
Zero-Shot Learning in Modern NLP
Benchmarking Zero-shot Text Classification: Datasets, Evaluation and Entailment Approach (paper)
Zero-Shot Text Classification with Hugging Face
Semi-supervised learning, Google FixMatch Achievement on CIFAR-10