Deep Learning for Natural Language Processing Tasks

Watch the complete webinar on YouTube.

Unleash the Potential of Deep Learning for Natural Language Processing Tasks

Deep learning is a subfield of machine learning within the artificial intelligence domain. It uses neural networks capable of supervised or unsupervised learning from unstructured data.

Deep learning architectures such as recurrent neural networks, convolutional neural networks, deep neural networks, and deep belief networks have been applied in many fields, including computer vision, speech and audio recognition, machine vision, natural language processing, social network filtering, and machine translation.

Natural Language Processing

Natural Language Processing (NLP) deals with the interaction between computers and human natural languages. NLP helps to extract information from text and integrate it into algorithms and computations. Deep learning models enable many NLP tasks to achieve high performance.

Why do we need deep learning models for NLP?

  • Linguistic expertise is scarce and hard to find
  • There is a constant need for trainable systems that can learn linguistic diversity
  • Deep learning models capture linguistic nuances well, provided there is:
    1. A well-designed system
    2. Near-perfect labelling
    3. Well-trained models
  • With a sufficiently large architecture, DL models can capture the many permutations involved in linguistic extraction, and their performance approaches human-level.

NLP Tasks

NLP tasks are classified into macro-level and micro-level tasks. The macro-level tasks include text summarization, sentiment analysis, topic modeling, question answering, and text classification. The micro-level tasks include grammar error correction (GEC), coreference resolution, dependency parsing, lexical normalization, relation prediction, and taxonomy learning.

Macro-Level Tasks

Text Summarization: is the process of producing a clear and concise summary of text from sources such as articles, research papers, tweets, books, emails, and blog posts.

Sentiment Analysis: Also known as emotional AI or opinion mining, it is the process of identifying and extracting the polarity (positive or negative emotions) in the text.

Topic Modeling: used to discover abstract topics from an assortment of documents. Most frequently, it is used as a text-mining tool to discover the semantic structures hidden in the body of the text.

Question Answering System: is a discipline concerned with building systems that can answer questions posed by humans in natural language.

Text Classification: is the process of classifying or tagging text into organized groups based on the content.

Micro-Level Tasks

Grammar Error Correction (GEC): is the process of identifying errors in text, such as sentence structure, word choice, punctuation, grammar, and spelling errors, and producing a corrected version.

Coreference Resolution: is the process of grouping expressions that refer to the same entity, which is important for several higher-level NLP tasks involving natural language understanding.

Dependency Parsing: is the task of establishing links between headwords and the words that modify them.

Lexical Normalization: is the process of transforming non-standard text into standard text. It is a common preprocessing step for downstream pipelines such as text-to-speech.
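A common baseline for lexical normalization is a simple lookup against a normalization lexicon. The sketch below is illustrative and not from the source; the lexicon entries are invented examples of the kind of non-standard spellings this task targets.

```python
# Minimal dictionary-based lexical normalizer (illustrative lexicon).
LEXICON = {"u": "you", "r": "are", "gr8": "great", "pls": "please"}

def normalize(text):
    """Replace known non-standard tokens with their standard forms."""
    return " ".join(LEXICON.get(tok, tok) for tok in text.lower().split())

print(normalize("U r gr8"))  # "you are great"
```

Real systems go further, using character-level models or sequence-to-sequence learning, but the lookup captures the basic idea of mapping noisy tokens to canonical ones.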

Relation Prediction: predicts the relation between two semantic entities.

Taxonomy Learning: classifies the concepts from the text corpora.

Micro-level tasks generally need to be completed before moving on to macro-level tasks; otherwise, the results suffer from textual ambiguity.

Source: http://arxiv.org/pdf/2003.01200.pdf

Construction of NLP tasks

The construction of NLP tasks involves syntax, semantics, discourse, and speech. Convolutional Neural Networks (CNNs) are commonly used to perform NLP-related tasks.

Typically, a neural network has input, output, and hidden layers. Convolutional neural networks are similar to ordinary neural networks; the main difference lies in the layers. A CNN stacks several convolutional and subsampling (pooling) layers, followed by fully connected layers.

Deep Learning Applications

Image Processing with CNN

In an ordinary neural network, the input is a vector, whereas in a CNN the input is a multi-channel image. An important component of a CNN is the pooling layer applied after a convolutional layer; pooling reduces the resolution of the feature map.

The pooling operation slides a window over the feature map and summarizes the image features within it. Finally, fully connected layers are used to produce the model's prediction.
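The pooling step described above can be sketched in a few lines of pure Python (a toy illustration, not from the webinar): a 2x2 max-pooling window halves each dimension of the feature map by keeping only the strongest activation in each window.

```python
def max_pool2d(feature_map, size=2):
    """Downsample a 2D feature map by taking the max over size x size windows."""
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for i in range(0, h, size):
        row = []
        for j in range(0, w, size):
            # Collect all values inside the current pooling window.
            window = [feature_map[r][c]
                      for r in range(i, min(i + size, h))
                      for c in range(j, min(j + size, w))]
            row.append(max(window))
        pooled.append(row)
    return pooled

fmap = [
    [1, 3, 2, 0],
    [4, 6, 1, 2],
    [0, 2, 5, 7],
    [1, 3, 8, 4],
]
print(max_pool2d(fmap))  # [[6, 2], [3, 8]]
```

The 4x4 map becomes 2x2: each output value summarizes a 2x2 region, which is exactly the resolution reduction the text describes.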

Text processing with CNN

In a CNN, images are represented as pixel values and texts as vectors. The process starts by splitting a sentence into words and mapping each word to a vector (word embedding). These word vectors are then fed into the convolutional layer, which produces a feature representation.

This representation is then used by the fully connected layer for text classification.
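The convolution-over-embeddings step can be sketched as follows. This is a minimal pure-Python illustration with made-up embedding values: a single filter spans two consecutive word vectors, producing one feature per position, and max-over-time pooling keeps the strongest feature.

```python
def conv1d(embeddings, kernel):
    """Slide a filter over consecutive word vectors and compute one
    dot product per position (a single filter, no padding)."""
    k = len(kernel)
    out = []
    for i in range(len(embeddings) - k + 1):
        window = embeddings[i:i + k]
        out.append(sum(w * f
                       for vec, fil in zip(window, kernel)
                       for w, f in zip(vec, fil)))
    return out

# Toy 2-dimensional embeddings for a four-word sentence (illustrative values).
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
kernel = [[1.0, 0.0], [0.0, 1.0]]  # one filter spanning two words
features = conv1d(sentence, kernel)
pooled = max(features)             # max-over-time pooling

print(features, pooled)  # [2.0, 1.0, 1.5] 2.0
```

In a real text CNN many such filters of varying widths run in parallel, and their pooled outputs are concatenated and passed to the fully connected classifier.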

Advanced Models

CNN models find it difficult to preserve sequential order even though they can extract semantic clues. Hence, recurrent neural network models are used for such advanced tasks.

Recurrent Neural Network (RNN)

Recurrent Neural Network (RNN) is an advanced approach ideal for processing sequential information. The major asset of an RNN is its ability to memorize the outcome of previous computations and incorporate that information into the current computation.

There are two major variations of RNNs: bidirectional RNNs and deep RNNs. A bidirectional RNN predicts each element of a sequence by studying both its past and future context. A deep RNN consists of several hidden recurrent layers and is used for long sentences with more words and data sparsity.
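The "memory" mechanism described above reduces to one update rule: the new hidden state is a function of the current input and the previous hidden state. A scalar toy version (illustrative weights, not a trained model) makes this explicit:

```python
import math

def rnn_step(x, h, w_x=0.5, w_h=0.8, b=0.0):
    """One step of a scalar Elman-style RNN: the new state mixes the
    current input x with the previous hidden state h."""
    return math.tanh(w_x * x + w_h * h + b)

def run_rnn(sequence):
    """Run the cell over a sequence; each state summarizes all inputs so far."""
    h = 0.0
    states = []
    for x in sequence:
        h = rnn_step(x, h)
        states.append(h)
    return states

print(run_rnn([1.0, 0.5, 0.25]))
```

Because `h` feeds back into every step, information from early inputs persists in later states; a bidirectional RNN simply runs a second such cell over the reversed sequence and combines both states.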

Long Short-term Memory Network (LSTM)

Why do we need LSTM?

RNNs suffer from two problems: exploding gradients and vanishing gradients. LSTMs are artificial RNNs that can learn long-term dependencies by retaining information over longer periods. An LSTM consists of a cell state, an input gate, a forget gate, and an output gate.
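The gate structure named above can be sketched for a single scalar cell. This is a toy illustration with a shared dummy weight, not a trained LSTM: the forget gate decides what to erase from the cell state, the input gate decides what to write, and the output gate decides what to expose.

```python
import math

def sigmoid(z):
    """Squash a value into (0, 1), so it can act as a soft gate."""
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, w=1.0):
    """One scalar LSTM step with toy weights (real cells use separate
    learned weights per gate)."""
    f = sigmoid(w * (x + h))          # forget gate: keep how much old memory?
    i = sigmoid(w * (x + h))          # input gate: admit how much new info?
    o = sigmoid(w * (x + h))          # output gate: expose how much state?
    c_tilde = math.tanh(w * (x + h))  # candidate cell update
    c = f * c + i * c_tilde           # gated cell state (the long-term memory)
    h = o * math.tanh(c)              # hidden state passed to the next step
    return h, c
```

Because the cell state `c` is updated additively through the gates rather than repeatedly multiplied, gradients flow over long spans far better than in a plain RNN, which is how LSTMs mitigate the vanishing gradient problem.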

Recursive Network

A recursive neural network is a special type of deep neural network formed by applying the same set of weights recursively over a structure. It produces a structured prediction over variable-size input (or a scalar prediction over it) by traversing the given structure in topological order. It is typically used when various tokens depend on one another, as in sentiment analysis and text classification; these dependent tokens together constitute a single entity that can be modeled.

When combined with a specific task, a recursive neural network can capture useful semantic as well as structural information with the help of convolution and pooling layers.

Lexical Feature

A lexical feature is a representation built on word embeddings, that is, relevant features added from the data itself.

What is lexical level representation? How are these representations used for deep learning tasks?

Lexical-level representation can be viewed as a ranking process. There are two kinds of representations: sentence-level feature representation and lexical-level feature representation. Sentence-level representation is generic, whereas lexical-level representation is dynamic and captures more context than the sentence level. It is used for short-text categorization (e.g., tweets) and for semantic clustering.

Multi-column Network

Multi-column networks are used for question-answering systems. Multi-column CNNs (MCCNNs) use separate column networks to extract answer types, relations, and context from questions. This architecture is also useful for text summarization and machine translation.

Context-Dependent Network

A context-dependent network is used to model relational dependencies. Context is a combination of words carrying meaning; the network applies word embeddings within that context and summarizes the meaning using convolution and pooling layers. Context-dependent networks are mainly used for topic modeling and paraphrase generation.

Transfer Learning

Transfer learning is used to extract knowledge from a source setting and apply it to a different target setting. A model is first trained on a large-scale dataset, and this pre-trained model is then used to bootstrap learning for another target task. Transformer architectures are commonly used to build pre-trained NLP models from large datasets.

Transfer learning can be implemented in different ways, using LSTM models, transformers, recurrent transformers, etc. It can be applied to sentiment classification, part-of-speech tagging, topic modeling, and more.
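The standard recipe, reusing pre-trained layers and training only a new task head, can be sketched abstractly. All layer names here are hypothetical and the `Layer` class is a stand-in for real framework layers; the point is only the freeze-then-fine-tune pattern.

```python
class Layer:
    """Toy stand-in for a neural network layer with a trainable flag."""
    def __init__(self, name):
        self.name = name
        self.trainable = True

# Hypothetical pre-trained model: early layers hold general language features.
pretrained = [Layer("embedding"), Layer("encoder_1"), Layer("encoder_2")]
task_head = Layer("sentiment_classifier")  # new, task-specific layer

# Freeze the pre-trained layers; only the new head is updated on target data.
for layer in pretrained:
    layer.trainable = False

model = pretrained + [task_head]
trainable = [l.name for l in model if l.trainable]
print(trainable)  # ['sentiment_classifier']
```

In practice, frameworks expose the same switch directly (e.g., a per-layer trainable or requires-grad flag), and a later fine-tuning stage may unfreeze some of the pre-trained layers at a small learning rate.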

Source: https://arxiv.org/pdf/1910.07370.pdf

Wrapping Up

Translating text from one language to another is a complex and time-consuming process, and the meaning of a word can differ from speaker to speaker for various reasons. Deep learning approaches usually achieve the best results on such challenging machine learning problems as text translation and image captioning.

Source: https://arxiv.org/pdf/2006.03541.pdf

AUTHORS

Pradeepta Mishra


Head of AI-ML, L&T Infotech
(Lymbyc)