QUALIFICATION Committee: MARIO ANDRE DE MENEZES COSTA



A MASTER'S QUALIFICATION committee has been registered by the program.

STUDENT: MARIO ANDRE DE MENEZES COSTA
DATE: 07/06/2019
TIME: 09:00
LOCATION: LaCCAN/CPMAT
TITLE:
Using Deep Neural Networks and Word-embedding Model for Short Text Categorization and Sentiment Analysis

ABSTRACT:
Classifying short texts and analyzing their sentiment based on the similarity between their words is a major computational challenge. Different approaches are adopted depending on the purpose of the classification and on the communication channel from which the information is extracted, such as blogs, social networks, or news sites. This diversity has attracted great interest from researchers, and one of the most widely adopted approaches to classification and sentiment analysis of short texts uses Deep Neural Networks such as Convolutional Neural Networks (CNNs).

However, this process involves some obstacles. It usually depends on a supervised learning model whose training base is provided a priori. This is a significant challenge: some conventional approaches rely on the Bag-of-Words model, but there is not always enough data for it to achieve good results. One alternative is to use word-embedding models such as Word2Vec.
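The following sketch makes the data-sparsity problem of Bag-of-Words concrete. It builds count vectors over a small dictionary. The vocabulary, the example tweets, and the helper name `bag_of_words` are hypothetical and are not taken from the study's dataset.

A tweet that uses only words outside the dictionary maps to an all-zero vector, so a classifier trained on this representation has nothing to work with. A pretrained Word2Vec model can still assign such words vectors close to related known words, provided the model saw them during its own training.

```python
from collections import Counter

def bag_of_words(text, vocabulary):
    """Count how often each dictionary word occurs in the text.

    Words outside the (small, hypothetical) dictionary are silently dropped,
    which is the limitation that motivates word embeddings.
    """
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

# Hypothetical dictionary extracted from a handful of training tweets.
vocabulary = ["rain", "snow", "storm", "chicago", "weather"]

print(bag_of_words("Heavy rain in Chicago today", vocabulary))  # -> [1, 0, 0, 1, 0]
print(bag_of_words("Downpour floods the Loop", vocabulary))     # -> [0, 0, 0, 0, 0]
```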

Word-embedding-based models that simply average the vectors of a text's words give good results in different situations. For example, they can indicate which class a given word is most related to, or whether the words of a short text are closest to words with a positive or negative connotation. Their disadvantage, however, is that they cannot capture relationships between words or the order in which words appear.
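The following is a minimal sketch of the averaging approach, under some stated assumptions:

* A tiny hand-made table of 3-dimensional vectors stands in for a pretrained Word2Vec model.
* The seed words and the cosine-similarity decision rule are illustrative choices, not the author's configuration.
* The helper names are hypothetical.

The last line shows the limitation mentioned above: reordering the words leaves the averaged vector unchanged.

```python
import math

# Hypothetical 3-dimensional "embeddings"; a real Word2Vec model would supply
# vectors with hundreds of dimensions learned from a large corpus.
EMBEDDINGS = {
    "sunny": [0.9, 0.1, 0.0],
    "warm": [0.8, 0.2, 0.1],
    "nice": [0.7, 0.0, 0.2],
    "storm": [-0.8, 0.3, 0.1],
    "freezing": [-0.9, 0.1, 0.0],
    "awful": [-0.7, 0.0, 0.3],
    "day": [0.0, 0.5, 0.5],
}

def average_vector(words):
    """Mean of the embeddings of known words; unknown words are skipped."""
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def polarity(text, positive_seeds, negative_seeds):
    """Label a short text by the seed centroid its averaged vector is closer to."""
    text_vec = average_vector(text.lower().split())
    if text_vec is None:
        return "unknown"
    pos = cosine(text_vec, average_vector(positive_seeds))
    neg = cosine(text_vec, average_vector(negative_seeds))
    return "positive" if pos >= neg else "negative"

print(polarity("What a warm sunny day", ["nice"], ["awful"]))   # -> positive
print(polarity("Freezing storm all day", ["nice"], ["awful"]))  # -> negative
# Word order is lost: both orderings produce exactly the same averaged vector.
print(average_vector("storm day".split()) == average_vector("day storm".split()))  # -> True
```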

To address this, we combine Convolutional Neural Networks, a small dictionary, and an additional knowledge base (Word2Vec). This mitigates the problem of training the model on a small dictionary and adds criteria of relationship and order between words. Each word is represented by an embedding vector, and neighboring words are related through the convolution matrix. The resulting features are then reduced with MaxPooling and fed to a dense neural network.
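The following sketch traces the data flow described above in plain Python:

1. Filters of several window sizes slide over consecutive word vectors.
2. Max-over-time pooling keeps each filter's strongest response.
3. A dense layer with a sigmoid produces a score for the "climate" class.

This is an untrained forward pass. The embedding values, filter weights, window sizes, sigmoid output, and function names are all hypothetical and do not describe the author's actual architecture or hyperparameters.

```python
import math

def conv1d_feature_map(sentence, kernel, bias):
    """Slide one filter over every window of h consecutive word vectors.

    sentence: list of n word vectors, each of dimension d
    kernel:   list of h weight vectors, each of dimension d (h = window size)
    Returns n - h + 1 ReLU activations, one per window of neighboring words.
    """
    h = len(kernel)
    feature_map = []
    for start in range(len(sentence) - h + 1):
        window = sentence[start:start + h]
        z = bias + sum(
            w * x
            for w_row, x_row in zip(kernel, window)
            for w, x in zip(w_row, x_row)
        )
        feature_map.append(max(0.0, z))  # ReLU non-linearity
    return feature_map

def cnn_forward(sentence, filters, dense_weights, dense_bias):
    """Filters of several widths -> max-over-time pooling -> dense layer -> sigmoid."""
    pooled = []
    for kernel, bias in filters:
        fmap = conv1d_feature_map(sentence, kernel, bias)
        # MaxPooling keeps only the strongest response of each filter, so the
        # pooled vector has one entry per filter regardless of tweet length.
        pooled.append(max(fmap, default=0.0))
    z = dense_bias + sum(w * p for w, p in zip(dense_weights, pooled))
    return 1.0 / (1.0 + math.exp(-z))  # score for the positive ("climate") class

# Toy example: a 4-word tweet with 2-dimensional embeddings (hypothetical values).
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.0]]
filters = [
    ([[1.0, 0.0], [0.0, 1.0]], 0.0),               # window size 2
    ([[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]], -1.0),  # window size 3
]
print(cnn_forward(sentence, filters, dense_weights=[1.0, -1.0], dense_bias=0.0))
```

With these toy weights the pooled features are 2.0 and 1.0, so the dense layer outputs a score of about 0.731.

In a real system:

* The embedding rows would come from Word2Vec.
* The weights would be learned by backpropagation.
* A deep learning framework would replace these explicit loops.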

This chapter therefore evaluates the performance of classification models that use only the average of the word-embedding representations of each word, an approach with the limitations noted above. It compares them with models that combine CNNs with word embeddings. We use data collected from Twitter news channels (social sensing) that specifically discuss climate information in the city of Chicago. From these data we extract our small dictionary, whose capabilities are expanded with Word2Vec, and the CNN uses multiple filters with variable window sizes. Finally, we measure the accuracy of both approaches in detecting Tweets from the city of Chicago that talk about the climate.

KEYWORDS:
convolutional neural network, deep neural network, word embedding model, neural network, sentiment analysis

PAGES: 15
MAJOR AREA: Exact and Earth Sciences
AREA: Computer Science

COMMITTEE MEMBERS:
Internal - 310567 - ELIANA SILVA DE ALMEIDA
News item registered on: 03/04/2019 16:11
SIGAA | NTI - Information Technology Center - (82) 3214-1015 | Copyright © 2006-2024 - UFAL