Topic Modeling & Sentiment Analysis of Electric Vehicles
Project Brief
Social networking sites like Twitter are valuable sources of information. Data from these sites can be extracted and analyzed to extend the reach of a company's products and services. In this project, data sourced from Twitter through the Twitter API is used to understand public feelings and perceptions about electric vehicles.
The purpose of the project is to develop a fully functional web app that can analyze how people communicate their feelings and thoughts about different products and services. The Twitter data was used to develop topic modeling and sentiment analysis models that predict the topics and sentiments associated with electric vehicles.
The problem addressed is that user comments are unstructured, so the data requires pre-processing before it can yield meaningful input for analysis. The extracted data may contain @ mentions, hashtags, emoji, repeated words, and so on. The goal of the project is to understand user sentiment so that enterprises can make better business decisions related to electric vehicles.
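The cleaning described above can be illustrated with a minimal sketch. It is not the project's actual pipeline: the function name `clean_tweet`, the regular expressions, the emoji ranges, and the choice to keep hashtag words while dropping the `#` sign are all assumptions made for illustration.

```python
import re

# Hypothetical patterns for the noise types named in the brief.
MENTION_RE = re.compile(r"@\w+")        # @ mentions
HASHTAG_RE = re.compile(r"#(\w+)")      # hashtags (the word itself is kept)
# Rough emoji coverage only; real emoji handling needs a fuller range list.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F]")
REPEAT_RE = re.compile(r"\b(\w+)(\s+\1\b)+")  # immediately repeated words


def clean_tweet(text):
    """Return a lowercased tweet with mentions, emoji, and repeats removed."""
    text = text.lower()
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(r"\1", text)   # "#ev" becomes "ev"
    text = EMOJI_RE.sub(" ", text)
    text = REPEAT_RE.sub(r"\1", text)    # "so so much" becomes "so much"
    return re.sub(r"\s+", " ", text).strip()


print(clean_tweet("@tesla I love love LOVE my new #EV so so much"))
# -> i love my new ev so much
```

Lowercasing first lets the repeated-word pattern treat "love" and "LOVE" as the same word without relying on case-insensitive backreferences.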
Approach
In this study, the first step is to build a pre-processing pipeline based on natural language processing (NLP) methods that cleans and selects the tweets. In the second step, exploratory data analysis (EDA), word clouds, and topic modeling were used to examine several aspects of electric vehicles; topic modeling was performed with Latent Dirichlet Allocation (LDA) to infer the topics discussed. In the third step, the Valence Aware Dictionary and sEntiment Reasoner (VADER) is used to classify the sentiment of tweets about electric vehicles as positive, negative, or neutral, and SONAR is used to flag hateful and offensive tweets.
The methodology followed in this project is the Team Data Science Process (TDSP), which describes the data science life cycle in the following stages:
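As a rough illustration of the third step, VADER returns a compound score between -1 and 1 for each text, which is then mapped to a label. The sketch below assumes the conventional cutoffs of +0.05 and -0.05 suggested in VADER's documentation; the brief does not state which thresholds the project used, and the function name `label_from_compound` is hypothetical.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label.

    The default threshold is the commonly cited VADER convention and is
    an assumption here, not a value taken from the project.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# With the vaderSentiment package installed, the score would typically
# come from:
#   SentimentIntensityAnalyzer().polarity_scores(text)["compound"]
for score in (0.62, 0.0, -0.41):
    print(score, label_from_compound(score))
```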
- Business understanding
- Data acquisition and understanding
- Modeling
- Deployment
- Customer acceptance
Topic modeling performance was explored using the content of the tweets to determine topics, and the resulting topic modeling and sentiment analysis models are used to predict electric vehicle-related topics and sentiment.
The NLP techniques applied to the problem are text mining, topic modeling, and VADER and SONAR analysis. The Streamlit framework is used to develop the functional web app.
Results
In this project, 45,000 tweets, together with related hashtags, user locations, and different topics concerning electric vehicles, were collected through the Twitter API. According to VADER, 47.1% of the tweets are positive, 42.4% are neutral, and 10.5% are negative. SONAR identified critical tweets that were hateful or offensive. Topic models and the VADER and SONAR analysis models were built and deployed. The analysis indicates growing trust and predominantly positive sentiment, which suggests that electric vehicles will be widely accepted in the near future.
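A percentage breakdown like the one reported can be computed from per-tweet labels as in the sketch below. The labels here are toy values for illustration, not the project's data, and the function name `sentiment_shares` is hypothetical.

```python
from collections import Counter


def sentiment_shares(labels):
    """Return the percentage of each sentiment label, rounded to 0.1."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        label: round(100 * counts[label] / total, 1)
        for label in ("positive", "neutral", "negative")
    }


# Toy input: 5 positive, 4 neutral, 1 negative.
toy_labels = ["positive"] * 5 + ["neutral"] * 4 + ["negative"]
print(sentiment_shares(toy_labels))
# -> {'positive': 50.0, 'neutral': 40.0, 'negative': 10.0}
```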
Keywords: Twitter, Tweets, Topic Modeling, Sentiment Analysis, VADER, SONAR, pyLDA, Latent Dirichlet Allocation, Machine Learning, Natural Language Processing, Streamlit, Heroku, Deployment, Polarity, Word Cloud