Artificial Intelligence: Natural Language Processing for Accurate Sentiment Analysis
Introduction
Sentiment analysis using natural language processing (NLP) techniques has become one of the most exciting topics of this decade, driven by the widespread use of social media: analyzing interactions between people helps reveal important structural patterns in communication. Sentiment analysis uses computational methods to analyze, process and reveal the feelings, sentiments and emotions hidden behind a text or interaction. By determining the emotional tone behind words, it helps us understand the attitudes, opinions and emotions expressed in online content.
Classifying text into categories such as positive, negative or neutral provides valuable insights for businesses, researchers and policymakers. Sentiment analysis is widely applied in domains such as customer feedback, market research and social media monitoring. By categorizing sentiment, organizations can better understand consumer opinions, identify emerging trends and make informed decisions.
The figure below illustrates the common sentiment categories described above:
In this blog, we have developed a Python script that automates the sentiment analysis process, from loading the input texts to scoring, visualizing and saving the results. We used the VADER (Valence Aware Dictionary and sEntiment Reasoner) sentiment analysis tool from NLTK to score the sentiment of each input text.
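The script in this post reports raw VADER scores rather than the three labels mentioned above. One common convention, given in the VADER project's documentation, treats a compound score of 0.05 or more as positive, -0.05 or less as negative, and anything in between as neutral. A minimal sketch of that mapping follows; the function name `label_sentiment` is hypothetical, and the thresholds are an assumption you may want to tune for your data:

```python
def label_sentiment(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse label.

    The +/-0.05 cut-offs follow the convention in the VADER README;
    they are a tunable assumption, not a fixed rule.
    """
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'


# Compound scores taken from the JSON output later in this post
for score in (0.8881, -0.4404, 0.0):
    print(score, label_sentiment(score))
```

Applied to every compound score in a results file, this yields the positive/negative/neutral split described in the introduction.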
The diagram below illustrates the Sentiment Analysis workflow:
Sentiment Analysis – The script to perform the above process is provided below:
```python
# -*- coding: utf-8 -*-
import json
import logging

import matplotlib.pyplot as plt
import nltk
import pandas as pd
import seaborn as sns
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Setup logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# VADER returns 'neg', 'neu', 'pos' and 'compound'; map them to readable column names
SCORE_COLUMNS = {'neg': 'Negative', 'neu': 'Neutral', 'pos': 'Positive', 'compound': 'Compound'}


def download_nltk_data():
    # SentimentIntensityAnalyzer needs the VADER lexicon to be available locally
    nltk.download('vader_lexicon')
    logging.info('Downloaded NLTK data.')


def load_texts(file_path):
    # One text per line; surrounding whitespace is stripped and blank lines are skipped
    with open(file_path, 'r', encoding='utf-8') as file:
        texts = [line.strip() for line in file if line.strip()]
    logging.info(f'Loaded {len(texts)} texts from {file_path}.')
    return texts


def analyze_sentiment(texts, analyzer):
    # polarity_scores returns a dict with 'neg', 'neu', 'pos' and 'compound' keys
    results = [{'Text': text, 'Sentiment': analyzer.polarity_scores(text)} for text in texts]
    logging.info(f'Analyzed sentiment for {len(texts)} texts.')
    return results


def create_dataframe(results):
    # Expand each score dict into its own named column (by key, not by position)
    df = pd.DataFrame(results)
    scores = pd.DataFrame(df['Sentiment'].tolist()).rename(columns=SCORE_COLUMNS)
    df = pd.concat([df.drop(columns=['Sentiment']), scores], axis=1)
    logging.info('Created DataFrame from sentiment analysis results.')
    return df


def visualize_sentiment_distribution(df):
    # Histogram (with KDE curve) of the compound scores
    plt.figure(figsize=(10, 6))
    sns.histplot(df['Compound'], kde=True, bins=30)
    plt.title('Sentiment Compound Score Distribution')
    plt.xlabel('Compound Score')
    plt.ylabel('Frequency')
    plt.savefig('compound_score_distribution.png')
    plt.close()

    # Box plot of the negative, neutral and positive scores
    plt.figure(figsize=(10, 6))
    sns.boxplot(data=df[['Negative', 'Neutral', 'Positive']])
    plt.title('Sentiment Scores Distribution')
    plt.ylabel('Score')
    plt.savefig('sentiment_scores_distribution.png')
    plt.close()
    logging.info('Visualized sentiment distribution and saved as images.')


def save_results(results, file_path):
    with open(file_path, 'w', encoding='utf-8') as file:
        json.dump(results, file, indent=4)
    logging.info(f'Saved results to {file_path}.')


def perform_operation(input_file, output_file):
    # Paths are passed in explicitly instead of being read from module globals
    download_nltk_data()
    analyzer = SentimentIntensityAnalyzer()
    texts = load_texts(input_file)
    results = analyze_sentiment(texts, analyzer)
    df = create_dataframe(results)
    visualize_sentiment_distribution(df)
    save_results(results, output_file)


if __name__ == '__main__':
    input_file = 'sample_texts.txt'  # Path to the input file containing texts
    output_file = 'sentiment_analysis_results.json'  # Path to save the results
    try:
        perform_operation(input_file, output_file)
    except Exception:
        # Stop at the first failing step and log the full traceback
        logging.exception('Error performing sentiment analysis')
```
Key Features – An overview of the main components of the script:
- Downloading NLTK Data – The script starts by downloading the VADER (Valence Aware Dictionary and sEntiment Reasoner) lexicon from NLTK, which `SentimentIntensityAnalyzer` needs before it can score any text.
- Loading Texts – Reads the input file, treating each line as one text and stripping surrounding whitespace.
- Sentiment Analysis – Applies VADER to each text, producing negative, neutral, positive and compound scores.
- Creating DataFrame – The results are structured into a Pandas DataFrame, facilitating further analysis and visualization. This step organizes the sentiment scores in a tabular format, making them easier to manipulate and examine.
- Visualizing Results – Two plots are generated to visualize the sentiment analysis results:
  - Histogram of Compound Scores – Displays the overall sentiment distribution across all texts.
  - Box Plot of Sentiment Scores – Shows the spread and variability of the negative, neutral and positive scores.
- Saving Results – Saves each text and its raw VADER scores to a JSON file, which keeps the output readable and structured.
Sample Input File – Create a text file named `sample_texts.txt` with one text per line. Sample text input is provided below:
The city is so wonderful, it is a wonderful visiting destination throughout the year.
Your new website design is fantastic! I love it!
The movie was too long, but it was quite amazing.
The product that I ordered was delivered late and the package was damaged.
The new policy of the company is unclear, I'm not sure how I feel about it.
It is such a beautiful day outside, and I'm feeling great!
The new software update has too many bugs. It is very disappointing.
The location is ideal to reach to historic sites as well as to the coastline.
Artificial Intelligence (AI) is transforming the world in an amazing way!
The customer service is unhelpful! I never get a response for any of my requests.
Run the Script – Save the script as `sentiment_analysis.py` and run it from a shell:
python3 sentiment_analysis.py
Results and Analysis
The script generates two plots that provide insight into the sentiment analysis results. Let’s break down what each plot indicates:
Sentiment Compound Score Distribution
The histogram shows the distribution of the compound sentiment scores for the analyzed texts. The compound score is a single value that summarizes the overall sentiment of a text, ranging from -1 (most negative) to +1 (most positive).
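To see why the compound score always stays within [-1, +1], it helps to know how VADER builds it: it sums the rule-adjusted valence of each lexicon word in the text and then squashes that unbounded sum. The sketch below reproduces the squashing step as it appears in the VADER reference implementation, assuming its default alpha = 15; the raw sums fed in are illustrative values, not ones computed from the sample texts:

```python
import math


def normalize(score, alpha=15):
    """Squash an unbounded valence sum into [-1, 1].

    Mirrors the normalization in VADER's reference implementation
    (alpha = 15). The clamp also mirrors that code; mathematically the
    ratio already stays strictly inside (-1, 1).
    """
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))


# Illustrative raw valence sums, not derived from the sample texts
for raw in (-8.0, -1.0, 0.0, 1.0, 8.0):
    print(f'{raw:5.1f} -> {normalize(raw):.4f}')
```

Because the curve flattens as the raw sum grows, each additional positive (or negative) word moves the compound score toward +1 (or -1) by an ever smaller amount.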
From the plot, we can observe the following:
- Distribution – How the sentiment scores are spread across the texts.
- Central Tendency – Where most of the texts’ sentiment scores cluster. For instance, a peak near a positive score indicates that most texts are positive.
- Spread – The range of sentiment scores, indicating variability in sentiment.
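The distribution, central tendency and spread can also be read off numerically rather than only from the histogram. A short sketch with pandas, using the ten compound scores from the JSON output shown later in this post (the variable name `compound` is illustrative):

```python
import pandas as pd

# Compound scores copied from the sample JSON output
compound = pd.Series(
    [0.8881, 0.855, 0.7677, -0.4404, -0.4519, 0.8687, -0.5413, 0.6808, 0.8016, 0.0],
    name='Compound',
)

# Central tendency and spread (count, mean, std, min, quartiles, max)
print(compound.describe())
print('median:', compound.median())

# How many texts fall on each side of zero
print('positive:', (compound > 0).sum())
print('negative:', (compound < 0).sum())
print('zero:', (compound == 0).sum())
```

On this sample the median (about 0.72) sits well above the mean (about 0.34) because the three negative texts pull the mean down; with so few texts, the median is usually the better guide to where most scores cluster.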
Sentiment Scores Distribution
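The box plot shows the distribution of the negative, neutral and positive scores across all texts. In VADER, these three scores are the proportions of each text that fall into each category, so for a single text they add up to roughly 1.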
From the plot, we can observe that:
- Negative Sentiment – Most texts have a negative score of 0; only three texts score above zero (0.209 to 0.241).
- Neutral Sentiment – Neutral scores are typically the highest of the three (0.417 to 1.0 in this sample), largely because most words in a sentence carry no sentiment of their own.
- Positive Sentiment – Positive scores show how much of each text carries positive wording; six texts score between 0.355 and 0.583, while the remaining four score 0.
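A box plot marks as outliers the points lying more than 1.5 times the interquartile range (IQR) below the first quartile or above the third; this is the default whisker rule that seaborn's `boxplot` inherits from matplotlib. The sketch below applies that rule directly, assuming the same default, to the negative scores from the sample JSON output; `iqr_outliers` is a hypothetical helper name:

```python
import pandas as pd


def iqr_outliers(values, whis=1.5):
    """Return values outside [Q1 - whis*IQR, Q3 + whis*IQR] (Tukey's fences) and the fences."""
    s = pd.Series(values, dtype=float)
    q1, q3 = s.quantile(0.25), s.quantile(0.75)  # linear interpolation, like numpy's default
    iqr = q3 - q1
    low, high = q1 - whis * iqr, q3 + whis * iqr
    return s[(s < low) | (s > high)].tolist(), (low, high)


# 'neg' scores copied from the sample JSON output
negative = [0.0, 0.0, 0.0, 0.209, 0.234, 0.0, 0.241, 0.0, 0.0, 0.0]
outliers, fences = iqr_outliers(negative)
print('fences:', fences)
print('outliers:', outliers)
```

On this sample the rule flags no negative-score outliers, and running it on the positive and neutral columns also comes back empty, so for ten texts the boxes and whiskers say more than individual points do.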
JSON output
The JSON output of the sentiment analysis is shown below:
```json
[
    {
        "Text": "The city is so wonderful, it is a wonderful visiting destination throughout the year.",
        "Sentiment": {
            "neg": 0.0,
            "neu": 0.537,
            "pos": 0.463,
            "compound": 0.8881
        }
    },
    {
        "Text": "Your new website design is fantastic! I love it!",
        "Sentiment": {
            "neg": 0.0,
            "neu": 0.417,
            "pos": 0.583,
            "compound": 0.855
        }
    },
    {
        "Text": "The movie was too long, but it was quite amazing.",
        "Sentiment": {
            "neg": 0.0,
            "neu": 0.615,
            "pos": 0.385,
            "compound": 0.7677
        }
    },
    {
        "Text": "The product that I ordered was delivered late and the package was damaged.",
        "Sentiment": {
            "neg": 0.209,
            "neu": 0.791,
            "pos": 0.0,
            "compound": -0.4404
        }
    },
    {
        "Text": "The new policy of the company is unclear, I'm not sure how I feel about it.",
        "Sentiment": {
            "neg": 0.234,
            "neu": 0.766,
            "pos": 0.0,
            "compound": -0.4519
        }
    },
    {
        "Text": "It is such a beautiful day outside, and I'm feeling great!",
        "Sentiment": {
            "neg": 0.0,
            "neu": 0.417,
            "pos": 0.583,
            "compound": 0.8687
        }
    },
    {
        "Text": "The new software update has too many bugs. It is very disappointing.",
        "Sentiment": {
            "neg": 0.241,
            "neu": 0.759,
            "pos": 0.0,
            "compound": -0.5413
        }
    },
    {
        "Text": "The location is ideal to reach to historic sites as well as to the coastline.",
        "Sentiment": {
            "neg": 0.0,
            "neu": 0.645,
            "pos": 0.355,
            "compound": 0.6808
        }
    },
    {
        "Text": "Artificial Intelligence (AI) is transforming the world in an amazing way!",
        "Sentiment": {
            "neg": 0.0,
            "neu": 0.556,
            "pos": 0.444,
            "compound": 0.8016
        }
    },
    {
        "Text": "The customer service is unhelpful! I never get a response for any of my requests.",
        "Sentiment": {
            "neg": 0.0,
            "neu": 1.0,
            "pos": 0.0,
            "compound": 0.0
        }
    }
]
```
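To work with this file after the script has run, the nested "Sentiment" objects can be flattened back into columns. A minimal sketch using `pandas.json_normalize`, which joins nested keys with a dot by default; the helper names `flatten_results` and `load_results` are hypothetical, and the two records are copied from the output above. It also checks that `neg`, `neu` and `pos` add up to roughly 1, since VADER reports them as proportions of the text:

```python
import json

import pandas as pd


def flatten_results(records):
    """Turn [{'Text': ..., 'Sentiment': {...}}, ...] into a flat DataFrame."""
    df = pd.json_normalize(records)  # nested keys become 'Sentiment.neg', etc.
    return df.rename(columns=lambda c: c.replace('Sentiment.', ''))


def load_results(file_path):
    """Read the JSON file written by save_results() and flatten it."""
    with open(file_path, encoding='utf-8') as file:
        return flatten_results(json.load(file))


records = [
    {"Text": "Your new website design is fantastic! I love it!",
     "Sentiment": {"neg": 0.0, "neu": 0.417, "pos": 0.583, "compound": 0.855}},
    {"Text": "The new software update has too many bugs. It is very disappointing.",
     "Sentiment": {"neg": 0.241, "neu": 0.759, "pos": 0.0, "compound": -0.5413}},
]
df = flatten_results(records)
print(df[['neg', 'neu', 'pos', 'compound']])

# neg, neu and pos are proportions, so each row should sum to about 1
print((df[['neg', 'neu', 'pos']].sum(axis=1) - 1).abs().max() < 0.01)
```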
Insights from the Analysis
- Positive Sentiment Prevalence – The histogram of compound scores is skewed toward positive sentiment: six of the ten texts have positive compound scores, three are negative and one is exactly 0.
- Neutral Sentiment Dominance – The box plot shows that neutral scores were typically the highest of the three. Because these scores are proportions of the text, this means most words carry no sentiment, not that the texts were balanced in tone.
- Variability – Negative and positive scores vary widely between texts: in this sample, no text has both a non-zero negative and a non-zero positive score, so each text either leans clearly one way or carries no detected sentiment at all.
These results summarize the sentiment analysis and offer a quick way to understand the emotional tone of the input texts. They reveal trends, central tendencies and variability, which are crucial for interpreting the sentiment of the analyzed dataset.
Practical Applications
The insights derived from this sentiment analysis can drive various business and research applications:
- Sentiment Analysis Utility – The plots help in understanding the overall sentiment distribution of the given texts, which can be useful for gauging public opinion, customer feedback or social media sentiments.
- Data Quality – If there are many outliers or a wide spread, it might indicate diverse opinions or noisy data.
- Decision Making – Businesses and researchers can use these insights to make informed decisions, such as improving products based on negative feedback or leveraging positive sentiment for marketing.
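As one illustration of the decision-making point, a team reviewing feedback might pull out the texts that score as clearly negative, most negative first. The sketch below assumes the record structure the script saves to JSON and reuses the -0.05 cut-off convention from the VADER documentation; `most_negative` is a hypothetical helper name:

```python
def most_negative(records, threshold=-0.05):
    """Return (compound, text) pairs at or below the threshold, most negative first."""
    flagged = [
        (r['Sentiment']['compound'], r['Text'])
        for r in records
        if r['Sentiment']['compound'] <= threshold
    ]
    return sorted(flagged)  # ascending compound score, so the most negative comes first


# Records from the sample output, abbreviated to the compound score
records = [
    {'Text': 'The product that I ordered was delivered late and the package was damaged.',
     'Sentiment': {'compound': -0.4404}},
    {'Text': 'The new software update has too many bugs. It is very disappointing.',
     'Sentiment': {'compound': -0.5413}},
    {'Text': 'The customer service is unhelpful! I never get a response for any of my requests.',
     'Sentiment': {'compound': 0.0}},
    {'Text': 'Your new website design is fantastic! I love it!',
     'Sentiment': {'compound': 0.855}},
]
for score, text in most_negative(records):
    print(f'{score:+.4f}  {text}')
```

Note that the customer-service complaint is not flagged: VADER scored it 0.0, so a threshold filter like this one can miss negative feedback that the scorer fails to pick up, and a manual spot check remains worthwhile.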
Conclusion
The sentiment analysis approach presented in this article outlines an end-to-end pipeline for conducting sentiment analysis using Python and NLTK. By loading text data, applying VADER for sentiment scoring and visualizing the results, this method transforms raw textual data into valuable emotional insights. These insights can enhance decision-making and strategy formulation across diverse fields such as business intelligence, market research and academic studies. With its coverage of data handling, sentiment scoring and result visualization, this solution serves as a strong foundation for sentiment analysis tasks.