Machine Learning - Particle Swarm Optimization (PSO) and Twitter

Introduction

We live in a world where analyzing massive sets of unstructured data is becoming a business need. Most of the time we spend on the internet is spent on social media, and even our daily lives are affected by the people around us: we tend to change our opinions and thoughts about something based on other people’s ideas and opinions. Some may call this bad and others may call it good. Either way, being able to understand other people’s opinions is really helpful, and that is where sentiment analysis comes in. In sentiment analysis we classify data samples into positive, negative or neutral classes and draw conclusions from the data.

A swarm is a large number of agents interacting locally with one another. In a swarm, there is no supervisor or central control giving orders on how to behave. Swarm-based algorithms are popular today because of the interest in nature-inspired, population-based algorithms that can produce fast, low-cost, high-quality solutions to problems that are considered complex and hard to solve. For that reason, swarm intelligence has become a valuable branch of Artificial Intelligence that models the patterns and behavior of social swarms in nature, for example bird flocks, honey bees, ant colonies and fish schools.

Figure: Fish schooling

Even though these agents (insects or other swarm individuals) are simple and have limited capabilities on their own, they work together to achieve a common goal for their survival. This cooperation does not depend on individual knowledge. The main mechanism of interaction is called stigmergy, in which agents coordinate indirectly through the traces their actions leave in the environment.

“Stigmergy is a well-defined form of self-organization. It produces complex, seemingly intelligent structures without need for any control, planning or direct communication between the individual agents.”

Based on swarm intelligence, Kennedy and Eberhart developed a simple mathematical model in 1995 to describe the social behavior of fish and birds, and it was called Particle Swarm Optimization (PSO). PSO is one of the best-known and most useful metaheuristics today because it has been applied successfully to a wide range of optimization problems. The basic principle of the model is self-organization, which describes the dynamics of complex systems. PSO uses a highly simplified model of social behavior to solve optimization problems in a cooperative and intelligent framework.

Figure: (a) Experimental setup and drawings of the selection of the short branches by a colony of Linepithema humile, 4 and 8 min after the bridge was placed. (b) Distribution of the percentage of ants that selected the shorter branch over n experiments. The longer branch is r times longer than the short branch. The second graph (n = 18, r = 2) corresponds to an experiment in which the short branch is presented to the colony 30 min after the long branch: the short branch is not selected, and the colony remains trapped on the long branch.

PSO algorithms have attracted researchers in fields such as fuzzy logic systems, neural networks, optimization, pattern recognition, robotics and signal processing.

We can use feature selection methods to remove unnecessary and redundant attributes from a dataset. Removing those attributes matters because irrelevant features can reduce the accuracy of the algorithm we use to make predictions.

Feature selection is not an easy task because of the size of the search space, which grows with the number of features in the dataset: with n features there are 2^n candidate subsets. The key goal of feature selection is to improve the quality and performance of the predictor. Feature selection acts as a bridge in preprocessing, since we define the features and the selection process just before we step into the extraction process. Applications of feature selection include data classification, image classification, cluster analysis, data analysis, image retrieval, opinion mining and review analysis.

Using a wrapper method, feature extraction can be done in two stages. In stage one, we extract all the Twitter data as a dataset and transform each tweet into plain text. In the next stage, we add more features to the feature vector. A class label is assigned to each tweet in the training data, and the data is then passed to several classifiers. At the end we get classified tweets that are positive, negative or neutral. Wrapper methods generally achieve better results than filter methods.

Feature selection (FS) is seen as an optimization problem because the aim is to obtain an optimal subset of relevant features from data that also contains irrelevant and redundant ones. Many evolutionary algorithms have been used to optimize feature selection, including genetic algorithms and swarm algorithms. Swarm-based optimization algorithms for feature selection include Particle Swarm Optimization (PSO), Artificial Bee Colony (ABC) and Ant Colony Optimization (ACO).

Methods and Methodologies

On Twitter, users write their opinions and views in short messages called tweets. Tweets may contain text, links, media or recordings. They are not written to any standard and mostly contain short structures, slang, abbreviations, grammatical mistakes, half-written sentences, misspellings and so forth, so extracting useful information from them is difficult because of their unstructured style. By applying clustering, we try to reduce the complexity of finding the event or topic a tweet is about. Extracting the words that match a particular event is handled with a vector space model that uses term frequency and inverse document frequency (TF-IDF). Particle Swarm Optimization is well suited to clustering tweets represented this way.
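To make the vector space model more concrete, here is a minimal sketch of TF-IDF weighting in plain Python. It assumes the tweets have already been split into lists of tokens. The function name tfidf_vectors and the exact weighting (relative term frequency times the natural log of inverse document frequency) are illustrative choices, not necessarily the variant used in the original experiments.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, vectors): one TF-IDF vector per tokenized document."""
    vocabulary = sorted({term for doc in documents for term in doc})
    n_docs = len(documents)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for doc in documents for term in set(doc))

    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vector = []
        for term in vocabulary:
            tf = counts[term] / len(doc) if doc else 0.0
            idf = math.log(n_docs / doc_freq[term])
            vector.append(tf * idf)
        vectors.append(vector)
    return vocabulary, vectors


if __name__ == "__main__":
    docs = [["stroke", "risk"], ["stroke", "care"]]
    vocab, vecs = tfidf_vectors(docs)
    for row in vecs:
        print(dict(zip(vocab, row)))
```

A term that appears in every document (such as "stroke" in the toy data) gets a weight of zero, which is why TF-IDF helps pick out the words that distinguish one event from another.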

Tweet data clustering is divided into 4 major stages:

1. Gathering of twitter data

2. Preprocessing of gathered twitter data

3. Feature extraction

4. Data clustering (Using PSO)

Gathering of twitter data

The Twitter API plays the major role by letting us extract tweets of the specific categories we need. In the first phase, Twitter data is collected with the help of the Tweepy library for Python.

We store the tweets of each category in a separate CSV file (which we call a document) in the folder for that category.
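The gathering step could look roughly like the sketch below. It assumes Tweepy 4.x with a bearer token for the Twitter API v2 and uses tweepy.Client.search_recent_tweets. The helper names fetch_tweets and save_category, the folder layout and the CSV columns are hypothetical choices for illustration, not the exact script used for this post.

```python
import csv
from pathlib import Path


def fetch_tweets(bearer_token, query, limit=100):
    """Fetch recent tweets as (id, text) pairs. Needs network access and Tweepy 4.x."""
    import tweepy  # imported lazily so the CSV helper works without Tweepy installed

    client = tweepy.Client(bearer_token=bearer_token)
    # The recent search endpoint accepts max_results between 10 and 100.
    response = client.search_recent_tweets(query=query, max_results=limit)
    return [(tweet.id, tweet.text) for tweet in (response.data or [])]


def save_category(tweets, category, root="tweets"):
    """Write (id, text) pairs to <root>/<category>/<category>.csv and return the path."""
    folder = Path(root) / category
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{category}.csv"
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id", "text"])
        writer.writerows(tweets)
    return path


if __name__ == "__main__":
    # Placeholder data stands in for fetch_tweets("YOUR_BEARER_TOKEN", "epilepsy").
    sample = [(1, "Learning about epilepsy care"), (2, "Telemedicine helps, too")]
    print(save_category(sample, "epilepsy"))
```

Keeping the download and the CSV writing in separate functions makes it easy to rerun the later stages on saved files without calling the API again.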

Preprocessing of Tweets

In the next stage, the documents are pre-processed: the data is cleansed and a corpus is created for cluster analysis.
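As a rough illustration of data cleansing, the sketch below lowercases each tweet, strips URLs, @mentions, the hash symbol, punctuation and digits, and then builds a corpus from the non-empty results. The specific cleaning rules and the names clean_tweet and build_corpus are assumptions; the original pipeline may clean the text differently.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
MENTION_PATTERN = re.compile(r"@\w+")
NON_LETTER_PATTERN = re.compile(r"[^a-z\s]")


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, mentions, '#' symbols and non-letters."""
    text = text.lower()
    text = URL_PATTERN.sub(" ", text)
    text = MENTION_PATTERN.sub(" ", text)
    text = text.replace("#", " ")  # keep the hashtag word, drop the symbol
    text = NON_LETTER_PATTERN.sub(" ", text)
    return " ".join(text.split())  # collapse repeated whitespace


def build_corpus(tweets):
    """Clean every tweet and keep only those that still contain text."""
    cleaned = (clean_tweet(tweet) for tweet in tweets)
    return [tweet for tweet in cleaned if tweet]


if __name__ == "__main__":
    raw = ["RT @user: Telemedicine is GREAT!!! https://t.co/abc #health", "https://t.co/x"]
    print(build_corpus(raw))
```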

Feature extraction

In the third phase, tokenization divides the tweets into words, phrases and symbols called tokens, and stemming reduces each word to its base or root form. A stop word list is maintained to identify and filter out the common words in tweets.
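The three operations above can be sketched in a few lines of standard-library Python. The tiny stop word list and the suffix-stripping naive_stem function are deliberately simplified stand-ins (a real pipeline would more likely use a full stop word list and an established stemmer such as Porter's), so treat this as an illustration of the idea only.

```python
import re

# A very small stop word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "the", "to", "with"}
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(word):
    """Strip one common suffix, keeping at least three characters of the word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_tokens(text):
    """Tokenize, drop stop words, then stem the remaining tokens."""
    return [naive_stem(token) for token in tokenize(text) if token not in STOP_WORDS]


if __name__ == "__main__":
    print(extract_tokens("Doctors are treating strokes with telemedicine"))
```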

Data clustering using PSO:

We apply the PSO algorithm for clustering once the tweet pre-processing has finished. As the first step of PSO, we choose the number of particles. A particle represents a candidate solution for clustering the streaming tweets, so a swarm is a group of candidate clustering solutions. Each particle is represented as follows:

P = (C1, C2, ..., Cm, ..., Cn)

where ‘Cm’ is the centroid vector of the m-th cluster and ‘n’ is the number of clusters.

After the particles are initialized, each tweet is assigned to the closest centroid vector of each particle. The fitness of each particle is calculated as the average similarity, measured with cosine similarity, between each cluster centroid and the tweets in the document vector space that belong to that cluster. Experimental results show that PSO clustering outperforms hierarchical and partitioning clustering techniques.
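The clustering step described above can be sketched as follows, using only the standard library. Each particle is a list of n centroid vectors, tweets are assigned to the most similar centroid, and fitness is the average cosine similarity between centroids and their assigned tweets. The function names, the inertia and acceleration values (0.72, 1.49, 1.49), the initialization from randomly sampled tweets and the toy vectors are all assumptions made for illustration; this is one plausible reading of the method, not the exact implementation behind the reported results.

```python
import math
import random


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign(particle, vectors):
    """Index of the most similar centroid for every tweet vector."""
    return [max(range(len(particle)), key=lambda m: cosine_similarity(particle[m], v))
            for v in vectors]


def fitness(particle, vectors):
    """Average, over non-empty clusters, of the mean tweet-to-centroid similarity."""
    labels = assign(particle, vectors)
    cluster_means = []
    for m, centroid in enumerate(particle):
        members = [v for v, label in zip(vectors, labels) if label == m]
        if members:
            cluster_means.append(
                sum(cosine_similarity(centroid, v) for v in members) / len(members))
    return sum(cluster_means) / len(cluster_means)


def pso_cluster(vectors, n_clusters, n_particles=10, iterations=50,
                inertia=0.72, c1=1.49, c2=1.49, seed=0):
    """Return (best centroids, cluster label per tweet, best fitness)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # Each particle starts from n_clusters distinct tweets (needs n_clusters <= len(vectors)).
    swarm = [[list(v) for v in rng.sample(vectors, n_clusters)]
             for _ in range(n_particles)]
    velocity = [[[0.0] * dim for _ in range(n_clusters)] for _ in range(n_particles)]
    best_pos = [[c[:] for c in p] for p in swarm]
    best_fit = [fitness(p, vectors) for p in swarm]
    g = max(range(n_particles), key=best_fit.__getitem__)
    global_pos, global_fit = [c[:] for c in best_pos[g]], best_fit[g]

    for _ in range(iterations):
        for i, particle in enumerate(swarm):
            for m in range(n_clusters):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Pull towards the particle's own best and the swarm's best.
                    velocity[i][m][d] = (inertia * velocity[i][m][d]
                                         + c1 * r1 * (best_pos[i][m][d] - particle[m][d])
                                         + c2 * r2 * (global_pos[m][d] - particle[m][d]))
                    particle[m][d] += velocity[i][m][d]
            fit = fitness(particle, vectors)
            if fit > best_fit[i]:
                best_fit[i], best_pos[i] = fit, [c[:] for c in particle]
                if fit > global_fit:
                    global_fit, global_pos = fit, [c[:] for c in particle]
    return global_pos, assign(global_pos, vectors), global_fit


if __name__ == "__main__":
    toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # stand-in for TF-IDF vectors
    centroids, labels, score = pso_cluster(toy, n_clusters=2)
    print(labels, round(score, 4))
```

Because fitness is a similarity, the swarm maximizes it. A distance-based fitness would be minimized instead, with the comparisons flipped.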

Here, I extracted tweets related to Telemedicine, Epilepsy and Heart Strokes and saved them into three different CSV files. To follow up on the next part, refer to these two articles I’ve published.

Healthcare Tweet extraction, Sentiment Analysis and Visualization using Python (www.jayasekara.blog)

Extracting Twitter Data, Pre-Processing and Sentiment Analysis using Python 3.0 (www.jayasekara.blog)

Thank you.

Keen on getting to know me and my work? Click here for more!

1 Comments

Vale Co Xenia, July 18, 2024 at 2:24 PM
Particle Swarm Optimization (PSO) is a powerful computational optimization technique inspired by the collective behavior of social creatures like swarming birds or schools of fish. In machine learning, PSO is used to find optimal solutions for various problems, particularly when dealing with complex, multi-dimensional functions.

Here's how PSO works in machine learning:

Particle Swarm: Imagine a swarm of particles representing potential solutions to your problem. Each particle has a position in the solution space and a velocity that determines its movement.
Fitness Function: Every position in the solution space is evaluated using a fitness function that measures how good a solution it is for your problem (e.g., minimizing error in a classification task).

Personal Best: Each particle keeps track of its own best-known position (pBest) based on the fitness function. This represents the best solution it has encountered so far.
Global Best: The entire swarm also tracks the global best position (gBest), which is the best position discovered by any particle in the swarm. This allows particles to learn from each other and explore promising areas of the solution space.
Velocity Update: Based on its current position, pBest, and gBest, each particle updates its velocity. This update considers how close the particle is to its own best solution and the swarm's best solution, encouraging movement towards optimal regions.
Position Update: Using the updated velocity, each particle updates its position in the solution space. This iterative process continues until a stopping criterion is met (e.g., a certain number of iterations or reaching a sufficiently good solution).
