Introduction
We all live in a world where analyzing a massive set of unstructured data is becoming a business need. And the time we spend on the internet is basically the time we spend on social media. Even our daily life is affected by the people around us. And we are tending to change our opinions and thoughts towards something based on other people’s ideas and opinions. Some may call this bad and at the same time, some may call it good. Anyway, if we have the ability to think of other’s opinion that’d be really helpful and that’s when sentiment analyzes comes up. sentiment analysis we do classification of the data samples into positive, negative or neutral classes and bring out proper conclusions from the data.
A swarm is a large number of agents interacting locally with themselves. In swarm, there’s no supervisor or central control to give orders on how to behave. Swarm-based algorithms are popular in this age hence its’s passion for nature-inspired, population-based algorithms that can produce high-quality product with low cost and fast solutions to situations that are identified as complex and hard to solve. Because of that reason Swarm intelligence is becoming a million-dollar gem in the category of Artificial Intelligence that is ready to collect the pattern, lifestyle and behavior of social swarms in the environment, for example, bird flocks, honey bees, ant colonies, and fish schooling.
Even though these agents (These insects/ Swarm individuals) are simpleminded with not enough experience on their own, they all work together to achieve a common goal for their survival. It’s not a matter of knowledge and the main mechanism of this interaction is called stigmergy which uses agents or actions for indirect coordination.
“ Stigmergy is a well-defined form of self-organization. It produces complex structure, seemingly intelligent structures, without need for any control, planning or any direct communication in between the individual agents.”
Based on Swarm Intelligence a simple mathematical model was developed by Kennedy and Eberhart in 1995, they majorly want to describe and discuss the social behavior of fish and birds and it was called the Particle Swarm Optimization (PSO). PSO is one of the most famous and very useful metaheuristics in the current age hence it showed the success of various optimization problems after applied on. The basic principle of this model is self-organization that describes the dynamics of complex systems. PSO uses an extremely streamlined model of social conduct to take care of the optimization problems, in a cooperative and intelligent framework.
The PSO algorithms are most fascinating and pulled in to the researcher in the field of fuzzy logic system, neural network, optimization, pattern recognition, robot technology, signal processing, etc.
To delete unnecessary duplicate attributes from a dataset we can use feature selection methods. Removing those attributes is a must because at the end of the day we do not want to decrease the accuracy of the algorithm which we use to predict something in the future.
Feature selecting is not an easy task because the size of the search space, Search space is proportionately increased with the features that are included in the data set. The key principle of the feature selection is to improve the quality and performance of the predictor. Feature selection acts as a bridge for the preprocessing and we are defining the feature and selection process just before we step into the extraction process. Applications of Feature selection includes data classification, image classification, cluster analysis, data analysis, image retrieval, opinion mining, review analysis, etc. Using a method called wrapper method, feature extraction can be done using two stages: The stage one is, extracting all the twitter data as a dataset and we transform this tweet into a normal text stage. What we do in the next stage is all about adding more features to the feature vector. A class label is assigned to each tweet data in the training data sector and then pass these data to several classifiers for the process of classification and at the end we get classified tweet data which are either positive, negative or neutral. Wrapper method achieves superior results than filter methods. FS is seen as an optimization problem because obtaining an optimal subset of relevant features from irrelevant and redundant data is very important. Many evolutionary algorithms have been used for optimizing the feature selection, which includes genetic algorithms and swarm algorithms. Some of the swarm-based optimization algorithms for feature selection include Particle Swarm Optimization (PSO), Artificial Bee Colony (ABC) and Ant Colony Optimization (ACO).
Methods and Methodologies
Based on Swarm Intelligence a simple mathematical model was developed by Kennedy and Eberhart in 1995, they majorly want to describe and discuss the social behavior of fish and birds and it was called the Particle Swarm Optimization (PSO). PSO is one of the most famous and very useful metaheuristics in the current age hence it showed the success of various optimization problems after applied on. The basic principle of this model is self-organization that describes the dynamics of complex systems. PSO uses an extremely streamlined model of social conduct to take care of the optimization problems, in a cooperative and intelligent framework.
In twitter, a user is permitted to write their own opinion and views in a short message on the social platform twitter are termed as tweets. Tweets may contain characters, links, media or a recording. These tweets are not written to a standard and it has mostly short structures, slangs, abbreviations, mistakes in grammar, half-written sentences, misspellings and so forth. So, it is perplexing to expel the useful data from tweets because of their unstructured style. By applying clustering we’re trying to reduce the complexity of finding the occasion or theme of the tweet composed. To extract specific words that are matching to a particular event data is addressed by vector space demonstrate using term recurrence and inverse term recurrence. So, Particle swarm change framework is the best method to handle this issue.
Tweet data clustering is categorized into 4 major stages which are:
- Gathering of twitter data
2. Preprocessing of gathered twitter data
3. Feature extraction
4. Data clustering (Using PSO)
Gathering of twitter data
The twitter API acts as the major role by giving us the opportunity to extract the definite category tweets as we need. In the first phase, Collection of twitter data is done with the help with the Tweepy library on Python.
We use a separate CSV file (We term it as a document) to store all individual tweet in the given folder of its category.
Preprocessing of Tweets
In the next stage, pre-processing of documents includes the following stages: data cleansing and a corpus of data are created for cluster analysis.
Feature extraction
In the third phase, Tokenization is done to divide the tweets into words, phrases and symbols termed as tokens and Stemming is done to reduce the actual word to their base or root form. Stop word list is maintained to observe the common words in tweets.
Data clustering using PSO:
We’re applying the PSO algorithm for clustering after the tweet pre-processing has been finished. As the first step of PSO, we’re counting and declaring the number of particles. A particle means a potential clarification to constellation the streaming tweets. So, a swarm consists of a group of contestant clustering solutions of streaming tweets. Each particle is represented as follows:
P = (C1, C2, C3,C4…Cm. Cn)
where ‘Cm’ signifies the clusters centroid vector and ’n’ is the number of clusters.
After initialization of the particles, the closest centroid vector is in tweets are assigned to each particle. The health of each particle is calculated by studying the average similarity between the cluster centroid and a tweet in the document vector space, belonging to that cluster using cosine correlation measure. Experimental results show that PSO clustering outperforms over hierarchical and partitioning clustering techniques.
Here, I was trying to extract tweets related to Telemedicine, Epilepsy and Heart Strokes and save each tweet into three different CSV files. To follow up the next part, Refer these two articles I’ve piblished.
Home Research Healthcare Tweet extraction, Sentiment Analysis and Visualization using Python Abstract We all live in a…www.jayasekara.blog
Because that's a must, nowadays people don't tweet without emojis, as in a matter of fact it became another language…www.jayasekara.blog
Thank you.
1 Comments
Particle Swarm Optimization (PSO) is a powerful computational optimization technique inspired by the collective behavior of social creatures like swarming birds or schools of fish. In machine learning, PSO is used to find optimal solutions for various problems, particularly when dealing with complex, multi-dimensional functions.
ReplyDeleteHere's how PSO works in machine learning:
Particle Swarm: Imagine a swarm of particles representing potential solutions to your problem. Each particle has a position in the solution space and a velocity that determines its movement.
Fitness Function: Every position in the solution space is evaluated using a fitness function that measures how good a solution it is for your problem (e.g., minimizing error in a classification task).
Machine Learning Final Year Projects
Deep Learning Projects for Final Year Students
Personal Best: Each particle keeps track of its own best-known position (pBest) based on the fitness function. This represents the best solution it has encountered so far.
Global Best: The entire swarm also tracks the global best position (gBest), which is the best position discovered by any particle in the swarm. This allows particles to learn from each other and explore promising areas of the solution space.
Velocity Update: Based on its current position, pBest, and gBest, each particle updates its velocity. This update considers how close the particle is to its own best solution and the swarm's best solution, encouraging movement towards optimal regions.
Position Update: Using the updated velocity, each particle updates its position in the solution space. This iterative process continues until a stopping criterion is met (e.g., a certain number of iterations or reaching a sufficiently good solution).