TF-IDF Clustering: Facebook Antivax Posts (Pre-censorship)

TL;DR: Of the 6 clusters, clusters 3 and 4 are mostly related to pets. I could not analyze the comments because Facebook removed the posts.

0.1.0 - 27/5/21
While looking for anti-vaxxer texts to analyze, I found a news article. The author explained that anti-vaxxers' posts are running rampant on Facebook. He provided a Google Sheets spreadsheet listing such alleged posts. I was delighted to see that it includes each post's message along with its number of likes and other reactions, so I decided to use the dataset for a new data science project. The Jupyter Notebook code is available here. I began with data cleaning and imputation. Luckily the dataset was mostly clean, but my computer took forever to clean the text.

0.2.0 - 31/5/21
I had an emergency with my thesis to deal with, but I'm glad I got it settled. For text pre-processing, I learnt to split the dataset into chunks so that the cleaning runs on one piece at a time. As part of that pre-processing, I removed non-English words.
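To illustrate the chunking idea, here is a minimal sketch, not the notebook's actual code. The helper names (`iter_chunks`, `keep_english`, `clean_in_chunks`), the tiny `ENGLISH_VOCAB` set and the sample posts are all made up for illustration. A real run would read the spreadsheet export and filter against a full English word list.

```python
import re
from itertools import islice

# Hypothetical toy vocabulary; a real run would load a full English word list.
ENGLISH_VOCAB = {
    "vaccines", "are", "not", "safe", "my", "dog", "the", "cat", "is", "fine",
}


def iter_chunks(rows, chunk_size):
    """Yield successive lists of at most chunk_size items from rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def keep_english(text, vocab=ENGLISH_VOCAB):
    """Lowercase the text, split it into letter runs, keep only words in vocab."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(tok for tok in tokens if tok in vocab)


def clean_in_chunks(posts, chunk_size=2):
    """Clean posts one chunk at a time so only a small slice is processed at once."""
    cleaned = []
    for chunk in iter_chunks(posts, chunk_size):
        cleaned.extend(keep_english(post) for post in chunk)
    return cleaned


# Made-up sample posts, not taken from the dataset.
posts = [
    "Vaccines are NOT safe!!",
    "Mi perro está bien, the dog is fine",
    "My cat is fine",
]
cleaned_posts = clean_in_chunks(posts, chunk_size=2)
print(cleaned_posts)
```

With a file on disk, the same pattern could be applied by reading it in pieces (for example with the `chunksize` argument of pandas' `read_csv`) instead of slicing a list in memory.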

0.3.0 - 1/6/21
It was a good day. Not only did I add TF-IDF and text clustering, I also used dashboards wherever possible to make the visualizations easier to read. With tabs, I don't need to keep scrolling up and down in my notebook to compare different visuals. You can see them in the next section on this page.
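For readers new to the technique, here is a rough, dependency-free sketch of TF-IDF followed by clustering. It makes several assumptions. This entry does not say which clustering algorithm was used, so k-means is my stand-in. The functions `tfidf_vectors` and `kmeans`, the toy documents and `k=2` are illustrative only (the project itself ended up with 6 clusters). In a notebook, scikit-learn's `TfidfVectorizer` and `KMeans` would normally do this work.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, list of L2-normalised TF-IDF vectors)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF, so a term found in every document keeps a positive weight.
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1 for tok in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = [counts[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def kmeans(vectors, k, n_iter=20):
    """Plain k-means; the first k vectors seed the centroids (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up toy documents, not taken from the dataset.
docs = [
    "vaccine injury story",
    "dog cat pet vaccine",
    "vaccine injury report",
    "pet dog cat food",
]
vocab, vectors = tfidf_vectors(docs)
labels = kmeans(vectors, k=2)
print(labels)
```

The word "vaccine" appears in three of the four toy documents, so its IDF weight is low, and the rarer words ("injury", "pet", "dog", "cat") end up driving the grouping. That is the same effect that lets a pet-themed cluster stand out in a corpus where almost every post mentions vaccines.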

0.4.0 - 2/6/21
Well, looks like I hit a roadblock. I wanted to analyze the comments, but Facebook has removed most of the most-commented posts due to censorship. That's a shame. Nevertheless, I will start another antivax analysis project using a different data source.

Dashboard

current_freq

current_cloud

current_pages

current_corr