Importing Dataset & Libraries

Pandas and numpy are the standard data-handling libraries. Seaborn and matplotlib are for visualization. Nltk and sklearn are used here mostly for natural language processing. Geopy, folium, pygal and IPython.display allow me to work with country coordinates and plot them on a world map.

Data Pre-processing

I had to change 'Burma' to 'Myanmar'; otherwise the geolocator library can't find its coordinates. The same goes for a few other countries. I also dropped 'World', since it is not a country.
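A minimal sketch of the renaming and dropping step, in plain Python so it runs without the dataset. Only the 'Burma' to 'Myanmar' rename and the 'World' drop come from the notes above; the names `RENAMES`, `DROP` and `clean_country_names` are made up for illustration, and the other renames would go into the same mapping.

```python
# Hypothetical mapping: only 'Burma' -> 'Myanmar' is taken from the notes;
# further renames would be added here in the same way.
RENAMES = {"Burma": "Myanmar"}

# Rows that are aggregates rather than countries.
DROP = {"World"}


def clean_country_names(names):
    """Apply the renames and skip entries that are not countries."""
    cleaned = []
    for name in names:
        name = name.strip()
        if name in DROP:
            continue
        cleaned.append(RENAMES.get(name, name))
    return cleaned


countries = clean_country_names(["Algeria", "Burma", "World", "Brazil"])
print(countries)
```

With a pandas DataFrame, the same idea would probably be a `replace` on the country column followed by a filter, but the mapping itself stays the same.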

I added the continent manually for Gambia and East Timor, since pycountry_convert doesn't recognize them.
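The manual fallback can be sketched roughly like this. The `lookup` argument stands in for whatever pycountry_convert call does the real work, and `toy_lookup` is a fake used only so the example runs on its own. I am assuming the lookup signals an unknown name by raising `KeyError`; if the real call fails differently, the `except` clause needs adjusting.

```python
# Manual overrides for names the automatic lookup rejects.
MANUAL_CONTINENTS = {"Gambia": "Africa", "East Timor": "Asia"}


def continent_for(country, lookup, overrides=MANUAL_CONTINENTS):
    """Try the automatic lookup first, then fall back to the manual table."""
    try:
        return lookup(country)
    except KeyError:
        # Returns None when there is no manual override either.
        return overrides.get(country)


def toy_lookup(country):
    """Stand-in for the real library call; it only knows two countries."""
    return {"Algeria": "Africa", "Brazil": "South America"}[country]


print(continent_for("Algeria", toy_lookup))
print(continent_for("East Timor", toy_lookup))
```

Keeping the overrides in one dictionary makes it easy to see exactly which countries were patched by hand.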

Get the coordinates of each country.

It turns out Africa is the continent with the most countries.

Use pycountry_convert to get country code.

The code for The Gambia is not recognized, so I added it manually with Wikipedia's help.

Natural Language Processing

Delete 'current situation: ' from text, since it is repeated in all country descriptions.
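A small sketch of that deletion, assuming the phrase sits at the start of each description (if it can appear elsewhere, a plain `str.replace` would be the simpler choice). The function name `strip_prefix` and the sample string are made up for illustration.

```python
PREFIX = "current situation: "


def strip_prefix(text, prefix=PREFIX):
    """Remove the leading prefix if present, ignoring letter case."""
    if text.lower().startswith(prefix):
        return text[len(prefix):]
    return text


# Placeholder text, not taken from the dataset.
print(strip_prefix("Current situation: example description text"))
```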

Text preprocessing
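The notes don't list the exact preprocessing steps, so this is only a sketch of a typical pipeline: lowercase the text, keep alphabetic tokens, and drop stop words. The tiny `STOP_WORDS` set is a stand-in I made up so the example runs without downloads; a real run would more likely use nltk's English stop-word list.

```python
import re

# Tiny stand-in stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "for"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens only, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


tokens = preprocess("The quick brown fox, and the lazy dog!")
print(tokens)
```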

Compute TF-IDF
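To show what the score measures, here is a hand-rolled sketch of the textbook formula, term frequency times log(N / document frequency), on toy token lists. It is not what sklearn computes exactly: `TfidfVectorizer` by default uses a smoothed idf and L2-normalizes each row, so its numbers will differ. The function name `tf_idf` and the sample tokens are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: score} dict per document."""
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Toy documents, not taken from the dataset.
docs = [["apple", "banana", "banana"], ["apple", "cherry"]]
result = tf_idf(docs)
print(result)
```

A term that appears in every document ('apple' here) scores zero, which is why repeated boilerplate carries no weight in the ranking.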

Algeria seems to occur frequently in the world map.

Compute polarity and subjectivity based on pre-processed text

South America has the lowest polarity.

However, it also has the highest subjectivity.

Visualization

I have to use the display_svg function to show the SVG in my notebook. Using the pygal library with country codes and polarity/subjectivity, I can use colour to represent the intensity of the values in each country.

Because I want to zoom in and see the actual values, I use folium to get an interactive map.

And that's it. If you want to contact me, feel free to reach out to my email.