English version and some details are below the image, keep scrolling!

A list of the tags found in tweets collected over the summer holidays (June to August) of 2017. The tweets were filtered by the word ‘polityka’.

Clustering Twitter data is not easy. I know that now :). A single tweet is mostly noise, so in the end you get a sparse corpus matrix with a density far below 1%. Put simply, the whole corpus contains around 50,000 words (dimensions), but a single tweet has only about 10 words (and probably only one or two of them carry any meaning). Lots of noise and sparsity. As a result, a clustering algorithm like k-means puts all the tweets into one big cluster.
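To put a number on that sparsity, here is a quick back-of-the-envelope sketch. It uses only the rough figures quoted above (about 10 words per tweet, about 50,000 words in the vocabulary); the function name is made up for illustration and every tweet is assumed to have the same length.

```python
def matrix_density(nonzero_per_row, n_columns):
    """Fraction of non-zero cells in a tweet-by-word matrix,
    assuming every row (tweet) has the same number of non-zero entries."""
    return nonzero_per_row / n_columns

# Rough figures from the text: ~10 words per tweet, ~50,000 distinct words.
density = matrix_density(10, 50_000)
print(f"{density:.4%}")  # prints 0.0200%
```

So roughly 2 cells in every 10,000 are non-zero, and most of those few are noise words anyway.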

So I decided to change my approach and try to get something different out of the collected data.

My first attempt was to make a word cloud from the hashtags.

To do it I used a nice Python module:

  • website: http://amueller.github.io/word_cloud/
  • blog: http://peekaboo-vision.blogspot.co.uk/2012/11/a-wordcloud-in-python.html
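The hashtag-to-cloud step could look roughly like the sketch below. This is not the exact code behind the image: the sample tweets are invented, the function names are hypothetical, and the regex is a simplification of what Twitter treats as a hashtag. The only calls taken from the word_cloud module are `WordCloud`, `generate_from_frequencies` and `to_file`.

```python
import re
from collections import Counter

# Simplified hashtag pattern: '#' followed by word characters (an assumption).
HASHTAG_RE = re.compile(r"#(\w+)")

def count_hashtags(tweets):
    """Count hashtags across tweet texts, ignoring case."""
    counts = Counter()
    for text in tweets:
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(text))
    return counts

def save_hashtag_cloud(counts, path="hashtags.png"):
    """Render hashtag frequencies as a word cloud image (needs `pip install wordcloud`)."""
    from wordcloud import WordCloud  # imported here so counting works without it
    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(dict(counts))
    cloud.to_file(path)

# Made-up sample tweets, only for illustration.
sample_tweets = [
    "Nowa ustawa #polityka #Sejm",
    "Debata trwa #sejm #polityka",
    "Bez hashtagów, tylko polityka",
]
hashtag_counts = count_hashtags(sample_tweets)
print(hashtag_counts.most_common(2))
```

With the wordcloud package installed, calling `save_hashtag_cloud(hashtag_counts)` writes the image to `hashtags.png`. Feeding frequencies directly, instead of raw text, keeps each hashtag as a single token.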

This image was made from Twitter data collected during June, July and August 2017. The data was filtered by the Polish word ‘polityka’.

The English version is a bit lower, keep scrolling!

The first and very basic results of an analysis of tweets about politics. The analysis covers tweets collected over one month. I will write more about the analysis method and further plans in upcoming posts.


English version!

My first and really basic results from a Twitter data mining project. The data set is a one-month collection of tweets. The tweets were filtered by the Polish word for ‘politics’ and its declined forms. In the next couple of posts I will talk more about the technique used to get this data and the future goals for this project.