General interest in NLP, plus starting a new job where management had expressed an interest in social media analytics.
Tools and Techniques
MySQL (data modelling, ETL procedures)
This project delivers three capabilities: first, the ability to harvest tweets relating to pre-configured search terms; second, a set of NLP analytics run on these tweets; and third, a dashboard to visualise and explore the tweets and the analytic results.
Tweet Harvester – A solution that can be left running indefinitely and that periodically (e.g. hourly) pulls tweets relating to categories of pre-set search terms (e.g. [McDonalds, Maccy D, Big Mac], [Burger King, Whopper]). Tweets are processed for properties (user, retweet status, etc.) and archived into a sensibly normalised database for later exploration. In addition to the raw tweet message, each tweet is tokenised and stemmed to base words, which are also stored in the normalised database.
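As a rough illustration of the harvester's processing step, the sketch below tokenises a tweet, stems each token and archives the result into a small normalised schema. Everything in it is an assumption made for illustration: the table and column names are hypothetical, an in-memory SQLite database stands in for MySQL so the example runs without a server, `naive_stem` is a crude suffix stripper rather than a real stemmer (such as Porter), and the retweet check is a simple "RT " prefix test. The call to the Twitter API is left out, and the category is assumed to be known from the search that returned the tweet.

```python
import re
import sqlite3


def tokenise(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(token):
    """Crude suffix stripper; a stand-in for a real stemmer such as Porter."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters would remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def create_schema(conn):
    """Create hypothetical tables: tweets, distinct stems, and a link table."""
    conn.executescript(
        """
        CREATE TABLE tweet (
            tweet_id   INTEGER PRIMARY KEY,
            category   TEXT,
            user_name  TEXT,
            is_retweet INTEGER,
            message    TEXT
        );
        CREATE TABLE stem (
            stem_id INTEGER PRIMARY KEY,
            stem    TEXT UNIQUE
        );
        CREATE TABLE tweet_stem (
            tweet_id INTEGER REFERENCES tweet(tweet_id),
            stem_id  INTEGER REFERENCES stem(stem_id),
            position INTEGER
        );
        """
    )


def archive_tweet(conn, tweet_id, category, user_name, message):
    """Store the raw tweet, then one tweet_stem row per stemmed token."""
    is_retweet = int(message.startswith("RT "))  # assumed retweet heuristic
    conn.execute(
        "INSERT INTO tweet VALUES (?, ?, ?, ?, ?)",
        (tweet_id, category, user_name, is_retweet, message),
    )
    for position, token in enumerate(tokenise(message)):
        stem = naive_stem(token)
        # Each distinct stem is stored once and referenced by id.
        conn.execute("INSERT OR IGNORE INTO stem (stem) VALUES (?)", (stem,))
        stem_id = conn.execute(
            "SELECT stem_id FROM stem WHERE stem = ?", (stem,)
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO tweet_stem VALUES (?, ?, ?)",
            (tweet_id, stem_id, position),
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    archive_tweet(conn, 1, "burger_king", "alice", "Loving the Whoppers")
    for row in conn.execute("SELECT stem_id, stem FROM stem ORDER BY stem_id"):
        print(row)
```

Keeping stems in their own table means each base word is stored once, and per-category term counts become a join and a GROUP BY rather than repeated text processing.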
NLP Analytics – A set of analytics (e.g. topic modelling, term frequency-inverse document frequency (TF-IDF), etc.) to derive insight from the tweet data. In addition, higher-level analytics aim to provide insight into the differences between sets of search terms, e.g. McDonalds terms versus Burger King terms.
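To make the TF-IDF idea and the category comparison concrete, here is a minimal sketch in plain Python. It assumes one common TF-IDF variant (term count divided by document length, multiplied by the natural log of N over document frequency); libraries often use smoothed variants, and the project's actual formula is not stated. The function names and the idea of pooling each category's stemmed tokens into a single "document" are illustrative assumptions, not the project's confirmed method.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document.

    documents is a list of token lists. Scores use
    tf = count / len(doc) and idf = ln(N / document_frequency).
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document

    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append(
            {
                term: (count / total) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return scores


def distinguishing_terms(tokens_by_category, top_n=3):
    """Pool each category's tokens into one document and rank by TF-IDF."""
    categories = list(tokens_by_category)
    scores = tf_idf([tokens_by_category[c] for c in categories])
    return {
        category: sorted(score, key=lambda t: (-score[t], t))[:top_n]
        for category, score in zip(categories, scores)
    }


if __name__ == "__main__":
    # Hypothetical stemmed tokens per search-term category.
    tokens = {
        "mcdonalds": ["big", "mac", "fri", "burger"],
        "burger_king": ["whopper", "burger", "whopper"],
    }
    print(distinguishing_terms(tokens, top_n=2))
```

Under this variant a term that appears in every category (here "burger") scores zero, so the ranking surfaces the words that set one set of search terms apart from the other.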
Dashboard – A Tableau dashboard that presents both the Twitter data (e.g. a tweet explorer) and the results of the analytics.