Students will learn the basics of cleaning, transforming, and formatting text data. They will extract specific elements from text strings and compute simple metrics from text data, such as word counts, syntax quantification via part-of-speech (POS) tagging, and sentiment polarity. Students will also be introduced to topic modeling and word2vec methods. The libraries used are NLTK, TextBlob, and gensim. This is an interactive, hands-on workshop in which students complete challenges related to each text analysis task.
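As a rough preview of the cleaning and word-count tasks described above, here is a minimal sketch that uses only the Python standard library. It is not the workshop's own material: the workshop relies on NLTK, TextBlob, and gensim, and the sample sentence and the `clean_and_tokenize` helper are made up for illustration.

```python
import re
from collections import Counter


def clean_and_tokenize(text):
    """Lowercase the text and return its alphabetic word tokens.

    Hypothetical helper for illustration; punctuation and digits are dropped,
    and hyphenated words are split into their parts.
    """
    return re.findall(r"[a-z']+", text.lower())


# Illustrative sample text (an assumption, not workshop data).
sample = "The workshop is hands-on. The challenges are short, and the data is text!"

tokens = clean_and_tokenize(sample)
word_counts = Counter(tokens)

print("Number of tokens:", len(tokens))
print("Most common words:", word_counts.most_common(3))
```

A regular expression is a deliberately crude tokenizer. Library tokenizers such as those in NLTK handle contractions, punctuation, and sentence boundaries more carefully.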
Prior knowledge: Completion of D-Lab's Python for Everything Series.
Technology Requirements: Laptop required; please install the Anaconda distribution of Python 3 or its equivalent. The workshop uses Jupyter Notebook, but an IDE is also acceptable.
Please install the Python packages “gensim”, “nltk”, and “textblob”:
- pip install gensim
- pip install nltk
- pip install textblob
Or, if you use Anaconda:
- conda install gensim
- conda install nltk
- conda install textblob
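After installing, one way to confirm that the three packages are visible to your Python environment is a short check like the one below. This is a sketch, not an official setup step: the `missing_packages` helper is a hypothetical name, and the check only confirms that each package can be located, not that any additional data files a library may need have been downloaded.

```python
from importlib.util import find_spec

# Import names of the packages listed above.
REQUIRED = ["gensim", "nltk", "textblob"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Still need to install:", ", ".join(missing))
else:
    print("All workshop packages are importable.")
```

Run it in the same environment (for example, the same Jupyter kernel) that you plan to use during the workshop, since packages installed in one environment are not visible from another.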