Students will learn the basics of cleaning, transforming, and formatting text data. They will build corpora, extract specific elements from text strings, and compute simple metrics from text data, such as word counts and sentiment polarity. Students will be introduced to document classification. This workshop introduces the basics of NLTK and gensim.
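As a rough illustration of the kind of metric mentioned above, the sketch below computes word counts using only the Python standard library. It is not taken from the workshop materials, which use NLTK and gensim; the function name word_counts, the simple regular-expression tokenizer, and the sample sentence are all assumptions made for this example.

```python
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text, split it into word tokens, and count them.

    The regex tokenizer is a deliberately simple stand-in; a real
    pipeline would more likely use a library tokenizer.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Hypothetical sample text, used only for illustration.
sample = "The cat sat on the mat. The mat was flat."
counts = word_counts(sample)

# Show the two most frequent words with their counts.
print(counts.most_common(2))
```

A Counter behaves like a dictionary, so individual counts can be looked up directly (for example, counts["mat"]).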
Knowledge Requirements: Python for Everything series
Technology Requirements: Laptop required; please install the Anaconda distribution of Python 3 or its equivalent. The workshop will utilize the Jupyter Notebook, but IDEs are also acceptable.
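Please install the Python packages “gensim”, “textblob”, and “nltk” before the workshop, for example with pip (pip install gensim textblob nltk).
Or, if you have Anaconda, install the same packages with conda.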
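Once the packages are installed, a quick check like the one below can confirm that Python is able to find them. This is only a sketch: the helper name missing_packages is hypothetical, and the check looks for the import names gensim, textblob, and nltk without actually loading the libraries.

```python
from importlib.util import find_spec

# Import names of the packages the workshop asks for.
REQUIRED = ["gensim", "textblob", "nltk"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required packages were found.")
```

Run it in the same environment (for example, the same Jupyter kernel) that will be used during the workshop, since packages installed elsewhere will not be visible.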