This workshop covers clustering and topic modeling in Python, primarily using scikit-learn and gensim. We will first read in a corpus, prepare the data, build a TF-IDF matrix, and cluster the documents with k-means. We will then compare the results with the LSI and LDA topic modeling approaches.
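As a rough preview of that pipeline, here is a minimal sketch using scikit-learn only. The six-sentence toy corpus, the choice of two clusters, and the variable names are assumptions made for illustration, not the workshop's materials. `TruncatedSVD` applied to the TF-IDF matrix stands in for LSI here; the workshop itself may use gensim's models instead.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus with two obvious themes (not the workshop dataset).
docs = [
    "The detective investigates a murder in London.",
    "A detective hunts the murder suspect through London.",
    "The police detective solves the murder case.",
    "A spaceship crew explores a distant planet.",
    "The crew of the spaceship lands on an alien planet.",
    "Astronauts leave the planet aboard their spaceship.",
]

# Build the TF-IDF matrix: one row per document, one column per term.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(docs)

# Cluster the documents with k-means; k=2 is an assumption for this toy data.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(tfidf)

# LSI-style projection: truncated SVD of the TF-IDF matrix into 2 dimensions.
svd = TruncatedSVD(n_components=2, random_state=0)
lsi = svd.fit_transform(tfidf)

for doc, label in zip(docs, labels):
    print(label, doc)
```

On a real corpus, the number of clusters and the number of LSI dimensions would need to be chosen by inspection or by a quality measure rather than fixed up front.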
Prerequisites: Attendees should either have a thorough knowledge of Python already or have attended the Python for Everything series.
Please install or download the following ahead of the workshop:
Python 3 (https://www.continuum.io/downloads)
Packages:
Dataset: http://www.cs.cmu.edu/~dbamman/booksummaries.html