Part 3: Unsupervised Approaches (12-2pm)
This hands-on workshop builds on Part 2 by introducing the basics of Python's scikit-learn package to implement unsupervised text analysis methods. It will cover a) vectorization and document-term matrices, b) weighting (tf-idf), and c) uncovering patterns using topic modeling.
Prior knowledge: A basic familiarity with Python is required if you wish to follow along with the tutorial.
This workshop is one of a four-part series that will prepare participants to move forward with text analysis research, with a special focus on humanities and social science applications. Please register for each day separately.
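For a flavor of the Part 3 material, here is a minimal sketch of a document-term matrix and tf-idf weighting in plain Python. It is not the workshop's code: the workshop uses scikit-learn, whose vectorizers apply a smoothed and normalized variant of tf-idf. The toy corpus and all variable names are invented for illustration, and the weighting shown is the textbook form tf * log(N / df).

```python
import math
from collections import Counter

# Toy corpus, invented for illustration.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]

# Vectorization: lowercase each document, split on whitespace, count tokens.
counts = [Counter(doc.lower().split()) for doc in docs]

# The sorted vocabulary defines the columns of the document-term matrix.
vocab = sorted({term for c in counts for term in c})

# Document-term matrix: one row per document, one column per vocabulary term.
dtm = [[c[term] for term in vocab] for c in counts]

# Document frequency: the number of documents containing each term.
n_docs = len(docs)
doc_freq = {term: sum(1 for c in counts if term in c) for term in vocab}

# Textbook tf-idf (an assumption; scikit-learn's default formula differs).
tfidf = [
    [c[term] * math.log(n_docs / doc_freq[term]) for term in vocab]
    for c in counts
]
```

Terms that appear in many documents, such as "the", receive a lower weight than terms concentrated in a single document, such as "cat".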
Part 4: Supervised Methods (2-4pm)
In this workshop we will cover two main supervised text analysis methods: the dictionary method and supervised classification. We will use list comprehensions to implement the dictionary method, using sentiment analysis as our example. Using the Python library scikit-learn, we will also implement a few supervised classification techniques, including Naive Bayes and Support Vector Machines. Specific skills covered include a) measuring themes in text using dictionaries, b) feature selection, c) Support Vector Machines, d) Naive Bayes, e) cross-validation, and f) feature importance.
Prior knowledge: Basic familiarity with Python is required if you wish to follow along with the tutorial. Completion of D-Lab's Python FUN!damentals workshop series will be sufficient.
This workshop is one of a four-part series that will prepare participants to move forward with text analysis research, with a special focus on humanities and social science applications.
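For a flavor of the Part 4 material, here is a minimal sketch of the dictionary method applied to sentiment analysis, using list comprehensions. It is an illustration under stated assumptions, not the workshop's code: the word lists, the example sentences, and the helper name sentiment_score are all hypothetical, and real sentiment dictionaries are far larger.

```python
# Tiny hand-made sentiment dictionaries (hypothetical, for illustration only).
positive_words = {"good", "great", "happy", "love"}
negative_words = {"bad", "awful", "sad", "hate"}


def sentiment_score(text):
    """Return (positive count - negative count) / number of tokens."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    # List comprehensions keep only the tokens found in each dictionary.
    pos = [t for t in tokens if t in positive_words]
    neg = [t for t in tokens if t in negative_words]
    return (len(pos) - len(neg)) / len(tokens)


# Example sentences, invented for illustration.
reviews = [
    "I love this great book",
    "what an awful sad ending",
    "it was a book",
]
scores = [sentiment_score(r) for r in reviews]
```

Dividing by the token count keeps long and short texts comparable. A score above zero suggests more positive than negative dictionary words, and a score below zero suggests the reverse.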