Text Analysis FUN!damentals: Unsupervised Approaches, Supervised Methods

Instructors:

Ben Gebre-Medhin

Ben (www.gebre-medhin.com) is a PhD candidate in sociology. His interests are in the subfields of organizations, the professions, and higher education. His dissertation focuses on elite universities and the MOOC movement. As part of that project, he uses topic models and text analysis tools to document changes in discourse within an organizational or professional field over time.


When & Where

Date:

Fri, August 18, 2017 - 12:00 PM to 4:00 PM

Location:

Barrows 356: Convening Room

Description

Type:

Workshop

Part 3: Unsupervised Approaches (12-2pm)

This hands-on workshop builds on Part 2 by introducing the basics of Python's scikit-learn package for implementing unsupervised text analysis methods. It covers a) vectorization and document-term matrices, b) weighting (tf-idf), and c) uncovering patterns using topic modeling.
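For a sense of what these three steps look like in scikit-learn, here is a minimal sketch. It is not the workshop's own material: the four-sentence corpus is invented, and the choice of LDA as the topic model and of two topics are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Toy corpus, invented for illustration only.
docs = [
    "the university offers online courses",
    "online courses reach many students",
    "the senate passed the budget bill",
    "the budget bill funds the university",
]

# a) Vectorization: build a document-term matrix of raw counts
#    (rows are documents, columns are vocabulary terms).
count_vec = CountVectorizer()
dtm = count_vec.fit_transform(docs)
# Vocabulary terms ordered by their column index in the matrix.
vocab = sorted(count_vec.vocabulary_, key=count_vec.vocabulary_.get)

# b) Weighting: tf-idf down-weights terms that appear in many documents.
tfidf = TfidfVectorizer().fit_transform(docs)

# c) Topic modeling: LDA on the raw counts (assumed model, 2 topics).
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(dtm)  # one topic distribution per document

# Show the three highest-weighted terms for each topic.
for k, weights in enumerate(lda.components_):
    top_terms = [vocab[i] for i in np.argsort(weights)[::-1][:3]]
    print(f"topic {k}: {top_terms}")
```

On a corpus this small the topics are not meaningful; the point is only the shape of the workflow, from counts to weights to topic distributions.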

Prior knowledge: A basic familiarity with Python is required if you wish to follow along with the tutorial.

This workshop is one of a four-part series that will prepare participants to move forward with text analysis research, with a special focus on humanities and social science applications. Please register for each day separately.

Part 4: Supervised Methods (2-4pm)

In this workshop we will cover two main supervised text analysis methods: the dictionary method and supervised classification. We will use list comprehensions to implement the dictionary method, using sentiment analysis as our example. Using the Python library scikit-learn, we will also implement a few supervised classification techniques, including Naive Bayes and Support Vector Machines. Specific skills covered include a) measuring themes in text using dictionaries, b) feature selection, c) Support Vector Machines, d) Naive Bayes, e) cross-validation, and f) feature importance.
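As a rough illustration of both methods, the sketch below scores sentiment with a dictionary and list comprehensions, then cross-validates two scikit-learn classifiers. Everything specific in it is an assumption for illustration: the tiny word lists, the labeled sentences, the `sentiment_score` helper, and the choice of `MultinomialNB` and `LinearSVC` with tf-idf features and 4-fold cross-validation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# --- Dictionary method -------------------------------------------------
# Hypothetical mini-lexicons; real sentiment dictionaries are much larger.
positive = {"good", "great", "love", "excellent"}
negative = {"bad", "poor", "hate", "terrible"}

def sentiment_score(text):
    """Return (positive hits - negative hits) / number of tokens."""
    tokens = [t.strip(".,!?") for t in text.lower().split()]
    pos = len([t for t in tokens if t in positive])
    neg = len([t for t in tokens if t in negative])
    return (pos - neg) / len(tokens) if tokens else 0.0

# --- Supervised classification ----------------------------------------
# Invented labeled examples: 1 = positive, 0 = negative.
texts = [
    "a great film with an excellent cast",
    "i love the music and the story",
    "good pacing and a great ending",
    "an excellent and moving performance",
    "a terrible script and poor acting",
    "i hate the ending of this film",
    "bad dialogue and a poor story",
    "terrible pacing and bad music",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Cross-validate each classifier inside a pipeline so that the tf-idf
# vocabulary is learned only from the training folds.
results = {}
for name, clf in [("naive_bayes", MultinomialNB()), ("linear_svm", LinearSVC())]:
    pipe = make_pipeline(TfidfVectorizer(), clf)
    results[name] = cross_val_score(pipe, texts, labels, cv=4)
    print(name, results[name].mean())
```

With eight sentences the accuracy figures carry no weight; the sketch only shows how the dictionary score and the cross-validated pipelines are put together.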

Prior knowledge: Basic familiarity with Python is required if you wish to follow along with the tutorial. Completion of D-Lab's Python FUN!damentals workshop series will be sufficient.

This workshop is one of a four-part series that will prepare participants to move forward with text analysis research, with a special focus on humanities and social science applications.

Keyword:

Software Tools, Python, Text Analysis

Details

Training Host:

D-Lab

D-Lab Facilitator:

Ben Gebre-Medhin

Format Detail:

Interactive, hands-on
