NLP with NLTK in Python

Instructors:

Natalie Ahn

Natalie is a PhD student in Public Policy, studying executive authority and institutional change in developing countries. She is currently using natural language processing to extract event data from text, and is generally interested in computational approaches to analyze sequences of events and nonlinear relationships. Natalie leads workshops in R and Python at the D-Lab.

When & Where

Date:

Wed, November 16, 2016 - 12:00 PM to 3:00 PM

Location:

Barrows 356: D-Lab Convening Room

Description

Type:

Workshop

This workshop covers core topics in Natural Language Processing, primarily using NLTK. We'll work with a corpus of documents and learn how to identify different types of linguistic structure in the text, which can help in classifying the documents or extracting useful information from them. We'll cover tokenization, part-of-speech (POS) tagging, chunking of phrases, named entity recognition (NER), and dependency parsing.
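As a rough preview of two of these steps, here is a minimal sketch of tokenization and noun-phrase chunking with NLTK. It is not taken from the workshop materials. The example sentence, the hand-assigned POS tags, and the one-rule chunk grammar are all illustrative assumptions, chosen so the snippet runs without downloading any tagger models or corpora.

```python
from nltk import RegexpParser
from nltk.tokenize import wordpunct_tokenize

# Illustrative sentence (an assumption, not from the workshop corpus).
sentence = "The quick brown fox jumps over the lazy dog."

# Tokenization: wordpunct_tokenize is regex-based and needs no downloaded data.
tokens = wordpunct_tokenize(sentence)

# POS tags assigned by hand for this sketch (Penn Treebank style).
# A trained tagger such as nltk.pos_tag would normally supply these.
tags = ["DT", "JJ", "JJ", "NN", "VBZ", "IN", "DT", "JJ", "NN", "."]
tagged = list(zip(tokens, tags))

# Chunking: group an optional determiner, any adjectives, and a noun
# into a noun phrase (NP) using a single hypothetical grammar rule.
chunker = RegexpParser("NP: {<DT>?<JJ>*<NN>}")
tree = chunker.parse(tagged)

# Collect the text of each NP chunk found in the parse tree.
noun_phrases = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees()
    if subtree.label() == "NP"
]

print(tokens)
print(noun_phrases)
```

In practice the tags would come from a trained tagger rather than being typed in, and the chunk grammar would usually need more rules than this one.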

Prior knowledge: Attendees should have thorough knowledge of Python. Completion of D-Lab's Python for Everything series will be sufficient.

Technology requirements: Please install Python 3 and the following packages before the workshop.

NLTK (In Bash: $ pip install nltk)
Brown corpus from NLTK (In Python: >>> import nltk; nltk.download('brown'))
Movie Reviews corpus from NLTK (In Python: >>> nltk.download('movie_reviews'))
Stanford Parser: Download the Stanford Parser 3.6.0 and unzip to a location that's easy for you to find (e.g. a folder called SourceCode in your Documents folder)

Keyword:

Python, Machine Learning, Text Analysis

Details

Training Host:

D-Lab

D-Lab Facilitator:

Stephanie Smith

Format Detail:

Interactive, hands-on
