This four-part interactive workshop series is a complete introduction to the capabilities of the Python language. By the end of the series, you will be able to apply basic principles of programming and data manipulation to collect data, process unstructured data, analyze tabular data, and automate the entire process.
Part 4: Using Python to analyze data
Students will plan, build, and execute an analytical pipeline using both schemaless text data and numeric tabular data. Students will learn the basics of cleaning, transforming, and formatting text data, and will be able to pull simple metrics from corpora. Students will also summarize and analyze tabular data.
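As a rough illustration of the kind of pipeline described above, here is a minimal sketch that cleans a short text, pulls a few simple metrics from it with NLTK, and summarizes a small table with Pandas. The sample sentence, the `scores` table, and all variable names are made up for this example and are not taken from the workshop materials.

```python
import pandas as pd
from nltk.probability import FreqDist
from nltk.tokenize import RegexpTokenizer

# Hypothetical sample text, used only for illustration.
raw_text = (
    "The quick brown fox jumps over the lazy dog. "
    "The dog barks; the fox runs away!"
)

# Clean and transform: lowercase the text, then keep only runs of letters,
# which drops punctuation and whitespace.
tokenizer = RegexpTokenizer(r"[a-z]+")
tokens = tokenizer.tokenize(raw_text.lower())

# Simple corpus metrics: total tokens, vocabulary size, most frequent word.
freq = FreqDist(tokens)
token_count = len(tokens)
vocab_size = len(freq)
top_word = freq.most_common(1)

print("tokens:", token_count)
print("vocabulary:", vocab_size)
print("most common:", top_word)

# Hypothetical tabular data, used only for illustration.
scores = pd.DataFrame(
    {
        "group": ["a", "a", "b", "b"],
        "score": [1.0, 3.0, 2.0, 6.0],
    }
)

# Summarize the table: mean score per group.
group_means = scores.groupby("group")["score"].mean()
print(group_means)
```

`RegexpTokenizer` is used here because it needs no extra NLTK data downloads; other NLTK tokenizers may require downloading additional resources first.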
Requirements:
Please ensure the Python packages NLTK and Pandas are installed:
NLTK
$ pip install nltk
or
$ conda install nltk
Pandas
$ pip install pandas
or
$ conda install pandas
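To confirm that both packages are available before the session, one option is a short check like the sketch below. It uses only the Python standard library; the helper name `is_installed` is hypothetical and chosen for this example.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found by the import system."""
    return importlib.util.find_spec(package_name) is not None


# Check the two packages this workshop requires.
status = {name: is_installed(name) for name in ("nltk", "pandas")}

for name, ok in status.items():
    print(f"{name}: {'installed' if ok else 'missing'}")
```

If either package is reported as missing, rerun the matching pip or conda command above in the same environment you use to run Python.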