Need to move multiple data files with similar names to a different folder? Perform text replacement across a large dataset or document? Count the most frequently occurring words in a text?
Regular expressions come up in a variety of contexts in data manipulation and analysis. In this workshop, we will cover the use of regular expressions in Python. The workshop introduces commonly used Python functions and libraries for matching patterns and manipulating text and files, including the regular-expression functions re.match and re.search, as well as str.replace and shutil, which are often used alongside them. At the end of the workshop, we will also briefly introduce the Natural Language Toolkit (nltk) library.
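As a rough illustration of how the functions named above differ, here is a minimal sketch using a made-up file name (`data_2016_01.csv` is hypothetical). `re.sub` is not on the list above; it is included only as the regular-expression counterpart to `str.replace`, which works on literal substrings and does not interpret patterns. The snippet is written so it runs under both Python 2.7 and Python 3.

```python
import re

filename = "data_2016_01.csv"  # hypothetical example file name

# re.match only tries to match at the *beginning* of the string,
# so a pattern for four digits fails because the name starts with "data"
start_match = re.match(r"\d{4}", filename)
print(start_match)  # None

# re.search scans the whole string and returns the first match anywhere
year = re.search(r"\d{4}", filename).group(0)
print(year)  # 2016

# str.replace is plain substring replacement: no pattern syntax at all
dashed = filename.replace("_", "-")
print(dashed)  # data-2016-01.csv

# re.sub replaces every match of a pattern, here each run of digits
masked = re.sub(r"\d+", "N", filename)
print(masked)  # data_N_N.csv
```

A rule of thumb: reach for `str.replace` when the text to change is fixed, and for `re.search` or `re.sub` when it varies but follows a pattern.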
This workshop is designed to give you basic familiarity with regular expressions and the functions and tools commonly used with them, so you'll know where to look for your specific task. As such, we'll focus on applying regular expressions in common contexts and spend less time on crafting complex regular-expression patterns.
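To make two of the opening questions concrete, the sketch below moves files with similar names into another folder (with `re` and `shutil`) and counts the most frequent words in a text (with `re.findall` and `collections.Counter`). The helper names `move_matching_files` and `top_words`, the folder names, and the file names are all hypothetical, and the demo runs in a throwaway temporary directory. It is one simple way to approach these tasks, not necessarily the approach taken in the workshop.

```python
import os
import re
import shutil
import tempfile
from collections import Counter


def move_matching_files(src_dir, dest_dir, pattern):
    """Move each file in src_dir whose name matches `pattern` into dest_dir.

    Hypothetical helper; returns the sorted list of moved file names.
    """
    regex = re.compile(pattern)
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    moved = []
    for name in os.listdir(src_dir):
        src_path = os.path.join(src_dir, name)
        # match() anchors at the start of the name; "$" in the pattern anchors the end
        if os.path.isfile(src_path) and regex.match(name):
            shutil.move(src_path, os.path.join(dest_dir, name))
            moved.append(name)
    return sorted(moved)


def top_words(text, n=3):
    """Return the n most frequent words (case-insensitive) with their counts."""
    # \w+ picks out runs of letters, digits and underscores, dropping punctuation
    words = re.findall(r"\w+", text.lower())
    return Counter(words).most_common(n)


# Demo in a temporary directory so no real files are touched
base = tempfile.mkdtemp()
src = os.path.join(base, "raw")
dest = os.path.join(base, "survey")
os.makedirs(src)
for name in ["survey_01.csv", "survey_02.csv", "notes.txt", "survey_final.doc"]:
    open(os.path.join(src, name), "w").close()  # create empty placeholder files

moved = move_matching_files(src, dest, r"survey_\d+\.csv$")
print(moved)  # ['survey_01.csv', 'survey_02.csv']

remaining = sorted(os.listdir(src))
print(remaining)  # ['notes.txt', 'survey_final.doc']

shutil.rmtree(base)  # clean up the temporary directory

counts = top_words("The cat sat on the mat. The cat napped.", n=2)
print(counts)  # [('the', 3), ('cat', 2)]
```

Anchoring the file-name pattern with `$` keeps a name such as `survey_01.csv.bak` from being swept up along with the real data files.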
Basic knowledge of Python is strongly recommended.
Requirements:
Please come to class with Python 2.7 installed! For nltk, see the installation instructions for Mac & Windows here: http://www.nltk.org/install.html. Installing nltk is not required, but it is recommended if you wish to try out the last few examples of the workshop on your own machine.