This is the second of five posts on History 100S: Text Analysis for Digital Humanists and Social Scientists, a Spring 2017 course taught by Laura Nelson that exposed UC Berkeley students to cutting-edge computational text analysis techniques. Here we focus on a project by Hist100S student Justin Germain entitled Looking Through Legacies: the Role of Identity and Profession in Biographies. In his Jupyter notebook, Justin uses text analysis "to examine how biographies preserve the legacies of individuals with different identities and backgrounds. To what extent do different social factors influence biographies?" For his corpus, he uses a random selection of texts from the Gale Literature Resource Center (GLRC), limited to Twentieth-Century Americans.

Justin describes how he applies text analysis methods to the study of biographies. He writes:

"I plan on conducting three forms of analysis on this corpus. First, I aim to use lexical selection techniques in order to examine how biographers memorialize individuals of different professional backgrounds. Professions are often the most common way of distinguishing biographies of famous individuals, and therefore I have devoted one part of my analysis to them. Second, I will use topic modeling on the complete set of biographies in order to see which identities, if any, supercede others when attempting to create an individual's legacy. Finally, I will use vector-space models to explore the similarities and differences between biographies of people with different social identities."

View Justin's notebook to see how these techniques are implemented and the insights he draws from his results.

https://github.com/justin-germain/text-analysis/blob/master/JustinGermain_FinalProject.ipynb
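For readers new to these methods, the vector-space idea in Justin's third form of analysis can be illustrated with a minimal sketch. This is not code from Justin's notebook: the three toy "biographies," the `tokenize` helper, and the choice of raw word counts with cosine similarity are all assumptions made for illustration, and a real analysis would likely use a richer representation and the actual GLRC texts.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    shared = set(a) & set(b)
    dot = sum(a[term] * b[term] for term in shared)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical stand-ins for biographies; not drawn from the GLRC corpus.
toy_biographies = {
    "scientist": "She studied physics and published research on radiation",
    "physicist": "He published research in physics and taught at a university",
    "musician": "He recorded albums and toured with a jazz band",
}

# Represent each text as a vector of word counts.
vectors = {name: Counter(tokenize(text)) for name, text in toy_biographies.items()}

# Compare every pair of texts.
names = sorted(vectors)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        score = cosine_similarity(vectors[first], vectors[second])
        print(f"{first} vs {second}: {score:.2f}")
```

In this toy setup, the two science-related texts share more vocabulary with each other than either does with the musician text, so their similarity score comes out higher. That is the basic intuition behind comparing biographies of people with different professions or identities in a vector space.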

Posts in this series:

Text Analysis for Digital Humanists and Social Scientists, Part 1: Introduction

Text Analysis for Digital Humanists and Social Scientists, Part 2: Looking Through Legacies: the Role of Identity and Profession in Biographies

Text Analysis for Digital Humanists and Social Scientists, Part 3: The Evolution of Modern Hip Hop

Text Analysis for Digital Humanists and Social Scientists, Part 4: An Exploratory Topical Analysis of Obama's Speeches

Text Analysis for Digital Humanists and Social Scientists, Part 5: Text Encoding and Decoding

Author: 

Laura Nelson

Laura Nelson is an Assistant Professor of Sociology at Northeastern University, the author of “Computational Grounded Theory: A Methodological Framework,” and a contributor to various blog forums, most recently the orgtheory.net forum on data analytics and inclusivity.