This workshop will familiarize participants with several automated text analysis approaches, including classification (machine learning), clustering/topic modelling, and TF-IDF. The workshop will begin with a theoretical explanation of each approach and then work through an example drawn from texts of the feminist movement.
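As a rough illustration of the TF-IDF idea mentioned above, here is a minimal sketch in plain Python. It assumes the common textbook weighting (term frequency times the log of inverse document frequency) and whitespace tokenization; the workshop itself may use a different variant or a library. The three sample strings are made-up toy inputs, not workshop data.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = occurrences of the term in the document / tokens in the document
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, for illustration only.
docs = [
    "votes for women",
    "equal pay for equal work",
    "women demand equal pay",
]
scores = tf_idf(docs)
```

Terms that occur in only one document (such as "votes") receive a higher weight than terms shared across documents (such as "for"), which is what makes TF-IDF useful for picking out distinctive vocabulary.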
This is an archive of our past training offerings. We are looking to include workshops on topics not yet covered here. Is there something not currently on the list? Send us a proposal.
This workshop will introduce three web platforms for exploring and mapping demographic data: SimplyMap, Social Explorer, and PolicyMap. While there is some overlap between the applications, each has its own strengths and unique features. For each platform I will provide a short demo, followed by some time to explore the data and visualization tools it offers.
The Statistics Department is offering a two-session workshop on distributed computing using Spark. Spark, developed at the Berkeley AMPLab, is an alternative to Hadoop MapReduce that allows MapReduce-style calculations to be done in memory when possible, speeding computation.
This very basic Stata workshop will cover:
Mailing lists, or listservs, are a rich source of social scientific data that can be used to answer questions in fields as diverse as linguistics and sociology. Much of this data is publicly available on the Web, but mailing list data can also be messy and hard to work with.
Participants in this workshop will learn about some of the issues surrounding the collection of health statistics, and will also learn about authoritative sources of health statistics and data. We will look at tools that let you create custom tables of vital statistics (birth, death, etc.), disease statistics, health behavior statistics, and more.
The Statistics Department is offering a two-session workshop on distributed computing using Spark. Spark, developed at the Berkeley AMPLab, is an alternative to Hadoop MapReduce that allows MapReduce-style calculations to be done in memory when possible, speeding computation.
This first session provides an introduction to distributed file systems, MapReduce, and basic data processing using Spark.
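For readers unfamiliar with the MapReduce pattern named above, the following is a minimal single-process sketch in plain Python. It does not use Spark or Hadoop; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the two input lines are hypothetical and chosen only to show the three stages a distributed framework would spread across machines.

```python
from collections import defaultdict
from functools import reduce


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of text."""
    return [(word, 1) for word in line.lower().split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, reduce(lambda a, b: a + b, values)


def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input lines, for illustration only.
counts = word_count([
    "spark keeps data in memory",
    "hadoop writes data to disk",
])
```

In a real cluster the map and reduce steps run in parallel on separate partitions of the data; this sketch only shows the logical flow on one machine.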
This 2-hour workshop introduces the basics of data analysis in R, a powerful and free open-source programming environment. You will learn the core properties of the R programming language through step-by-step examples and scaffolded exercises with data provided by the instructor.
Sociologists can ask the objects of their studies about their lived experiences, their motivations, their feelings and their aspirations for the future. This workshop teaches students how to engage in scientific research using question-based data.
This two-hour workshop will introduce address geocoding, the process of determining the geographic location of a street address. The first part of the workshop will be an introduction to the process of geocoding and the various online and desktop tools available for it.
In this workshop we will learn about the basic concepts involved in georeferencing and get some hands-on experience with the process in ArcGIS, a common GIS software platform. Georeferencing involves “spatializing” scanned maps or aerial imagery so that they can be used in a GIS. We will cover the basics of topics such as projections, coordinate systems, and the theory behind georeferencing.
Learn how to map rooftops in cities and the paths and roadways to remote villages using OpenStreetMap and satellite images. Your training will enable you to help Ebola first responders.
* You’ll learn about this global collaboration
The goal of this lab is to understand basic concepts and functions of Geographic Information Systems (GIS) and produce geo-coded maps (displaying data) using ArcGIS. We will go over the very basics of spatial data and coordinate systems used in GIS, and then create thematic maps with demographic data representing the Bay Area from the US Census Bureau.
This 2-hour workshop is geared towards applied researchers looking to use R for basic data analysis. It will introduce participants to the basics of data manipulation (notably using plyr and reshape2), regression, and regression diagnostics.
Relational database management systems (RDBMS) offer many advantages for researchers whose work involves the analysis of data stored in multiple, interrelated datasets. In this workshop, participants will learn how to use MySQL, a popular open-source RDBMS, and Structured Query Language (SQL) to create, modify, and extract data from a relational database.
This workshop is designed to help you learn about factor analysis in a practical context. I will use Stata to carry out the analysis.
Plotly is a "GitHub for data". Teams in industry and academia that work with Excel, Python, MATLAB, R, and other programs use Plotly to share and discuss graphs and data. Free Plotly accounts can be made here: https://plot.ly/.
This workshop will focus on searching for and downloading the data published by the US Census Bureau. This will include understanding the data format, Census region mapping, the different surveys used in the US Census, and the statistical concept of sampling error in the American Community Survey.
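To make the idea of sampling error in the American Community Survey a little more concrete, here is a small sketch in Python. It relies on the fact that published ACS margins of error are stated at the 90 percent confidence level, so dividing by 1.645 recovers the standard error. The function names and all numbers in the example are hypothetical and are not drawn from any actual Census table.

```python
import math

# ACS margins of error are published at the 90 percent confidence level.
Z_90 = 1.645


def standard_error(moe):
    """Convert a published 90 percent margin of error to a standard error."""
    return moe / Z_90


def coefficient_of_variation(estimate, moe):
    """Relative size of the sampling error, as a percentage of the estimate."""
    return standard_error(moe) / estimate * 100


def moe_of_sum(moes):
    """Approximate margin of error for a sum of estimates
    (square root of the sum of squared margins of error)."""
    return math.sqrt(sum(m ** 2 for m in moes))


# Hypothetical estimate and margin of error, for illustration only.
estimate, moe = 4000, 329
se = standard_error(moe)
cv = coefficient_of_variation(estimate, moe)
combined = moe_of_sum([30, 40])
```

A higher coefficient of variation means a less reliable estimate, which matters most for small geographies and small population groups.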