Students will be introduced to the basics of machine learning using the caret R package. They will learn proper syntax for model construction, changing model tuning parameters, comparing models, and visualizing their outputs.
Prior knowledge:
This is an archive of our past training offerings. We are looking to include workshops on topics not yet covered here. Is there something not currently on the list? Send us a proposal.
A picture paints a thousand words! But in reality, with the exception of using sample survey weights, graphing data and results is often one of the most troublesome aspects of data analysis.
WordPress is an easy-to-use and powerful web hosting platform. Its "famous" 5-minute installation procedure really can be done in 5 minutes, and installing an attractive theme and uploading simple content doesn't take much longer. Once that's done, the possibilities are endless; we'll look at some of the plug-ins that can help you display and publicize your research data.
This workshop addresses various topics in Natural Language Processing, primarily through the use of NLTK. We first scrape and clean a long text, create a corpus in NLTK, explore tagged corpora, and build basic machine learning POS taggers; finally, we construct a grammar for chunking and tree building. This workshop concentrates on linguistic issues in NLP and text analysis.
This is the second workshop in a 3-part Stata series offered at the D-Lab that includes: 1) Intro to Stata, 2) Data Analysis in Stata and 3) Stata Programming. You can register for one, two or all three of these worksh
Join the Qualitative Methods Group (QMG) for a conversation with Dr. Stephen Small about providing broader context to your case study. Dr. Small will share his thoughts about why context matters and what kind of context to provide.
In this workshop, students will learn about and get hands-on experience using SAS macros. Students will learn about various use cases, how macros help reduce repetitive code, and how they can customize SAS programs to be dynamic. We will cover syntax, creating macro variables based on data, and conditionally executing data and proc steps.
We'll learn about spatial data projection, cartography, and basic spatial analysis in Python. In particular we'll use geopandas, which spatializes pandas dataframes, and matplotlib's basemap toolkit. We'll introduce how to perform some basic GIS functions in pure Python code, how to plot point data over shapefiles and shaded relief maps, and how to plot choropleth maps in Python.
Ensemble machine learning methods are often used when the true prediction function is not easily approximated by a single algorithm. Practitioners may prefer ensemble algorithms when model performance is valued above other factors such as model complexity and training time. The Super Learner algorithm, also called "stacking", learns the optimal combination of the base learner fits.
XML is a standard from which many encoding languages are created, and it structures much of the data on the Internet. XML is a language of the web (as xHTML); it is widely used in mapping and geographic information systems (as KML, the language of Google Maps and Google Earth); it is the basis of TEI, a set of archival standards for the creation and preservation of electronic texts; and Databa
Qualtrics is a powerful online tool available to Berkeley community members that can be used for a range of data collection activities like web surveys. This class will focus on survey implementation. Designing a questionnaire is a major component of survey research, but how and when to contact sample members and how to follow up with non-responders are important research considerations. This
This workshop will introduce the D3 JavaScript library and provide hands-on examples to create charts and maps with sample data. This is a two-part workshop!
This tutorial will introduce attendees to the NIMBLE system for programming with hierarchical models in R. The tutorial will first show how to specify a hierarchical statistical model using BUGS syntax and fit that model using MCMC.
Qualtrics is a powerful online tool available to Berkeley community members that can be used for a range of data collection activities including surveys, data entry, training, quality control, market research, and event feedback. Following the general overview, this class will focus on questionnaire design using Qualtrics. This class will cover some best practices for designing questionnaires and
This is the first workshop in a 3-part Stata series offered at the D-Lab that includes: 1) Intro to Stata, 2) Data Analysis in Stata and 3) Stata Programming.
The D-Lab is proud to present our new brown bag series, "How We Did It", a bi-weekly, hour-long presentation on specific topics from quantitative data cleaning to best practices for entering the field for participant observation. Each brown bag will feature a short presentation/demonstration on techniques followed by an informal question and answer/comment session.
In this interactive workshop we will discuss how to develop and implement an analytical strategy for a qualitative research project. We will discuss how this analytical strategy changes across the research process from coding to writing up results. Researchers at any stage in the qualitative research process are welcome. No prior knowledge/experience is required.
We'll learn how to create animated 3-D data visualizations in Python. This sort of visualization method is useful for showing a three-dimensional data set from different perspectives. In particular, we'll demonstrate these methods using Python's pandas, matplotlib, and pillow packages.
In this workshop we will learn about the basic concepts involved in georeferencing and get some hands-on experience with the process in ArcGIS, a common GIS software platform. Georeferencing involves “spatializing” scanned maps or aerial imagery so that they can be used in a GIS. We will cover the basics of topics such as projections, coordinate systems, and the theory behind georeferencing.
Qualtrics is a powerful online tool available to Berkeley community members that can be used for a range of data collection activities. Primarily, Qualtrics is designed to make web surveys easy to write, test, and implement, but the software can be used for data entry, training, quality control, evaluation, market research, pre/post-event feedback, and other uses with some creativity. This ov