Basic model fitting using Spark, including linear models, GLMs, and Lasso

Instructors:

Christopher Paciorek

Chris Paciorek is an adjunct professor in the Department of Statistics, as well as the Statistical Computing Consultant in the Department's Statistical Computing Facility and a user support consultant for Berkeley Research Computing. He teaches and presents workshops on statistical computing topics, with a focus on R.


When & Where

Date:

Fri, November 14, 2014 - 4:00 PM to 5:15 PM

Location:

Evans 1011

Description

Type:

Workshop

The Statistics Department is offering a two-session workshop on distributed computing using Spark. Spark is the Berkeley AMPLab's variant on Hadoop; it allows MapReduce calculations to be done in memory when possible, which speeds up computation.

This is the second of the two sessions. It will cover basic model fitting using Spark, including linear models, GLMs, and Lasso, as well as simulation in Spark, with time set aside for collective discussion.

The instructor will be setting up an Amazon account with free credits that participants can use to start up their own virtual Linux cluster on which to try Spark. If you want to get an account, please fill out this form.

Materials will be available at https://github.com/berkeley-scf/spark-workshop-2014 (under construction).

I will assume no prior knowledge. Some familiarity with Python will be helpful as we'll run Spark via Python, but I think you'll get something out of it even if you're not familiar with Python syntax.

No need to register in advance!

Details

Training Host:

D-Lab

