The focus of this workshop is machine learning using the h2o and h2oEnsemble R packages. H2O is an open-source distributed machine learning platform designed for big data, with the added benefit that it is as easy to use on a laptop as on a multi-node Hadoop or Spark cluster.
The core machine learning algorithms of H2O are implemented in high-performance Java; however, fully featured APIs are available in R, Python, Scala, and REST/JSON, as well as through a web interface. Because H2O's algorithm implementations are distributed, the software can scale to very large datasets that may not fit into RAM on a single machine.
H2O currently features distributed implementations of generalized linear models, gradient boosting machines, random forests, deep neural networks, dimensionality reduction methods (PCA, GLRM), clustering algorithms (K-means), and anomaly detection methods, among others. The h2oEnsemble R package adds the ability to create stacked ensembles, or "super learners," from a collection of supervised base learners.
R scripts with H2O machine learning code examples will be demoed live and made available on GitHub for attendees to follow along on their laptops.
Prior knowledge: Familiarity with R is recommended, as is basic familiarity with machine learning topics such as classification, regression, training and test sets, and cross-validation.