Earlier this week, Robert Grossman and Collin Bennett from Open Data Group gave a lecture as part of a tutorial on big data at the SC 12 Conference in Salt Lake City. They described some of the ways of building predictive models over big data using Hadoop streams and Hadoop’s implementation of MapReduce.
They illustrated the lecture with an example that used Amazon’s Elastic MapReduce to build a predictive model over data about CTA buses provided by the City of Chicago.
You can find some of the materials for the lecture on the web page tutorials.opendatagroup.com. The materials also contain links to some best practices for deploying analytics in operational systems using PMML and PMML-compliant scoring engines, such as Augustus.
On October 23, 2012, Robert Grossman and Collin Bennett from Open Data Group will give a tutorial at the O’Reilly Strata Conference in New York City on “Best Practices for Building and Deploying Predictive Models over Big Data.”
The slides and some related materials can be downloaded from tutorials.opendatagroup.com.
The 3.5-hour tutorial consists of 12 modules:
- Building Predictive Models – EDA and Building Features
- Case Study: MalStone
- Working with Multiple Models: Ensembles and Segments
- Case Study: CTA
- Deploying Predictive Models Using PMML-based Scoring Engines
- Three Ways to Build Models over Hadoop Using R
- Case Study: Building Trees over Big Data
- Improving the Impact of a Model in Operations – The SAMS Methodology
- Case Study: AdReady
- Quantifying the Lift of a Predictive Model and Improving It
- Case Study: Matsu
Open Data Group helped pioneer some of the technology behind topics 2, 3, 6, 7, 8 and 9. For example, you can follow the links to learn more about the MalStone Benchmark, the Multiple Model component of the DMG’s PMML standard, and Project Matsu, which uses MapReduce to process and analyze images.
If you are at the Strata Conference, please stop by to say hello.