“There is pretty good practice out there of how to build data warehouses. There is not a lot of good practice or knowledge out there about how to build statistical models over big data.”
The quote above is from a WashingtonExec interview with Open Data Group founder Robert Grossman about big data, predictive modeling, and related topics. You can find the interview here.
In the interview, he briefly discusses some of the rules he has developed over the years for building predictive models over big data. One of the top three is: “Do you have an environment where you can deploy the models you build into operational systems?”
Open Data Group often uses the Augustus system for deploying models into operational systems. Augustus is open source and follows the PMML standard. It supports segmented models, pre-processing of model inputs, and post-processing of the scores that models produce. Augustus's support for pre- and post-processing was described in a recent post.
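To make the three ideas concrete, here is a minimal Python sketch of a scoring pipeline that pre-processes a record, picks a model by segment, and post-processes the raw score. It is only an illustration of the pattern: it does not use Augustus or PMML, and every name in it (`preprocess`, `SEGMENT_MODELS`, `default_model`, `postprocess`, `score`), along with the field names, coefficients, and threshold, is a hypothetical stand-in.

```python
import math

def preprocess(record):
    """Pre-processing: turn a raw record into model inputs."""
    return {
        # Normalize the segment key so " East " and "east" match.
        "region": str(record["region"]).strip().lower(),
        # Compress the skewed amount with log(1 + amount).
        "log_amount": math.log1p(float(record["amount"])),
    }

# Segmented model: one simple scoring function per segment.
# The segments and coefficients are made up for illustration.
SEGMENT_MODELS = {
    "east": lambda x: 0.5 * x["log_amount"],
    "west": lambda x: 0.25 * x["log_amount"] + 1.0,
}

def default_model(x):
    """Fallback used when a record belongs to no known segment."""
    return 0.1 * x["log_amount"]

def postprocess(raw_score, threshold=2.0):
    """Post-processing: round the score and derive an alert flag."""
    return {"score": round(raw_score, 3), "alert": raw_score >= threshold}

def score(record):
    """Run one record through pre-processing, its segment's model,
    and post-processing."""
    inputs = preprocess(record)
    model = SEGMENT_MODELS.get(inputs["region"], default_model)
    return postprocess(model(inputs))

if __name__ == "__main__":
    print(score({"region": " East ", "amount": 120.0}))
```

Keeping the three stages as separate functions mirrors the separation described above: the segment models can be replaced without touching how inputs are prepared or how scores are turned into decisions.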
Grossman was also asked about the disruptive nature of predictive modeling over big data.