Since its founding in 2002, Open Data Group has developed several innovative technologies to improve the development and deployment of predictive models over big data. Below is a list of some Open Data Group white papers and technical reports describing these innovations.
- From 2008 to 2010, Open Data Group worked with the Open Cloud Consortium to develop the MalStone benchmark, which we have found very useful when developing analytics over big data. You can learn more about MalStone from this technical report and from the MalStone web site. The code to generate the data for MalStone is open source and is available on the web site.
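The exact benchmark definitions are in the technical report; as a rough illustration of the style of site-entity computation MalStone exercises, the sketch below computes, for each site, the fraction of visiting entities that were later marked. The record layout, names, and the simplified statistic are all assumptions for illustration, not the benchmark's actual specification.

```python
from collections import defaultdict

# Toy visit records: (site_id, entity_id, timestamp).
visits = [
    ("site-1", "e1", 1),
    ("site-1", "e2", 2),
    ("site-2", "e2", 3),
    ("site-2", "e3", 4),
]

# Hypothetical map of when each entity was "marked" (e.g., found
# compromised); entities absent from the map were never marked.
mark_time = {"e1": 5, "e3": 2}

def malstone_ratios(visits, mark_time):
    """For each site, the fraction of visiting entities that were
    marked at some time AFTER their visit to that site."""
    visitors = defaultdict(set)      # site -> all visiting entities
    marked_after = defaultdict(set)  # site -> entities marked after visiting
    for site, entity, t in visits:
        visitors[site].add(entity)
        if entity in mark_time and mark_time[entity] > t:
            marked_after[site].add(entity)
    return {site: len(marked_after[site]) / len(ents)
            for site, ents in visitors.items()}

print(malstone_ratios(visits, mark_time))
# site-1: e1 was marked after visiting, e2 was not -> 0.5
# site-2: e3 was marked before its visit, e2 never -> 0.0
```

The point of the benchmark is that this kind of per-site aggregation must scale to billions of log records, which is what makes it a useful stress test for big-data analytic platforms.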
- In 2006, Open Data Group introduced a technology called cubes of models for modeling large data sets and complex high-volume data streams, and we have continued to refine it since then. With this approach, a separate statistical or data mining model is automatically estimated for each cell in a multi-dimensional data cube. Open Data Group has used this technology for building statistical and data quality models in a variety of application areas, including financial services and cyber defense. You can find more details in this technical report. The Augustus application for statistical modeling supports cubes of models.
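To make the idea concrete, here is a minimal sketch of a cube of models: the cube is defined by two dimensions (a hypothetical region and product), a simple baseline model (mean and standard deviation) is estimated independently for each cell, and new observations are scored against their own cell's model. This is an illustration of the general technique, not Augustus code.

```python
from collections import defaultdict
from statistics import mean, stdev

# Toy records: two cube dimensions (region, product) and a metric value.
records = [
    ("east", "loans", 10.0), ("east", "loans", 12.0), ("east", "loans", 11.0),
    ("east", "cards", 50.0), ("east", "cards", 55.0), ("east", "cards", 52.0),
    ("west", "loans", 20.0), ("west", "loans", 22.0), ("west", "loans", 21.0),
]

def build_cube_of_models(records):
    """Estimate a separate (mean, std) baseline model for each
    cell of the (region, product) cube."""
    cells = defaultdict(list)
    for region, product, value in records:
        cells[(region, product)].append(value)
    return {cell: (mean(vals), stdev(vals)) for cell, vals in cells.items()}

def score(models, region, product, value):
    """z-score of a new observation against its own cell's model."""
    mu, sigma = models[(region, product)]
    return (value - mu) / sigma

models = build_cube_of_models(records)
print(round(score(models, "east", "loans", 14.0), 2))  # -> 3.0
```

A value of 14.0 is unremarkable globally but three standard deviations above the (east, loans) cell's baseline, which is exactly the kind of cell-local signal that a single pooled model would miss.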
- From its founding in 2002, Open Data Group has utilized best practices in predictive modeling over big data, including ensembles of models, hierarchical modeling, tree-based models, and standards such as PMML for efficiently deploying models. Here is a white paper, written over ten years ago, that provides a brief high-level description of these ideas. Unfortunately, these methodologies are still not as widely deployed as they should be. Too often, what passes for predictive modeling is simply an ad hoc rule written by a software engineer, a rule that is neither empirically derived nor statistically validated, two criteria commonly used when building predictive models.
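As a small sketch of the first of these practices, the following shows one common form of a model ensemble: bagging, where each base model is fit on a bootstrap resample of the data and predictions are averaged. The base model here is a plain least-squares line, and the data, function names, and parameters are hypothetical; the white paper's methods are not limited to this form.

```python
import random

def fit_line(points):
    """Least-squares fit of y = a*x + b over (x, y) pairs."""
    n = len(points)
    sx = sum(x for x, _ in points); sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points); sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return a, (sy - a * sx) / n

def bagged_ensemble(points, n_models=25, seed=0):
    """Fit each base model on a bootstrap resample of the data."""
    rng = random.Random(seed)
    return [fit_line([rng.choice(points) for _ in points])
            for _ in range(n_models)]

def predict(models, x):
    """Ensemble prediction: average the base models' predictions."""
    return sum(a * x + b for a, b in models) / len(models)

# Noisy points around the line y = 2x + 1.
points = [(x, 2.0 * x + 1.0 + 0.1 * ((x * 7) % 3 - 1)) for x in range(10)]
models = bagged_ensemble(points)
print(predict(models, 5.0))  # close to the true value 11.0
```

Averaging over resamples reduces the variance of the fitted model, which is one reason ensembles are empirically validated rather than hand-tuned rules.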