In this post, I discuss some of the options available when building analytic models. For our purposes, a good short definition of analytics is using data to make predictions. The term predictive analytics is applied (appropriately enough) to this type of analytics. A longer definition is that predictive analytics is the process of building statistically valid models from data that can be used to make predictions about future events, take actions, and make decisions.

In this post, I take the point of view of a business owner in a company who has a problem that requires a model and who is considering whether to build the model in-house, to outsource it to a vendor providing analytic services, or simply to give up on building a model and produce a report instead. I don’t recommend the last option, but unfortunately it is all too common in practice.

Broadly speaking, from a business owner’s point of view, building a model for a new project involves several phases. The process looks somewhat different from the modeler’s point of view. It is also simpler if the same model has been built before and all that is required is to update it using new data. Here are the basic steps required to build a model from the business owner’s point of view:
1. Working with IT to obtain all the data required for the project and making it available to the modeler.
2. Answering questions from the modeler about the data.
3. Agreeing upon the output of the model.
4. Reviewing the first model with the modeler.
5. Reviewing the second and subsequent models with the modeler.
6. Working with IT to deploy the model.
All steps except Steps 1 and 6 are collaborative between the business owner and the modeler. At the beginning of many projects, Step 3 looks obvious. In practice, it often does not become clear until the project is nearly over, the data has been cleaned, and deployment is well underway. The reason is that one usually doesn’t have a good understanding of the most appropriate output of a model until the data has been cleaned and it is clear how the model will be deployed in operational systems.
Now let’s look at the same process from the modeler’s point of view. Simplifying somewhat, the following steps are required:
1. Waiting for the data.
2. Cleaning the data.
3. Asking the business owner questions about the data.
4. Agreeing upon the output of the model.
5. Developing a set of features for the model.
6. Estimating the parameters of the model.
7. Building a measure to evaluate the model.
8. Evaluating the model using the measure.
9. Developing post-processing rules for the scores produced by the model.
10. Repeating the steps above for the second and subsequent versions of the model until everyone is happy or there is no more time or funding left.
11. Deploying the model.
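
Steps 7 and 8 (building a measure and evaluating the model with it) are where the choice of measure matters. The post does not name a particular measure, so as an illustration only, the sketch below assumes the area under the ROC curve (AUC), a common choice for response, attrition, credit, and fraud models, computed with the pairwise (Mann-Whitney) definition. The function name `auc` and the sample scores are hypothetical.

```python
def auc(scores: list[float], labels: list[int]) -> float:
    """Area under the ROC curve via pairwise comparison (assumed measure).

    Over every (positive, negative) pair, count how often the positive
    example receives the higher score; ties count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores from two versions of a model on the same holdout data.
labels = [0, 0, 1, 1]
version_1 = [0.1, 0.4, 0.35, 0.8]
version_2 = [0.1, 0.3, 0.35, 0.8]
print(auc(version_1, labels))  # 0.75
print(auc(version_2, labels))  # 1.0
```

The double loop is quadratic in the size of the holdout set, which is fine for a sketch; on real data a rank-based computation or a library routine such as scikit-learn’s `roc_auc_score` would be the more practical choice.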
Building a new model requires completing all the steps above. Generally, a series of models (version 1, version 2, and so on) is produced and reviewed by the business owner and the modeler (Step 10). The more time available for Step 10, the better the resulting model tends to be.
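
The review loop in Step 10 can be pictured as building versions until one is good enough or the budget runs out, while keeping the best version seen so far. This is a hypothetical sketch: `build_version` stands in for Steps 5 through 9, the evaluation measure is assumed to be a single number where higher is better, and `target` and `max_versions` are placeholders for “everyone is happy” and “no more time or funding”.

```python
from typing import Callable


def review_versions(
    build_version: Callable[[int], float],
    target: float,
    max_versions: int,
) -> tuple[int, float, int]:
    """Build versions 1, 2, ... until one meets `target` or the budget runs out.

    `build_version(v)` is a hypothetical stand-in for Steps 5-9: it builds
    version v and returns its value on the agreed evaluation measure.
    Returns (best_version, best_measure, versions_built).
    """
    best_version, best_measure = 0, float("-inf")
    built = 0
    for v in range(1, max_versions + 1):
        measure = build_version(v)
        built += 1
        if measure > best_measure:  # keep the best version seen so far
            best_version, best_measure = v, measure
        if measure >= target:  # stand-in for "everyone is happy"
            break
    return best_version, best_measure, built


# Hypothetical measure values for successive versions of one model.
history = {1: 0.68, 2: 0.74, 3: 0.73, 4: 0.81, 5: 0.83}
print(review_versions(history.__getitem__, target=0.80, max_versions=5))  # (4, 0.81, 4)
print(review_versions(history.__getitem__, target=0.80, max_versions=3))  # (2, 0.74, 3)
```

The second call shows the budget running out first: the best available version is kept even though it never reached the target.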
To understand these steps a bit better, it may help to review my earlier post about the SAMS methodology. The SAMS methodology explains how to think of models in terms of the Scores they produce, the Actions these scores enable, the Measures used to evaluate the actions, and whether these actions support a targeted Strategy.
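
As a rough illustration of the SAMS chain, the sketch below follows a hypothetical response model. Scores are turned into Actions by a simple post-processing rule (make an offer when the score clears a threshold), the Measure is the response rate among prospects who received an offer, and the Strategy check asks whether that rate meets a target. The threshold, target, names, and data are all assumptions for illustration, not part of the methodology itself.

```python
from dataclasses import dataclass


@dataclass
class Prospect:
    name: str
    score: float     # Score: output of a (hypothetical) response model
    responded: bool  # observed outcome, known only after the campaign


def actions(prospects: list[Prospect], threshold: float) -> list[Prospect]:
    """Action: make an offer to every prospect whose score clears the threshold."""
    return [p for p in prospects if p.score >= threshold]


def response_rate(contacted: list[Prospect]) -> float:
    """Measure: fraction of contacted prospects who responded."""
    if not contacted:
        return 0.0
    return sum(p.responded for p in contacted) / len(contacted)


def supports_strategy(rate: float, target_rate: float) -> bool:
    """Strategy: does the measured response rate meet the targeted rate?"""
    return rate >= target_rate


prospects = [
    Prospect("A", 0.91, True),
    Prospect("B", 0.72, False),
    Prospect("C", 0.65, True),
    Prospect("D", 0.40, False),
    Prospect("E", 0.15, False),
]
contacted = actions(prospects, threshold=0.6)
rate = response_rate(contacted)
print([p.name for p in contacted], round(rate, 3))  # ['A', 'B', 'C'] 0.667
print(supports_strategy(rate, target_rate=0.5))     # True
```

Keeping the action rule separate from the model makes it possible to change the threshold (a post-processing decision) without re-estimating the model’s parameters.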
Sometimes a model has been built before and only some of these steps need to be repeated. For example, refreshing a model usually requires only Steps 6 and 8 (estimating the parameters and evaluating the model) for a series of models. Rebuilding a model usually requires repeating only Steps 5, 6, 8, and 9 for a series of models.
Sometimes the data is supplied in a standard format (for example, when it is provided by a third party) and the deployment uses a standard format (for example, when all that is required is a list of names and corresponding offers). In this case, once the model has been built, all that is required when the business owner supplies new data is to perform Steps 6 and 8. Call this a standard model. Standard models are substantially less work to build than models that require completing all the steps above; these more labor-intensive models are often called custom models.
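
The difference in effort between these kinds of work can be summarized as a small lookup table. The sketch below simply encodes the modeler’s steps and the subsets that the text says are repeated for each kind of work; the dictionary and function names are hypothetical, and the step numbers follow the modeler’s list.

```python
# The modeler's steps, numbered as in the modeler's list.
MODELER_STEPS = {
    1: "Waiting for the data",
    2: "Cleaning the data",
    3: "Asking the business owner questions about the data",
    4: "Agreeing upon the output of the model",
    5: "Developing a set of features",
    6: "Estimating the parameters",
    7: "Building a measure to evaluate the model",
    8: "Evaluating the model using the measure",
    9: "Developing post-processing rules for the scores",
    10: "Repeating for subsequent versions",
    11: "Deploying the model",
}

# Steps required for each kind of work (hypothetical names).
STEPS_BY_KIND = {
    "custom": sorted(MODELER_STEPS),  # a new custom model: every step
    "rebuild": [5, 6, 8, 9],
    "refresh": [6, 8],
    "standard": [6, 8],  # standard input/output formats, model already built once
}


def plan(kind: str) -> list[str]:
    """Return the names of the steps needed for the given kind of work."""
    return [MODELER_STEPS[n] for n in STEPS_BY_KIND[kind]]


for kind in ("custom", "rebuild", "refresh", "standard"):
    print(f"{kind}: {len(STEPS_BY_KIND[kind])} steps")
print(plan("standard"))  # ['Estimating the parameters', 'Evaluating the model using the measure']
```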
Most requests for models fall into a few standard categories: for example, models that predict whether a prospect will respond to an offer (response models), whether a customer will remain a customer (attrition models), whether a customer will keep current with their payments (credit models), and whether a transaction is valid or fraudulent (fraud models).
Sometimes models are built that don’t fit into these familiar categories; call these new types of models. A new type of model also requires the modeler to develop new types of features, new measures for evaluating the model, and so on. Custom models of a new type are the most labor-intensive to build.
In practice, it usually takes four to six months or longer to build a custom model once the data has arrived. As the size and complexity of the data grow, each of the steps usually requires more time.