Scores, Models and Rules

A common approach to deploying predictive models in operational systems in which data arrives event by event is to implement an event loop:

  1. an event associated with an entity comes in, say a user (the entity) visits a page (the event) on a web site
  2. the features associated with the entity are retrieved, say features that describe the user and his or her behavior at the site, and the features are updated with the event
  3. the features are used as the input to a predictive model, which produces a score, say the likelihood that the user will click on an ad in a certain category
  4. an action is taken on the basis of the score, say to display an ad in a certain category if the score is above a threshold.
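The four steps above can be sketched in Python as follows. This is only an illustration, not Augustus code: the feature store is a plain dictionary, `score` is a stand-in logistic model with made-up weights, and the names `process_event`, `update_features` and `THRESHOLD` are hypothetical.

```python
import math

# Hypothetical in-memory feature store: entity id -> feature dict.
feature_store = {}

THRESHOLD = 0.5  # assumed decision threshold, for illustration only


def update_features(features, event):
    """Step 2: update the entity's features with the new event."""
    features["visits"] = features.get("visits", 0) + 1
    if event["page_category"] == "sports":
        features["sports_visits"] = features.get("sports_visits", 0) + 1
    return features


def score(features):
    """Step 3: stand-in model, a logistic function with made-up weights."""
    z = (-2.0
         + 0.1 * features.get("visits", 0)
         + 1.5 * features.get("sports_visits", 0))
    return 1.0 / (1.0 + math.exp(-z))


def process_event(event):
    """Run one pass of the event loop for a single event."""
    entity = event["user_id"]                        # step 1: event arrives
    features = feature_store.setdefault(entity, {})  # step 2: retrieve...
    update_features(features, event)                 # ...and update features
    s = score(features)                              # step 3: score
    # Step 4: act on the score.
    action = "show_sports_ad" if s > THRESHOLD else "show_default_ad"
    return s, action
```

Calling `process_event` once per incoming event keeps the features for each user current, so the score always reflects the user's behavior up to and including the latest event.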

This may be easier to remember if you use the acronym EEFM for Event, Entity, Features, and Model. If the events are all available at the same time, the feature vectors for all the entities can be built at once and then scored “in batch”.
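For comparison, here is a rough sketch of the batch case, in which all the feature vectors are built before any of them is scored. The event fields and the toy `score` function (the fraction of a user's visits that went to sports pages) are assumptions made for the example, not part of any real model.

```python
from collections import defaultdict


def build_features(events):
    """Aggregate all events into one feature vector per entity."""
    features = defaultdict(lambda: {"visits": 0, "sports_visits": 0})
    for event in events:
        f = features[event["user_id"]]
        f["visits"] += 1
        if event["page_category"] == "sports":
            f["sports_visits"] += 1
    return dict(features)


def score(f):
    """Stand-in model: fraction of visits that were to sports pages."""
    return f["sports_visits"] / f["visits"]


def score_batch(events):
    """Build every feature vector first, then score them all in one pass."""
    return {entity: score(f) for entity, f in build_features(events).items()}
```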

At Open Data Group, we use the Predictive Model Markup Language (PMML) to express our models in XML, so that a model can be built in a development environment with one application and then easily deployed in a production environment with another. We also use an open source scoring engine (Augustus) to deploy our models in operational environments.

In practice, deploying models is usually a bit more complicated.

First, business rules are usually applied to the event before the associated features are passed to the predictive model (pre-processing). Second, the score produced by the model is usually processed by additional business rules (post-processing) before an action is selected. For example, as part of the pre-processing, if the event is associated with a visitor who is likely to be under 18, the site may choose from different types of ads. As part of the post-processing, inventory rules and rules about how often to show ads (exposure rules) may be used to exclude certain ads.
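To make the two stages concrete, here is a small sketch in plain Python. Everything in it is hypothetical: the ad categories, the `likely_under_18` flag, the `max_exposures` limit, and the `model` callable are stand-ins chosen for illustration, and real business rules would be specific to the site.

```python
def preprocess(event):
    """Pre-processing rule: pick the candidate ad categories before scoring."""
    if event.get("likely_under_18"):
        return ["games", "education"]
    return ["games", "education", "finance", "travel"]


def postprocess(scores, inventory, exposures, max_exposures=3):
    """Post-processing rules: drop ads with no inventory or too many
    exposures, then pick the highest-scoring ad that remains."""
    eligible = {
        ad: s for ad, s in scores.items()
        if inventory.get(ad, 0) > 0 and exposures.get(ad, 0) < max_exposures
    }
    if not eligible:
        return None  # no ad passes the rules
    return max(eligible, key=eligible.get)


def choose_ad(event, model, inventory, exposures):
    """Pre-process, score each candidate with the model, then post-process."""
    candidates = preprocess(event)
    scores = {ad: model(event, ad) for ad in candidates}
    return postprocess(scores, inventory, exposures)
```

The model itself sits in the middle and knows nothing about the rules, which is what makes it possible to change the rules without rebuilding the model.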

Until recently, pre- and post-processing usually needed to be coded manually and couldn’t be expressed easily in PMML.

That has changed with Augustus 0.5.2. With this version, Python code can be embedded in the PMML file to express pre- and post-processing rules, to combine the outputs of multiple models into scores, and to process multiple models and scores in a variety of other ways.

We call this augmented processing, and in our experience over the past several months it has significantly simplified the deployment of predictive models into operational systems.

Here is an example from a white paper that we are writing about augmented processing using Augustus. With segmented modeling, you can use multiple models in different segments to score an event and in this way produce multiple scores. Assume that you want to use the minimum score produced in this way as the score for the event. Here is some Augustus code to do this that can be embedded in the PMML file for the segmented model:


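def action():
    # Collect the predicted value from each segment's model.
    segmentScores = []
    for segment in segments:
        segmentScores.append(segment.score()[PREDICTEDVALUE])

    # Use the minimum segment score, or MISSING if no scores were collected.
    if len(segmentScores) == 0:
        finalScore = MISSING
    else:
        finalScore = min(segmentScores)

    # Write the event number and the final score to the XML output.
    output.xmlopen("Event", attrib={"number": eventNumber})
    output.xmlfield("Score", finalScore)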
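If you want to try the combination rule outside Augustus, it can be isolated in a few lines of plain Python. This is a sketch under stated assumptions: `MISSING` is represented here by `None`, which may not match how Augustus represents missing values, and unlike the embedded code above, `combine_min` (a hypothetical helper) also skips individual segment scores that are missing.

```python
MISSING = None  # stand-in for a missing-value marker (an assumption)


def combine_min(segment_scores):
    """Return the smallest non-missing segment score, or MISSING if none."""
    valid = [s for s in segment_scores if s is not MISSING]
    return min(valid) if valid else MISSING
```

Swapping `min` for `max` or an average gives other ways of combining segment scores with the same structure.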
