<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Open Data Group &#187; baseline models</title>
	<atom:link href="http://opendatagroup.com/tag/baseline-models/feed/" rel="self" type="application/rss+xml" />
	<link>http://opendatagroup.com</link>
	<description>Open Data Group&#039;s Home Page and Blog</description>
	<lastBuildDate>Sat, 04 Sep 2010 00:51:55 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Summer Augustus Release</title>
		<link>http://opendatagroup.com/2010/09/02/640/</link>
		<comments>http://opendatagroup.com/2010/09/02/640/#comments</comments>
		<pubDate>Thu, 02 Sep 2010 21:10:36 +0000</pubDate>
		<dc:creator>jennarussell</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[analytic projects]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[Augustus]]></category>
		<category><![CDATA[baseline models]]></category>
		<category><![CDATA[change detection models]]></category>
		<category><![CDATA[deploying analytic models]]></category>
		<category><![CDATA[open source analytics]]></category>
		<category><![CDATA[predictive analytics]]></category>
		<category><![CDATA[Scoring Engine]]></category>

		<guid isPermaLink="false">http://opendatagroup.com/?p=640</guid>
		<description><![CDATA[August 2010
Open Data Group released a new version of Augustus, 0.4.2.  It is written in Python and is freely available at our project site.  The latest version of our PMML-compliant scoring engine includes feature and performance enhancements.

A new model consumer, Ruleset, has been added to the scoring engine.
Details regarding the model specification can [...]]]></description>
			<content:encoded><![CDATA[<h3>August 2010</h3>
<p>Open Data Group released a new version of Augustus, 0.4.2.  It is written in Python and is freely available at our <a href="http://code.google.com/p/augustus/" target="_blank">project site</a>.  The latest version of our PMML-compliant scoring engine includes feature and performance enhancements.<span id="more-640"></span></p>
<ul>
<li>A new model consumer, Ruleset, has been added to the scoring engine.</li>
<li>Details regarding the model specification can be found <a href="http://www.dmg.org/v4-0/RuleSet.html" target="_blank">here</a>.</li>
<li>Optional garbage collection in the model consumer has been disabled, driving significant performance improvement.</li>
<li>Data reads from standard I/O have been improved.</li>
<li>Metadata collection has been improved.</li>
</ul>
<p>At the <a href="http://code.google.com/p/augustus/" target="_blank">project site</a> you will also find new regression tests.  They are now easier to run and cover Augustus components, PMML models and statistical scenarios.</p>
]]></content:encoded>
			<wfw:commentRss>http://opendatagroup.com/2010/09/02/640/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Augustus 4.1.1 Available</title>
		<link>http://opendatagroup.com/2010/04/19/augustus-4-1-1-available/</link>
		<comments>http://opendatagroup.com/2010/04/19/augustus-4-1-1-available/#comments</comments>
		<pubDate>Mon, 19 Apr 2010 21:45:26 +0000</pubDate>
		<dc:creator>jennarussell</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Augustus]]></category>
		<category><![CDATA[baseline models]]></category>
		<category><![CDATA[change detection models]]></category>
		<category><![CDATA[deploying analytic models]]></category>
		<category><![CDATA[open source analytics]]></category>
		<category><![CDATA[PMML]]></category>
		<category><![CDATA[predictive analytics]]></category>
		<category><![CDATA[Scoring Engine]]></category>

		<guid isPermaLink="false">http://opendatagroup.com/?p=594</guid>
		<description><![CDATA[April 2010
Open Data Group’s open source scoring engine has been updated with additional functions and features.   It is also compliant with the most recent PMML standard.
Augustus is a PMML 4-compliant scoring engine that works with segmented models. Augustus is designed for use with statistical and data mining models. The new release provides Baseline, Tree and [...]]]></description>
			<content:encoded><![CDATA[<h3>April 2010</h3>
<p>Open Data Group’s open source scoring engine has been updated with additional functions and features.   It is also compliant with the most recent PMML standard.</p>
<p><strong>Augustus</strong> is a PMML 4-compliant scoring engine that works with segmented models. Augustus is designed for use with statistical and data mining models. The new release provides Baseline, Tree and Naive-Bayes producers and consumers.<span id="more-594"></span></p>
<p>There is also a version for use with PMML 3 models. It is able to produce and consume models with tens of thousands of segments and conforms to a PMML draft RFC for segmented models and ensembles of models. It supports Baseline, Regression, Tree and Naive-Bayes.</p>
<p>Augustus is written in Python and is freely available under the GNU General Public License, version 2.  For support and additional resources, visit the project page at <a title="Augustus project home " href="http://code.google.com/p/augustus/" target="_blank">http://code.google.com/p/augustus/</a></p>
]]></content:encoded>
			<wfw:commentRss>http://opendatagroup.com/2010/04/19/augustus-4-1-1-available/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Health and Status Monitoring</title>
		<link>http://opendatagroup.com/2009/11/29/health-and-status-monitoring-2/</link>
		<comments>http://opendatagroup.com/2009/11/29/health-and-status-monitoring-2/#comments</comments>
		<pubDate>Sun, 29 Nov 2009 01:31:58 +0000</pubDate>
		<dc:creator>Robert Grossman</dc:creator>
				<category><![CDATA[Best Practices]]></category>
		<category><![CDATA[Blog]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[Augustus]]></category>
		<category><![CDATA[baseline models]]></category>
		<category><![CDATA[change detection models]]></category>
		<category><![CDATA[CUSUM]]></category>
		<category><![CDATA[data quality]]></category>
		<category><![CDATA[GLR]]></category>
		<category><![CDATA[health and status monitoring]]></category>
		<category><![CDATA[Shewhart]]></category>
		<category><![CDATA[statistical quality control]]></category>

		<guid isPermaLink="false">http://blog.opendatagroup.com/?p=219</guid>
		<description><![CDATA[Service interruptions of digital systems can inconvenience millions of people and have a significant financial impact on the provider.  If the Amazon web site, or Google&#8217;s Gmail, or the Visa payments network goes down even for a few minutes, it can make front page news.
As digital systems grow larger and more complex, it can [...]]]></description>
			<content:encoded><![CDATA[<p>Service interruptions of digital systems can inconvenience millions of people and have a significant financial impact on the provider.  If the Amazon web site, or Google&#8217;s Gmail, or the Visa payments network goes down even for a few minutes, it can make front page news.</p>
<p>As digital systems grow larger and more complex, it can become very challenging to monitor their health and status, which is the first step in detecting potential problems, identifying the root causes, and taking appropriate preventive actions.  These types of systems can contain thousands of different data feeds, data flows and processes.  A problem with just one of them can interrupt payments, ads, or status updates.  Often there are hourly, daily, weekly and seasonal variations in the data that complicate the detection of problems.</p>
<p><span id="more-219"></span></p>
<p>One way to gain some insight into this problem is to look at the origins of statistical quality control in the 1920s.   Walter Andrew Shewhart (1891 &#8211; 1967) was an engineer at the Western Electric Company, which manufactured hardware for the Bell Telephone Company, from 1918 to 1924.  From 1925 to 1956 he was a member of the Technical Staff of Bell Telephone Laboratories [ASQ].</p>
<p><a href="http://opendatagroup.files.wordpress.com/2009/11/shewhart-cover.png"><img class="alignleft size-medium wp-image-227" title="Shewhart - Statistical Method from the Viewpoint of Quality Control" src="http://opendatagroup.files.wordpress.com/2009/11/shewhart-cover.png?w=181" alt="" width="181" height="300" /></a></p>
<p>One of the problems that concerned him was identifying potential problems in factory assembly lines.  For example, the dimensions and weight of metal parts that are sampled from an assembly line can be recorded.  He distinguished between two types of variation in these measurements:</p>
<ul>
<li>A common cause of variation (or noise) occurs as a normal part of the manufacturing process.</li>
<li>A special cause of variation is not part of the normal manufacturing process, but represents a problem.</li>
</ul>
<p>One of the goals of <em>statistical quality control</em> is to distinguish between these two types of variation and to quickly identify special causes of variation.</p>
<p>Shewhart introduced control charts as a tool for distinguishing between common and special causes of variation.  A control chart has a central line and upper and lower control limits.  When a measurement falls above the upper control limit or below the lower control limit, it is considered a potential special cause of variation and investigated.  Usually, the upper and lower control limits are three standard deviations above and below the mean.</p>
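<p>The three-standard-deviation rule described above can be sketched in a few lines of Python.  This is only an illustrative sketch and is not code from Augustus: the function names <code>control_limits</code> and <code>out_of_control</code>, the sample measurements, and the choice to estimate the limits from a separate baseline sample are all assumptions made for the example.</p>

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Return (lower, center, upper) limits estimated from baseline data.

    k is the number of standard deviations between the central line and
    each control limit; 3 is the usual choice mentioned in the text.
    """
    center = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation
    return center - k * spread, center, center + k * spread


def out_of_control(measurements, lower, upper):
    """Return (index, value) pairs that fall outside the control limits."""
    return [(i, x) for i, x in enumerate(measurements)
            if x < lower or x > upper]


# Hypothetical part weights recorded while the process was behaving normally.
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]
lower, center, upper = control_limits(baseline)

# New measurements to check against the chart.
flagged = out_of_control([10.0, 10.3, 11.5, 9.7], lower, upper)
print(flagged)
```

<p>Each flagged point is only a candidate special cause of variation; as in the original method, it still has to be investigated.</p>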
<p>As anyone who has investigated potential data quality problems knows, identifying the root causes of potential problems is not easy.  Shewhart also introduced a four-step approach to these types of investigations that became known as the Shewhart Cycle, the Deming Cycle or the Plan-Do-Check-Act Cycle:</p>
<ul>
<li><strong>Plan.</strong> Identify an opportunity or potential problem and make a plan for improving it or changing it.</li>
<li><strong>Do.</strong> Implement the change on a small scale and collect the appropriate data.</li>
<li><strong>Check.</strong> Use data to analyze statistically the results of the change and determine whether it made a difference.</li>
<li><strong>Act.</strong> If the change was successful, implement it on a wider scale and continuously monitor and improve your results. If the change did not work, begin the cycle again.</li>
</ul>
<p>These same ideas are still used today as the basis for <strong>health and status monitoring systems</strong>.  Well-designed digital systems are now built from the ground up to produce appropriate log data.  Instead of a single assembly line producing physical items, there are thousands or millions of digital processes producing (nearly) continuous digital data.  Often this data is available through an HTTP interface and is continually collected.</p>
<div id="attachment_226" class="wp-caption alignleft" style="width: 310px"><a href="http://opendatagroup.files.wordpress.com/2009/11/generic-dashboard0.png"><img class="size-medium wp-image-226" title="Augustus Baseline Dashboard" src="http://opendatagroup.files.wordpress.com/2009/11/generic-dashboard0.png?w=300" alt="" width="300" height="297" /></a><p class="wp-caption-text">This is a dashboard from the open source Augustus system for health and status monitoring.</p></div>
<p>Instead of a control chart, a change detection model is used, such as a CUSUM or GLR statistical model [Poor].  Instead of building a single model, a model is built for each cell in a multi-dimensional cube of models [Bugajski].  Instead of looking at charts each day, operators use an online dashboard at the hub of an operations center.</p>
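<p>To make the CUSUM idea concrete, here is a minimal sketch of a one-sided (upper) tabular CUSUM in Python.  It is not the Augustus implementation: the function name <code>cusum_upper</code>, the parameters <code>target</code>, <code>slack</code> and <code>threshold</code>, and the sample stream are assumptions chosen for illustration.  The statistic accumulates how far each value exceeds the target plus a slack allowance, resets to zero when it would go negative, and raises an alarm once it passes the threshold.</p>

```python
def cusum_upper(values, target, slack, threshold):
    """Return the index of the first alarm, or None if no alarm is raised.

    target:    expected level of the process when it is behaving normally
    slack:     allowance subtracted from each deviation before accumulating
    threshold: alarm level for the cumulative sum
    """
    s = 0.0
    for i, x in enumerate(values):
        # Accumulate deviations above target + slack; never drop below zero.
        s = max(0.0, s + (x - target - slack))
        if s > threshold:
            return i
    return None


# Hypothetical stream whose level shifts upward after the fourth value.
stream = [10.0, 9.9, 10.1, 10.0, 10.6, 10.7, 10.5, 10.8]
alarm = cusum_upper(stream, target=10.0, slack=0.25, threshold=1.0)
print(alarm)
```

<p>Because the statistic sums evidence over consecutive observations, it responds to a sustained shift rather than to a single extreme value.  In a cube of models, one such detector with its own parameters would be kept for each cell.</p>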
<p>Baseline and change detection models for each cell in a multi-dimensional data cube of models can be built easily using the open source <a href="http://augustus.googlecode.com">Augustus</a> system.</p>
<p><strong>References</strong></p>
<p>[ASQ] ASQ, The History of Quality &#8211; Overview, retrieved from www.asq.org.</p>
<p>[Bugajski] Joseph Bugajski, Chris Curry, Robert L. Grossman, David Locke and Steve Vejcik, Data Quality Models for High Volume Transaction Streams: A Case Study, Proceedings of the Second Workshop on Data Mining Case Studies and Success Stories, ACM, 2007.</p>
<p>[Poor] H. Vincent Poor and Olympia Hadjiliadis, Quickest Detection, Cambridge University Press, 2008.</p>
]]></content:encoded>
			<wfw:commentRss>http://opendatagroup.com/2009/11/29/health-and-status-monitoring-2/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Comprehensive Change Detection Suite: Free &amp; Available</title>
		<link>http://opendatagroup.com/2009/10/15/comprehensive-change-detection-suite/</link>
		<comments>http://opendatagroup.com/2009/10/15/comprehensive-change-detection-suite/#comments</comments>
		<pubDate>Thu, 15 Oct 2009 17:27:42 +0000</pubDate>
		<dc:creator>jennarussell</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[analytic infrastructure]]></category>
		<category><![CDATA[analytic projects]]></category>
		<category><![CDATA[baseline models]]></category>
		<category><![CDATA[open source analytics]]></category>
		<category><![CDATA[PMML]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://odg.opendatagroup.net/?p=285</guid>
		<description><![CDATA[October 2009
Open Data Group has launched a change detection project on Google Code, http://code.google.com/p/change-detection/.
This is an introduction and demonstration of using open source software and the Data Mining Group&#8217;s Predictive Model Markup Language (PMML) standard to perform data analytics.  Specifically, we show how multiple Baseline models over segments can be used to detect [...]]]></description>
			<content:encoded><![CDATA[<h3>October 2009</h3>
<p>Open Data Group has launched a change detection project on Google Code, <a href="http://code.google.com/p/change-detection/" target="_blank">http://code.google.com/p/change-detection/</a>.</p>
<p>This is an introduction and demonstration of using open source software and the Data Mining Group&#8217;s Predictive Model Markup Language (PMML) standard to perform data analytics.  Specifically, we show how multiple Baseline models over segments can be used to detect anomalous behavior.</p>
<p>Case studies, sample data sets, and access to an open source suite of analytic software are available.</p>
]]></content:encoded>
			<wfw:commentRss>http://opendatagroup.com/2009/10/15/comprehensive-change-detection-suite/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
