Posts Tagged GLR
Health and Status Monitoring
Posted by Robert Grossman in Best Practices, Blog, analytics on November 29, 2009
Service interruptions of digital systems can inconvenience millions of people and have a significant financial impact on the provider. If the Amazon web site, or Google’s Gmail, or the Visa payments network goes down even for a few minutes, it can make front page news.
As digital systems grow larger and more complex, it can become very challenging to monitor their health and status, which is the first step in detecting potential problems, identifying the root causes, and taking appropriate preventive actions. These types of systems can contain thousands of different data feeds, data flows and processes. A problem with just one of them can interrupt payments, ads, and status updates, respectively. Often there are hourly, daily, weekly and seasonal variations in the data that complicates the detection of problems.