Feature Story
Seven Steps to Highly Available Systems
What does downtime cost you? That's the most important question to ask when measuring the impact of system availability on your business. Downtime comes in two flavors: planned maintenance and unplanned interruption. The latter can come with a hefty price tag. Did you know that when a credit card center goes down, costs may run as high as $2.6 million per hour? When an unplanned interruption prevents customers from accessing critical applications, it's the equivalent of shutting off the lights, pulling down the shades, and locking the doors: for as long as the outage lasts, you're essentially out of business.
Preparing your business for the highest levels of service availability starts with a keen understanding of what each application means to your business. Simply setting a blanket goal of 99.999 percent uptime for your entire IT infrastructure is neither an effective measure of availability nor a sound basis for controlling cost. A more effective strategy for availability planning is to look at the combination of factors that affect service levels. This approach to business availability can help you deliver a highly available system while also reducing costs.
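To make a target like 99.999 percent concrete, it helps to translate it into a yearly downtime budget. The short Python sketch below does that arithmetic; the function name is illustrative, and the calculation assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; assumes a 365-day year


def downtime_budget_minutes(availability_percent: float) -> float:
    """Maximum minutes of downtime per year allowed by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Compare common "nines" targets side by side.
for target in (99.0, 99.9, 99.99, 99.999):
    print(f"{target}% -> {downtime_budget_minutes(target):,.1f} minutes/year")
```

At 99.999 percent the budget is a little over five minutes per year, which is why that target is worth reserving for the systems that truly need it.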
The Classic IT Dilemma
IT professionals who make purchase decisions and manage an infrastructure often get caught in a tug-of-war between the demand to improve service levels and the pressure to reduce service level costs. This means that IT professionals juggle two very difficult roles. The first is answering to the internal organization, namely the CIO and CFO. In this role, IT professionals are tasked with reducing or maintaining the cost of managing, sustaining, and evolving the infrastructure.
Simultaneously, they are burdened with guaranteeing that end users have access to business systems at any moment. Here, they must satisfy business managers' quality-of-service expectations for the company's customers. Striking a balance on this internal/external seesaw is tricky indeed. Those IT professionals who don't stay on budget or lack confidence in the system's ability to provide total business availability suffer through many sleepless nights. Embarking on a seven-step approach to availability positions you for success on both fronts.
Step 1: Inventory Systems by Business Impact
Not all systems require 99.999 percent uptime. By classifying systems as task-critical, business-critical, or mission-critical (a categorization based on the impact of system downtime on the company's bottom line), you can identify and segregate the critical applications that require total uptime.
Consider this example: Failure of a task-critical application like a print server may be a big hassle, but the effect on the company as a whole is negligible. When a mail server goes down, however, the impact is more acute because it affects employees' ability to do business. Still, while there may be a loss of productivity, such an outage wouldn't typically have bottom-line consequences. But say you're a large ISP and your mail server goes down: that's an expensive setback, because email is the lifeblood of the company. Such a mission-critical application has a direct impact on revenue and customer confidence, so maximum uptime is a must.
Inventorying systems by business impact serves as a baseline for negotiating acceptable service levels with business managers who demand continuous availability. It also provides a practical way to prioritize availability based on the criticality of each system or its ability to demonstrate fast ROI. Getting executives and business managers to agree jointly on the service level required for each system, and on the cost of achieving it, eases the IT dilemma by letting you direct investment toward mission-critical applications.
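One way to record such an inventory is a small table that maps each system to a tier. In this hypothetical Python sketch, the tier names follow the article and the example systems come from the print server and mail server scenario, but the availability target attached to each tier is an assumed placeholder for whatever service levels you actually negotiate.

```python
from dataclasses import dataclass
from enum import Enum


class Criticality(Enum):
    # Hypothetical availability targets (percent) for each tier; real targets
    # come from the service levels agreed with business managers.
    TASK_CRITICAL = 99.0
    BUSINESS_CRITICAL = 99.9
    MISSION_CRITICAL = 99.999


@dataclass
class System:
    name: str
    criticality: Criticality


inventory = [
    System("print server", Criticality.TASK_CRITICAL),
    System("internal mail server", Criticality.BUSINESS_CRITICAL),
    System("ISP customer mail", Criticality.MISSION_CRITICAL),
]


def prioritize(systems: list[System]) -> list[System]:
    """Order systems so the most critical (highest target) come first."""
    return sorted(systems, key=lambda s: s.criticality.value, reverse=True)


for s in prioritize(inventory):
    print(f"{s.name}: {s.criticality.name}, target {s.criticality.value}% uptime")
```

Sorting by target gives a simple starting order for where availability spending should go first.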
Step 2: Analyze Availability in Tiers
Having defined the business criticality, the next step is to understand what types of IT infrastructure components support each system. Sun recommends looking at availability from a tiered approach, which includes a system layer, data layer, application layer, as well as business practices for achieving required availability levels. At each tier, it's important to analyze the system's ability to diagnose a problem, its ability to recover from failure quickly, its overall reliability, and its ease of maintenance. Knowing your availability requirements at every tier and for each system helps you plan for different levels of availability based on the criticality of the application.
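Analyzing each tier separately matters because a service that needs every tier running at once can be no more available than all of them together. Assuming the system, data, and application layers fail independently and the service needs all three, the combined availability is the product of the per-tier figures. The per-tier numbers in this Python sketch are hypothetical.

```python
from math import prod


def serial_availability(tier_availabilities) -> float:
    """Availability of a service that needs every tier up at the same time.

    Assumes tiers fail independently; values are fractions, not percentages.
    """
    return prod(tier_availabilities)


# Hypothetical per-tier availabilities.
tiers = {"system": 0.9995, "data": 0.999, "application": 0.998}
overall = serial_availability(tiers.values())
print(f"overall availability: {overall:.4%}")
```

Even with every tier above 99.8 percent, the combined figure (about 99.65 percent here) falls below the weakest single tier.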
Step 3: Mitigate Availability Cost and Risks Architecturally
After these discovery phases have been completed, you can begin to design a system that provides the agreed-upon service levels at a cost proportional to the business criticality of the system. Architecture and availability go hand-in-hand, so putting together the right architecture based on technology dependencies, areas of strength, and gaps in your current architecture helps mitigate risks and creates a more robust availability solution.
Backed by field-proven availability expertise, Sun recommends a simple yet flexible architecture as a defense against the risks and costs associated with highly available systems. Simple strategies, such as fail-over schemes, redundant systems, flexible topologies, and the ease of integration that comes from support across product families and general-purpose storage architectures, can substantially reduce infrastructure downtime without excessive cost.
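To see why simple redundancy can cut downtime so sharply, consider a fail-over group that stays up as long as at least one node is up. This Python sketch relies on two simplifying assumptions that never hold exactly in practice: nodes fail independently, and fail-over is instantaneous. The node availability used is hypothetical.

```python
def parallel_availability(node_availability: float, nodes: int) -> float:
    """Availability of a fail-over group that is up while any node is up.

    Assumes independent node failures and instantaneous fail-over.
    """
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be a fraction between 0 and 1")
    if nodes < 1:
        raise ValueError("need at least one node")
    # The group is down only when every node is down at the same time.
    return 1 - (1 - node_availability) ** nodes


single = parallel_availability(0.99, 1)
pair = parallel_availability(0.99, 2)
print(f"one node: {single:.4%}, fail-over pair: {pair:.4%}")
```

In this idealized model, two 99 percent nodes reach 99.99 percent together; correlated failures and imperfect fail-over will pull a real system below that figure.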
Another factor to consider is the predictability of the system. A predictable system enables you to work your processes and people around it to achieve high service levels without taking on the cost structure of a mission-critical system. For example, if your system has a predictable need for maintenance, it's possible to schedule it at a time that is least disruptive to your business, and then proactively communicate the downtime to business stakeholders, customers, and partners.
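Predictable maintenance also lends itself to simple scheduling. As an illustration, the Python sketch below picks the contiguous maintenance window with the lowest total load from an hourly usage profile. The load figures and window length are hypothetical, and a real schedule would also weigh commitments to stakeholders, customers, and partners.

```python
def least_disruptive_window(hourly_load: list[float], window_hours: int) -> int:
    """Return the start hour of the contiguous window (wrapping past midnight)
    with the lowest total load."""
    n = len(hourly_load)
    if not 0 < window_hours <= n:
        raise ValueError("window must fit within the profile")
    best_start, best_total = 0, float("inf")
    for start in range(n):
        total = sum(hourly_load[(start + i) % n] for i in range(window_hours))
        if total < best_total:  # ties keep the earliest start hour
            best_start, best_total = start, total
    return best_start


# Hypothetical requests per hour (in thousands), from 00:00 to 23:00.
load = [5, 3, 2, 2, 3, 8, 20, 45, 70, 80, 85, 82,
        78, 80, 84, 83, 75, 60, 40, 30, 25, 18, 12, 8]
start = least_disruptive_window(load, 3)
print(f"schedule the 3-hour window starting at {start:02d}:00")
```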
Step 4: Reduce the Time to Value
Simply put, time to value is the amount of time it takes to draw value out of a solution after implementation. It's not enough to know what it costs to implement an infrastructure and what level of availability that implementation buys. It's also critical to figure out when that infrastructure will start paying off. For instance, a typical financial broker will miss out on $6.5 million for every hour the system isn't running. The faster you can offer services to customers and gain their confidence after implementation, the better it is for your bottom line. So, achieving high availability is not just about partnering with a vendor that provides the cheapest products for meeting service level requirements. A partner that hits service level targets in the shortest possible time will give you greater returns.
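A rough way to fold time to value into a vendor comparison is to add the value forgone while a system is not yet in service to its purchase cost. The figures in this Python sketch are entirely hypothetical, chosen only to show how a cheaper but slower option can end up costing more overall.

```python
def cost_until_live(purchase_cost: int, weeks_to_deploy: int, value_per_week: int) -> int:
    """Purchase cost plus the business value forgone before the system goes live."""
    return purchase_cost + weeks_to_deploy * value_per_week


VALUE_PER_WEEK = 100_000  # hypothetical value the service delivers once live

option_a = cost_until_live(1_000_000, 12, VALUE_PER_WEEK)  # cheaper, slower to deploy
option_b = cost_until_live(1_400_000, 4, VALUE_PER_WEEK)   # pricier, faster to deploy
print(f"option A: ${option_a:,}  option B: ${option_b:,}")
```

Under these assumed numbers, the option with the higher sticker price comes out $400,000 ahead because it starts delivering value eight weeks sooner.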
One way to keep costs at bay and reduce time to value is to choose an end-to-end solution from a vendor that can deliver a comprehensive availability infrastructure and also take responsibility for its performance. Taking the piecemeal route involves buying separate products, weaving them into the fabric of your infrastructure, and then configuring them. This takes time, is prone to glitches, and tacks on integration costs to your availability solution. It also increases the potential for finger pointing when failures occur because it's much more difficult to pinpoint the cause of system breakdown.
Step 5: Address the Complete Environment (Products, Processes, People)
Typically, costs are associated only with acquiring products and implementing highly available systems. In reality, however, process and people issues often skew the economics of providing high service levels and must be factored into the equation. Delivering the service levels that system users require demands more than product infrastructures that provide maximum uptime.
Increasing availability through products is only part of the story. End-to-end service availability also involves the processes and people that support the environment. System failures are more likely to result from operational or human errors than from software or hardware glitches. In fact, research shows that product problems cause downtime only about 20 percent of the time. So addressing the complete environment (products, processes, and people) is central to reducing the overall cost of business availability.
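The 20 percent figure has a direct arithmetic consequence: improving products alone can remove at most that share of downtime. This Python sketch assumes downtime splits cleanly into a product-caused share and everything else (process and people), and that a product improvement leaves the rest unchanged; the baseline hours are hypothetical.

```python
def remaining_downtime(total_hours: float, product_share: float,
                       product_reduction: float) -> float:
    """Downtime left after cutting only the product-caused portion.

    product_share: fraction of downtime caused by products (0..1).
    product_reduction: fraction of that product-caused downtime eliminated (0..1).
    """
    return total_hours * (1 - product_share * product_reduction)


BASELINE_HOURS = 10.0  # hypothetical annual downtime
# Even eliminating every product failure leaves process- and people-caused downtime.
print(remaining_downtime(BASELINE_HOURS, 0.20, 1.0))
```

With a 20 percent product share, a perfect product still leaves 80 percent of the downtime in place, which is why process and people improvements carry so much weight.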
Implementing simple product infrastructures is one way to reduce process and people errors. If your architecture calls for straightforward products, it will most likely entail a simple operational model, which in turn requires less specialized skill to operate the system. The end result can be a lower total cost of business availability with higher service levels.
Step 6: Design and Implement Comprehensive Recovery Plans
Any availability solution is incomplete without a comprehensive recovery plan for each system. Being able to detect failure rapidly is the first step toward a speedy recovery when system failure strikes. A comprehensive recovery plan, together with the processes that support it, can help reduce the overall cost of downtime to the business. This highlights the relevance of business processes to availability: a swift response to failure reduces your overall business risk and helps keep your revenue stream intact.
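Rapid failure detection is often built on heartbeats: a monitored system signals periodically, and silence beyond a timeout is treated as failure. The class below is a minimal, hypothetical Python sketch of that general idea, not a description of any specific Sun product; the clock is injectable so the behavior can be checked without actually waiting.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Declares a system failed if no heartbeat arrives within `timeout` seconds."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_beat = clock()  # treat creation time as the first heartbeat

    def beat(self) -> None:
        """Record a heartbeat from the monitored system."""
        self.last_beat = self.clock()

    def has_failed(self) -> bool:
        """True once the silence since the last heartbeat exceeds the timeout."""
        return self.clock() - self.last_beat > self.timeout


monitor = HeartbeatMonitor(timeout=5.0)
monitor.beat()
print("failed" if monitor.has_failed() else "healthy")
```

The timeout is a trade-off: a shorter one detects failure sooner, shortening total outage time (detection plus recovery), but raises the risk of declaring a slow yet healthy system failed.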
Having a complete recovery plan for every system, however, doesn't remove the need for an enterprise-wide contingency plan for catastrophes. A solid, well-tested plan that spans the entire business will help restore IT infrastructures quickly, minimizing interruptions for personnel, suppliers, vendors, partners, investors, and customers.
Step 7: Partner for Experience
When it comes to availability, the challenge for IT organizations is crystal clear: provide a complete availability solution that meets service level requirements while keeping costs at bay. While the destination is obvious, there's no reason to navigate the maze of business availability alone. Beyond its architectural expertise and comprehensive, highly predictable availability solution sets, Sun brings practical methodologies, real customer engagement experience, and evidence of successful implementations so IT professionals can confidently resolve their IT dilemma.
We invite you to read our article "Get Predictable Business Availability from Sun" to understand fully what you get when you partner with Sun.
For further information, please contact innercircle@sun.com.