Live System Upgrades
Sun Fire servers stay available during planned maintenance, which typically accounts for 80 percent of total downtime.
28.Jan.03
Many businesses believe that planned system downtime is unavoidable because hardware and software upgrades and administrative tasks require it. Recent analyses have shown that planned downtime accounts for roughly 80 percent of total system downtime. Sun Microsystems, Inc. has shown that a significant amount of planned downtime can now be avoided with Sun's unique Live System Upgrade capabilities, a suite of technologies that includes full hardware redundancy, the powerful Solaris Operating Environment (OE), sophisticated resource use technology, and rock-solid, out-of-the-box integrity. Combined with Sun Fire servers and services such as Sun Remote Services (SRS) Net Connect, Live System Upgrades deliver virtually unmatched availability among UNIX® systems by minimizing both planned and unplanned downtime. In plain terms, this means you can better serve customers and partners, fulfill service level agreements, and satisfy users.
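The 80 percent figure is easier to appreciate when it is converted into availability. The Python sketch below assumes a hypothetical 10 hours of total downtime per year, split 80/20 between planned and unplanned downtime as described above. It then compares availability before and after all planned downtime is avoided. The 10-hour figure and the helper name `availability` are illustrative assumptions, not Sun data.

```python
# Illustrative arithmetic only: the 10-hour figure is an assumption, not Sun data.
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year


def availability(downtime_hours: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the fraction of the period during which the system is in service."""
    return 1.0 - downtime_hours / period_hours


total_downtime = 10.0                  # hypothetical total annual downtime, in hours
planned = 0.80 * total_downtime        # roughly 80 percent is planned
unplanned = total_downtime - planned   # the remaining roughly 20 percent is unplanned

before = availability(total_downtime)  # all downtime counted
after = availability(unplanned)        # planned downtime avoided entirely

print(f"planned={planned:.1f} h, unplanned={unplanned:.1f} h")
print(f"availability with all downtime: {before:.3%}")
print(f"availability with planned downtime avoided: {after:.3%}")
```

With these assumed numbers, availability rises from about 99.886 percent to about 99.977 percent. In other words, removing the planned share cuts the hours of outage by a factor of five.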
Planned Versus Unplanned
Unplanned downtime is caused by unexpected failures and by the time spent assessing and resolving their causes. Planned downtime, on the other hand, is the time required for upgrades, administration, and other maintenance that typically requires taking systems offline. "In the past, unplanned downtime had the biggest effect on availability," says Bruce Batten, Sun senior product marketing manager. "But availability now means much more than just reliability." Batten explains that as hardware and software have become more reliable, unplanned downtime now represents only about 20 percent of total downtime. Recognizing this shift, Sun has engineered its midframe and high-end servers so administrators can perform all routine maintenance without removing a system from production.
Dynamic reconfiguration is the mechanism that keeps Sun Fire servers operational during upgrades. When a component is removed from service, or in the unlikely event that it fails, the remaining redundant components pick up the slack without a break in service or degradation in performance. Redundant components in Sun Fire systems include the following:
This provides administrators with a remarkable degree of flexibility to upgrade systems without taking them offline. For instance, companies are often reluctant to upgrade CPUs for better performance because they cannot afford a service interruption. Fortunately, the Sun Fire 3800 through 15K servers support mixed-speed CPUs, so administrators can add new processors or replace existing ones in a domain without rebooting the domain.
The Solaris OE Keeps It Going
Sun's Solaris OE also helps reduce planned downtime with its powerful and comprehensive system management features. The UNIX File System snapshot copy feature lets administrators back up a file system from a snapshot copy rather than from the live file system. Creating the snapshot takes only seconds, and the backup then reads from the snapshot while the live file system stays in service. As a result, planned downtime related to backups is dramatically reduced.
The Solaris OE also includes several software provisioning tools that help you sidestep downtime, such as the Solaris JumpStart software, the Solaris Flash software, and the Sun Management Center Change Manager. The Solaris Live Upgrade software even lets you install a new version of the Solaris OE into an inactive boot environment while the system keeps running. This virtually eliminates the service outage typically associated with an operating system upgrade. When the installation is complete, you simply reboot into the new version of the Solaris OE.
Keeping software patches up to date is crucial to system availability, and the Solaris Patch Manager Tool is a powerful way to reduce both planned and unplanned downtime. The tool determines which patches need to be installed, manages their secure download, and applies them in the proper order. This minimizes the planned downtime devoted to this type of administration. The tool also reduces the potential for unplanned downtime by helping to keep your systems stable and secure.
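Applying patches "in the proper order" is essentially a dependency-ordering problem. The Python sketch below shows one common way to solve it: a topological sort using the standard library's `graphlib` module (Python 3.9+). It also skips patches that are already installed. The sketch assumes that each patch simply lists the patches it requires. The patch IDs, the `REQUIRES` map, and `install_plan` are hypothetical, and none of this is the Solaris Patch Manager Tool's actual implementation.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical patch IDs; each key maps to the set of patches it requires.
REQUIRES: dict[str, set[str]] = {
    "PATCH-A": set(),
    "PATCH-B": {"PATCH-A"},
    "PATCH-C": {"PATCH-A", "PATCH-B"},
    "PATCH-D": {"PATCH-C"},
}


def install_plan(requires: dict[str, set[str]],
                 installed: frozenset[str] = frozenset()) -> list[str]:
    """Order patches so every prerequisite comes first, skipping installed ones."""
    try:
        # TopologicalSorter expects a mapping of node -> its predecessors.
        order = list(TopologicalSorter(requires).static_order())
    except CycleError as err:
        # err.args[1] holds the list of nodes that form the cycle.
        raise ValueError(f"circular patch dependency: {err.args[1]}") from err
    return [patch for patch in order if patch not in installed]


# Example: PATCH-A is already on the system, so only the rest are scheduled.
print(install_plan(REQUIRES, installed=frozenset({"PATCH-A"})))
```

Real patch metadata can also express obsolescence and incompatibility between patches, which this sketch ignores.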
Dynamic Duo
Much of the high-availability functionality in Sun servers is enabled and supported by a pair of powerful resource use technologies: Dynamic Reconfiguration (DR) and Dynamic System Domains (DSD). DR is part of the Solaris OE. It lets you install, reconfigure, and remove server hardware, such as CPUs, memory, and other subsystems, while the system is running, and no reboot is required after any of these changes. This live manageability is crucial to delivering high service levels for business-critical applications.
DR is the enabling technology behind DSD. DSD lets administrators divide a single system into multiple fault-isolated and security-isolated logical servers (domains), each running its own copy of the Solaris OE. In addition to improving resource efficiency, DSD improves server availability: if a component fails or is removed in one domain, the other domains continue to function uninterrupted.
Preparing for the Unexpected
Unplanned downtime accounts for only about one-fifth of total downtime, but it is at best inconvenient and at worst crippling. "Compared with planned downtime, there's no doubt it raises the blood pressure of administrators disproportionately," says Batten. Sun minimizes unplanned downtime with a seven-step approach that boosts reliability from the very first day you deploy a Sun Fire server. The approach:
Before they become part of your enterprise, Sun Fire servers are rigorously engineered, assembled, and tested. Sun designs systems with full hardware redundancy and applies its Six Sigma practice to continuously improve the quality of components, suppliers, and manufacturing. All major Sun Fire system components undergo extensive testing, including a 200-hour burn-in period, and the entire system undergoes a 48-hour test.
Sun employs end-to-end error detection and correction on all data paths within a Sun Fire system. This helps intercept and automatically correct errors before they corrupt or lose data or otherwise affect system stability. In the unlikely event of a failure, end-to-end checking quickly identifies the faulty component.
Another self-monitoring mechanism is the Sun Fire system's Proactive Self Diagnostics feature, also called Field-Replaceable Unit Identification (FRUID). "It's like the status monitor that an astronaut wears," says Batten. "FRUID literally takes the temperature of key components, checks the power, checks for errors, and so on." The Sun Management Center software and Sun's Automatic System Recovery procedures also help dramatically reduce unplanned downtime.
Sun Fire servers have built-in reliability and serviceability features. In addition, Sun Services offers several availability services that help decrease downtime, manage maintenance, and shorten mean time to repair. SRS Net Connect is a portfolio of Web-delivered services that features self-monitoring and event detection and notification. It also provides availability, trend, configuration, and patch reporting. An upcoming release of SRS Net Connect will also offer RAS System Analysis, which will provide detailed configuration and patch analysis along with recommendations for remediation. As the world's largest UNIX vendor, Sun offers a customer-focused Availability Services Suite to help keep your data center up and running.
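End-to-end error detection and correction relies on error-correcting codes (ECC) that add redundant check bits to data. The Python sketch below implements a classic Hamming(7,4) code to illustrate the idea. It locates and repairs any single flipped bit in a 7-bit codeword. This is a simplified stand-in, not the code Sun Fire hardware uses: real data paths typically use wider codes with an extra parity bit for double-error detection. The function names are hypothetical.

```python
def encode(data: list[int]) -> list[int]:
    """Encode 4 data bits as a 7-bit Hamming codeword (bit positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(word: list[int]) -> tuple[list[int], int]:
    """Correct up to one flipped bit; return (data bits, error position or 0)."""
    c = [0] + list(word)  # pad so list indexes match 1-based bit positions
    s1 = c[1] ^ c[3] ^ c[5] ^ c[7]
    s2 = c[2] ^ c[3] ^ c[6] ^ c[7]
    s4 = c[4] ^ c[5] ^ c[6] ^ c[7]
    pos = s1 + 2 * s2 + 4 * s4  # the syndrome names the faulty bit position
    if pos:
        c[pos] ^= 1  # flip the bad bit back before extracting the data
    return [c[3], c[5], c[6], c[7]], pos


# Simulate a single-bit error on the data path and recover from it.
sent = encode([1, 0, 1, 1])
received = sent.copy()
received[4] ^= 1  # corrupt bit position 5
data, pos = decode(received)
print(data, pos)
```

A nonzero syndrome both flags the error and identifies where it occurred. This mirrors, in miniature, how ECC lets hardware correct a fault and also point to its source.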
Programs such as High Availability Service Packs, the SunReady Availability Assessment, and the Sun Fire Application Readiness Service work with you and your organization to reduce planned and unplanned downtime. "Many businesses believe that system downtime is unavoidable due to the regular cycles of hardware and software upgrades, administrative tasks, or outright component failure," says Batten. "At Sun, we're working to make all downtime avoidable."