Best Practices for Using Service Management Facility (SMF)

Videhi Mallela, June 2006

This article describes the utility and benefits of Service Management Facility (SMF), as well as the world of managing services before SMF. We also point out some parts of the Solaris Operating System that have changed significantly and show you how to accomplish typical administrative tasks using SMF. BigAdmin also offers a comprehensive guide to SMF and Predictive Self-Healing.

1. Introduction to Services

UNIX operating systems have traditionally included a set of services: software programs not associated with any interactive user login that listen for and respond to requests to perform certain tasks, such as delivering email, responding to ftp requests, or permitting remote command execution. These traditional services were usually individual applications that executed as a single process that started at boot time and executed continuously while a system was up and running, servicing any requests that were received.

Today, administrators must contend with a collection of services that has grown to such a point that it has exceeded the utility of this original model. Sun has created SMF to simplify management of these system services. SMF is a new feature of the Solaris OS that creates a supported, unified model for services and service management on each Solaris system. It is a core part of the Predictive Self-Healing technology available in the Solaris 10 OS, which provides automatic recovery from software and hardware failures as well as administrative errors.

2. The Problem: Managing Services Before SMF
Before the advent of SMF, a booting Solaris system ran the init daemon, which parsed the
Life with init, rc scripts, and inetd was less than pleasant. To change the parameters of a daemon, for example, you had to determine where the daemon was started and figure out how to change the parameters associated with the start method. Changing an rc script was fraught with peril -- one false move, and the system would fail to boot properly or even hang during booting. Testing the rc script change meant rebooting the system. Debugging problems with rc scripts meant turning on debugging options (such as adding

But perhaps the most unappealing aspect of the whole mess was the hand-created interdependencies and the ramifications if a dependency failed. For example, the rc scripts had to start the proper components in the proper order, such that the network interfaces were initialized before the routing services started, and all that had to be done before a network daemon started. If one of those components failed while the system was running, the results were unpredictable and the problem difficult to debug.

3. Overview of SMF

All of these issues drove Sun to design an entirely new service management facility. SMF is part of the one-two punch of the new Solaris 10 Predictive Self-Healing feature set. (The other component is the Fault Management Architecture.) SMF understands:
This understanding of dependencies also allows a new level of service functionality -- if a service fails, SMF can restart that service and all of the services that depended on it. Thus SMF can fully restore the system to a given run level, even if a core service fails. To provide all of these features, SMF needed to be significantly different from the "olden days" of rc scripts and inetd daemons.

The Utility of SMF
SMF is enabled by default on the Solaris 10 OS, so exploration is as easy as booting a machine running that version of Solaris. But be aware that even boot is affected by SMF. By default, logging during boot is now very quiet. With the new boot

The Benefits of Managing Services With SMF

SMF has improved several aspects of the Solaris administrative model; here are some of the most notable examples:
Despite these changes, compatibility with existing administrative practices has been preserved wherever possible. For example, most site-local and ISV-supplied rc scripts still work as usual.

Using SMF to Accomplish Common Tasks

SMF is a particularly notable change in the Solaris platform because it impacts the administrative model. Although we encourage you to read more about the features of SMF (see the More Information section), you may want to start by learning how to do some common system administration tasks.

Enabling and Disabling Services
Releases prior to the Solaris 10 OS haven't offered a good way to permanently disable a service. The typical method used is to rename the relevant rc script to a name that won't get executed, but that change will be overlooked the next time the system is upgraded. Furthermore, inetd-based services are enabled and disabled by a totally different method -- editing a configuration file. Under SMF, both types of services can be configured using the
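A minimal sketch of enabling and disabling a service, assuming the svcadm(1M) command delivered with SMF in the Solaris 10 OS. The wrapper function names are hypothetical, and network/smtp:sendmail is used only because it appears elsewhere in this article; the same calls are assumed to apply to rc-style and inetd-based services alike once they are managed by SMF.

```shell
#!/bin/sh
# Sketch only: thin wrappers around svcadm(1M).
# The function names are hypothetical illustrations.

# Enable a service persistently (the setting survives a reboot).
enable_service() {
    svcadm enable "$1"
}

# Enable a service and, recursively, the services it depends on.
enable_with_dependencies() {
    svcadm enable -r "$1"
}

# Disable a service persistently.
disable_service() {
    svcadm disable "$1"
}

# Disable a service only until the next reboot (-t = temporary).
disable_until_reboot() {
    svcadm disable -t "$1"
}

# Example (needs root or a suitable authorization on Solaris 10):
#   disable_until_reboot network/smtp:sendmail
#   enable_service network/smtp:sendmail
```

Keeping the temporary form in its own function makes it harder to confuse a change meant to last only until reboot with a permanent one.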
The last argument to
Note that

Stopping, Starting, and Restarting Services
Traditionally, services have been started by an rc script run at boot, run with the argument
The
As with the enabling and disabling of services,

Observing the Boot Process

As mentioned in the Notable Changes section of the QuickStart guide, the boot process is much quieter by default than in previous releases of Solaris. This was done to reduce the amount of uninformative "chatter" that might obscure any real problems that might occur during boot.
Some new boot options have been added to control the verbosity of boot. One that you may find particularly useful is
{1} ok boot -m verbose
Rebooting with command: boot -m verbose
Boot device: /pci@1c,600000/scsi@2/disk@0,0:a File and args: -m verbose
SunOS Release 5.10 Version Generic 64-bit
Copyright 1983-2004 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
[ network/pfil:default starting (pfil) ]
[ network/loopback:default starting (Loopback network interface) ]
[ system/filesystem/root:default starting (Root filesystem mount) ]
Oct 18 13:53:02/13: system start time was Mon Oct 18 13:52:57 2004
[ network/physical:default starting (Physical network interfaces) ]
[ system/filesystem/usr:default starting (/usr and / mounted read/write) ]
( more service messages elided )
[ system/filesystem/local:default starting (Local filesystem mounts) ]
[ network/ntp:default starting (network time protocol (NTP)) ]
[ system/utmp:default starting (utmpx monitoring) ]
[ system/filesystem/local:default starting (Local filesystem mounts) ]
[ system/console-login:default starting (Console login) ]
demobox console login: checking ufs filesystems
/dev/rdsk/c0t0d0s7: is logging.
Oct 18 13:53:14/50: system/system-log:default starting
Oct 18 13:53:14/51: network/inetd:default starting
Oct 18 13:53:14/52: system/cron:default starting
( more service messages elided )
The order of the service start messages may change from boot to boot, because SMF starts services in parallel according to their dependency relationships. If a service fails to start successfully, warning messages will be printed in addition to the start message. Here's an example where the NTP service failed to start up:
[ system/filesystem/local:default starting (Local filesystem mounts) ]
[ network/ntp:default starting (network time protocol (NTP)) ]
Oct 25 13:58:42/49 ERROR: svc:/network/ntp:default:
Method "/lib/svc/method/xntp" failed with exit status 96.
Oct 25 13:58:42 svc.startd[4]: svc:/network/ntp:default:
Method "/lib/svc/method/xntp" failed with exit status 96.
[ network/ntp:default misconfigured (see 'svcs -x' for details) ]
[ system/utmp:default starting (utmpx monitoring) ]
( more service messages elided )
The first two error messages would appear during both normal boot and verbose boot; the last one (

Discovering What's Going Wrong
The Solaris OS has not had a comprehensive place to look for problems with system services. Some solutions exist to help catch and diagnose these problems, ranging from

Let's continue with the example of the NTP service failing to start up:

# svcs -x
svc:/network/ntp:default (Network Time Protocol (NTP).)
State: maintenance since Mon Oct 18 13:58:42 2004
Reason: Start method exited with $SMF_EXIT_ERR_CONFIG.
See: http://sun.com/msg/SMF-8000-KS
See: ntpq(1M)
See: ntpdate(1M)
See: xntpd(1M)
Impact: 0 services are not running.
The NTP service has been placed into maintenance mode because the startup script indicated a problem with the service's configuration. Further information about the service failure is available in the service's log file in the
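Once the underlying configuration problem has been corrected, the instance still has to be told to leave the maintenance state. A minimal sketch, assuming svcs(1) and svcadm(1M) are available and using the NTP instance from the example above; the function name is hypothetical:

```shell
#!/bin/sh
# Sketch only: recover an instance that svcs -x reports as "maintenance".
# The function name is a hypothetical illustration.

recover_from_maintenance() {
    fmri=$1

    # -H suppresses the header; -o state prints only the state column.
    state=$(svcs -H -o state "$fmri") || return 1

    case "$state" in
        maintenance*)
            # Clearing tells the restarter that the problem has been
            # repaired, so the start method is attempted again.
            svcadm clear "$fmri"
            ;;
        *)
            echo "$fmri is in state '$state'; nothing to clear"
            ;;
    esac
}

# Example, after fixing the NTP configuration:
#   recover_from_maintenance svc:/network/ntp:default
```

Checking the state first avoids issuing a clear against an instance that is merely disabled or already online.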
Another example shows SMF's ability to track dependencies and point out problems relating to disabled services. We use the
# svcs -x -v
svc:/application/print/server:default (LP Print Service)
State: disabled since Mon Oct 18 16:17:27 2004
Reason: Disabled by an administrator.
See: http://sun.com/msg/SMF-8000-05
See: man -M /usr/share/man -s 1M lpsched
Impact: 1 service is not running:
svc:/application/print/rfc1179:default
Here, the

Observing Services
In earlier versions of Solaris, the only way to see what services were available was to use the
The new
% svcs -p network/smtp:sendmail
STATE STIME FMRI
online 18:20:30 svc:/network/smtp:sendmail
18:20:30 655 sendmail
18:20:30 657 sendmail
% ps -fp 655,657
UID PID PPID C STIME TTY TIME CMD
root 655 1 0 18:20:30 ? 0:01 /usr/lib/sendmail -bd -q15m
smmsp 657 1 0 18:20:30 ? 0:00 /usr/lib/sendmail -Ac -q15m
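The two-step lookup above (list the service's processes, then pass their PIDs to ps) can be scripted. The following is a minimal sketch, assuming the svcs -p output layout shown in the example, where each process line carries the PID in its second field; the function name service_pids is hypothetical.

```shell
#!/bin/sh
# Sketch only: list the PIDs that "svcs -p" reports for a service.
# The function name service_pids is a hypothetical illustration.

# Reads "svcs -p" output on stdin and prints one PID per line.
# Process lines have the form "STIME PID COMMAND", so their second
# field is purely numeric; the header line ("STIME") and the service
# line (a timestamp) never match.
service_pids() {
    awk '$2 ~ /^[0-9]+$/ { print $2 }'
}

# Example on a Solaris 10 system:
#   svcs -p network/smtp:sendmail | service_pids
```

Matching on a numeric second field, rather than on indentation, keeps the filter independent of how the columns are padded.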
The

% svcs -d network/smtp:sendmail
STATE STIME FMRI
online 18:20:14 svc:/system/identity:domain
online 18:20:26 svc:/network/service:default
online 18:20:27 svc:/system/filesystem/local:default
online 18:20:27 svc:/milestone/name-services:default
online 18:20:27 svc:/system/system-log:default
online 18:20:30 svc:/system/filesystem/autofs:default
% svcs -D network/smtp:sendmail
STATE STIME FMRI
online 18:20:32 svc:/milestone/multi-user:default
We can see that sendmail requires networking, local file systems, name services, the syslog daemon, and the automount daemon to be running before it will run, and sendmail itself must be running before the multi-user milestone can be reached. The service start times (the

Changing Run Levels
SMF has introduced the concept of milestones, which supplant the traditional notion of run levels. Run levels provide a basic description of the set of services running on the machine, traditionally grouped as the services necessary for one user to log in on the machine console (run level S), and for multiple users to log in to the machine (run levels 2 and 3). These system states are represented in SMF as milestones, which are stable services that represent a group of other services.
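The correspondence between run levels and milestones can be sketched as a small lookup. The milestone FMRIs below are the ones delivered with the Solaris 10 OS; the function names are hypothetical, and svcadm milestone is assumed here as the command that moves a running system between milestones.

```shell
#!/bin/sh
# Sketch only: map a traditional run level to its SMF milestone.
# The function names are hypothetical illustrations.

milestone_for_runlevel() {
    case "$1" in
        S|s) echo "svc:/milestone/single-user:default" ;;
        2)   echo "svc:/milestone/multi-user:default" ;;
        3)   echo "svc:/milestone/multi-user-server:default" ;;
        *)   echo "no milestone sketched for run level '$1'" >&2
             return 1 ;;
    esac
}

# Restrict the running system to the milestone for a given run level.
goto_runlevel_milestone() {
    target=$(milestone_for_runlevel "$1") || return 1
    svcadm milestone "$target"
}

# Example:
#   goto_runlevel_milestone S   # keep only single-user services online
#   svcadm milestone all        # bring every enabled service back
```

Note that moving to a milestone limits which services run; it should not be assumed to update the run level itself, for which init(1M) remains the tool.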
The
The boot process has been updated to be aware of milestones. In addition to the traditional
Booting to the single-user milestone (with

Enabling, Disabling, and Monitoring Legacy Services
Services that are started by traditional rc scripts (referred to as legacy services) will generally continue to work as they always have. They will show up in the output of

As mentioned in the Notable Changes section of the guide, rc scripts may not run at exactly the same point in boot as they had in earlier versions of Solaris. In particular, problems may arise for scripts that depend on running before certain rc scripts provided in the Solaris OS. The vast majority of scripts should continue to work without any trouble, though.

Adding New Services to
The Internet services daemon,

4. Conclusion

Although the new SMF feature is totally different from the previous boot and daemon management within the Solaris OS, it includes many welcome changes. The system boots faster and can recover from errors, such as hardware failures, that cause services to fail. SMF allows exact knowledge of the state of the system and its services, and allows easy management of those services. Overall, there is a lot to like, with only the fear of learning something new standing in the way of progress. Of course, if the new facility isn't learned, causing mayhem within a Solaris 10 system is a likely outcome. So it's time to roll up your sleeves and make sure you understand the new world before you are surprised by some new, unknown creature there.