BigAdmin System Administration Portal
Feature Article

Best Practices for Using Service Management Facility (SMF)

Videhi Mallela, June 2006

This article describes the utility and benefits of Service Management Facility (SMF), as well as the world of managing services before SMF. We also point out some parts of the Solaris Operating System that have changed significantly and show you how to accomplish typical administrative tasks using SMF. BigAdmin also offers a comprehensive guide to SMF and Predictive Self-Healing.


1. Introduction to Services

UNIX operating systems have traditionally included a set of services: software programs not associated with any interactive user login that listen for and respond to requests to perform certain tasks, such as delivering email, responding to ftp requests, or permitting remote command execution. These traditional services were usually individual applications that executed as a single process that started at boot time and executed continuously while a system was up and running, servicing any requests that were received.

Today, administrators must contend with a collection of services that has grown to such a point that it has exceeded the utility of this original model. Sun has created SMF to simplify management of these system services. SMF is a new feature of the Solaris OS that creates a supported, unified model for services and service management on each Solaris system. It is a core part of the Predictive Self-Healing technology available in the Solaris 10 OS, which provides automatic recovery from software and hardware failures as well as administrative errors.


2. The Problem: Managing Services Before SMF

Before the advent of SMF, a booting Solaris system ran the init daemon, which parsed the /etc/inittab file, which fired off a series of run control (rc) scripts, depending on the run level the system was trying to attain. The default run level was "3", multi-user mode with networking. The inetd daemon spawned other daemons, as necessary, to provide network services. And all was good. Or was it?

Life with init, rc scripts, and inetd was less than pleasant. To change the parameters of a daemon, for example, you had to determine where the daemon was started and figure out how to change the parameters associated with the start method. Changing an rc script was fraught with peril -- one false move, and the system would fail to boot properly or even hang during booting. Testing the rc script change meant rebooting the system. Debugging problems with rc scripts meant turning on debugging options (such as adding set -x to the script) and rebooting, perhaps multiple times as fixes were tried. Consider also that the system booted inefficiently because it marched through the rc scripts sequentially, even if some of the activities would have worked correctly if done in parallel.

But perhaps the most unappealing aspect of the whole mess was the hand-created interdependencies and the ramifications if a dependency failed. For example, the rc scripts had to start the proper components in the proper order, such that the network interfaces were initialized before the routing services started, and all that had to be done before a network daemon started. If one of those components failed while the system was running, the results were unpredictable and the problem difficult to debug.


3. Overview of SMF

All of these issues drove Sun to design an entirely new service management facility. SMF is part of the one-two punch of the new Solaris 10 Predictive Self-Healing feature set. (The other component is the Fault Management Architecture.) SMF understands:

  • Daemons (or services) and what to do with them.
  • How to start, stop, and monitor these services.
  • The relations of the services to one another (which allows it to boot the operating system to a designated run level much more efficiently).

This understanding of dependencies also allows a new level of service functionality -- if a service fails, SMF can restart that service and all of the services that depended on it. Thus SMF can fully restore the system to a given run level, even if a core service fails. To provide all of these features, SMF needed to be significantly different from the "olden days" of rc scripts and inetd daemons.

The Utility of SMF

SMF is enabled by default on the Solaris 10 OS, so exploration is as easy as booting a machine running that version of Solaris. Be aware, though, that even the boot process is affected by SMF. By default, logging during boot is now very quiet. With the new boot -m verbose option, SMF outputs a line for each service that it's starting, which can help reassure those new to the Solaris 10 OS that everything is working. Gone, however, are the days of grepping through /var/adm/messages in hopes of finding an error that is actually labeled with the name of the service that is having a problem. Instead, each service has its own persistent log file. Most of these are in /var/svc/log; logs for services that start before the single-user milestone are in /etc/svc/volatile. The system also reaches the "login" prompt much more quickly, because only the services that login depends on need to start before login is started. This is just one example of the advantages of SMF.

The Benefits of Managing Services With SMF

SMF has improved several aspects of the Solaris administrative model; here are some of the most notable examples:

  • Services are represented as first-class objects that can be viewed (using the new svcs(1) command) and managed (using svcadm(1M) and svccfg(1M)).
  • Failed services are automatically restarted in dependency order, whether they failed as the result of administrator error or a software bug, or they were affected by an uncorrectable hardware error.
  • More information is available about misconfigured or misbehaving services, including an explanation of why a service isn't running (using svcs -x), as well as individual, persistent log files for each service.
  • Problems during the boot process are easier to debug, as boot verbosity can be controlled, service startup messages are logged, and console access is provided more reliably during startup failures.
  • Snapshots of service configurations are taken automatically, making it easier to back up, restore, and undo changes to services.
  • Services can be enabled and disabled using a supported tool (svcadm(1M)), allowing the changes to persist across upgrades and patches.
  • Administrators can securely delegate tasks to non-root users more easily, including the ability to configure, start, stop, or restart services (as described in the smf_security(5) man page).
  • Large systems boot faster by starting services in parallel according to their dependencies.

Despite these changes, compatibility with existing administrative practices has been preserved wherever possible. For example, most site-local and ISV-supplied rc scripts still work as usual.

Using SMF to Accomplish Common Tasks

SMF is a particularly notable change in the Solaris platform because it impacts the administrative model. Although we encourage you to read more about the features of SMF (see the More Information section), you may want to start by learning how to do some common system administration tasks.

Enabling and Disabling Services

Releases prior to the Solaris 10 OS haven't offered a good way to permanently disable a service. The typical method used is to rename the relevant rc script to a name that won't get executed, but that change will be overlooked the next time the system is upgraded. Furthermore, inetd-based services are enabled and disabled by a totally different method -- editing a configuration file. Under SMF, both types of services can be configured using the svcadm(1M) command, and the changes will persist if the machine is upgraded. Here's a comparison of how to enable and disable some services.

Table 1: Comparison of Methods for Enabling and Disabling Services

Old Method                                             SMF Method
----------------------------------------------------   ------------------------------------
mv /etc/rc2.d/S75cron /etc/rc2.d/x.S75cron             svcadm disable system/cron:default
edit /etc/inet/inetd.conf, uncomment the finger line   svcadm enable network/finger:default

The last argument to svcadm in these examples is the Fault Management Resource Identifier (FMRI) of the service.

Note that svcadm should only be used for SMF services -- legacy rc script-controlled services work the same as in past releases.
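
As a minimal sketch, the SMF method from Table 1 can be wrapped in a small shell function that disables a service and then reports its state. The function name disable_and_check is hypothetical. The -s option (wait for the state change to complete) and the -H and -o options come from the svcadm(1M) and svcs(1) man pages rather than from this article, so check them against your release.

```shell
# Persistently disable an SMF service, then print its resulting state.
# disable_and_check is a hypothetical helper name used for illustration.
disable_and_check() {
    fmri=$1
    # -s waits until the service has actually reached the disabled state
    svcadm disable -s "$fmri" || return 1
    # -H omits the header line; -o selects which columns to print
    svcs -H -o state,fmri "$fmri"
}

# Usage (as root or with suitable SMF authorizations):
#   disable_and_check system/cron:default
```

Because the -t option is not used here, the service stays disabled across reboots until it is enabled again with svcadm enable.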

Stopping, Starting, and Restarting Services

Traditionally, services have been started by an rc script run at boot, run with the argument start. Some rc scripts provide a stop option, and a few also allow restart. In SMF, these tasks are all accomplished with the svcadm(1M) command, as shown in the following table.

Table 2: Comparison of Methods for Stopping, Starting, and Restarting Services

Old Method                                      SMF Method
---------------------------------------------   -------------------------------------
/etc/init.d/sshd stop                           svcadm disable -t network/ssh:default
/etc/init.d/sshd start                          svcadm enable -t network/ssh:default
/etc/init.d/sshd stop; /etc/init.d/sshd start   svcadm restart network/ssh:default
kill -HUP `cat /var/run/sshd.pid`               svcadm refresh network/ssh:default

The -t option to svcadm enable and svcadm disable indicates that the requested action should be temporary -- it will not affect whether the service is started the next time that the system boots. This is in contrast to the Enabling and Disabling Services example.

As with the enabling and disabling of services, svcadm should not be used to control rc script-controlled services; they continue to work the same as in past releases.
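
For administrators with rc script habits, the mapping in Table 2 can be sketched as a lookup function that prints the SMF command for a familiar action word. The function name smf_equivalent and the action word "reload" (standing in for the kill -HUP row) are hypothetical; the svcadm commands are the ones listed in the table. The function only prints the command and does not run it.

```shell
# Print the svcadm command that corresponds to a traditional rc-style action.
# smf_equivalent and the "reload" keyword are hypothetical names.
smf_equivalent() {
    action=$1
    fmri=$2
    case $action in
        stop)    echo "svcadm disable -t $fmri" ;;   # temporary: not kept across reboot
        start)   echo "svcadm enable -t $fmri" ;;    # temporary: not kept across reboot
        restart) echo "svcadm restart $fmri" ;;
        reload)  echo "svcadm refresh $fmri" ;;      # replaces kill -HUP
        *)       echo "unknown action: $action" >&2
                 return 1 ;;
    esac
}

# Usage:
#   smf_equivalent restart network/ssh:default
```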

Observing the Boot Process

As mentioned in the Notable Changes section of the QuickStart guide, the boot process is much quieter by default than in previous releases of Solaris. This was done to reduce the amount of uninformative "chatter" that might obscure any real problems that might occur during boot.

Some new boot options have been added to control the verbosity of boot. One that you may find particularly useful is -m verbose, which prints a line of information when each service attempts to start up. This is similar to the default boot mode for some other UNIX-based and UNIX-like operating systems. Verbose boot looks like this:

{1} ok boot -m verbose

Rebooting with command: boot -m verbose
Boot device: /pci@1c,600000/scsi@2/disk@0,0:a  File and args: -m verbose
SunOS Release 5.10 Version Generic 64-bit
Copyright 1983-2004 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
[ network/pfil:default starting (pfil) ]
[ network/loopback:default starting (Loopback network interface) ]
[ system/filesystem/root:default starting (Root filesystem mount) ]
Oct 18 13:53:02/13: system start time was Mon Oct 18 13:52:57 2004
[ network/physical:default starting (Physical network interfaces) ]
[ system/filesystem/usr:default starting (/usr and / mounted read/write) ]
    ( more service messages elided )
[ system/filesystem/local:default starting (Local filesystem mounts) ]
[ network/ntp:default starting (network time protocol (NTP)) ]
[ system/utmp:default starting (utmpx monitoring) ]
[ system/filesystem/local:default starting (Local filesystem mounts) ]
[ system/console-login:default starting (Console login) ]

demobox console login: checking ufs filesystems
/dev/rdsk/c0t0d0s7: is logging.
Oct 18 13:53:14/50: system/system-log:default starting
Oct 18 13:53:14/51: network/inetd:default starting
Oct 18 13:53:14/52: system/cron:default starting
    ( more service messages elided )

The order of the service start messages may change from boot to boot, because SMF starts services in parallel according to their dependency relationships.

If a service fails to start successfully, warning messages will be printed in addition to the start message. Here's an example where the NTP service failed to start up:

[ system/filesystem/local:default starting (Local filesystem mounts) ]
[ network/ntp:default starting (network time protocol (NTP)) ]
Oct 25 13:58:42/49 ERROR: svc:/network/ntp:default: 
     Method "/lib/svc/method/xntp" failed with exit status 96.
Oct 25 13:58:42 svc.startd[4]: svc:/network/ntp:default: 
     Method "/lib/svc/method/xntp" failed with exit status 96.
[ network/ntp:default misconfigured (see 'svcs -x' for details) ]
[ system/utmp:default starting (utmpx monitoring) ]
    ( more service messages elided )

The first two error messages would appear during both normal boot and verbose boot; the last one (network/ntp:default misconfigured ...) would only appear during verbose boot.

Discovering What's Going Wrong

The Solaris OS has not had a comprehensive place to look for problems with system services. Some solutions exist to help catch and diagnose these problems, ranging from coreadm(1M) logging to site-specific monitoring scripts to comprehensive products such as Sun Cluster. The new svcs(1) command includes an "explain" option (svcs -x), which prints out detailed, solution-driven messages about the services that are not running. svcs -x shows when and why the service failed, provides pointers to more information about the problem, and lists what other services are affected by this problem.

Let's continue with the example of the NTP service failing to start up:

# svcs -x
svc:/network/ntp:default (Network Time Protocol (NTP).)
 State: maintenance since Mon Oct 18 13:58:42 2004
Reason: Start method exited with $SMF_EXIT_ERR_CONFIG.
   See: http://sun.com/msg/SMF-8000-KS
   See: ntpq(1M)
   See: ntpdate(1M)
   See: xntpd(1M)
Impact: 0 services are not running.

The NTP service has been placed into maintenance mode because the startup script indicated a problem with the service's configuration. Further information about the service failure is available in the service's log file in the /var/svc/log directory (or the /etc/svc/volatile directory). The log file name is based on the short form of the FMRI, with each "/" replaced by "-". So the log file for the svc:/network/ntp:default service is /var/svc/log/network-ntp:default.log. In this case, the log file quickly led to the conclusion that the NTP daemon's configuration file, /etc/inet/ntp.conf, had been removed.
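
The log file naming rule described above can be sketched as a short shell function. The function name smf_log_path is hypothetical, and the sketch assumes the log lives in /var/svc/log; logs for services that start very early in boot may be in /etc/svc/volatile instead.

```shell
# Derive the log file path for a service from its FMRI:
# drop the leading "svc:/" (if present), replace each "/" with "-",
# and append ".log". smf_log_path is a hypothetical helper name.
smf_log_path() {
    name=`printf '%s\n' "$1" | sed -e 's|^svc:/||' -e 's|/|-|g'`
    printf '/var/svc/log/%s.log\n' "$name"
}

# Usage:
#   tail -20 "`smf_log_path svc:/network/ntp:default`"
```

For svc:/network/ntp:default this yields /var/svc/log/network-ntp:default.log, matching the example in the text.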

Another example shows SMF's ability to track dependencies and point out problems relating to disabled services. We use the -v option in this example to see the list of impacted services.

# svcs -x -v
svc:/application/print/server:default (LP Print Service)
 State: disabled since Mon Oct 18 16:17:27 2004
Reason: Disabled by an administrator.
   See: http://sun.com/msg/SMF-8000-05
   See: man -M /usr/share/man -s 1M lpsched
Impact: 1 service is not running:
        svc:/application/print/rfc1179:default

Here, the application/print/server:default service has been explicitly disabled, but another service that depended on it (application/print/rfc1179:default) was not disabled. So the disabling of the first service has kept the second one from running.
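
For a quick overview on a system with several problem services, the svcs -x output can be condensed to one line per service. This is a rough sketch that assumes the output layout shown in the two examples above (an FMRI at the start of a line, followed by a "State:" line); the function name summarize_svcs_x is hypothetical.

```shell
# Read "svcs -x" output on standard input and print "FMRI state" pairs.
# Assumes each service block starts with the FMRI in column one and
# contains a line whose first word is "State:".
summarize_svcs_x() {
    awk '
        /^svc:\//      { fmri = $1 }        # remember the service being explained
        $1 == "State:" { print fmri, $2 }   # e.g. maintenance, disabled
    '
}

# Usage:
#   svcs -x | summarize_svcs_x
```

Impacted services listed under "Impact:" are indented, so they are not mistaken for the start of a new block.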

Observing Services

In earlier versions of Solaris, the only way to see what services were available was to use the ps(1) command and list all the active processes on the system, and then look around for the names of processes that matched the names of service applications. Unfortunately, it's very difficult to track things this way since most systems have many processes, and new services are introduced with each new version of Solaris and when other software packages are added. To further complicate the situation, many modern services are no longer implemented as single processes. Some services are implemented as collections of processes, multithreaded processes, or both simultaneously.

The new svcs(1) command makes it much easier to observe the status of a system service. The -p option shows all the processes associated with a service:

% svcs -p network/smtp:sendmail
STATE          STIME    FMRI
online         18:20:30 svc:/network/smtp:sendmail
               18:20:30      655 sendmail
               18:20:30      657 sendmail

% ps -fp 655,657
     UID   PID  PPID   C    STIME TTY   TIME CMD
    root   655     1   0 18:20:30 ?     0:01 /usr/lib/sendmail -bd -q15m
   smmsp   657     1   0 18:20:30 ?     0:00 /usr/lib/sendmail -Ac -q15m

The -d option shows what other services this service depends on, and the -D option shows what other services depend on this service:

% svcs -d network/smtp:sendmail
STATE          STIME    FMRI
online         18:20:14 svc:/system/identity:domain
online         18:20:26 svc:/network/service:default
online         18:20:27 svc:/system/filesystem/local:default
online         18:20:27 svc:/milestone/name-services:default
online         18:20:27 svc:/system/system-log:default
online         18:20:30 svc:/system/filesystem/autofs:default
% svcs -D network/smtp:sendmail
STATE          STIME    FMRI
online         18:20:32 svc:/milestone/multi-user:default

We can see that sendmail requires networking, local file systems, name services, the syslog daemon, and the automount daemon to be running before it will run, and sendmail itself must be running before the multi-user milestone can be reached. The service start times (the STIME column) illustrate that these dependencies have been followed.
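
When a service will not come online, the same svcs -d output can be filtered to show only the dependencies that are not online. This is a minimal sketch; the function name offline_dependencies is hypothetical, and it assumes the default column layout shown above, with the state in the first column and the FMRI in the last.

```shell
# List the dependencies of a service that are not in the "online" state.
# -H suppresses the header line so only service rows reach awk.
offline_dependencies() {
    svcs -H -d "$1" | awk '$1 != "online" { print $NF }'
}

# Usage:
#   offline_dependencies network/smtp:sendmail
```

An empty result means every service that sendmail depends on is online, as in the example output above.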

Changing Run Levels

SMF has introduced the concept of milestones, which supplant the traditional notion of run levels. Run levels provide a basic description of the set of services running on the machine, traditionally grouped as the services necessary for one user to log in on the machine console (run level S), and for multiple users to log in to the machine (run levels 2 and 3). These system states are represented in SMF as milestones, which are stable services that represent a group of other services. svcs -d can be used to see what services must be running before a milestone is reached.

svcadm(1M) is now the preferred method of setting the system's default run level. This is done with the milestone subcommand and the FMRI of a valid milestone, as seen in Table 3.

Table 3: Comparison of Methods for Changing Run Levels

Old Method          SMF Method
-----------------   -------------------------------------------------
edit /etc/inittab   svcadm milestone -d milestone/single-user:default

The -d option indicates that the default milestone should be set to the named FMRI. Without the -d option, svcadm milestone transitions the system to the named milestone immediately.

The boot process has been updated to be aware of milestones. In addition to the traditional boot -s (boot into single-user mode), you now have boot -m milestone=<milestone> to boot to the named milestone. <milestone> can be single-user, multi-user, or multi-user-server, as well as the special milestones all (all enabled services online) and none (no services at all). The none milestone can be very useful in repairing systems that have failures early in the boot process.

Booting to the single-user milestone (with -m milestone=single-user) is slightly different from using the old boot -s. When the system is explicitly booted to a milestone, exiting the console administrative shell will not transition the system to multi-user mode, as boot -s does. To move to multi-user mode after boot -m milestone=single-user, use the command svcadm milestone milestone/multi-user-server:default.
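
The short milestone names used with boot -m milestone= can be turned into the argument that svcadm milestone expects with a small helper. This is a sketch only: the function name milestone_fmri is hypothetical, and passing all and none through unchanged is based on the svcadm(1M) man page rather than on this article.

```shell
# Map a short milestone name to the argument for "svcadm milestone".
# milestone_fmri is a hypothetical helper name.
milestone_fmri() {
    case $1 in
        single-user|multi-user|multi-user-server)
            echo "svc:/milestone/$1:default" ;;
        all|none)
            echo "$1" ;;                      # special milestones, used as-is
        *)
            echo "unsupported milestone: $1" >&2
            return 1 ;;
    esac
}

# Usage: leave single-user mode after boot -m milestone=single-user
#   svcadm milestone "`milestone_fmri multi-user-server`"
```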

Enabling, Disabling, and Monitoring Legacy Services

Services that are started by traditional rc scripts (referred to as legacy services) will generally continue to work as they always have. They will show up in the output of svcs(1), with an FMRI based on the path name of their rc script, but they cannot be controlled by svcadm(1M). They should be stopped and started by running the rc script directly.
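
To see which rc scripts SMF has recorded as legacy services, the svcs output can be filtered by state. This sketch assumes that legacy services are reported in the legacy_run state, as documented in svcs(1) and smf(5) rather than in this article; the function name list_legacy_services is hypothetical.

```shell
# Print the FMRIs of services started by traditional rc scripts.
# Assumes such services are reported with the state "legacy_run".
list_legacy_services() {
    svcs -H -o state,fmri | awk '$1 == "legacy_run" { print $2 }'
}

# Usage:
#   list_legacy_services
```

The listing is for observation only; these services are still stopped and started by running their rc scripts directly.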

As mentioned in the Notable Changes section of the guide, rc scripts may not run at exactly the same point in boot as they had in earlier versions of Solaris. In particular, problems may arise for scripts that depend on running before certain rc scripts provided in the Solaris OS. The vast majority of scripts should continue to work without any trouble, though.

Adding New Services to inetd.conf

The Internet services daemon, inetd(1M), has been rewritten as part of SMF. It stores all of its configuration data in the SMF database, rather than in /etc/inet/inetd.conf, allowing the SMF tools to be used to control and observe inetd-based services. Most inetd-based services that ship with the Solaris OS no longer have entries in inetd.conf. To provide compatibility for services that haven't been converted to SMF, entries can still be added to inetd.conf using the same syntax as always, and the new inetconv(1M) command will convert the new entries to SMF services. inetconv should always be run after editing /etc/inet/inetd.conf; it can be run without any arguments.


4. Conclusion

Although the new SMF feature is totally different from the previous boot and daemon management within the Solaris OS, it includes many welcome changes. The system boots faster and can recover from errors, such as hardware failures, that cause services to fail. SMF allows exact knowledge of the state of the system and its services, and allows easy management of those services. Overall, there is a lot to like, with only the fear of learning something new standing in the way of progress. Of course, if the new facility isn't learned, causing mayhem within a Solaris 10 system is a likely outcome. So it's time to roll up your sleeves and make sure you understand the new world before you are surprised by some new, unknown creature there.



