Solaris 10


A Systems Administrator's Perspective

The Texas Ranger Blog


Being a system administrator can certainly have its moments - when new gear arrives on the doorstep, the latest project goes live, or when you solve a particularly nasty problem and save the day. However, it certainly has its drawbacks as well - it is by no means an 8 to 5 job (damn pager), and being forced to upgrade a "perfectly good" system because a vendor has dropped support sure doesn't help at all. "Quality of life" is a phrase that you rarely hear from an SA who supports a mission critical environment.

More on the pager later, but the upgrade issue deserves a closer look. Obviously we all know the value in running the latest and greatest operating system that a vendor has to offer. New features, reliability, and performance are the words we hear each time a new OS drops. Of course new deployments are certainly candidates for the latest OS (once we certify it). But the systems that we inherited from a previous co-worker, the systems that are so sensitive no one wants to touch them, and of course the downtime itself on a production system always make us leery of upgrading if we can at all help it.

Now that Red Hat is dropping level 1 support for RHEL 3, there are a bunch of people that will be faced with these upgrades in the near future. The worst part is that these probably are not going to be simple in-place upgrades either. According to Red Hat's documentation: "Red Hat generally recommends administrator-guided migrations for commercial deployments, as this method provides the highest assurances of a successful migration." [1] An "administrator-guided" migration is defined as one in which you inventory your current system, back it up, and perform a fresh install of Red Hat Enterprise Linux. As most of us know, this is not as simple as it looks on paper. If you have a bunch of cookie-cutter web server boxes, then this is a no-brainer. However, in highly customized deployments, this can be a massive project in addition to the due diligence of just certifying the OS with the particular application. One little missed step, configuration error, or incorrect file can make cut-over an absolute nightmare. In the same document, Red Hat goes on to say: "The Red Hat Enterprise Linux automated upgrade capability handles only the upgrade of system components distributed with the Red Hat base operating system. Automated upgrades may impact non-standards-conforming third-party applications in unspecified ways, some of which may not be found until application runtime."

This is one of the main reasons I have really grown to love Solaris' Live Upgrade feature. You essentially upgrade the box to an alternate boot drive while the system is running, then take a simple reboot, and you're running on the new version of the OS. Nice and easy. And with Solaris' binary compatibility guarantee, you can be confident of a smooth transition (of course due diligence is always recommended). You could upgrade a Solaris 8 box to Solaris 10 in this manner with no issues.
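For anyone who has not used Live Upgrade, the sequence described above looks roughly like the sketch below. The boot environment name, the spare disk slice, and the install image path are placeholders invented for illustration, so check lucreate(1M), luupgrade(1M), and luactivate(1M) for the options that fit your own layout. The `run` wrapper is not part of Live Upgrade; it only prints each command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of a Live Upgrade to an alternate boot disk (placeholder names throughout).

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

live_upgrade() {
    new_be=$1    # name for the new boot environment (hypothetical)
    slice=$2     # spare disk slice that will hold the new root (hypothetical)
    image=$3     # path to the Solaris 10 install image (hypothetical)

    # Copy the running root file system to the spare slice as a new boot environment.
    run lucreate -n "$new_be" -m "/:$slice:ufs" &&
    # Upgrade the inactive copy while the system keeps running.
    run luupgrade -u -n "$new_be" -s "$image" &&
    # Mark the upgraded environment as the one to boot next.
    run luactivate "$new_be" &&
    # Live Upgrade expects init or shutdown for this reboot, not reboot or halt.
    run init 6
}

live_upgrade s10_be /dev/dsk/c0t1d0s0 /net/installserver/export/s10
```

If the new environment misbehaves, activating the old boot environment and rebooting again is the way back, which is what makes the approach so forgiving.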

Well, if Red Hat wants customers to do a fresh install, why not take this time to look at Solaris 10 as a potential candidate for your upgrade? Solaris 10 should have no problems running on that same "commodity" hardware you have Linux installed on (actually there are more platforms supported by Solaris 10 than by RHEL4) and you can take advantage of all of the cutting edge features of Solaris 10.

This is where we get back to that pager issue.

If all of the new features of Solaris 10 had to be summed up in one concise phrase, it would be simply: Quality of Life. As stated before, this is a rarity, but with Solaris 10, it becomes a possibility. Before we enumerate some of the features and their impact on your day to day operations, consider a comment from a recent customer that really drove this home. Faced with a daunting performance issue in an application that ran on Solaris 8, they decided to evaluate it on Solaris 10 and use DTrace [2] to look for bottlenecks. Within a single day, they identified many application issues that would have plagued the application regardless of the host OS. After making the necessary changes, performance improved dramatically. During a follow-up conversation, the SA said: "I got to go home and have dinner with my family for the first time in months." That is quality of life defined.

Let's start our tour of Solaris 10 features with the aforementioned DTrace. DTrace allows a system administrator or developer to concisely answer arbitrary questions about their system and the applications running on it, all without changes to any code or rebuilding of applications. The way that DTrace works is outside the scope of this document, but suffice it to say: you will no longer look at the output of a system monitoring tool asking yourself "Why is this happening?" A simple DTrace script, either one you write yourself or one you grab from the Net [3], and your question will be answered. For instance, if you saw an application that was constantly growing in memory footprint and wondered where the allocations might be coming from, a DTrace script like this one could answer that question in mere seconds:

# dtrace -n pid124105::malloc:entry'{@[ustack(10)] = count()}'
dtrace: description 'pid124105::malloc:entry' matched 2 probes
^C
              libc.so.1`malloc
              libcrypto.so.0.9.8`default_malloc_ex+0x1e
              libcrypto.so.0.9.8`CRYPTO_malloc+0x5e
              libcrypto.so.0.9.8`EVP_DigestInit_ex+0xfb
              libcrypto.so.0.9.8`HMAC_Init_ex+0x18f
              libcrypto.so.0.9.8`HMAC_Init+0x5a
              sshd`mac_compute+0x34
              sshd`packet_read_poll2+0x1a0
              sshd`packet_read_poll_seqnr+0x19
              sshd`dispatch_run+0x78
              sshd`process_buffered_input_packets+0x22
              sshd`server_loop2+0xa0
              sshd`do_authenticated2+0xe
              sshd`do_authenticated+0x3b
              sshd`main+0x1068
               10

That's it. We know where in our application (sshd(1M) in this example) the allocations are occurring, and we can go to the developers armed with this to ask them to look at their code.

Or maybe we find ourselves being curious about the types of system calls occurring on the system. One more quick DTrace invocation and:

# dtrace -n syscall:::entry'{@[probefunc] = count()}'
dtrace: description 'syscall:::entry' matched 228 probes
^C

  [SNIP]
  munmap                                                          127
  open                                                            127
  getpid                                                          139
  xstat                                                           139
  writev                                                          146
  close                                                           149
  lwp_sigmask                                                     161
  read                                                            178
  pollsys                                                         190
  write                                                           238
  ioctl                                                          3023

Now we could continue to dig into this, but it clearly shows just how simple understanding your system can be with DTrace. Of course, Red Hat will tell you that they have SystemTap coming [4], but I wouldn't hold my breath. They claimed that SystemTap would be ready for prime-time by the end of 2005, but it is still considered "unstable" and has yet to integrate. DTrace has been available and production ready since November of 2003. Many customers have used it to diagnose countless performance problems and aberrant system behavior, and it has helped get them home in time for dinner with their families.
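Digging one level further is usually just another one-liner. As a sketch, the function below asks which programs are issuing a given system call (ioctl dominates the listing above) by aggregating on DTrace's built-in `execname` variable. The function name and the `run` wrapper are mine, not part of DTrace; the wrapper only prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: break one system call down by the program that issued it.

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

syscall_by_program() {
    name=$1    # system call to drill into, for example ioctl
    # Count entries into the call, keyed by program name.
    # When run for real, press Ctrl-C to stop tracing and print the totals.
    run dtrace -n "syscall::${name}:entry { @[execname] = count(); }"
}

syscall_by_program ioctl
```

Swapping `execname` for `pid` or `ustack()` in the aggregation key narrows the answer down to a single process or a code path in the same way.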

Another key feature of Solaris 10 (and one that Red Hat admits to having no equivalent to) is Predictive Self Healing [5]. Predictive Self Healing takes advantage of the telemetry data provided by modern hardware to help identify and take action in the face of potential hardware issues. Red Hat contends that modern hardware is so reliable that this isn't really an issue, but try explaining that to the business unit whose core system goes down in the middle of the trading day.

When the Fault Management facility in Solaris 10 detects that there might be an issue, it acts quickly, where possible, to prevent system failure. Let's look at an example from a recently installed AMD based system:

Aug 23 18:45:55 box1 fmd: [ID 441519 daemon.error] SUNW-MSG-ID:
AMD-8000-2F, TYPE: Fault, VER: 1, SEVERITY: Major
Aug 23 18:45:55 box1 EVENT-TIME: Wed Aug 23 18:45:55 CDT 2006
Aug 23 18:45:55 box1 PLATFORM: Sun Fire V40z, CSN: XG060259910,
HOSTNAME: box1
Aug 23 18:45:55 box1 SOURCE: eft, REV: 1.16
Aug 23 18:45:55 box1 EVENT-ID: d24d0585-5e07-45cc-e0fb-b68b853f13d5
Aug 23 18:45:55 box1 DESC: The number of errors associated with
this memory module has exceeded acceptable levels.  Refer to
http://sun.com/msg/AMD-8000-2F for more information.
Aug 23 18:45:55 box1 AUTO-RESPONSE: Pages of memory associated with
this memory module are being removed from service as errors are
reported.
Aug 23 18:45:55 box1 IMPACT: Total system memory capacity will be
reduced as pages are retired.
Aug 23 18:45:55 box1 REC-ACTION: Schedule a repair procedure to
replace the affected memory module.  Use fmdump -v -u <EVENT_ID> to
identify the module.

FMA has identified a memory problem, begun to take corrective action (retiring the associated pages), provided a clear explanation of the problem, and referenced a web page with more details. How could this be any simpler? Had Solaris 10 not had Predictive Self Healing, it is possible that this system could have suffered a fatal failure. Instead, we just need to schedule a maintenance window to correct the issue. As systems become larger with more CPUs and memory, the probability of this happening increases. Good to know that my pager will just get an informational message, and I can continue to watch my favorite TV program while I schedule Sun Service.
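Following up on a message like this takes two commands. The sketch below uses the EVENT-ID from the log above and the fmdump invocation that the REC-ACTION text recommends, plus fmadm(1M) to list what the fault manager currently considers faulty. The function name and the `run` wrapper are my own additions; the wrapper only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: look up the details behind a fault manager message.

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

inspect_fault() {
    event_id=$1    # the EVENT-ID value from the console message
    # List the resources the fault manager currently considers faulty.
    run fmadm faulty
    # Show the fault log entry for this event, including the affected module.
    run fmdump -v -u "$event_id"
}

inspect_fault d24d0585-5e07-45cc-e0fb-b68b853f13d5
```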

Another feature of Solaris 10 that is proving widely valuable (and to which RHEL has no equal) is Solaris Containers [6]. Solaris Containers are virtualized application environments that allow for consolidation, increased utilization, security, and containment of "unruly" applications. While Red Hat again wants us to believe that Containers have no real value over other virtualization methods, that is wholly untrue. Solaris Containers do not compete with domaining or hypervisor based approaches; they are simply another approach, one that has countless advantages for many applications. For example, there is no need to dedicate hardware to a particular container. You can use Solaris Resource Manager to control the CPU resources allocated to applications, or you can let the nature of Unix timeshare take its course. The security benefits of Solaris Containers alone are enough to justify their use, especially in the face of new compliance and security regulations. By running network based services in a container, you limit the damage if the service is hacked, because a process in the container has no direct access to the global zone or to other containers. This is just a great way to "double bag" your groceries.

Containers also provide better resource utilization for many cases over other virtualization mechanisms by the nature of the shared kernel. Pages of memory not private to a particular process will have shared mappings across the containers. So if you are running ten containers with Apache, then the binaries for all of the Apache instances can be shared, saving precious memory.

And in the "quality of life" category, containers allow for simple application regression testing. You can simply create a container on a production system (or your staging or user acceptance environment if you are lucky enough to have one) and test your application on the exact hardware and Solaris version that it will eventually run on without impacting the environment until you are ready to deploy. How's that for knowing you have a perfectly valid test scenario? And to make things even simpler, you can halt the container with the current version of the software, re-IP the new container (that is, give it the production IP address), boot it, and you have an alarmingly simple back-out plan if there is an issue. It takes mere seconds to halt the new container and bring the old one back, without the sheer terror associated with replacing files during an outage.
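To make the cut-over and back-out concrete, here is a rough sketch using zonecfg(1M) and zoneadm(1M). The zone names, zonepath, network interface, and address are all made up for illustration, and the sketch assumes the new zone has already been given the production address before cut-over. The `run` wrapper is mine; it only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: build a test container, then cut over to it and back out again.

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Configure, install, and boot a new zone (all four arguments are hypothetical values).
create_zone() {
    zone=$1; path=$2; nic=$3; addr=$4
    run zonecfg -z "$zone" "create; set zonepath=$path; add net; set physical=$nic; set address=$addr; end; commit" &&
    run zoneadm -z "$zone" install &&
    run zoneadm -z "$zone" boot
}

# Halt one zone and boot another; the same step serves for cut-over and back-out.
switch_zone() {
    stop=$1; start=$2
    run zoneadm -z "$stop" halt &&
    run zoneadm -z "$start" boot
}

create_zone app-v2 /zones/app-v2 e1000g0 192.168.10.50/24
switch_zone app-v1 app-v2    # cut over to the new version
switch_zone app-v2 app-v1    # back out if something goes wrong
```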

Next in our tour of Solaris 10 features to improve quality of life is ZFS [7]. The ZFS file system is a radical departure from the whole concept of creating silos in the form of disk volumes. With a typical volume-based approach, there is a need to do detailed analysis of a system to determine capacity and configuration (which of course will change over time anyway) and to manage potentially hundreds of volumes and their disk layouts, all of which is an excruciating nightmare plagued by problems and management headaches. ZFS ends the suffering.

ZFS is a 128-bit file system, virtually guaranteeing enough capacity for the foreseeable future [8], but that is by no means all it brings to the table. ZFS has changed the whole notion of volumes and file systems. With ZFS, you merely allocate large pools of disks and then create file systems on the pools. In the pool based approach, creating a file system is as easy as creating a directory. File systems share the storage space of the pool (although you can place quotas and reservations on file systems), thus ending the nightmare of right-sizing. Space can be added to the pool dynamically as usage dictates, all while the pool is on-line. In fact, all ZFS administration is on-line.

Think about your favorite file system and volume manager tools. If you wanted to create a highly available configuration for a NFS shared home directory file system, what steps would you have to take? Let's compare that to a ZFS installation:

# zpool create -f mypool raidz2 c2d0 c3d0 c4d0 c5d0 c6d0 c7d0
# zfs create mypool/home
# zfs set mountpoint=/export/home mypool/home
# zfs create mypool/home/user1
# zfs set compression=on mypool/home
# zfs set quota=10g mypool/home/user1
# zfs set sharenfs=rw mypool/home

That's it. You now have a highly available NFS shared home file system with a user that has a 10 GB quota and compression enabled. No more files to edit, nothing else to do. Everything is mounted, shared and ready to go. This is probably significantly easier than what you had in mind originally.

In addition to this dramatically simplified administration model, you get so much more. All meta-data and data are checksummed. This means that should you have a disk problem and invalid data is returned, ZFS can (in any redundant configuration) identify that the checksum is wrong and use the redundancy to return correct data to the application. In addition, it repairs the incorrect data for you so that you are still protected. All of this happens with no intervention on the part of the administrator. Also, because of the copy-on-write semantics of ZFS, the on disk state is always valid. There is never a need for fsck(1M) on a ZFS file system; in fact, no such tool even exists for it.

ZFS has many performance enhancing features as well, which solves yet another nightmare for administrators. ZFS supports dynamic striping in which it can favor faster disks through its "write anywhere" semantics. This means that unlike a conventional stripe where you are only as fast as the slowest disk in a stripe, you are as fast as the fastest disk in the pool with ZFS. Also, ZFS can auto-tune the block size on a file by file basis so mixed workloads perform well on ZFS without any additional manual tuning. ZFS also has an advanced I/O scheduler that tags each I/O with both a priority and a deadline. Higher priority I/Os get deadlines that are sooner than lower priority I/Os. So, for instance, a read will have a higher priority than a write since writes are generally asynchronous to the application whereas a read must block until the data is available. This means that a read gets to jump to the front of the line and get serviced quicker for better responsiveness.

ZFS provides unlimited constant-time snapshots and clones. A snapshot is a read-only point-in-time copy of a file system, while a clone is a writable copy of a snapshot. Clones provide an extremely space-efficient way to store many copies of mostly-shared data. Snapshots and clones can make many burdensome tasks (like test case repeatability) a trivial non-issue.
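As a rough sketch of the test case repeatability point, the function below snapshots a file system, hands a writable clone to a test run, and throws the clone away afterwards. It reuses the mypool/home/user1 file system from the earlier example; the snapshot tag and clone name are placeholders I picked, and the `run` wrapper (mine, not part of ZFS) only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: give each test run a fresh, writable copy of the same starting data.

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

test_cycle() {
    fs=$1       # file system holding the baseline data
    tag=$2      # snapshot name (hypothetical)
    clone=$3    # name for the writable clone; must live in the same pool (hypothetical)

    # Freeze the current state of the file system.
    run zfs snapshot "${fs}@${tag}" &&
    # Create a writable copy that initially shares all of its blocks with the snapshot.
    run zfs clone "${fs}@${tag}" "$clone" &&
    # (the test case would run against the clone here)
    # Discard the clone; the snapshot stays around for the next run.
    run zfs destroy "$clone"
}

test_cycle mypool/home/user1 baseline mypool/home/user1-test
```

If a test must run against the original file system instead, `zfs rollback` to the snapshot puts the data back the way it was.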

This is barely scratching the surface of the innovative nature of ZFS and the impact it will have on administrators' lives. ZFS and pooled storage will do for disks what the virtual memory model did for memory (for those that still recall the bad old days of working with physical memory addressing).

And last, but definitely not least, in this abbreviated tour of the quality of life features of Solaris 10 is the Solaris Service Management Facility [9]. As systems continue to become more and more complex, the traditional separation of process startup and run-time management is becoming almost unsupportable. Since systems administrators are tasked with ensuring that critical software is always up, running, and functioning correctly, there is a never ending array of home-grown monitoring and startup scripts as well as third party solutions for the same. In either case, it is an additional burden for an already overworked system administrator. SMF makes process life-cycle management a first class citizen and provides this as a core operating system function.

SMF defines a service model which is used for system startup, provides a single interface for management of system and application services, as well as diagnosis and restart capabilities for all services. SMF also allows for the definition of dependencies between services so on our complex systems where so many applications depend on one another for correct operation, we can better manage those relationships. In addition, the vast array of configuration file locations and semantics has been brought into a unified repository with a well defined set of interfaces for managing the services.

There are so many instances of day to day operations that SMF works to simplify. Questions such as: "Why won't service X start?", "Service X has failed, what all is impacted?", "Where are the log files for X?", and "Process X has died, how do I restart it?" are all a thing of the past. With SMF, all of these are very well defined and trivial to answer. The best illustration of this is an example:

If one wanted to list all services on a system:

# svcs -a
[SNIP]
disabled       Sep_08   svc:/network/nis/server:default
disabled       Sep_08   svc:/network/nis/client:default
disabled       Sep_08   svc:/network/rpc/keyserv:default
disabled       Sep_08   svc:/network/rpc/nisplus:default
disabled       Sep_08   svc:/network/inetd-upgrade:default
disabled       Sep_08   svc:/application/print/server:default
online         Sep_09   svc:/system/console-login:default
online         Sep_09   svc:/appliance/kit/akd:default
online         Sep_09   svc:/appliance/kit/akproxyd:default
online         Sep_09   svc:/network/ssh:default
online         Sep_09   svc:/system/system-log:default
online         Sep_09   svc:/system/dumpadm:default
online         Sep_09   svc:/system/fmd:default
online         Sep_09   svc:/network/smtp:sendmail
[SNIP]

We can see the service FMRI (Fault Management Resource Identifier) and the state of the service. If we wanted to know more about a particular service - our good friend sshd for example:

# svcs -l ssh
fmri         svc:/network/ssh:default
name         SSH server
enabled      true
state        online
next_state   none
state_time   Wed May 10 21:44:13 2006
logfile      /var/svc/log/network-ssh:default.log
restarter    svc:/system/svc/restarter:default
contract_id  49
dependency   require_all/none svc:/system/filesystem/local (online)
dependency   optional_all/none svc:/system/filesystem/autofs (online)
dependency   require_all/none svc:/network/loopback (online)
dependency   require_all/none svc:/network/physical (online)
dependency   require_all/none svc:/system/cryptosvc (online)
dependency   require_all/none svc:/system/utmp (online)
dependency   require_all/restart file://localhost/etc/ssh/sshd_config (online)

What if we had an FTP server that suddenly stopped working? We could spend time debugging it ourselves, or we could use the power of SMF:

# svcs -x ftp
svc:/network/ftp:default (FTP server)
 State: uninitialized since Wed May 10 21:44:39 2006
Reason: Restarter svc:/network/inetd:default is not running.
   See: http://sun.com/msg/SMF-8000-5H
   See: in.ftpd(1M)
   See: ftpd(1M)
Impact: This service is not running.

That was easy - a very concise and clear definition of what the problem is. FTP doesn't work because inetd is not running. Well, why isn't inetd running?

# svcs -x inetd
svc:/network/inetd:default (inetd)
 State: disabled since Wed May 10 21:44:38 2006
Reason: Disabled by an administrator.
   See: http://sun.com/msg/SMF-8000-05
   See: inetd(1M)
Impact: This service is not running.

Another administrator has errantly disabled inetd. If we wanted to get FTP running again, it is even easier than starting inetd and then ensuring FTP is set up correctly. With SMF understanding the dependencies, we can enable FTP and tell SMF to recursively traverse the dependency graph for FTP and ensure that all of its dependencies are online as well:

# svcadm -v enable -r ftp
svc:/network/ftp:default enabled.
svc:/network/inetd:default enabled.
svc:/network/loopback enabled.
svc:/system/filesystem/local enabled.
svc:/milestone/single-user enabled.
svc:/system/identity:node enabled.
svc:/system/filesystem/minimal enabled.
svc:/system/filesystem/usr enabled.
svc:/system/boot-archive enabled.
svc:/system/filesystem/root enabled.
svc:/system/device/local enabled.
svc:/milestone/devices enabled.
svc:/system/manifest-import enabled.
svc:/system/sysevent enabled.
svc:/milestone/name-services enabled.
svc:/milestone/sysconfig enabled.

And with that, FTP is functioning correctly again. In addition to this simple administration model, SMF brings us automated restart. When a service is defined, a contract can be created that tells the operating system to respond in the case of a fatal failure of any piece of the service. This means that if a process in a service that has a contract receives a signal and exits, it will be instantly restarted by SMF. This is not a polling model built around a "watcher" process; it is completely kernelized and instantaneous. For example, sendmail has a contract that indicates it should be restarted in the case of failure:

# svcs -l sendmail
fmri         svc:/network/smtp:sendmail
name         sendmail SMTP mail transfer agent
enabled      true
state        online
next_state   none
state_time   Sat Sep 09 01:00:07 2006
logfile      /var/svc/log/network-smtp:sendmail.log
restarter    svc:/system/svc/restarter:default
contract_id  67
[SNIP]

So if we cause a fault, we can see SMF recover:

# pgrep -fl sendmail; date; pkill -9 sendmail; date; pgrep -fl sendmail; date
125205 /usr/lib/sendmail -Ac -q15m
125204 /usr/lib/sendmail -bd -q15m
Mon Sep 11 05:11:31 UTC 2006
Mon Sep 11 05:11:31 UTC 2006
125218 /usr/lib/sendmail -bd -q15m
125219 /usr/lib/sendmail -Ac -q15m
125216 /sbin/sh /lib/svc/method/smtp-sendmail start
Mon Sep 11 05:11:31 UTC 2006

All of this happens auto-magically for us in sub-second time. A thousand home-grown monitoring scripts just made their final trip into /dev/null. Making your critical application SMF aware is as simple as creating a service manifest for it and giving it your properties. Many popular applications have already had SMF manifests created for them and they are available on the SMF website.
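Once a manifest has been written, bringing the application under SMF control is only a few commands. In the sketch below, the manifest path and the svc:/site/myapp:default FMRI are hypothetical names for an imaginary application, not anything shipped with Solaris; svccfg(1M), svcadm(1M), and svcs(1) are the real tools. The `run` wrapper is mine and only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: register a home-grown application with SMF from its manifest.

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

register_service() {
    manifest=$1    # path to the service manifest XML file (hypothetical)
    fmri=$2        # FMRI that the manifest defines (hypothetical)

    # Check the manifest for errors before touching the repository.
    run svccfg validate "$manifest" &&
    # Load the service definition into the SMF repository.
    run svccfg import "$manifest" &&
    # Start the service and keep it enabled across reboots.
    run svcadm enable "$fmri" &&
    # Ask SMF for an explanation if the service did not come online.
    run svcs -x "$fmri"
}

register_service /var/svc/manifest/site/myapp.xml svc:/site/myapp:default
```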

Solaris 10 has numerous other features that are just as powerful and contribute to the quality of life of system administrators. This was just a brief look at a few of the more popular features that have changed the lives of countless people already.

So if you count yourself among those who will soon be looking at a forced upgrade (really a new install) just to keep running a fully supported system, Solaris 10 certainly deserves a look. Not only is there a wealth of features that improve your ability to do your job, you get to continue running an open source operating system (www.opensolaris.org), you get a highly scalable, high-performing, and reliable operating system, you get the support of a company that has been servicing mission critical companies for decades, you get to continue to use the same commodity hardware you currently use, you get a free operating system that is the same whether you run it on your laptop or on a fully loaded Sun Fire X4600 [10] with 16 cores and 128GB of memory, and you get to go home and have dinner with your family.

Although no end-of-support date has been set yet for Solaris 10 (or Solaris 9), even Solaris 8 will continue to have full support until February of 2009. RHEL4 moves to maintenance support in March of 2008. Solaris 10 virtually guarantees you many, many more years of full support, sparing you the stress of having to perform these daunting tasks every couple of years.

System administrator quality of life. Say it a couple of times. Sounds good, doesn't it?

References

  1. www.redhat.com/f/pdf/rhel4/UpgradeGuidelines.pdf
  2. www.opensolaris.org/os/community/dtrace/
  3. www.opensolaris.org/os/community/dtrace/dtracetoolkit/
  4. www.redhat.com/f/pdf/whitepapers/SolarisvRHEL-v1.pdf
  5. www.opensolaris.org/os/community/fm/
  6. www.opensolaris.org/os/community/zones/
  7. www.opensolaris.org/os/community/zfs/
  8. blogs.sun.com/bonwick/entry/128_bit_storage_are_you
  9. www.opensolaris.org/os/community/smf/
  10. www.sun.com/servers/x64/x4600/
  11. www.redhat.com/security/updates/errata/
