Sun Inner Circle: For Business & Technology Leaders

Humanizing Storage

Adding identity intelligence to information lifecycle management

It's Bill Vass back with another letter. It is with a mix of sadness and excitement that I must inform you that this will be my last letter in this forum. I have recently accepted an exciting new opportunity to become the COO of Sun Microsystems Federal.

Since this is my farewell letter, I want to thank all Sun Inner Circle readers for your loyal support. Your generous sharing of insights, experiences, and concerns has helped to make my sponsorship of Sun Inner Circle especially gratifying and informative. Thank you.

Luckily, I'm leaving my CIO responsibilities in the capable hands of Bob Worrall, who steps up from his position as VP of Information Technology. Bob is a great guy, a strategic thinker, and a practiced veteran. In addition to the many accolades he has garnered during his long career, he was a recent recipient of CIO Magazine's "Ones to Watch" Award.

Without further adieu, so to speak, I turn my attention to information lifecycle management (ILM). In a word: storage. Back in my days as technical director at the Department of Defense — and while I held development and operations jobs here at Sun and other companies — storage received prominent attention. For large-scale application developers, optimizing storage for access and update speeds is always important, as is configuring the storage environment for availability.

After all, performance is infinitely bad when the storage does not work. For operations managers, the focus on consolidation, cost reductions, backups, replication, and business continuity forces a serious consideration of storage. Furthermore, as the industry moves to ILM, the focus of both of these groups will change and combine in new ways.

For most CIOs, storage is not part of their day-to-day focus, since so much of their time is devoted to business change and overall IT strategy. But with the prevalence of virtualized IT infrastructure, as well as the ever-increasing importance of data security and regulatory compliance, CIOs can no longer afford to ignore the entire lifecycle of information in the enterprise. Simply put, today's CIOs are the stewards of an enterprise's information, and as such, they are responsible for the way the organization archives, accesses, protects, and secures data.

But let's face it: Most CIOs typically care about storage when one of three scenarios transpires:

  1. The storage stops working and the CIO needs to figure out what went wrong
  2. The CIO needs to cut costs by streamlining the storage systems
  3. Legal inquiries or regulatory compliance efforts force the organization to confront the difficulties of accessing stored data and controlling the flow of data

Most of the time, however, it's the storage professional who cares about the data systems.

Putting aside the third scenario for a moment, the other two concerns derive mostly from cost considerations — either the enterprise didn't invest enough time and money in its storage and now the system is broken, or the organization overspent and it needs to figure out how to reduce costs without inhibiting access. Of course, both of those issues are very real and valid.

As a result, many enterprises have adopted ILM strategies to effectively balance storage costs and data availability. Essentially, ILM involves migrating older data to less expensive (and slower to access) storage systems at a predetermined point in the data's lifecycle.

High-Profile Data Security Failures
While a sound idea and a proven practice, today's ILM is rudimentary in that it doesn't fully consider two important aspects of storage — data security and regulatory compliance. Take the Department of Veterans Affairs, which recently suffered negative publicity and untold costs from its inability to secure the personal data of 26.5 million veterans and their spouses.

Of course, the DVA is not alone: Hotels.com, Hewlett-Packard, and Ohio University have all been taken to task recently for their inability to protect personal records, as have many other enterprises. Fundamentally, what all of these incidents share are data storage deficiencies. Ultimately, the CIO is responsible for all of an organization's information, as well as how the related data is accessed, processed, downloaded, shared, and saved.

Similarly, there's the not-so-small matter of regulatory compliance. Suppose an auditor asks an enterprise for all the emails, spreadsheets, and other documents related to a years-old financial control. Even with an effective ILM strategy, that effort could involve accessing archived items stored across multiple tiers of a storage infrastructure. The recovery of the data might necessitate the painstaking search of whole volumes of disk and tape archives. Again, no small feat and hardly an inexpensive proposition.

To become more than just a cost-saver, ILM needs to address data protection and regulatory compliance issues. To do so, data systems require identity intelligence. Going forward, ILM needs to marry the identity of the people, organizations, and systems that create and access data to the stored data itself. By doing so, it becomes possible to assign access privileges and apply data protection schemes based upon a person's role, as well as to audit and retrieve data based upon the individual user.

But, as is my way, I'm getting a bit ahead of myself.

ILM: Where We Are Today
It's no secret that today's enterprises generate vastly more data than they can afford to store. ILM recognizes that the value of data changes over time — an Excel or OpenOffice spreadsheet is probably most valuable the moment it is created, less valuable in a week or a month, and possibly completely outdated a year hence. The question that organizations confront is at what time and what price does it make sense to move data to less expensive but harder to access systems. ILM helps organizations resolve this dilemma.

The same is true of ERP data, but processing and virtualization create more complex storage requirements. Relational database storage needs constant management, archiving, and tuning, or the system will grind to a halt. In addition, many systems require offsite replication for disaster recovery or complex index caching to meet performance targets.

Traditional ILM involves a tiered approach to storage, where the hierarchy of storage is based on the data value and access needs. Usually this tiered approach involves online, near-line, and offline storage media. A typical tiered storage deployment might look something like this:

Online: Primary disk storage for primary applications. The data is frequently accessed, so performance and availability are key characteristics. Fibre Channel disks are the predominant technology, and the disks typically spin continuously to improve performance and availability. In many cases, data is cached or "pinned" into memory to provide the needed performance. As a result, online storage tends to be complex and require continuous monitoring, both of which contribute to its expense.
Near-line: Secondary storage for data that is accessed infrequently but must be readily available. Near-line storage is usually disk-based, and SATA disks prove a popular choice for their low cost per megabyte of storage. Typically, the disks don't spin continuously, and the data is compressed. Also, in some cases, high performance tape and optical storage prove attractive near-line storage alternatives.
Offline: Tertiary storage for backup and long-term archiving. The data is inexpensive to store but more difficult to access. Usually, offline involves tape-based libraries, but it might instead use an optical stack of CD-ROM or DVD discs.
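To make the tiering idea concrete, here is a minimal Python sketch of a purely time-based policy that maps a piece of data to one of the three tiers above. The function name and the 30-day and one-year thresholds are assumptions chosen for illustration; they are not taken from any Sun product.

```python
from datetime import date, timedelta

# Hypothetical age thresholds; a real policy would set these per data class.
NEAR_LINE_AFTER = timedelta(days=30)
OFFLINE_AFTER = timedelta(days=365)


def tier_for(created: date, today: date) -> str:
    """Pick a storage tier based only on how long the data has existed."""
    age = today - created
    if age >= OFFLINE_AFTER:
        return "offline"
    if age >= NEAR_LINE_AFTER:
        return "near-line"
    return "online"
```

Note that the only input is the age of the data. Nothing in this rule knows who created the data or who still uses it.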

In addition to resolving the tiering of data, traditional ILM strategies do a rudimentary job of addressing the criticality of the data — namely how often data is backed up and what level of redundancy it requires. For example, ILM can specify that a routine log file probably doesn't need to be redundant in a RAID or backed up, but it does need to run fast and be online.

Or the ILM strategy might specify that application data should be mirrored in a RAID and stored in cache, instead of merely backed up, which might cause the enterprise to lose a day's worth of data in the event that something unfortunate happens. Today, ILM can help provide structure around how data is backed up, mirrored, or stored within a RAID.

But there is one area where today's flavor of ILM falls short: Data doesn't always follow a neat degradation path. If the value of data decreases and increases over time, how do you predict the changes before they happen? Or, put another way, how do you classify data based upon the way people actually use it? The answer to these questions lies in not treating all data as identical, but rather looking at who owns the data and who accesses it.

Identity-Enabled ILM: Where We Need to Go
Today's ILM strategies base the tiering decisions solely on time retention policies. Once data exists for a given amount of time, it moves to a cheaper and slower to access tier. But data doesn't always erode cleanly. Sun's vision of identity-enabled ILM seeks to examine user lifecycles to take actions on storage and data management that make sense for the people who store and access the data. Essentially, identity-enabled ILM seeks to marry identity management and ILM capabilities.

Overlapping Capabilities

By way of illustration, let's return to the Department of Veterans Affairs example. Following the loss of 26.5 million personal records, the White House, among others, called for greater use of encryption to fortify data security. What this call to arms missed, however, was that the data should never have been downloaded to the laptop in the first place, because given enough time with static mobile storage, encryption can be broken.

Ideally, an identity-enabled ILM policy would have allowed the employee to access the data, view it, and interact with it, but not download it out of the system of record. What enterprises need is a way to control how data relates to people over time — which people can access data, and, based upon their role, when and how they can interact with it.
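A rough Python sketch of that kind of rule follows. The role names, action names, and the `is_allowed` helper are all hypothetical; the point is only that the decision depends on who is asking and what they want to do with the data.

```python
# Hypothetical role-to-action policy; the names are illustrative only.
ALLOWED_ACTIONS = {
    "analyst": {"view", "query"},                    # may work with the data in place
    "records_admin": {"view", "query", "download"},  # may also take a copy
}


def is_allowed(role: str, action: str) -> bool:
    """Return True if the given role may perform the given action."""
    # Unknown roles get an empty permission set, so they are denied by default.
    return action in ALLOWED_ACTIONS.get(role, set())
```

Under this assumed policy, an analyst can view and query records in the system of record, but a download request is refused.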

Improving Regulatory and Legal Compliance
Similarly, regulatory and legal compliance could be greatly simplified by identity-enabled ILM. Assume the Justice Department subpoenas an enterprise for all the data a certain employee touched during the past five years. Today, someone would need to figure out where the data was stored, find the appropriate storage systems and media, and restore whole volumes of tapes just to pull off little bits of data.

Needless to say, this is a very manual and expensive process. But, with identity-enabled ILM, it would be possible to direct a request at the storage infrastructure that says something like: Give me all of Bill Vass's data (email, correspondence and documents) for the past five years. And that would be the end of the story.

Although identity-enabled ILM is still in its infancy, and many of the technical details need to be hammered out, essentially what's required is an abstraction layer that sits on top of traditional ILM. Metatags in this layer could describe who created the data, when it was created, what level of encryption it requires, which regulatory efforts it influences, and other factors related to data protection and compliance requirements. Then a metadirectory could help people store and access data based upon their identity, roles, and privileges.
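As a sketch only, the metatag idea might look something like the following in Python. The `MetaTag` fields and the `items_for` helper are assumptions made for illustration, not a description of how any Sun product stores its metadata.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class MetaTag:
    """Hypothetical identity metadata attached to one stored item."""
    item_id: str
    creator: str
    created: date
    encryption: str  # illustrative labels such as "none" or "aes-256"
    regulations: frozenset = field(default_factory=frozenset)


def items_for(tags, creator, since):
    """Return the IDs of items a given person created on or after `since`."""
    return [t.item_id for t in tags if t.creator == creator and t.created >= since]
```

With tags like these in place, a request such as "everything this person created in the past five years" becomes a lookup over metadata rather than a restore of whole tape volumes.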

Also, it is important to point out that this world of identity-enabled ILM will require more end user and business involvement in the classification of data and how it is stored than ever before. As the above examples of confidential data loss illustrate, enterprises are already witnessing the importance of including end users and business professionals in considerations regarding data classification and storage. But those examples demonstrate reactive involvement, when what enterprises need is the proactive involvement of end users and business leaders, so that problems never occur in the first place.

But, again, I might be getting ahead of myself.

What Sun Is Doing to Get to Identity-Enabled ILM
Today, Sun has a complete storage portfolio including key ILM software solutions like Enterprise Storage Manager and Storage Archive Manager, and the underlying storage devices covering all classes of storage, from enterprise disk to access tape and everything in between.

And Sun has a full identity management suite, including Identity Manager, Access Manager, and Identity Auditor. Sun is working to take information from the identity management suite and feed it into the storage systems, and, conversely, suck information out of the storage systems and supply it to the identity management suite. As of today, this is mostly accomplished through a professional services engagement, but increasingly, Sun is seeking to bake these capabilities into both its ILM and identity management products.

 
Real-World Scenarios for Identity-Enabled ILM
Let's consider a couple of examples of how enterprises might benefit from a tighter integration of Sun's ILM and identity management capabilities. Today, many enterprises automate the provisioning of access to systems and applications, but they don't have the capability to automatically provision storage. They've covered the structured data in databases, but they have no way to manage unstructured emails and documents. In other words, there's little guarantee that unstructured data follows regulated storage procedures because it is left to manual processes.

What should happen is that during the on-boarding process for a new employee, the provisioning of storage should be automated. So a new employee would be granted access to shared directories based upon his or her role, and the storage system should automatically create storage for the employee's personal files.

And then the storage system should create data protection schemes based upon the role of that user. For example, emails of rank-and-file employees are archived after five days and retained for 30 days, whereas a manager's email is retained for six months, and a VP's email might be retained for seven years.
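The retention example above can be sketched in a few lines of Python. The role keys and the `purge_date` function are hypothetical, and "six months" and "seven years" are approximated in days to keep the arithmetic simple.

```python
from datetime import date, timedelta

# Retention periods from the example above; role keys are illustrative,
# and months and years are approximated in days.
RETENTION = {
    "employee": timedelta(days=30),
    "manager": timedelta(days=182),
    "vp": timedelta(days=7 * 365),
}


def purge_date(role: str, received: date) -> date:
    """Return the date after which an email may be purged for this role."""
    # An unknown role raises KeyError rather than silently picking a default.
    return received + RETENTION[role]
```

The same message therefore gets a different lifetime depending on whose mailbox it sits in.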

In other words, the provisioning of storage space, access privileges, and data protection schemes needs to be based upon the identity of the user, and that provisioning should have the flexibility to change as the employee shifts roles. For example, if a manager is promoted to director, then perhaps her email is automatically encrypted.

Or if someone leaves the company, the storage system automatically archives his data and protects those files in case of a legal inquiry. All of these scenarios provide examples of how the identity management system might push information about users to the storage systems.
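Here is a minimal sketch of that push from identity events to storage settings, using the promotion and departure examples above. The `StoragePolicy` fields, event names, and `apply_event` function are assumptions for illustration, not an actual product interface.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StoragePolicy:
    """Hypothetical per-user storage settings."""
    encrypt_email: bool = False
    archived: bool = False
    legal_hold: bool = False


def apply_event(policy, event, new_role=None):
    """Return the policy that results from an identity event."""
    if event == "role_change" and new_role == "director":
        # Promotion to director turns on email encryption.
        return replace(policy, encrypt_email=True)
    if event == "departure":
        # Departing users have their data archived and protected.
        return replace(policy, archived=True, legal_hold=True)
    return policy  # other events leave the policy unchanged
```

In this sketch the storage side never decides anything on its own; it simply reacts to what the identity system reports.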

Conversely, with identity-enabled ILM it will be possible for the identity management suite to pull information out of the storage system. Storage devices contain immense amounts of information about users and access patterns. An organization might want to feed data from storage into Identity Auditor to verify that data is being appropriately accessed based on user roles and data protection schemes.

Sun's Strategic Roadmap for ILM and Identity Integration
To realize the full potential of identity-enabled ILM, Sun is pursuing a "crawl, walk, and run" strategy for the integration of ILM and identity management. Today, Sun is beginning to bake identity management into its storage products, including single sign-on for storage devices, automated provisioning of storage, and automated access provisioning based on user identities. Sun's near-term initiatives include:

  • Enable Identity Auditor to analyze data from intelligent storage systems like the Sun StorageTek 5310 Compliance Archiving System to proactively identify potential compliance violations and security threats.
  • Tightly integrate identity management products with Enterprise Storage Manager to provide a conduit for automated provisioning and to pass storage data back to Identity Auditor.
  • Evolve towards the metatagging of data with identity, data protection, and compliance information through tighter Storage Archive Manager integration.

In the future, Sun plans to further enhance identity-enabled ILM with federated policy management and service-oriented storage capabilities. Federated policy management will ensure that policies marry compliance and regulatory rules to storage administration policies across all storage systems. And service-oriented storage will create shared software services that are used across different storage applications to deliver shared storage functionality. But that's still a little ways down the road.

Bringing It All Together
Today, ILM has helped numerous organizations balance storage costs against data access needs. But with the increasing importance of data security and regulatory compliance, ILM based solely on time retention policies is no longer enough.

Storage systems require identity intelligence because data value doesn't always erode on a predictable path. Identity-enabled ILM seeks to examine user lifecycles in order to take actions on storage and data management that make sense for the people who create, store, and access the data.

In the future, it is only by integrating identity management and ILM capabilities that organizations will be able to cost-effectively manage the challenges posed by data security and regulatory compliance.

Bill Vass
CIO
Sun Microsystems, Inc.
cio@sun.com