Project Honeycomb: Intelligent Storage for Massive Data Volumes
Project Honeycomb is an innovative storage system that introduces a whole new level of intelligence and programmability to the category. To learn more about it, we talked to Mike Davis, senior product manager for Honeycomb in the Storage Group.

Q: Why is Honeycomb so highly anticipated?

MIKE: This is a differentiated, novel approach that has been in the works for a couple of years. Honeycomb is the first storage system with application extensibility frameworks, which will allow developers to offload low-level functionality from the application server to the storage system: functions such as metadata management, query, and other appropriate data services. Offloading certain functions from the application server and moving them closer to the data in the storage system improves the application's overall price/performance, reliability, and scalability.

Q: Honeycomb has been referred to as programmable storage. Why?

MIKE: From the inception of this project, our aims were to provide an element of programmability and to add real intelligence to the storage system. Honeycomb features application-aware programmable storage, which is an extensible metadata system that makes it easier to manage and manipulate large-scale digital asset repositories. This system offers the flexibility of a discrete database along with improved scalability and reliability. The metadata frameworks are file type-independent and run as an embedded, native part of the system structure.

The second level of programmability is what I'm most excited about: the realization of an intelligent storage system. Honeycomb can manipulate data on the fly as it comes in and out of the system, and that's where it really becomes strategic for a lot of customers and application partners. There used to be a very clear distinction between what was above the wire and what was below the wire, that is, between what was executed in the application server and what was executed in the storage system.
Honeycomb blurs that distinction.

Q: How does Honeycomb work?

MIKE: There are two elements to Honeycomb: a highly reliable Serial ATA-based storage system built on a clustered architecture, and a rich extensibility framework for accessing and managing data. Embedded in the cluster is fully distributed, high-performance database technology that is aligned very well with the storage system. Honeycomb's architecture is designed to reduce or eliminate the bottlenecks and single points of failure that exist in legacy architectures. Built-in parallelism delivers outstanding performance for queries as well as data I/O.

Q: What customer challenges does Honeycomb address?

MIKE: Honeycomb was developed to solve continuing problems in the management of large-scale repositories, and by that I mean any large collection of unstructured data that tends to grow over time. This design concentrates on reducing administrative and service costs. We know that IT budgets tend to be static even though data sets are growing exponentially, and our goal is to enable a single system administrator to manage a petabyte of storage. Inexpensive storage hardware is easy to find; the trick is reducing the need to configure, repair, provision, service, and migrate data, along with the other tasks that contribute to the true cost of the system.

Large repositories need failure resilience: the ability to suffer multiple types of failures without risk of losing data integrity. Through parallelism, Honeycomb can provide a level of reliability that is arguably better than what customers can get in any SAN environment. The last major unsolved challenge for these customers is the management of metadata, the rich set of attributes that describe the data and allow it to be recalled instantly.

Q: How does Honeycomb manage metadata?

MIKE: An extensible metadata framework is really just the first step in programmability and application awareness. It's the most obvious thing that we could quickly offload from the application server, freeing it to manage higher-level application tasks. The metadata associated with these repository applications is just as important as the data. Metadata includes attributes such as patient name or retention date. It's just a small fraction of the size of the data itself, but without it the data can't be navigated. Metadata must be managed with excellent reliability, and it has to scale over time as the data scales, because metadata grows along with the data.
In Honeycomb, metadata can be stored more reliably, more scalably, and more cost-effectively than in other storage solutions because there is no separate database for metadata; it's all housed in the same system.

Q: Does all of this innovation translate to a lower cost of ownership?

MIKE: Definitely. There are three ways to reduce TCO in a system like this, and we've aggressively pursued them all. One is to use low-cost components, which we've done by incorporating low-cost servers, Ethernet, and SATA disks. Another is to reduce complexity in management and administration. Honeycomb has no volume management, no host bus adaptors, no RAID configuration, and no mount points to manage. The third is to reduce or eliminate costs associated with service. Honeycomb detects failures and self-heals transparently in a manner that eliminates the need for urgent service. Inoperable parts can be replaced on a deferred service model once every four to six months, so everyone can throw away their pager.

Q: What kind of customer is Honeycomb ideally suited for?

MIKE: In the short term, ideal customers would be service providers or companies in the media and entertainment, life sciences, or government communities. Just about any organization tasked with managing large collections of unstructured data with in-house applications would benefit from Honeycomb.

Q: Honeycomb was incubated in Sun Labs. How did the project begin?

MIKE: The Honeycomb project began in spring 2003 at Sun Labs with a couple of experts in modern search technologies. They were convinced that existing storage approaches still hadn't solved the problems of large-scale data management, and they pulled together a team of exceptional talent to map out a new kind of storage system. The group included Bill Joy and Greg Papadopoulos, storage experts, engineers focused on low-cost hardware design, and new talent from parallel storage start-ups. The project was organized around a couple of themes. The team was looking for a new way to organize vast amounts of disparate information, and they thought there had to be a way to combine server and storage in a more collaborative arrangement. They thought that merging the two layers together would provide a more efficient solution overall, and they were right.

Q: Do you think Sun has an advantage in developing this kind of technology?

MIKE: Absolutely. Sun has visibility across the whole application deployment because we are a systems vendor. We can look across the whole chain, from the human client to the tape archive, and see where the bottlenecks and inefficiencies are. Much of the expertise and intellectual property we needed to build the system was already in house. Honeycomb leverages Solaris 10, Java, internal database technologies, clustering, intelligent networking, self-healing, and load-balancing technologies. No other company has a comparable set of competencies, and we don't see the traditional storage vendors as being capable of graduating to this level of storage any time soon.

Q: Would you walk through a typical Honeycomb transaction to illustrate the features?

MIKE: A typical interaction with Honeycomb storage could begin with a clinician studying medical images on his local Picture Archiving and Communications System (PACS) workstation. Let's say he wants to view 20 thumbnails from a radiology study.
To begin, the software would issue a query to Honeycomb requesting all of the images in the database for a specific patient ID and medical study, over a range of dates, and under a certain doctor's care. Honeycomb would locate the images and, instead of sending gigabytes of data across the LAN to be broken down to thumbnail images at the workstation, would transform the data on the fly and send 100 KB of information, already in thumbnail form, across the LAN. The application server wouldn't have to do additional processing or carry as much data because Honeycomb searched and manipulated the data on its own.

For the PACS software vendor to be able to call these sorts of rich data services, the developer has to have an application programming interface (API). An API allows him to store metadata attributes, find items, issue retrieve commands, and manipulate the data.

Q: When will there be a product launch?

MIKE: We are actively working to develop a support ecosystem before we execute a global launch campaign. Right now we are focusing on certain OEMs and the ISV community. We are also working directly with a handful of customers that have their own in-house applications and developers.

If you would like to learn more about Honeycomb, contact your Sun account executive or send an email to: honeycomb@sun.com
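
For readers who think in code, the PACS transaction Mike walks through can be sketched roughly as follows. This is a minimal in-memory illustration in Python, not Honeycomb's actual API: the `ToyStore` class, its `store`, `query`, and `retrieve` methods, the metadata field names, and the `make_thumbnail` placeholder are all hypothetical names invented for this sketch. It shows only the idea of keeping metadata alongside the data, querying on attributes plus a date range, and shrinking the result inside the store before it crosses the network.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Dict, List


@dataclass
class StoredObject:
    data: bytes
    metadata: Dict[str, object]


@dataclass
class ToyStore:
    """Hypothetical in-memory stand-in for a metadata-aware object store."""
    objects: List[StoredObject] = field(default_factory=list)

    def store(self, data: bytes, **metadata) -> None:
        # Data and metadata are kept together; there is no separate database.
        self.objects.append(StoredObject(data, metadata))

    def query(self, start: date, end: date, **equals) -> List[StoredObject]:
        # Match exact-value attributes plus an inclusive study-date range.
        return [
            obj for obj in self.objects
            if all(obj.metadata.get(k) == v for k, v in equals.items())
            and start <= obj.metadata["study_date"] <= end
        ]

    def retrieve(self, objs: List[StoredObject],
                 transform: Callable[[bytes], bytes]) -> List[bytes]:
        # Apply the transform inside the store, so only the reduced
        # result would need to travel to the workstation.
        return [transform(obj.data) for obj in objs]


def make_thumbnail(image: bytes, size: int = 16) -> bytes:
    # Placeholder for real image scaling: keep only the first `size` bytes.
    return image[:size]


store = ToyStore()
store.store(b"\x00" * 4096, patient_id="P-1", study="radiology",
            doctor="Dr. A", study_date=date(2005, 3, 1))
store.store(b"\x01" * 4096, patient_id="P-1", study="radiology",
            doctor="Dr. A", study_date=date(2005, 6, 1))
store.store(b"\x02" * 4096, patient_id="P-2", study="radiology",
            doctor="Dr. B", study_date=date(2005, 3, 15))

matches = store.query(date(2005, 1, 1), date(2005, 4, 1),
                      patient_id="P-1", study="radiology", doctor="Dr. A")
thumbnails = store.retrieve(matches, make_thumbnail)

full_bytes = sum(len(obj.data) for obj in matches)
thumb_bytes = sum(len(t) for t in thumbnails)
print(f"{len(matches)} match(es): {full_bytes} bytes stored, "
      f"{thumb_bytes} bytes sent")
```

The design point is the same one made in the interview: the workstation asks for a view of the data, and the reduction happens next to the data rather than after a full transfer.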