Storing XML Data
by Todd Sundsted
Before you decide how to store your XML data, ask yourself why you want to store your XML data. You might think the answer to this question is self-evident. After all, storage is storage, right? Not quite. There are many different ways to store XML data, and there are several ways to use the stored XML data. Asking questions like the following will help you select a storage strategy that meets your needs:
The answers to these questions may reveal that your XML data storage solution doesn't have to be fancy -- a file system might suffice. Consider XML data from which Web pages are generated via an XSLT transformation. If the XML data is static, a file system storage solution may provide perfectly adequate storage, as it does for HTML pages.
"Well, that's obvious!" you might object. "But what about XML data used in a B2B transaction? You can't simply store that on disk." Perhaps you can. The storage mechanism you select depends on how you will use the XML data. So again, ask yourself why you need to store the B2B XML data. If your answer is long-term bulk persistence rather than support for queries, you'll require a different type of storage technology than you would if you needed to query the data.
Consider a scenario that involves an XML router. A router accepts messages from one entity that are intended for delivery to another entity, and typically guarantees one-time delivery. To provide guaranteed delivery, even if the router itself fails, the router must persistently store the messages it receives. However, since a router does not need to query the stored data, the persistence mechanism can be very simple. The file system is an extremely simple but completely adequate solution.
Before you immerse yourself further in the technology, take a look at the different types of XML information and then revisit the question of how XML information can be used.
The Fundamental Distinction: Data versus Document
The most fundamental distinction that can be drawn between types of XML content is whether the XML content is to be used as data or as a document. An XML document is information generated by people for use by people. It consists of both text and markup. This is XML content being used as it was originally intended to be used.
Since XML documents contain information intended for our consumption, the structure of the information tends to be loose and irregular, much like our writing and speech. XML data, on the other hand, is information generated by machines for machines. This is XML content performing the role of platform-independent data-exchange message. The information source is seldom static, and the XML content itself is highly regular and structured, as is appropriate for information coming out of a structured information repository or report-generation tool.
How Will the Data Be Used?
The essential difference between document and data addresses both the origin of XML content and its intended use -- presentation versus data transport. Let's consider its use in more detail. Ask yourself the following questions:
You may not need to retrieve stored content as an XML document if you're planning to use traditional tools to generate reports from the data once it's stored. Likewise, data may be written as one kind of XML and read as another -- XML orders from customers may go in, and XML commands for the shipping system may go out. The second question should lead you to think up a few more:
These questions are important, because storing XML content as rows in a relational database may result in the retrieved content looking different from the original.
What about queries? Does your application need to query the information in an XML document? If so, you won't be able to store the XML document as an opaque chunk of bytes (a "BLOB" in database terminology). What kind of queries will you need to support? XPath-like queries? Relational queries? Considering queries is important because different storage technologies vary greatly in their support for queries, as you'll learn next.
Before you can select a storage solution, you need answers to questions like the previous ones. Asking these questions -- and answering them -- is a step you should not skip.
Your Options
XML data is used in many ways. Luckily, there are nearly as many ways to store it -- each with associated benefits and costs. What follows is a summary of options currently available, along with the benefits provided and the pitfalls associated with each. It's not a product comparison, but it should give you enough information to make an educated choice:
Storage as BLOBs
BLOBs are easy to use -- what you put in is what you get back -- but they don't permit access to the details of the data stored. If you plan to store XML as a BLOB, store it as a file in the file system. In most cases this solution will perform better than storing the data as a BLOB in a relational database.
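
As a rough illustration of BLOB storage in the file system, the following Python sketch writes an XML message to disk as opaque bytes and reads it back unchanged. The function names, the spool directory, and the sample message are assumptions made for this example, not part of any particular product. Writing to a temporary file and then renaming it is one common way to avoid leaving a half-written file under the final name.

```python
import os
import tempfile
from pathlib import Path


def store_blob(directory: Path, name: str, xml_bytes: bytes) -> Path:
    """Write xml_bytes to directory/name and return the final path."""
    directory.mkdir(parents=True, exist_ok=True)
    final_path = directory / name
    # Write to a temporary file first so that a crash does not leave a
    # half-written file under the final name.
    fd, tmp_name = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(xml_bytes)
        tmp.flush()
        os.fsync(tmp.fileno())  # ask the OS to push the bytes to disk
    os.replace(tmp_name, final_path)  # rename within the same directory
    return final_path


def load_blob(path: Path) -> bytes:
    """Return the stored bytes exactly as they were written."""
    return path.read_bytes()


if __name__ == "__main__":
    # Hypothetical message; the store never looks inside it.
    message = b'<?xml version="1.0"?>\n<order id="42"><item sku="A1" qty="2"/></order>\n'
    with tempfile.TemporaryDirectory() as spool:
        path = store_blob(Path(spool), "msg-0001.xml", message)
        assert load_blob(path) == message  # what you put in is what you get back
```

Nothing in this sketch parses the XML, which is why it cannot answer questions about the content without reading and parsing each file.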
Tables in Relational Databases
While storing XML as a BLOB in a relational database has its drawbacks, storing XML information in the tables and rows of a relational database is a solution that has stood the test of time. Relational database technology is mature, stable, and ubiquitous, and there are toolkits that automate much of the work.
This approach is not without its problems, however. There is a fundamental mismatch in the way information is modeled in XML and relational databases. XML models data as a tree or hierarchy of elements. Relationships between data objects are typically indicated by containment. Relational databases model data as tables. Relationships between data objects are captured in foreign keys.
In spite of the mismatch between tables and trees, relational databases work well, particularly when applications need to access the stored data in formats other than XML. Remember, XML is a relatively new data format. For every application that understands XML, there are thousands that do not. However, many of these older applications know how to access the data in a relational database, making the database -- not XML -- the data exchange technology.
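
To make the tree-versus-table mapping concrete, here is a minimal Python sketch that uses the standard library's `sqlite3` and `xml.etree.ElementTree` modules. The order document, the table layout, and the function names are all hypothetical. The sketch shows containment in the XML tree turning into a foreign key, and a plain SQL query reading the data without any knowledge of XML.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical order document: <item> elements are contained in <order>.
ORDER_XML = """<order id="42" customer="ACME">
  <item sku="A1" qty="2"/>
  <item sku="B7" qty="5"/>
</order>"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create two tables; items point at orders through a foreign key."""
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL);
        CREATE TABLE items (
            order_id INTEGER NOT NULL REFERENCES orders(id),
            sku TEXT NOT NULL,
            qty INTEGER NOT NULL
        );
    """)


def shred_order(conn: sqlite3.Connection, xml_text: str) -> None:
    """Map one <order> tree onto rows in the orders and items tables."""
    order = ET.fromstring(xml_text)
    order_id = int(order.get("id"))
    conn.execute(
        "INSERT INTO orders (id, customer) VALUES (?, ?)",
        (order_id, order.get("customer")),
    )
    # Containment in the tree becomes the order_id foreign key column.
    conn.executemany(
        "INSERT INTO items (order_id, sku, qty) VALUES (?, ?, ?)",
        [(order_id, i.get("sku"), int(i.get("qty"))) for i in order.findall("item")],
    )


def total_quantity(conn: sqlite3.Connection, customer: str) -> int:
    """A relational query that needs no XML awareness at all."""
    row = conn.execute(
        "SELECT COALESCE(SUM(items.qty), 0) FROM items "
        "JOIN orders ON orders.id = items.order_id WHERE orders.customer = ?",
        (customer,),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    shred_order(conn, ORDER_XML)
    print(total_quantity(conn, "ACME"))  # prints 7
```

The toolkits mentioned above automate this kind of mapping; the hand-written version is only meant to show what the mapping involves.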
Object Databases
There is a downside, however. Object databases, while useful in specific applications, never replaced relational databases -- though vendors, analysts, and other pundits predicted they would. In fact, the entire object database market is basically a niche market today. Object databases failed to catch on when other object-isms did because relational databases make it easier to look at data in many different ways. Object databases tend to orient themselves toward a limited number of views of the data -- a potential liability when making business decisions.
Native XML Databases
While this level of native support for XML might seem like a boon, it can be limiting. Like object databases, native XML databases have trouble presenting stored data in many different, sometimes ad hoc, forms. The hierarchical model imposed by native XML databases locks applications into a single view of the data, unless you incorporate transformation technology such as XSLT.
Though XML databases are not as mature as relational and object databases, their future looks bright. Analysts predict the market will double or even triple in the next couple of years. However, it's still not clear whether native XML databases will match relational databases in terms of flexibility. Current products seem most useful when used to store semistructured data such as documents.
Given the role relational databases play in enterprise computing, you will have to contend with storing XML data in a relational database at some point. The next section addresses a couple of issues you will have to confront.
Mismatch Between XML and Relational Models
Although the mismatch between the XML model and the relational model looks like a tough problem, it's not difficult to overcome. The industry has a long history of dealing with this problem in the form of a very similar clash between object persistence and relational databases.
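
One visible consequence of the mismatch, mentioned earlier, is that XML rebuilt from rows may not look like the XML that went in. The Python sketch below is an illustration under assumed names: the `fields` table, the sample contact document, and the helper functions are hypothetical. It stores each child element as a row, rebuilds the document, and shows that the data survives while the comment and the indentation do not.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document containing a comment and indentation.
ORIGINAL = """<contact>
  <!-- preferred customer -->
  <name>Ada</name>
  <email>ada@example.com</email>
</contact>"""


def store_fields(conn: sqlite3.Connection, xml_text: str) -> str:
    """Store each child of the root as one row; return the root tag."""
    root = ET.fromstring(xml_text)  # the default parser drops comments
    conn.execute("CREATE TABLE fields (pos INTEGER, tag TEXT, value TEXT)")
    conn.executemany(
        "INSERT INTO fields VALUES (?, ?, ?)",
        [(pos, child.tag, child.text) for pos, child in enumerate(root)],
    )
    return root.tag


def rebuild(conn: sqlite3.Connection, root_tag: str) -> str:
    """Reassemble an XML string from the stored rows."""
    root = ET.Element(root_tag)
    for tag, value in conn.execute("SELECT tag, value FROM fields ORDER BY pos"):
        ET.SubElement(root, tag).text = value
    return ET.tostring(root, encoding="unicode")


def fields_of(xml_text: str) -> list:
    """Return the (tag, text) pairs of the root's child elements."""
    return [(c.tag, c.text) for c in ET.fromstring(xml_text)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    rebuilt = rebuild(conn, store_fields(conn, ORIGINAL))
    print(rebuilt != ORIGINAL)                        # True: comment and indentation are gone
    print(fields_of(rebuilt) == fields_of(ORIGINAL))  # True: the data itself survived
```

Whether that loss matters depends on the answers to the earlier questions: for machine-generated XML data it usually does not, while for documents authored by people it may.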
Mapping Relationships
When selecting a storage technology for your XML data, you have a number of options to choose from, each with benefits and costs. Before you select a particular technology, make sure you have asked the necessary questions about how you intend to use the stored XML data and why you need to store it.