BigAdmin System Administration Portal

HowTos

Archived from Sun's Dot-Com Builder Web Site

Things to Consider When Selecting a Web Traffic Analysis Tool
December 12, 2000

Introduction

If you have a Web site to build or manage, you have an audience to cater to. How are you going to measure their activity, and how will you know if you've succeeded? How much traffic analysis is required on your site? Thomas Jensen, Audience Analyst for Sun, offers advice on selecting and implementing a traffic analysis tool.

Traffic analysis solutions range from expensive, complex, enterprise-level software applications with database back ends to freeware tools that generate HTML reports from raw Web server logs. Another fundamental distinction is the data collection method: whether data is captured from Web server log files or by sniffing packets on the network.

Before You Buy: Things to Know

1) Find out your hardware requirements.
Traffic analysis applications generally require a lot of CPU power and large amounts of disk storage. Examine hardware requirements for CPUs and storage, and ask critical questions of the vendor. (In terms of CPUs, sniffers may be slightly less resource-intensive than log-based tools.)

And with free Web traffic analysis tools, remember that only the software is free. Enterprise-scale sites have to devote enterprise-scale hardware and disk space to traffic analysis.
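To make the disk-space point concrete, here is a back-of-the-envelope estimate in Python for raw log storage alone. The 200-byte average line length and one-year retention are assumptions for illustration, not vendor figures. Measure your own logs, and remember that database-backed tools add their own tables, indexes, and stored reports on top of this.

```python
def estimate_log_storage_gb(requests_per_day: int,
                            bytes_per_line: int = 200,
                            retention_days: int = 365) -> float:
    """Rough raw-log storage in decimal gigabytes (10**9 bytes).

    bytes_per_line is an assumed average line length; measure your own logs.
    The result covers raw logs only, not any database or report storage.
    """
    total_bytes = requests_per_day * bytes_per_line * retention_days
    return total_bytes / 10**9


if __name__ == "__main__":
    # A site serving one million requests per day, logs kept for a year.
    print(estimate_log_storage_gb(1_000_000))
```

Under these assumptions, one million requests per day comes to about 73 GB of raw logs per year.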

2) Review the purpose and business goals of your site.
As Jensen says, "If you don't know your business objectives, your stats won't make much sense. But then, neither will your Web site." To select and implement traffic analysis tools, technologists need to be familiar with the organization and its processes for managing the site.

Involve the decision makers in your organization, and find out what kind of decisions they make on a daily/weekly basis. Instead of asking them about "data," ask questions like, "What do you base your decisions on?" or "What are we trying to accomplish with this site?" Then decide what traffic analysis reports are needed.

3) Appoint an administrator for the project.
You need a warm body to interpret the traffic analysis reports. Don't take any single report at face value; someone has to pull together information from several of them. This analyst can also educate the report user community.

4) Be aware of the limitations of traffic analysis packages.
For all tools, user tracking is limited because reports are based on HTTP traffic, and HTTP is a content-centric rather than user-centric protocol. Don't expect to track each individual user unless your site requires users to log in. You can approximate visitors by user agent and/or IP address, but neither is a unique identifier: many users can share a proxy server's IP address, and a single user's address can change from one session to the next.

Cookies might be the answer, but check your privacy policy. And don't even think of pushing cookies to people unless you give them something in return, Jensen advises. It's simply a matter of courtesy. Supermarkets don't stick a bar code on shoppers' clothes when they enter, but they might give discounts to registered, card-carrying customers.

5) Don't expect true numbers from traffic analysis tools.
"It's not like counting nails in a box," explains Jensen. Inaccuracies are inevitable when you are recording and counting millions of requests per day, but the real problems lie in interpreting the numbers. For instance, page views, visits, and even users are arbitrarily defined. Is a page view merely a request for an HTML page, or do other content types count as well? If so, how do you count a servlet that generates a login page? How do you account for redirects, which can lead to serious overcounting? Frames may also skew the page-view count, since a single framed screen generates requests for the frameset and for each frame.
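The arbitrariness of "page view" becomes obvious once you try to write the rule down. The Python function below is a minimal sketch of one possible definition. The extension list, the excluded directories, and the choice to count only status 200 responses are all hypothetical, site-specific assumptions.

```python
from urllib.parse import urlsplit

# Hypothetical site-specific rules; adjust them to your own content.
PAGE_EXTENSIONS = (".html", ".htm", ".jsp", ".asp")
NON_PAGE_PREFIXES = ("/images/", "/css/")


def is_page_view(path: str, status: int) -> bool:
    """Decide whether one logged request counts as a page view."""
    # Only successful responses count; 3xx redirects are excluded so that a
    # redirect and the page it leads to are not counted twice.
    if status != 200:
        return False
    # Ignore any query string when looking at the resource name.
    resource = urlsplit(path).path
    if resource.startswith(NON_PAGE_PREFIXES):
        return False
    # Directory requests such as "/" or "/products/" return an index page.
    if resource.endswith("/"):
        return True
    return resource.lower().endswith(PAGE_EXTENSIONS)
```

Under these rules a servlet URL such as /servlet/Login is not counted unless you add it explicitly, and a framed screen is still counted once for the frameset and once for each frame. Every one of these decisions changes your totals.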

More significantly, a good amount of traffic to your site (easily one-third of the total) does not come from the real flesh-and-blood users you are interested in analyzing. The most notable sources of such "noise" traffic are search-engine spiders, but many other automated clients are also out there. If your site has anything besides unrestricted, plain HTML pages, these clients are prone to breaking and getting stuck in request loops. This traffic not only inflates your totals but can also severely skew individual reports, for example when a looping client requests the same pages over and over.

Jensen has yet to discover a tool that identifies such unwanted (for stats purposes) clients by their behavior, rather than their IP address or user agent. "Common sense in interpreting the numbers still goes a longer way than relentless pursuit of the perfect configuration," he says.

6) Decide what to count and what to filter out.
Filtering is key to conserving storage space and can protect your reports from some of the irrelevant traffic discussed above. But keep in mind that for some purposes, it is essential to know the complete load on the server.

To filter with log-parsing tools, you can presort or preprocess logs using a Perl script or simply the 'grep' command. If you use a packet sniffer, you rely on the software for all filtering.
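As an illustration of the preprocessing approach, the Python script below does roughly what `grep -v -i -E 'bot|crawler|spider|slurp' access_log` would, except that it inspects only the user-agent field of each line. It assumes logs in the NCSA Combined Log Format; the bot markers and the excluded host are hypothetical placeholders. Like the tools Jensen describes, it identifies unwanted clients by user agent and address, not by behavior.

```python
import re
import sys

# NCSA Combined Log Format:
# host ident authuser [date] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical filter lists; real sites maintain much longer ones.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")
EXCLUDED_HOSTS = {"10.0.0.5"}  # e.g. an internal monitoring probe


def keep(line: str) -> bool:
    """Return True if a log line should be passed on to the analysis tool."""
    m = LOG_RE.match(line)
    if m is None:
        return False  # drop malformed lines rather than guess at them
    agent = m.group("agent").lower()
    if any(marker in agent for marker in BOT_MARKERS):
        return False
    return m.group("host") not in EXCLUDED_HOSTS


def filter_log(lines):
    """Yield only the lines that pass the filters."""
    return (line for line in lines if keep(line))


if __name__ == "__main__":
    # Usage: python filter_log.py < access_log > access_log.filtered
    sys.stdout.writelines(filter_log(sys.stdin))
```

Because filtering discards information, keep the unfiltered originals if you also need to report the complete load on the server.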

Criteria for Selection

1) The most obvious -- reliability:
Although enterprise-level tracking products come with support, it is unlikely to match the support offered for mission-critical software. Expect some downtime, but do not accept data loss.

With a log-based tool, the Web server logs can be reprocessed or reloaded, but how lengthy and arduous is the recovery process? Packet-sniffing traffic analysis tools are themselves responsible for data capture, so at least that function must have 100% availability. These tools must also be able to hold captured data temporarily when it cannot be written to the database.
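The buffering requirement can be sketched as follows in Python, assuming a hypothetical `write_batch` function that stores a list of records and raises `ConnectionError` while the database is down. Nothing is discarded on failure; records simply wait for the next successful write.

```python
from collections import deque


class BufferedWriter:
    """Queue captured records while the database cannot accept them."""

    def __init__(self, write_batch):
        # write_batch: hypothetical callable that stores a list of records
        # and raises ConnectionError when the database is unavailable.
        self.write_batch = write_batch
        self.pending = deque()

    def add(self, record):
        """Queue one record, then try to write everything pending."""
        self.pending.append(record)
        self.flush()

    def flush(self):
        """Write all pending records, or keep all of them if the write fails."""
        if not self.pending:
            return True
        batch = list(self.pending)
        try:
            self.write_batch(batch)
        except ConnectionError:
            return False  # records stay queued for the next attempt
        self.pending.clear()
        return True
```

This in-memory version would still lose its queue if the collector process itself crashed; a production tool would presumably spool pending records to local disk instead.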

2) User interface and usability:
You can usually check out the online demos of various traffic analysis packages to get a sense of the HTML GUI. A friendly basic GUI is essential for distributing traffic reports to a wider audience within your company. Light users, who do not have Web traffic reporting as one of their core competencies, cannot be expected to take advantage of sophisticated ad hoc reporting capabilities.

3) Ease of configuring reports:
What choices are possible regarding the time period, granularity, and scope of reports? For example, can reports be run for subsections as well as for the whole site?

Also, is it necessary to specify all the variables every time you select a report? This may be too challenging for certain users.

How does the package track visitors, and can you select different options? For example, will the package count all visitors, or only external visitors?
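The configuration choices above (time period, granularity, site section, and internal versus external visitors) map naturally onto report parameters. The Python sketch below assumes records that have already been classified as page views, and uses a hypothetical host-address prefix to stand in for your internal network.

```python
from collections import Counter
from datetime import datetime


def page_view_report(records, start, end, granularity="day",
                     section="/", exclude_hosts_prefix=None):
    """Count page views per time bucket for one part of the site.

    records: iterable of (timestamp: datetime, host: str, path: str) for
        requests already classified as page views.
    start, end: report period as datetimes (start inclusive, end exclusive).
    section: only paths under this prefix are counted ("/" = whole site).
    exclude_hosts_prefix: e.g. "10." to report external visitors only
        (hypothetical; match your own internal address ranges).
    """
    bucket_format = {"day": "%Y-%m-%d", "hour": "%Y-%m-%d %H:00"}[granularity]
    counts = Counter()
    for ts, host, path in records:
        if not (start <= ts < end):
            continue
        if not path.startswith(section):
            continue
        if exclude_hosts_prefix and host.startswith(exclude_hosts_prefix):
            continue
        counts[ts.strftime(bucket_format)] += 1
    return dict(sorted(counts.items()))
```

A report interface built on something like this can supply sensible defaults, so light users need not specify every parameter each time.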

4) Ad hoc reporting:
Does the tool provide these capabilities? Any package with a database back end should have a query interface -- but does it require users to write SQL or Perl, or does it have a GUI? This obviously has an impact on the training required and the size of the user community.

Ad hoc reporting may not be available for smaller, cheaper tools, but you can write log preprocessing scripts in order to meet some of your reporting needs.

5) Flexibility:
A traffic analysis package has to accommodate everything on your Web site, including not only HTML but other static and/or dynamic content formats as well. How does the tool identify viewable content, i.e., page views, and how can this identification be customized for your site?

Useful Optional Features

Some features may not be standard in all packages but can be useful, depending on your needs. Examples include:

Ability to aggregate content in reports, instead of just reporting by individual URLs. An alternative is to truncate URLs to get directory-level reports, but this sacrifices the ability to drill down to individual pages.

Good click-stream analysis, such as a report on the top 10 paths through the site or incoming and outgoing clicks to and from a single page. Note that calculating click-stream reports can be quite resource-intensive, so ask for only what you really need.

Report user management by a designated superuser or administrator. For example, users can get accounts with an inbox to regularly receive relevant reports.
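The aggregation and click-stream features can be sketched in a few lines of Python. `directory_of` and `aggregate_by_directory` implement the URL-truncation approach; once totals are keyed by directory, the individual URLs are gone, which is why drill-down is lost. `top_paths` and `clicks_around` assume the log has already been split into sessions, each an ordered list of page paths; how sessions are defined (for example, by a period of inactivity) is left open here.

```python
from collections import Counter


def directory_of(path, depth=1):
    """Truncate a URL path to its first `depth` directory levels."""
    path = path.split("?", 1)[0]  # ignore any query string
    parts = [p for p in path.split("/") if p]
    # A trailing slash means the last part is a directory, not a file.
    dirs = parts if path.endswith("/") else parts[:-1]
    return "/" + "".join(d + "/" for d in dirs[:depth])


def aggregate_by_directory(page_views, depth=1):
    """Sum per-URL page-view counts into directory-level totals."""
    totals = Counter()
    for path, count in page_views.items():
        totals[directory_of(path, depth)] += count
    return totals


def top_paths(sessions, n=10, length=3):
    """Most common runs of `length` consecutive pages across all sessions."""
    counts = Counter()
    for pages in sessions:
        for i in range(len(pages) - length + 1):
            counts[tuple(pages[i:i + length])] += 1
    return counts.most_common(n)


def clicks_around(sessions, page):
    """Count the pages that lead into `page` and the pages it leads out to."""
    incoming, outgoing = Counter(), Counter()
    for pages in sessions:
        for prev, nxt in zip(pages, pages[1:]):
            if nxt == page:
                incoming[prev] += 1
            if prev == page:
                outgoing[nxt] += 1
    return incoming, outgoing
```

Even this simple path count grows with every session and every extra step of path length, which gives some idea of why full click-stream reporting is expensive.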

Advice on Implementation

Before you implement a new tool, think carefully about picking "ground zero." Choose a meaningful release date for the new reports, such as the beginning of a fiscal year or a fiscal quarter, and make sure it's attainable.

Find out what information you already have. Do you have old Web server logs or reports? Think about how you will keep data comparable over time: CEOs probably want to see growth rates, not raw page-view counts. Changing to a new tool invariably shifts your measurement methodology, so you need to know how the new numbers relate to the old.
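Growth rates are simple to compute, as the first Python function below shows, but they are only meaningful when both periods were measured the same way. The second function sketches one way to relate old and new numbers, assuming you run both tools in parallel for a few periods; this overlap approach is an illustration, not a method described by Jensen.

```python
def growth_rates(series):
    """Period-over-period growth; 0.25 means 25% more than the prior period."""
    return [
        (current - previous) / previous
        for previous, current in zip(series, series[1:])
    ]


def methodology_ratio(old_counts, new_counts):
    """Average new/old ratio over periods measured by both tools.

    Assumes both tools were run in parallel over the same periods.
    """
    ratios = [new / old for old, new in zip(old_counts, new_counts)]
    return sum(ratios) / len(ratios)
```

For example, quarterly page views of 400,000, 500,000, and 550,000 give growth rates of 25% and 10%.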

Don't just give people the stats they're already getting. For example, people may be accustomed to reports counting hits or total users, and they may be making inferences about usage and popularity that would be more appropriately based on page views or daily unique visitors. Use the switch to a new methodology as an opportunity to discover your staff's needs, and then configure the reports to meet those needs.
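To see how far apart these measures can be, the Python sketch below counts hits, page views, and approximate daily unique visitors from the same requests. The record layout is assumed, the page-view rule is passed in as a site-specific function, and a (host, user agent) pair is used as a rough stand-in for a visitor, with all the limitations of IP addresses and user agents noted earlier.

```python
from collections import defaultdict


def summarize(records, is_page_view):
    """Compare hits, page views, and approximate daily unique visitors.

    records: iterable of (day, host, user_agent, path, status) tuples.
    is_page_view: site-specific predicate taking (path, status).
    """
    hits = 0
    page_views = 0
    visitors_by_day = defaultdict(set)
    for day, host, agent, path, status in records:
        hits += 1  # every request, including images and redirects
        if is_page_view(path, status):
            page_views += 1
            # (host, agent) only approximates a visitor: users behind one
            # proxy are merged, and a user whose address changes is split.
            visitors_by_day[day].add((host, agent))
    daily_uniques = {day: len(v) for day, v in sorted(visitors_by_day.items())}
    return hits, page_views, daily_uniques
```

In this sketch visitors are counted only when they generate a page view, which is itself a methodological choice worth documenting for report users.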

The real value of these tools is not in measuring traffic but in providing people with information. As Jensen says, "It's easy enough to implement your Web stats software and walk away, but that's not doing your company any favors."
