Recovering Sun Java System Calendar Server Databases
Calendar Server Engineering Team, May 2007
This article discusses how to diagnose Sun Java System Calendar Server calendar database corruption and describes the best ways to recover a database in various situations. The example at the end offers a cookbook list of steps for recovering a database after it has been corrupted.
This article is intended for Calendar Server administrators who manage Calendar Server 6.x
database operations.
What to Do When a Database Is Corrupted
Database Corruption Basics
No one can guarantee corruption-free databases, but you can minimize data loss and
operational downtime. Closely monitoring the database and Calendar Server is important for detecting
corruption early. Frequent and complete backups are the key to recovering from corruption
once it is found.
Two levels of corruption are possible in a calendar database:
Application level
Offending entries in one or more database files prevent the server from running when they are operated upon.
Database level
Corruption in the Berkeley database pages causes various problems. One common symptom is looping while running the csdb check utility. Another common symptom is an error message such as one of the following:
"illegal page type or format"
or
"page 97895 doesn't exist, create flag not set."
Administrative Utility Basics
This document mentions two different types of command-line utilities:
Calendar Server Administrative Utility
csdb: Calendar database utility. It includes command-line options like rebuild, list, and check.
csdb operates at the Calendar Server application level.
Berkeley Database Supporting Utilities
db_recover: Recovery utility.
db_verify: Verification utility.
These utilities operate directly on the Berkeley database files.
Monitoring Calendar Log Files
You should inspect the log files on a regular basis for ALERT,
CRITICAL, ERROR, and WARNING level messages. If you find any, examine the events
for possible problems in the operation of Calendar Server.
More specifically, to detect calendar database corruption, inspect the calendar log files for
errors such as the following:
In the http.log files:
[29/Nov/2005:14:50:27 +0000] usxsl069 cshttpd[8140]: General Error:
caldb: Error with calendar database: PANIC: fatal region error
detected; run recovery
In the store.log files:
20051129150204 - VerifyDB: failures detected -
/opt/iplanet/hotbackup/hotbackup_20051129/ics50alarms.db
20051129150204 - VerifyDB: check - /tmp/csstored.ics50alarms.db.out
for possible error messages.
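As a rough illustration, the log check described above could be scripted along the following lines. This is a minimal sketch, not a Sun-supplied tool: the function name scan_calendar_logs and the *.log file pattern are assumptions, and the search patterns are simply the message fragments quoted in this article.

```shell
#!/bin/sh
# Sketch: list log lines that suggest calendar database corruption.
# scan_calendar_logs is a hypothetical helper, not a Calendar Server command.
# The patterns are the error fragments quoted in this article; extend them
# to match whatever your own logs show.

scan_calendar_logs() {
    log_dir=${1:-/var/opt/SUNWics5/logs}
    # -E enables alternation. /dev/null is listed as an extra file so that
    # grep always prefixes each match with the name of the file it came from.
    grep -E 'PANIC: fatal region error|VerifyDB: failures detected|illegal page type or format' \
        "$log_dir"/*.log /dev/null
}
```

Calling scan_calendar_logs with no argument searches the default log directory named earlier in this article; the function returns a non-zero status when nothing matches, which makes it easy to call from a monitoring job.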
What Data Needs to Be Gathered When a Database Is Corrupted
This section summarizes the types of data you should gather that will help
Sun investigate a database corruption if none of the Calendar Server database files
is recoverable.
When the log files indicate that there is a database corruption, stop the
services, if possible, and copy the entire contents of the /var/opt/SUNWics5/csdb directory
and the /var/opt/SUNWics5/logs files into a save directory.
Note the exact time of the corruption error in the
log files. Gather information close to the time of the error in all
the calendar log files available. Note any activities that may have occurred on
the server during that time: nightly scripted jobs, cron jobs, any backups, automated purges,
data being pushed or pulled from an external source, and so on.
Note the users who might have been logged in and active during
the time of the database corruption. This may require correlating data from the
commandlog.txt or http.commands files to the http.log and the store.log files. If the
commandlog.txt or http.commands files were active, note the commands that may have been
running during that time. Ask the calendar users who were active
around the time of the corruption what they were doing
on the Calendar Server and whether they noticed any problems with the
Calendar Server during that time.
For example:
Were they syncing data from Outlook to the Calendar Server?
Were they editing an event in Calendar Express?
Were they viewing a subscribed calendar? If so, which calendar?
Once all this information is gathered, we can try to determine if
a certain end-user usage pattern or some maintenance activity on the server might be
causing the problem.
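The first data-gathering step, preserving the database and log directories before anything else touches them, might be scripted as in the following sketch. The function name save_evidence, its three arguments, and the corruption_<timestamp> directory naming are assumptions made for illustration only. Run it only after the calendar services are stopped.

```shell
#!/bin/sh
# Sketch: copy the database and log directories into a timestamped save
# directory. save_evidence and the "corruption_" prefix are hypothetical.

save_evidence() {
    db_dir=$1       # for example /var/opt/SUNWics5/csdb
    log_dir=$2      # for example /var/opt/SUNWics5/logs
    save_root=$3    # any directory with enough free space

    dest="$save_root/corruption_$(date +%Y%m%d_%H%M%S)"
    mkdir -p "$dest" || return 1
    # -R copies the whole tree; -p preserves timestamps, which helps when
    # correlating files with log entries later.
    cp -Rp "$db_dir" "$dest/csdb" || return 1
    cp -Rp "$log_dir" "$dest/logs" || return 1
    # Print the save location so the caller can record it.
    echo "$dest"
}
```

Keeping the copy untouched matters because any later recovery attempt modifies the files it works on.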
Please gather all the information you can and provide the information to Sun.
The more complete the information, the faster and easier the problem is to resolve.
For a complete guide to gathering debugging information for Calendar Server, see Sun Gathering Debug Data for Sun Java System Calendar Server.
You can download or view this book from the following location:
http://docs.sun.com/app/docs/coll/1656.1
Which Utilities to Use
After a database corruption occurs, you have to choose which recovery utility to
run first. It can be hard to determine when to use db_verify
versus db_recover versus csdb rebuild versus db_dump and db_load.
To save time, first try to recover the calendar database with the
db_recover command and verify the individual database files with the db_verify command. On
a 1GB database, the db_recover operation takes less than a minute.
If no errors are reported against any of the calendar database files, you
may not have to rebuild any calendar databases with csdb rebuild.
If errors are reported, do a rebuild with csdb rebuild. On the same 1GB
database, the csdb rebuild operation can take from 30 to 45 minutes.
If any errors are reported against the other databases, a csdb rebuild may
or may not work, depending on how corrupted the database files are.
If the csdb rebuild does not work, you may have to take one of
these actions:
Try to repair the corrupted database
Consider using the hotbackup copy or archive copy of the calendar database files
Use copies of csbackup data to restore the database
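The "recover first, then verify each file" order described above can be sketched as a small shell function. Treat it as an illustration under stated assumptions: the function name recover_and_verify, the DB_RECOVER and DB_VERIFY variables, the OK/FAIL output format, and the *deletelog* file-name pattern are all hypothetical. The -h option (environment home directory) and the -o option of db_verify are standard Berkeley DB utility options.

```shell
#!/bin/sh
# Sketch: run db_recover on a COPY of the database directory, then run
# db_verify on every *.db file and report which files fail.
# The variable and function names here are assumptions for illustration.

DB_RECOVER=${DB_RECOVER:-/opt/SUNWics5/cal/tools/unsupported/bin/db_recover}
DB_VERIFY=${DB_VERIFY:-/opt/SUNWics5/cal/tools/unsupported/bin/db_verify}

recover_and_verify() {
    db_home=$1
    failed=0

    # -h names the Berkeley DB environment (home) directory.
    if ! "$DB_RECOVER" -h "$db_home"; then
        echo "db_recover failed in $db_home"
        return 2
    fi

    for f in "$db_home"/*.db; do
        [ -e "$f" ] || continue
        # The deletelog database is verified with -o (see the note in the
        # repair procedure); the file-name pattern is an assumption.
        case $f in
            *deletelog*) opts=-o ;;
            *) opts= ;;
        esac
        if "$DB_VERIFY" $opts "$f" >/dev/null 2>&1; then
            echo "OK   $f"
        else
            echo "FAIL $f"
            failed=1
        fi
    done
    # 0 = every file verified, 1 = at least one file failed verification.
    return "$failed"
}
```

A return status of 0 suggests that a csdb rebuild may not be needed; a status of 1 or 2 points toward the rebuild and backup options listed above.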
How to Repair a Corrupt Database
This section summarizes the corrective actions you can take to repair a corrupt
database or minimize data loss and operational downtime.
For detailed instructions on how to recover a corrupted database, follow the steps
in the Calendar Server documentation. See these sections in the Sun Java System Calendar Server 6.3 Administration Guide:
“16.2. Administering Databases with the csdb Utility” (http://docs.sun.com/app/docs/doc/819-4654/6n6prj59r?a=view)
“22.5. Dealing With Calendar Server Database Issues” (http://docs.sun.com/app/docs/doc/819-4654/6n6prj5d9?a=view)
Take these steps:
Stop the Calendar Server.
Copy the files from the /var/opt/SUNWics5/csdb directory to a safe location for a backup.
Check that the __db.* files are removed from the “source” data directory, as lock issues may have occurred in these files.
Recover and verify the database using the Berkeley DB utilities, db_recover and db_verify.
(These utilities are located in the /opt/SUNWics5/cal/tools/unsupported/bin directory.)
If these commands are successful, go to Step 8.
If any errors are reported against any of the calendar database
files, including the alarms database or the gse database, follow one of
these two alternatives:
The first alternative includes two steps:
Try to repair the database at the database level by performing the db_dump and db_load commands. Or use an alternate backup database directory as the next best choice to minimize data loss and operational downtime.
If the db_dump and db_load commands are successful, try to repair the database at the calendar level by performing a csdb rebuild.
The second alternative is one step: simply use the csdb rebuild command to try to repair the database at the calendar level.
You can rebuild the database with the csdb rebuild, csdb -g rebuild or csdb -a -V rebuild command.
Performing a csdb rebuild will recover data. However, depending on how corrupted the database files are, only a limited amount of data may be recovered.
If you also perform a db_dump and db_load, you may recover more data than by using csdb rebuild alone. However, depending on how corrupted the database files are, you may still only recover a limited amount of data.
If the preceding steps are successful, copy these database files to a safe location.
Perform a database-level check to make sure that the rebuilt database is operational. Run the db_verify command on each database file, giving the file name as the argument.
Note - For the deletelog database, you must run db_verify with the -o option (db_verify -o db-file).
Perform a calendar-level check to make sure that the rebuilt database is operational:
Run the csdb check command.
Note - You can also use csdb -v list to verify that the database is operational.
Check that the rebuilt database is roughly the same size as the original.
If the check succeeds, the database has been successfully rebuilt.
Next, check basic calendar functionality on the rebuilt database before you bring the database back into production.
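The "roughly the same size" check in the steps above is left to judgment. One way to make it repeatable is sketched below. The function names and the default threshold of 80 percent are arbitrary assumptions, not values recommended by the Calendar Server documentation.

```shell
#!/bin/sh
# Sketch: compare the combined size of the *.db files in the original and
# rebuilt directories. Names and the 80 percent default are assumptions.

total_db_bytes() {
    total=0
    for f in "$1"/*.db; do
        [ -f "$f" ] || continue
        # wc -c reports the size of the file in bytes.
        total=$((total + $(wc -c < "$f")))
    done
    echo "$total"
}

size_roughly_matches() {
    orig=$(total_db_bytes "$1")
    rebuilt=$(total_db_bytes "$2")
    min_percent=${3:-80}
    # An empty original directory cannot be compared against.
    [ "$orig" -gt 0 ] || return 2
    # Succeeds when the rebuilt files total at least min_percent of the
    # original total (integer arithmetic, no rounding needed).
    [ $((rebuilt * 100)) -ge $((orig * min_percent)) ]
}
```

For example, size_roughly_matches /path/to/saved_csdb /path/to/rebuilt_csdb 90 would succeed only if the rebuilt files total at least 90 percent of the saved copy; both paths are placeholders.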
If you could not successfully rebuild the database, use an alternate
backup database directory (a hotbackup or archive copy) as the next
best choice to minimize data loss and operational downtime.
Repeat the process described above on the individual hotbackup or archive copies of
the database files, starting from the most recent copy.
Note - If you recover data from the archive backup directories, additional work is required
to recover data from the database transaction log files before you can restore
the data. If you do use the archive backup directory, you must run
the db_recover utility with the transaction log files to fully recover the data.
Additionally, we recommend that you choose an archive with good database files for
recovering data from the transaction log files. This option is a good choice
if your most important objective is to minimize operational downtime.
Alternatively, if you do not want to follow the procedure described in this
section, you can use copies of csbackup data to restore the database.
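For the database-level repair mentioned in the first alternative above, the dump and load sequence could look roughly like the following sketch. The authoritative procedure is the one in the Administration Guide; here the function name dump_and_load, the .dump file suffix, and the assumption that db_dump and db_load live in the same directory as db_recover are all illustrative. The -r option of db_dump (salvage data from a possibly corrupt file) is a standard Berkeley DB option.

```shell
#!/bin/sh
# Sketch: salvage one database file with db_dump -r and reload the result
# into a new file with db_load. Work on copies, never on the live files.
# The default tool locations below are assumptions.

DB_DUMP=${DB_DUMP:-/opt/SUNWics5/cal/tools/unsupported/bin/db_dump}
DB_LOAD=${DB_LOAD:-/opt/SUNWics5/cal/tools/unsupported/bin/db_load}

dump_and_load() {
    src=$1              # damaged database file (a copy)
    out=$2              # new database file to create
    dump_file="$out.dump"

    # -r salvages whatever records can still be read. A non-zero exit is
    # reported but not treated as fatal, because a partial dump may still
    # be worth loading.
    "$DB_DUMP" -r "$src" > "$dump_file" ||
        echo "db_dump reported errors; continuing with partial dump" >&2

    # Nothing was salvaged, so there is nothing to load.
    [ -s "$dump_file" ] || return 1

    # db_load reads the dump from standard input and creates the new file.
    "$DB_LOAD" "$out" < "$dump_file"
}
```

After a successful load, the new file still needs the db_verify and csdb rebuild steps described in the procedure before it can be trusted.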
An Example: How to Recover a Corrupt Database
To perform the sample recovery
Stop the calendar services immediately.
(Do not start the calendar services.)
Copy the live database (csdb), including the *.db and log.* files, to a save directory.
Name the directory csdb_date, for example, csdb_dec20.
Verify the individual database files in the live database.
db_verify and db_recover are located in /opt/SUNWics5/cal/tools/unsupported/bin.
These steps should work if the problem is detected early and the
database is not badly damaged.
If the db_recover or db_verify command does not report any problems, copy the
recovered files to another save directory. For example, recovered_csdb_2006mar20.
If the db_recover or db_verify command reports errors, choose a hotbackup or archive
copy of the database and try to verify and recover those database files.
If db_verify reports errors on all the hotbackup and archive directories, try using
the csdb rebuild and/or db_dump and db_load commands.
Run these commands on the live database, hotbackup databases, or archive databases until you
find a database suitable for use.
% csdb rebuild
For instructions on how to use the db_dump and db_load commands, see “22.5.7. Using the Dump and Load Procedure to Recover a Calendar Database” in
the Troubleshooting chapter of the Sun Java System Calendar Server 6.3 Administration Guide: http://docs.sun.com/app/docs/doc/819-4654/6n6prj5dp?a=view.
Note - Copy the entire hotbackup or archive directory before you try to run db_verify
or db_recover to verify or recover any data.
Check how many calendars are in your database. You can do this quickly
by issuing the following command:
(Issue this command while the calendar services are not running.)
% cscal list > /tmp/cscal_out
% wc -l /tmp/cscal_out
Assume you know that you have about 12,200 calendars. If the output is
drastically lower--for example, 8,000 calendars--examine your hotbackup and archive copies to see if
you might have a better database copy to restore.
For a more complete check, issue either of the following commands:
% csdb check
% csdb -v list
You can perform the preceding commands on the live deployment machine or an
alternate staging or testing machine. These commands can take a long time. We
recommend that Calendar Server not be running while either command is executing.
Move and rename today's hotbackup and archive directories so that, when you start
up calendar services, new hotbackup and archive directories are created.
Start the calendar services.
As Calendar Server starts, it creates new archive and hotbackup directories for today
and copies the live database files into the archive and hotbackup directories.
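The calendar-count comparison used in the example above (about 12,200 calendars expected, 8,000 found) can be turned into a simple pass or fail check. The sketch below assumes, as the wc -l command in the example does, that the saved cscal list output contains one calendar per line. The function name check_calendar_count and the default threshold of 90 percent are assumptions.

```shell
#!/bin/sh
# Sketch: compare the number of lines in a saved "cscal list" output with
# the number of calendars you expect. Names and threshold are assumptions.

check_calendar_count() {
    list_file=$1            # for example /tmp/cscal_out
    expected=$2             # the count you believe is correct
    min_percent=${3:-90}

    # tr strips the padding that some wc implementations add.
    actual=$(wc -l < "$list_file" | tr -d ' ')
    echo "calendars found: $actual (expected about $expected)"

    # Succeeds when the count is at least min_percent of the expected value.
    [ $((actual * 100)) -ge $((expected * min_percent)) ]
}
```

With the figures from the example, check_calendar_count /tmp/cscal_out 12200 would fail for a file of 8,000 lines, which is the signal to examine the hotbackup and archive copies.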
If none of the Calendar Server database files from the live database, hotbackup, and archive directories is recoverable, please send the following data to Sun. Be sure to send a copy of the data as it was before the data recovery was attempted:
csdb directory
Hotbackup directory
Hotbackup/tmp directory
Archive directory
All Calendar Server log files (including commandlog.txt) for the last two days.
For More Information
See the BigAdmin Communications Suite Hub for more information about
Sun Java Communications Suite.