BigAdmin System Administration Portal
Feature Article

Recovering Sun Java System Calendar Server Databases

Calendar Server Engineering Team, May 2007

This article discusses how to diagnose Sun Java System Calendar Server calendar database corruption and describes the best ways to recover a database in various situations. The example at the end offers a cookbook list of steps for recovering a database after it has been corrupted.

Who Should Read This Article

This article is intended for Calendar Server administrators who manage Calendar Server 6.x database operations.


What to Do When a Database Is Corrupted

Database Corruption Basics

No one can guarantee corruption-free databases, but you can minimize data loss and operational downtime. Closely monitoring the database and Calendar Server is important for detecting corruption early. Frequent and complete backups are the key to recovering from corruption once it is found.

Two levels of corruption are possible in a calendar database:

  • Application level

    Offending entries in one or more database files prevent the server from running when they are operated upon.

  • Database level

    Corruptions in the Berkeley database pages cause various problems. One common symptom is looping while running the csdb check utility. Another common symptom is an error message such as one of the following:

    "illegal page type or format"

    or

    "page 97895 doesn't exist, create flag not set."

Administrative Utility Basics

This document mentions two different types of command-line utilities:

Calendar Server Administrative Utility

  • csdb: Calendar database utility. It includes command-line options like rebuild, list, and check.

    csdb operates at the Calendar Server application level.

Berkeley Database Supporting Utilities

  • db_recover: Recovery utility.

  • db_verify: Verification utility.

These utilities operate directly on the Berkeley database files.

Monitoring Calendar Log Files

You should inspect the log files on a regular basis for ALERT, CRITICAL, ERROR, and WARNING level messages. If you find any, examine the surrounding events for possible problems in the operation of Calendar Server.

More specifically, to detect calendar database corruption, inspect the calendar log files for errors such as the following:

In the http.log files:

[29/Nov/2005:14:50:27 +0000] usxsl069 cshttpd[8140]: General Error:

    caldb: Error with calendar database: PANIC: fatal region error
    detected; run recovery

In the store.log files:

20051129150204 - VerifyDB: failures detected -
    /opt/iplanet/hotbackup/hotbackup_20051129/ics50alarms.db
20051129150204 - VerifyDB: check - /tmp/csstored.ics50alarms.db.out
    for possible error messages.
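
A routine log inspection can be partly scripted. The following is a minimal sketch that searches the http.log and store.log files for the error strings quoted in this article. The function name scan_calendar_logs and the LOG_DIR variable are illustrative only. The article does not say which log file reports the two Berkeley DB page errors, so the sketch simply looks for every string in both sets of files. Your logs may report corruption in other ways.

```shell
#!/bin/sh
# Sketch: scan Calendar Server logs for the corruption messages quoted in this article.
# LOG_DIR defaults to the log location named in this article; override it for your site.
LOG_DIR="${LOG_DIR:-/var/opt/SUNWics5/logs}"

# scan_calendar_logs DIR
# Prints each matching line as file:line-number:text.
# Returns 0 if at least one message was found, 1 otherwise.
scan_calendar_logs() {
    dir="$1"
    found=1
    for f in "$dir"/http.log* "$dir"/store.log*; do
        [ -f "$f" ] || continue
        # /dev/null is a second operand so that grep always prints the file name.
        if grep -n -F \
            -e 'PANIC: fatal region error' \
            -e 'VerifyDB: failures detected' \
            -e 'illegal page type or format' \
            -e 'create flag not set' \
            "$f" /dev/null; then
            found=0
        fi
    done
    return "$found"
}
```

For example, running scan_calendar_logs "$LOG_DIR" from a nightly job and alerting on a zero exit status would flag the store.log sample shown above.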

What Data Needs to Be Gathered When a Database Is Corrupted

This section summarizes the types of data you should gather that will help Sun investigate a database corruption if none of the Calendar Server database files is recoverable.

When the log files indicate that there is a database corruption, stop the services, if possible, and copy the entire contents of the /var/opt/SUNWics5/csdb directory and the /var/opt/SUNWics5/logs files into a save directory.
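
The copy step can be sketched as a small shell function. This is only an illustration: the function name save_calendar_state, the corruption_ prefix, and the timestamp format of the save directory are assumptions, not part of Calendar Server.

```shell
#!/bin/sh
# Sketch: copy the calendar database and log directories into a dated save directory.
# Run it only after the calendar services have been stopped, if that is possible.

# save_calendar_state CSDB_DIR LOG_DIR SAVE_ROOT
# Creates SAVE_ROOT/corruption_YYYYMMDD_HHMMSS/csdb and .../logs,
# copies the directory contents, and prints the new save directory path.
save_calendar_state() {
    csdb_dir="$1"
    log_dir="$2"
    save_root="$3"
    stamp=$(date +%Y%m%d_%H%M%S)
    dest="$save_root/corruption_$stamp"
    mkdir -p "$dest/csdb" "$dest/logs" || return 1
    # -p keeps the original modification times, which helps when correlating events later.
    cp -pR "$csdb_dir"/. "$dest/csdb/" || return 1
    cp -pR "$log_dir"/. "$dest/logs/" || return 1
    echo "$dest"
}
```

With the default locations from this article, the call would be save_calendar_state /var/opt/SUNWics5/csdb /var/opt/SUNWics5/logs followed by a save location of your choice.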

Note the exact time of the corruption error in the log files. Gather the entries logged close to that time from all the available calendar log files. Note any activities that may have occurred on the server during that time: nightly scripted jobs, cron jobs, any backups, automated purges, data being pushed or pulled from an external source, and so on.
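
To pull out the entries logged around the time of the error, a filter such as the following may help. It is a sketch that assumes every store.log entry starts with a 14-digit YYYYMMDDhhmmss time stamp, as in the sample shown earlier. The function name store_log_window is hypothetical. The http.log files use a different time stamp format and would need a different filter.

```shell
#!/bin/sh
# Sketch: print store.log entries whose leading time stamp falls inside a window.

# store_log_window FILE FROM TO
# FROM and TO are inclusive bounds in YYYYMMDDhhmmss form.
# Lines that do not start with a 14-digit time stamp are skipped.
store_log_window() {
    awk -v from="$2" -v to="$3" '
        $1 ~ /^[0-9]+$/ && length($1) == 14 && $1 >= from && $1 <= to
    ' "$1"
}
```

For the sample error, store_log_window store.log 20051129150000 20051129150500 would print the entries logged in the five minutes starting at 15:00.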

Note the users who might have been logged in and active at the time of the database corruption. This may require correlating data from the commandlog.txt or http.commands files with the http.log and store.log files. If the commandlog.txt or http.commands files were active, note the commands that may have been running at that time. Ask the calendar users who may have been active at the time of the corruption what they were doing and whether they noticed any problems with Calendar Server.

For example:

  • Were they syncing data from Outlook to the Calendar Server?

  • Were they editing an event in Calendar Express?

  • Were they viewing a subscribed calendar? If so, which calendar?

Once all this information is gathered, we can try to determine if a certain end-user usage pattern or some maintenance activity on the server might be causing the problem.

Please gather all the information you can and provide it to Sun. The more complete the information, the faster and easier it is to resolve the problem.

For a complete guide to gathering debugging information for Calendar Server, see Sun Gathering Debug Data for Sun Java System Calendar Server. You can download or view this book from the following location:

http://docs.sun.com/app/docs/coll/1656.1


Which Utilities to Use

After a database corruption occurs, you have to choose which recovery utility to run first. It can be hard to determine when to use db_verify, db_recover, csdb rebuild, or db_dump and db_load.

To save time, first try to recover the calendar database with the db_recover command and verify the individual database files with the db_verify command. On a 1GB database, the db_recover operation takes less than a minute.

If no errors are reported against any of the calendar database files, you may not have to rebuild any calendar databases with csdb rebuild.

If errors are reported, do a rebuild with csdb rebuild. On the same 1GB database, the csdb rebuild operation can take from 30 to 45 minutes.

If any errors are reported against the other databases, a csdb rebuild may or may not work, depending on how corrupted the database files are.

If the csdb rebuild does not work, you may have to take one of these actions:

  • Try to repair the corrupted database

  • Consider using the hotbackup copy or archive copy of the calendar database files

  • Use copies of csbackup data to restore the database
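
The first part of this decision can be sketched in shell: run db_recover, verify each database file, and report whether a csdb rebuild looks necessary. This is an illustration only. The function name choose_recovery_action and its two output strings are hypothetical, and the sketch treats any non-zero exit status from db_recover or db_verify as a reported error. Because db_recover changes the database files, run it only with the services stopped and after saving a copy of the csdb directory.

```shell
#!/bin/sh
# Sketch: recover, verify, then suggest the next step. It never runs csdb rebuild itself.
# The tool locations default to the directory named in this article and can be overridden.
DB_RECOVER="${DB_RECOVER:-/opt/SUNWics5/cal/tools/unsupported/bin/db_recover}"
DB_VERIFY="${DB_VERIFY:-/opt/SUNWics5/cal/tools/unsupported/bin/db_verify}"

# choose_recovery_action CSDB_DIR
# Prints "no-rebuild-needed" and returns 0 if every step succeeded.
# Prints "try-csdb-rebuild" and returns 1 if any step reported an error.
# Returns 2 if CSDB_DIR cannot be entered. Tool output goes to standard error.
choose_recovery_action() (
    cd "$1" || exit 2
    errors=0
    "$DB_RECOVER" -c >&2 || errors=1
    for f in ics50*.db; do
        [ -f "$f" ] || continue
        case "$f" in
            # The deletelog database is verified with the -o option.
            ics50deletelog.db) "$DB_VERIFY" -o "$f" >&2 || errors=1 ;;
            *)                 "$DB_VERIFY" "$f" >&2 || errors=1 ;;
        esac
    done
    if [ "$errors" -eq 0 ]; then
        echo "no-rebuild-needed"
    else
        echo "try-csdb-rebuild"
    fi
    exit "$errors"
)
```

The function body runs in a subshell, so the change of directory does not affect the calling shell.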


How to Repair a Corrupt Database

This section summarizes the corrective actions you can take to repair a corrupt database or minimize data loss and operational downtime.

For detailed instructions on how to recover a corrupted database, follow the steps in the Calendar Server documentation. See these sections in the Sun Java System Calendar Server 6.3 Administration Guide:

“16.2. Administering Databases with the csdb Utility” (http://docs.sun.com/app/docs/doc/819-4654/6n6prj59r?a=view)

“22.5 Dealing With Calendar Server Database Issues” (http://docs.sun.com/app/docs/doc/819-4654/6n6prj5d9?a=view)

Take these steps:

  1. Stop the Calendar Server.

  2. Copy the files from the /var/opt/SUNWics5/csdb directory to a safe location for a backup.

    Check that the __db.* files are removed from the “source” data directory, as lock issues may have occurred in these files.

  3. Recover and verify the database using the Berkeley DB utilities, db_recover and db_verify.

    (These utilities are located in the /opt/SUNWics5/cal/tools/unsupported/bin directory.)

    If these commands are successful, go to Step 8.

  4. If any errors are reported against any of the calendar database files, including the alarms database or the gse database, follow one of these two alternatives:

    • The first alternative includes two steps:

      • Try to repair the database at the database level by performing the db_dump and db_load commands. Or use an alternate backup database directory as the next best choice to minimize data loss and operational downtime.

      • If the db_dump and db_load commands are successful, try to repair the database at the calendar level by performing a csdb rebuild.

    • The second alternative is one step: simply use the csdb rebuild command to try to repair the database at the calendar level.

    You can rebuild the database with the csdb rebuild, csdb -g rebuild or csdb -a -V rebuild command.

    Performing a csdb rebuild will recover data. However, depending on how corrupted the database files are, only a limited amount of data may be recovered.

    If you also perform a db_dump and db_load, you may recover more data than by using csdb rebuild alone. However, depending on how corrupted the database files are, you may still only recover a limited amount of data.

  5. If the preceding steps are successful, copy these database files to a safe location.

  6. Perform a database-level check to make sure that the rebuilt database is operational. Run the db_verify command on each database file in turn.

    Note - For the deletelog database file, you must run db_verify with the -o option, for example, db_verify -o ics50deletelog.db.

  7. Perform a calendar-level check to make sure that the rebuilt database is operational:

    • Run the csdb check command.

      Note - You can also use csdb -v list to verify whether the database is operational.

    • Check that the rebuilt database is roughly the same size as the original.

    If the check succeeds, the database has been successfully rebuilt.

  8. Next, check basic calendar functionality on the rebuilt database before you bring the database back into production.

  9. If you could not successfully rebuild the database, use an alternate backup database directory (a hotbackup or archive copy) as the next best choice to minimize data loss and operational downtime.

Repeat the process described above on the individual hotbackup or archive copies of the database files, starting from the most recent copy.
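
Walking the backup copies from newest to oldest can be sketched as follows. The sketch assumes that each copy is a subdirectory whose name ends in a date, such as hotbackup_20051129 in the store.log sample, so that reverse name order is newest first, and that directory names contain no spaces. The function name first_good_backup is hypothetical, and the check command is whatever verification you choose to supply. Have it work on a copy of each directory, never on the backup itself.

```shell
#!/bin/sh
# Sketch: find the most recent backup copy that passes a caller-supplied check.

# first_good_backup BACKUP_ROOT CHECK_CMD
# CHECK_CMD is called with one directory path and must exit 0 if that copy is usable.
# Prints the first directory that passes and returns 0; returns 1 if none passes.
first_good_backup() {
    root="$1"
    check="$2"
    # Only names containing "_<digit>" are considered, which skips entries such as tmp.
    for d in $(cd "$root" && ls -1d *_[0-9]* 2>/dev/null | sort -r); do
        [ -d "$root/$d" ] || continue
        if "$check" "$root/$d" >/dev/null 2>&1; then
            echo "$root/$d"
            return 0
        fi
    done
    return 1
}
```

Passing the check as a parameter keeps the search order separate from the verification itself, so the same loop can be reused for hotbackup and archive directories.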

Note - If you recover data from the archive backup directories, additional work is required to recover data from the database transaction log files before you can restore the data. If you do use the archive backup directory, you must run the db_recover utility with the transaction log files to fully recover the data. Additionally, we recommend that you choose an archive with good database files for recovering data from the transaction log files. This option is a good choice if your most important objective is to minimize operational downtime.

Alternatively, if you do not want to follow the procedure described in this section, you can use copies of csbackup data to restore the database.


An Example: How to Recover a Corrupt Database

To perform the sample recovery:

  1. Stop the calendar services immediately.

    (Do not start the calendar services.)

  2. Copy the live database (csdb), including the *.db and log.* files, to a save directory.

    Name the save directory csdb_date, for example, csdb_dec20.

  3. Verify the individual database files in the live database.

    db_verify and db_recover are located in /opt/SUNWics5/cal/tools/unsupported/bin.

    % db_verify ics50alarms.db
    % db_verify ics50calprops.db
    % db_verify -o ics50deletelog.db
    % db_verify ics50events.db
    % db_verify ics50gse.db
    % db_verify ics50journals.db
    % db_verify ics50recurring.db
    % db_verify ics50todos.db

  4. Execute the following database recovery (db_recover) step. Perform this step whether or not the db_verify step reported any errors.

    Recover data in the live database:

    db_recover -c -h path to csdb

    or

    cd path to csdb
    db_recover -c

  5. Verify the individual database files in the live database again:

    % db_verify ics50alarms.db
    % db_verify ics50calprops.db
    % db_verify -o ics50deletelog.db
    % db_verify ics50events.db
    % db_verify ics50gse.db
    % db_verify ics50journals.db
    % db_verify ics50recurring.db
    % db_verify ics50todos.db

    These steps should work if the problem is detected early and the database is not badly damaged.

    If the db_recover or db_verify command does not report any problems, copy the recovered files to another save directory. For example, recovered_csdb_2006mar20.

    If the db_recover or db_verify command reports errors, choose a hotbackup or archive copy of the database and try to verify and recover those database files.

  6. If db_verify reports errors on all the hotbackup and archive directories, try using the csdb rebuild and/or db_dump and db_load commands.

    Run these commands on the live database, hotbackup databases, or archive databases until you find a database suitable for use.

    % csdb rebuild

    For instructions on how to use the db_dump and db_load commands, see “22.5.7. Using the Dump and Load Procedure to Recover a Calendar Database” in the Troubleshooting chapter of the Sun Java System Calendar Server 6.3 Administration Guide: http://docs.sun.com/app/docs/doc/819-4654/6n6prj5dp?a=view.

    Note - Copy the entire hotbackup or archive directory before you try to run db_verify or db_recover to verify or recover any data.

  7. Check how many calendars are in your database. You can do this quickly by issuing the following commands:

    (Issue this command while the calendar services are not running.)

    % cscal list > /tmp/cscal_out
    % wc -l /tmp/cscal_out

    Assume you know that you have about 12,200 calendars. If the output is drastically lower--for example, 8,000 calendars--examine your hotbackup and archive copies to see if you might have a better database copy to restore.

    For a more complete check, issue either of the following commands:

    % csdb check
    % csdb -v list

    You can perform the preceding commands on the live deployment machine or on an alternate staging or testing machine. These commands can take a long time. We recommend that Calendar Server not be running while the commands execute.

  8. Move and rename today's hotbackup and archive directories so that, when you start up calendar services, new hotbackup and archive directories are created.

  9. Start the calendar services.

    As Calendar Server starts, it creates new archive and hotbackup directories for today and copies the live database files into the archive and hotbackup directories.

  10. If none of the Calendar Server database files from the live database, hotbackup, and archive directories is recoverable, please send the following data to Sun. Be sure to send a copy of the data as it was before the data recovery was attempted:

    • csdb directory

    • Hotbackup directory

    • Hotbackup/tmp directory

    • Archive directory

    • All Calendar Server log files (including commandlog.txt) for the last two days.
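
The calendar count comparison in step 7 can also be scripted. The following sketch counts the lines in a saved cscal list output file and compares the count with the number of calendars you expect. Like the wc -l command in step 7, it assumes one calendar per output line. The function name calendar_count_ok and the default threshold of 90 percent are assumptions chosen for illustration, not values from the Calendar Server documentation.

```shell
#!/bin/sh
# Sketch: compare the number of listed calendars with the expected number.

# calendar_count_ok FILE EXPECTED [MIN_PERCENT]
# FILE is the saved output of "cscal list".
# Returns 0 if the line count is at least MIN_PERCENT (default 90) of EXPECTED.
calendar_count_ok() {
    file="$1"
    expected="$2"
    min_pct="${3:-90}"
    count=$(wc -l < "$file" | tr -d ' ')
    echo "calendars listed: $count (expected about $expected)"
    # Integer arithmetic: count/expected >= min_pct/100.
    [ $((count * 100)) -ge $((expected * min_pct)) ]
}
```

With the figures from step 7, calendar_count_ok /tmp/cscal_out 12200 fails for a file of 8,000 lines, which is the signal to examine the hotbackup and archive copies.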


For More Information

See the BigAdmin Communications Suite Hub for more information about Sun Java Communications Suite:

http://www.sun.com/bigadmin/hubs/comms


Unless otherwise licensed, code in all technical manuals herein (including articles, FAQs, samples) is provided under this License.

