ITSD Computing and Communications Services News
February, 2003
  Backing Up Databases Requires Extra Attention and Effort

If you work with or rely on databases, it's important to protect the information so it's accurate and available when needed. One way to do this is to regularly back up all the files on your computer, either through the Lab's centralized backup service or via a local system.

In the past, all databases had to be completely shut down to perform backups. While that's not always the case today, most databases require extra effort to ensure that the data are backed up properly.

Some databases, such as MySQL, PostgreSQL and Oracle, require that backups be performed either by shutting down the database so it is not active or by using built-in tools to take a snapshot of the database data. Some, such as Oracle, also allow backups to be performed on either a raw device (on which the data reside) or on a "mirror" (a duplicate copy of the data) that has been specifically set up to be captured when backups are performed.

A few databases (e.g., Berkeley DB, ASCII flat files or dBase/FoxPro) can be backed up while the data are still available via the database. These databases can be backed up like other files. However, if the database is actively being used during the backup period, not all of the changes made since the last backup will necessarily be caught.
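For file-based databases like these, the backup really can be a plain file copy. The sketch below (a Python illustration with hypothetical paths, not a supported ITSD tool) copies a database directory into a timestamped backup directory; the comment notes the caveat described above about changes made during the copy:

```python
import shutil
import time
from pathlib import Path

def backup_flat_files(db_dir: str, backup_root: str) -> Path:
    """Copy every file in db_dir into a timestamped backup directory.

    Caveat: if the database writes to these files while the copy runs,
    those changes may be missed or only partially captured; they will
    usually be picked up by the next backup.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / stamp
    shutil.copytree(db_dir, dest)   # plain file-level copy of the data
    return dest
```

For a database that is quiet (no active writers), this is a complete and consistent backup; for an active one, it is only a best-effort snapshot.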

In fact, any backup of any database may omit the most recent changes: a backup is really just a snapshot of the database at a particular point in time, cutting off further entry while the backup process occurs. Usually, changes made during one backup will be captured by the next backup.

Some databases now feature special tools to perform the backup from within the database itself, and this feature is becoming more common. Both MySQL and PostgreSQL provide tools that take a snapshot, write it to a file separate from the actual database, and then allow that file to be backed up. With MySQL, use mysqldump or mysqlhotcopy to take the snapshot backup; consult the documentation (including the LOCK TABLES and FLUSH TABLES statements) to ensure the most reliable backup. With PostgreSQL, the idea is to use the SQL-dump approach to generate a text file of SQL commands that, when fed back to the server, will recreate the database in the same state it was in at the time of the dump; PostgreSQL provides the utility program pg_dump for this purpose.
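The exact command lines belong to the MySQL and PostgreSQL documentation. As a small, runnable analogue of the SQL-dump idea, the sketch below uses SQLite through Python's standard sqlite3 module (not one of the databases named above): iterdump() produces a stream of SQL statements that, fed back to a fresh database, recreate it in the same state, just as described for pg_dump:

```python
import sqlite3

def sql_dump(db_path: str, dump_path: str) -> None:
    """Write a SQL-text dump of the database: a snapshot in a file
    separate from the actual database, which can then be backed up."""
    conn = sqlite3.connect(db_path)
    try:
        with open(dump_path, "w") as out:
            for statement in conn.iterdump():
                out.write(statement + "\n")
    finally:
        conn.close()

def restore(dump_path: str, new_db_path: str) -> None:
    """Recreate the database by feeding the dumped SQL to a new database."""
    conn = sqlite3.connect(new_db_path)
    try:
        with open(dump_path) as f:
            conn.executescript(f.read())
        conn.commit()
    finally:
        conn.close()
```

Because the dump is an ordinary text file, it can be handed to the regular file backup system without any special handling.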

In summary, for those databases that don't include backup utilities, the idea is to keep two copies of the database written at the same time, stop the database momentarily to split one copy off from activity, and back up that copy. Afterwards, the user needs to bring the two copies back into synchronization and then reactivate the two-copy system until the next backup. Different backup systems also offer options to back up either the raw device itself (an example in UNIX is /dev/rdsk/*) or the partition and its entire structure.
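As a toy illustration of that order of operations, the sketch below simulates a split-mirror backup using plain files. This is only an analogy: in a real deployment the mirror is maintained by the volume manager or mirroring software, and "stopping the database" means quiescing writes, not calling a Python function:

```python
import shutil

def split_mirror_backup(primary: str, mirror: str, backup_dest: str) -> None:
    """Simulate the split-mirror backup sequence with plain files."""
    # 1. Momentarily stop database activity so primary and mirror are
    #    consistent (simulated here by no writes occurring during this call).
    # 2. Split the mirror off: it is now a frozen, consistent copy,
    #    and the primary can resume activity immediately.
    # 3. Back up the frozen copy at leisure.
    shutil.copy2(mirror, backup_dest)
    # 4. Resynchronize the mirror from the primary and resume the
    #    two-copy system until the next backup.
    shutil.copy2(primary, mirror)
```

The point of the sequence is that the database is unavailable only for the brief split in steps 1-2, not for the full duration of the backup in step 3.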

Whichever approach is used, pay special attention to ensure that the backups are working correctly. With the proliferation of free and cheap databases, users and administrators may not be as mindful of the need to protect the information through backups. If a database is used in any production manner, whoever performs the backups should be informed about the database so that proper actions can be taken to ensure it is backed up correctly.

For any backup-related service needs, please contact the Backup Services staff via email at Backups@lbl.gov or by phone at xBKUP (x2587). Please check their website for extensive information about backups at LBNL.