KnowledgeForum: transaction recovery in Oracle

Instance and Crash Recovery

Crash recovery is used to recover from a failure either when a single-instance database crashes or all instances of an Oracle Real Application Clusters database crashes. Instance recovery refers to the case where a surviving instance recovers a failed instance in an Oracle Real Application Clusters database.

The goal of crash and instance recovery is to restore the data block changes located in the cache of the dead instance and to close the redo thread that was left open. Instance and crash recovery use only online redo log files and current online datafiles. Oracle recovers the redo threads of the dead instances together.

Crash and instance recovery have the following shared characteristics:

Redo the changes using the current online datafiles (as left on disk after the crash or SHUTDOWN ABORT)
Use only the online redo logs and never require the use of the archived logs
Have a recovery time governed by the number of dead instances, amount of redo generated in each dead redo thread since the last checkpoint, and by user-configurable factors such as the number and size of redo log files, checkpoint frequency, and the parallel recovery setting

Oracle performs this recovery automatically on two occasions:

At the first database open after the crash of a single-instance database or all instances of an Oracle Real Applications Cluster database (crash recovery).
When some but not all instances of an Oracle Real Application Clusters configuration fail (instance recovery). The recovery is performed automatically by a surviving instance in the configuration.

The important point is that in both crash and instance recovery Oracle applies the redo automatically: no user intervention is required to supply redo logs. However, you can set parameters in the database server that can tune the duration of instance and crash recovery performance. Also, you can tune the rolling forward and rolling back phases of instance recovery separately. Finally, you can tune checkpointing so that recovery time is optimized.

Media Recovery

Media recovery is divided into the following types:

Datafile media recovery
Block media recovery

Typically, the term "media recovery" refers to recovery of datafiles. Block media recovery is a more specialized operation that you can only perform with RMAN.

Datafile Media Recovery

Datafile media recovery is used to recover from a lost or damaged current datafile or control file. It is also used to recover changes that were lost when a tablespace went offline without the OFFLINE NORMAL option. Datafile media recovery and instance recovery have in common the requirement to repair database integrity. However, these types of recovery differ with respect to their additional features. Media recovery has the following characteristics:

Applies needed changes using restored backups of damaged datafiles.
Can use archived logs as well as the online logs.
Requires explicit invocation by a user.
Does not detect media failure (that is, the need to restore a backup) automatically. After a backup has been restored, however, detection of the need to recover it through media recovery is automatic.
Has a recovery time governed solely by user policy (for example, frequency of backups, parallel recovery parameters) rather than by Oracle internal mechanisms.

The database cannot be opened if any of the online datafiles needs media recovery, nor can a datafile that needs media recovery be brought online until media recovery has been executed. The following scenarios necessitate media recovery:

You restore a backup of a datafile.
You restore a backup control file (even if all datafiles are current).
A datafile is taken offline (either by you or automatically by Oracle) without the OFFLINE NORMAL option.

Unless the database is not open by any instance, datafile media recovery can only operate on offline datafiles. You can initiate datafile media recovery before opening a database even when crash recovery would have sufficed. If so, crash recovery still runs automatically at database open.

Note that when a file requires media recovery, you must perform media recovery even if all necessary changes are contained in the online logs. In other words, you must still run recovery even though the archived logs are not needed. Media recovery may find nothing to do -- and signal the "no recovery required" error -- if invoked for files that do not need recovery.

Block Media Recovery

Block media recovery is a technique for restoring and recovering individual data blocks while all database files remain online and available. If corruption is limited to only a few blocks among a subset of database files, then block media recovery may be preferable to datafile recovery.

The interface to block media recovery is provided by RMAN. If you do not already use RMAN as your principal backup and recovery solution, then you can still perform block media recovery by cataloging into the RMAN repository the necessary user-managed datafile and archived redo log backups.

Redo Application During Recovery

Media recovery proceeds through the application of redo data to the datafiles. Whenever a change is made to a datafile, the change is first recorded in the online redo logs. Media recovery selectively applies the changes recorded in the online and archived redo logs to the restored datafile to roll it forward.

This section contains these topics:

About Redo Application
Cache Recovery
Transaction Recovery

About Redo Application

Database buffers in the buffer cache in the SGA are written to disk only when necessary, using a least-recently-used (LRU) algorithm. Because of the way that the database writer process uses this algorithm to write database buffers to datafiles, datafiles may contain some data blocks modified by uncommitted transactions and some data blocks missing changes from committed transactions.

Two potential problems can result if an instance failure occurs:

Data blocks modified by a transaction might not be written to the datafiles at commit time and may only appear in the redo log. Therefore, the redo log contains changes that must be reapplied to the database during recovery.
After the roll forward phase, the datafiles may contain changes that had not been committed at the time of the failure. These uncommitted changes must be rolled back to ensure transactional consistency. These changes were either saved to the datafiles before the failure, or introduced during the roll forward phase.

To solve this dilemma, two separate steps are generally used by Oracle for a successful recovery of a system failure: rolling forward with the redo log (cache recovery) and rolling back with the rollback or undo segments (transaction recovery).

Cache Recovery

The online redo log is a set of operating system files that record all changes made to any database buffer, including data, index, and rollback segments, whether the changes are committed or uncommitted. All changes to Oracle blocks are recorded in the online log.

The first step of recovery from an instance or disk failure is called cache recovery or rolling forward, and involves reapplying all of the changes recorded in the redo log to the datafiles. Because rollback data is also recorded in the redo log, rolling forward also regenerates the corresponding rollback segments

Rolling forward proceeds through as many redo log files as necessary to bring the database forward in time. Rolling forward usually includes online redo log files (instance recovery or media recovery) and may include archived redo log files (media recovery only).

After rolling forward, the data blocks contain all committed changes. They may also contain uncommitted changes that were either saved to the datafiles before the failure, or were recorded in the redo log and introduced during cache recovery.

Transaction Recovery

You can run Oracle in either manual undo management mode or automatic undo management mode. In manual mode, you must create and manage rollback segments to record the before-image of changes to the database. In automatic undo management mode, you create one or more undo tablespaces. These undo tablespaces contain undo segments similar to traditional rollback segments. The main difference is that Oracle manages the undo for you.

Undo blocks (whether in rollback segments or automatic undo tablespaces) record database actions that should be undone during certain database operations. In database recovery, the undo blocks roll back the effects of uncommitted transactions previously applied by the rolling forward phase.

After the roll forward, any changes that were not committed must be undone. Oracle applies undo blocks to roll back uncommitted changes in data blocks that were either written before the crash or introduced by redo application during cache recovery. This process is called rolling back or transaction recovery.

Oracle can roll back multiple transactions simultaneously as needed. All transactions systemwide that were active at the time of failure are marked as dead. Instead of waiting for SMON to roll back dead transactions, new transactions can recover blocking transactions themselves to get the row locks they need.

Complete and Incomplete Media Recovery

Media recovery updates a backup to either to the current or to a specified noncurrent time. When performing media recovery, you can recover the whole database, a tablespace, or a datafile. In any case, you always use a restored backup to perform the recovery.

This section contains the follow topics:

Complete Recovery
Incomplete Recovery

Complete Recovery

Complete recovery involves using redo data or incremental backups combined with a backup of a database, tablespace, or datafile to update it to the most current point in time. It is called complete because Oracle applies all of the redo changes contained in the archived and online logs to the backup. Typically, you perform complete media recovery after a media failure damages datafiles or the control file.

You can perform complete recovery on a database, tablespace, or datafile. If you are performing complete recovery on the whole database, then whether you are using RMAN or SQL*Plus you must:

Mount the database
Ensure that all datafiles you want to recover are online
Restore a backup of the whole database or the files you want to recover
Apply online or archived redo logs, or a combination of the two

If you are performing complete recovery on a tablespace or datafile, then you must:

Take the tablespace or datafile to be recovered offline if the database is open
Restore a backup of the datafiles you want to recover
Apply online or archived redo logs, or a combination of the two

Incomplete Recovery

Incomplete recovery uses a backup to produce a noncurrent version of the database. In other words, you do not apply all of the redo records generated after the most recent backup. You usually perform incomplete recovery of the whole database in the following situations:

Media failure destroys some or all of the online redo logs.
A user error causes data loss, for example, a user inadvertently drops a table.
You cannot perform complete recovery because an archived redo log is missing.
You lose your current control file and must use a backup control file to open the database.

To perform incomplete media recovery, you must restore all datafiles from backups created prior to the time to which you want to recover and then open the database with the RESETLOGS option when recovery completes. The RESETLOGS operation creates a new incarnation of the database--in other words, a database with a new stream of log sequence numbers starting with log sequence 1.

Tablespace Point-in-Time Recovery

The tablespace point-in-time recovery (TSPITR) feature enables you to recover one or more tablespaces to a point-in-time that is different from the rest of the database. TSPITR is most useful when you want to:

Recover from an erroneous drop or truncate table operation
Recover a table that has become logically corrupted
Recover from an incorrect batch job or other DML statement that has affected only a subset of the database
Recover one independent schema to a point different from the rest of a physical database (in cases where there are multiple independent schemas in separate tablespaces of one physical database)
Recover a tablespace on a very large database (VLDB) rather than restore the whole database from a backup and perform a complete database roll-forward

Media Recovery Options

Because you are not completely recovering the database to the most current time, you must tell Oracle when to terminate recovery. You can perform the following types of media recovery.

Type of Recovery	Function
Time-based recovery	Recovers the data up to a specified point in time.
Cancel-based recovery	Recovers until you issue the CANCEL statement (not available when using Recovery Manager).
Change-based recovery	Recovers until the specified SCN.
Log sequence recovery	Recovers until the specified log sequence number (only available when using Recovery Manager).

RMAN and User-Managed Restore and Recovery

You have a choice between two basic methods for recovering physical files. You can:

Use the RMAN utility to restore and recover the database
Restore backups by means of operating system utilities, and then recover by executing the SQL*Plus RECOVER command

Whichever method you choose, you can recover a database, tablespace, or datafile. Before performing media recovery, you need to determine which datafiles to recover. Often you can use the fixed view V$RECOVER_FILE. This view lists all files that require recovery and explains the error that necessitates recovery.

RMAN Restore and Recovery

The basic RMAN recovery commands are RESTORE and RECOVER. Use RESTORE to restore datafiles from backup sets or from image copies on disk, either to their current location or to a new location. You can also restore backup sets containing archived redo logs. Use the RMAN RECOVER command to perform media recovery and apply archived logs or incremental backups.

RMAN automates the procedure for recovering and restoring your backups and copies. For example, run the following commands from within RMAN to restore and recover the database to its current time:

SHUTDOWN IMMEDIATE; # shuts down database

STARTUP MOUNT; # starts and mounts database

RESTORE DATABASE; # restores all datafiles

RECOVER DATABASE; # recovers database using all available redo

ALTER DATABASE OPEN; # reopens the database

User-Managed Restore and Recovery

If you do not use RMAN, then you can restore backups with operating system utilities and then run the SQL*Plus RECOVER command to recover the database. You should follow these basic steps:

After identifying which files are damaged, place the database in the appropriate state for restore and recovery. For example, if some but not all datafiles are damaged, then take the affected tablespaces offline while the database is open.

Restore the files with an operating system utility. If you do not have a backup, it is sometimes possible to perform recovery if you have the necessary redo logs dating from the time when the datafiles were first created and the control file contains the name of the damaged file.

If you cannot restore a datafile to its original location, then relocate the restored datafile and change the location in the control file.

Restore any necessary archived redo log files.

Use the SQL*Plus RECOVER command to recover the datafile backups.

For example, assume that you lose the /oracle/dbs/users1.dbf datafile, which is contained in the users tablespace, to a media failure. Also, assume that you have a backup called /dsk2/backup/users1.dbf on a separate disk drive. You discover that the datafile is missing because a query returns an error saying that the file is missing.

Your first step is to take the users tablespace offline. For example, you run this SQL statement:

SQL> ALTER TABLESPACE users OFFLINE IMMEDIATE;

Then, you restore the backup of users1.dbf using an operating system utility. For example, you run this UNIX command:

% cp /dsk2/backup/users1.dbf /oracle/dbs/users1.dbf

Assuming that you have all necessary archived redo logs, you can recover the datafile with the following SQL*Plus command:

SQL> RECOVER AUTOMATIC DATAFILE '/oracle/dbs/users1.dbf';

Finally, bring the tablespace online as follows:

SQL> ALTER TABLESPACE users ONLINE;

KnowledgeForum

Monday, 16 July 2012

transaction recovery in Oracle

No comments:

Post a Comment