Instance and Crash Recovery
Crash
recovery is used to recover from a failure either when a single-instance
database crashes or all instances of an Oracle Real Application Clusters
database crashes. Instance recovery refers to the case where a surviving
instance recovers a failed instance in an Oracle Real Application Clusters
database.
The
goal of crash and instance recovery is to restore the data block changes
located in the cache of the dead instance and to close the redo thread that was
left open. Instance and crash recovery use only online redo log files and
current online datafiles. Oracle recovers the redo
threads of the dead instances together.
- Redo the changes using the
current online datafiles (as left on disk after the crash or SHUTDOWN ABORT)
- Use only the online redo logs
and never require the use of the archived logs
- Have a recovery time governed
by the number of dead instances, amount of redo generated in each dead
redo thread since the last checkpoint, and by user-configurable factors
such as the number and size of redo log files, checkpoint frequency, and
the parallel recovery setting
- At the first database open
after the crash of a single-instance database or all instances of an
Oracle Real Applications Cluster database (crash recovery).
- When some but not all instances
of an Oracle Real Application Clusters configuration fail (instance
recovery). The recovery is performed automatically by a surviving instance
in the configuration.
The
important point is that in both crash and instance recovery Oracle applies the
redo automatically: no user intervention is required to supply redo logs.
However, you can set parameters in the database server that can tune the
duration of instance and crash recovery performance. Also, you can tune the
rolling forward and rolling back phases of instance recovery separately.
Finally, you can tune checkpointing so that recovery time is optimized.
Typically,
the term "media recovery" refers to recovery of datafiles. Block
media recovery is a more specialized operation that you can only perform with
RMAN.
Datafile
media recovery is used to recover from a lost or damaged current datafile or
control file. It is also used to recover changes that were lost when a
tablespace went offline without the OFFLINE NORMAL
option. Datafile media recovery and instance recovery have in common the requirement
to repair database integrity. However, these types of recovery differ with
respect to their additional features. Media recovery has the following
characteristics:
- Applies needed changes using
restored backups of damaged datafiles.
- Can use archived logs as well
as the online logs.
- Requires explicit invocation by
a user.
- Does not detect media failure
(that is, the need to restore a backup) automatically. After a backup has
been restored, however, detection of the need to recover it through media
recovery is automatic.
- Has a recovery time governed
solely by user policy (for example, frequency of backups, parallel
recovery parameters) rather than by Oracle internal mechanisms.
The
database cannot be opened if any of the online datafiles needs media recovery,
nor can a datafile that needs media recovery be brought online until media
recovery has been executed. The following scenarios necessitate media recovery:
- You restore a backup of a
datafile.
- You restore a backup control
file (even if all datafiles are current).
- A datafile is taken offline
(either by you or automatically by Oracle) without the OFFLINE NORMAL option.
Unless
the database is not open by any instance, datafile media recovery can only
operate on offline datafiles. You can initiate datafile media recovery before
opening a database even when crash recovery would have sufficed. If so, crash
recovery still runs automatically at database open.
Note
that when a file requires media recovery, you must perform media
recovery even if all necessary changes are contained in the online logs. In
other words, you must still run recovery even though the archived logs are not
needed. Media recovery may find nothing to do -- and signal the "no
recovery required" error -- if invoked for files that do not need
recovery.
Block
media recovery is a technique for restoring and recovering individual data
blocks while all database files remain online and available. If corruption is
limited to only a few blocks among a subset of database files, then block media
recovery may be preferable to datafile recovery.
The
interface to block media recovery is provided by RMAN. If you do not already
use RMAN as your principal backup and recovery solution, then you can still
perform block media recovery by cataloging into the RMAN repository the
necessary user-managed datafile and archived redo log backups.
Media
recovery proceeds through the application of redo data to the datafiles.
Whenever a change is made to a datafile, the change is first recorded in the
online redo logs. Media recovery selectively applies the changes recorded in
the online and archived redo logs to the restored datafile to roll it forward.
Database
buffers in the buffer cache in the SGA are written to disk only when necessary,
using a least-recently-used (LRU) algorithm. Because of the way that the
database writer process uses this algorithm to write database buffers to
datafiles, datafiles may contain some data blocks modified by uncommitted
transactions and some data blocks missing changes from committed transactions.
- Data blocks modified by a
transaction might not be written to the datafiles at commit time and may
only appear in the redo log. Therefore, the redo log contains changes that
must be reapplied to the database during recovery.
- After the roll forward phase,
the datafiles may contain changes that had not been committed at the time
of the failure. These uncommitted changes must be rolled back to ensure
transactional consistency. These changes were either saved to the
datafiles before the failure, or introduced during the roll forward phase.
To
solve this dilemma, two separate steps are generally used by Oracle for a
successful recovery of a system failure: rolling forward with the redo log
(cache recovery) and rolling back with the rollback or undo segments
(transaction recovery).
The
online redo log is a set of
operating system files that record all changes made to any database buffer,
including data, index, and rollback segments, whether the changes are
committed or uncommitted. All changes to Oracle blocks are recorded in the
online log.
The
first step of recovery from an instance or disk failure is called cache recovery or rolling forward, and involves reapplying all
of the changes recorded in the redo log to the datafiles. Because rollback data
is also recorded in the redo log, rolling forward also regenerates the
corresponding rollback segments
Rolling
forward proceeds through as many redo log files as necessary to bring the
database forward in time. Rolling forward usually includes online redo log
files (instance recovery or media recovery) and may include archived redo log
files (media recovery only).
After
rolling forward, the data blocks contain all committed changes. They may also
contain uncommitted changes that were either saved to the datafiles before the
failure, or were recorded in the redo log and introduced during cache recovery.
You
can run Oracle in either manual undo management
mode or automatic undo management
mode. In manual mode, you must create and manage rollback segments to record the before-image
of changes to the database. In automatic undo management mode, you create one
or more undo tablespaces. These undo tablespaces contain undo segments similar
to traditional rollback segments. The main difference is that Oracle manages
the undo for you.
Undo
blocks (whether in rollback segments or automatic undo tablespaces) record
database actions that should be undone during certain database operations. In
database recovery, the undo blocks roll back the effects of uncommitted
transactions previously applied by the rolling forward phase.
After
the roll forward, any changes that were not committed must be undone. Oracle
applies undo blocks to roll back uncommitted changes in data blocks that were
either written before the crash or introduced by redo application during cache
recovery. This process is called rolling back
or transaction recovery.
Oracle can roll back multiple
transactions simultaneously as needed. All transactions systemwide that were
active at the time of failure are marked as dead. Instead of waiting for SMON
to roll back dead transactions, new transactions can recover blocking
transactions themselves to get the row locks they need.
Media
recovery updates a backup to either to the current or to a specified noncurrent
time. When performing media recovery, you can recover the whole database, a
tablespace, or a datafile. In any case, you always use a restored backup to
perform the recovery.
Complete
recovery involves using redo data or incremental backups combined with a backup
of a database, tablespace, or datafile to update it to the most current point
in time. It is called complete because Oracle applies all of the
redo changes contained in the archived and online logs to the backup.
Typically, you perform complete media recovery after a media failure damages
datafiles or the control file.
You
can perform complete recovery on a database, tablespace, or datafile. If you
are performing complete recovery on the whole database, then whether you are
using RMAN or SQL*Plus you must:
- Mount the database
- Ensure that all datafiles you
want to recover are online
- Restore a backup of the whole
database or the files you want to recover
- Apply online or archived redo
logs, or a combination of the two
- Take the tablespace or datafile
to be recovered offline if the database is open
- Restore a backup of the
datafiles you want to recover
- Apply online or archived redo
logs, or a combination of the two
Incomplete
recovery uses a backup to produce a noncurrent version of the database. In
other words, you do not apply all of the redo records generated after the most
recent backup. You usually perform incomplete recovery of the whole database in
the following situations:
- Media failure destroys some or
all of the online redo logs.
- A user error causes data loss,
for example, a user inadvertently drops a table.
- You cannot perform complete
recovery because an archived redo log is missing.
- You lose your current control
file and must use a backup control file to open the database.
To
perform incomplete media recovery, you must restore all datafiles from backups
created prior to the time to which you want to recover and then open the
database with the RESETLOGS option when recovery completes. The RESETLOGS operation creates a new incarnation
of the database--in other words, a database with a new stream of log sequence
numbers starting with log sequence 1.
The
tablespace point-in-time recovery (TSPITR) feature enables you to recover one
or more tablespaces to a point-in-time that is different from the rest of the
database. TSPITR is most useful when you want to:
- Recover from an erroneous drop or
truncate table operation
- Recover a table that has become
logically corrupted
- Recover from an incorrect batch
job or other DML statement that has affected only a subset of the database
- Recover one independent schema
to a point different from the rest of a physical database (in cases where
there are multiple independent schemas in separate tablespaces of one
physical database)
- Recover a tablespace on a very
large database (VLDB) rather than restore the whole database from a backup
and perform a complete database roll-forward
Because
you are not completely recovering the database to the most current time, you
must tell Oracle when to terminate recovery. You can perform the following
types of media recovery.
- Use the RMAN utility to restore
and recover the database
- Restore backups by means of
operating system utilities, and then recover by executing the SQL*Plus RECOVER command
Whichever
method you choose, you can recover a database, tablespace, or datafile. Before
performing media recovery, you need to determine which datafiles to recover.
Often you can use the fixed view V$RECOVER_FILE.
This view lists all files that require recovery and explains the error that
necessitates recovery.
The
basic RMAN recovery commands are RESTORE and RECOVER. Use RESTORE to restore datafiles from backup sets or from image copies
on disk, either to their current location or to a new location. You can also
restore backup sets containing archived redo logs. Use the RMAN RECOVER command to perform media recovery and apply archived logs
or incremental backups.
RMAN
automates the procedure for recovering and restoring your backups and copies.
For example, run the following commands from within RMAN to restore and recover
the database to its current time:
If
you do not use RMAN, then you can restore backups with operating system
utilities and then run the SQL*Plus RECOVER command
to recover the database. You should follow these basic steps:
- After identifying which files
are damaged, place the database in the appropriate state for restore and
recovery. For example, if some but not all datafiles are damaged, then
take the affected tablespaces offline while the database is open.
- Restore the files with an
operating system utility. If you do not have a backup, it is sometimes
possible to perform recovery if you have the necessary redo logs dating
from the time when the datafiles were first created and the control file
contains the name of the damaged file.
If you cannot restore a datafile to its original location,
then relocate the restored datafile and change the location in the control
file.
For
example, assume that you lose the /oracle/dbs/users1.dbf
datafile, which is contained in the users tablespace, to a media failure.
Also, assume that you have a backup called /dsk2/backup/users1.dbf
on a separate disk drive. You discover that the datafile is missing because a
query returns an error saying that the file is missing.
Then,
you restore the backup of users1.dbf using an operating system utility. For example, you run
this UNIX command:
Assuming
that you have all necessary archived redo logs, you can recover the datafile
with the following SQL*Plus command:
No comments:
Post a Comment