Date: 2014-05-16

Oracle 10g RAC node reboots without recording any useful log information: problem diagnosis

from:
Oracle® Database Release Notes
10g Release 2 (10.2) for Linux x86-64
B15666-19
____________________________________________________________________
6.15 Configuring Oracle Clusterware Process Monitor Daemon

The 10.2.0.4 patch release for Oracle Clusterware on Linux includes
the Oracle Clusterware Process Monitor Daemon (oprocd). It is started
automatically by Oracle Clusterware to detect system hangs. When it
detects a system hang, it restarts the hung node.

Review the following configuration information if you have installed
the 10.2.0.4 patch set.

Oracle has found wide variations in scheduling latencies observed
across operating systems and versions of operating systems. Because
of these scheduling latencies, the default values for oprocd can be
overly sensitive, particularly under heavy system load, resulting in
unnecessary oprocd-initiated restarts (false restarts).
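
To make the idea of a "false restart" concrete, here is a minimal sketch of how a sleep-and-check monitor can mistake scheduling latency for a hang. It is an illustration only and not oprocd's actual implementation. The function names (overshoot_ms, verdict) and all numbers are hypothetical and do not come from the release notes.

```shell
#!/bin/sh
# Illustrative sketch only: NOT oprocd's actual implementation.
# A monitor sleeps for a fixed interval, then checks how late it woke up.
# If the overshoot exceeds a margin, it treats the node as hung.

# overshoot_ms INTERVAL_MS ELAPSED_MS
# Prints how many milliseconds later than planned the wakeup happened.
overshoot_ms() {
    interval_ms=$1
    elapsed_ms=$2
    late=$((elapsed_ms - interval_ms))
    if [ "$late" -lt 0 ]; then
        late=0
    fi
    echo "$late"
}

# verdict INTERVAL_MS ELAPSED_MS MARGIN_MS
# Prints "reboot" if the wakeup was later than the margin allows, else "ok".
verdict() {
    late=$(overshoot_ms "$1" "$2")
    if [ "$late" -gt "$3" ]; then
        echo "reboot"
    else
        echo "ok"
    fi
}

# Hypothetical numbers: a 1000 ms interval and a wakeup after 1800 ms,
# that is, 800 ms of scheduling latency on a busy but otherwise healthy node.
verdict 1000 1800 500     # tight margin
verdict 1000 1800 10000   # generous margin
```

With the same 800 ms delay, the tight margin yields a restart and the generous margin does not, which is why a more tolerant setting reduces false restarts under heavy load.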

Oracle recommends that you address scheduling latencies with your
operating system vendor to reduce or eliminate them as much as
possible, as they can cause other problems.

To overcome these scheduling latencies, Oracle recommends that you
set the Oracle Clusterware parameter diagwait to the value 13. This
setting increases the time for failed nodes to flush final trace
files, which helps to debug the cause of a node failure. You must
shut down the cluster to change the diagwait setting. However, if you
prefer, you can use the default timing threshold for diagwait. In
that case, you do not need to perform the procedure documented here.

If you require more aggressive failover times to meet more stringent
service level requirements, then you should open a service request
with Oracle Support to receive advice about how to tune for lower
failover settings.

Note:
Changing the diagwait parameter requires a clusterwide shutdown. Oracle recommends that you change the diagwait setting either immediately after the initial installation, or during a scheduled outage.

Log in as root, and run the following command on all nodes, where
CRS_home is the home directory of the Oracle Clusterware
installation:

# CRS_home/bin/crsctl stop crs

Enter the following command, where CRS_home is the Oracle Clusterware
home:

# CRS_home/bin/oprocd stop

Repeat this command on all nodes.

From one node of the cluster, change the value of the diagwait
parameter to 13 seconds by issuing the following command as root:

# CRS_home/bin/crsctl set css diagwait 13 -force

Restart the Oracle Clusterware by running the following command on
all nodes:

# CRS_home/bin/crsctl start crs

Run the following command to ensure that Oracle Clusterware is
functioning properly:

# CRS_home/bin/crsctl check crs
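
The procedure above can be laid out as a dry-run checklist. The sketch below only prints the commands in order, tagged with the node they belong to, and executes nothing. The default CRS_HOME path, the node names rac1 and rac2, and the helper names on_nodes and print_plan are hypothetical. Running the final check on every node is also an assumption, since the release notes do not say where to run it.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands from the procedure in order.
# It does not execute anything.
# The default below is a hypothetical path; set CRS_HOME to your CRS_home.
CRS_HOME="${CRS_HOME:-/u01/app/oracle/product/10.2.0/crs}"

# on_nodes "COMMAND" NODE...
# Prints COMMAND (relative to CRS_HOME/bin) once per node, with a root prompt.
on_nodes() {
    cmd=$1
    shift
    for node in "$@"; do
        echo "[$node] # $CRS_HOME/bin/$cmd"
    done
}

# print_plan NODE...
# The first node listed is the one used for the diagwait change.
print_plan() {
    on_nodes "crsctl stop crs" "$@"     # stop Clusterware on all nodes
    on_nodes "oprocd stop" "$@"         # stop oprocd on all nodes
    # Change the parameter from ONE node only, after every node is stopped.
    echo "[$1] # $CRS_HOME/bin/crsctl set css diagwait 13 -force"
    on_nodes "crsctl start crs" "$@"    # restart Clusterware on all nodes
    on_nodes "crsctl check crs" "$@"    # assumption: verify on every node
}

# Hypothetical node names.
print_plan rac1 rac2
```

Printing the plan first makes it easy to confirm that the diagwait change appears exactly once and only after both stop steps have been listed for every node.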


Explanation from IBM:
Server running AIX with Oracle RAC reboots itself


Technote (troubleshooting)


Problem (Abstract)
Server running AIX with Oracle RAC reboots itself with no warning

Symptom
AIX server shuts down and/or reboots.

A REBOOT_ID is logged in /var/adm/ras/errlog indicating "SYSTEM
SHUTDOWN BY USER" although no shutdown or reboot command was issued
by any user.

Example error message:

LABEL: REBOOT_ID
IDENTIFIER: 2BFA76F6

Date/Time: Wed Dec 3 08:19:09 2008