How to Troubleshoot Grid Infrastructure Startup Issues [ID 1050908.1]
Modified: 25-JUN-2010 | Type: HOWTO | Status: PUBLISHED
In this Document
Goal
Solution
Start up sequence:
Cluster status
Case 1: OHASD.BIN does not start
Case 2: OHASD Agents do not start
Case 3: CSSD.BIN does not start
Case 4: CRSD.BIN does not start
Case 5: GPNPD.BIN does not start
Case 6: Various other daemons do not start
Case 7: CRSD Agents do not start
Network and Naming Resolution Verification
Log File Location, Ownership and Permission
Network Socket File Location, Ownership and Permission
Diagnostic file collection
References
Applies to:
Oracle Server - Enterprise Edition - Version: 11.2.0.1 and later [Release: 11.2 and later]
Information in this document applies to any platform.
Goal
The goal of this note is to provide a reference for troubleshooting 11gR2 Grid Infrastructure clusterware startup issues. It applies to issues in both new environments (during root.sh or rootupgrade.sh) and unhealthy existing environments. To look specifically at root.sh issues, see Note 1053970.1 for more information.
Solution
Start up sequence:
In a nutshell, the operating system starts ohasd, ohasd starts agents to start up daemons (gipcd, mdnsd, gpnpd, ctssd, ocssd, crsd, evmd, asm, etc.), and crsd starts agents that start user resources (database, SCAN, listener, etc.).
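As a rough first check of how far the startup sequence got, the following minimal sketch reads `ps -ef` output and lists which daemon binaries are not running. The function name `missing_daemons` and the binary names in `GI_DAEMONS` are assumptions (typical 11gR2 names, with ctssd assumed to run as octssd.bin); verify them against $GRID_HOME/bin on your own system. ASM runs as an instance rather than a .bin daemon, so it is not checked here.

```sh
#!/bin/sh
# Hypothetical list of 11gR2 clusterware daemon binaries (assumed names;
# check $GRID_HOME/bin on your system). ctssd is assumed to run as octssd.bin.
GI_DAEMONS="ohasd.bin gipcd.bin mdnsd.bin gpnpd.bin octssd.bin ocssd.bin crsd.bin evmd.bin"

# Hypothetical helper: read `ps -ef` output on stdin and print, one per line,
# each daemon binary from GI_DAEMONS that does not appear in it.
missing_daemons() {
    ps_out=$(cat)
    for d in $GI_DAEMONS; do
        # -F: match the binary name as a fixed string, not a regex
        printf '%s\n' "$ps_out" | grep -qF "$d" || echo "$d"
    done
}

# Usage: ps -ef | missing_daemons
```

Since ohasd starts the other daemons, if ohasd.bin itself is reported missing, start with Case 1.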
For the detailed Grid Infrastructure clusterware startup sequence, please refer to Note 1053147.1.
Cluster status
To find out cluster and daemon status:
$GRID_HOME/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
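On a healthy node every CRS-nnnn line of the `crsctl check crs` output ends in "is online". A minimal sketch that screens that output, assuming the one-message-per-line format shown above; `crs_not_online` is a hypothetical helper name, not part of the clusterware.

```sh
#!/bin/sh
# Hypothetical helper: read `crsctl check crs` output on stdin and print any
# CRS-nnnn line that does not end in "is online"; exit status 1 if any found.
crs_not_online() {
    awk '/^CRS-[0-9]+:/ && !/is online *$/ { print; bad = 1 } END { exit bad }'
}

# Usage: $GRID_HOME/bin/crsctl check crs | crs_not_online
```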
$GRID_HOME/bin/crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       rac1                     Started
ora.crsd
      1        ONLINE  ONLINE       rac1
ora.cssd
      1        ONLINE  ONLINE       rac1
ora.cssdmonitor
      1        ONLINE  ONLINE       rac1
ora.ctssd
      1        ONLINE  ONLINE       rac1                     OBSERVER
ora.diskmon
      1        ONLINE  ONLINE       rac1
ora.drivers.acfs
      1        ONLINE  ONLINE       rac1
ora.evmd
      1        ONLINE  ONLINE       rac1
ora.gipcd
      1        ONLINE  ONLINE       rac1
ora.gpnpd
      1        ONLINE  ONLINE       rac1
ora.mdnsd
      1        ONLINE  ONLINE       rac1
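To pick out init resources that are not up from the tabular `crsctl stat res -t -init` listing, a minimal sketch, assuming the layout shown above (resource name on its own line, then an instance line whose third column is STATE). `init_res_not_online` is a hypothetical helper name; a resource reported here may be legitimately OFFLINE (for example, if it is not configured), so judge each one.

```sh
#!/bin/sh
# Hypothetical helper: read `crsctl stat res -t -init` output on stdin and
# print "resource STATE" for each instance whose STATE column is not ONLINE.
# Exit status 1 if any such instance was found.
init_res_not_online() {
    awk '
        $1 ~ /^ora\./ { name = $1; next }       # resource name line
        name != "" && $1 ~ /^[0-9]+$/ && $3 != "ONLINE" {
            print name, $3                      # instance line: NUM TARGET STATE ...
            bad = 1
        }
        END { exit bad }
    '
}

# Usage: $GRID_HOME/bin/crsctl stat res -t -init | init_res_not_online
```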
Case 1: OHASD.BIN does not start
As ohasd.bin is responsible for starting all other clusterware processes, directly or indirectly, it needs to start properly for the rest of the stack to come up.
Automatic ohasd.bin startup depends on the following:
1. The OS is at the appropriate run level:
The OS needs to be at the specified run level before CRS will try to start.
To find out at which run level the clusterware needs to come up:
cat /etc/inittab|grep init.ohasd
h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
The above example shows that CRS is supposed to run at run levels 3 and 5; note that, depending on the platform, CRS may come up at a different run level.
To find out the current run level:
who -r
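The two checks above can be combined: take the run levels from the second field of the init.ohasd entry in /etc/inittab and test whether the level reported by `who -r` is among them. A minimal sketch, assuming `who -r` prints the word "run-level" followed by the level (as on Linux and Solaris); `runlevel_ok` is a hypothetical helper name.

```sh
#!/bin/sh
# Hypothetical helper: is the current run level one at which init.ohasd runs?
# $1: the init.ohasd line from /etc/inittab (e.g. "h1:35:respawn:...")
# $2: the output of `who -r` (assumed to contain "run-level N")
runlevel_ok() {
    levels=$(printf '%s\n' "$1" | cut -d: -f2)
    current=$(printf '%s\n' "$2" |
        awk '{ for (i = 1; i < NF; i++) if ($i == "run-level") { print $(i + 1); exit } }')
    if [ -z "$current" ]; then
        echo "cannot determine current run level" >&2
        return 2
    fi
    case "$levels" in
        *"$current"*) echo "ok: run level $current is one of: $levels" ;;
        *) echo "NOT ok: run level $current is not one of: $levels"; return 1 ;;
    esac
}

# Usage: runlevel_ok "$(grep init.ohasd /etc/inittab)" "$(who -r)"
```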
2. "init.ohasd run" is up
On Linux/UNIX, as "init.ohasd run" is configured in /etc/inittab, the init process (pid 1; /sbin/init on Linux, Solaris and hp-ux, /usr/sbin/init on AIX) will start "init.ohasd run" and respawn it if it fails. Without "init.ohasd run" up and running, ohasd.bin will not start:
ps -ef|grep init.ohasd|grep -v grep
root 2279 1 0 18:14 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
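The `ps` check above can also confirm that "init.ohasd run" was started by init, i.e. that its parent PID (third column of `ps -ef`) is 1, as in the sample output. A minimal sketch; `check_init_ohasd` is a hypothetical helper name, and the `!/awk/` filter plays the same role as `grep -v grep`.

```sh
#!/bin/sh
# Hypothetical helper: read `ps -ef` output on stdin and report whether
# "init.ohasd run" is running with init (PPID 1) as its parent.
# Exit status 1 if it is missing or has an unexpected parent.
check_init_ohasd() {
    awk '
        /init\.ohasd run/ && !/awk/ {
            found = 1
            if ($3 == 1) { print "init.ohasd run is up (pid " $2 ", parent init)"; ok = 1 }
            else print "init.ohasd run is up (pid " $2 ") but parent pid is " $3
        }
        END {
            if (!found) { print "init.ohasd run is NOT running"; exit 1 }
            if (!ok) exit 1
        }
    '
}

# Usage: ps -ef | check_init_ohasd
```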
3. Clusterware auto start is enabled (it is enabled by default)
By default, CRS is enabled for auto start upon node reboot. To enable it:
$GRID_HOME/bin/crsctl enable crs
To verify whether it's currently enabled or not:
cat $SCRBASE/$HOSTNAME/root/ohasdstr
enable
SCRBASE is /etc/oracle/scls_scr on Linux and AIX, and /var/opt/oracle/scls_scr on hp-ux and Solaris.
Note: NEVER EDIT THE FILE MANUALLY; use the "crsctl enable/disable crs" command instead.
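The platform-dependent SCRBASE above can be selected from `uname -s` so the same read-only check works on every platform. A minimal sketch: the helper names `scrbase_for` and `ohasd_autostart_state` are hypothetical, the `uname -s` values (Linux, AIX, HP-UX, SunOS) are the usual ones, and using the short host name as the directory name is an assumption to verify on your system. The sketch only reads ohasdstr; changes must still go through crsctl enable/disable crs.

```sh
#!/bin/sh
# Hypothetical helper: map `uname -s` output to SCRBASE as listed in this note.
scrbase_for() {
    case "$1" in
        Linux|AIX)   echo /etc/oracle/scls_scr ;;
        HP-UX|SunOS) echo /var/opt/oracle/scls_scr ;;
        *)           return 1 ;;
    esac
}

# Hypothetical read-only check: print the contents of ohasdstr ("enable"
# when auto start is enabled). Never edit this file by hand.
ohasd_autostart_state() {
    base=$(scrbase_for "$(uname -s)") || { echo "unknown platform" >&2; return 2; }
    host=$(hostname | cut -d. -f1)   # assumption: short host name is the directory name
    f="$base/$host/root/ohasdstr"
    [ -r "$f" ] || { echo "cannot read $f" >&2; return 2; }
    cat "$f"
}
```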
4. syslogd is up and the OS is able to execute init script S96ohasd
The OS may get stuck on some other Snn script while the node is coming up, and thus never get a chance to execute S96ohasd; if that's the case, the following message will not appear in the OS messages:
Jan 20 20:46:51 rac1 logger: Oracle HA daemon is enabled for autostart.
If you don't see the above message, the other possibility is that syslogd (/usr/sbin/syslogd) is not fully up. Grid may fail to come up in that case as well. This may not apply to AIX.
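Looking for the autostart message can be scripted; the OS messages file is passed as an argument because its location differs by platform (for example /var/log/messages on Linux; treat any path as an assumption for your system). A minimal sketch; `last_autostart_msg` is a hypothetical helper name. Note that the file may also hold messages from earlier boots, so compare the timestamp with the last reboot time.

```sh
#!/bin/sh
# Hypothetical helper: print the most recent "Oracle HA daemon is enabled for
# autostart" line from the given OS messages file; exit status 1 if none.
# $1: path to the OS messages file (platform-dependent, assumed).
last_autostart_msg() {
    msg=$(grep 'Oracle HA daemon is enabled for autostart' "$1" | tail -n 1)
    if [ -z "$msg" ]; then
        echo "no autostart message found in $1" >&2
        return 1
    fi
    printf '%s\n' "$msg"
}

# Usage (Linux): last_autostart_msg /var/log/messages
```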
To find out whether the OS is able to execute S96ohasd while the node is coming up, modify ohasd:
From:
case `$CAT $AUTOSTARTFILE` in
  enable*)
    $LOGERR "Oracle HA daemon is enabled for autostart."
To:
case `$CAT $AUTOSTARTFILE` in
  enable*)
    /bin/touch /tmp/ohasd.start."`date`"
    $LOGERR "Oracle HA daemon is enabled for autostart."
After a node reboot, if you don't see /tmp/ohasd.start.timestamp get created, it means the OS got stuck on some other Snn script. If you do see /tmp/ohasd.start.timestamp but not "Oracle HA daemon is enabled for autostart" in messages, syslogd is likely not fully up. In both cases, you will need to engage the System Administrator to find the issue at the OS level. For the latter case, the workaround is to "sleep" for about 2 minutes by modifying ohasd:
From:
case `$CAT $AUTOSTARTFILE` in
  enable*)
    $LOGERR "Oracle HA daemon is enabled for autostart."
To:
case `$CAT $AUTOSTARTFILE` in
  enable*)
    /bin/sleep 120
    $LOGERR "Oracle HA daemon is enabled for autostart."
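The post-reboot reasoning in this step can be written down as a small decision function: no marker file means the OS never reached S96ohasd; a marker without the syslog message points at syslogd. A minimal sketch; `diagnose_s96ohasd` is a hypothetical helper name and the messages file path is passed in as an assumption. Remove old /tmp/ohasd.start.* files before the test reboot (they may survive if /tmp is not cleaned at boot), and keep in mind the messages file may contain entries from earlier boots.

```sh
#!/bin/sh
# Hypothetical helper implementing the marker-file/syslog decision above.
# $1: directory holding the marker files (normally /tmp)
# $2: OS messages file (platform-dependent, assumed)
diagnose_s96ohasd() {
    if ! ls "$1"/ohasd.start.* >/dev/null 2>&1; then
        echo "no marker: OS is stuck on another Snn script before S96ohasd"
        return 1
    fi
    if ! grep -q 'Oracle HA daemon is enabled for autostart' "$2" 2>/dev/null; then
        echo "marker present but no syslog message: syslogd is likely not fully up"
        return 1
    fi
    echo "S96ohasd ran and syslogd logged the autostart message"
}

# Usage (Linux): diagnose_s96ohasd /tmp /var/log/messages
```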