Here’s a treat for the hard-core Oracle performance geeks out there – I’m releasing a cool, but still experimental script for ASH (or poor-man’s ASH)-based wait event analysis, which should add a whole new dimension into ASH based performance analysis. It doesn’t replace any of the existing ASH analysis techniques, but should bring the relationships between Oracle sessions in complex wait chains out to bright daylight much easier than before.
You all are familiar with the AWR/Statspack timed event summary below:
Similar breakdown can be gotten by just aggregating ASH samples by the wait event:
SQL> @ash/dashtop session_state,event 1=1 "TIMESTAMP'2013-09-09 21:00:00'" "TIMESTAMP'2013-09-09 22:00:00'" %This SESSION EVENT TotalSeconds CPU User I/O Application Concurrency Commit Configuration Cluster Idle Network System I/O Scheduler Administrative Queueing Other MIN(SAMPLE_TIME) MAX(SAMPLE_TIME) ------ ------- ---------------------------------------------------------------- ------------ ---------- ---------- ----------- ----------- ---------- ------------- ---------- ---------- ---------- ---------- ---------- -------------- ---------- ---------- --------------------------------------------------------------------------- --------------------------------------------------------------------------- 68% ON CPU 25610 25610 0 0 0 0 0 0 0 0 0 0 0 0 0 09-SEP-13 09.00.01.468 PM 09-SEP-13 09.59.58.059 PM 14% WAITING SQL*Net more data from client 5380 0 0 0 0 0 0 0 0 5380 0 0 0 0 0 09-SEP-13 09.00.01.468 PM 09-SEP-13 09.59.58.059 PM 6% WAITING enq: HW - contention 2260 0 0 0 0 0 2260 0 0 0 0 0 0 0 0 09-SEP-13 09.04.41.893 PM 09-SEP-13 09.56.07.626 PM 3% WAITING log file parallel write 1090 0 0 0 0 0 0 0 0 0 1090 0 0 0 0 09-SEP-13 09.00.11.478 PM 09-SEP-13 09.59.58.059 PM 2% WAITING db file parallel write 730 0 0 0 0 0 0 0 0 0 730 0 0 0 0 09-SEP-13 09.01.11.568 PM 09-SEP-13 09.59.48.049 PM 2% WAITING enq: TX - contention 600 0 0 0 0 0 0 0 0 0 0 0 0 0 600 09-SEP-13 09.04.41.893 PM 09-SEP-13 09.48.16.695 PM 1% WAITING buffer busy waits 560 0 0 0 560 0 0 0 0 0 0 0 0 0 0 09-SEP-13 09.10.02.492 PM 09-SEP-13 09.56.07.626 PM 1% WAITING log file switch completion 420 0 0 0 0 0 420 0 0 0 0 0 0 0 0 09-SEP-13 09.47.16.562 PM 09-SEP-13 09.47.16.562 PM 1% WAITING latch: redo allocation 330 0 0 0 0 0 0 0 0 0 0 0 0 0 330 09-SEP-13 09.04.41.893 PM 09-SEP-13 09.53.27.307 PM ...
The abovementioned output has one shortcoming in a multiuser (database) system – not all wait events are simple, where a session waits for OS to complete some self-contained operation (like an IO request). Often a session waits for another session (who holds some lock) or some background process who needs to complete some task before our session can continue. That other session may wait for a yet another session due to some other lock. The other session itself waits for a yet another one, thanks to some buffer pin (buffer busy wait). The session who holds the buffer pin, may itself be waiting for LGWR, who may in turn wait for DBWR etc… You get the point – sometimes we have a bunch of sessions, waiting for each other in a chain.
The V$WAIT_CHAINS view introduced in Oracle 11g is capable of showing such chains of waiting sessions – however it is designed to diagnose relatively long-lasting hangs, not performance problems and short (but non-trivial) contention. Usually the DIAG process, who’s responsible for walking through the chains and populating V$WAIT_CHAINS, doesn’t kick in after a few seconds of ongoing session waits and the V$WAIT_CHAINS view may be mostly empty – so we need something different for performance analysis.
Now, it is possible to pick one of the waiting sessions and use the blocking_session column to look up the blocking SID – and see what that session was doing and so on. I used to do this somewhat manually, but realized that a simple CONNECT BY loop with some ASH sample iteration trickery can easily give us complex wait chain signature & hierarchy information.
So, I am presenting to you my new ash_wait_chains.sql script! (Woo-hoo!). It’s actually completely experimental right now, I’m not fully sure whether it returns correct results and some of the planned syntax isn’t implemented yet :) EXPERIMENTAL, not even beta :) And no RAC nor DBA_HIST ASH support yet.
In the example below, I am running the script with parameter event2, which is just really a pseudocolumn on ASH view (look in the script). So it shows just the wait events of the sessions involved in a chain. You can use any ASH column there for extra information, like program, module, sql_opname etc:
SQL> @ash/ash_wait_chains event2 1=1 sysdate-1 sysdate -- ASH Wait Chain Signature script v0.1 EXPERIMENTAL by Tanel Poder ( http://blog.tanelpoder.com ) %This SECONDS WAIT_CHAIN ------ ---------- ------------------------------------------------------------------------------------------ 60% 77995 -> ON CPU 10% 12642 -> cell single block physical read 9% 11515 -> SQL*Net more data from client 2% 3081 -> log file parallel write 2% 3073 -> enq: HW - contention -> cell smart file creation 2% 2723 -> enq: HW - contention -> buffer busy waits -> cell smart file creation 2% 2098 -> cell smart table scan 1% 1817 -> db file parallel write 1% 1375 -> latch: redo allocation -> ON CPU 1% 1023 -> enq: TX - contention -> buffer busy waits -> cell smart file creation 1% 868 -> block change tracking buffer space 1% 780 -> enq: TX - contention -> buffer busy waits -> ASM file metadata operation 1% 773 -> latch: redo allocation 1% 698 -> enq: TX - contention -> buffer busy waits -> DFS lock handle 0% 529 -> enq: TX - contention -> buffer busy waits -> control file parallel write 0% 418 -> enq: HW - contention -> buffer busy waits -> DFS lock handle
What does the above output tell us? I have highlighted 3 rows above, let’s say that we want to get more insight about the enq: HW – contention wait event. The ash_wait_chains script breaks down the wait events by the “complex” wait chain signature (who’s waiting for whom), instead of just the single wait event name! You read the wait chain information from left to right. For example 2% of total response time (3073 seconds) in the analysed ASH dataset was spent by a session, who was waiting for a HW enqueue, which was held by another session, who itself happened to be waiting for the cell smart file creation wait event. The rightmost wait event is the “ultimate blocker”. Unfortunately ASH doesn’t capture idle sessions by default (an idle session may still be holding a lock and blocking others), so in the current version (0.1) of this script, the idle blockers are missing from the output. You can still manually detect a missing / idle session as we’ll see later on.
So, this script allows you to break down the ASH wait data not just by a single, “scalar” wait event, but all wait events (and other attributes) of a whole chain of waiting sessions – revealing hierarchy and dependency information in your bottleneck! Especially when looking into the chains involving various background processes, we gain interesting insight into the process flow in an Oracle instance.
As you can add any ASH column to the output (check the script, it’s not too long), let’s add the program info too, so we’d know what kind of sessions were involved in the complex waits (I have removed a bunch of less interesting lines from the output and have highlighted some interesting ones (again, read the chain of waiters from left to right):
SQL> @ash/ash_wait_chains program2||event2 1=1 sysdate-1 sysdate -- ASH Wait Chain Signature script v0.1 EXPERIMENTAL by Tanel Poder ( http://blog.tanelpoder.com ) %This SECONDS WAIT_CHAIN ------ ---------- ----------------------------------------------------------------------------------------------------------------------------------------------- 56% 73427 -> (JDBC Thin Client) ON CPU 9% 11513 -> (JDBC Thin Client) SQL*Net more data from client 9% 11402 -> (oracle@enkxndbnn.enkitec.com (Jnnn)) cell single block physical read 2% 3081 -> (LGWR) log file parallel write 2% 3073 -> (JDBC Thin Client) enq: HW - contention -> (JDBC Thin Client) cell smart file creation 2% 2300 -> (JDBC Thin Client) enq: HW - contention -> (JDBC Thin Client) buffer busy waits -> (JDBC Thin Client) cell smart file creation ... 1% 1356 -> (JDBC Thin Client) latch: redo allocation -> (JDBC Thin Client) ON CPU 1% 1199 -> (JDBC Thin Client) cell single block physical read 1% 1023 -> (JDBC Thin Client) enq: TX - contention -> (JDBC Thin Client) buffer busy waits -> (Wnnn) cell smart file creation 1% 881 -> (CTWR) ON CPU 1% 858 -> (JDBC Thin Client) block change tracking buffer space 1% 780 -> (JDBC Thin Client) enq: TX - contention -> (JDBC Thin Client) buffer busy waits -> (Wnnn) ASM file metadata operation 1% 766 -> (JDBC Thin Client) latch: redo allocation 1% 698 -> (JDBC Thin Client) enq: TX - contention -> (JDBC Thin Client) buffer busy waits -> (Wnnn) DFS lock handle 0% 529 -> (JDBC Thin Client) enq: TX - contention -> (JDBC Thin Client) buffer busy waits -> (Wnnn) control file parallel write 0% 423 -> (JDBC Thin Client) enq: HW - contention -> (JDBC Thin Client) buffer busy waits -> (Wnnn) cell smart file creation 0% 418 -> (JDBC Thin Client) enq: HW - contention -> (JDBC Thin Client) buffer busy waits -> (Wnnn) ASM file metadata operation 0% 418 -> (JDBC Thin Client) enq: HW - contention -> (JDBC Thin Client) buffer busy waits -> (Wnnn) DFS lock handle ... 0% 25 -> (JDBC Thin Client) gcs drm freeze in enter server mode -> (LMON) ges lms sync during dynamic remastering and reconfig -> (LMSn) ON CPU 0% 25 -> (JDBC Thin Client) enq: HW - contention -> (JDBC Thin Client) buffer busy waits -> (Wnnn) DFS lock handle -> (DBWn) db file parallel write ... 0% 18 -> (JDBC Thin Client) enq: HW - contention -> (JDBC Thin Client) CSS operation: action 0% 18 -> (LMON) control file sequential read 0% 17 -> (LMSn) ON CPU 0% 17 -> (JDBC Thin Client) buffer busy waits -> (Wnnn) cell smart file creation 0% 16 -> (JDBC Thin Client) enq: FB - contention -> (JDBC Thin Client) gcs drm freeze in enter server mode -> (LMON) ges lms sync during dynamic remastering and reconfig -> (LMSn) ON CPU ... 0% 3 -> (JDBC Thin Client) enq: HW - contention -> (JDBC Thin Client) buffer busy waits -> (JDBC Thin Client) CSS operation: action 0% 3 -> (JDBC Thin Client) buffer busy waits -> (JDBC Thin Client) latch: redo allocation -> (JDBC Thin Client) ON CPU ... 0% 1 -> (JDBC Thin Client) enq: TX - contention -> (JDBC Thin Client) enq: TX - contention -> (JDBC Thin Client) buffer busy waits -> (Wnnn) ASM file metadata operation 0% 1 -> (Wnnn) buffer busy waits -> (JDBC Thin Client) block change tracking buffer space ...
In a different example, when adding the sql_opname column (11.2+) we can also display the type of SQL command that ended up waiting (like UPDATE, DELETE, LOCK etc) which is useful for lock contention analysis:
SQL> @ash/ash_wait_chains sql_opname||':'||event2 1=1 sysdate-1/24/60 sysdate
-- Display ASH Wait Chain Signatures script v0.1 EXPERIMENTAL by Tanel Poder ( http://blog.tanelpoder.com )
%This SECONDS WAIT_CHAIN
------ ---------- ------------------------------------------------------------------------------------------------------------------------------------------------------
18% 45 -> LOCK TABLE:enq: TM - contention -> LOCK TABLE:enq: TM - contention
18% 45 -> LOCK TABLE:enq: TM - contention
17% 42 -> DELETE:enq: TM - contention -> LOCK TABLE:enq: TM - contention
10% 25 -> SELECT:direct path read
6% 14 -> TRUNCATE TABLE:enq: TM - contention
5% 13 -> SELECT:ON CPU
2% 5 -> LOCK TABLE:enq: TM - contention -> SELECT:ON CPU
2% 5 -> TRUNCATE TABLE:enq: TM - contention -> SELECT:ON CPU
2% 5 -> SELECT:db file scattered read
2% 5 -> DELETE:enq: TM - contention -> LOCK TABLE:enq: TM - contention -> SELECT:db file scattered read
2% 5 -> DELETE:enq: TM - contention -> LOCK TABLE:enq: TM - contention -> SELECT:ON CPU
2% 5 -> LOCK TABLE:enq: TM - contention -> LOCK TABLE:enq: TM - contention -> SELECT:db file scattered read
2% 5 -> TRUNCATE TABLE:enq: TM - contention -> SELECT:db file scattered read
2% 5 -> LOCK TABLE:enq: TM - contention -> SELECT:db file scattered read
2% 5 -> LOCK TABLE:enq: TM - contention -> LOCK TABLE:enq: TM - contention -> SELECT:ON CPU
2% 4 -> TRUNCATE TABLE:enq: TM - contention -> SELECT:direct path read
2% 4 -> LOCK TABLE:enq: TM - contention -> LOCK TABLE:enq: TM - contention -> SELECT:direct path read
2% 4 -> DELETE:enq: TM - contention -> LOCK TABLE:enq: TM - contention -> SELECT:direct path read
2% 4 -> LOCK TABLE:enq: TM - contention -> SELECT:direct path read
This is an experimental script and has some issues and shortcomings right now. For example, if the final blocking session itself was idle during the sampling, like the blue highlighted line above, then the blocking session is not shown there – as ASH doesn’t capture idle session samples. Also it doesn’t work for RAC right now (and this might be problematic as in RAC the blocking_session info for global wait events may not immediately be resolved either, unlike in the instance-local cases). And I will try to find better example cases illustrating the use of this technique (you’re welcome to send me your output from busy databases for analysis :)
I have written a Part 2 for this article too.
NB! If you want to move to the "New World" - offload your data and workloads to Hadoop, without having to re-write your existing applications - check out Gluent. We are making history! ;-)
参考至:http://blog.tanelpoder.com/2013/09/11/advanced-oracle-troubleshooting-guide-part-11-complex-wait-chain-signature-analysis-with-ash_wait_chains-sql/
如有错误,欢迎指正
邮箱:czmcj@163.com
相关推荐
### 高级Oracle故障排除技巧 #### 引言与作者简介 本次分享由Tanel Põder呈现,他是一位在Oracle领域拥有超过12年经验的专业人士。Tanel不仅是数据库管理员(DBA),还是一名研究者和顾问。他在Oracle内部机制、...
webMethods Advanced Troubleshooting Guide
Oracle Troubleshooting Guide 本文档是 Oracle 公司发布的技术文档,旨在帮助用户解决 Java 平台Standard Edition 中的常见问题。本指南涵盖了 Java 平台的各种故障排除方法,旨在提高用户的 Troubleshooting 能力...
Error Occurred – What to do? ................................................................................................ 1 Error Log ................................................................
在《Oracle TimesTen In-Memory Database Troubleshooting Guide 11g Release 2 (11.2.2)》中,用户可以找到关于解决TimesTen数据库问题的详尽指南。这份文档旨在帮助管理员和开发者诊断并解决在使用TimesTen时可能...
为了帮助开发者更好地使用 Java,Oracle 公司提供了 Troubleshooting Guide,旨在帮助开发者解决 Java 相关的问题。 Java Troubleshooting Guide 是一份详细的指南,涵盖了 Java 的各种问题和解决方法。该指南分为...
这个"PowerShell Troubleshooting Guide(pdf+epub+mobi+code_file).zip"压缩包包含了多种格式的资源,旨在帮助用户理解和解决在使用PowerShell时可能遇到的问题。 PDF、EPUB和MOBI格式的书籍提供了详细的PowerShell...
Citrix Troubleshooting Guide
### Oracle WebLogic Server 11g: Diagnostics and Troubleshooting #### 知识点一:Oracle WebLogic Server 11g 概述 - **版本介绍**:Oracle WebLogic Server 11g(版本号10.3.6)是Oracle公司推出的一款企业级...
10. **Oracle TimesTen Troubleshooting Procedures**: 最后,这份文档是解决TimesTen遇到问题的宝贵资源,包含了故障排查方法、常见错误的解决步骤和性能问题的诊断工具。 通过深入学习这些文档,开发者和数据库...
《Java SE 6与HotSpot VM故障排除指南》是Oracle公司于2008年11月发布的一份技术文档,旨在帮助开发人员、系统管理员和技术支持人员解决在使用Java Standard Edition 6 (Java SE 6) 和HotSpot虚拟机过程中遇到的各种...
红帽企业Linux故障排除指南 ,作者列出了大多数红帽企业版运行过程中出现的故障处理方式
ANSYS EKM Troubleshooting Guide
Oracle Solaris 9 平台笔记:Sun Quad FastEthernet 设备驱动程序 Oracle Solaris 9 平台笔记:The Sun Quad FastEthernet 设备驱动程序是 Sun Microsystems 公司为其客户提供的一份平台笔记,旨在帮助用户更好地...
文档信息和版权声明表明,这份指南由Oracle及其关联公司所有,并受版权法和知识产权法的保护。该文档和相关软件是在使用和披露限制下的许可协议下提供的。除非在您的许可协议中明确允许,否则您不能使用、复制、翻译...
Finding the slow part of an application is often the easy part of the battle. It’s finding a solution that’s difficult. Troubleshooting Oracle Performance helps by providing a systematic approach ...
Chapter 11: An Introduction to Python Chapter 12: Minecraft Pi Edition Part IV: Hardware Hacking Chapter 13: Learning to Hack Hardware Chapter 14: The GPIO Port Chapter 15: The Raspberry Pi Camera ...