
Tuning ‘log file sync’ Event Waits

 

In this blog entry, I will discuss strategies and techniques to resolve ‘log file sync’ waits. This entry is intended to show an approach based upon scientific principles, not necessarily a step-by-step guide. First, let’s understand how LGWR is involved in implementing the commit mechanism.

Commit Mechanism and LGWR Internals

At commit time, a process creates a redo record (containing commit opcodes) and copies that redo record into the log buffer. Then, that process signals LGWR to write the contents of the log buffer. LGWR writes from the log buffer to the log file and signals the user process back, completing the commit. A commit is considered successful only after the LGWR write is successful.

Of course, there are minor deviations from this general concept, such as latching, commits from a PL/SQL block, or IMU-based commit generation, but the general philosophy remains the same.

Signals, Semaphores and LGWR

The following section introduces the internal workings of the commit and LGWR interaction on Unix platforms. There are minor implementation differences between Unix flavours and on other platforms such as NT/XP, for example the use of post-wait drivers instead of semaphores. This section will introduce, but not necessarily dive deep into, these internals. I used truss to trace LGWR and the user process. The command is:

truss -rall -wall -fall -vall -d -o /tmp/truss.log -p 22459

(A word of caution: don’t truss LGWR or any background process unless it is absolutely necessary. Doing so can cause performance issues or, worse, shut down the database.)

 

Initially, LGWR is sleeping on a semaphore, using the semtimedop or semop call.

22459/1: semtimedop(9, 0xFFFFFD7FFFDFE648, 1, 0xFFFFFD7FFFDFE488) EAGAIN
22459/1:         semnum=15    semop=-1    semflg=0
22459/1:         timeout: 2.060000000 sec

In the above call, 9 is the semaphore set id visible through the ipcs command and semnum=15 is the semaphore for the LGWR process in that set.

The next argument is a structure sembuf:

struct sembuf {
        unsigned short sem_num;  /* semaphore number */
        short          sem_op;   /* semaphore operation */
        short          sem_flg;  /* operation flags */
};

The third argument is the number of semaphore operations in the sembuf array (here, 1).

When a session commits, a redo record is created and copied into the log buffer. Then that process posts LGWR’s semaphore using a semctl call, if LGWR is not already active, and goes to sleep on its own semaphore with a semtimedop call.

In the trace below, the semaphore set id is again 9, but sem_num is 118, which is the semaphore for the user process I was tracing. The first semctl call posts LGWR. The process then sleeps on the semtimedop call.

27396/1: semctl(9, 15, SETVAL, 1)        = 0
...
27396/1:semtimedop(9, 0xFFFFFD7FFFDFC128, 1, 0xFFFFFD7FFFDFBF68)=0
27396/1:        semnum=118   semop=-1    semflg=0
27396/1:        timeout: 1.000000000 sec

The waiting log writer gets a 0 return code from semtimedop and writes the redo records to the current redo log file. kaio calls are kernelized asynchronous I/O calls on the Solaris platform.

22459/7: pwrite(262, "01 "9E0E i ?".., 1024, 1915904) = 1024
22459/9: pwrite(263, "01 "9E0E i ?".., 1024, 1915904) = 1024
22459/1: kaio(AIOWAIT, 0xFFFFFD7FFFDFE310)               = 1
22459/1:         timeout: 600.000000 sec
22459/9: kaio(AIONOTIFY, 0)                         = 0
22459/7: kaio(AIONOTIFY, 0)                         = 0

After successful completion of the write(s), LGWR posts the semaphore of the waiting process using another semctl call.

22459/1: semctl(9, 15, SETVAL, 1)                   = 0

The user process/session continues after receiving a return code from its semtimedop call, reprinted below.

27396/1:semtimedop(9,0xFFFFFD7FFFDFC128,1,0xFFFFFD7FFFDFBF68)=0

So, What Exactly is a ‘log file sync’ wait?

A commit is not complete until LGWR writes the log buffers, including the commit redo record, to the log files. In a nutshell, after posting LGWR to write, user or background processes wait for LGWR to signal back, with a 1-second timeout. The user process charges this wait time as a ‘log file sync’ event.

In the traces from the prior section, the ‘log file sync’ wait begins right after the user process posts LGWR with its semctl call, and it completes once LGWR posts the user process back and the user process returns from its semtimedop call.
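
One quick way to gauge the instance-wide significance of the event is a query against v$system_event (counters are cumulative since instance startup):

-- Compare the foreground wait ('log file sync') against LGWR's own
-- write wait ('log file parallel write').
select event, total_waits, time_waited, average_wait
  from v$system_event
 where event in ('log file sync', 'log file parallel write');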

The Root Causes of ‘log file sync’ waits

The root causes of ‘log file sync’ waits essentially boil down to a few scenarios. The following is not an exhaustive list, by any means.

  1. LGWR is unable to complete writes fast enough for one of the following reasons:
    1. Disk I/O performance to the log files is not good enough. Even though LGWR can use asynchronous I/O, redo log files are opened with the O_DSYNC flag and buffers must be flushed to the disk (or at least written to the disk array cache, in the case of a SAN) before LGWR can mark a commit as complete.
    2. LGWR is starving for CPU resource. If the server is very busy, LGWR can starve for CPU too. This will lead to slower response from LGWR, increasing ‘log file sync’ waits. After all, these system calls and I/O calls must use CPU. In this case, ‘log file sync’ is a secondary symptom and resolving the root cause for high CPU usage will reduce ‘log file sync’ waits.
    3. Due to memory starvation issues, LGWR can be paged out. This also can lead to slower response from LGWR.
    4. LGWR is unable to complete writes fast enough due to file system or Unix buffer cache limitations.
  2. LGWR is unable to post the processes fast enough, due to excessive commits. It is quite possible that there is no starvation for CPU or memory, and that I/O performance is decent enough. Still, if there are excessive commits, LGWR has to perform many writes and semctl calls, and this can increase ‘log file sync’ waits. It can also result in a sharp increase in the ‘redo wastage’ statistic (a quick check of these statistics is shown after this list).
  3. IMU undo/redo threads. With private strands, a process can generate a few megabytes of redo before committing. LGWR must write all of the redo generated so far, so the committing process waits longer on ‘log file sync’, even if the redo generated by other processes is small.
  4. LGWR is suffering from other database contention such as enqueue waits or latch contention. For example, I have seen LGWR freeze due to CF enqueue contention. This is a possible scenario, albeit an unlikely one.
  5. Various bugs. Oh yes, there are bugs introducing unnecessary ‘log file sync’ waits.
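
A quick way to check how the commit and redo statistics are trending is to sample them from v$sysstat (values are cumulative since startup, so take two samples a known interval apart and diff them):

-- Cumulative values since instance startup.
select name, value
  from v$sysstat
 where name in ('user commits', 'redo writes', 'redo synch writes',
                'redo wastage', 'redo size');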

It is worthwhile to understand and identify the root cause, and to resolve it. Here are the steps and considerations.

1. First make sure the ‘log file sync’ event is indeed a major wait event. For example, in the statspack report for 60 minutes below, ‘log file sync’ is indeed an issue. Why? The report covers 3,600 seconds and there are 8 CPUs in the server, so there are roughly 28,800 CPU seconds available. There is just one database on this server, so the approximate CPU usage is 7,034/28,800, or about 24%.

But 27,021 seconds were spent waiting. On average, 27,021/3,600 = 7.5 processes were waiting for the ‘log file sync’ event. So, this is a major bottleneck for application scalability.

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                       % Total
Event                        Waits          Time (s) Ela Time
--------------------------- ------------    -------- --------
log file sync                1,350,499      27,021    50.04
db file sequential read      1,299,154      13,633    25.25
CPU time                                     7,034    13.03
io done                      3,487,217       3,225     5.97
latch free                     115,471       1,325     2.45

2. Next, identify and break down the wait events for LGWR itself by querying v$session_event for the LGWR session. In this instance, the LGWR sid is 3 (and it usually is).
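
If the LGWR sid is not known, it can be found by joining v$session to v$bgprocess; a minimal sketch:

-- Locate the session id of the LGWR background process.
select s.sid
  from v$session s, v$bgprocess b
 where s.paddr = b.paddr
   and b.name  = 'LGWR';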

select sid, event, time_waited, time_waited_micro
from v$session_event where sid=3 order by 3
SQL> /

   SID EVENT                          TIME_WAITED TIME_WAITED_MICRO
------ ------------------------------ ----------- -----------------
..
     3 control file sequential read        237848  2378480750
     3 enqueue                             417032  4170323279
     3 control file parallel write         706539  7065393146
     3 log file parallel write             768628  7686282956
     3 io done                           40822748  4.0823E+11
     3 rdbms ipc message                208478598  2.0848E+12

When LGWR is waiting (in the semtimedop call) for posts from user sessions, that wait time is accounted to the ‘rdbms ipc message’ event. Normally, this event can be ignored. The next highest wait is the ‘io done’ event. After submitting async I/O requests, LGWR waits until the I/O calls complete, since LGWR writes are done as synchronous writes. (“Asynchronous” and “synchronous” are not contradictory terms when it comes to I/O. Google it; there is plenty of information about this already.)

It is worth noting that v$session_event holds counters that are cumulative from instance startup, and hence can be misleading. The difference between two snapshots of this view, for the same session, is much more useful.

Tanel Poder has written an excellent tool for this. Using that tool, we can print a 1-second snapshot of LGWR session 3. From the table below, 813 milliseconds of a 1-second interval were spent waiting for the ‘io done’ event. That’s 81%. (Some parts of the output have been removed to improve readability.)

------------------------------------------------------------------
SID, SNAPSHOT START  ,SEC, TYPE, STATISTIC            ,    DELTA,
------------------------------------------------------------------
3, 20080513 11:44:32,   1, STAT, messages sent        ,        9,
3, 20080513 11:44:32,   1, STAT, messages received    ,      153,
3, 20080513 11:44:32,   1, STAT, redo wastage         ,    39648,
3, 20080513 11:44:32,   1, STAT, redo writes          ,      152,
3, 20080513 11:44:32,   1, STAT, redo blocks written  ,     1892,
3, 20080513 11:44:32,   1, STAT, redo write time      ,       82,
3, 20080513 11:44:32,   1, WAIT, rdbms ipc message    ,   169504,
3, 20080513 11:44:32,   1, WAIT, io done              ,   813238,
3, 20080513 11:44:32,   1, WAIT, log file parallel write ,  5421,
3, 20080513 11:44:32,   1, WAIT, LGWR wait for redo copy ,     1,

3. Next, confirm with SQL trace that LGWR is indeed waiting on that event. Tracing LGWR can degrade performance further, so careful consideration must be given before turning SQL trace on for LGWR. Packages such as dbms_system/dbms_support, or oradebug, can be used to turn on a 10046 event at level 12.
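
For example, from a SYSDBA session, oradebug can attach to the LGWR operating-system process and toggle the 10046 event at level 12 (the pid below is just the one from the truss example above):

REM Attach to LGWR by OS pid (22459 is the pid from the truss example above),
REM enable a level-12 10046 trace, and later turn it off again.
oradebug setospid 22459
oradebug event 10046 trace name context forever, level 12
REM ... let the problem reproduce for a short while ...
oradebug event 10046 trace name context off
REM Show where the trace file was written:
oradebug tracefile_name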

A few trace lines are shown below. In this specific case, LGWR is waiting for I/O events. The output shows that LGWR submitted two write calls covering 16 redo blocks, and the waits for I/O completion are counted as ‘io done’ events. Those ‘io done’ waits add up to roughly 1,600 microseconds, or 1.6 ms, so the performance of the write itself is not entirely bad.

  WAIT #0: nam='rdbms ipc message' ela= 7604 p1=223 p2=0 p3=0
  WAIT #0: nam='log file parallel write' ela= 35 p1=2 p2=16 p3=2
  WAIT #0: nam='io done' ela= 0 p1=0 p2=0 p3=0
  WAIT #0: nam='io done' ela= 639 p1=0 p2=0 p3=0
  WAIT #0: nam='io done' ela= 0 p1=0 p2=0 p3=0
  WAIT #0: nam='io done' ela= 605 p1=0 p2=0 p3=0
  WAIT #0: nam='io done' ela= 1 p1=0 p2=0 p3=0
  WAIT #0: nam='io done' ela= 366 p1=0 p2=0 p3=0

4. Let’s look at a few other statistics as well. From the statspack report, with one column removed to improve readability:

Statistic                             Total     per Second
------------------------------- ----------- --------------
redo blocks written                 230,881        2,998.5
redo buffer allocation retries            0            0.0
redo entries                        285,803        3,711.7
redo log space requests                   0            0.0
redo log space wait time                  0            0.0
redo ordering marks                       0            0.0
redo size                       109,737,304    1,425,159.8
redo synch time                      40,744          529.1
redo synch writes                    38,141          495.3
redo wastage                      5,159,124       67,001.6
redo write time                       6,226           80.9
redo writer latching time                 4            0.1
user calls                          433,717        5,632.7
user commits                         38,135          495.3
user rollbacks                            1            0.0
workarea executions - multipass           0            0.0

From the above statistics, we can see that:

  1. Roughly 5,633 user calls per second were made and 495 commits were executed per second, on average.
  2. About 3,000 redo blocks (of 512 bytes each on Solaris) are written by LGWR per second, approximately 1.5MB/sec.
  3. About 1.4MB/sec of redo was generated, which is approximately 11 Mbits/sec.
  4. Over the snapshot period, 230,881 redo blocks were written for roughly 38,135 commits, about 6 redo blocks per commit.

Essentially, the redo size is high but not abnormal. However, roughly 500 commits per second is on the higher side.

5. Knowledge about your application will be useful here. If the commit frequency is high, trace a few sessions and understand why there are so many commits. For example, in the following trace file there is a commit after every SQL statement, and that can be the root cause of these waits.

The XCTEND line below, with rlbk=0 and rd_only=0, indicates a commit.

WAIT #120: nam='SQL*Net message from client' ela= 291 p1=1952673792 p2=1 p3=0
XCTEND rlbk=0, rd_only=0
WAIT #0: nam='log file sync' ela= 1331 p1=31455 p2=0 p3=0
WAIT #0: nam='SQL*Net message to client' ela= 1 p1=1952673792 p2=1 p3=0

6. Next, examine the commit frequency. If it is high, LGWR could be spending much of its time signalling user processes. Just as with any other process, the Unix scheduler can kick LGWR off the CPU, and this can cause foreground processes to wait on the ‘log file sync’ event. On Solaris, prstat microstate accounting (-mL) is quite useful for breaking down how much time LGWR spends waiting for CPU.
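
To see which sessions are generating most of the commits (and are therefore worth tracing, as in step 5), one option is to query v$sesstat for the ‘user commits’ statistic; a minimal sketch:

-- Sessions ranked by cumulative commits since logon.
select st.sid, s.username, st.value as user_commits
  from v$sesstat st, v$statname n, v$session s
 where st.statistic# = n.statistic#
   and n.name = 'user commits'
   and s.sid  = st.sid
   and st.value > 0
 order by st.value desc;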

7. Although uncommon, there are a few bugs that cause unnecessary ‘log file sync’ waits, for example by signalling the wrong processes.

8. In a few scenarios, ‘log file sync’ waits are intermittent and hard to catch, and statistics might need to be captured with finer granularity to understand the issue. In the example below, the instance froze intermittently, so I wrote a small script to gather statistics from v$sysstat every 20 seconds and spool them to a file. We were also collecting iostat output using the Oracle-supplied tool, LTOM.
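
A minimal sketch of such a script, assuming execute privilege on dbms_lock (the loop count, interval, and spool file name are illustrative):

-- Spool 'redo blocks written' from v$sysstat at fixed intervals and print
-- the delta between consecutive samples. dbms_output buffers the lines, so
-- the spool file is written when the block completes.
set serveroutput on size 1000000
spool /tmp/redo_stats.log

declare
  l_curr number;
  l_prev number := null;
begin
  for i in 1 .. 30 loop
    select value into l_curr
      from v$sysstat
     where name = 'redo blocks written';

    dbms_output.put_line(
      to_char(sysdate, 'DD-MON-YYYY-HH24:MI:SS') ||
      ':Redo blocks written:' || l_curr ||
      ':Delta:' || nvl(to_char(l_curr - l_prev), 'n/a'));

    l_prev := l_curr;
    dbms_lock.sleep(20);   -- sample every 20 seconds
  end loop;
end;
/

spool off

The spooled output looked like the following: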

23-MAR-2007-04:29:43:Redo blocks written:1564176614:Delta: 8253
23-MAR-2007-04:30:14:Redo blocks written:1564176614:Delta:    0
23-MAR-2007-04:30:44:Redo blocks written:1564205771:Delta:29157

Between 4:29:43 and 4:30:14, the redo-blocks-written statistic did not change, since the delta is 0, meaning LGWR did not write any redo blocks. But the ‘redo size’ statistic showed roughly 7MB of redo generated during the same interval, and the number of processes waiting on the ‘log file sync’ event increased to 100. So we know that the freeze occurred because LGWR wasn’t able to write any blocks during this time frame.

In a RAC cluster, this problem is magnified, since both CR- and CUR-mode buffers need the log flush to complete before these blocks can be transferred to another instance’s cache.

Looking closely at the I/O statistics, we can see that the average service time was quite high for a few devices during that time period (we mapped these devices back to the log file systems later). This led to a more granular time frame, and finally the issue proved to be a hardware switch that was intermittently freezing.

extended device statistics

r/s  w/s kr/s kw/s wait actv wsvc_t asvc_t %w  %b device
0.0  0.0  0.0  0.0  0.0  1.0    0.0    0.0  0 100 d6
0.0  0.0  0.0  0.0  0.0  1.0    0.0    0.0  0 100 d2
0.0  0.0  0.0  0.0  0.0  1.0    0.0    0.0  0 100 d1
0.0  0.0  0.0  0.0  0.0  9.0    0.0    0.0  0 100 d0
0.0  0.0  0.0  0.0  0.0  3.0    0.0    0.0  0 100 d13
0.0  2.0  0.0 24.0  0.0  2.0    0.0 1000.7  0 100 d12
0.0  0.0  0.0  0.0  0.0  1.0    0.0    0.0  0 100 d11

9. Unix tools such as truss, tusc, and strace can be used to debug these scenarios if the above techniques are not sufficient, but tools such as truss and strace should be used only as a last resort. In Solaris 10, DTrace can be used to debug I/O or processor-specific issues; DTrace is safer by design. Brendan Gregg has published a DTrace toolkit and a wealth of information on the subject.

Guidelines for Resolving a Few Root Causes

Finding and understanding the root cause is essential to resolving a performance issue. Some final observations:

  • If I/O bandwidth is an issue, then doing anything other than improving I/O bandwidth is not useful. Switching to file systems that provide better write throughput is one option. RAW devices are another. Reducing the number of log file members in a group is another, as it reduces the number of write calls, but this option comes with a cost.
  • If CPU starvation is an issue, then reducing CPU starvation is the correct step to resolve it. Increasing the priority of LGWR is a workaround.
  • If the commit rate is high, then decreasing commits is the correct step; but in the few cases where that is not possible, increasing the priority of LGWR (using nice) or changing its priority class to RT might provide some relief.
  • Solid-state disk devices can also be used if the redo size is extreme. In that case, it is also preferable to decrease the redo size.
  • If excessive redo size is the root cause, it can be reduced using various techniques. More information can be found here (PDF).

In summary, finding and resolving the root cause for a log file sync event will improve application response time.

 

 

Source: https://www.pythian.com/blog/tuning-log-file-sync-event-waits/

 

Corrections are welcome if there are any errors.

 

Email: czmcj@163.com

 
