quanminchaoren

Android systrace

Understanding Systrace
Caution: If you've never used systrace before, we strongly recommend reading the systrace overview before continuing.

systrace is the primary tool for analyzing Android device performance. However, it's really a wrapper around other tools: it is the host-side wrapper around atrace, the device-side executable that controls userspace tracing and sets up ftrace, the primary tracing mechanism in the Linux kernel. systrace uses atrace to enable tracing, then reads the ftrace buffer and wraps it all in a self-contained HTML viewer. (While newer kernels support Linux Enhanced Berkeley Packet Filter (eBPF), the following documentation pertains to the 3.18 kernel (no eBPF), as that's what was used on the Pixel/Pixel XL.)

systrace is owned by the Google Android and Google Chrome teams and is developed in the open as part of the Catapult project. In addition to systrace, Catapult includes other useful utilities. For example, ftrace has more features than can be directly enabled by systrace or atrace and contains some advanced functionality that is critical to debugging performance problems. (These features require root access and often a new kernel.)

Running systrace
When debugging jitter on Pixel/Pixel XL, start with the following command:

./systrace.py sched freq idle am wm gfx view sync binder_driver irq workq input -b 96000
When combined with the additional tracepoints required for GPU and display pipeline activity, this gives you the ability to trace from user input to frame displayed on screen. Set the buffer size to something large to avoid losing events (which usually manifests as some CPUs containing no events after some point in the trace).
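To make the "CPUs containing no events after some point" symptom concrete, here is a minimal Python sketch that scans ftrace text lines and reports CPUs that go silent well before the end of the trace. The line layout the pattern expects, the `silent_cpus` helper, and the one-second gap are assumptions for illustration, not part of systrace. A CPU can also be quiet for legitimate reasons (for example, if it was offline), so treat the result as a hint that the buffer may have been too small.

```python
import re

# Assumed ftrace text layout: "task-pid [cpu] flags timestamp: event: args".
# Real output varies with kernel version; adjust the pattern to your trace.
EVENT_RE = re.compile(r"\[(?P<cpu>\d+)\]\s+(?:\S+\s+)?(?P<ts>\d+\.\d+):")


def last_event_per_cpu(lines):
    """Return {cpu: timestamp in seconds of the last event seen on that CPU}."""
    last = {}
    for line in lines:
        m = EVENT_RE.search(line)
        if m:
            cpu = int(m.group("cpu"))
            last[cpu] = max(last.get(cpu, 0.0), float(m.group("ts")))
    return last


def silent_cpus(lines, gap_s=1.0):
    """CPUs whose last event is more than gap_s seconds before the trace end."""
    last = last_event_per_cpu(lines)
    if not last:
        return []
    trace_end = max(last.values())
    return sorted(cpu for cpu, ts in last.items() if trace_end - ts > gap_s)


# Made-up sample lines: CPU 1 stops logging almost 5 seconds before CPU 0.
sample = [
    "  <idle>-0  [000] d..3  100.000000: sched_switch: prev_comm=swapper/0",
    "  <idle>-0  [001] d..3  100.100000: sched_switch: prev_comm=swapper/1",
    "  app-42    [000] d..3  105.000000: sched_switch: prev_comm=app",
]
print(silent_cpus(sample))
```

If this reports CPUs on a trace captured during a busy workload, retrying with a larger -b value is a reasonable first step.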

When going through systrace, keep in mind that every event is triggered by something on the CPU.

Note: Hardware interrupts are not controlled by the CPU and do trigger things in ftrace, but the actual commit to the trace log is done by the interrupt handler, which could have been delayed if your interrupt arrived while (for example) some other bad driver had interrupts disabled. The critical element is the CPU.

Because systrace is built on top of ftrace and ftrace runs on the CPU, something on the CPU must write the ftrace buffer that logs hardware changes. This means that if you're curious about why a display fence changed state, you can see what was running on the CPU at the exact point of its transition (something that is running on the CPU triggered that change in the log). This concept is the foundation of analyzing performance using systrace.

Example: Working frame
This example describes a systrace for a normal UI pipeline. To follow along with the example, download the zip file of traces (which also includes other traces referred to in this section), unzip the file, and open the systrace_tutorial.html file in your browser. Be warned that this systrace is a large file; unless you use systrace in your day-to-day work, this is likely a much bigger trace with much more information than you've ever seen in a single trace before.

For a consistent, periodic workload such as TouchLatency, the UI pipeline contains the following:

1. EventThread in SurfaceFlinger wakes the app UI thread, signaling it's time to render a new frame.
2. App renders frame in UI thread, RenderThread, and hwuiTasks, using CPU and GPU resources. This is the bulk of the capacity spent for UI.
3. App sends rendered frame to SurfaceFlinger via binder and goes to sleep.
4. A second EventThread in SurfaceFlinger wakes SurfaceFlinger to trigger composition and display output. If SurfaceFlinger determines there is no work to be done, it goes back to sleep.
5. SurfaceFlinger handles composition via HWC/HWC2 or GL. HWC/HWC2 composition is faster and lower power but has limitations depending on the SOC. This usually takes ~4-6ms, but can overlap with step 2 because Android applications are always triple-buffered. (While applications are always triple-buffered, there may only be one pending frame waiting in SurfaceFlinger, which makes it appear identical to double-buffering.)
6. SurfaceFlinger dispatches final output to display via vendor driver and goes back to sleep, waiting for EventThread wakeup.
Let's walk through the frame beginning at 15409ms:


Figure 1. Normal UI pipeline, EventThread running.
Figure 1 is a normal frame surrounded by normal frames, so it's a good starting point for understanding how the UI pipeline works. The UI thread row for TouchLatency includes different colors at different times. Bars denote different states for the thread:

Gray. Sleeping.
Blue. Runnable (it could run, but the scheduler hasn't picked it to run yet).
Green. Actively running (the scheduler thinks it's running).
Note: Interrupt handlers aren't shown in the normal per-CPU timeline, so it's possible that you're actually running interrupt handlers or softirqs during some portion of a thread's runtime. Check the irq section of the trace (under process 0) to confirm whether an interrupt is running instead of a standard thread.

Red. Uninterruptible sleep (generally sleeping on a lock in the kernel). Can be indicative of I/O load. Extremely useful for debugging performance issues.
Orange. Uninterruptible sleep due to I/O load.
To view the reason for uninterruptible sleep (available from the sched_blocked_reason tracepoint), select the red uninterruptible sleep slice.
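The thread states listed above can also be read from the raw sched_switch events, where the prev_state field records the state of the thread being switched out. The sketch below maps that field to the state names used in this section. The letters R, S, and D are the standard Linux task-state codes; pairing them with the viewer's colors is an assumption, and `state_after_switch` is a hypothetical helper name.

```python
import re

PREV_STATE_RE = re.compile(r"prev_state=(\S+)")

# Standard Linux task-state letters mapped to the names used in this section.
# The colour noted in each comment is an assumed correspondence to the viewer.
STATE_NAMES = {
    "R": "runnable",               # preempted but still wants the CPU (blue)
    "S": "sleeping",               # interruptible sleep (gray)
    "D": "uninterruptible sleep",  # red, or orange when caused by I/O
}


def state_after_switch(sched_switch_args):
    """Name the state a thread enters when it is switched out.

    sched_switch_args is the argument text of one sched_switch event.
    Returns None if the text carries no prev_state field.
    """
    m = PREV_STATE_RE.search(sched_switch_args)
    if not m:
        return None
    # Only the first letter is used, so a value such as "R+" maps like "R".
    return STATE_NAMES.get(m.group(1)[0], "other")


# Made-up event arguments for illustration.
args = ("prev_comm=app prev_pid=42 prev_prio=120 prev_state=D "
        "==> next_comm=swapper/0 next_pid=0 next_prio=120")
print(state_after_switch(args))
```

The thread that appears as next_pid in the same event is the one that starts running (green) at that timestamp.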

While EventThread is running, the UI thread for TouchLatency becomes runnable. To see what woke it, click the blue section:


Figure 2. UI thread for TouchLatency.
Figure 2 shows the TouchLatency UI thread was woken by tid 6843, which corresponds to EventThread. The UI thread wakes:


Figure 3. UI thread wakes, renders a frame, and enqueues it for SurfaceFlinger to consume.
If the binder_driver tag is enabled in a trace, you can select a binder transaction to view information on all of the processes involved in that transaction:


Figure 4. Binder transaction.
Figure 4 shows that, at 15,423.65ms, Binder:6832_1 in SurfaceFlinger becomes runnable because of tid 9579, which is TouchLatency's RenderThread. You can also see queueBuffer on both sides of the binder transaction.
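The "woken by" relationships in Figures 2 and 4 can be approximated from raw sched_wakeup events: the thread named at the start of the line was on the CPU when the wakeup was logged, and the comm= and pid= fields name the thread being woken. The Python sketch below assumes a simplified ftrace line layout; the `wakers_of` helper, the Binder thread tid 6840, and the UI thread tid 9562 are made up for illustration. As noted earlier, a wakeup issued from an interrupt handler is attributed to whichever thread happened to be interrupted.

```python
import re

# Assumed layout: "waker-tid ... timestamp: sched_wakeup: comm=X pid=N ...".
WAKEUP_RE = re.compile(
    r"^\s*(?P<waker_comm>.+?)-(?P<waker_tid>\d+)\s+.*?"
    r"(?P<ts>\d+\.\d+):\s+sched_wakeup:\s+"
    r"comm=(?P<comm>.+?)\s+pid=(?P<tid>\d+)\b"
)


def wakers_of(lines, tid):
    """Return [(timestamp_s, waker_comm, waker_tid)] for each wakeup of tid."""
    result = []
    for line in lines:
        m = WAKEUP_RE.match(line)
        if m and int(m.group("tid")) == tid:
            result.append((float(m.group("ts")),
                           m.group("waker_comm"),
                           int(m.group("waker_tid"))))
    return result


# Sample lines; tids 6840 and 9562 are hypothetical.
sample = [
    "  EventThread-6843  [002] d..4 15409.000000: sched_wakeup: "
    "comm=TouchLatency pid=9562 prio=120 success=1 target_cpu=002",
    "  RenderThread-9579  [001] d..3 15423.650000: sched_wakeup: "
    "comm=Binder:6832_1 pid=6840 prio=120 success=1 target_cpu=001",
]
print(wakers_of(sample, 6840))
```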

During the queueBuffer on the SurfaceFlinger side, the number of pending frames from TouchLatency goes from 1 to 2:


Figure 5. Pending frames goes from 1 to 2.
Figure 5 shows triple-buffering, where there are two completed frames and the app will soon start rendering a third. This is because we've already dropped some frames, so the app keeps two pending frames instead of one to try to avoid further dropped frames.
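Counter rows such as the pending-frame count are written to the trace as tracing_mark_write events of the form C|pid|name|value. The Python sketch below extracts the points where a named counter changes value, which is how a 1-to-2 transition like the one in Figure 5 would show up in the raw data. The counter name and the timestamps in the sample are placeholders; the real counter name in your trace will likely differ, and `counter_changes` is a hypothetical helper.

```python
import re

# atrace counter events look like "tracing_mark_write: C|<pid>|<name>|<value>".
COUNTER_RE = re.compile(
    r"(?P<ts>\d+\.\d+):\s+tracing_mark_write:\s+"
    r"C\|(?P<pid>\d+)\|(?P<name>[^|]+)\|(?P<value>-?\d+)"
)


def counter_changes(lines, name):
    """Return [(timestamp_s, old_value, new_value)] for one named counter."""
    changes = []
    prev = None
    for line in lines:
        m = COUNTER_RE.search(line)
        if not m or m.group("name") != name:
            continue
        value = int(m.group("value"))
        if prev is not None and value != prev:
            changes.append((float(m.group("ts")), prev, value))
        prev = value
    return changes


# Made-up sample: the counter name and timestamps are placeholders.
NAME = "com.prefabulated.touchlatency"
sample = [
    f"  surfaceflinger-6832 [000] ...1 15407.000000: tracing_mark_write: C|6832|{NAME}|1",
    f"  Binder:6832_1-6840  [000] ...1 15423.700000: tracing_mark_write: C|6832|{NAME}|2",
    f"  surfaceflinger-6832 [000] ...1 15426.000000: tracing_mark_write: C|6832|{NAME}|1",
]
for ts, old, new in counter_changes(sample, NAME):
    print(f"{ts:.3f}s: pending frames {old} -> {new}")
```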

Soon after, SurfaceFlinger's main thread is woken by a second EventThread so it can output the older pending frame to the display:


Figure 6. SurfaceFlinger's main thread is woken by a second EventThread.
SurfaceFlinger first latches the older pending buffer, which causes the pending buffer count to decrease from 2 to 1:


Figure 7. SurfaceFlinger first latches the older pending buffer.
After latching the buffer, SurfaceFlinger sets up composition and submits the final frame to the display. (Some of these sections are enabled as part of the mdss tracepoint, so they may not be there on your SOC.)


Figure 8. SurfaceFlinger sets up composition and submits the final frame.
Next, mdss_fb0 wakes on CPU 0. mdss_fb0 is the display pipeline's kernel thread for outputting a rendered frame to the display. We can see mdss_fb0 as its own row in the trace (scroll down to view).


Figure 9. mdss_fb0 wakes on CPU 0.
mdss_fb0 wakes up, runs for a bit, enters uninterruptible sleep, then wakes again.

Note: From this point forward, the trace is more complicated as the final work is split between mdss_fb0, interrupts, and workqueue functions. If you need that level of detail, refer to the exact characteristics of the driver stack for your SOC (as what happens on the Pixel XL might not be useful to you).

Example: Non-working frame
This example describes a systrace used to debug Pixel/Pixel XL jitter. To follow along with the example, download the zip file of traces (which also includes other traces referred to in this section), unzip the file, and open the systrace_tutorial.html file in your browser.

When you first open the systrace, you'll see something like this:


Figure 10. TouchLatency running on Pixel XL (most options enabled, including mdss and kgsl tracepoints).
When looking for jank, check the FrameMissed row under SurfaceFlinger. FrameMissed is a quality-of-life improvement provided by Hardware Composer 2 (HWC2). As of December 2016, HWC2 is used only on Pixel/Pixel XL; when viewing systrace for other devices, the FrameMissed row may not be present. In either case, FrameMissed is correlated with SurfaceFlinger missing one of its extremely-regular runtimes and an unchanged pending-buffer count for the app (com.prefabulated.touchlatency) at a vsync:


Figure 11. FrameMissed correlation with SurfaceFlinger.
Figure 11 shows a missed frame at 15598.29ms. SurfaceFlinger woke briefly at the vsync interval and went back to sleep without doing any work, which means SurfaceFlinger determined it was not worth trying to send a frame to the display again. Why?

To understand how the pipeline broke down for this frame, first review the working frame example above to see how a normal UI pipeline appears in systrace. When ready, return to the missed frame and work backwards. Notice that SurfaceFlinger wakes and immediately goes to sleep. When viewing the number of pending frames from TouchLatency, there are two frames (a good clue to help figure out what's going on).


Figure 12. SurfaceFlinger wakes and immediately goes to sleep.
Because we have frames in SurfaceFlinger, it's not an app issue. In addition, SurfaceFlinger is waking at the correct time, so it's not a SurfaceFlinger issue. If SurfaceFlinger and the app are both looking normal, it's probably a driver issue.

Because the mdss and sync tracepoints are enabled, we can get information about the fences (shared between the display driver and SurfaceFlinger) that control when frames are actually submitted to the display. The fences we care about are listed under mdss_fb0_retire, which denotes when a frame is actually on the display. These fences are provided as part of the sync trace category. Which fences correspond to particular events in SurfaceFlinger depends on your SOC and driver stack, so work with your SOC vendor to understand the meaning of fence categories in your traces.


Figure 13. mdss_fb0_retire fences.
Figure 13 shows a frame that was displayed for 33ms, not 16.7ms as expected. Halfway through that slice, that frame should have been replaced by a new one but wasn't. View the previous frame and look for anything interesting:


Figure 14. Frame previous to busted frame.
Figure 14 shows a frame that lasted 14.482ms. The busted two-frame segment was 33.6ms, which is roughly what we would expect for two frames (at 60Hz, each frame lasts 16.7ms, so two frames take about 33.3ms). But 14.482ms is nowhere near 16.7ms, which suggests that something is very wrong with the display pipe.
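The arithmetic behind this comparison can be sketched in a few lines of Python: divide each fence duration by the 60Hz frame period, round to the nearest whole number of frames, and look at the leftover error. The `classify_interval` helper and the 1ms tolerance are assumptions chosen for illustration, not values from the display driver.

```python
VSYNC_HZ = 60.0
FRAME_MS = 1000.0 / VSYNC_HZ  # about 16.7 ms per frame at 60Hz


def classify_interval(duration_ms, tolerance_ms=1.0):
    """Compare a fence duration against whole multiples of the frame period.

    Returns (frames, error_ms, on_schedule), where frames is the nearest
    whole number of frame periods (at least 1), error_ms is the deviation
    from that multiple, and on_schedule is True when the deviation is
    within tolerance_ms.
    """
    frames = max(1, round(duration_ms / FRAME_MS))
    error_ms = duration_ms - frames * FRAME_MS
    return frames, error_ms, abs(error_ms) <= tolerance_ms


# Durations quoted in this section.
for duration in (33.6, 14.482):
    frames, error_ms, ok = classify_interval(duration)
    status = "on schedule" if ok else "off schedule"
    print(f"{duration}ms ~ {frames} frame(s), error {error_ms:+.2f}ms, {status}")
```

With these inputs, the 33.6ms segment lands close to a two-frame multiple (one frame shown twice), while the 14.482ms frame is more than 2ms short of a single frame period.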

Investigate exactly where that fence ends to determine what controls it:


Figure 15. Investigate fence end.
A workqueue contains a __vsync_retire_work_handler that is running when the fence changes. Looking through the kernel source, you can see it's part of the display driver. It definitely appears to be on the critical path for the display pipeline, so it must run as quickly as possible. It's runnable for 70us or so (not a long scheduling delay), but it's a workqueue and might not get scheduled accurately.

Check the previous frame to determine if that contributed; sometimes jitter can add up over time and eventually cause us to miss a deadline.


Figure 16. Previous frame.
The runnable line on the kworker isn't visible because the viewer turns it white when it's selected, but the statistics tell the story: 2.3ms of scheduler delay for part of the display pipeline critical path is bad. Before we do anything else, we should fix that by moving this part of the display pipeline critical path from a workqueue (which runs as a SCHED_OTHER CFS thread) to a dedicated SCHED_FIFO kthread. This function needs timing guarantees that workqueues can't (and aren't intended to) provide.
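The scheduler delay quoted here is the time between a thread being woken (becoming runnable) and the scheduler actually switching it in. A minimal Python sketch of that calculation follows, working on events that have already been parsed into tuples. The tuple layout, the `runnable_delays` helper, and the kworker tid 500 are hypothetical, and the timestamps are invented to reproduce the 70us and 2.3ms figures. The sketch ignores runnable time caused by preemption, so it only covers wakeup latency.

```python
def runnable_delays(events, tid):
    """Measure wakeup-to-run latency for one thread.

    events: time-ordered iterable of (timestamp_s, kind, tid), where kind is
    "wakeup" (from sched_wakeup) or "switch_in" (from sched_switch, when the
    thread is the one being switched in).
    Returns [(wakeup_timestamp_s, delay_ms)].
    """
    delays = []
    woken_at = None
    for ts, kind, event_tid in events:
        if event_tid != tid:
            continue
        if kind == "wakeup" and woken_at is None:
            woken_at = ts
        elif kind == "switch_in" and woken_at is not None:
            delays.append((woken_at, (ts - woken_at) * 1e3))
            woken_at = None
    return delays


KWORKER_TID = 500  # hypothetical tid for the kworker running the handler
events = [
    (10.00000, "wakeup", KWORKER_TID),
    (10.00230, "switch_in", KWORKER_TID),  # 2.3 ms after the wakeup
    (10.01670, "wakeup", KWORKER_TID),
    (10.01677, "switch_in", KWORKER_TID),  # about 70 us after the wakeup
]
for woken_at, delay_ms in runnable_delays(events, KWORKER_TID):
    flag = "  <-- long for a display-critical path" if delay_ms > 1.0 else ""
    print(f"woken at {woken_at:.5f}s, ran after {delay_ms:.2f}ms{flag}")
```

The 1ms threshold used for flagging is arbitrary; pick one that matches how much slack the frame deadline leaves for the thread in question.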

Is this the reason for the jank? It's hard to say conclusively. Outside of easy-to-diagnose cases such as kernel lock contention causing display-critical threads to sleep, traces usually won't tell you directly what the problem is. Could this jitter have been the cause of the dropped frame? Absolutely. The fence times should be 16.7ms, but they aren't close to that at all in the frames leading up to the dropped frame. (There's a 19ms fence followed by a 14ms fence.) Given how tightly coupled the display pipeline is, it's entirely possible the jitter around fence timings resulted in an eventual dropped frame.

In this example, the solution involved converting __vsync_retire_work_handler from a workqueue to a dedicated kthread. This resulted in noticeable jitter improvements and reduced jank in the bouncing ball test. Subsequent traces show fence timings that hover very close to 16.7ms.

https://source.android.com/devices/tech/debug/systrace