Android 4.4 Graphic Architecture

Posted 2016-04-24 11:56:17
Graphic Architecture

This post is a translation of Google's overview article on the Android graphics stack: http://source.android.com/devices/graphics/architecture.html

Much of the wording and terminology reflects my own understanding and may not be accurate. There are also a few sentences in the original English that I could not understand precisely; for those, I appended a (?) after the translated sentence.

This document describes the essential elements of Android’s “system-level” graphics architecture, and how it is used by the application framework and multimedia system. The focus is on how buffers of graphical data move through the system. If you’ve ever wondered why SurfaceView and TextureView behave the way they do, or how Surface and EGLSurface interact, you’ve come to the right place.


Some familiarity with Android devices and application development is assumed. You don’t need detailed knowledge of the app framework, and very few API calls will be mentioned, but the material herein doesn’t overlap much with other public documentation. The goal here is to provide a sense for the significant events involved in rendering a frame for output, so that you can make informed choices when designing an application. To achieve this, we work from the bottom up, describing how the UI classes work rather than how they can be used.


We start with an explanation of Android’s graphics buffers, describe the composition and display mechanism, and then proceed to the higher-level mechanisms that supply the compositor with data.


This document is chiefly concerned with the system as it exists in Android 4.4 (“KitKat”). Earlier versions of the system worked differently, and future versions will likely be different as well. Version-specific features are called out in a few places.


At various points I will refer to source code from the AOSP sources or from Grafika. Grafika is a Google open-source project for testing; it can be found at https://github.com/google/grafika. It’s more “quick hack” than solid example code, but it will suffice.




BufferQueue and gralloc

To understand how Android’s graphics system works, we have to start behind the scenes. At the heart of everything graphical in Android is a class called BufferQueue. Its role is simple enough: connect something that generates buffers of graphical data (the “producer”) to something that accepts the data for display or further processing (the “consumer”). The producer and consumer can live in different processes. Nearly everything that moves buffers of graphical data through the system relies on BufferQueue.


The basic usage is straightforward. The producer requests a free buffer (dequeueBuffer()), specifying a set of characteristics including width, height, pixel format, and usage flags. The producer populates the buffer and returns it to the queue (queueBuffer()). Sometime later, the consumer acquires the buffer (acquireBuffer()) and makes use of the buffer contents. When the consumer is done, it returns the buffer to the queue (releaseBuffer()).


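The dequeue/queue/acquire/release round trip above can be sketched as a plain-Java model. The method names mirror the real BufferQueue API, but the class itself is an illustration, not Android code: two lists stand in for the queue's free and queued buffer slots.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class BufferQueueModel {
    public static class GraphicBuffer {
        public final int width, height;
        public GraphicBuffer(int width, int height) { this.width = width; this.height = height; }
    }

    private final Queue<GraphicBuffer> free = new ArrayDeque<>();
    private final Queue<GraphicBuffer> queued = new ArrayDeque<>();

    // Producer side: grab a free buffer, allocating on demand.
    public GraphicBuffer dequeueBuffer(int w, int h) {
        GraphicBuffer b = free.poll();
        return (b != null) ? b : new GraphicBuffer(w, h);
    }

    // Producer side: hand the filled buffer to the queue.
    public void queueBuffer(GraphicBuffer b) { queued.add(b); }

    // Consumer side: take the oldest queued buffer.
    public GraphicBuffer acquireBuffer() { return queued.poll(); }

    // Consumer side: done with the contents, return it to the free list.
    public void releaseBuffer(GraphicBuffer b) { free.add(b); }
}
```

Note how a released buffer is recycled on the next dequeue rather than reallocated, which is exactly why buffer contents "stick around" between frames.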

Most recent Android devices support the “sync framework”. This allows the system to do some nifty things when combined with hardware components that can manipulate graphics data asynchronously. For example, a producer can submit a series of OpenGL ES drawing commands and then enqueue the output buffer before rendering completes. The buffer is accompanied by a fence that signals when the contents are ready. A second fence accompanies the buffer when it is returned to the free list, so that the consumer can release the buffer while the contents are still in use. This approach improves latency and throughput as the buffers move through the system.


Some characteristics of the queue, such as the maximum number of buffers it can hold, are determined jointly by the producer and the consumer.


The BufferQueue is responsible for allocating buffers as it needs them. Buffers are retained unless the characteristics change; for example, if the producer starts requesting buffers with a different size, the old buffers will be freed and new buffers will be allocated on demand.

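A minimal sketch of that reallocation rule: a free buffer is reused only while its characteristics still match the request; otherwise it is dropped and a fresh buffer is allocated. Class and field names here are illustrative, not the real BufferQueue implementation.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class ResizeModel {
    public static class Buf {
        public final int w, h;
        public Buf(int w, int h) { this.w = w; this.h = h; }
    }

    private final Queue<Buf> free = new ArrayDeque<>();
    public int allocations = 0;   // how many real allocations happened

    public Buf dequeue(int w, int h) {
        Buf b = free.poll();
        if (b != null && b.w == w && b.h == h) return b; // characteristics match: reuse
        // size changed (or nothing free): old buffer is dropped, new one allocated
        allocations++;
        return new Buf(w, h);
    }

    public void release(Buf b) { free.add(b); }
}
```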

The data structure is currently always created and “owned” by the consumer. In Android 4.3 only the producer side was “binderized”, i.e. the producer could be in a remote process but the consumer had to live in the process where the queue was created. This evolved a bit in 4.4, moving toward a more general implementation.


Buffer contents are never copied by BufferQueue. Moving that much data around would be very inefficient. Instead, buffers are always passed by handle.




Gralloc HAL

The actual buffer allocations are performed through a memory allocator called “gralloc”, which is implemented through a vendor-specific HAL interface (see hardware/libhardware/include/hardware/gralloc.h). The alloc() function takes the arguments you’d expect — width, height, pixel format — as well as a set of usage flags. Those flags merit closer attention.


The gralloc allocator is not just another way to allocate memory on the native heap. In some situations, the allocated memory may not be cache-coherent, or could be totally inaccessible from user space. The nature of the allocation is determined by the usage flags, which include attributes like:

• how often the memory will be accessed from software (CPU)
• how often the memory will be accessed from hardware (GPU)
• whether the memory will be used as an OpenGL ES (“GLES”) texture
• whether the memory will be used by a video encoder

(Translator's note: cache coherence refers to the mechanism that keeps shared data held in multiple caches consistent.)

For example, if your format specifies RGBA 8888 pixels, and you indicate the buffer will be accessed from software — meaning your application will touch pixels directly — then the allocator needs to create a buffer with 4 bytes per pixel in R-G-B-A order. If instead you say the buffer will only be accessed from hardware and as a GLES texture, the allocator can do anything the GLES driver wants — BGRA ordering, non-linear “swizzled” layouts, alternative color formats, etc. Allowing the hardware to use its preferred format can improve performance. Some values cannot be combined on certain platforms. For example, the “video encoder” flag may require YUV pixels, so adding “software access” and specifying RGBA 8888 would fail. The handle returned by the gralloc allocator can be passed between processes through Binder.

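The flag combinations can be sketched as a bitmask check. The real flags live in gralloc.h (the GRALLOC_USAGE_* constants) and the rules are vendor-specific; the bit values and the "this encoder only accepts YUV" policy below are assumptions for illustration only.

```java
public class UsageFlags {
    public static final int USAGE_SW_READ_OFTEN    = 1 << 0; // CPU will touch pixels
    public static final int USAGE_HW_TEXTURE       = 1 << 1; // used as a GLES texture
    public static final int USAGE_HW_VIDEO_ENCODER = 1 << 2; // fed to a video encoder

    public static final int FORMAT_RGBA_8888 = 1;
    public static final int FORMAT_YV12      = 2; // a YUV format

    // Model of "some values cannot be combined on certain platforms": pretend
    // this device's encoder only accepts YUV, so RGBA_8888 + software access
    // + encoder fails, as in the example from the text.
    public static boolean canAllocate(int format, int usage) {
        boolean wantsEncoder = (usage & USAGE_HW_VIDEO_ENCODER) != 0;
        boolean wantsCpu     = (usage & USAGE_SW_READ_OFTEN) != 0;
        if (wantsEncoder && wantsCpu && format == FORMAT_RGBA_8888) return false;
        return true;
    }
}
```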



SurfaceFlinger and Hardware Composer

Having buffers of graphical data is wonderful, but life is even better when you get to see them on your device’s screen. That’s where SurfaceFlinger and the Hardware Composer HAL come in.


SurfaceFlinger’s role is to accept buffers of data from multiple sources, composite them, and send them to the display. Once upon a time this was done with software blitting to a hardware framebuffer (e.g. /dev/graphics/fb0), but those days are long gone.


When an app comes to the foreground, the WindowManager service asks SurfaceFlinger for a drawing surface. SurfaceFlinger creates a “layer” (the primary component of which is a BufferQueue) for which SurfaceFlinger acts as the consumer. A Binder object for the producer side is passed through the WindowManager to the app, which can then start sending frames directly to SurfaceFlinger. (Note: The WindowManager uses the term “window” instead of “layer” for this and uses “layer” to mean something else. We’re going to use the SurfaceFlinger terminology. It can be argued that SurfaceFlinger should really be called LayerFlinger.)


For most apps, there will be three layers on screen at any time: the “status bar” at the top of the screen, the “navigation bar” at the bottom or side, and the application’s UI. Some apps will have more or less, e.g. the default home app has a separate layer for the wallpaper, while a full-screen game might hide the status bar. Each layer can be updated independently. The status and navigation bars are rendered by a system process, while the app layers are rendered by the app, with no coordination between the two.

(Translator's note: quite a few phone models, for example many Samsung devices, actually ship without an on-screen navigation bar.)

Device displays refresh at a certain rate, typically 60 frames per second on phones and tablets. If the display contents are updated mid-refresh, “tearing” will be visible; so it’s important to update the contents only between cycles. The system receives a signal from the display when it’s safe to update the contents. For historical reasons we’ll call this the VSYNC signal.


The refresh rate may vary over time, e.g. some mobile devices will range from 58 to 62fps depending on current conditions. For an HDMI-attached television, this could theoretically dip to 24 or 48Hz to match a video. Because we can update the screen only once per refresh cycle, submitting buffers for display at 200fps would be a waste of effort as most of the frames would never be seen. Instead of taking action whenever an app submits a buffer, SurfaceFlinger wakes up when the display is ready for something new.


When the VSYNC signal arrives, SurfaceFlinger walks through its list of layers looking for new buffers. If it finds a new one, it acquires it; if not, it continues to use the previously-acquired buffer. SurfaceFlinger always wants to have something to display, so it will hang on to one buffer. If no buffers have ever been submitted on a layer, the layer is ignored.


Once SurfaceFlinger has collected all of the buffers for visible layers, it asks the Hardware Composer how composition should be performed.

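The walk that SurfaceFlinger performs on each VSYNC can be sketched as follows. This is a plain-Java model of the described behavior, not the real SurfaceFlinger code: acquire a newly queued buffer if one exists, otherwise keep the previously acquired one, and skip layers that never submitted a buffer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class VsyncWalk {
    public static class Layer {
        public final String name;
        public final Queue<String> queuedBuffers = new ArrayDeque<>();
        public String current;   // previously-acquired buffer, if any
        public Layer(String name) { this.name = name; }
    }

    // Returns the buffers SurfaceFlinger would hand to composition this frame.
    public static List<String> onVsync(List<Layer> layers) {
        List<String> toCompose = new ArrayList<>();
        for (Layer l : layers) {
            String fresh = l.queuedBuffers.poll();
            if (fresh != null) l.current = fresh;            // acquire the new buffer
            if (l.current != null) toCompose.add(l.current); // else reuse the old one
            // a layer that never had a buffer submitted contributes nothing
        }
        return toCompose;
    }
}
```

Calling onVsync twice with no new submissions yields the same buffers again, which is the "SurfaceFlinger always wants to have something to display" behavior.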



Hardware Composer

The Hardware Composer HAL (“HWC”) was first introduced in Android 3.0 (“Honeycomb”) and has evolved steadily over the years. Its primary purpose is to determine the most efficient way to composite buffers with the available hardware. As a HAL, its implementation is device-specific and usually implemented by the display hardware OEM.


The value of this approach is easy to recognize when you consider “overlay planes.” The purpose of overlay planes is to composite multiple buffers together, but in the display hardware rather than the GPU. For example, suppose you have a typical Android phone in portrait orientation, with the status bar on top and navigation bar at the bottom, and app content everywhere else. The contents for each layer are in separate buffers. You could handle composition by rendering the app content into a scratch buffer, then rendering the status bar over it, then rendering the navigation bar on top of that, and finally passing the scratch buffer to the display hardware. Or, you could pass all three buffers to the display hardware, and tell it to read data from different buffers for different parts of the screen. The latter approach can be significantly more efficient.


As you might expect, the capabilities of different display processors vary significantly. The number of overlays, whether layers can be rotated or blended, and restrictions on positioning and overlap can be difficult to express through an API. So, the HWC works like this:

• SurfaceFlinger provides the HWC with a full list of layers, and asks, “how do you want to handle this?”
• The HWC responds by marking each layer as “overlay” or “GLES composition.”
• SurfaceFlinger takes care of any GLES composition, passing the output buffer to HWC, and lets HWC handle the rest.


Since the decision-making code can be custom tailored by the hardware vendor, it’s possible to get the best performance out of every device.


Overlay planes may be less efficient than GL composition when nothing on the screen is changing. This is particularly true when the overlay contents have transparent pixels, and overlapping layers are being blended together. In such cases, the HWC can choose to request GLES composition for some or all layers and retain the composited buffer. If SurfaceFlinger comes back again asking to composite the same set of buffers, the HWC can just continue to show the previously-composited scratch buffer. This can improve the battery life of an idle device.


Devices shipping with Android 4.4 (“KitKat”) typically support four overlay planes. Attempting to composite more layers than there are overlays will cause the system to use GLES composition for some of them; so the number of layers used by an application can have a measurable impact on power consumption and performance.

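A toy version of that prepare step: the HWC spends its overlay planes (four here, as is typical of KitKat-era devices per the text) and marks everything beyond the budget for GLES composition. Real HWC implementations are vendor-specific and make far smarter decisions; this just shows the marking mechanism.

```java
import java.util.ArrayList;
import java.util.List;

public class HwcPrepare {
    public enum Composition { OVERLAY, GLES }

    public static final int OVERLAY_PLANES = 4; // typical for KitKat devices

    // SurfaceFlinger offers the full layer list; the HWC marks each one.
    public static List<Composition> prepare(int layerCount) {
        List<Composition> marks = new ArrayList<>();
        for (int i = 0; i < layerCount; i++) {
            marks.add(i < OVERLAY_PLANES ? Composition.OVERLAY : Composition.GLES);
        }
        return marks;
    }
}
```

With five layers, one falls back to GLES, which is why the number of layers an app uses affects power and performance.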

You can see exactly what SurfaceFlinger is doing with the command adb shell dumpsys SurfaceFlinger. The output is quite verbose; the part most relevant to our discussion here is the HWC summary, which usually appears near the bottom of the output:

        type    |          source crop              |           frame           name
    ------------+-----------------------------------+--------------------------------
            HWC | [    0.0,    0.0,  320.0,  240.0] | [   48,  411, 1032, 1149] SurfaceView
            HWC | [    0.0,   75.0, 1080.0, 1776.0] | [    0,   75, 1080, 1776] com.android.grafika/com.android.grafika.PlayMovieSurfaceActivity
            HWC | [    0.0,    0.0, 1080.0,   75.0] | [    0,    0, 1080,   75] StatusBar
            HWC | [    0.0,    0.0, 1080.0,  144.0] | [    0, 1776, 1080, 1920] NavigationBar
      FB TARGET | [    0.0,    0.0, 1080.0, 1920.0] | [    0,    0, 1080, 1920] HWC_FRAMEBUFFER_TARGET

This tells you what layers are on screen, whether they’re being handled with overlays (“HWC”) or OpenGL ES composition (“GLES”), and gives you a bunch of other facts you probably won’t care about (“handle” and “hints” and “flags” and other stuff that we’ve trimmed out of the snippet above). The “source crop” and “frame” values will be examined more closely later on.


The FB_TARGET layer is where GLES composition output goes. Since all layers shown above are using overlays, FB_TARGET isn’t being used for this frame. The layer’s name is indicative of its original role: On a device with /dev/graphics/fb0 and no overlays, all composition would be done with GLES, and the output would be written to the framebuffer. On recent devices there generally is no simple framebuffer, so the FB_TARGET layer is a scratch buffer. (Note: This is why screen grabbers written for old versions of Android no longer work: They’re trying to read from The Framebuffer, but there is no such thing.)


The overlay planes have another important role: they’re the only way to display DRM content. DRM-protected buffers cannot be accessed by SurfaceFlinger or the GLES driver, which means that your video will disappear if HWC switches to GLES composition.





The Need for Triple-Buffering

Because of the way SurfaceFlinger is triggered, our double-buffered pipeline will have a bubble. Suppose frame N is being displayed, and frame N+1 has been acquired by SurfaceFlinger for display on the next VSYNC. (Assume frame N is composited with an overlay, so we can’t alter the buffer contents until the display is done with it.) When VSYNC arrives, HWC flips the buffers. While the app is starting to render frame N+2 into the buffer that used to hold frame N, SurfaceFlinger is scanning the layer list, looking for updates. SurfaceFlinger won’t find any new buffers, so it prepares to show frame N+1 again after the next VSYNC. A little while later, the app finishes rendering frame N+2 and queues it for SurfaceFlinger, but it’s too late. This has effectively cut our maximum frame rate in half.


We can fix this with triple-buffering. Just before VSYNC, frame N is being displayed, frame N+1 has been composited (or scheduled for an overlay) and is ready to be displayed, and frame N+2 is queued up and ready to be acquired by SurfaceFlinger. When the screen flips, the buffers rotate through the stages with no bubble. The app has just less than a full VSYNC period (16.7ms at 60fps) to do its rendering and queue the buffer. And SurfaceFlinger / HWC has a full VSYNC period to figure out the composition before the next flip. The downside is that it takes at least two VSYNC periods for anything that the app does to appear on the screen. As the latency increases, the device feels less responsive to touch input.

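The bubble can be reproduced with a tiny discrete-time model. Everything here is a stated assumption for illustration, not the real scheduler: rendering a frame takes exactly one VSYNC period, the app starts a frame only when a buffer is free, and a frame finished during a period queues just after that period's scan (the "too late" case from the text). Under those assumptions, two buffers show a new frame only every other VSYNC, while three buffers sustain one per VSYNC after a one-frame warm-up.

```java
public class BufferingSim {
    // Counts distinct frames that reach the display in `vsyncs` refresh periods.
    // Starting state: one frame on screen, one frame (N+1) already queued.
    public static int framesShown(int bufferCount, int vsyncs) {
        int free = bufferCount - 2; // buffers neither on display nor queued
        int queued = 1;             // frame N+1, already handed to SurfaceFlinger
        int rendering = 0;          // frames the app is drawing this period
        int shown = 0;
        for (int t = 0; t < vsyncs; t++) {
            // VSYNC: a finished frame flips onto the display, and the buffer
            // it replaces returns to the free list
            if (queued > 0) { queued--; shown++; free++; }
            // a frame started last period finishes now, but it queued just
            // after the scan that happened at this VSYNC
            queued += rendering;
            rendering = 0;
            // the app starts rendering a new frame if any buffer is free
            if (free > 0) { free--; rendering = 1; }
        }
        return shown;
    }
}
```

With 8 VSYNCs, double-buffering shows only 4 frames (half rate), while triple-buffering shows 7 (full rate after the warm-up miss).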




SurfaceFlinger with BufferQueue


Figure 1. SurfaceFlinger + BufferQueue

The diagram above depicts the flow of SurfaceFlinger and BufferQueue. During a frame:


1. red buffer fills up, then slides into BufferQueue
2. after red buffer leaves app, blue buffer slides in, replacing it
3. green buffer and systemUI shadow-slide into HWC (showing that SurfaceFlinger still has the buffers, but now HWC has prepared them for display via overlay on the next VSYNC)

The blue buffer is referenced by both the display and the BufferQueue. The app is not allowed to render to it until the associated sync fence signals.


On VSYNC, all of these happen at once:

1. Red buffer leaps into SurfaceFlinger, replacing green buffer
2. Green buffer leaps into Display, replacing blue buffer, and a dotted-line green twin appears in the BufferQueue
3. The blue buffer’s fence is signaled, and the blue buffer in App empties
4. Display rect changes from blue + SystemUI to green + SystemUI


The System UI process is providing the status and nav bars, which for our purposes here aren’t changing, so SurfaceFlinger keeps using the previously-acquired buffer. In practice there would be two separate buffers, one for the status bar at the top, one for the navigation bar at the bottom, and they would be sized to fit their contents. Each would arrive on its own BufferQueue.


The buffer doesn’t actually “empty”; if you submit it without drawing on it you’ll get that same blue again. The emptying is the result of clearing the buffer contents, which the app should do before it starts drawing.


We can reduce the latency by noting layer composition should not require a full VSYNC period. If composition is performed by overlays, it takes essentially zero CPU and GPU time. But we can’t count on that, so we need to allow a little time. If the app starts rendering halfway between VSYNC signals, and SurfaceFlinger defers the HWC setup until a few milliseconds before the signal is due to arrive, we can cut the latency from 2 frames to perhaps 1.5. In theory you could render and composite in a single period, allowing a return to double-buffering; but getting it down that far is difficult on current devices. Minor fluctuations in rendering and composition time, and switching from overlays to GLES composition, can cause us to miss a swap deadline and repeat the previous frame.

(Translator's note: the HWC setup that SurfaceFlinger defers here is the setUpHWComposer() call, which gathers the layers to be displayed and reports them to the HWC module so it can decide how each layer will be composited.)

SurfaceFlinger’s buffer handling demonstrates the fence-based buffer management mentioned earlier. If we’re animating at full speed, we need to have an acquired buffer for the display (“front”) and an acquired buffer for the next flip (“back”). If we’re showing the buffer on an overlay, the contents are being accessed directly by the display and must not be touched. But if you look at an active layer’s BufferQueue state in the dumpsys SurfaceFlinger output, you’ll see one acquired buffer, one queued buffer, and one free buffer. That’s because, when SurfaceFlinger acquires the new “back” buffer, it releases the current “front” buffer to the queue. The “front” buffer is still in use by the display, so anything that dequeues it must wait for the fence to signal before drawing on it. So long as everybody follows the fencing rules, all of the queue-management IPC requests can happen in parallel with the display.

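The fencing rule can be sketched with a latch standing in for a sync-framework fence: the dequeued "front" buffer may change hands immediately, but whoever wants to draw on it must first wait for the display to signal the release fence. This is a conceptual model, not the real sync framework API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class FencedBuffer {
    private final CountDownLatch releaseFence = new CountDownLatch(1);

    // Display hardware signals the fence once it has finished scanning out.
    public void signalRelease() { releaseFence.countDown(); }

    // Producer must wait on the fence before touching the pixels.
    // Returns true once drawing is allowed.
    public boolean waitBeforeDraw(long ms) {
        try {
            return releaseFence.await(ms, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Because the wait happens at draw time rather than at dequeue time, the queue-management IPC can run in parallel with the display, as the text describes.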



Virtual Displays

SurfaceFlinger supports a “primary” display, i.e. what’s built into your phone or tablet, and an “external” display, such as a television connected through HDMI. It also supports a number of “virtual” displays, which make composited output available within the system. Virtual displays can be used to record the screen or send it over a network.


Virtual displays may share the same set of layers as the main display (the “layer stack”) or have their own set. There is no VSYNC for a virtual display, so the VSYNC for the primary display is used to trigger composition for all displays.


In the past, virtual displays were always composited with GLES. The Hardware Composer managed composition for only the primary display. In Android 4.4, the Hardware Composer gained the ability to participate in virtual display composition.


As you might expect, the frames generated for a virtual display are written to a BufferQueue.




Surface and SurfaceHolder

The Surface class has been part of the public API since 1.0. Its description simply says, “Handle onto a raw buffer that is being managed by the screen compositor.” The statement was accurate when initially written but falls well short of the mark on a modern system.


The Surface represents the producer side of a buffer queue that is often (but not always!) consumed by SurfaceFlinger. When you render onto a Surface, the result ends up in a buffer that gets shipped to the consumer. A Surface is not simply a raw chunk of memory you can scribble on.


The BufferQueue for a display Surface is typically configured for triple-buffering; but buffers are allocated on demand. So if the producer generates buffers slowly enough — maybe it’s animating at 30fps on a 60fps display — there might only be two allocated buffers in the queue. This helps minimize memory consumption. You can see a summary of the buffers associated with every layer in the dumpsys SurfaceFlinger output.




Canvas Rendering

Once upon a time, all rendering was done in software, and you can still do this today. The low-level implementation is provided by the Skia graphics library. If you want to draw a rectangle, you make a library call, and it sets bytes in a buffer appropriately. To ensure that a buffer isn’t updated by two clients at once, or written to while being displayed, you have to lock the buffer to access it. lockCanvas() locks the buffer and returns a Canvas to use for drawing, and unlockCanvasAndPost() unlocks the buffer and sends it to the compositor.

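The lock/draw/unlock handshake can be modeled in a few lines. A real app calls SurfaceHolder.lockCanvas() and unlockCanvasAndPost() and draws through a Canvas; here a boolean lock and a StringBuilder stand in for the buffer so the exclusion and posting rules are visible without Android.

```java
public class LockedSurface {
    private boolean locked = false;
    private String pending = "";      // what the last "canvas" drew
    private String composited = "";   // what reached the "compositor"

    // Lock the buffer and hand back something to draw into. Note the new
    // canvas starts from the previous contents, which persist across locks.
    public StringBuilder lockCanvas() {
        if (locked) throw new IllegalStateException("buffer already locked");
        locked = true;
        return new StringBuilder(pending);
    }

    // Unlock the buffer and send it to the compositor.
    public void unlockCanvasAndPost(StringBuilder canvas) {
        pending = canvas.toString();
        composited = pending;
        locked = false;
    }

    public String onScreen() { return composited; }
}
```

Re-locking without drawing hands back the old contents unchanged, which foreshadows the buffer-reuse behavior described a little further down.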

As time went on, and devices with general-purpose 3D engines appeared, Android reoriented itself around OpenGL ES. However, it was important to keep the old API working, for apps as well as app framework code, so an effort was made to hardware-accelerate the Canvas API. As you can see from the charts on the Hardware Acceleration page, this was a bit of a bumpy ride. Note in particular that while the Canvas provided to a View’s onDraw() method may be hardware-accelerated, the Canvas obtained when an app locks a Surface directly with lockCanvas() never is.


When you lock a Surface for Canvas access, the “CPU renderer” connects to the producer side of the BufferQueue and does not disconnect until the Surface is destroyed. Most other producers (like GLES) can be disconnected and reconnected to a Surface, but the Canvas-based “CPU renderer” cannot. This means you can’t draw on a surface with GLES or send it frames from a video decoder if you’ve ever locked it for a Canvas.


The first time the producer requests a buffer from a BufferQueue, it is allocated and initialized to zeroes. Initialization is necessary to avoid inadvertently sharing data between processes. When you re-use a buffer, however, the previous contents will still be present. If you repeatedly call lockCanvas() and unlockCanvasAndPost() without drawing anything, you’ll cycle between previously-rendered frames.


The Surface lock/unlock code keeps a reference to the previously-rendered buffer. If you specify a dirty region when locking the Surface, it will copy the non-dirty pixels from the previous buffer. There’s a fair chance the buffer will be handled by SurfaceFlinger or HWC; but since we need to only read from it, there’s no need to wait for exclusive access.

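The dirty-region copy can be sketched concretely: pixels outside the dirty rect are copied forward from the previously-rendered buffer, while the dirty area is handed back cleared for the app to redraw. The function name and the "cleared to zero" convention are illustrative assumptions, not the actual Surface implementation.

```java
public class DirtyRegion {
    // Returns the buffer the app would draw into after locking with a dirty
    // rect [x0,x1) x [y0,y1): previous contents preserved outside the rect,
    // zeroed (ready to draw) inside it.
    public static int[][] lockWithDirtyRect(int[][] prev, int x0, int y0, int x1, int y1) {
        int h = prev.length, w = prev[0].length;
        int[][] next = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                boolean dirty = (x >= x0 && x < x1 && y >= y0 && y < y1);
                next[y][x] = dirty ? 0 : prev[y][x]; // copy non-dirty pixels forward
            }
        }
        return next;
    }
}
```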

The main non-Canvas way for an application to draw directly on a Surface is through OpenGL ES. That’s described in the EGLSurface and OpenGL ES section.


