Frame Timing: the Simple Way

This is a follow-up to my post from last week; I wanted to look into why the different methods of frame timing looked different from each other. As a starting point, we can graph the position of a moving object (like the ball) as displayed to the user versus its theoretical position (gray line):

The “Frame Start” and “Frame Center” lines represent the two methods described in the last post: we choose the positions of objects in the frame based on the theoretical position either at the start of the time that the frame will be displayed, or at the center of the frame time. The third line, “Constant Steps”, shows a method that wasn’t described in the last post, but that you would have seen if you tried the demo. It’s the simplest possible algorithm you can imagine: compute the positions of objects at the current time, draw the frame as fast as possible, start a new frame.
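To make the three methods concrete, here’s a minimal sketch (not the demo’s actual code) of how each one picks the position to draw, assuming a hypothetical position(t) function for the theoretical motion and a fixed 60Hz refresh:

```python
# A minimal sketch of the three timing methods, not the demo's actual code.
# position(t) is a stand-in for the theoretical motion of the ball;
# REFRESH assumes a 60Hz display.

REFRESH = 1 / 60.0

def position(t):
    return 300.0 * t  # hypothetical: constant-speed motion

def sample_frame_start(display_time):
    # "Frame Start": position at the start of the frame's display period.
    return position(display_time)

def sample_frame_center(display_time, display_duration=REFRESH):
    # "Frame Center": position at the center of the frame's display period.
    return position(display_time + display_duration / 2)

def sample_constant_steps(now):
    # "Constant Steps": position at the moment we start drawing; draw as
    # fast as possible, then immediately start the next frame.
    return position(now)
```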

The initial reaction to the above graph is that the “Frame Center” method is about as good as you can get at tracking the theoretical position, and the “Constant Steps” method is much worse than the other two. But this isn’t what you see if you try out the demo – Constant Steps is actually quite a bit better than Frame Start. Trying to understand this, I realized that the delay – the vertical offset in the graph – is completely irrelevant to how smooth things look: the user has no idea where things are supposed to be, only how things change from frame to frame. What matters for smoothness is mostly the velocity – the distance things move from frame to frame. If we plot this, we see a quite different picture:

Here we see something more like the visual impression – that the “Frame Start” method has a lot more bouncing around in velocity as compared to the other two. (The velocity for all three drops to zero when we miss a frame.) We can quantify this a bit by graphing the variance of the velocity versus time to draw a frame. We look at the region from a frame draw time of 1 frame (60fps) to a frame draw time of 2 frames (30fps).

Here we see that in terms of providing consistent motion velocity, Constant Steps is actually a bit better than Frame Center at all times.
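If you want to reproduce that kind of measurement, a minimal sketch of the metric – frame-to-frame velocity and its variance, computed here from a made-up list of displayed positions – looks like this:

```python
# A sketch of the smoothness metric: frame-to-frame velocity and its
# variance, computed from the position actually shown at each refresh.
# The displayed_positions list here is made up for illustration.

from statistics import pvariance

displayed_positions = [0.0, 5.1, 9.8, 15.2, 15.2, 25.0, 30.1]

# Velocity is the distance moved between consecutive displayed frames;
# it drops to zero wherever a frame was missed and the old image was reused.
velocities = [b - a for a, b in zip(displayed_positions, displayed_positions[1:])]

print("velocities:", velocities)
print("variance:  ", pvariance(velocities))
```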

What about latency? You might think that Constant Steps is worse because it’s tracking the theoretical position less closely, but really, this is an artifact – to implement Frame Center, we have to predict future positions. And unless we can predict what the user is going to do, predicting future positions cannot reduce latency in responding to input. The only thing that tracking the theoretical positions more closely helps with is if we’re trying to do something like sync video to audio. And of course, to the extent that we can predict future positions or compute a delay to apply to the audio track, we can do that for Constant Steps as well: instead of drawing everything at its current position, we can use the position at a time shortly in the future.

If such a simple method works well, do we actually need compositor to application synchronization? It’s likely still needed, because we can’t really draw frames as fast as possible: we should draw frames only as fast as we can while still allowing the compositor to always be able to get in, get GPU resources, and draw a frame at the right time.

What to do if you can’t do 60fps?

I’ve been working recently on figuring out application to compositor synchronization. One aspect of that is what timing information the compositor needs to send back to the application, and how the application should use it. In the case where everything is lightly loaded and we can hit 60fps, it’s pretty obvious what we want – we just output constantly spaced frames:

But what if we can’t do that? Say we only have the CPU and GPU resources to draw at 40fps. To keep things simple and the frame timing consistent, do we drop every other frame and draw the animation at 30fps?

(Gray frames are frames where we don’t do an update and reuse the previous image. The dotted circles show the theoretical position of the moving ball at the time of the frame.)

Or maybe it would be better to show more frames, to drop only one out of every three frames?

Or maybe we need to do something more sophisticated than to just drop frames – maybe when rendering a frame we need to take into account how long the frame will be displayed for and calculate positions at the center of the frame display period?
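As a sketch of that third option (with assumed numbers, not the demo’s code): for an animation limited to 40fps on a 60Hz display, each rendered frame ends up on screen for one or two refreshes, and we can sample the theoretical position halfway through that span:

```python
# A sketch of computing positions at the center of each frame's display
# period when we can only render 40fps on a 60Hz display: frames alternate
# between being shown for 2 refreshes and for 1 refresh, averaging 1.5
# refreshes per frame (40fps). position(t) is a hypothetical motion.

REFRESH = 1 / 60.0

def position(t):
    return 300.0 * t  # hypothetical constant-speed motion

def centered_samples(n_frames):
    samples = []
    t = 0.0
    for i in range(n_frames):
        shown_for = (2 if i % 2 == 0 else 1) * REFRESH
        # Sample the theoretical position halfway through the span this
        # frame will actually be on screen.
        samples.append(position(t + shown_for / 2))
        t += shown_for
    return samples

print(centered_samples(6))
```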

The answer to what looked better wasn’t at all obvious to me, even after a few years of playing with drawing animations for GNOME Shell, so I wrote a demo application to try out various things. If you want to test it out, note that it needs to be run uncomposited, so under GNOME 3, run metacity --replace & from a terminal and then use Alt-F4 to close the “something has gone wrong” screen. (Run gnome-shell --replace & to get back to your desktop.)

So, what conclusions have I drawn from looking at my demo? The first conclusion is that 60fps is visually way better than anything else. This wasn’t completely obvious to me going in – after all, movies run at 24fps. But movies have motion blur from the exposure time, which we don’t have here. (Adding motion blur to desktop animations would increase computational work considerably, and it seems unlikely that 30fps + motion blur looks better than 60fps without motion blur.)

The second conclusion is that how we time things matters a lot. Of the two methods above for dropping every third frame, the second method is obviously much better than the first one.

The third conclusion is that if we can get frame timing right, then running at 40fps looks better than running at 30fps, but if we don’t get frame timing right, then the visual appearance is about the same, or possibly even worse.

What does this mean for an application to compositor synchronization protocol? I don’t have the final answer to that yet, but in very general terms we need to support applications that want to draw at frame rates like 40fps, because it can potentially look better – but we have to be careful that we support doing it with algorithms that actually look better.

Update: BTW, if anybody knows useful literature references about this area, I’d be interested.

Benchmarking compositor performance

Recently Phoronix did an article about performance under different compositing and non-compositing window managers. GNOME Shell didn’t do that well, so lots of people pointed it out to me. Clearly there was a lot of work put into making measurements for the article, but what is measured is a wide range of 3D fullscreen games across different graphics drivers, graphics hardware, and environments.

Now, if what you want to do with your Linux system is play 3D games this is very relevant information, but it really says absolutely nothing about performance in general, because the obvious technique to use when a 3D game is running is to “unredirect” the game and let it display normally to the screen without interference from the compositor. Depending on configuration options, both Compiz and KWin will unredirect, while GNOME Shell doesn’t do that currently, so this (along with driver bugs) probably explains the bulk of the difference between GNOME Shell and the other environments.

Adel Gadllah has had patches for Mutter and GNOME Shell to add unredirection for over a year, but I’ve dragged my feet on landing them, because there were some questions about when it’s appropriate to unredirect a window and when not that I wasn’t sure we had fully answered. We want to unredirect fullscreen 3D games, but not necessarily all fullscreen windows. For example, a fullscreen Firefox window is much like any other window and can have separate dialog windows floating above it that need compositing manager interaction to draw properly.

We should land some sort of unredirection soon to benefit 3D gamers, but really, I’m much more interested in compositing manager performance in situations where the compositing manager actually has to composite. So, that’s what I set out this week to do: to develop a benchmark to measure the effect of the compositing manager on application redraw performance.

Creating a benchmark

The first thing that we need to realize when creating such a benchmark is that the only drawing that matters is drawing that gets to the screen. Any frames drawn that aren’t displayed by the compositor are useless. If we have a situation where the application is drawing at 60fps, but the compositor is only drawing at 1fps, that’s not a great-performing compositor, that’s a really badly performing compositor. Application frame rate doesn’t matter unless it’s throttled to the compositor frame rate.

Now, this immediately gets us to a sticky problem: there are no mechanisms to throttle application frame rate to the compositor frame rate on the X desktop. Any app that is doing animations or video, or anything else, is just throwing frames out there and hoping for the best. Really, doing compositor benchmarks before we fix that problem is just pointless. Luckily, there’s a workaround that we can use to get some numbers out in the short term – the same damage extension that compositors use to find out when a window has been redrawn and has to be recomposited to the screen can also be used to monitor the changes that the compositor is making to the screen. (Screen-scraping VNC servers like Vino use this technique to find out what they need to send out over the wire.) So, our benchmark application can draw a frame, and then look for damage events on the root window to see when the drawing they’ve done has taken effect.

This looks something like:

In the above picture, what is shown is a back-buffer to front-buffer copy that creates damage immediately, but is done asynchronously during the vertical blanking interval. The MESA_copy_sub_buffer GL extension basically does this, with the caveat that (for the Intel and AMD drivers) it can entirely block the GPU while waiting for the blank.
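In outline, the benchmark loop looks something like the sketch below; draw_frame() and wait_for_root_damage() are hypothetical placeholders standing in for the real rendering code and the XDamage event handling, not an actual API:

```python
# Rough structure of the benchmark loop. draw_frame() and
# wait_for_root_damage() are hypothetical placeholders standing in for the
# real rendering code and the XDamage wait on the root window; they are not
# a real API.

import time

def draw_frame(load_factor):
    """Render one frame of the benchmark at the given load factor."""
    ...

def wait_for_root_damage():
    """Block until damage on the root window shows that the compositor has
    actually put our drawing on the screen."""
    ...

def run_benchmark(load_factor, n_frames=300):
    start = time.monotonic()
    for _ in range(n_frames):
        draw_frame(load_factor)
        wait_for_root_damage()  # throttle to what actually reaches the screen
    elapsed = time.monotonic() - start
    return n_frames / elapsed   # frames per second as seen on screen
```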

I’ve done some work to develop this idea into a benchmark I’m calling xcompbench. (Source available.)

Initial Results

Below is a graph of some results. What is shown here is the frame rate of a benchmark that blends a bunch of surfaces together via cairo as we increase an arbitrary “load factor” which is proportional to the number of surfaces blended together. Since having only one window open isn’t normal, the results are shown for different “depths”, which are how many xterms are stacked underneath the benchmark window.

Compositor Benchmark (Cairo Blending)

So, what we see above is that if we are drawing to an offscreen pixmap, or we are running with metacity and no compositing, the frame rate decreases smoothly as the load factor increases. When you add a compositor, things change: if you look at the solid blue line for mutter you see the prototypical behavior – the frame rate pins at 60fps (the vertical refresh rate) until it can no longer keep up, then you see some “steps” where it preferentially runs locked to simple fractions of the refresh rate – 40fps, 30fps, 20fps, and so on. Other things seen above: kwin runs similarly to mutter with no other windows open, but drops off as more windows are added, while mutter and compiz are pretty much independent of the number of windows. And compiz is running much slower than the other compositors.

Since the effect of the compositor on performance depends on what resources the compositor and application are competing for, it clearly matters what resources the benchmark is using – is it using CPU time? is it using memory bandwidth? is it using lots of GPU shaders? So, I’ll show results for two other benchmarks as well. One draws a lot of text, and another is a simple GL benchmark that draws a lot of vertices with blending enabled.

Compositor Benchmark (Text Drawing)

Compositor Benchmark (GL Drawing)

There are some interesting quirks there that would be worth some more investigation – why is the text benchmark considerably faster drawing offscreen than running uncomposited? why is the reverse true for the GL benchmark? But the basic picture we see is the same as for the first benchmark.

So, this looks pretty good for Mutter right? Well, yes. But note:

It’s all about Timing

The reason Compiz is slow here isn’t that it has slow code, it’s that the timing of when it redraws is going wrong with this benchmark. The actual algorithm that it uses is rather hard to explain, and so are the ways it interacts with the benchmark badly, but to give a slight flavor of what might be going on, take a look at the following diagram.

If a compositor isn’t redrawing immediately when it receives damage from a client, but is waiting a bit for more damage, then it’s possible it might wait too long and miss the vertical blank entirely. Then the frame rate can drop way down, even if there is plenty of CPU and GPU available.
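A toy model (not Compiz’s actual algorithm) shows how dramatic the effect can be: if the compositor’s wait pushes it just past the vertical blank, the frame slips a whole refresh period and the displayed frame rate halves, even with the CPU and GPU nearly idle:

```python
# A toy model, not Compiz's actual algorithm: the compositor waits a fixed
# time after damage arrives before compositing, and the result only reaches
# the screen at the next vertical blank after it is ready.

import math

REFRESH_MS = 1000.0 / 60.0

def displayed_fps(damage_offset_ms, compositor_wait_ms):
    # Point within the refresh cycle at which the compositor finishes waiting.
    ready = damage_offset_ms + compositor_wait_ms
    # The frame is shown at the first vertical blank at or after that point.
    refreshes_until_shown = math.ceil(ready / REFRESH_MS)
    return 1000.0 / (refreshes_until_shown * REFRESH_MS)

print(displayed_fps(damage_offset_ms=14.0, compositor_wait_ms=2.0))  # makes the blank: 60fps
print(displayed_fps(damage_offset_ms=14.0, compositor_wait_ms=4.0))  # just misses it: 30fps
```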

Future directions

One thing I’d like to do is to be able to extract a more compact set of numbers. The charts above clearly represent relative performance between different compositors, but individual data points tell much less. If someone runs my benchmark and reports that on their system, kwin can do 45 fps when running at a load factor of 8 on the blend benchmark, that is most representative of hardware differences and not of compositor code. The ratio of the offscreen framerate to the composited framerate at the “shoulder” where we drop off from 60fps might be a good number. If one compositor drops off from 60fps at an offscreen framerate of 90fps, but for a different compositor we have to increase the load factor so that the offscreen framerate is only 75fps at the shoulder, then that should be a mostly hardware independent result.
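As a sketch of how that “shoulder” number could be extracted (with made-up data):

```python
# A sketch of extracting the "shoulder" number, using made-up data: find the
# highest load factor at which the composited frame rate is still pinned at
# (roughly) 60fps, and report the offscreen frame rate at that same load.

offscreen  = {1: 180.0, 2: 120.0, 4: 90.0, 8: 55.0}   # load factor -> fps (made up)
composited = {1: 60.0,  2: 60.0,  4: 58.0, 8: 40.0}   # load factor -> fps (made up)

def shoulder_offscreen_fps(offscreen, composited, pinned=59.0):
    loads = sorted(load for load, fps in composited.items() if fps >= pinned)
    return offscreen[loads[-1]] if loads else None

print(shoulder_offscreen_fps(offscreen, composited))  # -> 120.0
```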

It is also important to look at the effect of going from a “bare” compositor to a full desktop environment. The results above are with bare compiz, kwin, and mutter, and not with Unity, Plasma, or GNOME Shell. My testing indicates pretty similar results with the full GNOME Shell desktop as with bare mutter. Can I put numbers to that? Is the same true elsewhere?

And finally, how do we actually add proper synchronization instead of using the damage hack? I’ve done an implementation of an idea that came up a couple of years ago in a discussion between me and Denis Dzyubenko, and it looks promising. This blog post is, however, already too long to give more details at this point.

My goal here is that this is a benchmark that we can all use to figure out the right timing algorithms and get them implemented across compositors. At that point, I’d expect to see only minimal differences, because the basic work that every compositor has to do is the same: just copy the area that the application updated to the screen and let the application start drawing the next frame.

Test Configuration

Intel Core i5 laptop @2.53GHz, integrated Intel Ironlake graphics
KWin 4.6.3
Compiz 0.9.4
Mutter 3.0.2

Update: The sentence “why is the text benchmark considerably faster drawing offscreen than running uncomposited” was originally reversed. Pointed out by Benjamin Otte and fixed.

What does the user see?

As a long-time GNOME module maintainer and as a team lead within Red Hat, I often get people coming to me for advice about some technical issue or another. And no matter the issue, there’s one question that I’ll almost always end up asking at some point: “what does the user see?” Code, APIs, protocols are all just means to the end-user experience. Discussion of the future of GNOME should also start with what the user sees.

Mark argues that GNOME should be a place where we have internal competition. But his idea of internal competition seems to be competition between different end-user experiences. His entrant into the competition is Unity, an environment with a user experience designed completely in isolation from GNOME. The other entrant would, I suppose, be the GNOME 3 desktop that GNOME has created.

This competition doesn’t make sense to me: what would be left of GNOME if Unity “won” that competition? Not even the libraries are left, because every decision about what goes into a library should be driven by that same question: “what does the user see?” No widget should go into GTK+ unless it makes sense in a GNOME application. GNOME cannot cede the user experience and still work as a project.

The sort of internal competition I’d like to see within GNOME is competition of ideas. Competition of mockups and prototypes, and even entire applications. We know that we need better file management within the GNOME Activities Overview for 3.2. Is that organized as a timeline? Does it involve tagging? Is it best implemented with Zeitgeist? With Tracker? With both? Those are things that are still open, and the more people that are working on different designs and approaches, the better off the final result will be.

The basic constraint of any sort of internal competition within GNOME is that you have to be willing for some of your ideas to win and some of your ideas to lose. If you are starting out with the premise that you have complete final control over the user experience, then you aren’t working on GNOME, you are working on something else. So far, this seems to be the approach of Canonical. In the past, they took GNOME, modified it, and presented the modified result to their users. Now they are taking some GNOME libraries, building a new desktop on top of that, and presenting that to their users. But I’ve never seen Canonical make the leap and realize that they could actually dive in and make GNOME itself better.

Diving in means a commitment – it means fighting for your ideas at every step of the way, from the design level, to figuring out how the code pieces fit together, to the line-by-line details of the code. But the thing about open source is that the more you engage at this level with a project, the more you win. You become more in sync with the other contributors about end goals. You learn how to communicate with them better. And soon enough you no longer think of yourself as an outsider. You are just another insider.

Make no mistake: I’m very sad to see further splintering of the Linux desktop. I think GNOME 3 is going to be amazing, but how much more amazing could it have been if the design and coding talent that is going into Unity could have been pooled with the work being done inside GNOME? An application developer can create an application that works both within GNOME and within Unity, but we’re adding extra barriers to the task of creating an application for Linux. That’s already far too hard.

No matter what happens, all desktops on Linux need to continue to work together to try and provide as much cross-desktop compatibility as possible. But we have to realize the limits of specifications and standards. Many of the early successes of cross-desktop collaboration were in places where there was broad user interface consensus. Drag-and-drop of text from one application to another made sense in all toolkits, so we made it work between toolkits. But if there isn’t consensus on the user experience, then the specification isn’t that useful.

For example, appindicators start off with the proposition that any application should be able to create an icon with a drop-down menu and make it a permanent part of the desktop. (I’m simplifying somewhat – the original Status Notifier specification leaves the user experience quite unspecified, but that’s the way that Canonical was using the specification.) If you don’t have that user interface concept, it’s not clear how the spec helps. So that’s what made the Canonical proposal of libappindicator strange. They didn’t engage with GNOME to make the user interface idea part of future designs. They didn’t even propose changes to core GNOME components to support application indicators. They showed up with a library that would allow applications to display indicators in a modified GNOME desktop, and proposed that GNOME should bless it as a dependency.

(From the GNOME Shell side we were never considering whether appindicators were useful for their original designed purpose; we were considering whether they were a useful way to implement the fixed system icons we had in the GNOME Shell design. In the end, it seemed much simpler to just code the fixed system icons, and I think that decision has been supported in how things have turned out in practice. We’ve been able to create system icon drop-downs that match our mockups and use the full capabilities of the shell toolkit without having to figure out how to funnel everything over a D-Bus protocol.)

So, by all means, we should collaborate on standards, but we can’t just collaborate on standards for the sake of collaborating on standards. We have to start off from understanding what the user sees. Once we understand what the user sees, if there’s a place to make an application written for one environment work better in another environment, that’s a place where standardization is useful. Of course, the more that designers from different environments exchange ideas and go down similar user interface paths, the more opportunity there will be for standards.

Is collaboration on standards and on bits of infrastructure, and a friendly exchange of UI ideas, the way forward for GNOME and Unity as completely separate desktops? Perhaps it’s the only feasible way forward at this point, but it certainly doesn’t make me happy. Mark: any time you want to discuss how we can work together to create a single great desktop, we’re still ready to talk. Design choices, technological choices, the channels and ways we communicate in GNOME – these are all things we can reconsider. The only thing to me that’s untouchable is the central idea that GNOME is ultimately about the experience of our users.

Setting Goals for GNOME

Often in GNOME, we think of goal setting as something that we can leave up to the board, or up to the marketing team: an appearance of direction that we layer on top of what we are really working on. This is obviously backwards … everybody in GNOME should consider the goals of GNOME to be their business. I led a session Sunday morning at the Boston GNOME Summit to try to get some broader brainstorming going about where we want to go with GNOME. So, I wanted to write up both how I set up the discussion and some of the ideas that came out.

Why do we need goals for GNOME? Goals inspire us. They are great tools for recruiting contributors of all types. They allow us to create compelling marketing materials that explain to users what is significant about what we are creating and where we are going. And importantly, they drive decisions – they let us choose between path A and path B. This leads us to what makes a good goal: a good goal is motivational – it can inspire. It’s realistic – it has to be achievable. And it’s concrete enough to let you make decisions.

We can look at how some past GNOME goals fit into this framework. The most famous explicitly stated goal was the 10×10 goal: 10% market share by 2010. It was very catchy and memorable. But even from the start, realism was a huge question mark. And worse than that, it really didn’t help answer what we should be doing. By contrast, the goal of the early years of GNOME, though it was never explicitly stated, was to provide a free software replacement for Windows. Not nearly as neat-sounding a goal, but when you line it up against the criteria above it actually stacks up well. At that time Windows was the big barrier to putting users in control of their software through Free Software, so people were motivated to work on replacing it. The goal was realistic – we eventually achieved a lot of it. And it gave us lots of concrete tasks to work on. Things have moved on, but it was an effective goal for that time.

Any sort of exploration of goals for GNOME involves some idea of what GNOME fundamentally is. A phrase I think captures it: “GNOME is a community of people building Free Software for users”. The direction of GNOME is set by the people working on it as individuals, not the companies that might be sponsoring some of that work. GNOME is strongly committed to Free Software, not as a temporary strategy but as a fundamental principle. And we’re not building toys for ourselves, or creating technology masterpieces for their own sake; we are trying to make users’ lives better.

Within that broad set of parameters, we really have the option to do anything. We shouldn’t feel constrained by the set of things we do currently. Another thing to keep in mind is that the computing space is mind-bogglingly big these days. We don’t need to dominate even one segment of computing to be a big and successful project. But what we do need to do is create something that’s really great for the people we do touch: something that meets their needs and makes a portion of their day better. And that means direct influence over the user experience. It’s pretty hard to build something that is great for users if you are just building components that other people take and re-purpose. It’s also pretty hard to be great for users if we’re just a small slice of the total experience. To be concrete: if we’re just the stuff around the edges of the web browser, and the web browser is a tool to look at Facebook, and the user is looking at Facebook on their phone most of the time anyway, then that’s not an experience we can do a lot to make better. We need to engage with the user beyond traditional “computers” and beyond the local application.

I finished my intro with a question. The user actually gets big benefits by giving all their searches and documents and mail over to Google, and giving their social interactions over to Facebook. While the downsides of centralizing your data under someone else’s control, and being able to do only the things with that data that they want to let you do, may be obvious, we can’t pretend that this is a trap for the unwary and that smart users will keep everything local. How do we, as GNOME, enable an experience that is both under the user’s control and also as good or better than the experience they can get by giving up that control?

In my next post I’ll describe some of the ideas that came out of the brainstorming session.

