git-bz: Bugzilla subcommand for Git

Update: For current documentation, see: http://git.fishsoup.net/man/git-bz.html

Now that gnome-shell development is in full swing, I’ve been spending a lot of time creating patches with git and git-svn, and then pasting the git-format-patch results into Bugzilla. That seemed like a highly automatable task. git-send-bugzilla wasn’t really what I was looking for. Hence, git-bz:

 
# File the last commit as a new bug on the default tracker
git bz file my-product/some-component HEAD^

# Attach a series of patches to an existing bug
git bz attach bugzilla.gnome.org:1234 b50ea9bd^

# Prompt if you want to apply each patch attachment from a bug
git bz apply bgo:1234

Just the script: http://git.fishsoup.net/cgit/git-bz/plain/git-bz
Git repository: git clone git://git.fishsoup.net/git-bz

I haven’t tested with anything but bugzilla.gnome.org yet; expect some things to need fixing.

Implementing the next GNOME shell

By now, I think most people have had a chance to look at the text and mockups that Vincent posted of the new desktop shell ideas that came out of the user experience hackfest. Obviously some parts of the ideas are controversial, some parts will be improved as we get some experience with them in practice, but it is a compelling set of ideas that I’m excited about trying out. So here I’d like to put out some ideas I have about how we could go about implementation. (Much of this was covered at a session we had Sunday morning at the GNOME summit where we presented the ideas that came out of the hackfest.)

One Process. The first thing to note is that the ideas don’t naturally split into “window manager” and “panel”. The “Activities” view combines showing the windows that are currently open with launchers for existing applications. It would be possible to add complex APIs to the window manager to allow putting extra things into its scene graph, but it is going to be far easier to simply work in a single process with clean internal programming interfaces.

Javascript. This is an area that really calls for a high level language. There’s going to be lots of code that needs experimentation to get the right user behavior, but not a lot of code implementing complicated algorithms over lots of data. For applets (which would run within the shell), we want a low barrier to getting involved. Javascript doesn’t pull in another complicated platform, almost everyone is familiar with it to some extent or another, it offers good possibilities for sandboxing applets, it’s pretty light-weight in memory use, and there is a lot of work going on to make it fast. And, especially with some of the Mozilla JS-1.7/JS-1.8 improvements, it’s not as painful to work in as you might think.

Clutter. Once we mix together windows, other UI bits like panels, overlay views, and so forth, we need a scene graph to manage everything and put it on the screen. The obvious candidate there is Clutter. Clutter isn’t (yet) a very good replacement for GTK+ as a general purpose application toolkit, but UI like what is mocked up in the designs very much plays to Clutter’s strengths. And it’s a big plus to me that Clutter has a tested, documented API… something we wouldn’t have with a custom-written scene graph.

Start from Metacity. If the shell subsumes the window manager, then what we should avoid is spending a bunch of time getting all the window manager details right once again… how do you constrain sizes and positions? how do you read ICCCM properties? etc. For this reason, starting from an existing window manager codebase is the right thing to do. I’m less convinced that we should try to have a single code base that works both ways… being able to aggressively refactor and convert things that would be better done in a high-level language to Javascript seems important.

So, those are my ideas. No running code, svn module, or even project name yet. But that should change soon. (Current leader for a project name is the exciting “gnome-shell”… gnomesh to go with gnomecc.)

Istanbul Wed/Thu

I’m getting into Istanbul on Wednesday afternoon; going to spend Thursday there before driving over to visit Troy with JRB. (I need to do my touristing pre-GUADEC since I have other plans afterwards.) If anybody else is getting in early and wants to meet up for dinner on Wednesday or sightseeing on Thursday, send me an email or catch me on IRC over the next few days.

Fast text: use a single cache pixmap

A couple of weeks ago Dave Airlie pointed out to me that Alex Deucher had added RENDER extension acceleration for R3xx/R5xx to the Radeon EXA driver. Seeing an opportunity to have a desktop that was both composited and accelerated on my R300 laptop, I tried it out. The initial results: everything slowed to a crawl. I fixed one problem that was causing a software fallback for gradients, and the next bottleneck was text. Less than 18,000 glyphs / second. I eventually tracked down the R300-specific problem, and that got things to a usable rate: 130,000 glyphs / second. But still, that’s slower than unaccelerated text. How can we make it fast?

To make sure that everybody is on the same page about what we are trying to make fast, let me show a picture of drawing some text via the RENDER extension:

Drawing text with the RENDER extension

It’s a two step process: first we take all the glyph images and draw them onto a mask. Then we draw the source color or pattern through that mask onto the destination surface. So, which of the two steps is slow? We can get a very good idea of that by measuring the drawing speed as a function of the size of glyph and of the number of glyphs in the string.
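
In rough code terms, the glyphs path looks something like the following. This is a schematic sketch, not the server’s literal implementation: CompositePicture() is the server’s general composite entry point, but helpers like create_mask_picture() and the variable names are made-up stand-ins for what the real glyph-drawing code does.

/* Step 1: add each glyph image onto a temporary A8 mask picture */
mask = create_mask_picture (text_extents);           /* hypothetical helper */
for (i = 0; i < nglyph; i++)
    CompositePicture (PictOpAdd,
                      glyph_picture[i], NULL, mask,
                      0, 0,                           /* source origin */
                      0, 0,                           /* (no mask) */
                      glyph_x[i], glyph_y[i],         /* position in the mask */
                      glyph_w[i], glyph_h[i]);

/* Step 2: paint the source color or pattern through the mask onto the
 * destination in a single composite operation */
CompositePicture (op, src, mask, dst,
                  src_x, src_y, 0, 0, dst_x, dst_y, width, height);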

        R100            R300            i965            i965-BB
count   10px    24px    10px    24px    10px    24px    10px    24px
1       30300   30200   29400   29300   47500   47400   34500   34500
5       92900   90500   84500   81700   120000  119000  119000  119000
10      126000  111000  109000  108000  149000  148000  172000  172000
30      172000  127000  139000  138000  178000  178000  249000  238000
80      195000  130000  151000  151000  190000  190000  290000  278000

(Notes: all numbers are glyphs / second. All timings using EXA. R100/R300 timings on a P4 3.0GHz, i965 timings on a Core 2 Duo 2.0GHz. i965-BB is the intel-batchbuffers branch, i965 is master. Test program.)

We can also plot the numbers:

Timings of current code

A couple of interesting things: first, the number of pixels we are pushing is completely irrelevant… on all the cards the 10 pixel and 24 pixel font sizes perform the same, even though there are almost 6 times as many pixels at the larger size (2.4^2 == 5.76). Second, our performance in terms of glyphs/second is pretty much flat for all but the shortest strings. So we know what the bottleneck is: it’s the per-glyph setup cost of copying the glyphs onto the mask.

If we take a look at the EXA hooks for composite acceleration, we see the basic problem:

/* Called once at the start of a composite operation: binds the source,
 * mask, and destination and sets up the 3D engine state */
Bool (*PrepareComposite) (int                op,
                          PicturePtr         pSrcPicture,
                          PicturePtr         pMaskPicture,
                          PicturePtr         pDstPicture,
                          PixmapPtr          pSrc,
                          PixmapPtr          pMask,
                          PixmapPtr          pDst);

/* Called once for each rectangle within the operation */
void  (*Composite)       (PixmapPtr         pDst,
                          int srcX,         int srcY,
                          int maskX,        int maskY,
                          int dstX,         int dstY,
                          int width,        int height);

/* Called once at the end of the operation */
void (*DoneComposite)    (PixmapPtr         pDst);

A composite operation consists of copying a number of rectangles from the same source, to the same destination, with the same operator. But in building the mask, we are copying a number of rectangles from different sources: each glyph must be added to the mask in a separate composite operation. And at the beginning and end of a composite operation we do expensive stuff: at the beginning we set up all the state for the 3D engine; at the end we wait until the 3D engine is idle so we can go off and use the drawn result as a source for some other operation.
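
So for an N-glyph string the driver ends up seeing roughly this call pattern (schematic; the array names are made up, but the hooks are the ones shown above):

for (i = 0; i < nglyph; i++) {
    /* a different source picture for every glyph, so the 3D engine
     * state has to be set up from scratch each time */
    (*PrepareComposite) (PictOpAdd,
                         glyphPicture[i], NULL, maskPicture,
                         glyphPixmap[i],  NULL, maskPixmap);

    /* draw exactly one small rectangle... */
    (*Composite) (maskPixmap, 0, 0, 0, 0,
                  glyph_x[i], glyph_y[i],
                  glyph_w[i], glyph_h[i]);

    /* ...and then wait for the 3D engine to go idle */
    (*DoneComposite) (maskPixmap);
}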

I had two thoughts initially: the first was to try to optimize separate composite operations done sequentially: if we do two operations in a row that need the 3D engine set up the same way, don’t set it up twice. The second was to extend the composite acceleration hooks so that we could change the source in the middle of an operation: to add a SwitchCompositeSource() that could be called between PrepareComposite() and DoneComposite(). But the first is tricky to get right if you want to avoid enough work to actually make a performance difference, and the second requires a minor version bump in EXA and is, beyond that, a hack. (Why have an operation just for switching the source, and not the operator or the mask?) Neither gets rid of the actual need to switch the source texture between every glyph: something that is inherently expensive to do for most graphics chipsets. (In the brave new world of the TTM memory manager, switching textures also means a lot of relocations for the submitted command buffer.)

Then Dave suggested something on IRC: what if, instead of uploading each glyph into a separate glyph pixmap, we used a single glyph cache pixmap, uploaded glyphs there as needed, and then composited from that cache pixmap to the mask? That matches the current composite hooks perfectly, and allows us to set up the 3D engine once and just send the card a stream of vertices for each rectangle we want to draw.
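
Roughly, the idea looks like this (again a schematic sketch; glyph_cache_position_of() and the cache structure are hypothetical names, not the ones in the patch):

/* Make sure every glyph in the string is present in the cache pixmap,
 * uploading any that are missing */
for (i = 0; i < nglyph; i++)
    cache_pos[i] = glyph_cache_position_of (cache, glyphs[i]);

/* One 3D engine setup for the whole string: the source is always the
 * glyph cache pixmap now */
(*PrepareComposite) (PictOpAdd,
                     cache->picture, NULL, maskPicture,
                     cache->pixmap,  NULL, maskPixmap);

for (i = 0; i < nglyph; i++)
    (*Composite) (maskPixmap,
                  cache_pos[i].x, cache_pos[i].y,   /* glyph's slot in the cache */
                  0, 0,
                  glyph_x[i], glyph_y[i],           /* position in the mask */
                  glyph_w[i], glyph_h[i]);

(*DoneComposite) (maskPixmap);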

I took a stab at implementing the approach over the weekend and the results are definitely encouraging. Here’s the table and graph from above repeated with the new code:

        R100            R300            i965            i965-BB
count   10px    24px    10px    24px    10px    24px    10px    24px
1       31600   31100   30100   29300   48700   47900   34500   43500
5       137000  110000  131000  127000  198000  195000  153000  152000
10      229000  131000  222000  221000  322000  319000  269000  269000
30      438000  156000  435000  431000  558000  541000  549000  496000
80      611000  159000  620000  504000  727000  600000  762000  593000

Timings with glyph cache

Obviously everything is much faster, but we also see a qualitative change: as compared to the previous numbers there is a much stronger dependence on the length of the glyph string (overall setup costs) and on the size of the glyphs (per-pixel costs.) We’ve taken the per-glyph setup cost mostly out of the equation. And at the larger glyph size the R100 starts falling way behind. It should be noted that there is significant overhead from my test code when we get to these speeds: ‘x11perf -aa10text’ is somewhat faster.

The code can be found in the glyph-cache branch of my xserver git tree at git://git.fishsoup.net/xserver. In an existing xserver git tree, you’d check out that branch as:

git remote add -f otaylor git://git.fishsoup.net/xserver
git checkout -b glyph-cache otaylor/glyph-cache

(A web view of the tree is also available.)

What remains to be done? Well, first, the current patch just uses a static glyph cache size. The first time you draw an A8 glyph, you get a 300k pixmap allocated to hold 256 16×16 and 256 32×32 glyphs. The first time you draw an ARGB32 glyph (subpixel antialiasing) you get a 1.3M pixmap allocated. (4 times as big, since each glyph is 4 times as big.) But 256 glyphs is not big enough for all languages. And immediately allocating 1.3M the first time we see an ARGB glyph is probably a bad idea, especially when memory is tight. A better approach would be to start small, track the glyph cache hit rate, and if the hit rate is too low, dump the current cache, reallocate it at a larger size, and start over.
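
A hypothetical sketch of what that heuristic could look like (none of this is in the current patch; the names and thresholds are invented for illustration):

typedef struct {
    PixmapPtr pixmap;    /* the cache pixmap itself */
    int       nslots;    /* how many glyphs it can hold */
    long      lookups;   /* lookups since the last resize decision */
    long      misses;    /* misses since the last resize decision */
} GlyphCache;

static void
glyph_cache_check_hit_rate (GlyphCache *cache)
{
    if (cache->lookups < 1000)   /* wait until we have a meaningful sample */
        return;

    if (cache->misses * 4 > cache->lookups &&   /* hit rate below ~75% */
        cache->nslots < MAX_CACHE_SLOTS)
    {
        /* dump the current cache, reallocate it larger, and start over */
        glyph_cache_free_pixmap (cache->pixmap);
        cache->nslots *= 2;
        cache->pixmap  = glyph_cache_allocate_pixmap (cache->nslots);
    }

    cache->lookups = cache->misses = 0;
}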

The second major thing left to do is to improve the way that glyphs are added to the cache pixmap. Right now, when a glyph is uploaded to the server by an application, it is immediately stored in a pixmap in video memory. With the glyph cache pixmap, it would likely make more sense to keep the glyph in system memory until first needed, and then upload it directly to the glyph cache pixmap. I didn’t try to do that yet: what my patch does is simply copy directly from the in-memory glyph pixmap to the cache pixmap. So potential improvements from reducing the number of small pixmaps that must be memory managed are not yet achieved.
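
In EXA terms, a cache miss could then turn into a direct upload into the glyph’s slot in the cache pixmap, along the lines of this sketch. UploadToScreen is the existing EXA driver hook; everything else here is a made-up name, and this is explicitly not what the current patch does:

/* On a cache miss: copy the glyph's client-supplied system-memory image
 * straight into its slot in the cache pixmap */
if (!pExaScr->info->UploadToScreen (cache->pixmap,
                                    slot_x, slot_y,
                                    glyph_width, glyph_height,
                                    (char *) glyph_bits, glyph_pitch))
{
    /* driver couldn't do the upload; fall back to a CPU copy */
    upload_glyph_with_cpu (cache->pixmap, slot_x, slot_y,
                           glyph_bits, glyph_pitch);
}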

A few new Reinteract features

I had some time today to finish some Reinteract features I’ve been working on over the last few weeks, namely completion and mouse-over tooltips on the editor contents. Some screenshots:

Completion
Reinteract Completion

Tooltip showing documentation
Reinteract Docs Mouseover

Tooltip showing variable contents
Reinteract Variable Mouseover

The majority of features in my completion design notes are now implemented. (The design took the bold and innovative approach of “copy how completion works in Eclipse”.) The main thing that’s still missing is assist for function parameters. But I’ll probably leave that to the side for the moment and turn to notebooks. And catching up with some bug fixes. For one thing, the code used to format the variable tooltips should be easily reusable to fix a reported problem where accidentally displaying a large array in Reinteract makes it dead-slow.