<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Fast text: use a single cache pixmap</title>
	<atom:link href="http://blog.fishsoup.net/2008/04/20/fast-text-use-a-single-cache-pixmap/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.fishsoup.net/2008/04/20/fast-text-use-a-single-cache-pixmap/</link>
	<description>Owen Taylor on Coding, Food, etc.</description>
	<lastBuildDate>Tue, 08 Nov 2011 23:41:34 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: Nurul Choudhury</title>
		<link>http://blog.fishsoup.net/2008/04/20/fast-text-use-a-single-cache-pixmap/#comment-1675</link>
		<dc:creator><![CDATA[Nurul Choudhury]]></dc:creator>
		<pubDate>Wed, 14 May 2008 18:24:42 +0000</pubDate>
		<guid isPermaLink="false">http://owtaylor.wordpress.com/?p=63#comment-1675</guid>
		<description><![CDATA[This is very interesting and it will be a big improvement. I had an extension to the idea that might be useful, and my initial test seems to indicate that. 

I am pasting in the mail I sent to Carl Worth about the idea and my initial results.

...
I recently installed Ubuntu (Hardy Heron) on my rather ancient desktop at home. This machine is mostly used by my wife to read email and to view YouTube. The machine is rather memory limited: 512 MB of RAM and an ATI Radeon 9550 with 64 MB of video RAM. As you can see, not a great machine, but Linux was fine on it until I installed Hardy; now the machine feels as piggishly slow as a Windows machine - ah well. To cut to the chase, I thought that things might improve if I enabled hardware-accelerated graphics, in particular xgl+compiz. The good thing about the software is that it can be set up with very little effort, and the desktop effects seem to work quite well. But now Firefox rendering is really slow and scrolling is glacial. That is not the worst of it: most video playback is more like a sluggish slide-show. Before you ask, I am using the proprietary fglrx driver. Most of the time xgl shows over 30% CPU utilization; I think this is because most websites have Flash videos in advertisements, and xgl really has a lot of trouble with those. On the plus side, the window manager is very smooth when moving windows around the screen, even with the wobble effect, and there is no visible repainting of the exposed area when moving windows. Although it is not significant, every Linux distribution I have used previously would show a &#039;trailing&#039; effect, in particular when going over the Firefox browser.

The Premise

Anyway, down to business: I was reading about the improvements you have been making to the hardware rendering portion of X, and that XAA rendering was still twice as fast as the EXA rendering that you are implementing. In a video presentation you mentioned that if the hardware were infinitely fast, the maximum rate for glyph drawing would be about 700K/s, and you are getting around 125K glyphs per second. I am sure you have thought about this: if, instead of displaying one glyph at a time, we cached sequences of multiple glyphs, would that not improve the performance of glyph display? Regular text has an awful number of repeated text fragments, and if one considers how much zip compression can reduce a text file&#039;s size, this seems to be a promising line of enquiry.

Some experiments

Purpose: See if it is effective to cache common sequences of glyphs instead of individual glyphs. The idea is an extension of caching all the individual glyphs in a single image. For example:

How the sequences are generated:

The text &quot;this is a test&quot; 
would generate the following double and triple sequences.  
=&gt; &quot;th&quot;, &quot;thi&quot;, &quot;hi&quot;, &quot;his&quot;, &quot;is&quot;, &quot;is &quot;, &quot;s &quot;, &quot;s i&quot;,
                &quot; i&quot;, &quot; is&quot;, &quot;is&quot;, &quot;is &quot;, &quot;s &quot;, &quot;s a&quot;, &quot; a&quot;,
                &quot; a &quot;, &quot;a &quot;, &quot;a t&quot;, &quot; t&quot;, &quot; te&quot;, &quot;te&quot;, &quot;tes&quot;, &quot;es&quot;, &quot;est&quot;, &quot;st&quot; 
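
The enumeration above can be reproduced with a short sketch. This is not the code used for the experiments; the class name GlyphSequences and the method sequences are hypothetical, and it assumes one character corresponds to one glyph.

```java
import java.util.ArrayList;
import java.util.List;

public class GlyphSequences {
    // For each starting position, emit the substrings of length 2 up to
    // maxLen, as long as enough characters remain in the text.
    static List<String> sequences(String text, int maxLen) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int len = 2; len <= maxLen && start + len <= text.length(); len++) {
                out.add(text.substring(start, start + len));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints the double and triple sequences in the order listed above.
        System.out.println(sequences("this is a test", 3));
    }
}
```

For the 14-character sample text this yields 13 doubles and 12 triples, 25 sequences in total.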

Clearly not all of these sequences are useful, so we need some heuristic to keep only the ones that would be useful. 

As stated earlier there is a rather efficient method of building these sequences using a &#039;trie&#039; data structure. We start off by building 2-character sequences. Every time we see the same sequence again we increment the use count. When the use count exceeds a pre-determined threshold, say 10, we start to extend that pair to include a third character. Further, we cache an image of the two-glyph sequence.

Again an example would be useful. Suppose we have the sequence &#039;th&#039; and we see it occurring 10 times: we create an image for &#039;th&#039; and we also start extending the sequence for &#039;th&#039;.

Now we might get:
Sequence    Occurrence
&#039;the&#039;        10  
&#039;tho&#039;         3  
&#039;thi&#039;        10

we would now create an image for &#039;the&#039; but not for &#039;tho&#039; and &#039;thi&#039;. 
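
A minimal sketch of the counting scheme described above, under stated assumptions: the class SequenceTrie and its methods are hypothetical names, a sequence is treated as having an image once its count reaches the threshold (whether the threshold is inclusive is an assumption), and no actual images are drawn.

```java
import java.util.HashMap;
import java.util.Map;

public class SequenceTrie {
    private static final class Node {
        int count;                                   // times this sequence was seen
        final Map<Character, Node> children = new HashMap<>();
    }

    private final Node root = new Node();
    private final int threshold;                     // repeats needed before caching
    private final int maxLen;                        // longest sequence tracked

    public SequenceTrie(int threshold, int maxLen) {
        this.threshold = threshold;
        this.maxLen = maxLen;
    }

    // Count the sequences starting at every position of the text. A sequence
    // of two or more characters is only extended by one more character once
    // its own count has reached the threshold.
    public void observe(String text) {
        for (int start = 0; start < text.length(); start++) {
            Node node = root;
            for (int i = start; i < text.length() && i - start < maxLen; i++) {
                node = node.children.computeIfAbsent(text.charAt(i), k -> new Node());
                node.count++;
                int len = i - start + 1;
                if (len >= 2 && node.count < threshold) {
                    break;                           // not common enough to extend yet
                }
            }
        }
    }

    // True when the sequence would have a cached image under this scheme.
    public boolean hasImage(String seq) {
        Node node = root;
        for (char c : seq.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return false;
            }
        }
        return seq.length() >= 2 && node.count >= threshold;
    }
}
```

With a threshold of 3, for instance, observing "the the tho the" would give 'th' an image but not yet 'the' or 'tho', since the triples only start being counted once 'th' has been seen three times.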


Assumptions:
         1&gt; The glyph sequences to cache should be computed dynamically
         2&gt; The process should be efficient both in computation power and memory usage
         3&gt; The maximum sequence lengths should be determined in advance
         4&gt; The solution should be easy to implement.
         5&gt; English text was used for these experiments, using the text form of books from Project Gutenberg
      
From my half-remembered data structure courses at school, I assumed that the trie data structure would probably be the most efficient way of determining the best glyph sequences. There are many methods of implementing the trie structure, and I did find a really good paper on efficient trie implementation, &quot;An Efficient Implementation of Trie Structures (1992), Jun-Ichi Aoe, Katsushi Morimoto, Takashi Sato; Software --- Practice and Experience&quot;, but I did not use that code for these experiments. The purpose of the experiments was to determine if this would be a useful line to follow. I am pleased to report that the results are encouraging:

I used a simple trie implementation in Java that is not particularly efficient, and I would certainly not suggest that this implementation should be transcribed to C. I chose Java so as to do these experiments quickly. Secondly, I have not done graphics programming in over 10 years and I have no experience with OpenGL, so I did not attempt to do any graphics in these experiments.

Summary

As you can see I have presented the results for caching 2, 3 and 4 glyph sequences. The results suggest we get very good improvement with images generated for glyph pairs, and a 35% improvement if we use glyph triples (but we have to cache a far greater number of images), though the extra space may be worth the overhead. Above 3 we get a much smaller improvement for a great deal of extra processing time and image caching; this in my opinion would not be worth the effort.
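
The "Improvement" figure in the results appears to be the total number of characters divided by the total number of writes, where one write of a cached sequence of length n covers n characters. A small sketch of that arithmetic (the class and method names are hypothetical, and the interpretation is inferred from the reported numbers rather than from the original program):

```java
public class Improvement {
    // writesByLength[i] is the number of writes of sequences of length i + 1.
    static double improvement(long[] writesByLength) {
        long writes = 0;
        long chars = 0;
        for (int i = 0; i < writesByLength.length; i++) {
            writes += writesByLength[i];
            chars += writesByLength[i] * (i + 1);
        }
        return (double) chars / writes;
    }

    public static void main(String[] args) {
        // Write counts from the first test: 52248 single glyphs, 1172340 pairs.
        System.out.println(improvement(new long[] {52248, 1172340}));
    }
}
```

For the first test this gives 2396928 characters over 1224588 writes, roughly 1.957, which agrees with the reported value.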





Results:
Run on Ubuntu 8.04
1 GHz Dell Pentium III, 512 MB RAM
JRE: Sun Java 6 (1.6.0.06)


Test

Up to 2 glyph sequences
Generate an image after noticing 20 repeats
Time to Complete: 278 ms
Seq Len: Number of writes
1:             52248
2:             1172340
Total writes: 1224588
Total chars: 2396928
Total chars in file: 2396928
Improvement: 1.957334221795412
Images generated: 1358 (number of cached images of glyph sequence)
Nodes in Trie: 2396

-----
Up to 3 glyph sequences
Generate an image after noticing 20 repeats
Time to Complete: 564 ms
Seq Len: Number of writes
1:             47314
2:             132583
3:             694816
Total writes: 874713
Total chars: 2396928
Total chars in file: 2396928
Improvement: 2.740245086102527
Images generated: 5005 (number of cached images of glyph sequence)
Nodes in Trie: 15486
---
Up to 4 glyph sequences
Generate an image after noticing 20 repeats
Time to Complete: 1124 ms
Seq Len: Number of writes
1:             44938
2:             126648
3:             207050
4:             369386
Total writes: 748022
Total chars: 2396928
Total chars in file: 2396928
Improvement: 3.2043549521270767
Images generated: 10099 (number of cached images of glyph sequence)
Nodes in Trie: 39572]]></description>
		<content:encoded><![CDATA[<p>This is very interesting and it will be a big improvement. I had an extension to the idea that might be useful, and my initial test seems to indicate that. </p>
<p>I am pasting in the mail I sent to Carl Worth about the idea and my initial results.</p>
<p>&#8230;<br />
I recently installed Ubuntu (Hardy Heron) on my rather ancient desktop at home. This machine is mostly used by my wife to read email and to view YouTube. The machine is rather memory limited: 512 MB of RAM and an ATI Radeon 9550 with 64 MB of video RAM. As you can see, not a great machine, but Linux was fine on it until I installed Hardy; now the machine feels as piggishly slow as a Windows machine &#8211; ah well. To cut to the chase, I thought that things might improve if I enabled hardware-accelerated graphics, in particular xgl+compiz. The good thing about the software is that it can be set up with very little effort, and the desktop effects seem to work quite well. But now Firefox rendering is really slow and scrolling is glacial. That is not the worst of it: most video playback is more like a sluggish slide-show. Before you ask, I am using the proprietary fglrx driver. Most of the time xgl shows over 30% CPU utilization; I think this is because most websites have Flash videos in advertisements, and xgl really has a lot of trouble with those. On the plus side, the window manager is very smooth when moving windows around the screen, even with the wobble effect, and there is no visible repainting of the exposed area when moving windows. Although it is not significant, every Linux distribution I have used previously would show a &#8216;trailing&#8217; effect, in particular when going over the Firefox browser.</p>
<p>The Premise</p>
<p>Anyway, down to business: I was reading about the improvements you have been making to the hardware rendering portion of X, and that XAA rendering was still twice as fast as the EXA rendering that you are implementing. In a video presentation you mentioned that if the hardware were infinitely fast, the maximum rate for glyph drawing would be about 700K/s, and you are getting around 125K glyphs per second. I am sure you have thought about this: if, instead of displaying one glyph at a time, we cached sequences of multiple glyphs, would that not improve the performance of glyph display? Regular text has an awful number of repeated text fragments, and if one considers how much zip compression can reduce a text file&#8217;s size, this seems to be a promising line of enquiry.</p>
<p>Some experiments</p>
<p>Purpose: See if it is effective to cache common sequences of glyphs instead of individual glyphs. The idea is an extension of caching all the individual glyphs in a single image. For example:</p>
<p>How the sequences are generated:</p>
<p>The text &#8220;this is a test&#8221;<br />
would generate the following double and triple sequences.<br />
=&gt; &#8220;th&#8221;, &#8220;thi&#8221;, &#8220;hi&#8221;, &#8220;his&#8221;, &#8220;is&#8221;, &#8220;is &#8221;, &#8220;s &#8221;, &#8220;s i&#8221;,<br />
                &#8220; i&#8221;, &#8220; is&#8221;, &#8220;is&#8221;, &#8220;is &#8221;, &#8220;s &#8221;, &#8220;s a&#8221;, &#8220; a&#8221;,<br />
                &#8220; a &#8221;, &#8220;a &#8221;, &#8220;a t&#8221;, &#8220; t&#8221;, &#8220; te&#8221;, &#8220;te&#8221;, &#8220;tes&#8221;, &#8220;es&#8221;, &#8220;est&#8221;, &#8220;st&#8221; </p>
<p>Clearly not all of these sequences are useful, so we need some heuristic to keep only the ones that would be useful. </p>
<p>As stated earlier there is a rather efficient method of building these sequences using a &#8216;trie&#8217; data structure. We start off by building 2-character sequences. Every time we see the same sequence again we increment the use count. When the use count exceeds a pre-determined threshold, say 10, we start to extend that pair to include a third character. Further, we cache an image of the two-glyph sequence.</p>
<p>Again an example would be useful. Suppose we have the sequence &#8216;th&#8217; and we see it occurring 10 times: we create an image for &#8216;th&#8217; and we also start extending the sequence for &#8216;th&#8217;.</p>
<p>Now we might get:<br />
Sequence    Occurrence<br />
&#8216;the&#8217;        10<br />
&#8216;tho&#8217;         3<br />
&#8216;thi&#8217;        10</p>
<p>we would now create an image for &#8216;the&#8217; but not for &#8216;tho&#8217; and &#8216;thi&#8217;. </p>
<p>Assumptions:<br />
         1&gt; The glyph sequences to cache should be computed dynamically<br />
         2&gt; The process should be efficient both in computation power and memory usage<br />
         3&gt; The maximum sequence lengths should be determined in advance<br />
         4&gt; The solution should be easy to implement.<br />
         5&gt; English text was used for these experiments, using the text form of books from Project Gutenberg</p>
<p>From my half-remembered data structure courses at school, I assumed that the trie data structure would probably be the most efficient way of determining the best glyph sequences. There are many methods of implementing the trie structure, and I did find a really good paper on efficient trie implementation, &#8220;An Efficient Implementation of Trie Structures (1992), Jun-Ichi Aoe, Katsushi Morimoto, Takashi Sato; Software &#8212; Practice and Experience&#8221;, but I did not use that code for these experiments. The purpose of the experiments was to determine if this would be a useful line to follow. I am pleased to report that the results are encouraging:</p>
<p>I used a simple trie implementation in Java that is not particularly efficient, and I would certainly not suggest that this implementation should be transcribed to C. I chose Java so as to do these experiments quickly. Secondly, I have not done graphics programming in over 10 years and I have no experience with OpenGL, so I did not attempt to do any graphics in these experiments.</p>
<p>Summary</p>
<p>As you can see I have presented the results for caching 2, 3 and 4 glyph sequences. The results suggest we get very good improvement with images generated for glyph pairs, and a 35% improvement if we use glyph triples (but we have to cache a far greater number of images), though the extra space may be worth the overhead. Above 3 we get a much smaller improvement for a great deal of extra processing time and image caching; this in my opinion would not be worth the effort.</p>
<p>Results:<br />
Run on Ubuntu 8.04<br />
1 GHz Dell Pentium III, 512 MB RAM<br />
JRE: Sun Java 6 (1.6.0.06)</p>
<p>Test</p>
<p>Up to 2 glyph sequences<br />
Generate an image after noticing 20 repeats<br />
Time to Complete: 278 ms<br />
Seq Len: Number of writes<br />
1:             52248<br />
2:             1172340<br />
Total writes: 1224588<br />
Total chars: 2396928<br />
Total chars in file: 2396928<br />
Improvement: 1.957334221795412<br />
Images generated: 1358 (number of cached images of glyph sequence)<br />
Nodes in Trie: 2396</p>
<p>&#8212;&#8211;<br />
Up to 3 glyph sequences<br />
Generate an image after noticing 20 repeats<br />
Time to Complete: 564 ms<br />
Seq Len: Number of writes<br />
1:             47314<br />
2:             132583<br />
3:             694816<br />
Total writes: 874713<br />
Total chars: 2396928<br />
Total chars in file: 2396928<br />
Improvement: 2.740245086102527<br />
Images generated: 5005 (number of cached images of glyph sequence)<br />
Nodes in Trie: 15486<br />
&#8212;<br />
Up to 4 glyph sequences<br />
Generate an image after noticing 20 repeats<br />
Time to Complete: 1124 ms<br />
Seq Len: Number of writes<br />
1:             44938<br />
2:             126648<br />
3:             207050<br />
4:             369386<br />
Total writes: 748022<br />
Total chars: 2396928<br />
Total chars in file: 2396928<br />
Improvement: 3.2043549521270767<br />
Images generated: 10099 (number of cached images of glyph sequence)<br />
Nodes in Trie: 39572</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: anonymous</title>
		<link>http://blog.fishsoup.net/2008/04/20/fast-text-use-a-single-cache-pixmap/#comment-1670</link>
		<dc:creator><![CDATA[anonymous]]></dc:creator>
		<pubDate>Mon, 21 Apr 2008 14:35:02 +0000</pubDate>
		<guid isPermaLink="false">http://owtaylor.wordpress.com/?p=63#comment-1670</guid>
		<description><![CDATA[I&#039;m using the Intel driver and recently enabled EXA. The speed is *pathetic* compared to XAA; it&#039;s like working on a 200MHz Pentium. All redraws are visible (you can see the screen filling...).

So thank you very much for this improvement, it is much needed!]]></description>
		<content:encoded><![CDATA[<p>I&#8217;m using the Intel driver and recently enabled EXA. The speed is *pathetic* compared to XAA; it&#8217;s like working on a 200MHz Pentium. All redraws are visible (you can see the screen filling&#8230;).</p>
<p>So thank you very much for this improvement, it is much needed!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Owen</title>
		<link>http://blog.fishsoup.net/2008/04/20/fast-text-use-a-single-cache-pixmap/#comment-1669</link>
		<dc:creator><![CDATA[Owen]]></dc:creator>
		<pubDate>Mon, 21 Apr 2008 09:45:57 +0000</pubDate>
		<guid isPermaLink="false">http://owtaylor.wordpress.com/?p=63#comment-1669</guid>
		<description><![CDATA[Xav: I would expect it to be useful for pretty much any EXA-based driver that is accelerating RENDER. It certainly helps the Intel driver as well (the lines labeled i965.)

Giacomo: There are a lot of factors that can affect things, and it&#039;s hard to sort them out without seeing a profile. I would certainly wonder if you are seeing software rendering. Offhand, I wouldn&#039;t expect libpciaccess to be important here: with or without my patch the commands and data are going through the DRM. But I&#039;m not an expert in that area.]]></description>
		<content:encoded><![CDATA[<p>Xav: I would expect it to be useful for pretty much any EXA-based driver that is accelerating RENDER. It certainly helps the Intel driver as well (the lines labeled i965.)</p>
<p>Giacomo: There are a lot of factors that can affect things, and it&#8217;s hard to sort them out without seeing a profile. I would certainly wonder if you are seeing software rendering. Offhand, I wouldn&#8217;t expect libpciaccess to be important here: with or without my patch the commands and data are going through the DRM. But I&#8217;m not an expert in that area.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Giacomo</title>
		<link>http://blog.fishsoup.net/2008/04/20/fast-text-use-a-single-cache-pixmap/#comment-1668</link>
		<dc:creator><![CDATA[Giacomo]]></dc:creator>
		<pubDate>Mon, 21 Apr 2008 08:24:32 +0000</pubDate>
		<guid isPermaLink="false">http://owtaylor.wordpress.com/?p=63#comment-1668</guid>
		<description><![CDATA[Hi! Good work. I&#039;ve been using the latest git for a few days and now composite with EXA is finally usable, even with an old xserver (1.4.0.90).
To tell the truth, on my rv370 I&#039;m getting higher results than you (300k char/s with metacity, 240k char/s with compiz). On #radeon, agdf5 mentioned a libpciaccess bug; could it be that? (I had noticed the performance drop with recent xservers before.)]]></description>
		<content:encoded><![CDATA[<p>Hi! Good work. I&#8217;ve been using the latest git for a few days and now composite with EXA is finally usable, even with an old xserver (1.4.0.90).<br />
To tell the truth, on my rv370 I&#8217;m getting higher results than you (300k char/s with metacity, 240k char/s with compiz). On #radeon, agdf5 mentioned a libpciaccess bug; could it be that? (I had noticed the performance drop with recent xservers before.)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Xav</title>
		<link>http://blog.fishsoup.net/2008/04/20/fast-text-use-a-single-cache-pixmap/#comment-1667</link>
		<dc:creator><![CDATA[Xav]]></dc:creator>
		<pubDate>Mon, 21 Apr 2008 08:04:53 +0000</pubDate>
		<guid isPermaLink="false">http://owtaylor.wordpress.com/?p=63#comment-1667</guid>
		<description><![CDATA[Nice work.
So, if I understand correctly, the acceleration benefits all drivers, not only Radeons?]]></description>
		<content:encoded><![CDATA[<p>Nice work.<br />
So, if I understand correctly, the acceleration benefits all drivers, not only Radeons?</p>
]]></content:encoded>
	</item>
</channel>
</rss>

