<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: More database compression means more speed? Right?</title>
	<atom:link href="http://lemire.me/blog/archives/2009/11/13/more-database-compression-means-more-speed-right/feed/" rel="self" type="application/rss+xml" />
	<link>http://lemire.me/blog/archives/2009/11/13/more-database-compression-means-more-speed-right/</link>
	<description>Computer Scientist and Open Scholar: Databases, Information Retrieval, Business Intelligence.</description>
	<lastBuildDate>Thu, 24 May 2012 04:42:46 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
	<item>
		<title>By: Glenn Davis</title>
		<link>http://lemire.me/blog/archives/2009/11/13/more-database-compression-means-more-speed-right/comment-page-1/#comment-51917</link>
		<dc:creator>Glenn Davis</dc:creator>
		<pubDate>Thu, 19 Nov 2009 02:10:42 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2301#comment-51917</guid>
		<description>That’s a super paper and a good start in the right direction! Those results, although measured on entropy-reduced data, illustrate the power of multidimensional approaches and the goal alignment (speed with size) one can get from good models.

Their paper contains a footnote that I found not only intriguing but suggestive of a direction to go to reach the next level of performance:

&quot;By modeling the tuple sources as i.i.d., we lose the ability to exploit inter-tuple correlations. To our knowledge, no one has studied such correlations in databases – all the work on correlations has been among fields within a tuple. If inter-tuple correlations are significant, the information theory literature on compression of non zero-order sources might be applicable.&quot;

Funny they should say that! The answer is both yes and no. Yes, inter-tuple correlations can and should be exploited to compress structured data; I led a team that did that with great success some 20 years ago. And no, we found the information theory literature to be irrelevant. Information theory concerns encoding modeled data, not the design of the data models themselves. That is where the challenges and benefits lie.</description>
		<content:encoded><![CDATA[<p>That’s a super paper and a good start in the right direction! Those results, although measured on entropy-reduced data, illustrate the power of multidimensional approaches and the goal alignment (speed with size) one can get from good models.</p>
<p>Their paper contains a footnote that I found not only intriguing but suggestive of a direction to go to reach the next level of performance:</p>
<p>&#8220;By modeling the tuple sources as i.i.d., we lose the ability to exploit inter-tuple correlations. To our knowledge, no one has studied such correlations in databases – all the work on correlations has been among fields within a tuple. If inter-tuple correlations are significant, the information theory literature on compression of non zero-order sources might be applicable.&#8221;</p>
<p>Funny they should say that! The answer is both yes and no. Yes, inter-tuple correlations can and should be exploited to compress structured data; I led a team that did that with great success some 20 years ago. And no, we found the information theory literature to be irrelevant. Information theory concerns encoding modeled data, not the design of the data models themselves. That is where the challenges and benefits lie.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Lemire</title>
		<link>http://lemire.me/blog/archives/2009/11/13/more-database-compression-means-more-speed-right/comment-page-1/#comment-51916</link>
		<dc:creator>Daniel Lemire</dc:creator>
		<pubDate>Wed, 18 Nov 2009 19:21:09 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2301#comment-51916</guid>
		<description>@Glenn

The type of compression ratio you are referring to is already possible using publicly available algorithms:

V. Raman and G. Swart. Entropy compression of relations and querying of compressed relations. In VLDB, 2006.</description>
		<content:encoded><![CDATA[<p>@Glenn</p>
<p>The type of compression ratio you are referring to is already possible using publicly available algorithms:</p>
<p>V. Raman and G. Swart. Entropy compression of relations and querying of compressed relations. In VLDB, 2006.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Glenn Davis</title>
		<link>http://lemire.me/blog/archives/2009/11/13/more-database-compression-means-more-speed-right/comment-page-1/#comment-51915</link>
		<dc:creator>Glenn Davis</dc:creator>
		<pubDate>Wed, 18 Nov 2009 06:25:22 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2301#comment-51915</guid>
		<description>Ready for a radical idea? Too bad, here it is anyway.

Having to categorize one’s compression method as being lightweight or heavyweight suggests to me that the method just isn’t very good or appropriate for the data. Good methods do a good job of data modeling, and with really good data modeling the otherwise-competing performance goals of speed and size can go hand in hand. To me, in the case of structured databases, good data modeling means multidimensional data modeling; unfortunately, nearly all the methods now being used are inherently one-dimensional and, predictably, wind up requiring compromise.

I say that after having developed compression software that achieves almost 8-to-1 compression of the TPC-H lineitem table, a common benchmark in the DBMS world. That is far beyond published results from Oracle and IBM, and it demonstrates how much better, and more appropriate, multidimensional data modeling is when one is dealing with multidimensional data.</description>
		<content:encoded><![CDATA[<p>Ready for a radical idea? Too bad, here it is anyway.</p>
<p>Having to categorize one’s compression method as being lightweight or heavyweight suggests to me that the method just isn’t very good or appropriate for the data. Good methods do a good job of data modeling, and with really good data modeling the otherwise-competing performance goals of speed and size can go hand in hand. To me, in the case of structured databases, good data modeling means multidimensional data modeling; unfortunately, nearly all the methods now being used are inherently one-dimensional and, predictably, wind up requiring compromise.</p>
<p>I say that after having developed compression software that achieves almost 8-to-1 compression of the TPC-H lineitem table, a common benchmark in the DBMS world. That is far beyond published results from Oracle and IBM, and it demonstrates how much better, and more appropriate, multidimensional data modeling is when one is dealing with multidimensional data.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Lemire</title>
		<link>http://lemire.me/blog/archives/2009/11/13/more-database-compression-means-more-speed-right/comment-page-1/#comment-51911</link>
		<dc:creator>Daniel Lemire</dc:creator>
		<pubDate>Fri, 13 Nov 2009 23:24:35 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2301#comment-51911</guid>
		<description>@Conway

Sure. It could happen that in the future the cores will be constantly data-starved. Nobody can predict the future. But it is not the case right now. Within databases, lightweight compression outperforms heavy compression, and has done so for years, if not decades.

(Some papers have claimed that databases are I/O bound. They have just not convinced me, nor the database industry.)</description>
		<content:encoded><![CDATA[<p>@Conway</p>
<p>Sure. It could happen that in the future the cores will be constantly data-starved. Nobody can predict the future. But it is not the case right now. Within databases, lightweight compression outperforms heavy compression, and has done so for years, if not decades.</p>
<p>(Some papers have claimed that databases are I/O bound. They have just not convinced me, nor the database industry.)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Neil Conway</title>
		<link>http://lemire.me/blog/archives/2009/11/13/more-database-compression-means-more-speed-right/comment-page-1/#comment-51910</link>
		<dc:creator>Neil Conway</dc:creator>
		<pubDate>Fri, 13 Nov 2009 22:41:57 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2301#comment-51910</guid>
		<description>Well, it may well be the case that &quot;CPU cores will be constantly data-starved&quot; in the future, certainly for many run-of-the-mill business applications.

In the recent past, most superscalar chips had only a very limited ability to extract instruction-level parallelism from most programs -- that didn&#039;t stop Intel from mass-producing those chips, even though they were utilized relatively inefficiently. Those chips (e.g. Pentium IV) were still useful, despite the inefficiency. Similarly, manycore chips may still be useful, even if they are relatively bandwidth-starved -- which would make communication-intelligent designs increasingly important.</description>
		<content:encoded><![CDATA[<p>Well, it may well be the case that &#8220;CPU cores will be constantly data-starved&#8221; in the future, certainly for many run-of-the-mill business applications.</p>
<p>In the recent past, most superscalar chips had only a very limited ability to extract instruction-level parallelism from most programs &#8212; that didn&#8217;t stop Intel from mass-producing those chips, even though they were utilized relatively inefficiently. Those chips (e.g. Pentium IV) were still useful, despite the inefficiency. Similarly, manycore chips may still be useful, even if they are relatively bandwidth-starved &#8212; which would make communication-intelligent designs increasingly important.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Lemire</title>
		<link>http://lemire.me/blog/archives/2009/11/13/more-database-compression-means-more-speed-right/comment-page-1/#comment-51909</link>
		<dc:creator>Daniel Lemire</dc:creator>
		<pubDate>Fri, 13 Nov 2009 18:31:19 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2301#comment-51909</guid>
		<description>@Conway

The reason for this to be true is stated in my post: &lt;em&gt;Otherwise, CPU cores would be constantly data-starved in most multimedia and business applications.&lt;/em&gt; Intel will not mass-produce CPUs unless the technology to keep them busy with mainstream applications is out there.

Disclaimer: you can always find special cases, and nobody can predict the future.</description>
		<content:encoded><![CDATA[<p>@Conway</p>
<p>The reason for this to be true is stated in my post: <em>Otherwise, CPU cores would be constantly data-starved in most multimedia and business applications.</em> Intel will not mass-produce CPUs unless the technology to keep them busy with mainstream applications is out there.</p>
<p>Disclaimer: you can always find special cases, and nobody can predict the future.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Neil Conway</title>
		<link>http://lemire.me/blog/archives/2009/11/13/more-database-compression-means-more-speed-right/comment-page-1/#comment-51908</link>
		<dc:creator>Neil Conway</dc:creator>
		<pubDate>Fri, 13 Nov 2009 18:02:22 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2301#comment-51908</guid>
		<description>&lt;i&gt;As we have more CPU cores, we also have more bandwidth to bring data to the cores.&lt;/i&gt;

There is no reason for that to be true: advances in processor technology often follow a different curve than advances in memory architectures. It may well be the case that many-core architectures in the future are increasingly bandwidth-constrained: once data reaches a core, computation cycles are cheap, but data movement into / out of cores might be relatively expensive.</description>
		<content:encoded><![CDATA[<p><i>As we have more CPU cores, we also have more bandwidth to bring data to the cores.</i></p>
<p>There is no reason for that to be true: advances in processor technology often follow a different curve than advances in memory architectures. It may well be the case that many-core architectures in the future are increasingly bandwidth-constrained: once data reaches a core, computation cycles are cheap, but data movement into / out of cores might be relatively expensive.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Lemire</title>
		<link>http://lemire.me/blog/archives/2009/11/13/more-database-compression-means-more-speed-right/comment-page-1/#comment-51906</link>
		<dc:creator>Daniel Lemire</dc:creator>
		<pubDate>Fri, 13 Nov 2009 15:58:27 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2301#comment-51906</guid>
		<description>@Greg

Thanks Greg. This is the kind of comment that makes blogging so profitable.

Shame on me: I did not even think of BigTable, and I don&#039;t know anything about their compression techniques... more reading for me... I hope to blog about it in the future.</description>
		<content:encoded><![CDATA[<p>@Greg</p>
<p>Thanks Greg. This is the kind of comment that makes blogging so profitable.</p>
<p>Shame on me: I did not even think of BigTable, and I don&#8217;t know anything about their compression techniques&#8230; more reading for me&#8230; I hope to blog about it in the future.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Greg</title>
		<link>http://lemire.me/blog/archives/2009/11/13/more-database-compression-means-more-speed-right/comment-page-1/#comment-51905</link>
		<dc:creator>Greg</dc:creator>
		<pubDate>Fri, 13 Nov 2009 15:52:41 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2301#comment-51905</guid>
		<description>Google&#039;s compression is probably worth a mention here too.  They tend to use very fast, lightweight compression as well (e.g. Zippy).

Some of it is described in their papers (a small section in the Bigtable paper) and talks (small mentions of it in Jeff Dean&#039;s talks, such as his Bigtable talk or his recent LADIS 2009 talk).</description>
		<content:encoded><![CDATA[<p>Google&#8217;s compression is probably worth a mention here too.  They tend to use very fast, lightweight compression as well (e.g. Zippy).</p>
<p>Some of it is described in their papers (a small section in the Bigtable paper) and talks (small mentions of it in Jeff Dean&#8217;s talks, such as his Bigtable talk or his recent LADIS 2009 talk).</p>
]]></content:encoded>
	</item>
</channel>
</rss>

