<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: The mythical bitmap index</title>
	<atom:link href="http://www.daniel-lemire.com/blog/archives/2008/08/20/the-mythical-bitmap-index/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.daniel-lemire.com/blog/archives/2008/08/20/the-mythical-bitmap-index/</link>
	<description>Computer Science researcher and Open Scholar: Web, OLAP, Databases, Time Series, Collaborative Filtering, Information Retrieval, e-Learning.</description>
	<lastBuildDate>Wed, 28 Jul 2010 05:44:16 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
	<item>
		<title>By: Daniel Lemire</title>
		<link>http://www.daniel-lemire.com/blog/archives/2008/08/20/the-mythical-bitmap-index/comment-page-1/#comment-50782</link>
		<dc:creator>Daniel Lemire</dc:creator>
		<pubDate>Wed, 11 Mar 2009 13:24:50 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/archives/2008/08/20/the-mythical-bitmap-index/#comment-50782</guid>
		<description>&lt;i&gt; I didn&#039;t mean that interval encoding outperforms WAH; I agree that WAH or EWAH could be the best scheme for bitmap indexes. I have read Wu&#039;s ACM paper (38 pages) and it really makes sense.&lt;/i&gt;

If you have a high selectivity query, your effort should be small. But with interval coding, your effort is always proportional to the number of rows in the entire table. That is not good for high selectivity queries. And interval coding does not even fare all that well for low selectivity queries, where I prefer projection and bit-sliced indexes. For one thing, it is far simpler to implement a projection index than interval coding!

&lt;i&gt; As for the approximate bitmap (AB) index, I implemented it and improved it into an accurate scheme with a small space and time overhead. But experiments show that the AB scheme is hard to make practical, since it only suits small query regions (narrow row and column selectivity). This technique essentially doesn&#039;t work for bitmap indexes. The problem is the heavy iterative cost of making (row+col) keys for hashing, which is far more time-consuming than the CPU bitwise operations that WAH exploits.&lt;/i&gt;


It is unfortunately easy to underestimate the CPU bottleneck when working on paper. Researchers often write that since we have 4 cores or more per CPU, CPU cycles are cheap. But that is not so simple, is it?</description>
		<content:encoded><![CDATA[<p><i> I didn&#8217;t mean that interval encoding outperforms WAH; I agree that WAH or EWAH could be the best scheme for bitmap indexes. I have read Wu&#8217;s ACM paper (38 pages) and it really makes sense.</i></p>
<p>If you have a high selectivity query, your effort should be small. But with interval coding, your effort is always proportional to the number of rows in the entire table. That is not good for high selectivity queries. And interval coding does not even fare all that well for low selectivity queries, where I prefer projection and bit-sliced indexes. For one thing, it is far simpler to implement a projection index than interval coding!</p>
<p><i> As for the approximate bitmap (AB) index, I implemented it and improved it into an accurate scheme with a small space and time overhead. But experiments show that the AB scheme is hard to make practical, since it only suits small query regions (narrow row and column selectivity). This technique essentially doesn&#8217;t work for bitmap indexes. The problem is the heavy iterative cost of making (row+col) keys for hashing, which is far more time-consuming than the CPU bitwise operations that WAH exploits.</i></p>
<p>It is unfortunately easy to underestimate the CPU bottleneck when working on paper. Researchers often write that since we have 4 cores or more per CPU, CPU cycles are cheap. But that is not so simple, is it?</p>
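<p>To make the comparison concrete, here is a minimal Python sketch of a projection index (the function name is our own, for illustration only): the column is stored in row order and a range query is one sequential scan. Its cost is proportional to the number of rows, just like interval coding, but there is almost nothing to implement.</p>

```python
from typing import List


def range_query_projection(projection: List[int], lo: int, hi: int) -> List[int]:
    """Return the row ids whose value lies in [lo, hi].

    A projection index is just the column values stored in row order,
    so answering a range query is a single scan over all n rows.
    """
    return [row for row, value in enumerate(projection) if lo <= value <= hi]


print(range_query_projection([5, 1, 7, 3, 9], 3, 7))  # [0, 2, 3]
```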
]]></content:encoded>
	</item>
	<item>
		<title>By: zhuo wang</title>
		<link>http://www.daniel-lemire.com/blog/archives/2008/08/20/the-mythical-bitmap-index/comment-page-1/#comment-50779</link>
		<dc:creator>zhuo wang</dc:creator>
		<pubDate>Wed, 11 Mar 2009 09:54:50 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/archives/2008/08/20/the-mythical-bitmap-index/#comment-50779</guid>
		<description>Thanks, Daniel. :-)
I didn&#039;t mean that interval encoding outperforms WAH; I agree that WAH or EWAH could be the best scheme for bitmap indexes. I have read Wu&#039;s ACM paper (38 pages) and it really makes sense.

As for the approximate bitmap (AB) index, I implemented it and improved it into an accurate scheme with a small space and time overhead. But experiments show that the AB scheme is hard to make practical, since it only suits small query regions (narrow row and column selectivity). This technique essentially doesn&#039;t work for bitmap indexes. The problem is the heavy iterative cost of making (row+col) keys for hashing, which is far more time-consuming than the CPU bitwise operations that WAH exploits.</description>
		<content:encoded><![CDATA[<p>Thanks, Daniel. <img src='http://www.daniel-lemire.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /><br />
I didn&#8217;t mean that interval encoding outperforms WAH; I agree that WAH or EWAH could be the best scheme for bitmap indexes. I have read Wu&#8217;s ACM paper (38 pages) and it really makes sense.</p>
<p>As for the approximate bitmap (AB) index, I implemented it and improved it into an accurate scheme with a small space and time overhead. But experiments show that the AB scheme is hard to make practical, since it only suits small query regions (narrow row and column selectivity). This technique essentially doesn&#8217;t work for bitmap indexes. The problem is the heavy iterative cost of making (row+col) keys for hashing, which is far more time-consuming than the CPU bitwise operations that WAH exploits.</p>
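<p>The per-row hashing cost described here can be illustrated with a generic Bloom-filter sketch in Python. This is our own simplification, assuming that the AB index hashes each (row, column) pair into a single bit array with k hash functions; the class name, the use of blake2b and all parameters are hypothetical and not taken from Hakan&#8217;s paper. Answering a query means recomputing k hashes for every candidate row, whereas a bitwise AND over machine words processes many rows per CPU instruction.</p>

```python
import hashlib


class ApproximateBitmap:
    """Bloom-filter style set of (row, column) pairs.

    Membership tests never give false negatives but may give false positives.
    """

    def __init__(self, size_bits: int, num_hashes: int):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, row: int, col: str):
        # Derive k bit positions from the (row, column) key; this per-key
        # hashing is the work that plain bitwise operations avoid.
        for i in range(self.k):
            key = f"{row}:{col}:{i}".encode()
            digest = hashlib.blake2b(key, digest_size=8).digest()
            yield int.from_bytes(digest, "little") % self.size

    def add(self, row: int, col: str) -> None:
        for p in self._positions(row, col):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, row: int, col: str) -> bool:
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(row, col))


def query(ab: ApproximateBitmap, col: str, num_rows: int):
    """Rows that may match col: k hashes are computed for every single row."""
    return [row for row in range(num_rows) if ab.might_contain(row, col)]
```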
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Lemire</title>
		<link>http://www.daniel-lemire.com/blog/archives/2008/08/20/the-mythical-bitmap-index/comment-page-1/#comment-50778</link>
		<dc:creator>Daniel Lemire</dc:creator>
		<pubDate>Wed, 11 Mar 2009 02:45:46 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/archives/2008/08/20/the-mythical-bitmap-index/#comment-50778</guid>
		<description>&lt;i&gt; According to Wu&#039;s paper, the WAH space complexity is 2N words for most datasets and 4N for the worst case. Note it is 2N words, that is, 64N bits. My m*b/2*n is in bits! m and b are both very small, aren&#039;t they? I can&#039;t imagine m L^(m/2)/2, since L can be a very big number. Chan would not be so foolish as to present such a solution. :-)&lt;/i&gt;

Chan is far from foolish. That is not my point. Interval coding is a nice idea, but it fails for non-trivial reasons. You only realize it is a bad idea after playing with real-world implementations of the idea.

The formula is something like m L^(1/m)/2 and not m L^(m/2)/2. To make m L^(1/m)/2 into a small constant, you need m to be rather large (say m=5).

Eventually, you can make interval coding just as small as a regular BBC, WAH or EWAH bitmap index, but what will happen to your query performance? Remember that every time you use the multicomponent trick, your performance degrades... and it does so faster than you might think! To see why, try to implement it! You will see that it gets nasty, and you end up with rather complicated Boolean functions.

So you will need to load many incompressible bitmaps. Hence, for any range query, the amount of data you will load is in Omega(n) because your bitmaps are incompressible. The only way this is OK is if your selectivity is always very low. But then, at that point, why not use a bit-sliced or projection index?

There is absolutely no way you can get good practical performance, and nice fundamental properties, with incompressible bitmaps. Multicomponent encoding is not a good substitute for compression because it saves space at the expense of speed. You want better speed and less storage, both together! RLE compression does exactly this!

&lt;i&gt; BTW, what do you think of Hakan&#039;s Approximate bitmap index? I have developed this scheme into a 100% accurate one with 0 false positives.
Do you think this is promising?&lt;/i&gt;

Hard to tell. Send me a draft and I&#039;ll give you an opinion. But a good sanity test is to implement your scheme and see how fast it is!

I&#039;m guessing you have a scheme that allows random access? Is it random access in constant time? Then what is the constant?</description>
		<content:encoded><![CDATA[<p><i> According to Wu&#8217;s paper, the WAH space complexity is 2N words for most datasets and 4N for the worst case. Note it is 2N words, that is, 64N bits. My m*b/2*n is in bits! m and b are both very small, aren&#8217;t they? I can&#8217;t imagine m L^(m/2)/2, since L can be a very big number. Chan would not be so foolish as to present such a solution. <img src='http://www.daniel-lemire.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </i></p>
<p>Chan is far from foolish. That is not my point. Interval coding is a nice idea, but it fails for non-trivial reasons. You only realize it is a bad idea after playing with real-world implementations of the idea.</p>
<p>The formula is something like m L^(1/m)/2 and not m L^(m/2)/2. To make m L^(1/m)/2 into a small constant, you need m to be rather large (say m=5).</p>
<p>Eventually, you can make interval coding just as small as a regular BBC, WAH or EWAH bitmap index, but what will happen to your query performance? Remember that every time you use the multicomponent trick, your performance degrades&#8230; and it does so faster than you might think! To see why, try to implement it! You will see that it gets nasty, and you end up with rather complicated Boolean functions.</p>
<p>So you will need to load many incompressible bitmaps. Hence, for any range query, the amount of data you will load is in Omega(n) because your bitmaps are incompressible. The only way this is OK is if your selectivity is always very low. But then, at that point, why not use a bit-sliced or projection index?</p>
<p>There is absolutely no way you can get good practical performance, and nice fundamental properties, with incompressible bitmaps. Multicomponent encoding is not a good substitute for compression because it saves space at the expense of speed. You want better speed and less storage, both together! RLE compression does exactly this!</p>
<p><i> BTW, what do you think of Hakan&#8217;s Approximate bitmap index? I have developed this scheme into a 100% accurate one with 0 false positives.<br />
 Do you think this is promising?</i></p>
<p>Hard to tell. Send me a draft and I&#8217;ll give you an opinion. But a good sanity test is to implement your scheme and see how fast it is!</p>
<p>I&#8217;m guessing you have a scheme that allows random access? Is it random access in constant time? Then what is the constant?</p>
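<p>The arithmetic behind m L^(1/m)/2 can be checked with a few lines of Python (the function name is ours). Each of the m components has base b = L^(1/m) and needs about b/2 interval bitmaps, so the index costs about m b/2 bits per row. For L = 1000 the per-row cost falls from 500 bits at m = 1 to roughly 10 bits at m = 5, which is why m must be fairly large before the factor becomes a small constant.</p>

```python
def interval_bits_per_row(L: int, m: int) -> float:
    """Approximate bits per row of an m-component interval-coded index.

    Each component uses base b = L ** (1 / m) and about b / 2 bitmaps,
    so the index costs m * b / 2 = m * L ** (1 / m) / 2 bits per row.
    """
    return m * L ** (1.0 / m) / 2


for m in range(1, 6):
    print(m, round(interval_bits_per_row(1000, m), 1))
```

<p>This only counts storage: each extra component also adds bitmaps that a range query must combine.</p>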
]]></content:encoded>
	</item>
	<item>
		<title>By: zhuo wang</title>
		<link>http://www.daniel-lemire.com/blog/archives/2008/08/20/the-mythical-bitmap-index/comment-page-1/#comment-50777</link>
		<dc:creator>zhuo wang</dc:creator>
		<pubDate>Tue, 10 Mar 2009 23:23:28 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/archives/2008/08/20/the-mythical-bitmap-index/#comment-50777</guid>
		<description>Thank you for your links to those materials...

According to Wu&#039;s paper, the WAH space complexity is 2N words for most datasets and 4N for the worst case. Note it is 2N words, that is, 64N bits. My m*b/2*n is in bits! m and b are both very small, aren&#039;t they? I can&#039;t imagine m L^(m/2)/2, since L can be a very big number. Chan would not be so foolish as to present such a solution. :-)

BTW, what do you think of Hakan&#039;s Approximate bitmap index? I have developed this scheme into a 100% accurate one with 0 false positives.
Do you think this is promising?</description>
		<content:encoded><![CDATA[<p>Thank you for your links to those materials&#8230;</p>
<p>According to Wu&#8217;s paper, the WAH space complexity is 2N words for most datasets and 4N for the worst case. Note it is 2N words, that is, 64N bits. My m*b/2*n is in bits! m and b are both very small, aren&#8217;t they? I can&#8217;t imagine m L^(m/2)/2, since L can be a very big number. Chan would not be so foolish as to present such a solution. <img src='http://www.daniel-lemire.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>BTW, what do you think of Hakan&#8217;s Approximate bitmap index? I have developed this scheme into a 100% accurate one with 0 false positives.<br />
Do you think this is promising?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Lemire</title>
		<link>http://www.daniel-lemire.com/blog/archives/2008/08/20/the-mythical-bitmap-index/comment-page-1/#comment-50774</link>
		<dc:creator>Daniel Lemire</dc:creator>
		<pubDate>Tue, 10 Mar 2009 14:15:53 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/archives/2008/08/20/the-mythical-bitmap-index/#comment-50774</guid>
		<description>&lt;i&gt; where can I find the description of EWAH?&lt;/i&gt;

EWAH is not an important contribution, nevertheless... see this paper:

&lt;a href=&quot;http://arxiv.org/abs/0901.3751&quot; rel=&quot;nofollow&quot;&gt;Sorting improves word-aligned bitmap indexes&lt;/a&gt;

See also my slides:

http://www.slideshare.net/lemire/all-about-bitmap-indexes-and-sorting-them

&lt;i&gt; What is hierarchical coding? Where can I find it?&lt;/i&gt;

R. R. Sinha, M. Winslett, Multi-resolution bitmap indexes for scientific data, ACM Trans. Database Syst. 32 (3) (2007)

&lt;i&gt; Yes, results of interval encoding and the like will be hard to compress further, but they are not supposed to be. The n*L bits with L distinct values and n rows are not interval encoding; they are just a simple (basic) bitmap index. For interval encoding with m components, let m = log_b(L); it will produce m*(b/2)*n bits, far fewer than n*L bits.&lt;/i&gt;

With RLE-based compression (including BBC, WAH and EWAH), compression accelerates queries as well as reducing storage. The total storage will be O(n) with a small constant.

Note that Chan and Ioannidis did not invent multicomponent indexing. It goes back to the seventies. What they stated is that if you don&#039;t compress your bitmaps using RLE compression, then one level of multicomponent indexing was ideal. The problem with this analysis is that there is no reason not to compress the bitmap indexes with RLE.

Then, they invented interval coding. The problem is that interval coding precludes compression!

Yes, you can use multicomponent indexes... but you can use multicomponent encoding with anything, not just interval coding. Is your factor m*b/2 small? It is actually m L^(1/m)/2. It is pretty hard to make it into a small constant.

Think about the fact that if the index is ten times the size of the original table, this may be prohibitive. Indexes are not supposed to be much larger than the original data set... Very large indexes tend to be slow in practice due to buffering problems.


But read what Sinha and Winslett have written on this topic. Wu has also written on why interval coding is not a good idea.</description>
		<content:encoded><![CDATA[<p><i> where can I find the description of EWAH?</i></p>
<p>EWAH is not an important contribution, nevertheless&#8230; see this paper:</p>
<p><a href="http://arxiv.org/abs/0901.3751" rel="nofollow">Sorting improves word-aligned bitmap indexes</a></p>
<p>See also my slides:</p>
<p><a href="http://www.slideshare.net/lemire/all-about-bitmap-indexes-and-sorting-them" rel="nofollow">http://www.slideshare.net/lemire/all-about-bitmap-indexes-and-sorting-them</a></p>
<p><i> What is hierarchical coding? Where can I find it?</i></p>
<p>R. R. Sinha, M. Winslett, Multi-resolution bitmap indexes for scientific data, ACM Trans. Database Syst. 32 (3) (2007)</p>
<p><i> Yes, results of interval encoding and the like will be hard to compress further, but they are not supposed to be. The n*L bits with L distinct values and n rows are not interval encoding; they are just a simple (basic) bitmap index. For interval encoding with m components, let m = log_b(L); it will produce m*(b/2)*n bits, far fewer than n*L bits.</i></p>
<p>With RLE-based compression (including BBC, WAH and EWAH), compression accelerates queries as well as reducing storage. The total storage will be O(n) with a small constant.</p>
<p>Note that Chan and Ioannidis did not invent multicomponent indexing. It goes back to the seventies. What they stated is that if you don&#8217;t compress your bitmaps using RLE compression, then one level of multicomponent indexing was ideal. The problem with this analysis is that there is no reason not to compress the bitmap indexes with RLE.</p>
<p>Then, they invented interval coding. The problem is that interval coding precludes compression!</p>
<p>Yes, you can use multicomponent indexes&#8230; but you can use multicomponent encoding with anything, not just interval coding. Is your factor m*b/2 small? It is actually m L^(1/m)/2. It is pretty hard to make it into a small constant.</p>
<p>Think about the fact that if the index is ten times the size of the original table, this may be prohibitive. Indexes are not supposed to be much larger than the original data set&#8230; Very large indexes tend to be slow in practice due to buffering problems.</p>
<p>But read what Sinha and Winslett have written on this topic. Wu has also written on why interval coding is not a good idea.</p>
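<p>As a rough illustration of why RLE helps, here is a simplified Python sketch of word-aligned hybrid encoding. It follows the general WAH idea (31-bit groups stored either as literal words or as fill words that count runs of identical groups), but it is our own simplification, not Wu&#8217;s implementation; the function names, the constants and the zero-padding of the last group are assumptions. A sparse bitmap of 1000 rows with a single set bit shrinks to three 32-bit words.</p>

```python
GROUP = 31                     # payload bits per 32-bit word
COUNTER_MASK = (1 << 30) - 1   # low 30 bits of a fill word hold the run length


def wah_encode(bits):
    """Encode a list of 0/1 values as 32-bit words in a simplified WAH style.

    Literal word: MSB = 0, the low 31 bits are the raw group.
    Fill word:    MSB = 1, bit 30 = fill value, low 30 bits = number of
                  consecutive 31-bit groups that are all 0s or all 1s.
    """
    padded = bits + [0] * (-len(bits) % GROUP)   # pad the last group with zeros
    words = []
    for start in range(0, len(padded), GROUP):
        value = 0
        for b in padded[start:start + GROUP]:
            value = (value << 1) | b
        if value in (0, (1 << GROUP) - 1):
            fill_bit = 1 if value else 0
            last = words[-1] if words else 0
            same_run = (last >> 31) == 1 and ((last >> 30) & 1) == fill_bit
            if same_run and (last & COUNTER_MASK) < COUNTER_MASK:
                words[-1] += 1                   # extend the current run
            else:
                words.append((1 << 31) | (fill_bit << 30) | 1)
        else:
            words.append(value)                  # literal word
    return words


def wah_decode(words, n):
    """Inverse of wah_encode; n is the original number of bits."""
    bits = []
    for w in words:
        if w >> 31:
            fill_bit = (w >> 30) & 1
            bits.extend([fill_bit] * (GROUP * (w & COUNTER_MASK)))
        else:
            bits.extend((w >> (GROUP - 1 - i)) & 1 for i in range(GROUP))
    return bits[:n]


bitmap = [0] * 1000
bitmap[500] = 1
words = wah_encode(bitmap)
print(len(words))  # 3: one fill word, one literal word, one fill word
```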
]]></content:encoded>
	</item>
	<item>
		<title>By: zhuo wang</title>
		<link>http://www.daniel-lemire.com/blog/archives/2008/08/20/the-mythical-bitmap-index/comment-page-1/#comment-50772</link>
		<dc:creator>zhuo wang</dc:creator>
		<pubDate>Tue, 10 Mar 2009 01:01:43 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/archives/2008/08/20/the-mythical-bitmap-index/#comment-50772</guid>
		<description>Where can I find the description of EWAH? What is hierarchical coding? Where can I find it?

Yes, results of interval encoding and the like will be hard to compress further, but they are not supposed to be. The n*L bits with L distinct values and n rows are not interval encoding; they are just a simple (basic) bitmap index. For interval encoding with m components, let m = log_b(L); it will produce m*(b/2)*n bits, far fewer than n*L bits.</description>
		<content:encoded><![CDATA[<p>Where can I find the description of EWAH? What is hierarchical coding? Where can I find it?</p>
<p>Yes, results of interval encoding and the like will be hard to compress further, but they are not supposed to be. The n*L bits with L distinct values and n rows are not interval encoding; they are just a simple (basic) bitmap index. For interval encoding with m components, let m = log_b(L); it will produce m*(b/2)*n bits, far fewer than n*L bits.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Lemire</title>
		<link>http://www.daniel-lemire.com/blog/archives/2008/08/20/the-mythical-bitmap-index/comment-page-1/#comment-50771</link>
		<dc:creator>Daniel Lemire</dc:creator>
		<pubDate>Mon, 09 Mar 2009 18:03:11 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/archives/2008/08/20/the-mythical-bitmap-index/#comment-50771</guid>
		<description>The general idea behind WAH is sound and that&#039;s what people should be using. Not necessarily WAH itself, but some variant. For example, we decided to use EWAH. (Note that WAH itself is patented.)

Chan&#039;s interval encoding is a bit of a problem because it will generate an incompressible index. If you have L different attribute values in your column, and n rows, you will end up with nL bits that are incompressible. This can be quite a pain. It will be slow to index, it will use a lot of storage, it will be hard to keep things in memory, and so on.

I would not recommend interval coding. Rather, look at hierarchical coding with maybe 2-3 levels.</description>
		<content:encoded><![CDATA[<p>The general idea behind WAH is sound and that&#8217;s what people should be using. Not necessarily WAH itself, but some variant. For example, we decided to use EWAH. (Note that WAH itself is patented.)</p>
<p>Chan&#8217;s interval encoding is a bit of a problem because it will generate an incompressible index. If you have L different attribute values in your column, and n rows, you will end up with nL bits that are incompressible. This can be quite a pain. It will be slow to index, it will use a lot of storage, it will be hard to keep things in memory, and so on.</p>
<p>I would not recommend interval coding. Rather, look at hierarchical coding with maybe 2-3 levels.</p>
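<p>The incompressibility is easy to see in a small Python sketch of single-component interval encoding. It follows our reading of Chan and Ioannidis&#8217; definition (ceil(L/2) bitmaps, each marking a window of floor(L/2) consecutive values, with L written num_values in the code); the function name is hypothetical. On a column spread evenly over L = 10 values, every bitmap has half of its bits set, so there are few long runs for an RLE scheme to exploit.</p>

```python
def interval_encode(column, num_values):
    """Interval-encode a column of integers in [0, num_values).

    Bitmap j marks the rows whose value lies in the window
    [j, j + num_values // 2 - 1]; there are ceil(num_values / 2) bitmaps.
    """
    width = num_values // 2               # values covered by each bitmap
    num_bitmaps = (num_values + 1) // 2   # ceil(num_values / 2)
    return [[1 if j <= v < j + width else 0 for v in column]
            for j in range(num_bitmaps)]


column = list(range(10)) * 100            # 1000 rows over L = 10 values
bitmaps = interval_encode(column, 10)
print(len(bitmaps), [sum(b) for b in bitmaps])  # 5 [500, 500, 500, 500, 500]
```

<p>Every value still gets a distinct pattern across the bitmaps, which is what lets interval coding answer queries with only a couple of bitmaps; the price is that those bitmaps are dense.</p>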
]]></content:encoded>
	</item>
	<item>
		<title>By: zhuo wang</title>
		<link>http://www.daniel-lemire.com/blog/archives/2008/08/20/the-mythical-bitmap-index/comment-page-1/#comment-50769</link>
		<dc:creator>zhuo wang</dc:creator>
		<pubDate>Mon, 09 Mar 2009 06:07:52 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/archives/2008/08/20/the-mythical-bitmap-index/#comment-50769</guid>
		<description>Can we say that WAH is the best of all kinds of bitmap indices? How does Chan&#039;s multi-component interval encoding method compare to WAH?
Will WAH be the overall winner? According to Wu&#039;s paper, WAH compression works well for sparse bitmaps, i.e., high-cardinality attributes, and for skewed data. Maybe some other schemes will beat WAH at lower cardinality.</description>
		<content:encoded><![CDATA[<p>Can we say that WAH is the best of all kinds of bitmap indices? How does Chan&#8217;s multi-component interval encoding method compare to WAH?<br />
Will WAH be the overall winner? According to Wu&#8217;s paper, WAH compression works well for sparse bitmaps, i.e., high-cardinality attributes, and for skewed data. Maybe some other schemes will beat WAH at lower cardinality.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Lemire</title>
		<link>http://www.daniel-lemire.com/blog/archives/2008/08/20/the-mythical-bitmap-index/comment-page-1/#comment-50529</link>
		<dc:creator>Daniel Lemire</dc:creator>
		<pubDate>Sat, 10 Jan 2009 15:36:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/archives/2008/08/20/the-mythical-bitmap-index/#comment-50529</guid>
		<description>@Otis: Wu is an expert indeed. Please see his 2006 paper &quot;Optimizing bitmap indices with efficient compression&quot;

I quote their abstract:

&quot;In this paper, we present a new compression scheme called Word-Aligned Hybrid (WAH) code that makes compressed bitmap indices efficient even for high cardinality attributes.&quot;

(WAH is based on BBC which is used by Oracle.)</description>
		<content:encoded><![CDATA[<p>@Otis: Wu is an expert indeed. Please see his 2006 paper &#8220;Optimizing bitmap indices with efficient compression&#8221;</p>
<p>I quote their abstract:</p>
<p>&#8220;In this paper, we present a new compression scheme called Word-Aligned Hybrid (WAH) code that makes compressed bitmap indices efficient even for high cardinality attributes.&#8221;</p>
<p>(WAH is based on BBC which is used by Oracle.)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Otis Gospodnetic</title>
		<link>http://www.daniel-lemire.com/blog/archives/2008/08/20/the-mythical-bitmap-index/comment-page-1/#comment-50528</link>
		<dc:creator>Otis Gospodnetic</dc:creator>
		<pubDate>Sat, 10 Jan 2009 05:53:58 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/archives/2008/08/20/the-mythical-bitmap-index/#comment-50528</guid>
		<description>I&#039;m no expert, but it seems people who authored http://i.cs.hku.hk/~ssdbm/slides/SSDBM.July11/Session7.2.pdf are, and I think they claim the opposite.</description>
		<content:encoded><![CDATA[<p>I&#8217;m no expert, but it seems people who authored <a href="http://i.cs.hku.hk/~ssdbm/slides/SSDBM.July11/Session7.2.pdf" rel="nofollow">http://i.cs.hku.hk/~ssdbm/slides/SSDBM.July11/Session7.2.pdf</a> are, and I think they claim the opposite.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Parand</title>
		<link>http://www.daniel-lemire.com/blog/archives/2008/08/20/the-mythical-bitmap-index/comment-page-1/#comment-50121</link>
		<dc:creator>Parand</dc:creator>
		<pubDate>Mon, 25 Aug 2008 16:56:26 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/archives/2008/08/20/the-mythical-bitmap-index/#comment-50121</guid>
		<description>Thanks Daniel, I learned something today.</description>
		<content:encoded><![CDATA[<p>Thanks Daniel, I learned something today.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: John Haugeland</title>
		<link>http://www.daniel-lemire.com/blog/archives/2008/08/20/the-mythical-bitmap-index/comment-page-1/#comment-50108</link>
		<dc:creator>John Haugeland</dc:creator>
		<pubDate>Thu, 21 Aug 2008 14:31:53 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/archives/2008/08/20/the-mythical-bitmap-index/#comment-50108</guid>
		<description>Jesus, you&#039;re surprised that Wikipedia is handing out bad technical advice?  Have you noticed who writes it yet?

http://sc.tri-bit.com/outgoing/NeverTrustWikipediaTechnicalArticles.png

Also, your anti-spam mechanism is inappropriately case sensitive.</description>
		<content:encoded><![CDATA[<p>Jesus, you&#8217;re surprised that Wikipedia is handing out bad technical advice?  Have you noticed who writes it yet?</p>
<p><a href="http://sc.tri-bit.com/outgoing/NeverTrustWikipediaTechnicalArticles.png" rel="nofollow">http://sc.tri-bit.com/outgoing/NeverTrustWikipediaTechnicalArticles.png</a></p>
<p>Also, your anti-spam mechanism is inappropriately case sensitive.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Lemire</title>
		<link>http://www.daniel-lemire.com/blog/archives/2008/08/20/the-mythical-bitmap-index/comment-page-1/#comment-50107</link>
		<dc:creator>Daniel Lemire</dc:creator>
		<pubDate>Thu, 21 Aug 2008 12:48:42 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/archives/2008/08/20/the-mythical-bitmap-index/#comment-50107</guid>
		<description>&lt;i&gt; If all the values of a field (indexed with a bitmap index) were unique, would the bitmap indices be extremely short? Would there need to be a tree structure to store the indices (or their locations)? How is the correct bitmap index for a field found when there are large numbers of distinct values?&lt;/i&gt;


As I point out in my post, you typically use a b-tree to get the location of the bitmap. You need some sort of b-tree-like data structure because compressed bitmaps have different sizes, so you can&#039;t just store them in a fixed-length-record flat file.

It is not very exciting to index a field where all values are distinct because, yes, the bitmaps would all be very short. Ah! But you rarely have a single dimension. Bitmap indexes are typically used in a DSS context where you &quot;always&quot; have numerous dimensions.</description>
		<content:encoded><![CDATA[<p><i> If all the values of a field (indexed with a bitmap index) were unique, would the bitmap indices be extremely short? Would there need to be a tree structure to store the indices (or their locations)? How is the correct bitmap index for a field found when there are large numbers of distinct values?</i></p>
<p>As I point out in my post, you typically use a b-tree to get the location of the bitmap. You need some sort of b-tree-like data structure because compressed bitmaps have different sizes, so you can&#8217;t just store them in a fixed-length-record flat file.</p>
<p>It is not very exciting to index a field where all values are distinct because, yes, the bitmaps would all be very short. Ah! But you rarely have a single dimension. Bitmap indexes are typically used in a DSS context where you &#8220;always&#8221; have numerous dimensions.</p>
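<p>A minimal Python sketch of that lookup structure, assuming the compressed bitmaps are simply concatenated in one flat buffer: a sorted key list searched with the standard bisect module stands in for the b-tree and records each bitmap&#8217;s offset and length. The class and method names are hypothetical; a real system would keep this directory on disk.</p>

```python
import bisect


class BitmapDirectory:
    """Map attribute values to variable-length compressed bitmaps.

    A sorted key list searched with bisect stands in for the b-tree: it
    records where each bitmap starts and how long it is in one buffer.
    """

    def __init__(self, bitmaps):
        # bitmaps: dict of attribute value -> compressed bitmap bytes
        self.keys = sorted(bitmaps)
        self.locations = []            # (offset, length) per key, same order
        storage = bytearray()
        for key in self.keys:
            data = bitmaps[key]
            self.locations.append((len(storage), len(data)))
            storage += data
        self.storage = bytes(storage)

    def lookup(self, value):
        """Return the stored bitmap bytes for value, or None if absent."""
        i = bisect.bisect_left(self.keys, value)
        if i == len(self.keys) or self.keys[i] != value:
            return None
        offset, length = self.locations[i]
        return self.storage[offset:offset + length]

    def range_lookup(self, lo, hi):
        """Return the bitmaps for all values in [lo, hi], in key order."""
        start = bisect.bisect_left(self.keys, lo)
        end = bisect.bisect_right(self.keys, hi)
        return [self.lookup(k) for k in self.keys[start:end]]
```

<p>Because the bitmaps have different lengths, the directory has to store explicit offsets; with fixed-size records, the position could instead be computed directly from the key.</p>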
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Lemire</title>
		<link>http://www.daniel-lemire.com/blog/archives/2008/08/20/the-mythical-bitmap-index/comment-page-1/#comment-50106</link>
		<dc:creator>Daniel Lemire</dc:creator>
		<pubDate>Thu, 21 Aug 2008 12:46:36 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/archives/2008/08/20/the-mythical-bitmap-index/#comment-50106</guid>
		<description>Thank you Will.</description>
		<content:encoded><![CDATA[<p>Thank you Will.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Will</title>
		<link>http://www.daniel-lemire.com/blog/archives/2008/08/20/the-mythical-bitmap-index/comment-page-1/#comment-50105</link>
		<dc:creator>Will</dc:creator>
		<pubDate>Thu, 21 Aug 2008 12:28:45 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/archives/2008/08/20/the-mythical-bitmap-index/#comment-50105</guid>
		<description>Daniel, 

I&#039;m installing the software today and don&#039;t have the GNU seq command installed, so /testequalityqueries.sh failed.

Let me suggest you replace `seq 0 10` with
0 1 2 3 4 5 6 7 8 9 10 in the two places seq is called in the script. This was then successful.</description>
		<content:encoded><![CDATA[<p>Daniel, </p>
<p>I&#8217;m installing the software today and don&#8217;t have the GNU seq command installed, so /testequalityqueries.sh failed.</p>
<p>Let me suggest you replace `seq 0 10` with<br />
0 1 2 3 4 5 6 7 8 9 10 in the two places seq is called in the script. This was then successful.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
