<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Do hash tables work in constant time?</title>
	<atom:link href="http://www.daniel-lemire.com/blog/archives/2009/08/18/do-hash-tables-work-in-constant-time/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.daniel-lemire.com/blog/archives/2009/08/18/do-hash-tables-work-in-constant-time/</link>
	<description>Computer Scientist and Open Scholar: Databases, Information Retrieval, Business Intelligence.</description>
	<lastBuildDate>Wed, 08 Sep 2010 06:33:22 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
	<item>
		<title>By: Daniel Lemire</title>
		<link>http://www.daniel-lemire.com/blog/archives/2009/08/18/do-hash-tables-work-in-constant-time/comment-page-1/#comment-51388</link>
		<dc:creator>Daniel Lemire</dc:creator>
		<pubDate>Tue, 18 Aug 2009 23:46:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2121#comment-51388</guid>
		<description>@Rachel I am happy with O(log k). But what if you are using strings as keys?</description>
		<content:encoded><![CDATA[<p>@Rachel I am happy with O(log k). But what if you are using strings as keys?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Rachel Blum</title>
		<link>http://www.daniel-lemire.com/blog/archives/2009/08/18/do-hash-tables-work-in-constant-time/comment-page-1/#comment-51384</link>
		<dc:creator>Rachel Blum</dc:creator>
		<pubDate>Tue, 18 Aug 2009 23:25:24 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2121#comment-51384</guid>
		<description>@Daniel - even w/ an upper bound on the number of elements, an array scan still depends on how full the array is, though.

A hashtable still is independent of the number of entries, for a fixed *key* size. The number of elements can be variable.

That&#039;s not saying key size is not important, just that a hashtable w/ a fixed key size is O(1) w/ respect to number of entries, and an array scan is O(n).

So I guess my quibble is that for a given key size, hash tables should be O(log k), k being the maximum number of keys, not O(log n).</description>
		<content:encoded><![CDATA[<p>@Daniel &#8211; even w/ an upper bound on the number of elements, an array scan still depends on how full the array is, though.</p>
<p>A hashtable still is independent of the number of entries, for a fixed *key* size. The number of elements can be variable.</p>
<p>That&#8217;s not saying key size is not important, just that a hashtable w/ a fixed key size is O(1) w/ respect to number of entries, and an array scan is O(n).</p>
<p>So I guess my quibble is that for a given key size, hash tables should be O(log k), k being the maximum number of keys, not O(log n).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: D. Eppstein</title>
		<link>http://www.daniel-lemire.com/blog/archives/2009/08/18/do-hash-tables-work-in-constant-time/comment-page-1/#comment-51383</link>
		<dc:creator>D. Eppstein</dc:creator>
		<pubDate>Tue, 18 Aug 2009 23:05:50 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2121#comment-51383</guid>
		<description>Yes, multiplication isn&#039;t really constant time and memory access isn&#039;t really constant time, but if you&#039;re going to assess a penalty to those operations when you use them in hashing then you need to assess them consistently throughout whatever other algorithms you&#039;re using.

Or, you could use the uniform cost model, pretend the penalty is one, and get basically the same comparison between any two algorithms (that don&#039;t have drastically different memory hierarchy behavior) without all the pain.

Where this falls down, of course, is that the memory hierarchy behavior of hashing is bad. So if you compare it to something that&#039;s designed to have good locality of reference, the uniform cost model may not be the right thing to use.</description>
		<content:encoded><![CDATA[<p>Yes, multiplication isn&#8217;t really constant time and memory access isn&#8217;t really constant time, but if you&#8217;re going to assess a penalty to those operations when you use them in hashing then you need to assess them consistently throughout whatever other algorithms you&#8217;re using.</p>
<p>Or, you could use the uniform cost model, pretend the penalty is one, and get basically the same comparison between any two algorithms (that don&#8217;t have drastically different memory hierarchy behavior) without all the pain.</p>
<p>Where this falls down, of course, is that the memory hierarchy behavior of hashing is bad. So if you compare it to something that&#8217;s designed to have good locality of reference, the uniform cost model may not be the right thing to use.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Lemire</title>
		<link>http://www.daniel-lemire.com/blog/archives/2009/08/18/do-hash-tables-work-in-constant-time/comment-page-1/#comment-51382</link>
		<dc:creator>Daniel Lemire</dc:creator>
		<pubDate>Tue, 18 Aug 2009 22:07:55 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2121#comment-51382</guid>
		<description>@Preston 

Regarding Pearson hashing, I specifically pointed out that multiplication was not required (see Disclaimer 1). As for Pearson being quite good... I agree, and more about this another day. Watch this blog for follow-ups. (Man! Do I wish I knew more people who knew about Pearson hashing!)

As for using vectorization, the idea I had was to implement several hash tables that are queried simultaneously, or a single hash table that is queried in small batches. Why not? People are implementing and selling database engines designed around vectorization. Am I going to try building these vectorized hash tables today? Probably not.

I was just trying to get people to think about their assumptions this morning and it degenerated...</description>
		<content:encoded><![CDATA[<p>@Preston </p>
<p>Regarding Pearson hashing, I specifically pointed out that multiplication was not required (see Disclaimer 1). As for Pearson being quite good&#8230; I agree, and more about this another day. Watch this blog for follow-ups. (Man! Do I wish I knew more people who knew about Pearson hashing!)</p>
<p>As for using vectorization, the idea I had was to implement several hash tables that are queried simultaneously, or a single hash table that is queried in small batches. Why not? People are implementing and selling database engines designed around vectorization. Am I going to try building these vectorized hash tables today? Probably not.</p>
<p>I was just trying to get people to think about their assumptions this morning and it degenerated&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Lemire</title>
		<link>http://www.daniel-lemire.com/blog/archives/2009/08/18/do-hash-tables-work-in-constant-time/comment-page-1/#comment-51380</link>
		<dc:creator>Daniel Lemire</dc:creator>
		<pubDate>Tue, 18 Aug 2009 21:52:03 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2121#comment-51380</guid>
		<description>@Rachel 


&lt;i&gt; But for most hash table implementations, the key size is fixed. Hence there is a constant upper bound for modulus, hence O(1)

 That shouldn&#039;t change for vectorization, either, as long as the key size has a fixed upper bound.

 What am I missing?&lt;/i&gt;

If you fix the maximal number of keys, then all data structures run in constant time. Even a linear scan through an array will never use more than a fixed number of operations.

That&#039;s not very exciting.</description>
		<content:encoded><![CDATA[<p>@Rachel </p>
<p><i>But for most hash table implementations, the key size is fixed. Hence there is a constant upper bound for modulus, hence O(1)</i></p>
<p><i>That shouldn&#8217;t change for vectorization, either, as long as the key size has a fixed upper bound.</i></p>
<p><i>What am I missing?</i></p>
<p>If you fix the maximal number of keys, then all data structures run in constant time. Even a linear scan through an array will never use more than a fixed number of operations.</p>
<p>That&#8217;s not very exciting.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Preston L. Bannister</title>
		<link>http://www.daniel-lemire.com/blog/archives/2009/08/18/do-hash-tables-work-in-constant-time/comment-page-1/#comment-51379</link>
		<dc:creator>Preston L. Bannister</dc:creator>
		<pubDate>Tue, 18 Aug 2009 21:44:10 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2121#comment-51379</guid>
		<description>Perhaps I am not tracking your argument, but as far as I can tell, 32- and 64-bit integer multiply instructions take a fixed number of clocks on recent x86 CPUs (the fixed-time multiply hardware went into either the 386 or 486, if memory serves). Presuming you are interested in hash tables that fit in memory, integer multiply instructions are sufficient.

Not sure how you would use vectorization with hashing, at least not in the usual cases.

Then again, for my purposes, simple Pearson hashing (not using multiply or divide) has always won out when measured over actual problem sizes.</description>
		<content:encoded><![CDATA[<p>Perhaps I am not tracking your argument, but as far as I can tell, 32 and 64-bit integer multiply instructions take a fixed number of clocks on recent x86 CPUs (the fixed time multiply hardware went into either the 386 or 486, if memory serves). Presuming you are interested in hash tables that fit in memory, integer multiply instructions are sufficient.</p>
<p>Not sure how you would use vectorization with hashing, at least not in the usual cases.</p>
<p>Then again, for my purposes, simple Pearson hashing (not using multiply or divide) has always won out when measured over actual problem sizes.</p>
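<p>For readers who have not met Pearson hashing, here is a minimal Python sketch of the idea, assuming an 8-bit output: each input byte costs one XOR and one lookup in a 256-entry permutation table, with no multiply or divide. The table below is a hypothetical one built from a seeded shuffle (any permutation of 0..255 works), and the name <code>pearson_hash</code> is illustrative, not taken from the comment.</p>

```python
import random

# Hypothetical permutation table: any permutation of 0..255 is a valid Pearson table.
_rng = random.Random(42)
TABLE = list(range(256))
_rng.shuffle(TABLE)


def pearson_hash(data: bytes) -> int:
    """Return an 8-bit Pearson hash of `data` using only XOR and table lookups."""
    h = 0
    for byte in data:
        # One XOR and one table lookup per input byte; no multiplication or division.
        h = TABLE[h ^ byte]
    return h


if __name__ == "__main__":
    for key in (b"apple", b"banana", b"cherry"):
        print(key, pearson_hash(key))
```

<p>The work per key is one lookup per byte, so the cost grows with key length but does not depend on how many entries the table holds.</p>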
]]></content:encoded>
	</item>
	<item>
		<title>By: Rachel Blum</title>
		<link>http://www.daniel-lemire.com/blog/archives/2009/08/18/do-hash-tables-work-in-constant-time/comment-page-1/#comment-51377</link>
		<dc:creator>Rachel Blum</dc:creator>
		<pubDate>Tue, 18 Aug 2009 21:30:58 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2121#comment-51377</guid>
		<description>As I said - it *is* a true statement when you consider variable key size.

But for most hash table implementations, the key size is fixed. Hence there is a constant upper bound for modulus, hence O(1)

That shouldn&#039;t change for vectorization, either, as long as the key size has a fixed upper bound.

What am I missing?</description>
		<content:encoded><![CDATA[<p>As I said &#8211; it *is* a true statement when you consider variable key size.</p>
<p>But for most hash table implementations, the key size is fixed. Hence there is a constant upper bound for modulus, hence O(1)</p>
<p>That shouldn&#8217;t change for vectorization, either, as long as the key size has a fixed upper bound.</p>
<p>What am I missing?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Lemire</title>
		<link>http://www.daniel-lemire.com/blog/archives/2009/08/18/do-hash-tables-work-in-constant-time/comment-page-1/#comment-51374</link>
		<dc:creator>Daniel Lemire</dc:creator>
		<pubDate>Tue, 18 Aug 2009 20:46:23 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2121#comment-51374</guid>
		<description>@Rachel

&lt;i&gt;And yes, larger hash tables are slower – but their time complexity is the same. (Don’t you love complexity arithmetic?)&lt;/i&gt;

When a 2^(2m)-key hash table runs 2 times slower than a 2^m-key hash table, then you have O(log n) complexity by definition. And that&#039;s precisely what happens when running multiple hash table queries simultaneously with vectorization.</description>
		<content:encoded><![CDATA[<p>@Rachel</p>
<p><i>And yes, larger hash tables are slower – but their time complexity is the same. (Don’t you love complexity arithmetic?)</i></p>
<p>When a 2^(2m)-key hash table runs 2 times slower than a 2^m-key hash table, then you have O(log n) complexity by definition. And that&#8217;s precisely what happens when running multiple hash table queries simultaneously with vectorization.</p>
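<p>The "by definition" step can be made explicit with a recurrence. Writing T(n) for the query time of an n-key table (notation introduced here, not in the post) and assuming that squaring the number of keys exactly doubles the query time, unrolling gives:</p>

```latex
T\left(n^{2}\right) = 2\,T(n)
\;\Longrightarrow\;
T\left(2^{2^{k}}\right) = 2^{k}\,T(2) = T(2)\,\log_{2}\left(2^{2^{k}}\right)
\;\Longrightarrow\;
T(n) = T(2)\,\log_{2} n \quad \text{for } n = 2^{2^{k}}
```

<p>If T is also nondecreasing, any n falls between two consecutive values of the form 2^(2^k), so T(n) stays within a factor of two of T(2) log2 n, that is, T(n) = &#920;(log n).</p>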
]]></content:encoded>
	</item>
	<item>
		<title>By: Rachel Blum</title>
		<link>http://www.daniel-lemire.com/blog/archives/2009/08/18/do-hash-tables-work-in-constant-time/comment-page-1/#comment-51369</link>
		<dc:creator>Rachel Blum</dc:creator>
		<pubDate>Tue, 18 Aug 2009 20:21:37 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2121#comment-51369</guid>
		<description>That is *still* constant time... 4*O(1)=O(1)

(I&#039;m well aware of vectorization - given that I work on games, it&#039;s inevitable ;)

It would only be an issue if your key size were unlimited. (Because then the time required for hash computation would indeed change w/ the size of your key)

And yes, larger hash tables are slower - but their time complexity is the same. (Don&#039;t you love complexity arithmetic? ;)</description>
		<content:encoded><![CDATA[<p>That is *still* constant time&#8230; 4*O(1)=O(1)</p>
<p>(I&#8217;m well aware of vectorization &#8211; given that I work on games, it&#8217;s inevitable ;)</p>
<p>It would only be an issue if your key size were unlimited. (Because then the time required for hash computation would indeed change w/ the size of your key)</p>
<p>And yes, larger hash tables are slower &#8211; but their time complexity is the same. (Don&#8217;t you love complexity arithmetic? ;)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Lemire</title>
		<link>http://www.daniel-lemire.com/blog/archives/2009/08/18/do-hash-tables-work-in-constant-time/comment-page-1/#comment-51353</link>
		<dc:creator>Daniel Lemire</dc:creator>
		<pubDate>Tue, 18 Aug 2009 18:47:29 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2121#comment-51353</guid>
		<description>@Sammy

When using larger hash tables, you have to compute larger hash values. This takes more time.

Try it out. Right now. Implement a hash table with 2^256 keys and one with 2^16 keys. You&#039;ll see that multiplying 256-bit integers takes longer. Thus, your 2^256-element hash table will be slower.

True. Nobody right now has databases with 2^256 elements. But with vectorization, you can do four 16-bit multiplications in the time it takes to do one 64-bit multiplication.

So, right now, in real systems, I claim that larger hash tables could be slower, even if I discard the memory access time.

True. Few people are using vectorization *right now*. But some important people are using it for important database applications.</description>
		<content:encoded><![CDATA[<p>@Sammy</p>
<p>When using larger hash tables, you have to compute larger hash values. This takes more time.</p>
<p>Try it out. Right now. Implement a hash table with 2^256 keys and one with 2^16 keys. You&#8217;ll see that multiplying 256-bit integers takes longer. Thus, your 2^256-element hash table will be slower.</p>
<p>True. Nobody right now has databases with 2^256 elements. But with vectorization, you can do four 16-bit multiplications in the time it takes to do one 64-bit multiplication.</p>
<p>So, right now, in real systems, I claim that larger hash tables could be slower, even if I discard the memory access time.</p>
<p>True. Few people are using vectorization *right now*. But some important people are using it for important database applications.</p>
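<p>To make "larger tables need larger hash values" concrete, here is a minimal Python sketch of multiplicative hashing, one common scheme (assumed here; the post does not prescribe a particular hash function). A table with 2^m slots is indexed by the top m bits of a*x mod 2^w, so the hash value width m grows as log2 of the table size. The word size W = 64, the multiplier A (roughly 2^64 divided by the golden ratio) and the function names are illustrative assumptions.</p>

```python
W = 64                    # assumed machine word size in bits
A = 0x9E3779B97F4A7C15    # assumed odd multiplier, roughly 2^64 / golden ratio


def table_bits(n_slots: int) -> int:
    """Bits m needed to index a table of n_slots (rounded up to a power of two)."""
    return max(1, (n_slots - 1).bit_length())


def mult_hash(x: int, m: int, w: int = W, a: int = A) -> int:
    """Multiplicative hashing: the top m bits of (a * x) mod 2^w index a 2^m-slot table."""
    product = (a * x) % (2 ** w)   # keep the low w bits, as a w-bit machine multiply would
    return product >> (w - m)      # the m high-order bits are the slot index


if __name__ == "__main__":
    for n in (2 ** 16, 2 ** 32, 2 ** 64):
        m = table_bits(n)
        print(f"{n} slots -> {m}-bit hash values, e.g. {mult_hash(12345, m)}")
```

<p>Since the index is taken from a w-bit product, m cannot exceed w: a 2^256-slot table needs w of at least 256 (and a correspondingly wide multiplier). Python only handles <code>mult_hash(x, 256, w=256)</code> because its integers are arbitrary-precision, which is exactly the multi-word multiplication the comment says takes longer.</p>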
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Lemire</title>
		<link>http://www.daniel-lemire.com/blog/archives/2009/08/18/do-hash-tables-work-in-constant-time/comment-page-1/#comment-51351</link>
		<dc:creator>Daniel Lemire</dc:creator>
		<pubDate>Tue, 18 Aug 2009 18:37:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2121#comment-51351</guid>
		<description>@Rachel

Thank you for your comments. Yes, I like to play games, but please give me some credit. 

Yes, we always assume that multiplications take constant time because we assume that we take in 64-bit integers and produce 64-bit integers using 64-bit processors. 

But is it true?

No, it is not if you use vectorization. With vectorization, you can multiply four pairs of 16-bit numbers in the time it takes to multiply one pair of 64-bit numbers. And yes, people use vectorization, right now, in commercial products.

Further reading:

http://en.wikipedia.org/wiki/Vectorization_(computer_science)</description>
		<content:encoded><![CDATA[<p>@Rachel</p>
<p>Thank you for your comments. Yes, I like to play games, but please give me some credit. </p>
<p>Yes, we always assume that multiplications take constant time because we assume that we take in 64-bit integers and produce 64-bit integers using 64-bit processors. </p>
<p>But is it true?</p>
<p>No, it is not if you use vectorization. With vectorization, you can multiply four pairs of 16-bit numbers in the time it takes to multiply one pair of 64-bit numbers. And yes, people use vectorization, right now, in commercial products.</p>
<p>Further reading:</p>
<p><a href="http://en.wikipedia.org/wiki/Vectorization_(computer_science)" rel="nofollow">http://en.wikipedia.org/wiki/Vectorization_(computer_science)</a></p>
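<p>The "four 16-bit products per 64-bit multiply" idea can be illustrated by simulating the semantics of a packed multiply-low instruction (x86&#8217;s <code>pmullw</code> works this way on 16-bit lanes): one 64-bit register holds four independent 16-bit lanes, each multiplied separately and truncated to 16 bits. This Python sketch only models what such an instruction computes, not how fast it runs; the function names are hypothetical.</p>

```python
LANES = 4
LANE_BITS = 16
LANE_MOD = 2 ** LANE_BITS


def split_lanes(word: int) -> list[int]:
    """Split a 64-bit word into four 16-bit lanes, lowest lane first."""
    return [(word >> (LANE_BITS * i)) % LANE_MOD for i in range(LANES)]


def pack_lanes(lanes: list[int]) -> int:
    """Pack four lanes (each truncated to 16 bits) into one 64-bit word, lowest lane first."""
    return sum((lane % LANE_MOD) << (LANE_BITS * i) for i, lane in enumerate(lanes))


def lanewise_mul_lo16(x: int, y: int) -> int:
    """Multiply four pairs of 16-bit lanes, keeping the low 16 bits of each product."""
    products = [a * b for a, b in zip(split_lanes(x), split_lanes(y))]
    return pack_lanes(products)
```

<p>For example, <code>split_lanes(lanewise_mul_lo16(pack_lanes([3, 5, 7, 300]), pack_lanes([2, 4, 6, 300])))</code> gives <code>[6, 20, 42, 24464]</code>: the last lane wraps because 90000 does not fit in 16 bits.</p>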
]]></content:encoded>
	</item>
	<item>
		<title>By: Sammy Larbi</title>
		<link>http://www.daniel-lemire.com/blog/archives/2009/08/18/do-hash-tables-work-in-constant-time/comment-page-1/#comment-51352</link>
		<dc:creator>Sammy Larbi</dc:creator>
		<pubDate>Tue, 18 Aug 2009 18:37:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2121#comment-51352</guid>
		<description>When doing complexity analysis you are supposed to be comparing apples with apples: how fast does the time or space complexity grow /with respect to the size of the input n/. 

You are taking n as the size of the hash table at the beginning, then using n as the size of one element in another part, and Colin is (I&#039;m guessing facetiously) taking n as the number of logic gates in a stick of RAM.

Neither of these variables changes as n changes, so it doesn&#039;t make sense to include them in the analysis. Even if you do include them, they are still constants, and choosing M and x (following http://en.wikipedia.org/wiki/Big_O_notation) appropriately would show you the correct result.</description>
		<content:encoded><![CDATA[<p>When doing complexity analysis you are supposed to be comparing apples with apples: how fast does the time or space complexity grow /with respect to the size of the input n/. </p>
<p>You are taking n as the size of the hash table at the beginning, then using n as the size of one element in another part, and Colin is (I&#8217;m guessing facetiously) taking n as the number of logic gates in a stick of RAM.</p>
<p>Neither of these variables changes as n changes, so it doesn&#8217;t make sense to include them in the analysis. Even if you do include them, they are still constants, and choosing M and x (following <a href="http://en.wikipedia.org/wiki/Big_O_notation" rel="nofollow">http://en.wikipedia.org/wiki/Big_O_notation</a>) appropriately would show you the correct result.</p>
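<p>For reference, the definition behind "choosing M and x appropriately" is the standard one below (x_0 is the threshold the comment calls x). A per-operation cost c that depends on the key size k but not on n satisfies it with M = c(k) and g(n) = 1:</p>

```latex
f(n) = O\bigl(g(n)\bigr) \iff \exists\, M > 0,\ \exists\, x_0 \ \text{such that}\ |f(n)| \le M\,|g(n)| \ \text{for all } n \ge x_0

f(n) = c(k) \;\Longrightarrow\; |f(n)| \le c(k) \cdot 1 \ \text{for all } n \ge 1 \;\Longrightarrow\; f(n) = O(1) \ \text{as a function of } n
```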
]]></content:encoded>
	</item>
	<item>
		<title>By: Rachel Blum</title>
		<link>http://www.daniel-lemire.com/blog/archives/2009/08/18/do-hash-tables-work-in-constant-time/comment-page-1/#comment-51348</link>
		<dc:creator>Rachel Blum</dc:creator>
		<pubDate>Tue, 18 Aug 2009 18:17:10 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2121#comment-51348</guid>
		<description>You&#039;re playing semantic games. If you look at algorithms, modulus computation is always considered O(1). And it is in most commercial CPUs. The time for a multiplication is independent of the size of the number, assuming we stay in the natural word size. The complexity you mention has simply been moved from time-complexity to gate-complexity.

And even if we stipulate hashing were O(log n) - where do you make the jump to O(n log n)?</description>
		<content:encoded><![CDATA[<p>You&#8217;re playing semantic games. If you look at algorithms, modulus computation is always considered O(1). And it is in most commercial CPUs. The time for a multiplication is indepedent of the size of the number, assuming we stay in the natural word size. The complexity you mention has simply been moved from time-complexity to gate-complexity.</p>
<p>And even if we stipulate hashing were O(log n) &#8211; where do you make the jump to O(n log n)?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Lemire</title>
		<link>http://www.daniel-lemire.com/blog/archives/2009/08/18/do-hash-tables-work-in-constant-time/comment-page-1/#comment-51346</link>
		<dc:creator>Daniel Lemire</dc:creator>
		<pubDate>Tue, 18 Aug 2009 18:07:22 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2121#comment-51346</guid>
		<description>&lt;i&gt; Thus accessing an entry in a hash table of size n takes time O((log n)^(1+epsilon)), not O(n^(1+epsilon)).&lt;/i&gt;

Duh! Thanks. (I don&#039;t think anyone would be using hash tables, had I been right...) (:shame:)
</description>
		<content:encoded><![CDATA[<p><i> Thus accessing an entry in a hash table of size n takes time O((log n)^(1+epsilon)), not O(n^(1+epsilon)).</i></p>
<p>Duh! Thanks. (I don&#8217;t think anyone would be using hash tables, had I been right&#8230;) (:shame:)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Colin Percival</title>
		<link>http://www.daniel-lemire.com/blog/archives/2009/08/18/do-hash-tables-work-in-constant-time/comment-page-1/#comment-51345</link>
		<dc:creator>Colin Percival</dc:creator>
		<pubDate>Tue, 18 Aug 2009 17:48:14 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2121#comment-51345</guid>
		<description>You lost a logarithm: &quot;Multiplications of numbers in [0,m) can almost be done in time O(m log m)&quot; should read &quot;Multiplications of numbers in [0, 2^m) can almost be done in time O(m log m)&quot;.

Thus accessing an entry in a hash table of size n takes time O((log n)^(1+epsilon)), not O(n^(1+epsilon)).

There&#039;s a more fundamental reason for hash tables not being constant-time, however: Random-access memory isn&#039;t constant-time. You&#039;ve got at least log n gate delays; and once you start dealing with large amounts of storage, you&#039;ve got a speed-of-light cost of O(n^(1/2)) or O(n^(1/3)), depending on whether your circuits are two- or three-dimensional.</description>
		<content:encoded><![CDATA[<p>You lost a logarithm: &#8220;Multiplications of numbers in [0,m) can almost be done in time O(m log m)&#8221; should read &#8220;Multiplications of numbers in [0, 2^m) can almost be done in time O(m log m)&#8221;.</p>
<p>Thus accessing an entry in a hash table of size n takes time O((log n)^(1+epsilon)), not O(n^(1+epsilon)).</p>
<p>There&#8217;s a more fundamental reason for hash tables not being constant-time, however: Random-access memory isn&#8217;t constant-time. You&#8217;ve got at least log n gate delays; and once you start dealing with large amounts of storage, you&#8217;ve got a speed-of-light cost of O(n^(1/2)) or O(n^(1/3)), depending on whether your circuits are two- or three-dimensional.</p>
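<p>Both bounds in this comment can be written out. For the arithmetic, an n-slot table needs hash values of m = ceil(log2 n) bits. Taking the Sch&#246;nhage&#8211;Strassen bound as the "almost O(m log m)" multiplication cost (an assumption about which bound is meant), and writing Mul(m) for the cost of multiplying m-bit numbers, the substitution gives the corrected result. For the memory, the usual argument is that n bits stored at bounded density in a 2D or 3D region span a distance proportional to the square or cube root of n, and a signal has to cross that distance:</p>

```latex
m = \lceil \log_2 n \rceil,\qquad
\mathrm{Mul}(m) = O(m \log m \log\log m) \subseteq O\bigl(m^{1+\epsilon}\bigr)\ \text{for every } \epsilon > 0
\;\Longrightarrow\; \mathrm{Mul}(\lceil \log_2 n \rceil) = O\bigl((\log n)^{1+\epsilon}\bigr)

\text{area} \propto n \;\Rightarrow\; \text{diameter} \propto n^{1/2},\qquad
\text{volume} \propto n \;\Rightarrow\; \text{diameter} \propto n^{1/3}
```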
]]></content:encoded>
	</item>
</channel>
</rss>
