<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>
<channel>
	<title>Comments on: External-Memory Shuffles?</title>
	<atom:link href="http://www.daniel-lemire.com/blog/archives/2008/02/13/external-memory-shuffles/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.daniel-lemire.com/blog/archives/2008/02/13/external-memory-shuffles/</link>
	<description>Daniel Lemire's blog is about life in academia, research in Computer Science, wondering how we can reconcile fast databases and algorithms with the informal and asemantic nature of the world around us. It is broadcast from Montreal (Canada).</description>
	<pubDate>Thu, 08 Jan 2009 18:59:18 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Daniel Lemire</title>
		<link>http://www.daniel-lemire.com/blog/archives/2008/02/13/external-memory-shuffles/comment-page-1/#comment-49734</link>
		<dc:creator>Daniel Lemire</dc:creator>
		<pubDate>Thu, 14 Feb 2008 01:00:28 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/archives/2008/02/13/external-memory-shuffles/#comment-49734</guid>
		<description>Thanks David.

I edited my post to remove the CRC64 idea since I believe that "sort --random-sort" does exactly this, though they don't specify their hashing family.</description>
		<content:encoded><![CDATA[<p>Thanks David.</p>
<p>I edited my post to remove the CRC64 idea since I believe that &#8220;sort --random-sort&#8221; does exactly this, though they don&#8217;t specify their hashing family.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: David</title>
		<link>http://www.daniel-lemire.com/blog/archives/2008/02/13/external-memory-shuffles/comment-page-1/#comment-49733</link>
		<dc:creator>David</dc:creator>
		<pubDate>Wed, 13 Feb 2008 23:41:35 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/archives/2008/02/13/external-memory-shuffles/#comment-49733</guid>
		<description>I like the CRC64 solution, although why not go even crazier and use a SHA algorithm? The disadvantage of these approaches is that you need one pass to generate the random number (prepending it to each line) and another pass to sort the lines.
Unfortunately, my sha1sum command doesn't seem to accept input from the command line, making it difficult to answer with a nice one-liner.</description>
		<content:encoded><![CDATA[<p>I like the CRC64 solution, although why not go even crazier and use a SHA algorithm? The disadvantage of these approaches is that you need one pass to generate the random number (prepending it to each line) and another pass to sort the lines.<br />
Unfortunately, my sha1sum command doesn&#8217;t seem to accept input from the command line, making it difficult to answer with a nice one-liner.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Suresh</title>
		<link>http://www.daniel-lemire.com/blog/archives/2008/02/13/external-memory-shuffles/comment-page-1/#comment-49732</link>
		<dc:creator>Suresh</dc:creator>
		<pubDate>Wed, 13 Feb 2008 19:50:12 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/archives/2008/02/13/external-memory-shuffles/#comment-49732</guid>
		<description>True, but you're only shuffling the range [1..n], not the actual numbers, no? I'm guessing that it won't scale beyond what you can fit in memory, though.</description>
		<content:encoded><![CDATA[<p>True, but you&#8217;re only shuffling the range [1..n], not the actual numbers, no? I&#8217;m guessing that it won&#8217;t scale beyond what you can fit in memory, though.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Lemire</title>
		<link>http://www.daniel-lemire.com/blog/archives/2008/02/13/external-memory-shuffles/comment-page-1/#comment-49731</link>
		<dc:creator>Daniel Lemire</dc:creator>
		<pubDate>Wed, 13 Feb 2008 19:28:59 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/archives/2008/02/13/external-memory-shuffles/#comment-49731</guid>
		<description>Thanks Suresh. To implement a "Knuth shuffle", don't you need random access to the items to swap them? For my problem, I specify a "variable-length-record flat file". If I am allowed to turn it into a fixed-length-record flat file with random access to the lines, I might as well throw the data in a DBMS. Hence the idea of shuffling the numbers from 1 to n, which can be done in O(n) time using the Knuth shuffle [which languages like Java or Python conveniently provide], prepending them to the lines, and then sorting the lines...

Or are you thinking about some other algorithm? Here is my reference:
http://en.wikipedia.org/wiki/Knuth_shuffle

Or maybe you mean that I can scale up the shuffling of the numbers from 1 to n to values of n much greater than 100,000,000? Well, maybe I was being a pessimist...</description>
		<content:encoded><![CDATA[<p>Thanks Suresh. To implement a &#8220;Knuth shuffle&#8221;, don&#8217;t you need random access to the items to swap them? For my problem, I specify a &#8220;variable-length-record flat file&#8221;. If I am allowed to turn it into a fixed-length-record flat file with random access to the lines, I might as well throw the data in a DBMS. Hence the idea of shuffling the numbers from 1 to n, which can be done in O(n) time using the Knuth shuffle [which languages like Java or Python conveniently provide], prepending them to the lines, and then sorting the lines&#8230;</p>
<p>Or are you thinking about some other algorithm? Here is my reference:<br />
<a href="http://en.wikipedia.org/wiki/Knuth_shuffle" rel="nofollow">http://en.wikipedia.org/wiki/Knuth_shuffle</a></p>
<p>Or maybe you mean that I can scale up the shuffling of the numbers from 1 to n to values of n much greater than 100,000,000? Well, maybe I was being a pessimist&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Suresh</title>
		<link>http://www.daniel-lemire.com/blog/archives/2008/02/13/external-memory-shuffles/comment-page-1/#comment-49730</link>
		<dc:creator>Suresh</dc:creator>
		<pubDate>Wed, 13 Feb 2008 18:46:59 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/archives/2008/02/13/external-memory-shuffles/#comment-49730</guid>
		<description>What's the problem with a shuffle? Using the standard Knuth trick, you can generate a shuffle as easily as choosing random numbers.</description>
		<content:encoded><![CDATA[<p>What&#8217;s the problem with a shuffle? Using the standard Knuth trick, you can generate a shuffle as easily as choosing random numbers.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
