<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Which is fastest: integer addition or XOR?</title>
	<atom:link href="http://lemire.me/blog/archives/2010/03/12/which-is-fastest-integer-addition-or-xor/feed/" rel="self" type="application/rss+xml" />
	<link>http://lemire.me/blog/archives/2010/03/12/which-is-fastest-integer-addition-or-xor/</link>
	<description>Computer Scientist and Open Scholar: Databases, Information Retrieval, Business Intelligence.</description>
	<lastBuildDate>Thu, 24 May 2012 04:42:46 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
	<item>
		<title>By: Aloz1</title>
		<link>http://lemire.me/blog/archives/2010/03/12/which-is-fastest-integer-addition-or-xor/comment-page-1/#comment-54798</link>
		<dc:creator>Aloz1</dc:creator>
		<pubDate>Sat, 19 Nov 2011 22:21:18 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2452#comment-54798</guid>
		<description>Would it not be wise and slightly more accurate if you were using unsigned integers?</description>
		<content:encoded><![CDATA[<p>Would it not be wise and slightly more accurate if you were using unsigned integers?</p>
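<p>Here is a minimal sketch of one reason unsigned types suit this kind of benchmark, assuming C++ (the language of the other code in this thread). Unsigned arithmetic is defined to wrap modulo 2<sup>32</sup> for a 32-bit type. Overflowing a signed <code>int</code> is undefined behaviour, so a long-running signed accumulator is not guaranteed to behave the same way. The names <code>wrapping_add</code>, <code>bitwise_xor</code> and <code>kMax</code> are hypothetical.</p>

```cpp
#include <cstdint>
#include <limits>

// Unsigned arithmetic wraps modulo 2^32 for a 32-bit type, so repeated
// additions to an unsigned accumulator are always well defined.
// (Overflowing a signed int in the same way is undefined behaviour.)
std::uint32_t wrapping_add(std::uint32_t a, std::uint32_t b) {
    return a + b;  // (a + b) mod 2^32
}

// XOR never produces a carry, so it cannot overflow at all; unsigned is
// used here only to match wrapping_add.
std::uint32_t bitwise_xor(std::uint32_t a, std::uint32_t b) {
    return a ^ b;
}

// Largest 32-bit unsigned value, handy for demonstrating the wrap-around.
constexpr std::uint32_t kMax = std::numeric_limits<std::uint32_t>::max();
```

<p>For example, <code>wrapping_add(kMax, 1u)</code> is guaranteed to be 0.</p>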
]]></content:encoded>
	</item>
	<item>
		<title>By: ed rowland</title>
		<link>http://lemire.me/blog/archives/2010/03/12/which-is-fastest-integer-addition-or-xor/comment-page-1/#comment-54711</link>
		<dc:creator>ed rowland</dc:creator>
		<pubDate>Sun, 11 Sep 2011 04:51:24 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2452#comment-54711</guid>
<description>Addition takes longer to compute in *hardware* because the carry bit has to be propagated through N sequential calculations, each of which involves gate delays. When computing XOR, each bit can be calculated in parallel. 

Whether it actually takes longer depends on the particular hardware implementation. Instruction timing on state-of-the-art Intel processors is complicated, to say the least. But according to the Intel Architecture Optimization Manual, on Intel architectures the pair of integer ALU pipeline stages can each execute xor and addition operations in one clock cycle. They execute at exactly the same speed.

That&#039;s true for Pentium and later processors. It&#039;s possible that smaller and leaner processors do have different execution times for XOR and ADD, but given the prevalence of addition operations in computing, I would think that even tiny processors use the propagation delay of an addition as the lower limit for a single pipeline stage. That should hold for any modern microprocessor, and for all or almost all ancient ones as well. (Despite having taken high-school Latin, I also think your spam filter is inappropriate.)</description>
<content:encoded><![CDATA[<p>Addition takes longer to compute in *hardware* because the carry bit has to be propagated through N sequential calculations, each of which involves gate delays. When computing XOR, each bit can be calculated in parallel. </p>
<p>Whether it actually takes longer depends on the particular hardware implementation. Instruction timing on state-of-the-art Intel processors is complicated, to say the least. But according to the Intel Architecture Optimization Manual, on Intel architectures the pair of integer ALU pipeline stages can each execute xor and addition operations in one clock cycle. They execute at exactly the same speed.</p>
<p>That&#8217;s true for Pentium and later processors. It&#8217;s possible that smaller and leaner processors do have different execution times for XOR and ADD, but given the prevalence of addition operations in computing, I would think that even tiny processors use the propagation delay of an addition as the lower limit for a single pipeline stage. That should hold for any modern microprocessor, and for all or almost all ancient ones as well. (Despite having taken high-school Latin, I also think your spam filter is inappropriate.)</p>
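<p>This small C++ model of the carry chain described above is an illustration, not a description of any real ALU. The hypothetical <code>ripple_add</code> performs one full-adder step per bit. Each step needs the carry produced by the previous one, so the 32 steps form a strict sequence. XOR has no such dependency, because each output bit depends only on the matching input bits.</p>

```cpp
#include <cstdint>

// Bit-serial model of a 32-bit ripple-carry adder.
// Bit i of the sum needs the carry out of bit i-1, so the loop is a chain of
// 32 dependent steps, mirroring the gate delays that accumulate in hardware.
std::uint32_t ripple_add(std::uint32_t a, std::uint32_t b) {
    std::uint32_t sum = 0;
    std::uint32_t carry = 0;  // carry into the current bit position
    for (int i = 0; i < 32; ++i) {
        std::uint32_t x = (a >> i) & 1u;
        std::uint32_t y = (b >> i) & 1u;
        sum |= (x ^ y ^ carry) << i;          // full-adder sum bit
        carry = (x & y) | (carry & (x ^ y));  // full-adder carry out
    }
    return sum;  // the final carry out is dropped, so the result wraps mod 2^32
}
```

<p>Computing <code>a ^ b</code> corresponds to the <code>x ^ y</code> term alone, with no <code>carry</code> threaded from one iteration to the next.</p>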
]]></content:encoded>
	</item>
	<item>
		<title>By: ethanara</title>
		<link>http://lemire.me/blog/archives/2010/03/12/which-is-fastest-integer-addition-or-xor/comment-page-1/#comment-54709</link>
		<dc:creator>ethanara</dc:creator>
		<pubDate>Sat, 10 Sep 2011 10:19:17 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2452#comment-54709</guid>
<description>I have done the same experiment but with xor and sub, since k -= k is 0 and k ^= k is also 0.
The result is:
sub: 151 secs. (7 tries)
xor: 171 secs. (also 7 tries)
This shows that sub is faster.

Code:


#include &lt;iostream&gt;

using namespace std;

int main()
{
    int k = 1000;
    for(int i= 0 ; i&lt;10000; i++)
    {
        k -= k;
        cout &lt;&lt; i &lt;&lt; endl;
        k = 1000;

    }
}

and 



#include &lt;iostream&gt;

using namespace std;

int main()
{
    int k = 1000;
    for(int i= 0 ; i&lt;10000; i++)
    {
        k ^= k;
        cout &lt;&lt; i &lt;&lt; endl;
        k = 1000;

    }
}</description>
<content:encoded><![CDATA[<p>I have done the same experiment but with xor and sub, since k -= k is 0 and k ^= k is also 0.<br />
The result is:<br />
sub: 151 secs. (7 tries)<br />
xor: 171 secs. (also 7 tries)<br />
This shows that sub is faster.</p>
<p>Code:</p>
<p><code>#include &lt;iostream&gt;<br />
<br />
using namespace std;<br />
<br />
int main()<br />
{<br />
    int k = 1000;<br />
    for (int i = 0; i &lt; 10000; i++)<br />
    {<br />
        k -= k;<br />
        cout &lt;&lt; i &lt;&lt; endl;<br />
        k = 1000;<br />
    }<br />
}</code></p>
<p>and</p>
<p><code>#include &lt;iostream&gt;<br />
<br />
using namespace std;<br />
<br />
int main()<br />
{<br />
    int k = 1000;<br />
    for (int i = 0; i &lt; 10000; i++)<br />
    {<br />
        k ^= k;<br />
        cout &lt;&lt; i &lt;&lt; endl;<br />
        k = 1000;<br />
    }<br />
}</code></p>
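<p>The sketch below is a tighter version of this experiment, assuming the goal is to time the arithmetic rather than the output. In the loops above, printing 10,000 lines through <code>cout</code> probably dominates the run time. Also, <code>k -= k</code> and <code>k ^= k</code> are both zeroing idioms that an optimizing compiler can fold to a constant. The helpers here reduce a run-time buffer instead, use unsigned arithmetic, and keep the fastest of several runs. The names <code>xor_reduce</code>, <code>add_reduce</code> and <code>best_time_ns</code> are hypothetical.</p>

```cpp
#include <chrono>
#include <cstdint>
#include <vector>

// Reduce a buffer with XOR. The input is only known at run time,
// so the compiler cannot replace the loop with a constant.
std::uint32_t xor_reduce(const std::vector<std::uint32_t>& data) {
    std::uint32_t acc = 0;
    for (std::uint32_t v : data) acc ^= v;
    return acc;
}

// Same reduction with addition; unsigned, so wrap-around is well defined.
std::uint32_t add_reduce(const std::vector<std::uint32_t>& data) {
    std::uint32_t acc = 0;
    for (std::uint32_t v : data) acc += v;
    return acc;
}

// Runs `reduce` over `data` `runs` times and returns the fastest run in
// nanoseconds (or -1 if runs <= 0). The minimum is less sensitive to
// interference than a single run. The checksum goes out through `result`
// so the caller can print it and the work is not discarded.
template <class Reduce>
long long best_time_ns(Reduce reduce, const std::vector<std::uint32_t>& data,
                       int runs, std::uint32_t& result) {
    long long best = -1;
    for (int r = 0; r < runs; ++r) {
        auto start = std::chrono::steady_clock::now();
        result = reduce(data);
        auto stop = std::chrono::steady_clock::now();
        long long ns =
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
        if (best < 0 || ns < best) best = ns;
    }
    return best;
}
```

<p>To use it, fill a vector with a few million values and call <code>best_time_ns(xor_reduce, data, 10, checksum)</code> and <code>best_time_ns(add_reduce, data, 10, checksum)</code>. Print the checksums as well as the times. The compiler may vectorize both loops, so compare the generated code (for instance with <code>g++ -O2 -S</code>) before reading much into a small gap.</p>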
]]></content:encoded>
	</item>
	<item>
		<title>By: David Hessler</title>
		<link>http://lemire.me/blog/archives/2010/03/12/which-is-fastest-integer-addition-or-xor/comment-page-1/#comment-54274</link>
		<dc:creator>David Hessler</dc:creator>
		<pubDate>Sun, 13 Mar 2011 03:48:08 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2452#comment-54274</guid>
		<description>FYI:  Any test that counts the number of CPU cycles is going to have issues (this is essentially due to the Heisenberg uncertainty principle).  Also, to answer a question from above, multiplication (i.e. the IMUL assembly instruction) usually takes about 4 clock cycles.  However, this is only when you are doing a lot of multiplication; if you are doing only a few, it is slower.  See a discussion of optimized multiplexers for more understanding.  Also, if the two numbers being multiplied are wider than a register (i.e. 32 or 64 bits), the speed is O(lg(n)), where n is the number of bits. While this may seem odd to mention, it is very common on any system doing cryptographic calculations (particularly those for the ElGamal and RSA encryption systems).</description>
		<content:encoded><![CDATA[<p>FYI:  Any test that counts the number of CPU cycles is going to have issues (this is essentially due to the Heisenberg uncertainty principle).  Also, to answer a question from above, multiplication (i.e. the IMUL assembly instruction) usually takes about 4 clock cycles.  However, this is only when you are doing a lot of multiplication; if you are doing only a few, it is slower.  See a discussion of optimized multiplexers for more understanding.  Also, if the two numbers being multiplied are wider than a register (i.e. 32 or 64 bits), the speed is O(lg(n)), where n is the number of bits. While this may seem odd to mention, it is very common on any system doing cryptographic calculations (particularly those for the ElGamal and RSA encryption systems).</p>
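<p>The following sketch makes the "wider than a register" case concrete under stated assumptions. A hypothetical <code>mul_64x64</code> builds a 128-bit product from four 32 x 32 to 64-bit partial products, which is the schoolbook method. Big-number libraries typically switch between schoolbook and faster algorithms depending on operand size. The sketch only shows how partial products and their carries are combined, not which algorithm any particular library uses.</p>

```cpp
#include <cstdint>

// 128-bit result split into two 64-bit words (hypothetical helper type).
struct U128 {
    std::uint64_t hi;
    std::uint64_t lo;
};

// Schoolbook multiplication of two 64-bit values using 32-bit "limbs",
// as a machine whose multiplier only handles 32 x 32 -> 64 bits might do it.
U128 mul_64x64(std::uint64_t a, std::uint64_t b) {
    const std::uint64_t mask = 0xFFFFFFFFu;
    std::uint64_t a_lo = a & mask, a_hi = a >> 32;
    std::uint64_t b_lo = b & mask, b_hi = b >> 32;

    // Four partial products; each fits in 64 bits.
    std::uint64_t p0 = a_lo * b_lo;  // weight 2^0
    std::uint64_t p1 = a_lo * b_hi;  // weight 2^32
    std::uint64_t p2 = a_hi * b_lo;  // weight 2^32
    std::uint64_t p3 = a_hi * b_hi;  // weight 2^64

    // Sum the middle 32-bit column; it is below 2^34, so it cannot overflow.
    std::uint64_t middle = (p0 >> 32) + (p1 & mask) + (p2 & mask);

    U128 r;
    r.lo = (middle << 32) | (p0 & mask);                    // low 64 bits
    r.hi = p3 + (p1 >> 32) + (p2 >> 32) + (middle >> 32);   // carries move up
    return r;
}
```

<p>For example, <code>mul_64x64(1ull &lt;&lt; 32, 1ull &lt;&lt; 32)</code> yields <code>hi == 1</code> and <code>lo == 0</code>, that is, 2<sup>64</sup>.</p>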
]]></content:encoded>
	</item>
	<item>
		<title>By: Preston L. Bannister</title>
		<link>http://lemire.me/blog/archives/2010/03/12/which-is-fastest-integer-addition-or-xor/comment-page-1/#comment-52341</link>
		<dc:creator>Preston L. Bannister</dc:creator>
		<pubDate>Sat, 13 Mar 2010 20:08:45 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2452#comment-52341</guid>
		<description>Folk, you are wasting your time comparing XOR against addition - both are dirt cheap and going to take the minimum amount of time possible for an instruction that takes two operands and stores a result.

The difference in complexity between the two operations (if any) is insignificant compared to the transistor budget of the CPU designer, and has been for a couple decades.

If you find a difference, it will be due to either (1) a measurement error, or (2) something wonky in your favorite language interpreter.

I used to pay a lot of attention to CPU architecture, counting clock cycles per instruction, and writing benchmarks to verify the variants of instruction sequences. As the CPU designers got a bigger transistor budget, they dropped in bigger circuit blocks to perform common operations in a single cycle, or as close as possible. 

I was delighted when CPUs got barrel shifters. Ever try to write efficient graphics ops when bit-shift time is linear to the size of the shift? That was a long time back. (And I found a solution.)

When the Intel 486 came out, I lost interest. At that point, most common operations were very fast. 

There is another aspect to this. The unique logic in a CPU is like lines of code in software (and VLSI design became more like software design). Since then CPUs have become very large programs indeed. Large programs almost invariably have bugs. We normally attribute failures to bugs in software, but some are bugs in hardware, and we do not know the proportion.

With current CPUs, massive hardware and overlapped speculative execution have even removed or minimized the cost of subroutine calls and pointer chasing, in many cases. Instruction clock-cycle counting has not been effective for a long time.</description>
		<content:encoded><![CDATA[<p>Folk, you are wasting your time comparing XOR against addition &#8211; both are dirt cheap and going to take the minimum amount of time possible for an instruction that takes two operands and stores a result.</p>
<p>The difference in complexity between the two operations (if any) is insignificant compared to the transistor budget of the CPU designer, and has been for a couple decades.</p>
<p>If you find a difference, it will be due to either (1) a measurement error, or (2) something wonky in your favorite language interpreter.</p>
<p>I used to pay a lot of attention to CPU architecture, counting clock cycles per instruction, and writing benchmarks to verify the variants of instruction sequences. As the CPU designers got a bigger transistor budget, they dropped in bigger circuit blocks to perform common operations in a single cycle, or as close as possible. </p>
<p>I was delighted when CPUs got barrel shifters. Ever try to write efficient graphics ops when bit-shift time is linear to the size of the shift? That was a long time back. (And I found a solution.)</p>
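<p>The difference between the two kinds of shifter can be sketched in C++. This is a software model, not a hardware description. <code>shift_left_linear</code> moves one bit position per step. <code>shift_left_barrel</code> mimics the five fixed stages of a 32-bit barrel shifter, where each stage shifts by 1, 2, 4, 8 or 16 depending on one bit of the shift amount. Both function names are hypothetical, and the shift amount is assumed to be below 32.</p>

```cpp
#include <cstdint>

// One bit position per step: the work grows linearly with n,
// like a CPU without a barrel shifter.
std::uint32_t shift_left_linear(std::uint32_t x, unsigned n) {
    for (unsigned i = 0; i < n; ++i) x <<= 1;
    return x;
}

// Barrel-shifter style: five fixed stages shifting by 1, 2, 4, 8 and 16,
// each enabled by one bit of n. Every shift from 0 to 31 passes through the
// same five stages, so the cost does not depend on n. Assumes n < 32.
std::uint32_t shift_left_barrel(std::uint32_t x, unsigned n) {
    for (unsigned stage = 0; stage < 5; ++stage) {
        if (n & (1u << stage)) x <<= (1u << stage);
    }
    return x;
}
```

<p>For a shift by 13 (binary 01101), only the 1, 4 and 8 stages are enabled.</p>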
<p>When the Intel 486 came out, I lost interest. At that point, most common operations were very fast. </p>
<p>There is another aspect to this. The unique logic in a CPU is like lines of code in software (and VLSI design became more like software design). Since then CPUs have become very large programs indeed. Large programs almost invariably have bugs. We normally attribute failures to bugs in software, but some are bugs in hardware, and we do not know the proportion.</p>
<p>With current CPUs, massive hardware and overlapped speculative execution have even removed or minimized the cost of subroutine calls and pointer chasing, in many cases. Instruction clock-cycle counting has not been effective for a long time.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Lemire</title>
		<link>http://lemire.me/blog/archives/2010/03/12/which-is-fastest-integer-addition-or-xor/comment-page-1/#comment-52340</link>
		<dc:creator>Daniel Lemire</dc:creator>
		<pubDate>Sat, 13 Mar 2010 15:47:57 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2452#comment-52340</guid>
		<description>@Bannister I have a bunch of counter-measures against spammers for this blog. They work very well, but a small percentage of spam is unavoidable. I prune spam comments every day, believe it or not.</description>
		<content:encoded><![CDATA[<p>@Bannister I have a bunch of counter-measures against spammers for this blog. They work very well, but a small percentage of spam is unavoidable. I prune spam comments every day, believe it or not.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Lemire</title>
		<link>http://lemire.me/blog/archives/2010/03/12/which-is-fastest-integer-addition-or-xor/comment-page-1/#comment-52339</link>
		<dc:creator>Daniel Lemire</dc:creator>
		<pubDate>Sat, 13 Mar 2010 15:44:54 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2452#comment-52339</guid>
		<description>@Frank With the same code, and testing over many runs, I get that both run at the same speed. 

I have modified your code so that it loops around more:

http://pastebin.com/mEZBBYKZ

&lt;code&gt;
$ g++ -O2 -o code1 code1.cpp
$ g++ -O2 -o code2 code2.cpp
$ time ./code1 9999999
real	0m5.953s
user	0m5.944s
sys	0m0.002s
$ time ./code2 9999999
real	0m5.974s
user	0m5.971s
sys	0m0.002s
&lt;/code&gt;</description>
		<content:encoded><![CDATA[<p>@Frank With the same code, and testing over many runs, I get that both run at the same speed. </p>
<p>I have modified your code so that it loops around more:</p>
<p><a href="http://pastebin.com/mEZBBYKZ" rel="nofollow">http://pastebin.com/mEZBBYKZ</a></p>
<p><code><br />
$ g++ -O2 -o code1 code1.cpp<br />
$ g++ -O2 -o code2 code2.cpp<br />
$ time ./code1 9999999<br />
real	0m5.953s<br />
user	0m5.944s<br />
sys	0m0.002s<br />
$ time ./code2 9999999<br />
real	0m5.974s<br />
user	0m5.971s<br />
sys	0m0.002s<br />
</code></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Preston L. Bannister</title>
		<link>http://lemire.me/blog/archives/2010/03/12/which-is-fastest-integer-addition-or-xor/comment-page-1/#comment-52338</link>
		<dc:creator>Preston L. Bannister</dc:creator>
		<pubDate>Sat, 13 Mar 2010 11:47:19 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2452#comment-52338</guid>
		<description>Well Daniel, you have achieved a measure of success with your weblog - the spammers have arrived, despite your protection.

Either that, or we have someone (a student?) with an odd sense of humor.</description>
		<content:encoded><![CDATA[<p>Well Daniel, you have achieved a measure of success with your weblog &#8211; the spammers have arrived, despite your protection.</p>
<p>Either that, or we have someone (a student?) with an odd sense of humor.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Harisankar H</title>
		<link>http://lemire.me/blog/archives/2010/03/12/which-is-fastest-integer-addition-or-xor/comment-page-1/#comment-52336</link>
		<dc:creator>Harisankar H</dc:creator>
		<pubDate>Sat, 13 Mar 2010 09:22:51 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2452#comment-52336</guid>
		<description>Isn&#039;t this because complex carry-lookahead adder circuitry is already built into the hardware? XOR is a simpler, bitwise-parallel operation. So in that sense, ADD finishes at the same time as XOR because the hardware has been optimised for ADD.</description>
		<content:encoded><![CDATA[<p>Isn&#8217;t this because complex carry-lookahead adder circuitry is already built into the hardware? XOR is a simpler, bitwise-parallel operation. So in that sense, ADD finishes at the same time as XOR because the hardware has been optimised for ADD.</p>
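<p>Here is a sketch of the carry-lookahead idea in C++, using the parallel-prefix (Kogge-Stone) arrangement as one example of the family. The hypothetical <code>lookahead_add</code> first records which positions generate or propagate a carry. It then resolves the carries for all 32 bits in log<sub>2</sub>(32) = 5 combining rounds instead of a 32-step ripple. It models the dependency structure only; as C++ code it is slower than a plain <code>a + b</code>.</p>

```cpp
#include <cstdint>

// Parallel-prefix (Kogge-Stone) carry computation, one kind of
// carry-lookahead adder. g marks bit positions that generate a carry,
// p marks positions that propagate an incoming carry.
std::uint32_t lookahead_add(std::uint32_t a, std::uint32_t b) {
    std::uint32_t g = a & b;  // generate: both input bits are 1
    std::uint32_t p = a ^ b;  // propagate: exactly one input bit is 1
    // Each round merges spans twice as wide as the previous one. After the
    // rounds with shifts 1, 2, 4, 8 and 16, bit i of g says whether a carry
    // leaves bit i, taking every lower bit into account.
    for (unsigned shift = 1; shift < 32; shift <<= 1) {
        g |= p & (g << shift);  // uses the old p and g
        p &= p << shift;        // a span propagates only if both halves do
    }
    // The carry into bit i+1 is the carry out of bit i.
    return (a ^ b) ^ (g << 1);
}
```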
]]></content:encoded>
	</item>
	<item>
		<title>By: Drew Frank</title>
		<link>http://lemire.me/blog/archives/2010/03/12/which-is-fastest-integer-addition-or-xor/comment-page-1/#comment-52335</link>
		<dc:creator>Drew Frank</dc:creator>
		<pubDate>Sat, 13 Mar 2010 08:44:16 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2452#comment-52335</guid>
		<description>I&#039;m finding that when I compile with gcc, xor is 10-20% faster.  When I compile the same code with g++, add is about 10% faster.  Here is the code I used: http://pastebin.com/PzmmFsai .

uname -a outputs:
Linux lappy 2.6.32-ARCH #1 SMP PREEMPT Tue Feb 23 19:43:46 CET 2010 x86_64 Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz GenuineIntel GNU/Linux

And I&#039;m using gcc 4.4.3.</description>
		<content:encoded><![CDATA[<p>I&#8217;m finding that when I compile with gcc, xor is 10-20% faster.  When I compile the same code with g++, add is about 10% faster.  Here is the code I used: <a href="http://pastebin.com/PzmmFsai" rel="nofollow">http://pastebin.com/PzmmFsai</a> .</p>
<p>uname -a outputs:<br />
Linux lappy 2.6.32-ARCH #1 SMP PREEMPT Tue Feb 23 19:43:46 CET 2010 x86_64 Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz GenuineIntel GNU/Linux</p>
<p>And I&#8217;m using gcc 4.4.3.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Philippe Beaudoin</title>
		<link>http://lemire.me/blog/archives/2010/03/12/which-is-fastest-integer-addition-or-xor/comment-page-1/#comment-52332</link>
		<dc:creator>Philippe Beaudoin</dc:creator>
		<pubDate>Sat, 13 Mar 2010 04:23:58 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2452#comment-52332</guid>
		<description>XOr requires fewer transistors than addition. This is one measure of complexity on which XOr wins.

XOr would probably be harder to teach to my kid than sums. This is a measure of complexity on which addition wins.

We have a stalemate...

(As a sidenote, have you thought of updating your spam protection mechanism to ask for XOrs  instead of sums? ;))</description>
		<content:encoded><![CDATA[<p>XOr requires fewer transistors than addition. This is one measure of complexity on which XOr wins.</p>
<p>XOr would probably be harder to teach to my kid than sums. This is a measure of complexity on which addition wins.</p>
<p>We have a stalemate&#8230;</p>
<p>(As a sidenote, have you thought of updating your spam protection mechanism to ask for XOrs  instead of sums? <img src='http://lemire.me/blog/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> )</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Preston L. Bannister</title>
		<link>http://lemire.me/blog/archives/2010/03/12/which-is-fastest-integer-addition-or-xor/comment-page-1/#comment-52331</link>
		<dc:creator>Preston L. Bannister</dc:creator>
		<pubDate>Sat, 13 Mar 2010 03:17:43 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2452#comment-52331</guid>
		<description>The answer is going to depend on the CPU. 

For anything like current desktop and server CPUs, they use a ridiculous amount of hardware to accelerate a single instruction stream. Both operations are common and pretty much dead simple, so I&#039;d expect both to be among the very-fastest instructions.

This is likely true of embedded CPUs, like those used in the iPad and cell phones.

Multiplication might show a bigger difference (compared to simpler operations) between high-end and low-end CPUs.</description>
		<content:encoded><![CDATA[<p>The answer is going to depend on the CPU. </p>
<p>For anything like current desktop and server CPUs, they use a ridiculous amount of hardware to accelerate a single instruction stream. Both operations are common and pretty much dead simple, so I&#8217;d expect both to be among the very-fastest instructions.</p>
<p>This is likely true of embedded CPUs, like those used in the iPad and cell phones.</p>
<p>Multiplication might show a bigger difference (compared to simpler operations) between high-end and low-end CPUs.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Lemire</title>
		<link>http://lemire.me/blog/archives/2010/03/12/which-is-fastest-integer-addition-or-xor/comment-page-1/#comment-52330</link>
		<dc:creator>Daniel Lemire</dc:creator>
		<pubDate>Sat, 13 Mar 2010 01:48:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2452#comment-52330</guid>
		<description>@LaForest Yes, I agree. 

@Stiber

Multiplication might run at the same speed, but you have to use a different test.</description>
		<content:encoded><![CDATA[<p>@LaForest Yes, I agree. </p>
<p>@Stiber</p>
<p>Multiplication might run at the same speed, but you have to use a different test.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mike Stiber</title>
		<link>http://lemire.me/blog/archives/2010/03/12/which-is-fastest-integer-addition-or-xor/comment-page-1/#comment-52328</link>
		<dc:creator>Mike Stiber</dc:creator>
		<pubDate>Sat, 13 Mar 2010 01:32:55 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2452#comment-52328</guid>
		<description>They&#039;re probably both the same number of machine cycles, if only because there&#039;s no way for an ALU to take advantage of fewer stages of transistors in the XOR versus the ADD (the outputs of the combinatorial logic are just latched into the output after the same amount of time, regardless of the operation of that type). Is integer multiplication any different these days (I&#039;m too lazy to check for myself)?</description>
		<content:encoded><![CDATA[<p>They&#8217;re probably both the same number of machine cycles, if only because there&#8217;s no way for an ALU to take advantage of fewer stages of transistors in the XOR versus the ADD (the outputs of the combinatorial logic are just latched into the output after the same amount of time, regardless of the operation of that type). Is integer multiplication any different these days (I&#8217;m too lazy to check for myself)?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Eric LaForest</title>
		<link>http://lemire.me/blog/archives/2010/03/12/which-is-fastest-integer-addition-or-xor/comment-page-1/#comment-52327</link>
		<dc:creator>Eric LaForest</dc:creator>
		<pubDate>Sat, 13 Mar 2010 01:32:49 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2452#comment-52327</guid>
		<description>Unless there is more context here, there should be no difference, as both ops map to a single-cycle operation on modern processors. Arbitrary-precision numbers maybe?

However, at the pure hardware level, XOR is faster than addition since there is no carry bit, but other details obscure that and for all practical purposes, they run at the same speed as instructions.</description>
		<content:encoded><![CDATA[<p>Unless there is more context here, there should be no difference, as both ops map to a single-cycle operation on modern processors. Arbitrary-precision numbers maybe?</p>
<p>However, at the pure hardware level, XOR is faster than addition since there is no carry bit, but other details obscure that and for all practical purposes, they run at the same speed as instructions.</p>
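<p>One way to make "no carry bit" precise for unsigned integers a and b: XOR gives the sum bits with every carry discarded, and AND marks exactly the positions that generate a carry, which then moves one place to the left. Hence</p>
$$a + b = (a \oplus b) + 2\,(a \wedge b)$$
<p>For example, a = 6 (binary 110) and b = 3 (binary 011) give a ⊕ b = 101 = 5 and a ∧ b = 010 = 2, so 5 + 2 · 2 = 9 = 6 + 3. When a ∧ b = 0, no position generates a carry and the sum is just the XOR. This is the sense in which XOR is addition without carries.</p>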
]]></content:encoded>
	</item>
</channel>
</rss>

