When optimising code, never guess, always measure
source link: https://www.tuicool.com/articles/hit/a6vEriE
When optimising code, never guess, always measure.
This is a truism that lots of people quote, but it can be hard to remember, especially in the heat of battle (as it were). Rather fortunately it came to mind just when needed, as I found something completely unexpected.
I was writing a simple implementation of the Fermat difference of squares method of factoring. This involves writing the number to be factored as - you guessed it - the difference of two squares. If $n=a^2-b^2$ then $n=(a-b)(a+b)$ and we have a factorisation (provided we don't have $a-b=1$).
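All the versions below rely on a `square_root_ceil` helper that the post doesn't show. A minimal sketch of what it might look like, assuming Python 3.8+ for `math.isqrt` (the name and its ceiling behaviour are inferred from how the routines use it):

```python
import math

def square_root_ceil(n):
    """Smallest integer r with r*r >= n (assumed behaviour)."""
    r = math.isqrt(n)          # floor of the integer square root
    return r if r * r == n else r + 1
```

Using integer arithmetic throughout (rather than `math.sqrt`) matters here, since the numbers being factored can exceed the range where floating point is exact.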
Version 0
```python
def factor_fermat0(n):
    a = square_root_ceil(n)
    while True:
        d = a*a - n
        b = square_root_ceil(d)
        if b**2 == d:
            return a-b, a+b
        a += 1
```
Version 1
```python
def factor_fermat1(n):
    a = square_root_ceil(n)
    b = square_root_ceil(a**2 - n)
    while a**2 - b**2 != n:
        while a**2 - b**2 < n:
            a += 1
        while a**2 - b**2 > n:
            b += 1
    return a-b, a+b
```
Version 2
```python
def factor_fermat2(n):
    a = square_root_ceil(n)
    a2 = a**2
    b = square_root_ceil(a**2 - n)
    b2 = b**2
    while a2 - b2 != n:
        while a2 - b2 < n:
            a2, a = a2+a+a+1, a+1
        while a2 - b2 > n:
            b2, b = b2+b+b+1, b+1
    return a-b, a+b
```
Version 3
```python
def factor_fermat3(n):
    a = square_root_ceil(n)
    a2 = a**2
    c = 2*a + 1
    b = square_root_ceil(a**2 - n)
    b2 = b**2
    d = 2*b + 1
    while a2 - b2 != n:
        while a2 - b2 < n:
            a2, c = a2+c, c+2
        while a2 - b2 > n:
            b2, d = b2+d, d+2
    # c-d and c+d are both even, so use integer division
    return (c-d)//2, (c+d)//2 - 1
```
Let's see how we're doing
These improvements all seem obvious and straightforward, and we can check them with some timing tests. The times, and indeed the ratios of times, will vary according to the number being factored, so we time two different numbers and see what the trend is. If you choose to analyse the algorithm then you'll see what's going on. Here are the results for two different numbers:
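As a sketch of one way such a timing can be taken, using only the standard `timeit` module, with version 3 and an assumed `math.isqrt`-based helper defined inline:

```python
import math
import timeit

def square_root_ceil(n):
    # Assumed helper: smallest integer r with r*r >= n.
    r = math.isqrt(n)
    return r if r * r == n else r + 1

def factor_fermat3(n):
    a = square_root_ceil(n)
    a2 = a**2
    c = 2*a + 1
    b = square_root_ceil(a**2 - n)
    b2 = b**2
    d = 2*b + 1
    while a2 - b2 != n:
        while a2 - b2 < n:
            a2, c = a2 + c, c + 2
        while a2 - b2 > n:
            b2, d = b2 + d, d + 2
    return (c - d)//2, (c + d)//2 - 1

n = 1211141 * 3911111
elapsed = timeit.timeit(lambda: factor_fermat3(n), number=1)
print(factor_fermat3(n), f"{elapsed:.3f}s")
```

With `number=1` this times a single call; averaging over several runs gives steadier figures.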
[Timing table for 1211141 × 3911111: per-routine times not preserved in this copy.]
[Timing table for 3911111 × 12111139: per-routine times not preserved in this copy.]
We can see that the first routine is rubbish, but that's no surprise, as it's computing a square root every time through the loop. For the others we can see a clear improvement as we make those optimisations. It did surprise me just how big the improvement was from #2 to #3, but it was nice to see.
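The big win in versions 2 and 3 comes from replacing each squaring with an incremental update, using $(a+1)^2 = a^2 + 2a + 1$. A quick sanity check of the invariant those updates maintain:

```python
# Versions 2 and 3 keep a2 == a**2 up to date without multiplying,
# using (a+1)**2 == a**2 + (2*a + 1).
a, a2 = 10, 100
c = 2*a + 1                  # the running "2*a + 1" that version 3 tracks
for _ in range(5):
    a2, c = a2 + c, c + 2    # step a -> a + 1
    a += 1
    assert a2 == a*a and c == 2*a + 1
print(a, a2)  # 15 225
```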
Version 4
```python
def factor_fermat4(n):
    a = square_root_ceil(n)
    a2 = a**2
    c = 2*a + 1
    b = square_root_ceil(a**2 - n)
    b2 = b**2
    d = 2*b + 1
    while a2 - b2 != n:
        while a2 - b2 < n:
            a2 += c
            c += 2
        while a2 - b2 > n:
            b2 += d
            d += 2
    # c-d and c+d are both even, so use integer division
    return (c-d)//2, (c+d)//2 - 1
```
[Timing table for 3911111 × 12111139: per-routine times not preserved in this copy.]
Wait - what? It's slower? How can that be? I'd've thought it would compile to exactly the same bytecode, and so execute in pretty much the same time.
Just to be sure I ran it on a bigger number.
[Timing table for 39111113 × 121111147: per-routine times not preserved in this copy.]
There is no doubt:

```python
a2, c = a2+c, c+2
```

is faster than

```python
a2 += c
c += 2
```
That caught me by surprise, but it really does go to show:
measure, measure,
measure.
My guess is that in the first case the two evaluations and assignments can happen in parallel, and so may happen on different cores, whereas in the second case they must happen in serial, but that's just a guess. Maybe someone reading this knows for sure.
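One way to move from guessing towards measuring here is to look at what the interpreter actually executes. A sketch using the standard `dis` module to compare the bytecode of the two forms (the wrapper functions are just for illustration):

```python
import dis

def tuple_assign(a2, c):
    a2, c = a2 + c, c + 2    # the faster form from the timings above
    return a2, c

def inplace_assign(a2, c):
    a2 += c                  # the slower form
    c += 2
    return a2, c

# Both compute the same values...
assert tuple_assign(100, 21) == inplace_assign(100, 21)

# ...but dis shows they do not compile to identical bytecode.
dis.dis(tuple_assign)
dis.dis(inplace_assign)
```

The exact instruction sequences vary between Python versions, so it's worth running this against the interpreter you timed with.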
Let me know.
I've got more changes to make, but one thing is for sure - I'll be measuring the improvements as I go to make sure they really do make things better.