			   Hartman Technica

	 Multiply times for different forms of multiplication

Written by Diane Gagne

Three different multiply functions were each run 65000 times to get an
average time.  An empty test function was also run so that the
random number generator overhead could be measured and subtracted
from the multiply times.

Timing was on an MSP430F1611 at 8MHz clock rate.

The tests were performed twice with two different forms of random
numbers.  The first is a bitwise random generator: two random 16-bit
integers are shifted beside each other and the resulting 32-bit
pattern is passed directly to the multiply function.  This produced
many overflow, underflow, and denormalized cases.

The second is a normalized number generator: two random 16-bit
integers are cast to floats and multiplied.  This produced very few
overflow or exception cases.

The timing tests were then run at all optimization levels with all
three forms of multiplication: the original gcc with IEEE 754, the
original mspgcc without IEEE, and my version with IEEE.  All the
outputs are in the spreadsheet multiply_comp.gnumeric.

The addition and divide functions were tested only at O1
optimization; those results are also in the spreadsheet.

Finally I compared the text size that each multiply version takes up
for each optimization level.  The results follow:

No optimization level

Text sizes of the three functions:
			__mulsf3	_fpmul_parts	total
Original with IEEE 754: 0x26a		-inline-	0x26a
Original without IEEE:	0x1e8		0xa4		0x28c
My version with IEEE:	0x550		0x46		0x596

Optimization level O1

Text sizes of the three functions:
			__mulsf3	_fpmul_parts	total
Original with IEEE 754: 0x26a		-inline-	0x26a
Original without IEEE:	0xf8		0x4e		0x146
My version with IEEE:	0x2fe		0x46		0x344

Optimization level O2

Text sizes of the three functions:
			__mulsf3	_fpmul_parts	total
Original with IEEE 754: 0x26a		-inline-	0x26a
Original without IEEE:	0xe8		0x4e		0x136
My version with IEEE:	0x350		0x46		0x396

Optimization level O3

Text sizes of the three functions:
			__mulsf3	_fpmul_parts	total
Original with IEEE 754: 0x26a		-inline-	0x26a
Original without IEEE:	0x11c		0x50		0x16c
My version with IEEE:	0x350		0x46		0x396

Optimization level Os

Text sizes of the three functions:
			__mulsf3	_fpmul_parts	total
Original with IEEE 754: 0x26a		-inline-	0x26a
Original without IEEE:	0x150		0x42		0x192
My version with IEEE:	0x356		0x46		0x39c

mspgcc4 Optimization level O2
Text sizes of libm functions:
			__mulsf3	_fpmul_parts	total
Original without IEEE:	0xe8		0x4e		0x136
My version with IEEE:	0x26c		0x46		0x2b2


Optimization levels

As our data set is usually normalized, with few overflow or underflow
situations, we can now compare my multiply function with error
handling, driven by the normalized random numbers, at all the
optimization levels:

Level	 65000 mults	     1 mult

-	3.015s		   46.39us
O1	2.141s		   32.93us
O2	2.688s	           41.34us
O3	2.695s	           41.46us
Os	2.711s		   41.70us

Since text size also matters, here are the sizes of my version at
each multiply optimization level as well:

Level	__mulsf3	_fpmul_parts	total

-	0x550		0x46		0x596
O1	0x2fe		0x46		0x344
O2	0x350		0x46		0x396
O3	0x350		0x46		0x396
Os	0x356		0x46		0x39c

Comparing the speed of multiplication at different levels:

The fastest optimization level for my version of the code is O1.
This holds whether or not the code is overflow intensive, though the
speed boost is greater when there are many exception cases.  There is
a factor of 1.25 difference between levels O1 and O2, the next
closest in speed.  Also, at O1 optimization there is a factor of 7.64
difference between the original gcc multiplier and my multiplier in
the test with little to no overflow cases.
