> Okay, let's try a few other things.  First, run all tests again using
> the standard cc (SunPro compiler) if you weren't using it before.  Then
> add the following compile time options:
> 
> -xO5 -fd -native -v -Xa -xarch=v8plus
> 
> Also, try recoding the faster routine with type "long long" (64 bit
> word size, not 32 which is what you get with long).  Also, change
> those to UNSIGNED long's and long long's.  You *MUST* use the above
> compile operations for the "long long" code or else it will really
> hurt performance because it will not use the 64bit opcodes without
> those compile options.


All the tests have been run using the SunPro compiler. For the options
you give above we get:


Plain memcpy:  34588.296875 KB/sec

naive memcpy using char *: 18189.503906 KB/sec

memcpy using unsigned long *:  27729.460938 KB/sec

memcpy using unsigned long * with exclusive OR:  25897.773438 KB/sec

memcpy using long long *: 27931.076172 KB/sec

memcpy using long long * with exclusive OR: 26529.302734 KB/sec

memcpy using unsigned long long *: 28416.781250 KB/sec

memcpy using unsigned long long * with exclusive OR:  26929.230469 KB/sec
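
For reference, the "unsigned long long * with exclusive OR" case amounts
to a loop along the lines below. This is only a minimal sketch, not the
exact benchmark source: the function name xor_copy_u64 and the single
64-bit key are illustrative assumptions, and it assumes suitably aligned,
non-overlapping buffers whose length is a multiple of 8 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy nbytes from src to dst one 64-bit word at a
 * time, XORing each word with key on the way through.
 * Assumes both buffers are aligned for unsigned long long, do not
 * overlap, and that nbytes is a multiple of the word size. */
void xor_copy_u64(void *dst, const void *src, size_t nbytes,
                  unsigned long long key)
{
    unsigned long long *d = (unsigned long long *)dst;
    const unsigned long long *s = (const unsigned long long *)src;
    size_t nwords = nbytes / sizeof(unsigned long long);
    size_t i;

    assert(nbytes % sizeof(unsigned long long) == 0);

    for (i = 0; i < nwords; i++)
        d[i] = s[i] ^ key;   /* one load, one XOR, one store per word */
}
```

Running the same routine over the output with the same key recovers the
original data, since XOR with a fixed key is its own inverse.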


It seems that the -fast switch worked just about as well as the flags
you specified, and the performance of longs and long longs is about
the same. So as a final question I wondered: is it faster to do the
libc memcpy and then an encryption pass, or to use our own combined
copy and encrypt?

So here is the result:

memcpy (libc) and encrypt with long long *: 25441.546875 KB/sec
(compiled with the -fast option)
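
The two-pass variant measured here can be sketched as follows. Again
this is an illustration under assumptions rather than the actual test
code: copy_then_xor and the single 64-bit key are made-up names, and the
buffers are assumed to be non-overlapping with a whole number of words.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: first let libc memcpy do the bulk copy, then make
 * a second pass over the destination and XOR each 64-bit word in place.
 * Assumes dst and src do not overlap and each holds nwords words. */
void copy_then_xor(unsigned long long *dst, const unsigned long long *src,
                   size_t nwords, unsigned long long key)
{
    size_t i;

    memcpy(dst, src, nwords * sizeof *dst);  /* pass 1: plain copy */

    for (i = 0; i < nwords; i++)             /* pass 2: encrypt in place */
        dst[i] ^= key;
}
```

The destination is touched twice in this version, which is the extra
memory traffic the combined copy-and-encrypt loop avoids.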

It seems that combining the encrypt with a copy won't buy us much over
doing a separate memcpy and encryption pass (roughly 26500 KB/sec
combined versus 25400 KB/sec separate for long long *).