> Okay, let's try a few other things. First, run all tests again using
> the standard cc (SunPro compiler) if you weren't using it before. Then
> add the following compile time options:
>
> -xO5 -fd -native -v -Xa -xarch=v8plus
>
> Also, try recoding the faster routine with type "long long" (64 bit
> word size, not 32 which is what you get with long). Also, change
> those to UNSIGNED long's and long long's. You *MUST* use the above
> compile operations for the "long long" code or else it will really
> hurt performance because it will not use the 64bit opcodes without
> those compile options.

All the tests have been run using the SunPro compiler. For the options
you give above we get:

Plain memcpy: 34588.296875 KB/sec
naive memcpy using char *: 18189.503906 KB/sec
memcpy using unsigned long *: 27729.460938 KB/sec
memcpy using unsigned long * with exclusive OR: 25897.773438 KB/sec
memcpy using long long *: 27931.076172 KB/sec
memcpy using long long * with exclusive OR: 26529.302734 KB/sec
memcpy using unsigned long long *: 28416.781250 KB/sec
memcpy using unsigned long long * with exclusive OR: 26929.230469 KB/sec

It seems that the -fast switch worked just about as well as the flags
you specified. Also, the performance of longs and long longs is about
the same.

So as a final question I wondered: is it faster to do the libc memcpy
and then an encryption, or to use our own copy and encrypt? Here are
the results:

memcpy (libc) and encrypt with long long *: 25441.546875 KB/sec
(compiled with the -fast option)

It seems that combining the encrypt with a copy won't buy us much over
doing separate memcpys and encryptions.
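
For anyone who wants to reproduce the comparison, here is a minimal sketch of the two variants measured last: a hand-written copy that XORs each 64-bit word as it goes, and a libc memcpy followed by an in-place XOR. This is not the actual test code. The function names and the XOR_KEY value are made up for illustration, and the real "encryption" may do more than a single XOR per word.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key word: the value actually XORed in is an assumption. */
#define XOR_KEY 0x5A5A5A5A5A5A5A5AULL

/* Variant A: copy and XOR in a single pass, one 64-bit word at a time.
 * nwords counts unsigned long long elements; buffers must not overlap. */
void copy_xor_ull(unsigned long long *dst,
                  const unsigned long long *src, size_t nwords)
{
    size_t i;

    assert(dst != NULL && src != NULL);
    for (i = 0; i < nwords; i++)
        dst[i] = src[i] ^ XOR_KEY;
}

/* Variant B: let libc do the copy, then XOR the destination in place. */
void memcpy_then_xor_ull(unsigned long long *dst,
                         const unsigned long long *src, size_t nwords)
{
    size_t i;

    assert(dst != NULL && src != NULL);
    memcpy(dst, src, nwords * sizeof *dst);
    for (i = 0; i < nwords; i++)
        dst[i] ^= XOR_KEY;
}
```

Both routines produce the same output buffer, so the only difference to measure is one pass over memory versus two. Timing a few hundred calls over a large buffer and dividing bytes copied by elapsed seconds gives a KB/sec figure comparable to the ones above.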