看一段代码:
void send1(int *to, int *from, int count)
{
do {
*to++ = *from++ ;
} while( --count > 0);
}
很容易看出来这段代码的作用,把count个整型数据从from复制到to。不过,还有更快的写法吗?看下边这段代码:
void send2(int *to, int *from, int count)
{
int n = (count + 7 ) / 8 ;
switch (count % 8 ) {
case 0: do { *to++ = *from++;
case 7: *to++ = *from++;
case 6: *to++ = *from++;
case 5: *to++ = *from++;
case 4: *to++ = *from++;
case 3: *to++ = *from++;
case 2: *to++ = *from++;
case 1: *to++ = *from++;
} while(--n > 0);
}
}
这段代码很神奇的把一个循环嵌到了一个 switch-case 里。首先,用 count%8
取得余下的int个数(这余数不是在分组的末尾,而是在开头),利用 switch-case 定位到这“剩下”的int个数,先复制这几个int。然后,这个 switch-case 就失去作用了。接着,就是 do-while 来发挥作用,每8个int为一组,批量复制数据。
这样对比看来,send1每复制一个int都要进行一次比较,而send2每复制8个int才进行一次比较,显然send2的复制效率更高一些。实际测试的结果也是这样。后一种复制技巧被称作Duff’s Device。
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
const size_t BUFLEN = 100000000;
#define TEST_START(TEST_NAME) \
do { \
char* name = TEST_NAME; \
clock_t start, finish; \
start = clock();
#define TEST_END() \
finish = clock(); \
printf("%s clock: %ld\n", name, finish - start); \
} while(0);
void send1(int *to, int *from, int count)
{
do {
*to++ = *from++ ;
} while( --count > 0);
}
void send2(int *to, int *from, int count)
{
int n = (count + 7 ) / 8 ;
switch (count % 8 ) {
case 0: do { *to++ = *from++;
case 7: *to++ = *from++;
case 6: *to++ = *from++;
case 5: *to++ = *from++;
case 4: *to++ = *from++;
case 3: *to++ = *from++;
case 2: *to++ = *from++;
case 1: *to++ = *from++;
} while(--n > 0);
}
}
int main (int argc, char *argv[])
{
char *from, *to;
from = (char *) malloc(BUFLEN * sizeof(char));
to = (char *) malloc(BUFLEN * sizeof(char));
memset(from, 'a', (BUFLEN * sizeof(char)));
TEST_START("send1");
send1(to, from, BUFLEN);
TEST_END();
TEST_START("send2");
send2(to, from, BUFLEN);
TEST_END();
free(from);
free(to);
return(0);
}
运行结果:
send1 clock: 110000
send2 clock: 60000