不得不转,这个太经典了。后面两个问题自己分析了一下。
转自:http://blog.chinaunix.net/u/13991/showart_115947.html
代码:http://infolab.stanford.edu/~manku/bitcount/bitcount.c
Fast Bit Counting Routines
Compiled from various sources by Gurmeet Singh Manku
A common problem asked in job interviews is to count the number of
bits that are on in an unsigned integer. Here are seven solutions to
this problem. Source code in C is available.
Iterated Count |
int bitcount (unsigned int n)
{
int count=0;
while (n)
{
count += n & 0x1u ;
n >>= 1 ;
}
return count ;
}
|
|
Sparse Ones |
int bitcount (unsigned int n)
{
int count=0 ;
while (n)
{
count++ ;
n &= (n - 1) ;
}
return count ;
}
|
|
Dense Ones |
int bitcount (unsigned int n)
{
int count = 8 * sizeof(int) ;
n ^= (unsigned int) -1 ;
while (n)
{
count-- ;
n &= (n - 1) ;
}
return count ;
}
|
|
Precompute_8bit |
// static int bits_in_char [256] ;
int bitcount (unsigned int n)
{
// works only for 32-bit ints
return bits_in_char [n & 0xffu]
+ bits_in_char [(n >> 8) & 0xffu]
+ bits_in_char [(n >> 16) & 0xffu]
+ bits_in_char [(n >> 24) & 0xffu] ;
}
|
|
Iterated Count
runs in time proportional to the total number of bits. It simply loops
through all the bits, terminating slightly earlier because of the while
condition. Useful if 1's are sparse and among the least significant
bits. Sparse Ones runs in time proportional to the number of 1 bits. The line n &= (n - 1) simply sets the rightmost 1 bit in n to 0. Dense Ones
runs in time proportional to the number of 0 bits. It is the same as
Sparse Ones, except that it first toggles all bits (n ~= -1), and
continually subtracts the number of 1 bits from sizeof(int). Precompute_8bit
assumes an array bits_in_char such that bits_in_char[i] contains the
number of 1 bits in the binary representation for i. It repeatedly
updates count by masking out the last eight bits in n, and indexing into
bits_in_char.
Precompute_16bit |
// static char bits_in_16bits [0x1u << 16] ;
int bitcount (unsigned int n)
{
// works only for 32-bit ints
return bits_in_16bits [n & 0xffffu]
+ bits_in_16bits [(n >> 16) & 0xffffu] ;
}
|
Precompute_16bit
is a variant of Precompute_8bit in that an array bits_in_16bits[]
stores the number of 1 bits in successive 16 bit numbers (shorts).
Parallel Count |
#define TWO(c) (0x1u << (c))
#define MASK(c) (((unsigned int)(-1)) / (TWO(TWO(c)) + 1u))
#define COUNT(x,c) ((x) & MASK(c)) + (((x) >> (TWO(c))) & MASK(c))
int bitcount (unsigned int n)
{
n = COUNT(n, 0) ;
n = COUNT(n, 1) ;
n = COUNT(n, 2) ;
n = COUNT(n, 3) ;
n = COUNT(n, 4) ;
/* n = COUNT(n, 5) ; for 64-bit integers */
return n ;
}
|
Parallel Count
carries out bit counting in a parallel fashion. Consider n after the
first line has finished executing. Imagine splitting n into pairs of
bits. Each pair contains the number of ones in those two bit positions in the original n. After the second line has finished executing, each nibble contains the number of ones
in those four bits positions in the original n. Continuing this for
five iterations, the 64 bits contain the number of ones among these
sixty-four bit positions in the original n. That is what we wanted to
compute.
Nifty Parallel Count |
#define MASK_01010101 (((unsigned int)(-1))/3)
#define MASK_00110011 (((unsigned int)(-1))/5)
#define MASK_00001111 (((unsigned int)(-1))/17)
int bitcount (unsigned int n)
{
n = (n & MASK_01010101) + ((n >> 1) & MASK_01010101) ;
n = (n & MASK_00110011) + ((n >> 2) & MASK_00110011) ;
n = (n & MASK_00001111) + ((n >> 4) & MASK_00001111) ;
return n % 255 ;
}
|
Nifty Parallel Count
works the same way as Parallel Count for the first three iterations. At
the end of the third line (just before the return), each byte of n
contains the number of ones in those eight bit positions in the original
n. A little thought then explains why the remainder modulo 255 works.
MIT HACKMEM Count |
int bitcount(unsigned int n)
{
/* works for 32-bit numbers only */
/* fix last line for 64-bit numbers */
register unsigned int tmp;
tmp = n - ((n >> 1) & 033333333333)
- ((n >> 2) & 011111111111);
return ((tmp + (tmp >> 3)) & 030707070707) % 63;
}
|
MIT Hackmem Count
is funky. Consider a 3 bit number as being 4a+2b+c. If we shift it
right 1 bit, we have 2a+b. Subtracting this from the original gives
2a+b+c. If we right-shift the original 3-bit number by two bits, we get
a, and so with another subtraction we have a+b+c, which is the number of
bits in the original number. How is this insight employed? The first
assignment statement in the routine computes tmp. Consider the octal representation of tmp. Each digit in the octal representation is simply the number of 1's in the corresponding three bit positions in n.
The last return statement sums these octal digits to produce the final
answer. The key idea is to add adjacent pairs of octal digits together
and then compute the remainder modulus 63. This is accomplished by
right-shifting tmp by three bits, adding it to tmp
itself and ANDing with a suitable mask. This yields a number in which
groups of six adjacent bits (starting from the LSB) contain the number
of 1's among those six positions in n. This number modulo 63
yields the final answer. For 64-bit numbers, we would have to add
triples of octal digits and use modulus 1023. This is HACKMEM 169, as
used in X11 sources. Source: MIT AI Lab memo, late 1970's.
No Optimization Some Optimization Heavy Optimization
Precomp_16 52.94 Mcps Precomp_16 76.22 Mcps Precomp_16 80.58 Mcps
Precomp_8 29.74 Mcps Precomp_8 49.83 Mcps Precomp_8 51.65 Mcps
Parallel 19.30 Mcps Parallel 36.00 Mcps Parallel 38.55 Mcps
MIT 16.93 Mcps MIT 17.10 Mcps Nifty 31.82 Mcps
Nifty 12.78 Mcps Nifty 16.07 Mcps MIT 29.71 Mcps
Sparse 5.70 Mcps Sparse 15.01 Mcps Sparse 14.62 Mcps
Dense 5.30 Mcps Dense 14.11 Mcps Dense 14.56 Mcps
Iterated 3.60 Mcps Iterated 3.84 Mcps Iterated 9.24 Mcps
Mcps = Million counts per second
|
Which
of the several bit counting routines is the fastest? Results of speed
trials on an i686 are summarized in the table on left. "No Optimization"
was compiled with plain gcc. "Some Optimizations" was gcc -O3. "Heavy Optimizations" corresponds to gcc -O3 -mcpu=i686 -march=i686 -fforce-addr -funroll-loops -frerun-cse-after-loop -frerun-loop-opt -malign-functions=4.
Thanks to Seth Robertson who suggested performing speed trials by extending bitcount.c. Seth also pointed me to MIT_Hackmem routine. Thanks to Denny Gursky
who suggested the idea of Precompute_11bit. That would require three
sums (11-bit, 11-bit and 10-bit precomputed counts). I then tried
Precompute_16bit which turned out to be even faster.
If you have niftier solutions up your sleeves, please send me an e-mail
问题1:
#define MASK_01010101 (((unsigned int)(-1))/3)
#define MASK_00110011 (((unsigned int)(-1))/5)
#define MASK_00001111 (((unsigned int)(-1))/17)
这里的(unsigned int) (-1) /3为啥是01010101的样子???
做了下实验:
>>> bin(int('11111111',2)/3)
'0b1010101'
>>> bin(int('11111111',2)/5)
'0b110011'
这个只是一个小技巧,具体在纸上除一下:
001
101 | 11111111
101
10
00110
101 | 11111111
101
101
0
01
问题2:
n % 255 ;
n%63
这里的%63 是什么作用??
1. 假设最后结果n为:
000111 001111
b a
n = b*64+a
= 63b + (a+b)
所以
n%63 = [63b + (a+b)]%63
= 63b % 63 + (a+b) % 63 根据模的性质((a%m + b%m)%m = (a+b)%m)
= (a+b)
2. 假设结果n为:
000011 000111 001111
c b a
n = c*642 + b*64 + a
= c*(642-1+1) + 64b + a
= c*(642-1) + c + 64b + a
= c*(64-1)(64+1) + c + 64b + a
= c*65*63 + 63b + (a + b + c )
所以 n%63 = a+b+c
3. 现在我们看644, 645 ...
644 = (644 -1 +1) = (644 -1 ) + 1
而(644 - 1) 一定可以分解为(644 - 1) *... , 必然能被63整除.
所以n % 63 = n的64进制各个数位上的数字之和.
这也解释了为什么必须是63, 当数字是用64进制表示的时候,就只能选择64-1 = 63
模的基本性质:
(a + b) % n = (a % n + b % n) % n (1)
(a - b) % n = (a % n - b % n) % n (2)
(a * b) % n = (a % n * b % n) % n (3)
分享到:
相关推荐
**前端开源库-fast-bitfield详解** 在前端开发中,数据处理和优化往往是一个重要的环节,尤其是在处理大量数据或者需要高效运算的场景下。`fast-bitfield`是一个专门为前端设计的开源库,它提供了快速位字段操作的...
在这个Java实例中,我们看到一个名为`CountOnes`的类,它提供了一个`getCount`方法来快速计算一个整数中1的个数。这种方法基于一种被称为“Brian Kernighan”的位操作技巧。 Brian Kernighan算法的基本思想是通过将...
CLC402 Low-Gain Op Amp with Fast 14-Bit Settling CLC404 Wideband, High-Slew Rate, Monolithic Op Amp CLC405 Low-Cost, Low Power, 110MHz Op Amp with Disable CLC406 Wideband, Low Power Monolithic Op Amp ...
This simplifies the overall circuit design and reduces component count and cost. 6. **Linearity Error ±1 LSB Max**: The linearity error of the TLC2543 is specified as ±1 least significant bit (LSB...
where y_i is the i'th bit, and the context is the previous i - 1 bits of uncompressed data. 2. PAQ6 MODEL The PAQ6 model consists of a weighted mix of independent submodels which make predictions ...
- Count occurences of text or hex string - Replace text or hex string, e.g. replace "0D 0A" by "0A" or replace "0D 0A" by text "EOL" - Extremely fast "replace all" mode (if needed, additional memory...
- Fast wake and hop times - Excellent selectivity performance - 60 dB adjacent channel - 73 dB blocking at 1 MHz - Antenna diversity and T/R switch control - Highly configurable packet handler - TX ...
ber ofdm_bpsk ofdm_ofdma ber_ofdma bpsk"表明这是一个关于OFDMA(Orthogonal Frequency Division Multiple Access)技术的压缩包,其中涉及了BPSK(Binary Phase Shift Keying)调制在OFDM系统中的误比特率(Bit ...
- **HyperRAM Interface**: Utilizes the HyperRAM interface, which is optimized for low-pin count applications and supports fast data transfer rates. - **Low Power Consumption**: Designed for low power ...
STM32L475VGT6单片机物联网开发板PDF原理图PCB+AD集成封装库文件, ALTIUM工程转的PDF原理图PCB文件+AD集成封装库,已在项目中验证,可以做为你的设计参考。集成封装库器件列表: Library Component Count : 46 ...
DROP TABLE, ALTER TABLE statements CREATE INDEX, DROP INDEX statements INSERT, UPDATE, DELETE statements BETWEEN, IN, LIKE, IS NULL, EXISTS operators Aggregate functions COUNT,SUM,MIN,MAX,AVG Most of...
Library Component Count : 48 Name Description ---------------------------------------------------------------------------------------------------- AP7333-XXSAG 300mA, Low Quiescent Current, Fast ...
* fixed: adding subtitle caption count to filenames sometimes didn't work * fixed: subtitle caption counts in log sometimes had wrong track numbers * fixed: all non-supported MKV tracks shared the ...
- **IFFT_bin_length**: IFFT(Inverse Fast Fourier Transform)的长度,即每个OFDM符号的子载波数量。 - **carrier_count**: 使用的子载波数量。 - **bits_per_symbol**: 每个调制符号携带的比特数。 - **symbols_...
--bitrate <integer> Target bitrate (kbps) for ABR (implied). Default 0 --crf <float> Quality-based VBR (0-51). Default 28.0 --vbv-maxrate <integer> Max local bitrate (kbit/s). Default 0 --vbv-...
CHR(BITAND(p1, -16777216) / 16777215) || CHR(BITAND(p1, 16711680) / 65535) "EnqueueType" FROM v$session_wait a, v$session b WHERE a.event NOT LIKE 'SQL*N%' AND a.event NOT LIKE 'rdbms%' AND a.SID = ...
The Marvell® 88E6085 device is a single chip integrated 10-port Fast Ethernet switch. This device supports ‘Best-in-Class’ Quality of Service (QoS) and the highest ‘real-world’ performance. It ...
Shuffled, Disordered, Scattered Imports (Just for 32 bit processes). So you can use this tool for changing IAT Base Address and Sorting IATs in New (other) Address. Tested on: Armadillo ASProtect...
- Fast/Slow两档速度,Fast:10次/秒,Slow:1次/秒 - 数据从USB UART输出,波特率115200(目前只输出,不能从上位机控制) - 默认5分钟自动关机,可以关闭该功能 - 使用一节锂电池供电,支持从USB充电 - 支持背光,...