The algorithm
"Although the basic idea of binary search is comparatively straightforward, the details can be surprisingly tricky…" — Professor Donald Knuth
When Jon Bentley assigned it as a problem in a course for professional programmers, he found that an astounding ninety percent failed to code a binary search correctly after several hours of working on it[3], and another study shows that accurate code for it is only found in five out of twenty textbooks (Kruse, 1999). Furthermore, Bentley's own implementation of binary search, published in his 1986 book Programming Pearls, contains an error that remained undetected for over twenty years.
Numerical difficulties
In a practical implementation, the variables used to represent the indices will be of finite size, hence only capable of representing a finite range of values. For example, 16-bit unsigned integers can only hold values from 0 to 65535. If the binary search algorithm is to operate on large arrays, this has two implications:
- The values
first − 1andlast + 1must both be representable within the finite bounds of the chosen integer type . Therefore, continuing the 16-bit example, the largest value thatlastmay take is +65534, not +65535. A problem exists even for the "inclusive" form of the method, as ifx A(65535).Key, then on the final iteration the algorithm will attempt to store 65536 intoLand fail. Equivalent issues apply to the lower limit (wherefirst − 1could become negative when the allowed search space only contains positive or null indice). - If the midpoint of the span is calculated as
p := (L + R) / 2, then the value(L + R)will exceed the number range iflastis greater than (in our example) 65535/2 and the search wanders toward the upper end of the search space. This can be avoided by performing the calculation asp := L + (R - L) / 2.
Syntax difficulties
Another difficulty is presented by the absence in most computer languages of a three-way result from a comparison, which forces a comparison to be performed twice. The form is somewhat as follows:
if a
About half the time, the first test will be true so that there will be only one comparison of a and b, but the other half of the time it will be false, and a second comparison forced. This is so grievous that some versions are recast so as not to make a second test at all thus not determining equality until the span has been reduced to zero, and thereby foregoing the possibility of early termination – remember that about half the time the search will happen on a matching value one iteration short of the limit.
It is quite easy to make this problem still worse (e.g. as in [5]) by using an order such as
if a = b then action3
else if a b then action2
else action1;
Rather than detecting equality early (as it might appear to), this will force two comparisons to be performed for all but the last iteration of a search.
Implementations
Iterative
Niklaus Wirth recorded this algorithm in Standard Pascal [6] :
min := 1;
max := N; {array size: var A : array [1..N] of integer}
repeat
mid := (min + max) div 2;
if x A[mid] then
min := mid + 1
else
max := mid - 1;
until (A[mid] = x) or (min max);
This code uses inclusive bounds and a three-way test (for early loop termination in case of equality), but with two separate comparisons per iteration. It is not the most efficient solution.
Three-way comparison
| | This section does not cite any references or sources. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed. (January 2009) |
Since Fortran does offer a three-way test, here is a version for searching an array of integers. For labels Fortran uses numbers at the start of statements, thus the 1, 2, 3, and 4. The if statement performs a go to to one of the three nominated labels according to the sign of its arithmetic expression.
Integer Function BinarySearch(A,X,N)
! Search for the value X in A(1)..A(N)
Integer A(*),X !The array is indexed 1 to ?
Integer N !Stated number of elements.
Integer L,R,P
L = 0 !Exclusive bounds,
R = N + 1 !To search elements 1 to N.
1 P = (R - L)/2 !Probe; integer division. Not (L + R)/2!
if (P = 0) Return(-L) !Search exhausted. P = L + P !Convert an offset from L to an array index. if (X - A(P)) 3,4,2 !Test: negative,zero,positive. 2 L = P !A(P) < r =" P" x =" A(P).">
Recursive
| | This section does not cite any references or sources. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed. (January 2009) |
The most straightforward implementation is recursive, which recursively searches the subrange dictated by the comparison:
BinarySearch(A[0..N-1], value, low, high) {
if (high mid =" low" value)
return BinarySearch(A, value, low, mid-1)
else if (A[mid]
It is invoked with initial low and high values of 0 and N-1. We can eliminate the tail recursion above and convert this to an iterative implementation:
Single comparison per iteration
| | This section does not cite any references or sources. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed. (January 2009) |
In computer languages that lack a three-way comparison, the required three-way comparison has to be replaced by two two-way comparisons that would on average involve one and a half comparisons per iteration, not one. To reduce this overhead, some implementations defer checking for equality until after the search completes, as in this pseudocode:
low = 0
high = N
while (low mid =" low" low =" mid" high =" mid-1:"= value,
//so high can't be high =" mid;" high ="=""
This approach foregoes the possibility of early termination on discovery of a match, thus successful searches have log2(N) iterations instead of an expected log2(N) − 1 iterations. On the other hand, this implementation makes fewer comparisons: log2(N) is less than the expected number of comparisons for the two-test implementations of 1·5(log2(N) − 1), for N greater than eight.
No comments:
Post a Comment