Tuesday, January 19, 2010

Computer usage

The algorithm

"Although the basic idea of binary search is comparatively straightforward, the details can be surprisingly tricky…" — Professor Donald Knuth

When Jon Bentley assigned it as a problem in a course for professional programmers, he found that an astounding ninety percent failed to code a binary search correctly after several hours of working on it[3], and another study shows that accurate code for it is only found in five out of twenty textbooks (Kruse, 1999). Furthermore, Bentley's own implementation of binary search, published in his 1986 book Programming Pearls, contains an error that remained undetected for over twenty years.

Numerical difficulties

In a practical implementation, the variables used to represent the indices will be of finite size, hence only capable of representing a finite range of values. For example, 16-bit unsigned integers can only hold values from 0 to 65535. If the binary search algorithm is to operate on large arrays, this has two implications:

  • The values first − 1 and last + 1 must both be representable within the finite bounds of the chosen integer type . Therefore, continuing the 16-bit example, the largest value that last may take is +65534, not +65535. A problem exists even for the "inclusive" form of the method, as if x A(65535).Key, then on the final iteration the algorithm will attempt to store 65536 into L and fail. Equivalent issues apply to the lower limit (where first − 1 could become negative when the allowed search space only contains positive or null indice).
  • If the midpoint of the span is calculated as p := (L + R) / 2, then the value (L + R) will exceed the number range if last is greater than (in our example) 65535/2 and the search wanders toward the upper end of the search space. This can be avoided by performing the calculation as p := L + (R - L) / 2.

Syntax difficulties

Another difficulty is presented by the absence in most computer languages of a three-way result from a comparison, which forces a comparison to be performed twice. The form is somewhat as follows:

if a 

About half the time, the first test will be true so that there will be only one comparison of a and b, but the other half of the time it will be false, and a second comparison forced. This is so grievous that some versions are recast so as not to make a second test at all thus not determining equality until the span has been reduced to zero, and thereby foregoing the possibility of early termination – remember that about half the time the search will happen on a matching value one iteration short of the limit.

It is quite easy to make this problem still worse (e.g. as in [5]) by using an order such as

if a = b then action3
else if a b then action2
else action1;

Rather than detecting equality early (as it might appear to), this will force two comparisons to be performed for all but the last iteration of a search.

Implementations

Iterative

Niklaus Wirth recorded this algorithm in Standard Pascal [6] :

  min := 1;
max := N; {array size: var A : array [1..N] of integer}
repeat
mid := (min + max) div 2;
if x A[mid] then
min := mid + 1
else
max := mid - 1;
until (A[mid] = x) or (min max);

This code uses inclusive bounds and a three-way test (for early loop termination in case of equality), but with two separate comparisons per iteration. It is not the most efficient solution.

Three-way comparison

Since Fortran does offer a three-way test, here is a version for searching an array of integers. For labels Fortran uses numbers at the start of statements, thus the 1, 2, 3, and 4. The if statement performs a go to to one of the three nominated labels according to the sign of its arithmetic expression.

      Integer Function BinarySearch(A,X,N)
! Search for the value X in A(1)..A(N)
Integer A(*),X !The array is indexed 1 to ?
Integer N !Stated number of elements.
Integer L,R,P
L = 0 !Exclusive bounds,
R = N + 1 !To search elements 1 to N.
1 P = (R - L)/2 !Probe; integer division. Not (L + R)/2!
if (P = 0) Return(-L) !Search exhausted. P = L + P !Convert an offset from L to an array index. if (X - A(P)) 3,4,2 !Test: negative,zero,positive. 2 L = P !A(P) < r =" P" x =" A(P).">

Recursive

The most straightforward implementation is recursive, which recursively searches the subrange dictated by the comparison:

   BinarySearch(A[0..N-1], value, low, high) {
if (high mid =" low" value)
return BinarySearch(A, value, low, mid-1)
else if (A[mid]


It is invoked with initial low and high values of 0 and N-1. We can eliminate the tail recursion above and convert this to an iterative implementation:

Single comparison per iteration

In computer languages that lack a three-way comparison, the required three-way comparison has to be replaced by two two-way comparisons that would on average involve one and a half comparisons per iteration, not one. To reduce this overhead, some implementations defer checking for equality until after the search completes, as in this pseudocode:

       low = 0
high = N
while (low mid =" low" low =" mid" high =" mid-1:"= value,
//so high can't be high =" mid;" high ="=""

This approach foregoes the possibility of early termination on discovery of a match, thus successful searches have log2(N) iterations instead of an expected log2(N) − 1 iterations. On the other hand, this implementation makes fewer comparisons: log2(N) is less than the expected number of comparisons for the two-test implementations of 1·5(log2(N) − 1), for N greater than eight.

No comments:

Post a Comment