
DIVIDE AND CONQUER, Part 1: Binary Search

Divide and conquer is a strategy for solving problems. It divides a problem into smaller sub-problems until each of the sub-problems is directly solvable, and then combines their solutions to obtain a solution of the original problem. The following summarizes the process; DAC stands for divide and conquer.

algorithm DAC (P)
  if small(P) then
    return S(P) // solution of a small problem
  else
    divide P into smaller instances P1, P2, …, Pk, k ≥ 1
    apply DAC to each of these sub-problems
    return combine (DAC(P1), DAC(P2), …, DAC(Pk))
  endif
end algorithm

The size of the problem P is n, and the sizes of the k sub-problems P1, P2, …, Pk are n1, n2, …, nk, respectively. combine is a function that determines the solution to P using the solutions to the k sub-problems.
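The schema above can be rendered directly in executable form. The following Python sketch is illustrative and not part of the original notes; the parameter names (small, solve_small, divide, combine) simply mirror the pseudocode.

# Generic divide-and-conquer driver mirroring the DAC pseudocode above.
# small, solve_small, divide, and combine are supplied by the caller;
# the names are illustrative, not from any library.
def dac(P, small, solve_small, divide, combine):
    if small(P):
        return solve_small(P)        # solution of a small problem
    subproblems = divide(P)          # P1, P2, ..., Pk
    return combine([dac(Q, small, solve_small, divide, combine)
                    for Q in subproblems])

# Example instantiation: summing a list by halving it.
total = dac([3, 5, 9, 12, 16],
            small=lambda P: len(P) <= 1,
            solve_small=lambda P: P[0] if P else 0,
            divide=lambda P: [P[:len(P) // 2], P[len(P) // 2:]],
            combine=sum)
print(total)  # 45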

Let T(n) be the time for DAC on an input of size n, and let g(n) be the time for computing the answer directly for small inputs. Then

T(n) = g(n), n small
     = T(n1) + T(n2) + … + T(nk) + f(n), otherwise.

The function f(n) is the time for dividing P and combining the solutions of the sub-problems to obtain a solution to P. The complexity of many divide-and-conquer algorithms is given by the following recurrence relation:

T(n) = T(1), n = 1
     = a T(n/b) + f(n), n ≥ 2, where a ≥ 1 and b > 1.
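To see how such a recurrence behaves, it can be evaluated numerically. The Python sketch below is illustrative only (the helper T and its parameters are not from the notes); with a = 1, b = 2 and f(n) = 1 it reproduces the binary-search cost of about log n + 1 comparisons.

def T(n, a, b, f, T1=1):
    # Evaluates T(1) = T1, T(n) = a*T(n // b) + f(n) for n >= 2.
    if n <= 1:
        return T1
    return a * T(n // b, a, b, f, T1) + f(n)

# Binary search: one comparison per level, problem size halves.
print([T(n, 1, 2, lambda n: 1) for n in (1, 2, 4, 8, 16)])  # [1, 2, 3, 4, 5]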

We consider three examples.
1. Binary search: searches for a given element in an ordered array. It accomplishes this by dividing the array into two approximately equal parts.
2. MergeSort: sorts a given array. It accomplishes this by dividing the array into two approximately equal parts, sorting each of the two parts, and merging them to form a sorted array.
3. QuickSort: sorts a given array. It accomplishes this by selecting a partitioning element and positioning it in its correct place in the array, thus dividing the array into two parts (one of which could be empty), and then applying the algorithm recursively to both parts.

BINARY SEARCH

Binary Search Tree (BST) or Binary Decision Tree

Ordered array:

Index:  A(1)  A(2)  A(3)  A(4)  A(5)  A(6)  A(7)  A(8)  A(9)  A(10)
Value:    3     5     9    12    16    20    25    32    35    40

Node indexes of the BST shown below are computed as follows.
Midpoint of 1 and 10 is (1 + 10)/2 = 5.
Midpoint of 1 and 5 - 1 = 4 is (1 + 4)/2 = 2.
Midpoint of 5 + 1 = 6 and 10 is (6 + 10)/2 = 8.
Midpoint of 1 and 2 - 1 = 1 is (1 + 1)/2 = 1.
Midpoint of 2 + 1 = 3 and 5 - 1 = 4 is (3 + 4)/2 = 3.
Midpoint of 3 + 1 = 4 and 5 - 1 = 4 is (4 + 4)/2 = 4.
Midpoint of 5 + 1 = 6 and 8 - 1 = 7 is (6 + 7)/2 = 6.
Midpoint of 6 + 1 = 7 and 8 - 1 = 7 is (7 + 7)/2 = 7.
Midpoint of 8 + 1 = 9 and 10 - 1 = 9 is (9 + 9)/2 = 9.
Midpoint of 9 + 1 = 10 and 10 is (10 + 10)/2 = 10.
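These midpoints can also be generated mechanically. The following Python sketch (illustrative, not part of the original notes) recurses on (low, high) ranges exactly as above and prints the node indexes level by level:

def build(low, high, level=1, tree=None):
    # Records the midpoint index chosen at each level of the decision tree.
    if tree is None:
        tree = {}
    if low > high:
        return tree
    mid = (low + high) // 2            # integer division, as in the notes
    tree.setdefault(level, []).append(mid)
    build(low, mid - 1, level + 1, tree)
    build(mid + 1, high, level + 1, tree)
    return tree

for level, indexes in sorted(build(1, 10).items()):
    print(level, indexes)
# 1 [5]
# 2 [2, 8]
# 3 [1, 3, 6, 9]
# 4 [4, 7, 10]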

[Figure: binary decision tree for the array above. Internal nodes are written value/index; external nodes carry intervals.]

Level 1: 16/5
Level 2: 5/2, 32/8
Level 3: 3/1, 9/3, 20/6, 35/9
Last level: 12/4, 25/7, 40/10
External nodes, left to right: (-∞,3), (3,5), (5,9), (9,12), (12,16), (16,20), (20,25), (25,32), (32,35), (35,40), (40,∞)

Nodes are assigned levels/depths beginning with level 1 at the root node. The height (highest level) of the tree is approximately log n; the exact height is ⌈log(n+1)⌉, as derived later. Each internal node represents one comparison (see the algorithms below). The external nodes are at the bottom two levels, so their depth is Θ(log n).

Search for x in the array. Successful search: x is found in the array; the search ends at an internal node. Unsuccessful search: x is not in the array; the search ends at an external node. The interval shown at each external node consists of the values of x that are not in the array and whose search ends at that node.


Search for x in the array:
If x = the key at the node, then x is found.
If x < the key at the node, then search the left subtree.
If x > the key at the node, then search the right subtree.

Both iterative and recursive algorithms are given at the end of these notes.

Examples

Search for 25 in the list:

low  high  mid  A(mid)  comparison
 1    10    5    16     25 > 16
 6    10    8    32     25 < 32
 6     7    6    20     25 > 20
 7     7    7    25     25 = 25

Success: low = high and x = A(mid); the item is found in the array.

Search for 11 in the list:

low  high  mid  A(mid)  comparison
 1    10    5    16     11 < 16
 1     4    2     5     11 > 5
 3     4    3     9     11 > 9
 4     4    4    12     11 < 12
 4     3

Failure: low > high; the item is not in the array.

Complexities: # of comparisons

Successful search: x is found in the array at an internal node, so the search ends at an internal node. The best case is at the root node (one comparison). The worst case is at an internal node at the last-but-one level of the tree (the deepest internal level). The average case needs a special computation, given later.

Best: Θ(1). Average: Θ(log n). Worst: Θ(log n).

Unsuccessful search: x is not found in the array; all searches end at an external node, and the external nodes are at the last two levels.

Best, average, and worst: Θ(log n).

Both iterative and recursive algorithms are given next.

algorithm binsrch (A, x, low, high) // recursive
// finds an index with A(index) = x if such an index exists, otherwise returns 0,
// where A(1:n) is an ascending array of n elements.
  index ← 0
  if low ≤ high then
    mid ← (low + high)/2 // integer division
    if x = A(mid) then
      index ← mid
    else
      if x < A(mid) then
        index ← binsrch (A, x, low, mid - 1)
      else
        index ← binsrch (A, x, mid + 1, high)
      endif
    endif
  endif
  return index
end algorithm

algorithm binsrch (A, x, low, high) // recursive
// finds an index with A(index) = x if such an index exists, otherwise returns 0,
// where A(1:n) is an ascending array of n elements.
  if low ≤ high then
    mid ← (low + high)/2 // integer division
    if x = A(mid) then
      return mid
    else
      if x < A(mid) then
        return binsrch (A, x, low, mid - 1)
      else
        return binsrch (A, x, mid + 1, high)
      endif
    endif
  endif
  return 0
end algorithm

algorithm binsrch (A, x, n) // iterative
// finds an index with A(index) = x if such an index exists, otherwise returns 0,
// where A(1:n) is an ascending array of n elements.
  low ← 1; high ← n
  while low ≤ high do
    mid ← (low + high)/2 // integer division
    if x = A(mid) then
      return mid
    else
      if x < A(mid) then
        high ← mid - 1
      else
        low ← mid + 1
      endif
    endif
  endwhile
  return 0
end algorithm
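For reference, here is a direct Python transcription of the iterative algorithm (an illustrative sketch, not part of the original notes). A 1-based array is simulated by padding index 0 with None, and the two example searches above are reproduced.

def binsrch(A, x):
    # Iterative binary search over A(1:n); A[0] is a 1-based padding slot.
    low, high = 1, len(A) - 1
    while low <= high:
        mid = (low + high) // 2      # integer division
        if x == A[mid]:
            return mid               # successful search
        elif x < A[mid]:
            high = mid - 1
        else:
            low = mid + 1
    return 0                         # unsuccessful search

A = [None, 3, 5, 9, 12, 16, 20, 25, 32, 35, 40]
print(binsrch(A, 25))  # 7, matching the first trace
print(binsrch(A, 11))  # 0, matching the second trace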

For graduate students: Height of a Binary Decision Tree

h(n) = height of a binary decision tree with n internal nodes. We show h(n) = ⌈log(n+1)⌉, n ≥ 0. h(1) = 1, since the root node is at level 1.

mid = ⌊(n + 1)/2⌋ = ⌈n/2⌉.
Left subtree: indexes 1 through ⌈n/2⌉ - 1, that is, ⌈n/2⌉ - 1 = ⌊(n - 1)/2⌋ elements.
Right subtree: indexes ⌈n/2⌉ + 1 through n, that is, n - ⌈n/2⌉ = ⌊n/2⌋ elements.

Note, since ⌈n/2⌉ - ⌊n/2⌋ ≤ 1, we have ⌈n/2⌉ - 1 = ⌊(n - 1)/2⌋ ≤ ⌊n/2⌋.

[Figure: the root contributes 1 to h(n); the left subtree has height h(⌈n/2⌉ - 1) and the right subtree has height h(⌊n/2⌋).]

h(0) = 0, h(1) = 1, h(n) = max{ h(⌈n/2⌉ - 1), h(⌊n/2⌋) } + 1, n ≥ 2,
or equivalently,
h(0) = 0, h(n) = max{ h(⌈n/2⌉ - 1), h(⌊n/2⌋) } + 1, n ≥ 1.

The goal is to solve the above recurrence to obtain a solution for h(n).
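Before the formal solution, the recurrence can be checked against the claimed closed form numerically. This Python sketch is illustrative, not from the notes:

import math

def h(n):
    # h(0) = 0, h(n) = max{h(ceil(n/2) - 1), h(floor(n/2))} + 1 for n >= 1.
    if n == 0:
        return 0
    return max(h(math.ceil(n / 2) - 1), h(n // 2)) + 1

assert all(h(n) == math.ceil(math.log2(n + 1)) for n in range(1000))
print(h(10))  # 4, the height of the 10-element tree above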

Solution: First we prove a lemma.

Lemma: h(n) is a non-decreasing function of n, i.e., h(n) ≤ h(n+1) for all n ≥ 0.

Proof: We prove the lemma by the strong form of induction. Define the statement S(n): h(i) ≤ h(j) for all 0 ≤ i ≤ j ≤ n, where n ≥ 0.

Base case: 0 = h(0) < h(1) = 1. Thus S(1) holds.

Induction hypothesis: Assume S(n) holds. Then show that S(n+1) holds. Since S(n) is assumed to hold, it suffices to verify that h(n) ≤ h(n+1). Now, h(n+1) = max{ h(⌈(n+1)/2⌉ - 1), h(⌊(n+1)/2⌋) } + 1, n ≥ 1. Also, as before, h(n) = max{ h(⌈n/2⌉ - 1), h(⌊n/2⌋) } + 1, n ≥ 1. Since 0 ≤ ⌈n/2⌉ - 1 ≤ ⌈(n+1)/2⌉ - 1 ≤ n and 1 ≤ ⌊n/2⌋ ≤ ⌊(n+1)/2⌋ ≤ n for all n ≥ 2, we conclude from the induction hypothesis that h(⌈n/2⌉ - 1) ≤ h(⌈(n+1)/2⌉ - 1) and h(⌊n/2⌋) ≤ h(⌊(n+1)/2⌋) for n ≥ 2. Hence, the expressions for h(n) and h(n+1) show immediately that h(n) ≤ h(n+1) for n ≥ 2. We have already shown that h(0) < h(1), and h(1) ≤ h(2) = max{h(0), h(1)} + 1 = 2 directly from the recurrence. Hence S(n+1) holds. ■

Now we obtain an expression for h(n). Since ⌈n/2⌉ - 1 ≤ ⌊n/2⌋, the above lemma gives h(⌈n/2⌉ - 1) ≤ h(⌊n/2⌋) in the expression for h(n). Hence, h(n) = h(⌊n/2⌋) + 1, n ≥ 1. Expanding this recurrence, we obtain h(n) = h(⌊n/2^k⌋) + k, k ≥ 1. Now, ⌊n/2^k⌋ = 1 ⟺ 1 ≤ n/2^k < 2 ⟺ 2^k ≤ n < 2^(k+1) ⟺ k ≤ log n < k + 1 ⟺ ⌊log n⌋ = k, where the base of the logarithm is 2. Thus, h(n) = h(1) + k = k + 1, and we obtain h(n) = ⌊log n⌋ + 1 = ⌈log(n+1)⌉ for n ≥ 1; together with h(0) = 0 = ⌈log 1⌉, this gives h(n) = ⌈log(n+1)⌉ for all n ≥ 0. ■

Internal and External Path Length

A tree is called a 2-tree if each node has 2 children or no children (the degree of each node is 2 or 0). It is a special case of a binary tree, and a binary decision tree is a 2-tree. A node with two children is called an internal node; a node with no children is called an external or leaf node. The path length of a node is its distance from the root.
n: # of internal nodes.
m: # of external nodes.
I: internal path length, the sum of the path lengths of all the internal nodes.
E: external path length, the sum of the path lengths of all the external nodes.

[Figure: an example 2-tree with n = 6 internal nodes and m = 7 external nodes; tabulating each node's path length gives totals I = 9 and E = 21, consistent with m = n + 1 and E = I + 2n.]

Proposition: For a 2-tree, m = n + 1 and E = I + 2n.
Proof: By induction on n.
Base case: n = 1. Then m = 2, I = 0, E = 2, and both equations hold.


Inductive step: Assume the equations hold for 2-trees with n - 1 internal nodes, and show they hold for 2-trees with n internal nodes.

Consider an internal node, say x, farthest from the root node. Assume it is at distance p from the root node. Its children must be two external nodes.

[Figure 1: the first tree, with internal node x at depth p and its two external children at depth p + 1. Figure 2: the second tree, with x replaced by a single external node y at depth p.]

This is shown in Figure 1 above. Replace the node x by an external node y, as in Figure 2. Let the parameters of the second tree be n', m', I', E'. Then

n' = n - 1, m' = m - 1, I' = I - p, E' = E - 2(p + 1) + p = E - p - 2.

By the inductive hypothesis, m' = n' + 1 and E' = I' + 2n'. Substituting for n', m', I', E', we get m - 1 = n - 1 + 1 and E - p - 2 = I - p + 2(n - 1). This gives m = n + 1 and E = I + 2n. ■
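The proposition can also be verified mechanically on the binary decision tree from the earlier example. The Python sketch below is illustrative, not from the notes (the helper name path_lengths is invented); it builds the tree by the same midpoint rule and totals the path lengths:

def path_lengths(low, high, depth=0):
    # Returns (n, m, I, E) for the decision tree on indexes low..high;
    # an empty range contributes one external node at the current depth.
    if low > high:
        return 0, 1, 0, depth
    mid = (low + high) // 2
    nl, ml, il, el = path_lengths(low, mid - 1, depth + 1)
    nr, mr, ir, er = path_lengths(mid + 1, high, depth + 1)
    return nl + nr + 1, ml + mr, il + ir + depth, el + er

n, m, I, E = path_lengths(1, 10)
print(n, m, I, E)                  # 10 11 19 39
assert m == n + 1 and E == I + 2 * n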

Complexity analysis: We showed that h(n) = ⌈log(n+1)⌉, n ≥ 0, is the height of a binary decision tree with n internal nodes. Let k = ⌈log(n+1)⌉ = Θ(log n). Then all the internal nodes are at levels 1, 2, …, k, and the external nodes are at levels k and k+1. For the BST shown above, n = 10 and k = ⌈log 11⌉ = 4. Hence the internal nodes are at levels 1 through 4, and the external nodes are at levels 4 and 5, as is easily seen. Since a successful search ends at an internal node, its time is O(log n). Unsuccessful searches end at levels k and k+1, so their time is Θ(log n).

Now we determine the average-case complexity. Define
As(n): average number of comparisons in a successful search.
Au(n): average number of comparisons in an unsuccessful search.
Then
As(n) = 1 + I/n,
Au(n) = E/(n + 1).
We explain the first equation. If a successful search ends at an internal node at level p, 1 ≤ p ≤ k, then we require p comparisons to reach that node, while the path length of that node is p - 1. Hence we add 1 to the average internal path length, which is I/n. Now the second equation. If an unsuccessful search ends at an external node whose path length is p, then we require p comparisons to reach it (one at each internal node on the way down). The number of external nodes is n + 1, so the average number of comparisons is the average external path length E/(n + 1).

The two equations give I = n(As(n) - 1) and E = (n + 1)Au(n). Substituting into E = I + 2n, we get As(n) = (1 + 1/n)Au(n) - 1. Since all unsuccessful searches end at external nodes at level k or k + 1, we have Au(n) = Θ(log n). Hence, the above equation gives As(n) = Θ(log n). We summarize:
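As a numeric check (illustrative, not from the notes), plug in the values computed above for the 10-element tree, I = 19 and E = 39:

n, I, E = 10, 19, 39               # from path_lengths(1, 10) above
As = 1 + I / n                     # 2.9 comparisons on average (success)
Au = E / (n + 1)                   # about 3.55 comparisons (failure)
assert abs(As - ((1 + 1 / n) * Au - 1)) < 1e-12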

                      Best       Average    Worst
Successful search     Θ(1)       Θ(log n)   Θ(log n)
Unsuccessful search   Θ(log n)   Θ(log n)   Θ(log n)

