COMP2521 Sort Detective Lab Report
University of New South Wales
In this lab, the aim is to measure the performance of two sorting programs, without access to their code, and determine which sorting algorithm each program implements.
Experimental Design 1. Testing stability of both sorts. This is done by sorting on only one key of a list whose records have two keys. The sorted output should indicate whether each sorting method is stable or unstable: if the relative order of the untested key is preserved among records with equal keys, the sort is stable. A stable result is consistent with bubble sort, insertion sort, or merge sort.
2. Manually testing whether each sorting program preserves the original order of equal values. Starting with randomly generated data, we manually changed several entries to share the same sort key, then ran sortA and sortB on the result. Any change in the relative order of the tags would indicate that the program reorders the list in some way while sorting.
3. Timing sortA on sizes 10000 to 100000 in increments of 10000, using ascending, descending, and randomly generated lists.
4. Timing sortB on sizes 60000 to 100000 in increments of 10000, and 100000 to 1000000 in increments of 100000, using ascending, descending, and randomly generated lists. Graphing the results in Excel then indicates the time complexity of each program.
5. Testing the sorts on lists where every record has the same key but a different value, to expose unique characteristics of each sort.
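The two-field records used in steps 1 and 2 can be generated programmatically. Below is a minimal Python sketch; the function name `gen_records` and the `key.tag` record format are assumptions based on the listings shown later in this report, not the lab's actual generator.

```python
import random
import string

def gen_records(n, num_keys, seed=0):
    """Generate n records of the form 'key.tag' (e.g. '1.nwl'), where
    records may share a key but carry distinct random 3-letter tags,
    so a stable sort must preserve tag order within each key."""
    rng = random.Random(seed)          # fixed seed: reproducible inputs across runs
    records = []
    for _ in range(n):
        key = rng.randint(1, num_keys)
        tag = "".join(rng.choice(string.ascii_lowercase) for _ in range(3))
        records.append(f"{key}.{tag}")
    return records
```

Seeding the generator is what makes it possible to feed the identical input to both sortA and sortB across repeated timing runs.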
Testing Stability One key was sorted on, for a list whose records carry two keys. The sorted output indicates whether each sorting method is stable: if the relative order of the untested key is preserved among records with equal sort keys, the sort is stable.
Performance Analysis
In our performance analysis, we measured how each program's execution time varied with the size and initial sortedness of the input. We used the following input ranges: sortA: 10000 – 100000; sortB: 60000 – 100000 and 100000 – 1000000. For sortA the range was capped at 100000 because the sorting time became extremely long for larger sizes. Because of the way timing works on Unix/Linux, it was necessary to repeat each test multiple times: the timing command works by sampling, and produced different results for the same program run multiple times. We were able to use quite large test cases without storage overhead because (a) we had a data generator that could produce consistent inputs for multiple test runs, and (b) we had already demonstrated that the programs sort correctly, so there was no need to check the output of every run.
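The repeat-and-average approach described above can be sketched as follows. The report's actual timings were presumably taken with the Unix `time` command on the sortA/sortB binaries; this Python helper only illustrates the idea, and `time_sort` is an invented name.

```python
import time

def time_sort(sort_fn, data, repeats=3):
    """Run sort_fn on a fresh copy of data `repeats` times and return
    the mean wall-clock seconds, smoothing out timer sampling noise."""
    total = 0.0
    for _ in range(repeats):
        work = list(data)              # fresh copy: every run sees the same input
        start = time.perf_counter()
        sort_fn(work)
        total += time.perf_counter() - start
    return total / repeats
```

Copying the input before each run is what lets one generated dataset serve all the repeats without regenerating or re-reading it from disk.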
Experimental Results Performance Experiments:

Test 1:

Original List | Sort A | Sort B
---|---|---
1.nwl | 1.nwl | 1.jmo
1.dxr | 1.dxr | 1.dxr
1.jmo | 1.jmo | 1.rwb
1.ell | 1.ell | 1.nwl
1.rwb | 1.rwb | 1.ell

Test 2:

Original List | Sort A | Sort B
---|---|---
3.mqb | 3.mqb | 3.ewh
3.nuv | 3.nuv | 3.tns
3.tns | 3.tns | 3.nuv
3.kue | 3.kue | 3.mqb
3.ewh | 3.ewh | 3.kue
From the results, we can see that sortA preserves the original order of equal keys while sortB does not. This means sortB must be one of: selection sort, vanilla quicksort, median-of-three quicksort, randomised quicksort, Shell sort (powers of four), Shell sort (Sedgewick), or pseudo-bogo sort. SortA must be bubble sort, insertion sort, or merge sort, due to its stability.
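The stability check performed by eye in Tests 1 and 2 can be automated. A sketch, assuming the `key.tag` record strings used above; `is_stable_on` is an illustrative name, not part of the lab:

```python
def is_stable_on(sort_fn, records):
    """Return True if sort_fn preserves the input order of records
    sharing the same key. Records are 'key.tag' strings and sort_fn
    is expected to sort by the numeric key only."""
    key = lambda r: int(r.split(".")[0])
    out = sort_fn(records, key)
    for k in {key(r) for r in records}:
        before = [r for r in records if key(r) == k]   # input order within key k
        after = [r for r in out if key(r) == k]        # output order within key k
        if before != after:
            return False
    return True
```

Run against Python's built-in `sorted` (which is stable) the check passes; against a sort that perturbs the input first, it fails, mirroring the sortA/sortB split seen in the tables.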
SortA timings in seconds, sizes 10000 – 100000, for random, sorted (ascending) and reverse (descending) input:

Size | Random | Sorted | Reverse
---|---|---|---
10000 | 0.19 | 0.15 | 0.14
20000 | 0.61 | 0.61 | 0.57
30000 | 1.37 | 1.32 | 1.27
40000 | 2.38 | 2.38 | 2.27
50000 | 3.6 | 3.67 | 3.64
60000 | 4.84 | 4.78 | 4.77
70000 | 6.89 | 6.56 | 6.52
80000 | 8.87 | 8.82 | 8.98
90000 | 12.12 | 11.91 | 11.46
100000 | 14.98 | 14.82 | 14.11
Sort A: From the timings, sortA's running time consistently grows much faster than linearly with list length (a tenfold increase in size gives roughly an eightyfold increase in time), consistent with a quadratic sort. SortA cannot be a bubble sort with early exit, as that would make the ascending case the fastest in every test, which does not happen. At the 100,000-element size we can see about a one-second gap between the random/ascending cases and the descending case. With a merge sort, because the list is divided the same way regardless of its initial order, all three cases should take essentially the same time. However, from the results, the descending case is clearly separated from the other lists. This matches insertion sort, because with descending input the entire sorted prefix has to be shifted on every single insertion, making descending input the worst case. sortA: insertion sort
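The insertion-sort hypothesis can be sanity-checked against a reference implementation that counts element shifts. This is an illustrative sketch, not sortA's actual code (which we never saw):

```python
def insertion_sort(a):
    """In-place insertion sort; returns the number of element shifts.
    A descending input forces every element to be shifted across the
    whole sorted prefix, giving the n*(n-1)/2 worst case; an already
    ascending input needs no shifts at all."""
    shifts = 0
    for i in range(1, len(a)):
        item = a[i]
        j = i - 1
        while j >= 0 and a[j] > item:   # shift larger elements right
            a[j + 1] = a[j]
            j -= 1
            shifts += 1
        a[j + 1] = item                 # insert into its sorted position
    return shifts
```

Note that a textbook insertion sort would make the ascending case dramatically faster than random input, so the shift count is the cleanest signature to compare against measured timings.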
SortB – Random (seconds):

Size | Time | Size | Time | Size | Time
---|---|---|---|---|---
60000 | 0.05 | 100000 | 0.09 | 600000 | 0.4
70000 | 0.05 | 200000 | 0.15 | 700000 | 0.47
80000 | 0.08 | 300000 | 0.21 | 800000 | 0.61
90000 | 0.06 | 400000 | 0.34 | 900000 | 0.71
100000 | 0.1 | 500000 | 0.37 | 1000000 | 0.8

SortB – Sorted (seconds):

Size | Time | Size | Time | Size | Time
---|---|---|---|---|---
60000 | 0.05 | 100000 | 0.08 | 600000 | 0.33
70000 | 0.05 | 200000 | 0.11 | 700000 | 0.41
80000 | 0.05 | 300000 | 0.18 | 800000 | 0.43
90000 | 0.05 | 400000 | 0.26 | 900000 | 0.49
100000 | 0.1 | 500000 | 0.3 | 1000000 | 0.71

SortB – Reverse (seconds):

Size | Time | Size | Time | Size | Time
---|---|---|---|---|---
60000 | 0.06 | 100000 | 0.05 | 600000 | 0.25
70000 | 0.04 | 200000 | 0.13 | 700000 | 0.37
80000 | 0.05 | 300000 | 0.2 | 800000 | 0.43
90000 | 0.04 | 400000 | 0.24 | 900000 | 0.48
100000 | 0.05 | 500000 | 0.33 | 1000000 | 0.54
From the 100,000-size lists onwards for sortB, we can see the timings increase steadily for random, descending and ascending input alike. Ascending input has the shortest timing while random input has the longest (relatively). If sortB were a vanilla quicksort with a fixed-end pivot, descending input would have by far the highest timing: the partition step scans forward from the start for an element greater than or equal to the pivot and backwards from the end for an element less than the pivot, and on a descending list the pivot is an extreme element, so every partition scans through the whole remaining list and the partitions are maximally unbalanced, degrading the sort to quadratic time. If it were randomised quicksort, all three timings should be similar, since the pivot choice is randomised regardless of input order, but this is not the case.
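The degenerate behaviour that rules out a fixed-pivot quicksort can be reproduced with a reference quicksort that always takes the last element as pivot. This is a Lomuto-style sketch for illustration, not sortB's actual code: on a descending list every partition is maximally unbalanced, so the comparison count grows as n(n-1)/2.

```python
def quicksort_last(a, lo=0, hi=None, comps=None):
    """In-place quicksort using the last element as pivot (Lomuto
    partition); returns the number of key comparisons made. On a
    descending list the pivot is always an extreme element, so each
    partition degenerates and comparisons grow quadratically."""
    if comps is None:
        comps = [0]
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        pivot = a[hi]
        i = lo
        for j in range(lo, hi):
            comps[0] += 1
            if a[j] < pivot:            # move smaller elements left of the boundary
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]       # place pivot at its final position
        quicksort_last(a, lo, i - 1, comps)
        quicksort_last(a, i + 1, hi, comps)
    return comps[0]
```

A median-of-three or randomised pivot avoids exactly this quadratic case, which is why the near-identical sortB timings across input orders point away from the fixed-pivot variant.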
Test with the same keys:

Original List | sortB
---|---
1.asd | 1.hij
1.efg | 1.efg
1.hij | 1.klm
1.klm | 1.qrs
1.nop | 1.nop
1.qrs | 1.asd
1.tuv | 1.tuv
With both Shell sorts, because all the keys are equal, the first pass of the sort should complete without moving anything and leave the original list unchanged (acting as stable). And across multiple runs, the results above were consistent, meaning the sort is not randomised. This leaves the median-of-three quicksort, which makes sense: the pivot would be the middle element (klm), and the first element of the output should be (hij) from the first forward search for a key greater than or equal to the pivot. And as the bottom two positions hold (asd, tuv), which were the two ends of the original list, this is expected from quicksort, as these elements are among the last to be placed.
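The all-equal-keys observation matches a standard median-of-three pivot rule, which can be sketched as below. This is an illustrative implementation of the textbook rule, not sortB's actual code:

```python
def median_of_three(a, lo, hi):
    """Return the index of the median of a[lo], a[mid], a[hi].
    When all three keys are equal (as in the same-key test), the
    middle element is chosen as pivot, matching the 'klm' behaviour."""
    mid = (lo + hi) // 2
    x, y, z = a[lo], a[mid], a[hi]
    if x <= y <= z or z <= y <= x:
        return mid
    if y <= x <= z or z <= x <= y:
        return lo
    return hi
```

Because the rule is deterministic, repeated runs on the same input give identical output, which is consistent with the stable repeated results observed above and inconsistent with a randomised pivot.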
Conclusions On the basis of our experiments and our analysis above, we believe that:
sortA implements the insertion sort algorithm
sortB implements the median-of-three quicksort algorithm
Appendix
SortA-Random 10000 - 100000
SortB-Random 100000 – 1000000...