COMP2521 Sort Detective Lab Report
University of New South Wales
In this lab, the aim is to measure the performance of two sorting programs, without access to their code, and determine which sorting algorithm each program implements.
Experimental Design 1. Testing stability of both sorts. This is done by sorting on only one key of a list whose records have two keys. The sorted output should indicate whether each sorting method is stable or unstable: if the relative order of the untested key is preserved among records with equal keys, the sort is stable. A stable result is consistent with bubble sort, insertion sort, or merge sort.
2. Manually testing whether each sorting program preserves the original order of equal values. Starting with randomly generated data, we manually changed several entries to share the same sort key, then ran sortA and sortB on the result. Any change in the relative order of the tags would indicate that the program reorders the list in some way while sorting.
3. Timing sortA on sizes 10000 to 100000 in increments of 10000, using ascending, descending, and randomly generated lists.
4. Timing sortB on sizes 60000 to 100000 in increments of 10000, and 100000 to 1000000 in increments of 100000, using ascending, descending, and randomly generated lists. Graphing the results in Excel then indicates the time complexity of each program.
5. Testing the sorts on lists where every record has the same key but a different value, to expose unique characteristics of each sort.
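The two-field records used in steps 1 and 2 can be generated programmatically. Below is a minimal Python sketch; the function name `gen_records` and the `key.tag` record format are assumptions based on the listings shown later in this report, not the lab's actual generator.

```python
import random
import string

def gen_records(n, num_keys, seed=0):
    """Generate n records of the form 'key.tag' (e.g. '1.nwl'), where
    records may share a key but carry distinct random 3-letter tags,
    so a stable sort must preserve tag order within each key."""
    rng = random.Random(seed)          # fixed seed: reproducible inputs across runs
    records = []
    for _ in range(n):
        key = rng.randint(1, num_keys)
        tag = "".join(rng.choice(string.ascii_lowercase) for _ in range(3))
        records.append(f"{key}.{tag}")
    return records
```

Seeding the generator is what makes it possible to feed the identical input to both sortA and sortB across repeated timing runs.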
Testing Stability One key was sorted on, for a list whose records carry two keys. The sorted output indicates whether each sorting method is stable: if the relative order of the untested key is preserved among records with equal sort keys, the sort is stable.
Performance Analysis
In our performance analysis, we measured how each program's execution time varied with the size and initial sortedness of the input. We used the following input ranges: sortA: 10000 – 100000; sortB: 60000 – 100000 and 100000 – 1000000. For sortA the range was capped at 100000 because the sorting time became extremely long for larger sizes. Because of the way timing works on Unix/Linux, it was necessary to repeat each test multiple times: the timing command works by sampling, and produced different results for the same program run multiple times. We were able to use quite large test cases without storage overhead because (a) we had a data generator that could produce consistent inputs for multiple test runs, and (b) we had already demonstrated that the programs sort correctly, so there was no need to check the output of every run.
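The repeat-and-average approach described above can be sketched as follows. The report's actual timings were presumably taken with the Unix `time` command on the sortA/sortB binaries; this Python helper only illustrates the idea, and `time_sort` is an invented name.

```python
import time

def time_sort(sort_fn, data, repeats=3):
    """Run sort_fn on a fresh copy of data `repeats` times and return
    the mean wall-clock seconds, smoothing out timer sampling noise."""
    total = 0.0
    for _ in range(repeats):
        work = list(data)              # fresh copy: every run sees the same input
        start = time.perf_counter()
        sort_fn(work)
        total += time.perf_counter() - start
    return total / repeats
```

Copying the input before each run is what lets one generated dataset serve all the repeats without regenerating or re-reading it from disk.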
Experimental Results Performance Experiments:

Test 1:

Original List | Sort A | Sort B
---|---|---
1.nwl | 1.nwl | 1.jmo
1.dxr | 1.dxr | 1.dxr
1.jmo | 1.jmo | 1.rwb
1.ell | 1.ell | 1.nwl
1.rwb | 1.rwb | 1.ell

Test 2:

Original List | Sort A | Sort B
---|---|---
3.mqb | 3.mqb | 3.ewh
3.nuv | 3.nuv | 3.tns
3.tns | 3.tns | 3.nuv
3.kue | 3.kue | 3.mqb
3.ewh | 3.ewh | 3.kue
From the results, we can see that sortA preserves the original order of equal keys while sortB does not. This means sortB must be one of: selection sort, vanilla quicksort, median-of-three quicksort, randomised quicksort, Shell sort (powers of four), Shell sort (Sedgewick), or pseudo-bogo sort. SortA must be bubble sort, insertion sort, or merge sort, due to its stability.
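The stability check performed by eye in Tests 1 and 2 can be automated. A sketch, assuming the `key.tag` record strings used above; `is_stable_on` is an illustrative name, not part of the lab:

```python
def is_stable_on(sort_fn, records):
    """Return True if sort_fn preserves the input order of records
    sharing the same key. Records are 'key.tag' strings and sort_fn
    is expected to sort by the numeric key only."""
    key = lambda r: int(r.split(".")[0])
    out = sort_fn(records, key)
    for k in {key(r) for r in records}:
        before = [r for r in records if key(r) == k]   # input order within key k
        after = [r for r in out if key(r) == k]        # output order within key k
        if before != after:
            return False
    return True
```

Run against Python's built-in `sorted` (which is stable) the check passes; against a sort that perturbs the input first, it fails, mirroring the sortA/sortB split seen in the tables.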
SortA timings in seconds, sizes 10000 – 100000, for random, sorted (ascending) and reverse (descending) input:

Size | Random | Sorted | Reverse
---|---|---|---
10000 | 0.19 | 0.15 | 0.14
20000 | 0.61 | 0.61 | 0.57
30000 | 1.37 | 1.32 | 1.27
40000 | 2.38 | 2.38 | 2.27
50000 | 3.6 | 3.67 | 3.64
60000 | 4.84 | 4.78 | 4.77
70000 | 6.89 | 6.56 | 6.52
80000 | 8.87 | 8.82 | 8.98
90000 | 12.12 | 11.91 | 11.46
100000 | 14.98 | 14.82 | 14.11
Sort A: From the timings, sortA's running time consistently grows much faster than linearly with list length (a tenfold increase in size gives roughly an eightyfold increase in time), consistent with a quadratic sort. SortA cannot be a bubble sort with early exit, as that would make the ascending case the fastest in every test, which does not happen. At the 100,000-element size we can see about a one-second gap between the random/ascending cases and the descending case. With a merge sort, because the list is divided the same way regardless of its initial order, all three cases should take essentially the same time. However, from the results, the descending case is clearly separated from the other lists. This matches insertion sort, because with descending input the entire sorted prefix has to be shifted on every single insertion, making descending input the worst case. sortA: insertion sort
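The insertion-sort hypothesis can be sanity-checked against a reference implementation that counts element shifts. This is an illustrative sketch, not sortA's actual code (which we never saw):

```python
def insertion_sort(a):
    """In-place insertion sort; returns the number of element shifts.
    A descending input forces every element to be shifted across the
    whole sorted prefix, giving the n*(n-1)/2 worst case; an already
    ascending input needs no shifts at all."""
    shifts = 0
    for i in range(1, len(a)):
        item = a[i]
        j = i - 1
        while j >= 0 and a[j] > item:   # shift larger elements right
            a[j + 1] = a[j]
            j -= 1
            shifts += 1
        a[j + 1] = item                 # insert into its sorted position
    return shifts
```

Note that a textbook insertion sort would make the ascending case dramatically faster than random input, so the shift count is the cleanest signature to compare against measured timings.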
SortB – Random (seconds):

Size | Time | Size | Time | Size | Time
---|---|---|---|---|---
60000 | 0.05 | 100000 | 0.09 | 600000 | 0.4
70000 | 0.05 | 200000 | 0.15 | 700000 | 0.47
80000 | 0.08 | 300000 | 0.21 | 800000 | 0.61
90000 | 0.06 | 400000 | 0.34 | 900000 | 0.71
100000 | 0.1 | 500000 | 0.37 | 1000000 | 0.8

SortB – Sorted (seconds):

Size | Time | Size | Time | Size | Time
---|---|---|---|---|---
60000 | 0.05 | 100000 | 0.08 | 600000 | 0.33
70000 | 0.05 | 200000 | 0.11 | 700000 | 0.41
80000 | 0.05 | 300000 | 0.18 | 800000 | 0.43
90000 | 0.05 | 400000 | 0.26 | 900000 | 0.49
100000 | 0.1 | 500000 | 0.3 | 1000000 | 0.71

SortB – Reverse (seconds):

Size | Time | Size | Time | Size | Time
---|---|---|---|---|---
60000 | 0.06 | 100000 | 0.05 | 600000 | 0.25
70000 | 0.04 | 200000 | 0.13 | 700000 | 0.37
80000 | 0.05 | 300000 | 0.2 | 800000 | 0.43
90000 | 0.04 | 400000 | 0.24 | 900000 | 0.48
100000 | 0.05 | 500000 | 0.33 | 1000000 | 0.54
From the 100,000-size lists onwards for sortB, we can see the timings increase steadily for random, descending and ascending input alike. Ascending input has the shortest timing while random input has the longest (relatively). If sortB were a vanilla quicksort with a fixed-end pivot, descending input would have by far the highest timing: the partition step scans forward from the start for an element greater than or equal to the pivot and backwards from the end for an element less than the pivot, and on a descending list the pivot is an extreme element, so every partition scans through the whole remaining list and the partitions are maximally unbalanced, degrading the sort to quadratic time. If it were randomised quicksort, all three timings should be similar, since the pivot choice is randomised regardless of input order, but this is not the case.
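The degenerate behaviour that rules out a fixed-pivot quicksort can be reproduced with a reference quicksort that always takes the last element as pivot. This is a Lomuto-style sketch for illustration, not sortB's actual code: on a descending list every partition is maximally unbalanced, so the comparison count grows as n(n-1)/2.

```python
def quicksort_last(a, lo=0, hi=None, comps=None):
    """In-place quicksort using the last element as pivot (Lomuto
    partition); returns the number of key comparisons made. On a
    descending list the pivot is always an extreme element, so each
    partition degenerates and comparisons grow quadratically."""
    if comps is None:
        comps = [0]
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        pivot = a[hi]
        i = lo
        for j in range(lo, hi):
            comps[0] += 1
            if a[j] < pivot:            # move smaller elements left of the boundary
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]       # place pivot at its final position
        quicksort_last(a, lo, i - 1, comps)
        quicksort_last(a, i + 1, hi, comps)
    return comps[0]
```

A median-of-three or randomised pivot avoids exactly this quadratic case, which is why the near-identical sortB timings across input orders point away from the fixed-pivot variant.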
Test with the same keys:

Original List | sortB
---|---
1.asd | 1.hij
1.efg | 1.efg
1.hij | 1.klm
1.klm | 1.qrs
1.nop | 1.nop
1.qrs | 1.asd
1.tuv | 1.tuv
With both Shell sorts, because all the keys are equal, the first pass of the sort should complete without moving anything and leave the original list unchanged (acting as stable). And across multiple runs, the results above were consistent, meaning the sort is not randomised. This leaves the median-of-three quicksort, which makes sense: the pivot would be the middle element (klm), and the first element of the output should be (hij) from the first forward search for a key greater than or equal to the pivot. And as the bottom two positions hold (asd, tuv), which were the two ends of the original list, this is expected from quicksort, as these elements are among the last to be placed.
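The all-equal-keys observation matches a standard median-of-three pivot rule, which can be sketched as below. This is an illustrative implementation of the textbook rule, not sortB's actual code:

```python
def median_of_three(a, lo, hi):
    """Return the index of the median of a[lo], a[mid], a[hi].
    When all three keys are equal (as in the same-key test), the
    middle element is chosen as pivot, matching the 'klm' behaviour."""
    mid = (lo + hi) // 2
    x, y, z = a[lo], a[mid], a[hi]
    if x <= y <= z or z <= y <= x:
        return mid
    if y <= x <= z or z <= x <= y:
        return lo
    return hi
```

Because the rule is deterministic, repeated runs on the same input give identical output, which is consistent with the stable repeated results observed above and inconsistent with a randomised pivot.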
Conclusions On the basis of our experiments and our analysis above, we believe that:
sortA implements the insertion sort algorithm
sortB implements the median-of-three quicksort algorithm
Appendix
SortA-Random 10000 - 100000
SortB-Random 100000 – 1000000...