COMP2521 21T2 - Lecture Notes 1-10
Course: Data Structures & Algorithms
Institution: University of New South Wales

Summary

Summary notes from all lectures...



COMP2521 - Week 1, Lecture 2

Key Concepts
● Revision of recursion
● Brief performance analysis
● Makefiles

Recursion
● A programming pattern where a function calls itself.
● Happens during some kind of "looping".
● Generally doesn't make a program run faster, but it can help to simplify otherwise complex logic.
○ In fact, recursion can make a program slower: each call adds function-call overhead and stack space, and a naive recursive algorithm can spawn an exponentially growing number of calls.

Base Case
● A special case that stops a recursive function from repeating indefinitely.

"-Wall", "-Werror" and "-O" (Optimiser Flag) in GCC
● -Wall - Show all warnings in the code
● -Werror - Convert all warnings into errors
● -O - The compiler optimises the code where it is able to

Analysing a "Sum" Program (Performance Analysis)
● Can we run it and plot the results?
● How do we describe how long this program takes?
● How does the time taken change with the size of the input?
● How much memory does this program use when running?
● How does the memory usage change with the size of the input?
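As a minimal sketch of recursion with a base case (illustrative, not from the slides) - a recursive sum of 1..n, which could be compiled with the flags above, e.g. gcc -Wall -Werror -O sum.c:

    #include <stdio.h>

    // Recursively sum the integers 1..n.
    int sum(int n) {
        if (n == 0) return 0;    // base case: stops the recursion
        return n + sum(n - 1);   // recursive case: the function calls itself
    }

    int main(void) {
        printf("%d\n", sum(10)); // prints 55
        return 0;
    }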

COMP2521 - Week 2, Lecture 1

Key Concepts:
● Bubble sort
● Pseudocode
● Theoretical performance analysis
○ Time complexity

Bubble Sort
● Moves from left to right and compares pairs of adjacent numbers, rearranging them if necessary.
○ This continues, pass after pass, until the numbers are sorted.
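A C sketch of bubble sort (the early-exit flag is a common refinement, not necessarily the lecture's exact version):

    #include <stdbool.h>

    // Repeatedly compare adjacent pairs, swapping them when out of
    // order; stop once a full pass makes no swaps.
    void bubbleSort(int a[], int n) {
        for (int i = 0; i < n - 1; i++) {
            bool swapped = false;
            for (int j = 0; j < n - 1 - i; j++) {
                if (a[j] > a[j + 1]) {      // compare a pair of numbers
                    int tmp = a[j];         // ...and rearrange if necessary
                    a[j] = a[j + 1];
                    a[j + 1] = tmp;
                    swapped = true;
                }
            }
            if (!swapped) break;            // already sorted: stop early
        }
    }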

Time
● The 'time' command is used to measure the time taken for a program to execute.
○ It outputs 3 separate times:
■ real - Wall-clock time elapsed from start to finish (often close to user + sys for CPU-bound programs)
■ user - CPU time spent executing the program's own (user-mode) code
■ sys - CPU time spent in the OS (kernel) on the program's behalf

Issues with Literal Program Run-Time (Analysing Algorithms)
● When trying to understand general program behaviour, the "time" command has some issues, such as:
○ It requires an implementation of the algorithm
○ Running time is influenced by the specific machine
○ You need to use the same machine and circumstances to compare different algorithms

Pseudocode
● A method of describing the behaviour of a program without needing to worry about the language it's written in or the machine it runs on.
○ More structured than English
○ Less structured than code
● Pseudocode allows the user to explain their logic more easily while not being constrained by the structure of a programming language.

Theoretical Performance Analysis
● This is where we use the pseudocode of a program to evaluate its complexity regardless of the hardware or software environment.
● For time complexity, this is done by counting how many "operations" occur in a program.
○ This complexity is determined by the algorithm itself, not by the machine it runs on.

Analysing a Simple Program
● Do an analysis of each line.
○ We can generally assume that a line has a single unit cost of work.
● When you enter a loop, multiply the cost of each line inside it by the number of times it is looped.
● For some algorithms you can analyse best-case and worst-case input separately.
● Space complexity refers to the number of bytes of memory required to run the program.
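For example, a line-by-line count for a simple summing loop might look like this (illustrative costs, not from the slides):

    // Counting 'unit cost' operations for a simple array sum.
    // Assume each primitive operation costs 1 step.
    int arraySum(int a[], int n) {
        int total = 0;                  // 1 (assignment)
        for (int i = 0; i < n; i++) {   // 1 init + (n+1) comparisons + n increments
            total += a[i];              // n times: 1 index + 1 add/assign
        }
        return total;                   // 1
    }                                   // total work ~ 4n + 4  =>  O(n)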

Primitive Operations
● We assume that primitive operations are "constant time" (1 step of processing).
○ Even though in reality these may be one step or a few steps, it mostly doesn't matter.
● Primitive operations include:
○ Indexing an array (getting or setting)
○ Comparisons (equality, relational)
○ Variable assignment

Big 'O' Notation
● We can simplify the amount of time or space that a program needs by:
○ Removing any constant factors in front of terms
○ Removing any lower-order terms
● This produces a program's Big-O complexity (e.g. O(n^2)).
○ E.g. 3n^2 + 2n + 3 = O(n^2)
● Big-O essentially focuses on the most dominant term in your polynomial, as constant factors and lower-order terms become irrelevant at scale.

Analysing a "Search" Program
● A typical 'linear' search loops through all the numbers from STDIN and prints whether the desired number was found or not.
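A minimal C sketch of such a linear search over an array (reading from STDIN omitted for brevity):

    // Scan every element until the target is found: worst case O(n).
    int linearSearch(int a[], int n, int target) {
        for (int i = 0; i < n; i++) {
            if (a[i] == target) return i;   // found: return its index
        }
        return -1;                          // not found
    }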



A 'binary' search keeps halving the interval of numbers being searched until the desired number is found (this requires the numbers to be sorted).
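A C sketch of binary search over a sorted array (illustrative, not the lecture's exact code):

    // Repeatedly halve the interval [lo, hi] that could contain the
    // target. Requires a sorted array: O(log n) comparisons.
    int binarySearch(int a[], int n, int target) {
        int lo = 0, hi = n - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;       // midpoint (avoids overflow)
            if (a[mid] == target) return mid;   // found
            if (a[mid] < target) lo = mid + 1;  // discard the lower half
            else                 hi = mid - 1;  // discard the upper half
        }
        return -1;                              // not found
    }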

Amortisation
● The process of doing some work 'up front' to reduce the work later on.
● For searching, a binary search is much faster once the numbers in the list are sorted, whereas a linear search is the only option when the numbers are in random order.
○ Because of this, we can sort a list of numbers once before using binary search, which saves more time in the long run over many searches.

COMP2521 - Week 2, Lecture 2

Key Concepts
● Abstraction
● ADTs
○ Creation
○ Implementation
○ Time/space complexities

Abstraction
● Hiding the details of how a system is built in favour of focusing on its high-level behaviours, or inputs and outputs.
● Examples:
○ Web browsers abstract away the underlying hardware that they're run on
○ C abstracts away assembly/MIPS code

Abstract Data Types (ADTs)
● An ADT is a description of a data type that focuses on its high-level behaviour, without regard for how it is implemented underneath:
○ There is a separation of interface from implementation
○ Users of the ADT only see the interface
○ Builders of the ADT provide an implementation
○ The interface allows people to agree at the start, and then work separately
● Advantages of using an ADT include:
○ It allows future iterations to replace or upgrade the underlying data structure
○ It allows things like lists to have a more intelligent implementation underneath

Programming by Contract
● When defining an interface, certain information needs to be included, such as:
○ Pre-conditions: what conditions hold at the start of the function call
○ Post-conditions: what conditions will hold at the end of the function
● Add these via comments.
● We can also 'sanity check' them with asserts.

Creating an ADT
1. Determine the interface of your ADT in a .h file.
2. The "developer" builds a concrete implementation of the ADT in a .c file.
3. The "user" uses the ADT in their program.
a. They have to compile with it, even though they might not understand how it is built underneath.

ADT Example: Sets
● What's a Set data type? A collection of unique integer values (unordered, no duplicates).
● We need to figure out the behaviour of the ADT (its interface) first. Some of these behaviours include creating a set, inserting an item, deleting an item, and testing membership.



We can then begin to write this as C code in a .h file:
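The slide's actual header file isn't preserved here; below is a minimal sketch of what such an interface might look like (the function names are illustrative, not necessarily the course's exact ones):

    // Set.h: the interface - all the users of the ADT ever see
    #ifndef SET_H            // header guard prevents re-definition
    #define SET_H

    #include <stdbool.h>

    typedef struct set *Set;           // opaque pointer: the struct is hidden

    Set  SetNew(void);                 // create a new, empty set
    void SetFree(Set s);               // free all memory used by the set
    void SetInsert(Set s, int item);   // add item (no effect if already present)
    void SetDelete(Set s, int item);   // remove item if present
    bool SetContains(Set s, int item); // membership test
    int  SetSize(Set s);               // number of items in the set

    #endif

Note how every function except SetNew takes the Set pointer as its first argument, and how pre/post-conditions could be documented in comments beside each prototype.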

Key Principles of ADTs in C
● The "Set" (or equivalent) is usually a pointer of some sort.
● That pointer is usually the first argument to every ADT function, with the exception of the create function (as in the interface sketch above).
● When we write .h files, we use header guards to prevent re-definition.
● We can also put pre/post-conditions in our ADT to help both developers and users manage expectations.

Set Usage
● We write our "main" file, and compile it with the set library that the ADT developer has implemented.
● While we need their .c file to build with, we don't need to look at it or make sense of it, because we have the ADT interface (i.e. the .h file).

Set Implementation
● Since the .h file only provides the prototypes for our ADT, we need to implement the actual functionality of those functions in our .c file.

Examples of Set Implementations

Unsorted Array
● We need to do upper and lower bounds checks because there will be a limit on the size of the set.



Since only a linear search works on an unsorted array (binary search requires sorted data), we need to implement the set with this in mind, as in the sketch below.
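A sketch of an unsorted-array implementation (the struct layout and names are assumed, matching the interface sketched earlier):

    #include "Set.h"   // the interface sketched earlier

    #define MAX_ELEMS 1000

    struct set {
        int elems[MAX_ELEMS];   // items in no particular order
        int nelems;
    };

    // Membership test: linear search, O(n)
    bool SetContains(Set s, int item) {
        for (int i = 0; i < s->nelems; i++)
            if (s->elems[i] == item) return true;
        return false;
    }

    // Insertion: O(1) append, after a bounds check and duplicate check
    void SetInsert(Set s, int item) {
        if (s->nelems == MAX_ELEMS) return;   // upper bounds check: set full
        if (SetContains(s, item)) return;     // sets hold no duplicates
        s->elems[s->nelems++] = item;
    }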

● Time/space complexities: (summarised in a table in the slides)

Sorted Array
● This is similar to an unsorted array, with some differences:
○ Membership test -> we can use binary search, which is faster
○ Insertion -> we can use binary search to find the position, then shift elements up and insert
○ Deletion -> we can use binary search to find the item, then shift elements down
● Time and space complexities: (summarised in a table in the slides)
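A sketch of the sorted-array approach, using the same struct set and MAX_ELEMS as the unsorted-array sketch but keeping the elements in ascending order:

    #include "Set.h"

    // Membership test via binary search: O(log n)
    bool SetContains(Set s, int item) {
        int lo = 0, hi = s->nelems - 1;
        while (lo <= hi) {
            int mid = (lo + hi) / 2;
            if (s->elems[mid] == item) return true;
            if (s->elems[mid] < item) lo = mid + 1;
            else hi = mid - 1;
        }
        return false;
    }

    // Insertion: find the position, then shift larger items up: O(n)
    void SetInsert(Set s, int item) {
        if (s->nelems == MAX_ELEMS || SetContains(s, item)) return;
        int i = s->nelems;
        while (i > 0 && s->elems[i - 1] > item) {
            s->elems[i] = s->elems[i - 1];   // shift up to make a gap
            i--;
        }
        s->elems[i] = item;                  // insert into the gap
        s->nelems++;
    }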

Linked List
● Using our knowledge of linked lists, we can loop through the list using pointers, and then re-arrange those pointers when items need to be inserted or deleted.



Implementation:
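The slide's code isn't preserved here; a sketch of linked-list insertion (names assumed, list kept in ascending order) might look like:

    #include <stdlib.h>
    #include "Set.h"   // the interface sketched earlier

    struct node {
        int value;
        struct node *next;
    };

    struct set {
        struct node *head;
    };

    // Walk the list with pointers, then re-arrange those pointers
    // to splice the new node in.
    void SetInsert(Set s, int item) {
        struct node *prev = NULL, *curr = s->head;
        while (curr != NULL && curr->value < item) {   // find insertion point
            prev = curr;
            curr = curr->next;
        }
        if (curr != NULL && curr->value == item) return;  // no duplicates

        struct node *n = malloc(sizeof(struct node));
        n->value = item;
        n->next = curr;
        if (prev == NULL) s->head = n;   // inserting at the front
        else prev->next = n;             // splicing in mid-list
    }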




Note: Pointer structures (i.e. linked lists) are slow to search through but quick to modify, whereas arrays are slow to modify but quick to search through.

Issues with Direct Access
● If we try to access elements of the implementation directly (from outside the .c file), we will receive a "dereferencing pointer to incomplete type" error, because the struct's definition is hidden behind the opaque pointer in the .h file.

COMP2521 - Week 3, Lecture 1

Key Concepts
● Tree data types
○ BST rules/operations
○ Time complexity
● Macros

Macros
● We can use macros to abstract repeated code and make it easier to read.
○ We use #define to accomplish this and make code more readable.

    // a Node contains its data, plus left and right subtrees
    typedef struct Node {
        int data;
        Tree left, right;
    } Node;

    // some macros that we will use frequently
    #define data(node)  ((node)->data)
    #define left(node)  ((node)->left)
    #define right(node) ((node)->right)

Data Structure: "Trees"
● These are connected graphs that:
○ Have edges (lines) and nodes (circles)
○ Have no cycles (you can't get caught in loops)
○ Have a particular data value in each node
○ Have each node linking to up to k children

Binary Search Trees (BSTs)
● These are able to be searched with a binary search and are easy to maintain/modify.
● Why BSTs?
○ Binary search is a necessary algorithm for large sets of numbers (much quicker than linear search).
○ Maintaining an ordered array is hard, but binary searching an ordered array is very fast.
■ Easy to search, hard to insert.
■ I.e. to display an item, you can do printf("%d", array[52]);
○ Maintaining an ordered linked list is easy (direct insertion), but binary searching an ordered linked list is hard (you will always have to traverse linearly).
■ Easy to insert, hard to search.
■ You can't access specific nodes quickly.
● Trees help to solve this issue by borrowing the best elements from both.

BST Structure
● These trees are either:
○ Empty; or
○ A node with two subtrees:
■ The node contains a value
■ The left and right subtrees are also binary trees (recursive)
● The first node of a tree is called the root node.
● Level of a node: path length from the root to that node.
● Height/depth of tree: maximum path length from the root to a leaf.

BST Rules
● BSTs are ordered trees, meaning:
○ Each node is the root of 2 subtrees (which are potentially NULL).
○ All values in the left subtree are less than the root.
○ All values in the right subtree are greater than the root.
● These rules apply for all nodes in the tree.

Balanced BST
● A tree is height-balanced when, for every node in the tree, the heights of the left and right subtrees differ by at most 1 (so the nodes are split roughly equally between the two sides).

BST Operations
● insert(Tree, item)
● delete(Tree, item)
● search(Tree, item)
● print(Tree)
● (create + delete)
● Note: All these operations ensure that a BST stays ordered, but not balanced.

BST Insertion
● E.g. inserting [3, 2, 4, 5, 1] results in the following (insertion does not always guarantee a balanced tree):



This operation uses the following algorithm:

    TreeInsert(tree, item):
        if tree is empty:
            return new node containing item
        else if item < tree's value:
            tree's left child = TreeInsert(tree's left child, item)
        else if item > tree's value:
            tree's right child = TreeInsert(tree's right child, item)
        return tree   (equal items fall through unchanged, avoiding duplicates)
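A C version of this pseudocode, using the Node struct and data/left/right macros from earlier (assuming typedef struct Node *Tree; the newNode helper is an assumed name):

    #include <stdlib.h>

    // Assumed helper: allocate a new leaf node containing item.
    static Tree newNode(int item) {
        Tree t = malloc(sizeof(struct Node));
        data(t) = item;
        left(t) = right(t) = NULL;
        return t;
    }

    // Returns the (possibly new) root of the tree after insertion.
    Tree TreeInsert(Tree t, int item) {
        if (t == NULL) return newNode(item);   // empty: item becomes a new node
        if (item < data(t))
            left(t) = TreeInsert(left(t), item);
        else if (item > data(t))
            right(t) = TreeInsert(right(t), item);
        return t;                              // equal items are ignored (no duplicates)
    }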

Time Complexity
● BST insertion is typically O(h), where h is the height of the BST.
○ For a balanced tree, O(h) = O(log2(n)).
● In general, the time complexity is simply the time it takes to traverse down to the place where the node needs to be inserted.

Traversal
● Searching:
○ Best case O(1) - what you're looking for is at the root
○ Worst case O(h) - where h is the height of the BST
● Printing:
○ Always O(n), where n is the number of nodes in the tree

Join
● Typically O(m), where m is the height of the right subtree (by convention, n is used for the left subtree and m for the right subtree in these analyses).

Delete
● Typically O(h), where h is the height of the BST.
● In general, the time complexity is the time it takes to traverse down to the node that needs to be deleted.

BST Traversal
● Preorder: visit the root, then the left subtree, then the right subtree.
○ "Printing the data immediately when we explore a node, then searching the left and right children."
○ "Shows you the literal order in which the nodes are explored."
● Inorder: visit the left subtree, then the root, then the right subtree.
○ "We hit the node, explore the entire left subtree, then print the node. We then explore the entire right subtree."
○ "Useful when we want to extract the values in order."
● Postorder: visit the left subtree, then the right subtree, then the root.
○ "We explore both the left and right subtrees, then print the node at the end."
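The three traversal orders as C functions, using the macros from earlier (a sketch, not the course's exact code):

    #include <stdio.h>

    void preorder(Tree t) {
        if (t == NULL) return;
        printf("%d ", data(t));   // visit the root first...
        preorder(left(t));        // ...then the left subtree
        preorder(right(t));       // ...then the right subtree
    }

    void inorder(Tree t) {        // prints a BST's values in sorted order
        if (t == NULL) return;
        inorder(left(t));
        printf("%d ", data(t));
        inorder(right(t));
    }

    void postorder(Tree t) {      // the root is visited last
        if (t == NULL) return;
        postorder(left(t));
        postorder(right(t));
        printf("%d ", data(t));
    }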

BST Join
● We can join two BSTs, returning a single tree that contains all items, correctly ordered.
○ Join does not guarantee to maintain a balanced tree.
● Method:
○ Find the minimum node in the right tree (t2)
○ Replace the minimum node with its right subtree
○ Elevate the minimum node to be the new root of both trees

Explanation
● You take the minimum node of the right tree and make it the new root of both trees.
● Its right subtree is promoted to fill the minimum node's original position.

Pseudocode

    TreeJoin(tree1, tree2):
        if tree1 is empty: return tree2
        if tree2 is empty: return tree1
        current = tree2
        parent = NULL
        while current's left child is not empty:
            parent = current
            current = current's left child
        if parent is not NULL:
            parent's left child = current's right child
            current's right child = tree2
        current's left child = tree1
        return current (the new root)

BST Delete

Case-1: The item is not in the tree - there is nothing to delete.

Case-2: The node has 0 children (a leaf) - simply remove it.

Case-3: The node has 1 child - replace the node with its only subtree.

Case-4: The node has 2 children - use the join operation to fix the previously separated subtrees.

Case-4 (Method 2): Alternatively, replace the deleted node with its successor (the minimum node of the right subtree).

Pseudocode

    TreeDelete(t, item):
        if t is not empty:
            if item < data(t):
                left(t) = TreeDelete(left(t), item)
            else if item > data(t):
                right(t) = TreeDelete(right(t), item)
            else:
                if left(t) and right(t) are empty:   // 0 children
                    new = empty tree
                else if left(t) is empty:            // 1 child (right only)
                    new = right(t)
                else if right(t) is empty:           // 1 child (left only)
                    new = left(t)
                else:                                // 2 children
                    new = TreeJoin(left(t), right(t))
                free memory allocated for t
                t = new
        return t

Freeing BSTs ● Uses a postorder traversal method.
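A C sketch of this, using the macros from earlier: both subtrees must be freed before the node itself (postorder), otherwise we would lose our pointers to them.

    #include <stdlib.h>

    void freeTree(Tree t) {
        if (t == NULL) return;
        freeTree(left(t));    // free the left subtree
        freeTree(right(t));   // free the right subtree
        free(t);              // free the root last
    }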


COMP2521 - Week 3, Lecture 2

Key Concepts
● Balanced BSTs

● From the definition above: the difference between the number of nodes in the left subtree and the number of nodes in the right subtree is less than 2, for all nodes in the tree - this is a weight-balanced tree.
● A weight-balanced tree means the left and right subtrees hold (almost) the same number of nodes:
○ "Less than 2" rather than exactly equal, because the tree will inevitably be unbalanced to some degree (it can never be perfectly symmetrical), so there is always up to 1 node of difference.

Strategies to Balance BSTs

Rotations

Right rotation:
● Consider the root of the tree to be the pivot point of the rotation. Then bring the surrounding nodes up and around the pivot to complete the rotation.



● The red arrows in the diagram show the link changes during the rotation.
● During a rotation, the only things that are affected link-wise are the root's left child and the left child's right child.
● When re-establishing links, keep in mind that the left subtree is always less than the right subtree.

Left rotation example:



Also note that rotating right and then left does not necessarily recreate the original tree, as each operation may be rotating around different nodes.

Another Subtree Rotation Example

Tree Rotation Algorithms

    rotateRight(root):
        newRoot = root's left child
        if root is empty or newRoot is empty:
            return root
        root's left child = newRoot's right child
        newRoot's right child = root
        return newRoot

    rotateLeft(root):
        newRoot = root's right child
        if root is empty or newRoot is empty:
            return root
        root's right child = newRoot's left child
        newRoot's left child = root
        return newRoot

Time Complexity for Rotations
● For a single rotation, the time complexity is 'cheap':
○ Cheap cost: O(1)
○ A rotation requires only simple, localised pointer re-arrangement
● However, realistically, rotation is usually applied multiple times, from leaf to root, along one branch:
○ The cost of this is O(height)
○ The payoff is improved balance, which reduces the height
○ This pushes search cost towards O(log n)

Methods of Using Rotations to Balance Trees

Insertion at Root
● Approach: insert new items at the root.
● Method: insert the new node as a leaf (at the bottom of the tree) normally, then lift the new node to the top by rotating (usually rotating multiple times). (A right rotation brings the left child up, and vice versa.)
● Benefit(s): recently-inserted items are close to the root, so there is a lower time cost if recent items are more likely to be searched for.



● Disadvantage(s): requires large-scale re-arrangement of the tree.

Example (Think about how to rotate the leaf’s parent in a way that will elevate the leaf):

Pseudocode for Insertion at Root
● Works recursively: insert the item as a leaf, then use rotations on the way back up so that the new leaf becomes the root of the tree.

    insertAtRoot(tree, item):
        root = the root node of "tree"
        if tree is empty:
            tree = new node containing item
        else if item < root's value:
            tree's left child = insertAtRoot(tree's left child, item)
            tree = rotateRight(tree)
        else if item > root's value:
            tree's right child = insertAtRoot(tree's right child, item)
            tree = rotateLeft(tree)
        return tree

Time Complexity for Insertion at Root
● The complexity is typically O(h), where h is the height of the tree.

○ In reality though, the cost is roughly double, as you have to traverse down and then rotate all the way back up.
● Insertion at root has a higher tendency to produce a balanced tree, but balance is not guaranteed.
● For applications where the same element is looked up many times, we can instead move a node up to the root only when it is found during a search.
○ This makes lookups of recently accessed items very fast.

Tree Partitioning
● Re-arranges the tree so that the element with index i becomes the new root.
○ Indexes are based on in-order position (smallest to largest, left to right).
● We partition on the middle index (an equal number of nodes before and after that node) and rotate that node up to become the new root, so that the whole tree becomes more balanced.

Pseudocode for Tree Partitioning

    partition(tree, index):
        m = number of nodes in the tree's left subtree
        if index < m:
            tree's left subtree = partition(tree's left subtree, index)
            tree = rotateRight(tree)
        else if index > m:
            tree's right subtree = partition(tree's right subtree, index - m - 1)
            tree = rotateLeft(tree)
        return tree

Time Complexity for Tree Partitioning
● The time complexity is similar to that of insert-at-root.
○ That is, it gets faster the closer to the root we partition.

Periodic Rebalancing
● A simple approach to using partitions selectively.
○ I.e. rebalance the tree every k inserts.
● The problem with this is that finding the number of nodes in a tree requires an O(n) traversal, which can be expensive to do often.

    periodicRebalance(tree, item):
        tree = insertAtLeaf(tree, item)
        if number of nodes in tree % k == 0:
            tree = partition(tree, middle index)
        return tree

Note: Tree Height
● The height of a tree is equal to the number of edges on the longest path from the root to a leaf node.
○ The height of a tree is equal to the height of its root.
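A small C sketch of computing height under this edge-counting definition (so an empty tree has height -1 and a single node has height 0), using the macros from earlier:

    int TreeHeight(Tree t) {
        if (t == NULL) return -1;
        int l = TreeHeight(left(t));
        int r = TreeHeight(right(t));
        return 1 + (l > r ? l : r);   // 1 edge down to the taller subtree
    }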

COMP2521 - Week 4, Lecture 1

Key Concepts

● Debugging with GDB and Valgrind

GDB
● GDB is a portable debugger that we can use to debug our C programs during runtime.
● When compiling our code, we use the -g flag so that our executable can be "watched" by GDB when running.
● Essentially, GDB is responsible for looking throughout your code, but the end user is responsible for telling GDB which parts of the code to inspect closely (i.e. when to stop and carefully check the code) for debugging purposes.
○ ...

