
CS 61C Fall 2018

Midterm Guide October 14, 2018 Notes

With the upcoming Midterm, here is a topical guide to aid your studying process. This guide will cover Caches, MapReduce/Spark, and Data Level Parallelism. We do not want to turn this into a textbook reading, so various concepts have been omitted. This is not to say they are not important or will not show up on the midterm; they are just better presented/studied in a different format. Disclaimer: this guide is not meant to be comprehensive; the goal of this document is to present some of the challenging concepts in a “friendlier” way, in the hopes that people will be able to study more efficiently with a strong core understanding of these topics.

1 Time to Cache Up on Caches

Caches are a tricky subject all around, and they can be tested in many different ways. This section is here to get you up to speed on the basics. In general, the steps taken to solve cache problems are:

• Figure out your Tag, Index, and Offset bits.
• Draw a picture to represent your cache. Use TIO to know where to look.
• Update the drawing with data brought in/kicked out and figure out Hits/Misses.
• Find a Hit and Miss pattern or determine total hits and misses. Use these to find the Hit/Miss Rate.

That is usually a good start to working on cache problems, but there are definitely a lot more subtleties involved, and there is no one right answer. The following categories are what you are likely to see on previous exams and what you are expected to know for the coming one.

1.1 What Does the Cache Actually Store?

While there probably won't be any gotcha questions relating to this, it is important to understand the mechanics of a cache. The Tag, Index, and Offset make up the memory address, but the cache doesn't actually store all of this information. The index and offset are essentially built-in properties of the cache: you can treat the cache almost like an array, where the index and offset tell you which element to look at. An array doesn't store its own indices, and neither does the cache! The cache only really needs to store the Tag to match addresses, the Data for quick access, and some extra bits to track the state of the data (valid, dirty, etc.).

1.2 Tag, Index, Offset

Equations:

• Offset bits = log2(block size), with block size in bytes
• Index bits = log2(# indices)
• Tag bits = leftover (upper) bits of the address

Tips and Intuition: As I always like to tell students, try to find the offset bits first, since that is likely the easiest calculation. The offset is directly related to the block size: the offset tells you what byte of the block you are looking for, so you need log2(block size) offset bits to fully represent every byte in the block.

Next is the index. On an exam, we will provide various pieces of information about the cache, which will help you determine all of the cache properties. At this stage, drawing a picture will definitely help you visualize the system. You can then use this information to determine the index bits. Some useful things to remember:

    # blocks = cache size / block size
    # indices = # blocks / associativity

The types of caches we work with are Direct Mapped, Fully Associative, and N-Way Set Associative. These all affect the associativity of the cache, which in turn changes the necessary number of index bits. This is discussed more in 1.4.

The last piece of information is the Tag. These are the (upper) remaining bits of the address that aren't used for the index or the offset. If these are just leftover bits, why do we need them? Well, every byte of our memory space has a unique address. To make sure we are getting the right block of memory, our address has to match exactly. With the index and offset we know where in the cache to look, but we also have to make sure this block came from the right place. We do this by matching the tag, which confirms that the address of the cached block fully matches the address we are looking for.
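For a quick worked example (with hypothetical numbers, not from any particular exam): take a 1 KiB direct-mapped cache with 16 B blocks and 32-bit addresses.

    Offset bits = log2(16) = 4
    # blocks = 1024 / 16 = 64
    Index bits = log2(64) = 6
    Tag bits = 32 - 6 - 4 = 22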

1.3 TIO with Memory Addresses

The Tag, Index, and Offset bit breakdown can be fully determined just using cache properties, but it’s important to understand how these work with actual addresses. After determining how many bits each field requires, you can split your address up into the [ T : I : O ] breakdown. When accessing memory, you first look at the index in the cache, check if the data is valid, and then try to match the tag. One important trend to notice is that the index and offset bits repeat for every cache-sized portion of memory. This is best illustrated with the lecture figure below:

[Lecture figure: a small direct-mapped cache next to memory, with memory blocks color-coded by the cache index they map to; the color pattern repeats for every cache-sized chunk of memory]

Notice that the color pattern of the cache repeats in memory, meaning that each of those 4-color chunks will fit perfectly into the cache (with the color specifying where it is mapped to). Some important things to take away from the picture:

• Each 4-color chunk (from medium blue to light blue) will have a unique tag.
• The same color means the same index bits. As a result, if you always access the green elements, your index bits will always be 2 and they will map to the same location in the cache.
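To tie this to concrete numbers, reusing the hypothetical 22/6/4 split from the worked example above: the 32-bit address 0x00001234 splits into offset = 0b0100 = 0x4, index = 0b100011 = 35, and tag = 0x1234 >> 10 = 4. Any other address with index 35 competes for the same spot in the cache and can only be told apart by its tag.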

1.4 Types of Caches

The cache shown in the picture above is a direct-mapped cache. This means there is exactly 1 block per index, so we only ever have to look at a single spot when trying to find data. A block cannot be stored anywhere else in the cache, so we know the number of indices in a Direct Mapped cache is equal to the number of blocks.

Opposite to Direct Mapped, we have Fully Associative caches. In this configuration, blocks can go anywhere in the cache. Think about what the TIO breakdown would look like: there would be no index bits, since the whole cache is effectively one big set. The blocks can go into the cache at any location, so having an index for these free-for-all spaces would be pointless!

Somewhere in the middle is the N-Way Set Associative cache, which is like a mixture of the direct-mapped cache and the fully associative one. This cache can store N different blocks under the same index. Using another example from lecture:

[Lecture figure: a 2-way set associative cache next to memory, with memory blocks color-coded by the set they map to]

You can see the red memory blocks can go in either of the two red spots in the cache, and likewise for the green. One way to remember this is that there are N ways (in this case 2) to store the data into a single index. As a result, for a given cache size there are fewer indices total but more room in each one!
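To see how this changes the TIO breakdown, take the hypothetical 1 KiB cache with 16 B blocks from before, but make it 2-way set associative: # blocks is still 64, but # indices = 64 / 2 = 32, so the index shrinks to log2(32) = 5 bits and the tag grows to 32 - 5 - 4 = 23 bits. Fully associative is the extreme case: 1 set, 0 index bits, and a 28-bit tag.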

1.5 Hits and Misses

A hit signifies that the data is in the cache, ready to be accessed quickly. A miss signifies that the data is not in the cache; we then have to go to main memory and bring it into our cache, which takes an additional amount of work. As mentioned in the intro, it is always good to draw out your cache. This helps you keep track of what data is currently in the cache, which will then generate certain hit/miss patterns (or totals, if we are feeling extra difficult). As an example, this is how I drew out the cache during discussion 5:

[Figure: the author's hand-drawn cache diagram from discussion 5, not reproduced here]

If we are working with different types of caches, more metadata (dirty bit, least recently used number, etc.) may be necessary. The main idea is to just know what is in your cache at any given time to figure out hits or misses.
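As a small worked example (hypothetical numbers again): in a direct-mapped cache with 16 B blocks that starts out cold, reading one 4-byte word at a time from addresses 0, 4, 8, 12, 16, 20, ... produces the repeating pattern Miss, Hit, Hit, Hit. The first access to each block misses and pulls in the whole 16 B block, and the next three word accesses land in that same block, giving a 75% hit rate.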

1.6 Final Thoughts

The previous sections can get you through many cache problems and contain information and tips that will definitely help out on the midterm. However, there are definitely a lot of other topics to review. Topics that you should still look into include, but are not limited to:

• Write-back/write-through policies
• Write-allocate/no-write-allocate policies
• Block eviction decisions, e.g. the Least Recently Used (LRU) block gets evicted
• Average Memory Access Time (AMAT)
• etc.
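For the AMAT bullet in particular, the standard formula is AMAT = hit time + miss rate × miss penalty. With hypothetical numbers, a 1-cycle hit time, a 10% miss rate, and a 100-cycle miss penalty give AMAT = 1 + 0.10 × 100 = 11 cycles.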

2 Spark

MapReduce and Spark both fall under a similar category: splitting data processing into multiple stages and doing it at a large scale. You will typically see Spark on past exams, and you are expected to know the major Spark functions. However, you should still understand how MapReduce works, because it is still fair game.

Before getting started, the number one tip I have for any Spark problem is: know your inputs and your outputs. This cannot be emphasized enough. If you know what data is coming in and what should be coming out, it will definitely help your thought process when piecing these problems together.

The main idea in Spark is to use mapping functions to prepare your data for some kind of reduction. The four major functions you will repeatedly see are

• map()
• flatMap()
• reduce()
• reduceByKey()

You can read up on what each of these does, but in general, you would use map for a 1-to-1 mapping, flatMap for a 1-to-many (0 or more) mapping, reduceByKey as a reduction over every matching key, and reduce as a way to merge everything together into a single output.

This is a lot to take in, and I believe it is best explained with an example. Let's use an input dataset of strings ["HI", "I", "LOVE", "61C"], and say we wanted to count the total number of times each character appears.

1. We would first need to prepare the strings for some sort of reduction. In this case, each input is a string, so our mapping helper function signature might look something like mappingFunc(inputString). This goes back to knowing what your input is.

2. Inside the function, we should probably loop through the string and output a list of pairs with the letter as the key and 1 as the value, something like [("H", 1), ("I", 1)] for "HI". Why 1? We use 1 because we are counting the number of times each character appears. For each character in the input string, we can count that character as appearing once.

3. We know that each string has multiple letters, so each input string should map to a list of (Key, Value) pairs. The contents of the list are the desired elements (not the whole list itself), indicating we should probably be using flatMap, which will join our output lists together.

As a result of the observations above, we will call .flatMap(mappingFunc) first. Next, we have to reduce the pairs together.

1. Since we want to reduce based on character (our key), we probably want a reduceByKey.

2. The reduceByKey helper function should only take in the values. This means our helper signature probably looks like reductionFunc(value1, value2), where value1 and value2 would be some intermediary counts of the letter.

3. We also know that we have to output the same type as what was put in. If we are pairwise reducing elements and the first reduction did not output a type compatible with the original, then our next pairwise reduction would fail.

4. Since we are adding up counts, it is logical to return value1 + value2 in our reduction helper. This is consistent with points 2 and 3 above.

Overall, our Spark program looks like

    output = data.flatMap(mappingFunc).reduceByKey(reductionFunc)

with the helper functions as defined in the steps above (a concrete sketch appears at the end of this section).

You would follow the same methodology if you were using map and reduce, except with different goals in mind. For example, if you wanted to find the average number of occurrences over all letters, you might want to use reduce, which reduces all elements together across all keys. What if you wanted to count the number of times the same word appears? Each input would map to a single (string, 1) output, so we can use map instead of flatMap there.

This is the general methodology for solving any Spark problem: figure out what your inputs and outputs are at each step, and work with your data from there.
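To make the example concrete, here is a minimal Python sketch of the two helpers, assuming data is an RDD holding the input strings (mappingFunc and reductionFunc are just the hypothetical names used in the steps above):

    def mappingFunc(inputString):
        # Emit one (character, 1) pair per character of the string.
        # flatMap then flattens these per-string lists into a single
        # collection of pairs.
        return [(char, 1) for char in inputString]

    def reductionFunc(value1, value2):
        # Combine two intermediate counts that share the same key.
        return value1 + value2

    output = data.flatMap(mappingFunc).reduceByKey(reductionFunc)

Run on ["HI", "I", "LOVE", "61C"], this would yield pairs such as ("I", 2) and ("L", 1).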

3 Parallelism

We will study various types of parallelism in this course. This section focuses on Data-Level Parallelism, implemented through SIMD registers. Data-level parallelism is very straightforward: instead of doing register operations one word at a time, do them many words at a time! How is this accomplished? By simply using larger registers!

For example, say I wanted to add two integer arrays together, both having 8 elements. Naively, this would require 8 add instructions: load each pair of integers into registers and add them together with an add instruction. The problem here is that we're limited to one integer at a time.


Introducing intrinsics. Basically, there is a special C library you can import (provided your CPU supports it) that gives you access to larger registers. The most common ones you will see are 4-word and 8-word registers. Let's consider an example: adding together the elements of two arrays. Instead of loading integers 1 at a time, adding them 1 at a time, and storing them 1 at a time, we can (assuming 4-word registers) load 4 integers at once, add them all in parallel, and store those results at once. Ignoring overhead costs, this can theoretically speed up our computation by a factor of 4! Visually, it looks like this:

[Lecture figure: two 4-word registers being added element-by-element into a third 4-word register]

In lecture, you saw this applied to matrix multiplication. Fundamentally, nothing really changed (unlike with cache blocking in lab, which changed our pattern of memory accesses): all we had to do was load chunks of the matrix into large registers, do the multiplications in parallel, and store the results back. It's as straightforward as that.

How does this affect your code? Now, instead of loading data one element at a time, you load several at a time. A consequence of this, however, is that our iteration is different: when traversing an array, we no longer increment by 1; instead, we increment by however many words our new registers hold! One question that naturally arises is, "what if my array isn't divisible by 4?" Do not fear: our good ol' 1-word registers didn't go anywhere, so we can just go back to using those for that edge case (see the sketch below).

The difficulty with SIMD questions is, similar to MapReduce, approaching the solution to a problem with a SIMD mindset. Often, we will give you a list of built-in "vector" functions that operate on SIMD registers, and you should be able to use these seamlessly in your code to parallelize certain operations. Lab will provide excellent exposure to this.

Zooming out to the big picture, keep in mind the big concept behind any type of parallelization: no matter how much we speed up a certain portion by exploiting parallelism, we are only as fast as the portion we can't speed up (this is the intuition behind Amdahl's Law)! Not everything can be solved by using bigger registers...
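As a minimal sketch of the pattern above in C (the function name is hypothetical, and I am using Intel's SSE2 intrinsics for 4-word registers, which may differ from the exact vector functions given on an exam), adding two int arrays 4 words at a time with a scalar tail loop could look like:

    #include <emmintrin.h>  /* SSE2 intrinsics: 128-bit (4-word) registers */

    /* Compute c[i] = a[i] + b[i] for all i, using 4-wide SIMD adds. */
    void vector_add(int *a, int *b, int *c, int n) {
        int i;
        /* Main loop: process 4 integers per iteration. */
        for (i = 0; i + 4 <= n; i += 4) {
            __m128i va = _mm_loadu_si128((__m128i *) (a + i)); /* load 4 ints */
            __m128i vb = _mm_loadu_si128((__m128i *) (b + i));
            __m128i vc = _mm_add_epi32(va, vb);                /* 4 adds at once */
            _mm_storeu_si128((__m128i *) (c + i), vc);         /* store 4 ints */
        }
        /* Tail case: if n is not divisible by 4, finish the last
           few elements with ordinary scalar adds. */
        for (; i < n; i++) {
            c[i] = a[i] + b[i];
        }
    }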

