Parallel Computing Research at Illinois: The UPCRC Agenda

Author: Josep Torrellas
Pages: 321

Josep Torrellas (I2PC Director), Sarita V. Adve, Vikram S. Adve, Danny Dig, Minh N. Do, Maria Jesus Garzaran, John C. Hart, Thomas S. Huang, Wen-mei W. Hwu (UPCRC Co-Director), Samuel T. King, Darko Marinov, Klara Nahrstedt, David A. Padua, Madhusudan Parthasarathy, Sanjay J. Patel, Marc Snir (UPCRC Co-Director)

Illinois Parallelism Center
Department of Computer Science
Department of Electrical and Computer Engineering
Coordinated Science Laboratory
University of Illinois at Urbana-Champaign

September 2013

ILLINOIS-INTEL PARALLELISM CENTER (I2PC)
http://i2pc.cs.illinois.edu/


Contents

1 Introduction: The Illinois Research Agenda
  1.1 Applications
  1.2 Software Development
  1.3 Correctness
  1.4 Multicore Architectures
  1.5 Training Efforts
2 Parallelism Center Personnel
3 AvaScholar Instructor
  3.1 Problem Addressed
  3.2 Contributions
  3.3 Lessons Learned
  3.4 Future Work
  3.5 Key Papers and Other Material
4 AvaScholar Student
  4.1 Problem Addressed
  4.2 Contributions
    4.2.1 3-D Face Modeling
    4.2.2 Performance Driven Avatar
    4.2.3 Attention Detection
    4.2.4 Mobile-Cloudlet Design Framework
    4.2.5 Appearance-Based Emotion Recognition
  4.3 Lessons Learned
  4.4 Future Work
  4.5 Key Papers and Other Material
5 Parallel Web Browser
  5.1 Problem Addressed
  5.2 Contributions
    5.2.1 Results
  5.3 Lessons Learned
  5.4 Future Work
  5.5 Key Papers and Other Material
6 Refactoring
  6.1 Problem Addressed
  6.2 Contributions
    6.2.1 Empirical Studies with Actionable Items
    6.2.2 Refactoring and Analysis Tools
    6.2.3 Dissemination of Results
  6.3 Lessons Learned
  6.4 Future Work
  6.5 Key Papers and Other Material
7 Tiling: Notations and Optimization Techniques
  7.1 Problem Addressed
  7.2 Contributions
    7.2.1 Programming with Tiles
    7.2.2 HYDRA: Automatic Tuning from Equations
    7.2.3 Automatic Selection of Block Shapes
  7.3 Lessons Learned
  7.4 Future Work
  7.5 Key Papers and Other Material
8 Deterministic-by-default Parallel Programming
  8.1 Problem Addressed
  8.2 Contributions
    8.2.1 Deterministic Parallel Java: Strong Static Safety Guarantees
    8.2.2 Tasks With Effects: Supporting Flexible Concurrent Programs
    8.2.3 Logical annotations for safe parallelism using Accord
    8.2.4 Annotations for Safe Parallelism: Scaling to Large Production Software
    8.2.5 Automatic Inference of Region and Effect Annotations
  8.3 Lessons Learned
  8.4 Future Work
  8.5 Key Papers and Other Material
9 MCUDA: CUDA for Multicores
  9.1 Problem Addressed
  9.2 Contributions
  9.3 Lessons Learned
  9.4 Future Work
  9.5 Key Papers and Other Material
10 Scheduling for Energy Efficiency
  10.1 Problem Addressed
  10.2 Contributions
    10.2.1 Scheduling
    10.2.2 Experimental Results
  10.3 Lessons Learned
  10.4 Future Work
  10.5 Key Papers and Other Material
11 Verification and Testing Advances
  11.1 Problem Addressed
  11.2 Contributions
    11.2.1 General Testing of Multithreaded Code
    11.2.2 Predictive Testing and the PENELOPE Framework
  11.3 Lessons Learned
  11.4 Future Work
  11.5 Key Papers and Other Material
12 Record&Replay and Debugging Architectures
  12.1 Problem Addressed
  12.2 Contributions
    12.2.1 The QuickRec Prototype
    12.2.2 Additional R&R Architectures Designed
    12.2.3 Architectures for Detecting & Avoiding Concurrency Bugs
  12.3 Lessons Learned
  12.4 Future Work
  12.5 Key Papers and Other Material
13 The Bulk Multicore Architecture for Programmability
  13.1 Problem Addressed
  13.2 Contributions
    13.2.1 Basic Bulk Architecture
    13.2.2 Improving Bulk Scalability and Usability
    13.2.3 The Bulk Compilation Support
  13.3 Lessons Learned
  13.4 Future Work
  13.5 Key Papers and Other Material
14 DeNovo: Rethinking Memory Systems for Disciplined Parallelism
  14.1 Problem Addressed
  14.2 Contributions
    14.2.1 Disciplined Shared-Memory Software
    14.2.2 Applications
    14.2.3 DeNovo Architecture for Deterministic Codes
    14.2.4 Beyond Deterministic Codes
    14.2.5 Heterogeneous systems
  14.3 Lessons Learned
  14.4 Future Work
  14.5 Key Papers and Other Material
15 Concluding Remarks
16 Key Papers

Chapter 1

Introduction: The Illinois Research Agenda

For many decades, the microprocessor industry has seen steady growth in CPU performance, driven by Moore's Law [113] and Dennard scaling [42]. Unfortunately, once feature sizes shrank below 130 nm more than a decade ago, Dennard scaling ceased to apply: static power became significant, and voltage could no longer be lowered as quickly as before. To keep power consumption in check, designers stopped increasing the clock rate and began integrating multiple processors on a single chip [41].

This technology shift has had major software implications. Previously, single-threaded applications saw their performance increase over successive microprocessor generations with little or no need for software changes. Now, an application's performance improves only if it can use an increasing number of concurrent threads. The problem is particularly acute for client (i.e., desktop and mobile) workloads, which, unlike server workloads, are turnaround-oriented: parallelism is used to reduce response time or to handle a more complex problem. This is a difficult programming problem because it requires parallelizing many compute-intensive algorithms, often with fine-grained sharing of complex data structures.

A key question is whether today's multicore parallel computing context is fundamentally different from the traditional high-performance parallel computing context. There are, in fact, two fundamental differences: the importance of productivity and the size of the market.

First, applications for desktop and mobile devices are developed under enormous competitive pressure to minimize time-to-market and enhance functionality, leaving less developer time for performance-oriented goals such as parallelization. In this context, maximizing developer productivity becomes vital: application teams are willing to accept moderate speedups at low developer cost rather than invest the time needed to maximize speedups.

Second, the client computing market is ten to a hundred times larger than the high-performance computing market. This justifies far greater investment by industry, which in turn can support many high-level and specialized languages, libraries, frameworks, tools, and architectures that address different subsets of the market and aim to improve programmer productivity.

If, however, client applications do not leverage parallelism, users will see no performance improvement when they buy a more powerful processor. They will have no incentive to upgrade their systems, and an industry that depends heavily on continuous demand for higher performance will be threatened.

This is the problem addressed by the Illinois Parallelism Center, through the Universal Parallel Computing Research Center (UPCRC), funded by Intel and Microsoft during 2008-2010, and the Illinois-Intel Parallelism Center (I2PC), funded by Intel during 2011-2013. The Center focused on three questions:

• What applications will require the increasing performance that parallelism can bring to client processors?
• What programming models and tools will facilitate productive development of such applications?
• What computer architectures will most efficiently leverage the many cores that future manycore chips may have?

This book summarizes the research results of the Illinois Parallelism Center and includes a few key papers resulting from that research.


1.1 Applications

In a world that increasingly relies on technology to facilitate interpersonal communication, we envision the killer client applications of the near future to be those that require high-quality, interactive tele-immersive environments with a significant amount of local processing. In the AvaScholar project described in Chapters 3 and 4, PIs John Hart, Minh Do, Thomas Huang, Sanjay Patel, and Klara Nahrstedt study such an application: an educational environment where an online instructor uses her hands to interact with real and virtual 3-D visual aids while, in real time, g...

