Keynotes | 2014 International Symposium on Code Generation and Optimization

CGO Keynotes I

Monday, February 17th (8:30am-9:40am)

21st Century Computer Architecture

Mark D. Hill
Computer Sciences Department
University of Wisconsin-Madison

Abstract:
This talk has two parts. The first part will discuss possible directions for computer architecture research, including architecture as infrastructure, energy first, impact of new technologies, and cross-layer opportunities. This part is based on a 2012 Computing Community Consortium (CCC) whitepaper effort led by Hill, as well as other recent National Academy and ISAT studies. See: http://cra.org/ccc/docs/init/21stcenturyarchitecturewhitepaper.pdf

The second part of the talk will discuss one or more examples of cross-layer research advocated in the first part. For example, our analysis shows that many “big-memory” server workloads, such as databases, in-memory caches, and graph analytics, pay a high cost for page-based virtual memory: up to 50% of execution time wasted. Via small changes to the operating system (Linux) and hardware (x86-64 MMU), this work reduces the execution time these workloads waste to less than 0.5%. The key idea is to map part of a process’s linear virtual address space with a new incarnation of segmentation, while providing compatibility by mapping the rest of the virtual address space with paging.
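The key idea in the paragraph above can be sketched in a few lines. This is a minimal illustration, not the design presented in the talk: the names `DirectSegment`, `translate`, `base`, `limit`, and `offset` are hypothetical, a Python dictionary stands in for the hardware page table, and 4 KiB pages are assumed.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumption: 4 KiB pages


@dataclass
class DirectSegment:
    """Hypothetical segment descriptor covering one contiguous virtual range."""
    base: int    # first virtual address covered by the segment
    limit: int   # first virtual address past the end of the segment
    offset: int  # added to a virtual address to form the physical address


def translate(va, segment, page_table):
    """Translate a virtual address, trying the segment before paging."""
    if segment.base <= va < segment.limit:
        # Segment hit: a bounds check and one addition, no page-table lookup.
        return va + segment.offset
    # Everything outside the segment is still mapped page by page.
    vpn, page_offset = divmod(va, PAGE_SIZE)
    if vpn not in page_table:
        raise KeyError(f"page fault at {va:#x}")
    return page_table[vpn] * PAGE_SIZE + page_offset


# Illustrative values only: a 256 MiB segment plus one paged mapping.
segment = DirectSegment(base=0x1000_0000, limit=0x2000_0000, offset=0x4000_0000)
page_table = {0x5: 0x9}  # virtual page 5 -> physical frame 9

inside = translate(0x1000_0010, segment, page_table)   # 0x50000010
outside = translate(0x5008, segment, page_table)       # 0x9008
```

In this sketch an address inside the segment needs no per-page state at all, which is where the saving would come from for a large, long-lived heap. Addresses outside it behave exactly as under ordinary paging.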

Biography: Mark D. Hill (http://www.cs.wisc.edu/~markhill) is the Gene M. Amdahl Professor of Computer Sciences and Electrical & Computer Engineering at the University of Wisconsin–Madison, where he also co-leads the Wisconsin Multifacet project. His research interests include parallel computer system design, memory system design, computer simulation, and transactional memory. He earned a PhD from the University of California, Berkeley. He is an ACM Fellow, a Fellow of the IEEE, a co-inventor on 30+ patents, and an ACM SIGARCH Distinguished Service Award recipient. His accomplishments include teaching more than 1000 students, having 40 Ph.D. progeny so far, developing the 3C cache miss taxonomy (compulsory, capacity, and conflict), and co-developing “sequential consistency for data-race free” that serves as a foundation of the C++ and Java memory models.

CGO Keynotes II

Tuesday, February 18th (8:30am-9:40am)

Are scripting languages ready for mobile computing?

Calin Cascaval
Qualcomm

Abstract:
Mobile devices are becoming the prevalent platform for personal computing, driven by the fact that they are extremely portable, always connected, and truly personal. This revolution is driving millions of programmers to develop mobile applications. In this talk, I will highlight the challenges that developers will have to face when developing for mobile: the platform is complex and constrained, requiring careful optimizations for power efficiency; the significant diversity of devices is creating a portability nightmare. Thus, developers are looking for tools to bridge the portability and performance gap, and in particular, have high expectations from web technologies, such as HTML and JavaScript. Are these technologies ready for the challenge?

Biography: Dr. Calin Cascaval is Sr. Director of Engineering at the Qualcomm Silicon Valley Research Center, where he is leading projects in the area of parallel software for mobile computing, including the Qualcomm MARE runtime (http://developer.qualcomm.com/mare) and the Zoomm Parallel Browser. Previously, he worked at the IBM TJ Watson Research Center, where he led projects on systems software, programming models, and compilers for a number of large-scale parallel systems, including Blue Gene and PERCS. He led the implementation of the first UPC compiler to scale to hundreds of thousands of processors, and research into hardware and software for Transactional Memory and other parallel programming abstractions. He collaborates extensively with academia, and has more than 50 peer-reviewed publications and more than 40 patent disclosures.

CGO Keynotes III

Wednesday, February 19th (8:30am-9:40am)

Heterogeneous computing – what does it mean for compiler research?

Norm Rubin
NVIDIA

The current trend in computer architecture is to increase the number of cores, to create specialized types of cores within a single machine, and to network such machines together in very fluid web/cloud computing arrangements. Compilers have traditionally focused on optimizations to code that improve performance, but is that the right target to speed up real applications? Consider loading a web page (like starting Gmail): the page is transferred to the client, any JavaScript is compiled, the JavaScript executes, and the page gets displayed. The classic compiler model (which was first developed in the late 1950s) was a great fit for single-core machines but has fallen behind both architecture and language. For example, how do you compile a single program for a machine that has both a CPU and a graphics coprocessor (a GPU) with a very different programming and memory model? Together with the changes in architecture, there have been changes in programming languages. Dynamic languages are used more; static languages are used less. How does this affect compiler research?

In this talk, I’ll review a number of traditional compiler research challenges that have become (or will become) burning issues, and I will describe some new problem areas that were not considered in the past. For example, language specifications are large, complex technical documents that are difficult for non-experts to follow. Application programmers are often not willing to read these documents; can a compiler bridge the gap?

Biography: Norm Rubin has over thirty years of experience delivering commercial compilers for processors ranging from embedded (ARM) to desktop (HP, Alpha) to supercomputer (KSR), and is a recognized expert in the field. He was the architect and lead implementer for the widely used graphics compiler for AMD/ATI. That compiler is currently shipping on millions of machines, including cell phones, consoles, and PCs. Norm was part of the AMD architecture team that designed GCN (Graphics Core Next). He was the lead designer of HSAIL, the virtual machine used in the HSA system architecture. Around a year ago he moved to NVIDIA Research, where he is working on algorithms and future programming models. Lately Norm has been looking at extending JavaScript to use GPUs and heterogeneous devices. Norm is also a visiting scholar at Northeastern University.

Dr. Rubin holds a PhD from the Courant Institute of NYU. Besides his work in compilers and architecture, he is well known for his work in GPU systems, compiler-related parts of the tool chain, binary translators, and dynamic optimizers.

ODES workshop keynote

Energy Efficient Data Access Techniques

David Whalley, Florida State University

Abstract: Energy has become a first-class design constraint for all types of processors. Data accesses contribute to processor energy usage and have been shown to account for up to 25% of the total energy used in embedded processors. Using a set-associative level-one data cache (L1 DC) organization is particularly energy inefficient, as load operations access all L1 DC data arrays in parallel to reduce access latency even though the data can reside in at most one of the arrays. In this presentation I will describe three techniques we have developed to reduce the energy used for L1 data accesses without adversely affecting performance. The first technique avoids unnecessary loads from the L1 DC data arrays of set-associative caches by speculatively accessing the L1 DC tag arrays earlier in the pipeline and only accessing the single L1 DC data array where there was a tag match. The second technique detects when a load operation will not cause a delay with a subsequent instruction and sequentially accesses the tag and data memories to also avoid unnecessary L1 DC data array accesses. The third technique provides a practical data filter cache design that not only significantly reduces data access energy usage, but also avoids the traditional execution time penalty associated with data filter caches. All of these techniques can easily be integrated into a conventional processor without requiring any ISA changes.
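The cost the abstract attributes to parallel access can be made concrete with a toy model. The sketch below is an illustration under stated assumptions, not the speaker's design: the class `SetAssociativeCache`, the helper `count_reads`, the cache geometry, and the round-robin replacement policy are all hypothetical. It only counts data-array reads and does not model pipeline timing, speculation, or energy per access.

```python
class SetAssociativeCache:
    """Toy L1 data cache that counts data-array reads for load operations."""

    def __init__(self, num_sets=4, ways=4, line_size=16):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # tags[s][w] is the tag held in way w of set s, or None if empty.
        self.tags = [[None] * ways for _ in range(num_sets)]
        self.next_victim = [0] * num_sets  # assumption: round-robin replacement
        self.data_array_reads = 0

    def load(self, addr, tag_first):
        """Perform one load and return True on a hit, False on a miss."""
        line = addr // self.line_size
        set_idx, tag = line % self.num_sets, line // self.num_sets
        ways = self.tags[set_idx]
        hit_way = ways.index(tag) if tag in ways else None
        if tag_first:
            # Tag match is known before the data access: read one way on a
            # hit, and no data array at all on a miss.
            if hit_way is not None:
                self.data_array_reads += 1
        else:
            # Conventional access: every way's data array is read in parallel.
            self.data_array_reads += self.ways
        if hit_way is None:
            # Miss: install the line (fill writes are not counted as reads).
            victim = self.next_victim[set_idx]
            ways[victim] = tag
            self.next_victim[set_idx] = (victim + 1) % self.ways
        return hit_way is not None


def count_reads(addresses, tag_first):
    """Run a load trace through a fresh cache and return data-array reads."""
    cache = SetAssociativeCache()
    for addr in addresses:
        cache.load(addr, tag_first)
    return cache.data_array_reads


trace = [0, 4, 8, 64, 0, 68]  # six loads: two misses, four hits
parallel_reads = count_reads(trace, tag_first=False)   # 24
tag_first_reads = count_reads(trace, tag_first=True)   # 4
```

With four ways, the conventional access reads four data arrays per load regardless of the outcome, while the tag-first variant reads at most one. The techniques in the abstract are about obtaining that saving without lengthening the load path, which this model does not attempt to capture.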

Biography: David Whalley received his PhD in CS from the University of Virginia in 1990. He is currently the E.P. Miles professor of the Computer Science Department at Florida State University and is an FSU Distinguished Research Professor. His research interests include low-level compiler optimizations, tools for supporting the development and maintenance of compilers, program performance evaluation tools, predicting execution time, computer architecture, and embedded systems. Some of the techniques that he developed for new compiler optimizations and diagnostic tools are currently being applied in industrial and academic compilers. His research is currently supported by the National Science Foundation.