Jin Wang edited this page Mar 27, 2015 · 1 revision

Talks

By Ocelot Members

[1] Andrew Kerr, Rodrigo Dominguez, Sudhakar Yalamanchili, Gregory Diamos, Ocelot Tutorial PACT 2011

[2] Gregory Diamos, Dynamic Compilation for Massively Parallel Processors, Harvard CS264 2011

[3] Naila Farooqui, Andrew Kerr, Gregory Diamos, Sudhakar Yalamanchili, and Karsten Schwan, "A Framework for Dynamically Instrumenting GPU Compute Applications within GPU Ocelot", GPGPU4 2011

[4] Gregory Diamos, Andrew Kerr, Sudhakar Yalamanchili, Nathan Clark. A Dynamic Optimization Framework for Bulk-Synchronous Applications in Heterogeneous Systems, PACT 2010

[5] Andrew Kerr, Gregory Diamos, Sudhakar Yalamanchili, Modeling GPU-CPU Workloads and Systems, GPGPU3 2010

[6] Gregory Diamos, Andrew Kerr, Sudhakar Yalamanchili, Ocelot: An Open Source Debugging and Compilation Framework for CUDA, GTC 2010

Talks Referencing Ocelot

Reference Papers

A list of references for Ocelot development.

Papers Influencing the Design of Ocelot

[1] Gregory Diamos, The Design and Implementation of Ocelot's Dynamic Binary Translator from PTX to Multi-Core x86

[2] Gregory Diamos, Andrew Kerr, Mukil Kesavan, Translating GPU Binaries to Tiered SIMD Architectures with Ocelot

[3] Andrew Kerr, Gregory Diamos, Sudhakar Yalamanchili, A Characterization and Analysis of GPGPU Kernels

[4] Andrew Kerr, Gregory Diamos, Sudhakar Yalamanchili, A Characterization and Analysis of PTX Kernels

[5] Albert Claret Exojo, Design and Implementation of a PTX Emulation Library

[6] Sylvain Collange, David Defour, David Parello, Barra, a Modular Functional GPU Simulator for GPGPU

[7] Ali Bakhoda, George L. Yuan, Wilson W. L. Fung, Henry Wong, Tor M. Aamodt, Analyzing CUDA Workloads Using a Detailed GPU Simulator, In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, pp. 163-174, Boston, MA, April 26-28, 2009.

[8] Naila Farooqui, Andrew Kerr, Gregory Diamos, Sudhakar Yalamanchili, and Karsten Schwan, A Framework for Dynamically Instrumenting GPU Compute Applications within GPU Ocelot. Fourth Workshop on General-Purpose Computation on Graphics Processing Units. March 5, 2011.

[9] Rodrigo Dominguez, Dana Schaa, and David Kaeli. Caracal: Dynamic Translation of Runtime Environments for GPUs. GPGPU-4

Papers Referencing Ocelot

[1] KE Østby, JL Aragón, JM García, M Ujaldón, FATSEA–An Architectural Simulator for General Purpose Computing on GPUs

[2] J Marathe, B Aarts, M Murphy, Z Hu, WW Hwu, Center for Reliable and High-Performance Computing, in Performance Computing, 2010

[3] Becchi, M., Byna, S., Cadambi, S., and Chakradhar, S. 2010. Data-aware scheduling of legacy kernels on heterogeneous platforms with distributed memory. In Proceedings of the 22nd ACM Symposium on Parallelism in Algorithms and Architectures

[4] Stratton, J. A., Grover, V., Marathe, J., Aarts, B., Murphy, M., Hu, Z., and Hwu, W. W. 2010. Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs. In Proceedings of the 8th Annual IEEE/ACM International Symposium on Code Generation and Optimization

[5] Rick Weber, Akila Gothandaraman, Robert J. Hinde, Gregory D. Peterson, "Comparing Hardware Accelerators in Scientific Applications: A Case Study," IEEE Transactions on Parallel and Distributed Systems, vol. 99, no. PrePrints, 2010

[6] Hong, S. and Kim, H. 2010. An integrated GPU power and performance model. In Proceedings of the 37th Annual International Symposium on Computer Architecture

[7] Zhenyu Ye, Design Space Exploration for GPU-Based Architecture

[8] Cedric Nugteren, Improving CUDA’s Compiler through the Visualization of Decoded GPU Binaries

[9] Bruno Coutinho, Diogo Sampaio, Fernando Magno Quintão Pereira, Wagner Meira. "Divergence Analysis and Optimizations." Parallel Architectures and Compilation Techniques. 2011.

Manuals

[1] PTX 1.2

[2] PTX 1.3

[3] PTX 1.4

[4] PTX 2.0

[5] OpenCL 1.0

Projects

[1] GPGPU-Sim

[2] Barra

[3] LLVM

Links

[1] GPGPU.org