HSA Programmer Reference: The Formation Of The New Specification – Heterogeneous System Architecture Foundation

The HSA Foundation was founded on June 12, 2012, and discussions among board members from the founding companies commenced. We quickly reached a consensus to set the work of specifying HSA in motion, and to form the Programmer’s Reference Manual (PRM) working group as early as possible. We also strove to follow the examples set by successful standards bodies, and picked Khronos as our role model. The first two meetings of the working group, held on Aug 24 and Aug 31 of 2012, produced a Statement of Work, a meeting format, a schedule and a meeting frequency. The first working group of the young organization had embarked on its journey. The atmosphere of the working group was extremely friendly and cooperative. Within a couple of weeks we were able to match the voices to the names, through some trial and error and, of course, friendly reminders. Our original plan was to complete the work in 9 weeks, so that we could submit the spec for ratification by the end of 2012. Finishing the work in 9 weeks proved too ambitious: three additional months were needed to produce a version deemed ready for the public.

The ultimate mission of HSA is to advance Parallel Computing with GPUs, or any other kind of programmable device, to the next level in terms of ease of programming and power efficiency. We needed to repeatedly remind ourselves to strike a balance between the current state of the art and forward-looking ideas that go beyond the conventional way of programming GPUs or, for that matter, any SIMD-style processor. Because we were also looking at use cases that do not yet exist in the marketplace, we needed to revisit some common themes in computing, such as precision, cache coherency and memory consistency, again and again. The goal is to create a standard that is practical for wide, industry-wide adoption and that also leaves room for future innovation and differentiation.

The PRM, commonly referred to as the HSAIL (HSA Intermediate Language) spec, plays a central role in such a revolution in Parallel Computing. It provides a reference for HSAIL, which is intended to decouple software development from hardware development. One key and differentiating feature of HSAIL is that it is positioned as a virtual ISA for any programmable computing device participating in an HSA-compliant system. Programmers can assume that there is a virtual machine supporting HSAIL, and all practical concerns and issues regarding performance and power can be addressed with respect to such a “machine”. Hardware designers can build their HSA-compliant computing devices with the goal of executing HSAIL code, through efficient Just-in-Time compilation, as close to the metal as possible. The HSAIL virtual machine is essentially a load/store architecture that supports fundamental integer and floating point operations, branches, atomic operations and multimedia operations, and that uses a fixed-size pool of registers. Additionally, the machine supports group memory, hierarchical synchronization primitives, and wavefronts which, though they look familiar to GPU programmers, could potentially be leveraged in non-GPU computing devices as well.
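To make the phrase "load/store architecture with a fixed-size pool of registers" concrete, here is a minimal toy sketch in Python. It is not HSAIL and does not use HSAIL syntax: the opcode names (`id`, `ld`, `add`, `st`), the register count, and the `run` function are all hypothetical choices made for illustration. The sketch runs the same short program once per work-item, and arithmetic touches only registers while memory is reached only through explicit loads and stores. Work-items run sequentially here; wavefronts, group memory and synchronization are deliberately left out.

```python
NUM_REGISTERS = 8  # assumed register pool size, chosen only for this sketch


def run(program, memory, num_items):
    """Run `program` once per work-item against a shared flat `memory` list.

    Each instruction is a tuple (opcode, operands...). The opcodes are
    hypothetical and only mimic the shape of a load/store machine.
    """
    for item_id in range(num_items):
        regs = [0] * NUM_REGISTERS  # every work-item gets its own registers
        for op, *args in program:
            if op == "id":      # id rd           : rd <- this work-item's id
                regs[args[0]] = item_id
            elif op == "ld":    # ld rd, base, ra : rd <- memory[base + ra]
                rd, base, ra = args
                regs[rd] = memory[base + regs[ra]]
            elif op == "add":   # add rd, ra, rb  : rd <- ra + rb (registers only)
                rd, ra, rb = args
                regs[rd] = regs[ra] + regs[rb]
            elif op == "st":    # st rs, base, ra : memory[base + ra] <- rs
                rs, base, ra = args
                memory[base + regs[ra]] = regs[rs]
            else:
                raise ValueError(f"unknown opcode: {op}")
    return memory


# Element-wise vector add: a lives at offset 0, b at offset 4, c at offset 8.
VECTOR_ADD = [
    ("id", 0),          # r0 = work-item id
    ("ld", 1, 0, 0),    # r1 = a[r0]
    ("ld", 2, 4, 0),    # r2 = b[r0]
    ("add", 3, 1, 2),   # r3 = r1 + r2
    ("st", 3, 8, 0),    # c[r0] = r3
]

memory = [1, 2, 3, 4, 10, 20, 30, 40, 0, 0, 0, 0]
result = run(VECTOR_ADD, memory, num_items=4)
print(result[8:])
```

With four work-items the last four memory cells end up holding the element-wise sums, `[11, 22, 33, 44]`. The point of the sketch is the division of labor: a program written against this fixed instruction shape says nothing about how a particular device maps registers or schedules work-items, which is the kind of freedom a virtual ISA leaves to the hardware vendor.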

For middleware, library and compiler developers, HSAIL is an attractive target because of its low-level nature, and because of its stability and universality compared with native hardware ISAs. They can invest in R&D on top of HSAIL and be confident of a return through the HSAIL ecosystem. Application developers can optimize their code manually in HSAIL, and/or leverage third-party HSAIL development tools or environments, and be confident that the real-world performance and efficiency of applications developed this way will match their expectations. That assurance comes from hardware vendors striving to optimize their HSA-compliant devices for HSAIL. Since HSAIL defines a virtual machine, not a physical one, hardware companies can innovate and differentiate in their native ISAs and micro-architectures. One of the coolest things about HSAIL is that it can potentially enable an ecosystem in which advances in Parallel Computing happen independently and synergistically between software and hardware companies.

Releasing a spec within 6 months is truly an amazing feat for a young foundation. Although AMD provided an initial draft that was nearly complete in terms of features, many of these features required careful reexamination and re-specification. Foundation members sent their best architects to participate, with the mandate to give this work priority. Because of the high quality of collaboration, most issues were resolved through consensus. Only two issues had to be resolved by ballot. One ballot question decided whether we should treat FP64 as optional in the base profile. The second ballot question could not be avoided: it was the vote for ratification! Among the issues resolved by consensus, naming and specifying the profiles was the stickiest. Due to different views on technology roadmaps, historical backgrounds and market positioning, the working group at first could not reach an agreement, and we asked the Board of Directors to arbitrate. And just as the US Supreme Court will sometimes return a case to the lower courts, the board sent the issue back to the working group! The working group reconsidered the issue and found a consensus.

The work of the PRM WG continues. There are many cross-group issues, linkage for example, where the PRM WG plays a necessary role. Additionally, features continue to be examined and tuned. We have also turned our attention to enabling implementations, by providing the HSAIL grammar and syntax in EBNF. And we are checking for consistency: the textual specification, programming examples, EBNF, and BRIG definitions are effectively four different ways of describing a feature, and they must agree.

As it happens, several participants work in more than one working group, and often consider the same issue from the perspective of the PRM, then from the point of view of system architecture, then in the context of the runtime, … We joked that we show split personalities when participating in different groups.

We have an outstanding team of processor, compiler and system architects. I am confident that what we have produced and will continue to produce will be superior to any proprietary solution. With such a great team, and the great companies behind it, I can proudly and confidently say that the future of Heterogeneous Parallel Computing is being shaped and defined here.
http://www.mediatek.com/_en/03_news/01-2_newsDetail.php?sn=1111&p=1
Chien-Ping Lu
Working Group Chair, Programmer’s Reference Manual
Sr. Director, Corporate Technology Office
MediaTek USA Inc.