HSA Foundation Members Preview Plans for Heterogeneous Platforms

SANTA CLARA, CA, Oct. 6, 2015 – The Heterogeneous System Architecture (HSA) Foundation today previewed several of its members’ plans for supporting HSA in their next-generation products. Products from AMD, ARM, Imagination Technologies and MediaTek will be the world’s first that are based on HSA, a standardized platform design that unlocks the performance and power efficiency of the parallel computing engines found in most modern electronic devices.
HSA allows developers to easily and efficiently apply the hardware resources in today’s complex systems-on-chips (SoCs). It will enable applications to run faster and at lower power across the range of computing platforms spanning mobile devices, desktops, high-performance computing (HPC) systems and servers.
Following the release of the v1.0 specification in March 2015, conformance tests are now available online to members who are testing their platforms in preparation for HSA certification. To support these products, HSA Foundation working groups are standardizing tools and APIs for debug and profiling, creating guidelines for incorporating IP from multiple vendors into the same SoC, and much more.
“These are exciting times for the industry as companies prepare to introduce the first HSA products,” said Greg Stoner, managing director of the HSA Foundation, and senior director, computing technology for AMD. “As we draw ever closer to pervasive adoption of heterogeneous computing, devices will be able to run applications at much higher performance and lower power, providing an opportunity for developers to create incredible new applications across computing platforms.”
“After the HSA’s successful release of the v1.0 specification in March 2015, the organization went to work on developing conformance tests,” said Dr. Jon Peddie of Jon Peddie Research. “Conformance testing is critical to a meaningful HSA certification, and now that is in place too. This firmly and permanently establishes the organization’s place in the industry.”
HSA Foundation members lay out their plans
AMD recently launched the world’s first processors designed to support the full set of HSA features with their SoC products targeting the desktop and laptop PC markets. “AMD is thrilled to be amongst the first companies shipping products designed to fully support the HSA Foundation standards with the introduction of the 6th generation A-series processor (code-named “Carrizo”),” said Stoner. “We see HSA as the right technical direction for the industry to fully utilize the capabilities of modern SoCs to deliver improved performance, power utilization and programmability.”
“As a founding member of the HSA foundation, ARM has worked with our fellow members to develop specifications that enable hardware and software to take advantage of both CPU and GPU compute,” said Jem Davies, vice president of technology, media processing group, ARM. “ARM is actively developing CPU, GPU and interconnect IP with energy efficiency and full system coherency as guiding design principles while extending the system capabilities aligned with HSA coherency standards.”
Imagination is planning a staged rollout of HSA across its processors starting in 2016. This includes MIPS I-class and P-class CPUs, PowerVR GPUs and HSA compliant fabric solutions. According to Peter McGuinness, director of multimedia technology marketing for Imagination, “Because it provides a consistent programming model and enables efficient execution on CPUs, GPUs and beyond, HSA is an important standard for future SoCs. Imagination has played a key role in developing the HSA specifications as a founder member of the HSA Foundation. HSA holds the promise of enabling developers to write software that makes the most of our future heterogeneous platforms targeting a range of devices including mobile and tablets, vision systems, automotive, and more.”
MediaTek is working with partners in developing HSA features on mobile SoCs. The company is already receiving interest in HSA from customers, and is on track to deliver HSA features in mobile SoC products in phases. “MediaTek is a firm believer in the value of heterogeneous computing and a strong supporter of the good work of the HSA Foundation. We are working to leverage this technology into our products to provide even better end user experience,” said Giri Amarakone, senior director, marketing and business development, MediaTek.
General Processor Technologies (GPT) is sponsoring an open source project to expand HSA tools support to the GNU Compiler Collection (GCC) by enabling HSA Intermediate Language (HSAIL) binary format (BRIG) translation for GCC. “We’re delighted to be involved in creating a foundational tools ecosystem for HSA. Through the project we’re sponsoring, heterogeneous processors may benefit from kernel agent support and the vector/SIMD optimizations in GCC,” said Dr. John Glossner, CEO of General Processor Technologies.
As companies roll out their HSA platforms, hardware and software learnings will be quickly integrated into the HSA specifications. The v1.1 specification will be available in the first quarter of 2016 and will be backward compatible with v1.0.
About the HSA Foundation
The HSA (Heterogeneous System Architecture) Foundation is a non-profit consortium of SoC IP vendors, OEMs, Academia, SoC vendors, OSVs and ISVs, whose goal is making programming for parallel computing easy and pervasive. HSA members are building a heterogeneous computing ecosystem, rooted in industry standards, which combines scalar processing on the CPU with parallel processing on the GPU, while enabling high bandwidth access to memory and high application performance with low power consumption. HSA defines interfaces for parallel computation using CPU, GPU and other programmable and fixed function devices, while supporting a diverse set of high-level programming languages, and creating the foundation for next-generation, general-purpose computing.
##

HSA Foundation Launches New Era of Pervasive, Energy-Efficient Computing with HSA 1.0 Specification Release

SAN JOSE, California, March 16, 2015 – The Heterogeneous System Architecture (HSA) Foundation today announced a major milestone with its release of the 1.0 HSA specification, which brings the technology industry one step closer to true heterogeneous computing on platforms spanning mobile devices, desktops, high-performance computing (HPC) systems and servers.
HSA is a standardized platform design supported by more than 40 technology companies and 17 universities that unlocks the performance and power efficiency of the parallel computing engines found in most modern electronic devices. It allows developers to easily and efficiently apply the hardware resources in today’s complex systems-on-chip (SOCs).
“Through HSA, we are working to ensure that end users of technology live in a world of new, incredible applications that run fast at low power,” said Phil Rogers, president of the HSA Foundation. “The Foundation members have been collaborating on this project since we joined together in June 2012, and we are thrilled to be delivering the fruit of that labor today.”
The newly-approved specification comprises the key elements that improve the programmability of heterogeneous processors, the portability of programming code and interoperability across different vendor devices. These include:

  • The HSA System Architecture Specification, which defines how the hardware operates;
  • The HSA Programmers Reference Manual (PRM), which targets the software ecosystem, tool and compiler developers;
  • The HSA Runtime Specification, which defines how applications interact with HSA platforms.

“HSA specification 1.0 includes several crucial features for efficient implementation of productive high-level languages, such as C++, Java and Python on heterogeneous computing hardware,” said Professor Wen-Mei Hwu, CTO, Multicoreware, and Professor, Computer Engineering, UIUC. “Such enhancement of programmability will make the benefit of heterogeneous computing available to mainstream, mobile and server applications.”
“HSA has been remarkably well accepted and supported,” added Jon Peddie, who heads Jon Peddie Research, a computer graphics market research and management consulting firm. “The specification has answered an obvious need in the industry, which is reflected in its growing membership.”
“Release of the new specification should help improve more power efficient computing performance across a wide array of computing platforms,” said Patrick Moorhead, who leads market research firm Moor Insights & Strategy. “I anticipate a lot of interesting use cases, from video chat apps and search to TV shows and movies. App developers should also find it easier to harness all of the processors together.”
The specification was officially launched today during the HSA 1.0 launch event held at the Fairmont Hotel in San Jose, California. The event featured a panel discussion among HSA Foundation board members, including AMD, ARM, Imagination Technologies, LG, MediaTek, Qualcomm and Samsung. A developer panel of industry luminaries discussing software, the ecosystem and applications in the mobile, PC and HPC computing was also featured.
Supporting Quotes
AMD
“HSA 1.0 is an idea whose time has come. It gives developers easier access to the power-efficient performance on today’s rich SoCs than ever before, freeing them to find creative solutions to compute’s toughest challenges. AMD intends to bring processors which incorporate the architecture described in the specification to market in 2015 and help lead the industry into the new era of heterogeneous computing.”
–Manju Hegde, corporate vice president, AMD
ARM
“Heterogeneous computing is playing an increasing role in system design. HSA systems will enable energy-efficient interoperation between multiple processor types to take full advantage of next-generation SoCs.”
Jem Davies, vice president of technology, media processing group, ARM
Imagination Technologies
“The future of computing will be based around heterogeneous platforms, and software APIs will be essential in their creation. As a co-founder of the HSA Foundation, Imagination is pleased to have played a key role in developing the new specifications. These specifications will enable interoperability across devices, and will let developers write software that makes the most of future coherent heterogeneous hardware platforms that include our PowerVR GPUs and MIPS CPUs.”
– John Min, director of processor technology marketing, Imagination Technologies
LG Electronics
“HSA will address the current needs of efficient computing, enabling consumers to take full benefit of maximizing the overall performance in their smart devices. We are looking forward to enhancing our SoC technologies in the partnership with HSA.”
SJ Choi, senior vice president, LG Electronics
MediaTek
“MediaTek has been leveraging heterogeneous computing resources available in SoCs, and was one of the first to productize mainstream heterogeneous applications, including 2D-to-3D, video face beautifier, video stabilization in MT6589 and stereo camera features in MT6785. HSA allows us to move to the next step of heterogeneous computing with the ease of conventional programming and superior power efficiency.”
Giri Amarakone, senior director, marketing and business development, MediaTek
Qualcomm
“Qualcomm Technologies Inc. is developing new, low power, heterogeneous computing technologies for Qualcomm® Hexagon™ DSP, Qualcomm® Adreno™ GPU and custom CPU micro architectures. We believe that application developers for mobile and “Internet of Everything” devices can deliver innovative experiences on Qualcomm® Snapdragon™ processors if certain aspects of heterogeneous computing are standardized. Together with operating system companies and various standards committees including the HSA Foundation, of which QTI is a founding member, we are collaborating with many industry players to help define open standards that are beneficial for these types of new opportunities.”
Tim Leland, vice president of product management, Qualcomm Technologies Inc.
Samsung
“Samsung is pursuing the best products in the world, such as application processors and smart phones for the mobile market. Heterogeneous system architecture is a good candidate for building efficient systems and the release of the 1.0 HSA specification will help Samsung achieve its goals in a more efficient way.”
Jay Kim, vice president, Samsung Electronics
Contact:
Neal Leavitt
Leavitt Communications
(760) 639-2900
neal@leavcom.com
 

HSA System Architecture, HSA Programmer’s Reference Manual, HSA System Runtime Specifications 1.0 Provisional are Now Available

The three core HSA Foundation specifications are available for public review. In addition, a sample runtime, compiler, and driver are available for you to interact with during your review of the specifications.

HSA Foundation recently ratified and released the three main HSA specifications:

  1. HSA Platform System Architecture Specification: Defines the requirements for shared virtual memory, platform coherency, signaling, queuing mechanics and packet formats, context switching, and the HSA memory model.
  2. HSA Programmer’s Reference Manual: Contains the HSAIL Virtual ISA and Programming Model, Compiler Writer’s Guide, and BRIG (the “HSAIL” compiler intermediate language) object format.
  3. HSA Runtime Programmer’s Reference Manual: Defines the APIs in the HSA Runtime used for tasks such as initialization and device discovery, queue creation, and memory management.

These specifications are at the “1.0 Provisional” level and are available from the HSA Foundation web site here (http://www.hsafoundation.com/standards/).

AMD is also supplying an early implementation to test out the capabilities of HSA

The project provides an initial implementation of the HSA specifications on the AMD “Kaveri” silicon, a pre-HSA-compatible part. The implementation includes a Linux kernel and associated kernel-level drivers, the HSA runtime, and the HSAIL finalizer. The project includes a reference LLVM-based compiler which generates HSAIL and can be extended to add additional languages that support HSAIL-based compute. The project also includes tools for assembling and disassembling HSAIL and for compiling OpenCL 2.0 kernels into HSAIL. Finally, the project includes an approachable runtime layer called “OKRA” designed to minimize the time required to get started with HSA. You can access these at https://github.com/HSAFoundation

Who should use this project?

The project is aimed at:

  • Compiler and language developers who want to add parallel acceleration to a high-level language.
  • Programmers who want to leverage features of HSA such as shared-virtual-memory, platform atomics, user-level queues, and signals.

Next steps:

For an overall view of the different components of the project, see the list here.
For information regarding target platforms and installation instructions for the HSA drivers and user-mode libraries, see “HSA Platforms & Installation”.
Compiler and programming language developers should investigate the HSAIL Compiler Writers SDK.
Programmers interested in developing code that uses HSA features should investigate these projects:

(1) OpenCL and the OpenCL logo are trademarks of Apple, Inc. and used by permission of Khronos.

AMD Announces Heterogeneous C++ AMP Language for Developers

First Open Source C++ Implementation to See Broad Availability Across Linux, Windows and Other Platforms
SUNNYVALE, CA, Aug 26, 2014 (Marketwired via COMTEX) – AMD, in collaboration with Microsoft(R), today announced the release of C++ AMP version 1.2, an open source C++ compiler which implements version 1.2 of the open specification for C++ AMP, available on both Linux and Windows for the first time. The release represents another step forward toward AMD’s goal of supporting cross-platform solutions, multiple programming languages and continued contributions to the open source community. The tool, which leverages Clang and LLVM, accelerates productivity and ease of use for developers wishing to harness the full power of modern heterogeneous platforms spanning servers, PCs and handheld devices.
“AMD has a consistent track record of enriching the developer experience, and we’re proud to make the first open source implementation of C++ AMP available to enable greater performance and more power-efficient applications,” said Manju Hegde, corporate vice president, Heterogeneous Applications and Solutions, AMD. “The cross-platform release is another step in strengthening AMD’s developer solutions, allowing for increased productivity and accelerated applications through shared physical memory across the CPU and GPU on both Linux and Windows.”
“AMD continues to deliver excellent developer tools for heterogeneous programming. Partnering with AMD to deliver C++ AMP to the Linux and Open Source communities was a natural step for Microsoft as we work to improve the performance and developer experience on modern computing platforms,” said S. Somasegar, corporate vice president of the Developer Division at Microsoft.
C++ AMP version 1.2 enables C++ developers to accelerate applications across a broad set of hardware and software configurations by supporting three outputs:
  • Khronos Group OpenCL(1), supporting AMD CPU/APU/GPU, Intel CPU/APU, NVIDIA GPU, Apple Mac OS X and other OpenCL compliant platforms;
  • Khronos Group SPIR, supporting AMD CPU/APU/GPU, Intel CPU/APU and future SPIR compliant platforms; and
  • HSA Foundation HSAIL, supporting AMD APU and future HSA compliant platforms.
A key performance feature of version 1.2 of the open source C++ AMP specification is support for shared physical memory, which greatly simplifies sharing of data between the CPU and GPU on heterogeneous platforms. Heterogeneous platforms built on the new spec allow programmers to benefit from minimized overhead of expensive data copies and pointer updates when accelerating applications.
Supporting Resources
  • Access latest C++ AMP compiler source code here
  • View the Open C++ AMP specification version 1.2 here
  • For more information about Clang and LLVM, visit their website.
About AMD
AMD designs and integrates technology that powers millions of intelligent devices, including personal computers, tablets, game consoles and cloud servers that define the new era of surround computing. AMD solutions enable people everywhere to realize the full potential of their favorite devices and applications to push the boundaries of what is possible. For more information, visit www.amd.com.
(1) OpenCL and the OpenCL logo are trademarks of Apple, Inc. and used by permission of Khronos.
Contact: Kristen Lisa AMD Public Relations (512) 602-6020 kristen.lisa@amd.com
SOURCE: Advanced Micro Devices

Bringing C++AMP Beyond Windows via CLANG and LLVM

We are happy to report that, after some great work by MultiCoreWare in conjunction with support from AMD and Microsoft, today we are releasing a C++ AMP compiler based on CLANG/LLVM so that we can bring C++ AMP to multiple platforms. We wanted to bring this out early so we could work with the community and get their input before making this 1.0. So we are calling on all developers who are looking for a heterogeneous C++ compiler to help with finding bugs, driving features, and creating optimizations, as well as building applications and libraries that drive a new class of applications.
You can get access to the compiler at the Bitbucket repository link:
https://bitbucket.org/multicoreware/cppamp-driver/
We also have Samples:
https://bitbucket.org/multicoreware/cxxamp_sandbox
FEATURES:
* Compiles C++AMP to OpenCL C and Khronos Group Provisional SPIR 1.2 for Linux. Works across major GPU platforms.
* Leverages GMAC for CPU-GPU synchronization on non-HSA GPUs.
TODOs/Ongoing works:
* Fix SPIR code generation issue. Right now system headers do not flow thru SPIR path and that causes host code to fail compilation.
* HSAIL code generation and HSA-optimized layout
* Passing MS C++AMP conformance suite
* Async API
* Better address space support — right now small changes to user code are required when taking/passing a pointer to local memory buffers. See samples for details.
* Merge into official Clang main line
 
Remember, C++ AMP already has a rich set of libraries which Microsoft has released under the Apache License.

  1. C++ AMP Algorithms Library (STL-style Algorithms)
  2. C++ AMP RNG Library (Random Number Generator)
  3. C++ AMP FFT Library (Fast Fourier Transform)
  4. C++ AMP BLAS Library (Basic Linear Algebra Subroutines)
  5. C++ AMP LAPACK Library (Linear Algebra Package)

Asymmetric Multiprocessing with Heterogeneous Architectures: Use the Best Tool for the Job

Contributor: Arteris SA

September 6, 2013 – Often, the term “multiprocessing” is associated with tightly-coupled symmetric multiprocessing (SMP) architectures, due in large part to SMP’s prevalence in high-performance computing, x86/x64 servers, and PCs. Unfortunately, SMP’s incremental performance scaling for most applications decreases significantly with increasing numbers of cores. This lack of scalability has prompted many processor companies to avoid purely SMP solutions for their mobile and consumer electronics applications. Instead, they have implemented asymmetric multiprocessing (AMP) architectures to make more efficient use of silicon.
An example of AMP is a mobile phone’s modem baseband SOC, containing an ARM processor and a DSP to handle control and signal processing, respectively. AMP architectures are also found in mobile phone application processors, which have multiple CPU cores and separate discrete graphics cores, video cores, audio cores and imaging cores. Heterogeneous architectures also dominate in most embedded consumer applications, such as digital TVs, set-top boxes, and automotive infotainment.
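The diminishing return from adding identical cores, noted at the start of this article, is commonly reasoned about with Amdahl’s law. The article does not name that law, so the sketch below is an illustration under that assumption; the function names and the 90% parallel fraction used in the note that follows are hypothetical values chosen for the example.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup on `cores` identical cores under Amdahl's law."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def gain_from_one_more_core(parallel_fraction: float, cores: int) -> float:
    """Extra speedup obtained by going from `cores` to `cores + 1`."""
    return (amdahl_speedup(parallel_fraction, cores + 1)
            - amdahl_speedup(parallel_fraction, cores))
```

For a workload that is 90% parallel, `gain_from_one_more_core(0.9, 1)` is far larger than `gain_from_one_more_core(0.9, 16)`, and the total speedup can never exceed 10 no matter how many cores are added.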
 
 

Figure 1. The Qualcomm Snapdragon 800 is an example of a system-on-chip that implements an asymmetric multiprocessing (AMP) architecture with multiple processing units optimized for different functions. Source: Qualcomm.

 

Heat and power drive architecture decisions

Mobile applications face significant design constraints because of battery size and heat dissipation. As a result, processor designers are forced to use “the best core for the job.” So architectures in mobility have always been created from a baseline expectation of heterogeneous core AMP.
Server and PC chips have relatively unlimited power consumption and heat dissipation capabilities, making an SMP architecture tolerable. In these applications, it is often easier to add more cores of the same type, connect them using cache coherency, and reuse the legacy software to run on top. Comparatively little attention has been paid to heat dissipation and power consumption.
But PCs are becoming smaller and mobile. And server farms are eyeing power consumption as well, forcing designers to reconsider SMP architectures. For example, for server farms that power the likes of Google and Facebook, power consumption and heat dissipation have become huge cost and environmental issues. And in the PC space, we have run into a “gigahertz wall” where the only way to have a step function increase in performance is to have different cores optimized for different workload types.

AMP architectures struggle to break into PC/server applications

Why don’t AMP architectures dominate PC and server applications? Because they are hard to implement!
In mobile designs, each heterogeneous processing core, whether graphics, audio, DSP, etc., typically has a custom firmware and software stack associated with it. This software must be integrated to communicate with the CPU cores’ operating system, requiring coding work in the OS hardware abstraction layer and drivers. In addition, these heterogeneous cores do not have a single view of system memory, so complicated synchronization schemes are usually implemented in hardware and software. Context switching and preemption are difficult to implement. Adding to the challenge, each of these cores requires an expert programmer, conversant in a particular core’s instruction set and tool chains, to code it.
These barriers have forced AMP to remain in the mobile and consumer electronics realm, which is closed to low-level, close-to-the-hardware software developers. In contrast, SMP has flourished in the wide-open world of PCs and servers, aided by the ease of programming.
Heterogeneous system architectures (HSA) can span the chasm between mobile/consumer applications and PC/server applications, easing the design burden while delivering performance, scalability, improved heat dissipation and reduced power consumption.
Recently, a number of companies, including AMD, ARM, Imagination, MediaTek, Qualcomm, Samsung and Texas Instruments, founded the HSA Foundation. HSA defines interfaces for parallel computation utilizing CPU, GPU, and other programmable and fixed-function devices, and support for a diverse set of high-level programming languages, thereby creating the next foundation in general-purpose computing.
Its goals are to:

  • Make heterogeneous programming easy and a first-class pervasive complement to CPU computing.
  • Continue to increase the power efficiency of heterogeneous systems (AMP), keeping it the platform of choice from smartphones to the cloud.
  • Bring to market strong development solutions (tools, libraries, OS run-times) to drive innovative advanced content and applications.
  • Foster growth of heterogeneous computing talent through HSA developer training and academic programs to drive both learning and innovation.

The HSA approach requires a technical framework and architecture

There are several issues that must be addressed to successfully bring these two worlds together:

  • Unified programming model – Today, CPU and GPU (or other accelerator) cores are programmed separately, with the accelerator treated as a remote processor. To make the maximum use of hardware resources while balancing ease of programming, heterogeneous architectures should allow developers to target the CPU or GPU by writing in task-parallel languages, like the ones they use today when writing for multicore CPUs.
  • Unified address space – HSA supports virtual address translation amongst the heterogeneous cores with an HSA-specific memory management unit (HMMU). HSA compute engines will use the same pageable virtual address space as used by CPUs today.
  • Queuing – CPUs, GPUs and other cores can queue tasks to each other and to themselves through an HSA run-time. Queuing can be managed in hardware to avoid OS system calls and enable very low latency communication between cores.
  • Preemption and context switching – HSA enables job preemption, job scheduling and fault handling capabilities to overcome potential problems created by rogue or faulted processes.
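
The queuing idea in the list above can be sketched as a ring buffer that a producer fills and a consumer drains without any operating system call in between. This is a minimal single-threaded sketch, not the HSA runtime API: `DispatchPacket`, `UserModeQueue` and their fields are hypothetical names, and a real implementation would use a fixed binary packet layout and a hardware doorbell signal.

```python
from typing import Callable, List, Optional


class DispatchPacket:
    """Hypothetical work descriptor: a callable standing in for a kernel, plus one argument."""

    def __init__(self, kernel: Callable[[int], int], argument: int) -> None:
        self.kernel = kernel
        self.argument = argument


class UserModeQueue:
    """Fixed-size ring buffer shared by a producer and a consumer agent."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        self.capacity = capacity
        self.slots: List[Optional[DispatchPacket]] = [None] * capacity
        self.write_index = 0  # next slot the producer will fill
        self.read_index = 0   # next slot the consumer will process

    def enqueue(self, packet: DispatchPacket) -> bool:
        """Producer side: returns False when the ring is full."""
        if self.write_index - self.read_index == self.capacity:
            return False
        self.slots[self.write_index % self.capacity] = packet
        self.write_index += 1  # stands in for ringing the doorbell signal
        return True

    def drain(self) -> List[int]:
        """Consumer side: run every pending packet in submission order."""
        results: List[int] = []
        while self.read_index < self.write_index:
            packet = self.slots[self.read_index % self.capacity]
            results.append(packet.kernel(packet.argument))
            self.read_index += 1
        return results
```

Because both indices only ever increase, the difference between them gives the number of pending packets, and the modulo operation maps them onto the fixed set of slots.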

HSA Foundation provides key tools for unlocking heterogeneous programming

Today, CPUs and GPUs do not share a common view of system memory, requiring an application to explicitly copy data between the two devices. In addition, an application running on the CPU that wants to add work to the GPU’s queue must execute system calls that communicate through the CPU operating system’s device driver stack, and then communicate with a separate scheduler that manages the GPU’s work. This adds significant run-time latency, in addition to being very difficult to program.
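The difference between the explicit-copy model described above and a shared view of memory can be illustrated on the host alone. The sketch below uses Python’s built-in `bytearray` and `memoryview` as stand-ins; no GPU is involved, and the function names are hypothetical.

```python
def scale_with_copies(host_data: bytearray, factor: int) -> bytearray:
    """Explicit-copy model: copy in, compute on the private copy, copy back."""
    device_buffer = bytearray(host_data)   # copy host -> "device"
    for i in range(len(device_buffer)):
        device_buffer[i] = (device_buffer[i] * factor) % 256
    return bytearray(device_buffer)        # copy "device" -> host


def scale_shared(shared: memoryview, factor: int) -> None:
    """Shared-memory model: the "device" updates the caller's buffer in place."""
    for i in range(len(shared)):
        shared[i] = (shared[i] * factor) % 256
```

In the first function the caller’s buffer is untouched and two full copies are made; in the second, the caller passes `memoryview(data)` and sees the results in `data` without any copy.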
HSA addresses the need for easy software programming of GPUs to take advantage of their unique capability to crunch parallel workloads much more efficiently than x86 or ARM CPUs.

HSA solution stack: Abstracting away hardware specifics

To enable easier programming, HSA allows developers to program at a higher abstraction level using mainstream programming languages and additional libraries. This HSA solution stack includes several components.
The key to enabling one language for heterogeneous core programming is to have an intermediate run-time layer that abstracts hardware specifics away from the software developer, leaving the hardware-specific coding to be done once by the hardware vendor or IP provider. The core of this intermediate layer is the HSA Intermediate Language or “HSAIL.”
 
 

Figure 2. The HSA Intermediate Language (HSAIL) is an intermediate run-time layer that abstracts hardware specifics away from the software developer. Source: AMD.

 
The HSA run-time stack is created by compiling a high-level language such as C++ with the HSA compilation stack. HSA’s compilation stack is based on the LLVM infrastructure, which is also used in OpenCL from the Khronos Group.
Creation of HSAIL can occur prior to run-time or during run-time. Here are two examples: The OpenCL run-time includes the compiler stack and is called at run-time to execute a program that is already in data-parallel form. Alternatively, Microsoft’s C++ AMP (C++ Accelerated Massive Parallelism) uses the compiler stack during program compilation rather than execution. The C++ AMP compiler extracts data-parallel code sections and runs them through the HSA compiler stack, and passes non-parallel code through the normal compilation path.
Figure 3 shows the HSA compilation stack, where programming code is compiled into HSAIL using the LLVM compilation infrastructure:
 
 

Figure 3. The HSA compilation stack creates the HSA Intermediate Language (HSAIL) prior to or during run-time. Source: AMD.

 

The hardware-specific HSA Finalizer is a key component

A key role is played by the hardware-specific “finalizer” which converts HSAIL to the computing unit’s native instruction set. Hardware and IP vendors are responsible for creating finalizers that support their hardware. The finalizer is lightweight and can be run at compile time, installation time or run-time depending on requirements.
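The role of a finalizer can be sketched with a toy example: one portable program, lowered by two vendor-specific back ends into different native instruction streams. Everything here is hypothetical. The tuple-based instruction format is not HSAIL syntax, and the two “vendor” instruction sets are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Toy portable instruction: (opcode, destination, source_a, source_b).
# This is NOT HSAIL syntax, only a stand-in for a vendor-neutral IR.
Instruction = Tuple[str, str, str, str]


def finalize_for_vendor_a(program: List[Instruction]) -> List[str]:
    """Hypothetical three-operand target: one native op per IR op."""
    mnemonics = {"add": "VADD", "mul": "VMUL"}
    return [f"{mnemonics[op]} {dst}, {a}, {b}" for op, dst, a, b in program]


def finalize_for_vendor_b(program: List[Instruction]) -> List[str]:
    """Hypothetical two-operand target: the destination doubles as a source."""
    native: List[str] = []
    for op, dst, a, b in program:
        if op not in ("add", "mul"):
            raise ValueError(f"unsupported opcode: {op}")
        if dst == b:
            a, b = b, a  # add and mul are commutative, so swapping is safe
        if dst != a:
            native.append(f"mov {dst}, {a}")
        native.append(f"{op} {dst}, {b}")
    return native


FINALIZERS: Dict[str, Callable[[List[Instruction]], List[str]]] = {
    "vendor_a": finalize_for_vendor_a,
    "vendor_b": finalize_for_vendor_b,
}


def finalize(program: List[Instruction], target: str) -> List[str]:
    """Pick the back end for `target`; could run at build, install or run time."""
    return FINALIZERS[target](program)
```

The application ships only the portable program; each hardware vendor supplies its own entry in the `FINALIZERS` table, which mirrors the division of responsibility described above.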
Figure 4 shows the HSAIL and its path through the HSA run-time stack:
 
 

Figure 4. The hardware-specific components of the HSA run-time stack are the HSA Finalizer and the hardware driver. Source: AMD.

 
The HSA Finalizer is the point at which the specifics of different heterogeneous computing units are addressed. Initial HSA implementations will most likely support GPU compute with finalizers from GPU vendors such as AMD, Imagination, ARM, and Qualcomm. The quality and features of each vendor’s HSA Finalizer will help determine how software developers take advantage of each hardware element’s computing capabilities.

Benefiting from heterogeneous architectures requires smart scheduling

In addition to GPUs, many existing heterogeneous architectures have additional discrete processing units for functions such as audio (digital signal processing or stream processing), image and video processing (SIMD frame processing), and security. As HSA matures, hardware and IP vendors creating these processing units may want to enable HSA programmability on their hardware by creating hardware-specific finalizers.
Having multiple heterogeneous processing units will complicate workload scheduling from a system perspective. The harsh reality is that existing workload scheduling and OS scheduling algorithms are relatively simple and generally take into account only local activity on a processing unit or a cluster of homogeneous processing units (the Linux Completely Fair Scheduler is one example of how scheduling is implemented).

Interconnect fabric-assisted scheduling is required to implement scalable HSA systems

Existing OS and middleware scheduling algorithms do not take into account the existing traffic throughout the system, nor do they have a view into other processing units. This lack of a global perspective for scheduling virtually guarantees there will be contention and stalling as processing units wait for access to precious system resources, especially the DRAM. It’s like looking out the front door of your house to determine how bad the traffic will be on your commute to work: You are missing very relevant information that could help you determine the optimal route to take.
Probing current run-time data flows at key points throughout a system’s SoC interconnect fabric can provide critical information to enhance workload scheduling. This information can then be used to assign priorities to workloads, and workloads to processing units. These priorities and assignments can be optimized based on performance requirements or power consumption requirements, as required for a particular use case. As heterogeneous processing becomes the norm, and more processing units are added to a system, this type of interconnect-assisted scheduling will be required.
In other words, the hardware interconnect is a key enabler to putting the heterogeneous into HSA.
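
The difference between local-only and interconnect-assisted scheduling can be sketched in a few lines of Python. This is an illustrative toy, not an actual scheduler or any vendor's algorithm: the Unit and Workload classes, the port names, the utilization figures and the congestion weight of 10.0 are all assumptions chosen for the example. The local policy sees only each unit's queue depth. The fabric-aware policy also consults utilization figures probed from the interconnect and penalizes units whose path to DRAM would be oversubscribed.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    name: str
    queue_depth: int   # local view: work already queued on this unit
    fabric_port: str   # hypothetical interconnect port this unit uses to reach DRAM


@dataclass
class Workload:
    name: str
    bandwidth_need: float  # fraction of one port's bandwidth the workload needs


def pick_local(units):
    """Local-only policy: choose the unit with the shortest queue."""
    return min(units, key=lambda u: u.queue_depth)


def pick_fabric_aware(units, workload, port_utilization, congestion_weight=10.0):
    """Fabric-aware policy: add a penalty when the unit's port would be oversubscribed.

    port_utilization maps port name -> current utilization (0.0 to 1.0), standing in
    for data probed from the interconnect. congestion_weight is an arbitrary knob.
    """

    def cost(unit):
        projected = port_utilization[unit.fabric_port] + workload.bandwidth_need
        overload = max(0.0, projected - 1.0)  # demand beyond the port's capacity
        return unit.queue_depth + congestion_weight * overload

    return min(units, key=cost)


if __name__ == "__main__":
    units = [Unit("gpu", 1, "port0"), Unit("dsp", 2, "port1")]
    probed = {"port0": 0.9, "port1": 0.2}   # made-up probe readings
    job = Workload("video_filter", 0.4)

    print("local-only choice:  ", pick_local(units).name)
    print("fabric-aware choice:", pick_fabric_aware(units, job, probed).name)
```

With these made-up numbers, the local policy sends the job to the GPU because its queue is shorter, while the fabric-aware policy picks the DSP because the GPU's port is already near saturation. The same cost function could be weighted toward power consumption instead of performance, depending on the use case.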

Resources

For more guidance on heterogeneous system architectures, visit the HSA Foundation or the Arteris websites.
“Heterogeneous System Architecture: A Technical Review” whitepaper by George Kyriazis (AMD), HSA Foundation, August 2012.
The HSA Compilation and Run-time Stack diagrams are from the whitepaper by George Kyriazis cited above.
 

By Kurt Shuler
Kurt Shuler is Vice President of Marketing, Arteris, Inc.
 
Go to the Arteris SA website to learn more.
http://www.soccentral.com/results.asp?EntryID=41133

Keywords: computer system design, genera

HOT CHIPS 2013 - HSA Foundation Presented Deeper Detail on HSA and HSAIL

 

Want to find out more about HSA? At Hot Chips 2013, Phil Rogers (AMD), Ben Gaster (Qualcomm), Ian Bratt (ARM), and Ben Sander (AMD) presented on HSA, the HSA Memory Model, the HSA Queueing Model and HSAIL this last Sunday. The presentations are now posted on our developer publications page (http://107.170.238.52/publications/) and media presentations page (http://107.170.238.52/pubs-presos/), as well as on the HSA Foundation Slideshare (http://www.slideshare.net/hsafoundation).

Dig into the material and see if you want to join the exciting future of HSA-enabled devices.


HSAIL: Write-Once-Run-Everywhere for Heterogeneous Systems – IEEE article

Ben Sander of AMD and Chien-Ping Lu of MediaTek, HSA Foundation working group leader for the HSA Programmer’s Reference Manual, have penned a nice article on HSAIL and HSA technology:
 
“Power efficiency has emerged as a primary design goal for modern silicon chips.  Accelerators such as GPUs have well-known advantages in compute density per-watt and per-mm^2 – note for example that the systems at the top of the latest Green500 (http://www.green500.org/) and Top500 (http://www.top500.org/) lists are now based on heterogeneous designs.
However, these systems have traditionally been difficult to program, due to two challenges.  First, many accelerators support only dedicated address spaces that require cumbersome copy operations and prevent the use of pointer-based data structures on both the accelerator and the host processor.   Second, accelerator programming has traditionally required a specialized language such as OpenCL™ or CUDA™.  Some of these specialized languages are only supported by a single hardware vendor, which further constrains their adoption.
An intermediate language called HSAIL is helping to address some of the challenges. One of the benefits of HSAIL is its portability across multiple vendor products. Compilers that generate HSAIL can be assured that the resulting code will be able to run on a wide variety of target platforms. HSAIL also provides existing programming languages with an efficient parallel intermediate language that runs on a wide variety of hardware. This provides the underlying infrastructure and brings the benefits of heterogeneous computing to existing, popular programming models such as Java™, OpenMP™, C++, and more.” ... Read more at the link below:
http://www.computer.org/portal/web/computingnow/software%20engineering/content?g=53319&type=article&urlTitle=hsail%3A-write-once-run-everywhere-for-heterogenous-systems