Heterogeneous Systems Architecture (HSA) Foundation Adds Top German Universities as HSA Academic Centers of Excellence

Posted on December 6, 2016December 6, 2016 by mfrickie

TU Darmstadt and FAU Researching, Developing HSA-Compliant Technologies
BEAVERTON, OR – Dec. 6, 2016 – The Heterogeneous System Architecture (HSA) Foundation has expanded its Academic Partnership Program with the addition of two new HSA Academic Centers of Excellence – Technische Universitaet (TU) Darmstadt, and Friedrich-Alexander-University Erlangen-Nurnberg (FAU), both in Germany. The universities will undertake critical research that will help to further proliferate HSA platforms.
HSA is a standardized platform design that unlocks the performance and power efficiency of the parallel computing engines found in most modern electronic devices. It allows developers to easily and efficiently apply the hardware resources—including CPUs, GPUs, DSPs, FPGAs, fabrics and fixed function accelerators—in today’s complex systems-on-chip (SoCs).
“Both TU Darmstadt and FAU are working on a number of innovative HSA projects that will have significant impact not just in the academic community, but also potentially for large-scale commercial use, particularly in data center settings,” said HSA Foundation President Dr. John Glossner.
The Embedded Systems and Applications Group (ESA) of TU Darmstadt (Germany), headed by Professor Andreas Koch, is working with the HSA Foundation to explore how reconfigurable computing (specifically via Field-Programmable Gate Arrays or FPGAs), can be employed in processing units within the HSA framework.
“Our collaboration with the HSA Foundation is enabling us to make excellent progress on research topics that would have been extremely difficult to tackle without the additional insight provided by the industry partners. We look forward to advancing the architecture of heterogeneous computers for both academic as well as industrial use-cases,” said Professor Koch.
According to FAU Professor Dietmar Fey, the Department of Computer Science is currently focusing on integrating image processing accelerators in an FPGA and developing an HSA compliant interface. FAU is also collaborating on HSA technology with TU Darmstadt’s Embedded Systems and Applications Group; and is working on a technical prototype with HSA Foundation members.
“Our Academic Center of Excellence relationship is now having a significant impact – we’re now able to reach out to many industrial partners and work with them in establishing a standardization for heterogeneous computing platforms. It’s now becoming possible to combine fundamental research from a university with real world industry architectures and applications,” said Professor Fey.
“Professor Koch and Professor Fey have extensive experience in heterogeneous platform architecture design,” noted Paul Blinzer, a Fellow in the Systems Software group at Advanced Micro Devices, Inc. (AMD). “Their participation in Heterogeneous System Architecture will advance state-of-the art research in heterogeneous systems.”
Last month the HSA Foundation announced the first Academic Center of Excellence at Northeastern University in Boston. To learn more about engaging with the HSA Foundation on academic programs, contact academic@hsafoundation.com.
About TU Darmstadt
TU Darmstadt was founded in 1877. In 1882, the university established the first chair in Electrical Engineering worldwide. The university is a member of TU9, an alliance of leading institutes of technology in Germany. TU Darmstadt offers a comprehensive spectrum of undergraduate, graduate and doctoral programs, with a focus on the natural and engineering sciences. The university, which also hosts two Fraunhofer Institutes, performs world-class research and actively collaborates with academic and industrial partners nationally and internationally.
About FAU
Founded in 1743, FAU is a research university with an international perspective and one of the largest universities in Germany, with 40,174 students, 256 degree programs, 4,000 academic staff (including over 647 professors), and 500 partnerships with universities all over the world. FAU’s outstanding research and teaching is reflected in top positions in both national and international rankings, as well as the high amount of DFG funding which its researchers are able to secure.
About the HSA Foundation
The HSA (Heterogeneous System Architecture) Foundation is a non-profit consortium of SoC IP vendors, OEMs, Academia, SoC vendors, OSVs and ISVs, whose goal is making programming for parallel computing easy and pervasive. HSA members are building a heterogeneous computing ecosystem, rooted in industry standards, which combines scalar processing on the CPU with parallel processing on the GPU, while enabling high bandwidth access to memory and high application performance with low power consumption. HSA defines interfaces for parallel computation using CPU, GPU and other programmable and fixed function devices, while supporting a diverse set of high-level programming languages, and creating the foundation for next-generation, general-purpose computing.
Follow the HSA Foundation on Twitter, Facebook and LinkedIn.

The First HSA Academic Center of Excellence

Posted on November 10, 2016November 10, 2016 by mfrickie

Posted on November 9, 2016, Entrepreneur Podcast Network: http://epodcastnetwork.com/the-first-hsa-academic-center-of-excellence/ (please visit website for podcast download)
Dr. John Glossner, President of HSA or The Heterogeneous System Architecture a non-profit whose goal is making programming for parallel computing easy and pervasive joins Enterprise Radio to discuss more about the foundation, the overall benefit and the new partnership.
Listen to host Eric Dye & guest Dr. John Glossner discuss the following:

Describe the Foundation and how it got started.
What are the overall benefits of joining the Foundation and adopting the membership?
You recently announced new developments in heterogeneous architecture from Northeastern University. Can you tell us more about those developments?
Any particular reason why your selected Northeastern for this partnership?
Will the Foundation also be developing relationships with other key research universities worldwide? Have you identified those partnerships yet?
Which industry constituents might be interested in the HSA Foundation’s work?
How does the HSA Foundation development process work?
How does one get involved and what is the criteria for being accepted into the Foundation?

John Glossner, Ph.D. is the President of The Heterogeneous System Architecture (HSA) Foundation and is a non-profit consortium of SoC IP vendors, OEMs, Academia, SoC vendors, OSVs and ISVs, whose goal is making programming for parallel computing easy and pervasive.
HSA members are building a heterogeneous computing ecosystem, rooted in industry standards, which combines scalar processing on the CPU with parallel processing on the GPU, while enabling high bandwidth access to memory and high application performance with low power consumption.
HSA defines interfaces for parallel computation using CPU, GPU and other programmable and fixed function devices, while supporting a diverse set of high-level programming languages, and creating the foundation for next-generation, general-purpose computing.
Glossner currently serves as CEO of General Processor Technologies.

SC16 to Feature Milestones in Heterogeneous System Architecture (HSA) Programming Languages, Open Standards, and Open Source Tools

Posted on November 9, 2016November 9, 2016 by mfrickie

Beaverton, OR, Nov. 9, 2016 – SC16, the international conference for high-performance computing (HPC), networking, storage, and analysis, will feature sessions that highlight recent heterogeneous system architecture (HSA) momentum. SC16 brings together the international supercomputing community to discuss the technologies that will shape the future of large-scale technical computing and data-driven science.
WHO: The HSA Foundation, a non-profit consortium whose goal is making programming for parallel computing easy and pervasive. Participants in the SC16 HSA sessions include:

HSA Foundation Chairman and Senior Director of Radeon Open Compute for AMD Gregory Stoner
AMD Senior Fellow Design Engineer and GPU CTO Ben Sander
AMD Senior Member of Technical Staff Mayank Daga

WHAT: HSA is a standardized platform design supported by more than 40 technology companies and 17 universities that unlocks the performance and power efficiency of the parallel computing engines found in most modern electronic devices. HSA sessions at SC16 will highlight progress toward the Foundation’s goal of bringing true heterogeneous computing to platforms including vision based IoT systems, mobile devices, desktops, HPC systems, AR/VR environments, and servers. HSA-related sessions at SC16 include:

Emerging Technologies: Programming High-Performance Heterogeneous Computing Systems with the Radeon Open Compute Platform (ROCm)
Panel Discussion: Bringing About HPC Open-Standards World Peace
Exhibitor Forum: Revolutionizing Large-Scale Heterogeneous HPC Systems with AMD’s ROCm Platform

WHERE: Salt Palace Convention Center in Salt Lake City
WHEN: Nov. 13-18, 2016; visit the SC16 website for specific session times
“Many HSA Foundation members such as AMD are now delivering a wide range of heterogeneous systems, including those based on HSA,” noted HSA Foundation President Dr. John Glossner. “It’s very exciting as one of the Foundation’s goals is to bring true heterogeneous computing to an array of platforms, some of which include Deep Neural Networks (DNN’s), vision based IoT systems, mobile devices, desktops, high-performance computing (HPC) systems, AR/VR environments, and servers.”
“The HSA Foundation is a strong proponent of open source development tools directly and through its member companies,” said HSA Foundation Chairman Greg Stoner. “AMD’s Radeon Open Compute Platform (“ROCm”) initiative, for example, brings a rich heterogeneous programming foundation for developers, and offers an array of development tools now freely available supporting HSA.”
Stoner added that ROCm via an HSA standardized object loader supports two compiler foundations:

LLVM (Low Level Virtual Machine) compiler supports:

– HCC compiler for heterogeneous C++ with PSTL development
– HIP compiler for simply porting CUDA codes
– Continuum’s Anaconda with Numba for supporting Python development
– Khronos Group’s OpenCL C-based programming language

SUSE GCC via enablement of HSA runtimes and HSA object format in conjunction with General Processor Technologies and Parmance

About the HSA Foundation
The HSA (Heterogeneous System Architecture) Foundation is a non-profit consortium of SoC IP vendors, OEMs, Academia, SoC vendors, OSVs and ISVs, whose goal is making programming for parallel computing easy and pervasive. HSA members are building a heterogeneous computing ecosystem, rooted in industry standards, which combines scalar processing on the CPU with parallel processing on the GPU, while enabling high bandwidth access to memory and high application performance with low power consumption. HSA defines interfaces for parallel computation using CPU, GPU and other programmable and fixed function devices, while supporting a diverse set of high-level programming languages, and creating the foundation for next-generation, general-purpose computing.
Follow the HSA Foundation on Twitter, Facebook and LinkedIn.
Contact:
Neal Leavitt
Leavitt Communications
(760) 639-2900
neal@leavcom.com

Platform and Hardware Requirements for HSA Technologies

Posted on September 8, 2016September 8, 2016 by mfrickie

by Paul Blinzer, September 8th, Embedded Computing Design: http://embedded-computing.com/guest-blogs/platform-and-hardware-requirements-for-hsa-technologies/#

Heterogeneous system architecture (HSA) is now a standardized platform design, supported by more than 40 technology companies and 17 universities, that unlocks the performance and power efficiency of the parallel computing engines found in most modern electronic devices. Spearheading HSA is the HSA Foundation, a non-profit consortium of SoC IP vendors, OEMs, Academia, SoC vendors, OSVs, and ISVs, whose goal is making programming for parallel computing easy and pervasive.

Briefly, HSA allows developers to easily and efficiently apply the hardware resources—including CPUs, GPUs, DSPs, FPGAs, fabrics, and fixed function accelerators—in today’s complex systems-on-chip (SoCs).

In this first of two posts, I’ll focus on platform and hardware requirements for HSA technologies; the second part will center on software and toolchains. Both will be discussed in depth at a tutorial at the upcoming 25th International Conference on Parallel Architectures and Compilation Architectures (PACT). The conference will be held from Sept. 11-15 in Haifa, Israel.

The architecture pillars of HSA

One of the key benefits of HSA is a set of platform architecture features and a programming model that software can depend on for parallel computing. Software using accelerators through an API like OpenCL, CUDA, or similar typically cannot assume that certain hardware features are available on every platform and so must either set a lowest common denominator or support many wildly different ways to essentially implement the same thing all over again to take advantage of an optional API feature while still supporting a lesser-equipped platform.

[Figure 1 | Pillars of HSA requirements]

HSA, in contrast, sets a requirement for a select few modern hardware features that make using the accelerator way more efficient and simplify the programming enormously, similar to what a CPU ISA like x86-64 or ARM has accomplished for compilers and application software development, with a reasonable expectation that a program will run efficiently on a platform with a particular processor and OS.

Let’s start with a brief list of requirements with a short explanation why they’re important.

Security and “quality of service”

An HSA accelerator is used as a peer processor to the CPU by the application, with all benefits and obligations. One of the key obligations is not allowing bad application code to do bad things to the system. Therefore, an HSA accelerator has a “user mode” ingrained for execution, where the operating system (OS) runtime sets strict policies at the hardware level for the accelerator. It can only access data and execute code that is part of the application process and if it accesses anything outside of the expected data, the OS runtime gets notified and can shut down the accelerator access by the application without affecting the rest of the system. A memory management unit (MMU) and other hardware to support it are therefore required by the HSA standard.

Shared virtual memory (SVM)

SVM allows the accelerator to access the application’s data directly and process it. That requires an MMU in hardware. HSA accelerators require it for system security reasons already, so no problem here.

Accelerators without SVM using OpenCL 1.x or the common CUDA API require the CPU to do a lot of work including parse application data, select/copy all data to/from the accelerator (which may need a dedicated buffer), and validate results. There is no concept of passing a pointer and allowing the accelerator to operate on shared memory. Often multiple data must be copied to the accelerator, but only one set of data is chosen. This can waste a lot of time to copy if we don’t know in advance which datasets are required. This copy overhead can seriously degrade the performance benefit of the accelerator.

This performance degradation is eradicated on an HSA accelerator. SVM allows the accelerator to parse and only access the application data it needs directly. As an added benefit, the accelerator and the CPU can access the same data in memory, avoiding unwanted duplicates.

Platform atomic operations

Anyone that has programmed with multithreaded code on a CPU knows how useful atomic operations are for ensuring that two threads can operate on the same data safely. Atomic operations are used in many different ways for implementing semaphores, mutexes, histograms, and many other tasks that require a particular order of execution or of access to work correctly. HSA compatible accelerators must support 32-bit or 64-bit atomics, as threads running on the CPU and on the accelerator operate and synchronize with each other very efficiently using atomic operations. Older accelerators without it always require arbitration using software APIs on the CPU. Software arbitration is very inefficient and these systems end up using far more CPU cycles that are better spent on other tasks.

HSA signals and doorbells

This is something special to HSA and a very significant feature for power-efficiency. You can consider these “atomics with benefits.” HSA Signals are data types created by the runtime that behave similar to atomically updated variables in memory. However, they allow the hardware to monitor and notify state changes, e.g. when a value is changed by an application thread or by an accelerator. One or more accelerators can update an HSA signal, listen and wait on a signal state change, and – if they have nothing else to do – go to sleep while waiting and be woken up immediately when something has changed. By using HSA signals, one accelerator can notify other accelerators directly that data is ready and these can start their processing immediately. If implemented fully, the CPU doesn’t need to perform any coordination and can stay asleep. This is a significant power saving feature because the only hardware that is needed for a task is only active when needed. HSA signals can be used easily everywhere in the software architecture and even HSA-based OpenCL implementations benefit.

User queues and dispatch

If you have programmed any accelerator using OpenCL or CUDA and followed in the debugger how it reaches the hardware, you will have noticed layers upon layers of software levels that the accelerator code and data must pass through until it finally reaches the hardware to be processed. HSA removes this inefficiency and cuts out the middle layers by defining a hardware-based user queue dispatch mechanism that can be directly accessed from the application runtime. The architected queuing language (AQL) that a packet processor of an accelerator uses allows any accelerator to either dispatch work to itself, to the CPU of the system to call OS runtime functions, or to other accelerators.

Cache coherency

Most HSA accelerators have caches that keep frequently used data close to the execution units. But since other processors in the system may also access the cached system memory, the application and the hardware must make sure that their content doesn’t get stale – especially in multithreaded execution. HSA accelerators therefore must provide mechanisms to keep the cached data current and either flush out pending data or invalidate cached data if other processors access the cached system memory. Cache coherency can be automatically enforced by hardware bus protocols or alternatively require instruction controls, which in the case of HSA is part of the definition.

With that, I hope I’ve made you interested enough to eagerly wait for the next blog entry, where I will touch on the HSA memory model HSAIL and how HSA integrates into today’s embedded systems.

Paul Blinzer works on a wide variety of Platform System Software architecture projects and specifically on the Heterogeneous System Architecture System Software at Advanced Micro Devices, Inc. as a Fellow in the System Software group. Paul is the chairperson of the System Architecture Workgroup of the HSA Foundation. Contact Mr. Blinzer at Paul.Blinzer@AMD.com.

HSA Foundation, AMD Spearheading HSA Technologies Tutorial at 25th International Conference on Parallel Architectures and Compilation Techniques

Posted on August 13, 2016August 13, 2016 by mfrickie

BEAVERTON, OR, Aug. 13, 2016 – The HSA (Heterogeneous System Architecture) Foundation and Foundation member AMD will be providing a tutorial on HSA technologies at next month’s 25th International Conference on Parallel Architectures and Compilation Architectures (PACT). The conference will be held from Sept. 11-15 in Haifa, Israel.
PACT brings together researchers from architecture, compilers, applications and languages to present and discuss innovative research.
The one-day tutorial, presented by AMD Fellow Paul Blinzer will have a morning session on Platform and Hardware requirements; the afternoon session will focus on Software and Toolchains. A snapshot on some of the topics:
Platform and Hardware Requirements

Rationale for HSA: GPUs, DSPs and more;
Architecture pillars of HSA
Memory model of HSA
HSAIL, Finalizer, BRIG
Integration of HSA platform features
System architecture research opportunities

Software and Toolchains

HSA software toolchains: LLVM, GCC, HCC, Python
Integrating HSAIL into a new toolchain, experiences and gotcha’s using BRIG, HSAIL, code generation, debugging metadata
Debugging, profiling an HSA-enabled application using these toolchains with CodeXL or gdb
Application frameworks using HSA/ROCR: CAFFE, SPARK, node.js
HSA tool extension for ROCm and CodeXL
Software models, research opportunities

HSA is a standardized platform design supported by more than 40 technology companies and 17 universities that unlocks the performance and power efficiency of the parallel computing engines found in most modern electronic devices. It allows developers to easily and efficiently apply the hardware resources—including CPUs, GPUs, DSPs, FPGAs, fabrics and fixed function accelerators—in today’s complex systems-on-chip (SoCs).
The tutorial and other PACT sessions will be held at the Dan Carmel hotel in Haifa.
For more information on the tutorial and to register, please see http://pactconf.org/program/workshops-tutorials/hsa/
For more information, including a full list of speakers, supporting organizations and sponsors please visit: the PACT 2016 conference.
About the HSA Foundation
The HSA (Heterogeneous System Architecture) Foundation is a non-profit consortium of SoC IP vendors, OEMs, Academia, SoC vendors, OSVs and ISVs, whose goal is making programming for parallel computing easy and pervasive. HSA members are building a heterogeneous computing ecosystem, rooted in industry standards, which combines scalar processing on the CPU with parallel processing on the GPU, while enabling high bandwidth access to memory and high application performance with low power consumption. HSA defines interfaces for parallel computation using CPU, GPU and other programmable and fixed function devices, while supporting a diverse set of high-level programming languages, and creating the foundation for next-generation, general-purpose computing.
About Paul Blinzer
Paul Blinzer works on a wide variety of Platform System Software architecture projects and specifically on the Heterogeneous System Architecture (HSA) System Software at Advanced Micro Devices, Inc. (AMD) as a Fellow in the System Software group. Living in the Seattle, WA area, during his career he has worked in various roles on system level driver development, system software development, graphics architecture, graphics & compute acceleration since the early ’90s. Paul is the chairperson of the “System Architecture Workgroup” of the HSA Foundation. He has a degree in Electrical Engineering (Dipl.-Ing) from TU Braunschweig, Germany.
https://www.linkedin.com/in/paul-blinzer-4523602
Follow the HSA Foundation on Twitter, Facebook and LinkedIn.
Contact:
Neal Leavitt
Leavitt Communications
(760) 639-2900
neal@leavcom.com

HSA Foundation Joins with China Semiconductor Industry Association to Hold the First Heterogeneous System Architecture Global Summit

Posted on August 9, 2016August 9, 2016 by mfrickie

BEAVERTON, OR, Aug. 9, 2016 – The HSA (Heterogeneous System Architecture) Foundation is joining with the China Semiconductor Industry Association (CSIA) to host the HSA 2016 Global Summit, Aug. 22-23 in Beijing. AMD, Huaxia General Processor Technologies, Imagination Technologies, LG, and MediaTek are co-organizers of the event, which will focus on the future of heterogeneous processing technology in electronics systems across a broad array of applications.
At the Summit, HSA Foundation President Dr. John Glossner will outline recent Foundation developments. Allen Lee, corporate vice president, AMD, will present product updates. HSA member companies will present further technical details and demonstrate HSA compatible systems. A range of industry experts, government officials and academics will discuss recent developments and deliver their visions of the future of heterogeneous processing – focused on topics including:

Development of heterogeneous computing and the HSA ecosystem in China
China’s domestic processor development
Multi-core chips and architectures
Design trends and challenges
Tools, software and operating systems
Developments across applications including software defined radio, mobile devices, Internet of Things (IoT), high-precision satellite navigation and positioning, high-performance computing for smart grid and 5G, intelligent unmanned vehicles, cloud computing, tensor computing, artificial intelligence, deep learning and more

HSA is a standardized platform design supported by more than 40 technology companies and 17 universities that unlocks the performance and power efficiency of the parallel computing engines found in most modern electronic devices. It allows developers to easily and efficiently apply the hardware resources—including CPUs, GPUs, DSPs, FPGAs, fabrics and fixed function accelerators—in today’s complex systems-on-chip (SoCs).
The HSA Summit will be held at the Beijing Yizhuang Fengda International Hotel on August 22-23, 2016. The HSA Summit is supported by the Beijing Economic and Technological Development Zone (E-Town), the Ministry of Industry and Information Technology of the People’s Republic of China (MIIT), Cyberspace Administration of China, and the Government of Beijing Municipality. Mr. Sheng Lian, Beijing E-Town’s director of the administrative committee, will chair the summit.
For more information, including a full list of speakers and supporting organizations please visit:. http://www.hsafoundation.com/chinasummit/.
About the HSA Foundation
The HSA (Heterogeneous System Architecture) Foundation is a non-profit consortium of SoC IP vendors, OEMs, Academia, SoC vendors, OSVs and ISVs, whose goal is making programming for parallel computing easy and pervasive. HSA members are building a heterogeneous computing ecosystem, rooted in industry standards, which combines scalar processing on the CPU with parallel processing on the GPU, while enabling high bandwidth access to memory and high application performance with low power consumption. HSA defines interfaces for parallel computation using CPU, GPU and other programmable and fixed function devices, while supporting a diverse set of high-level programming languages, and creating the foundation for next-generation, general-purpose computing.
Follow the HSA Foundation on Twitter, Facebook and LinkedIn.
Contact:
Neal Leavitt
Leavitt Communications
(760) 639-2900
neal@leavcom.com

HSA Foundation Aims for Broader Adoption of Coherent Memory Standard for Heterogeneous Processors

Posted on July 4, 2016July 4, 2016 by mfrickie

July 4, 2016, BDTi: http://www.bdti.com/InsideDSP/2016/07/05/HSAFoundation
Modern SoCs increasingly contain a variety of processing resources: one or more CPU cores and a GPU, often with a DSP, programmable logic, or one or multiple special-purpose co-processors for tasks such as computer vision. Properly harnessed, such heterogeneous processors often deliver impressive performance at low cost and low power consumption. But mapping applications onto heterogeneous processors is challenging. OpenCL, a specification standard language and runtime from the Khronos Group, enables the development of code that utilizes processing elements within a heterogeneous single- or multi-chip system. However, any processing efficiency gains derived from specialized computing elements can easily be negated by the added latency (not to mention incremental power consumption) incurred by copying data between computing elements.
Memory coherency among these diverse processing elements enables them to more efficiently share data via pointer-passing and queue-updating operations, versus bundling data and moving it via clumsy I/O operations through complex device drivers. Memory coherency has been common for some time in multi-CPU implementations; expanding the concept to GPUs, DSPs and other dissimilar architectures, however, is more challenging. OpenCL and other heterogeneous programming standards such as OpenMP and C++ AMP don’t make any attempt to standardize memory coherency, which requires the implementation of specific hardware features in each processing element. That mission has been taken up by the HSA (Heterogeneous System Architecture) Foundation, an industry group that has its origins in AMD’s proprietary Fusion System Architecture program (Figure 1).

Figure 1. The HSA Foundation boasts a sizeable, diverse membership list, but so far only AMD has chips implementing the organization’s standards.

Founded in mid-2012, the HSA Foundation released v1.0 of its specification suite in the spring of last year. And with newly announced, backward-compatible v1.1, according to foundation president Dr. John Glossner (who is also CEO of General Processor Technologies), the specification further expands beyond its AMD-centric foundations, supporting additional types of processor elements, as well as adding a number of requested features such as more flexible coherent memory access (Figure 2). SoC compatibility with the hardware aspects of the HSA specifications is becoming increasingly common, according to Glossner, and ARM for one agrees. In a recent briefing with Lead Mobile Strategist James Bruce and GPU Developer Tools Product Manager Anand Patel, the two ARM representatives noted that not only the latest Cortex-A73 and Mali-G71 but also the last several generations of ARM CPUs and GPUs are, in combination with newer CoreLink variants of ARM’s AMBA (Advanced Microcontroller Bus Architecture) interconnect, fully compatible with HSA’s memory coherency standards.

Figure 2. The initial v1.0 HSA specification was CPU- and GPU- specific, reflective of the AMD SoC platforms on which it was based (top), but the newer v1.1 spec is more vendor- and processor-agnostic, not to mention more flexible (bottom).

Hardware compatibility alone isn’t sufficient for full compliance with the HSA standards, however, which explains the current dearth of HSA-compliant SoCs in spite of significant industry backing for the HSA concept. At the core of HSA’s software scheme is HSAIL (the HSA Intermediate Language), an intermediate virtualized code abstraction created by HSA-cognizant compilers, which is then dynamically translated to a particular processor’s instruction set by a chip-vendor-supplied HSA Runtime layer. HSAIL-generating compilers are beginning to appear: AMD’s CLOC and the TUT (Tampere University of Technology) POCL both generate HSAIL from OpenCL source code, for example, while General Processor Technologies and Parmance have developed gccBrig, a BRIG (binary format) language front-end to GCC (the GNU Compiler Collection) that is a binary representation of HSAIL. Also, Continuum Analytics sponsors Numba, an open-source Python compiler with direct HSA support, specifically targeting GPU acceleration.
However, to date HSA Foundation creator AMD is the only member company to have developed an HSA Runtime, and then only for its latest Carrizo APU (accelerated processing unit, a CPU-GPU combo), which entered volume production at the end of last year. Even in AMD’s case, direct compilation to the end CPU and GPU instruction sets (versus to a HSAIL intermediate representation) is the preferred approach in AMD’s ROCm (Radeon Open Compute Platform). While we expect to see increased adoption of the HSA standards by AMD and other HSA Foundation member companies, some major chip suppliers are pursuing different approaches. Intel, for example, seems to prefer the Cilk scheme it’s championed, while NVIDIA continues to rely on its proprietary CUDA approach.
Researchers at Northeastern University recently validated that HSA, by removing the need for repeated data copy operations between heterogeneous processing elements, can dramatically improve algorithm performance– at least for a couple of algorithm examples (Figure 3). Three different memory access scenarios were considered: CL12 employs per-element buffers, while CL20 leverages a common albeit small shared virtual memory buffer; both employ only OpenCL. The full OpenCL-plus-HSA implementation, conversely, implements a unified memory space with fine-grained synchronization support, leverages regular pointers and doesn’t require copy operations. The evaluated FIR (finite impulse response) filter algorithm represents a memory-intensive streaming workload; AES (Advanced Encryption Standard) symmetric encryption and decryption conversely is a compute-intensive streaming workload. Glossner was also careful to point out that these results were measured on AMD’s Kaveri APU, which being a pre-HSA 1.0 device supports only limited coherent memory throughput.

Figure 3. Recent evaluations conducted by Northeastern University researchers highlight HSA’s performance-boosting potential in both memory-intensive (top) and compute-intensive (bottom) workloads, even with SoCs that aren’t fully HSA-optimized (A Comprehensive Performance Analysis of HSA and OpenCL 2.0, Proceedings of the 2016 International Symposium on Program Analysis and System Software, April 2016).

Any performance loss due to the HSAIL-plus-HSA Runtime multi-layer abstraction will, Glossner feels, be more than counterbalanced by the significant performance boost delivered by HSA’s support for full memory coherency between heterogeneous processing elements. AMD Carrizo APU-based systems are now shipping from PC OEMs such as ASUS, Dell and Lenovo, and Glossner anticipates additional HSA support announcements to arrive shortly from other SoC and IP core providers. Until then, though, HSA will remain an approach with industry-wide potential but limited deployment.
For more information on the HSA Foundation, see the following two videos from the May 2016 Embedded Vision Summit (Video 1 and 2).

AT&T adds fiber markets, Sprint weighs in on small cells … 5 things to know today

Posted on June 27, 2016June 27, 2016 by mfrickie

By Martha DeGrasse, June 27th, RCR Wireless News: http://www.rcrwireless.com/20160627/carriers/att-adds-fiber-markets-sprint-weighs-in-on-small-cells-5-things-to-know-today-tag4
1. AT&T is promising business customers download and upload speeds of up to 1 gigabit per second. The company said today that it has expanded its AT&T Fiber service in Texas, Tennessee, South Carolina and Oklahoma. In addition, the carrier is expanding AT&T Fiber in several major cities, including Los Angeles, San Francisco, San Diego and Fresno, California; Miami; Dallas and El Paso, Texas; and Louisville, Kentucky.
AT&T also said it is launching nationwide voice-over-IP service through AT&T Business Fiber. The service is available in 180 U.S. cities, most of which are located in the South, Southeast and Central U.S., or on the West Coast.
2. Sprint said it is not concerned about the pace at which its small cell rollout is proceeding. The company’s top executives report that “the permitting and approval stage for its small cell deployment has been ahead of expectations,” according to Wells Fargo analyst Jennifer Fritzsche, who met with Sprint’s CEO, CFO and CTO last week. In a separate meeting, Mobilitie CEO Gary Jabara told Fritzsche’s team that Mobilitie is cooperating closely with local authorities as it deploys on Sprint’s behalf, and that there have been no cities that have put a complete stop to small cell deployments.
Sprint said last summer that it planned to deploy tens of thousands of small cells, and so far has not released a public update to that number. Jabara said this spring that Mobilitie has deployed fewer than 2,000 small cells to date.
3. Apple is now licensing patents from Huawei, according to The Wall Street Journal. Citing “a person familiar with the matter,” the paper did not specify what type of patent Huawei reportedly licensed to Apple. Noting that Huawei spent $9.2 billion on research and development last year vs. Apple’s $8.1 billion, the report said the Chinese company is also the world’s largest filer of international patent applications under the Patent Cooperation Treaty.
In the smartphone market, Huawei is a distant third behind Samsung and Apple, but the company has made it clear that it wants to compete head-on with the two market leaders. Huawei already dominates the market for wireless infrastructure, where it holds a leading position despite political pressures that keep Huawei’s gear out of U.S. wireless networks.
4. Huawei’s smartphone ambitions may include a proprietary operating system. The company has reportedly hired former Apple designer Abigail Brody to work on an operating system that would be an alternative to Android if Huawei’s relationship with Android developer Google takes a turn for the worse. Other smartphone makers have tried to develop proprietary operating systems, but so far application developers have been reluctant to invest time and money in apps for platforms other than Android and iOS.
5. The Heterogeneous System Architecture Foundation has released a specification the group says will make it easier to integrate digital solutions that use disparate hardware. HSA is a standardized platform design supported by more than 40 technology companies and 17 universities. The new spec adds multivendor architecture support, which means manufacturers will be able to combine IP blocks from more than one vendor. One of the group’s primary goals is to enable heterogeneous computing for vision-based “internet of things” systems.

New Infographic From HSA Foundation Details Importance, Benefits of Heterogeneous Systems

Posted on June 17, 2016June 17, 2016 by mfrickie

Beaverton, Oregon, June 17, 2016 – The Heterogeneous System Architecture (HSA) Foundation today released a new infographic entitled, ‘HSA FOUNDATION, Harmonizing Hardware & Software Design for a Connected Future’. The infographic details why heterogeneous architectures are important for future electronic systems, and looks ahead at how heterogeneous architectures benefit end users.
HSA is a standardized platform design supported by more than 40 technology companies and 17 universities that unlocks the performance and power efficiency of the parallel computing engines found in most modern electronic devices. It allows developers to easily and efficiently apply the hardware resources—including CPUs, GPUs, DSPs, FPGAs, fabrics and fixed function accelerators—in today’s complex systems-on-chip (SoCs).
“We created the new infographic based on a survey of the HSA Foundation members, many of whom are designing, programming or delivering a wide range of heterogeneous systems – including those based on HSA. As such, it provides insight into key issues and trends affecting these systems that power the electronic devices across every aspect of our lives,” said Dr. John Glossner, HSA Foundation president.
To access the infographic, visit www.hsafoundation.com/infographic.

About the HSA Foundation
The HSA (Heterogeneous System Architecture) Foundation is a non-profit consortium of SoC IP vendors, OEMs, Academia, SoC vendors, OSVs and ISVs, whose goal is making programming for parallel computing easy and pervasive. HSA members are building a heterogeneous computing ecosystem, rooted in industry standards, which combines scalar processing on the CPU with parallel processing on the GPU, while enabling high bandwidth access to memory and high application performance with low power consumption. HSA defines interfaces for parallel computation using CPU, GPU and other programmable and fixed function devices, while supporting a diverse set of high-level programming languages, and creating the foundation for next-generation, general-purpose computing. See: http://www.hsafoundation.com/
Follow the HSA Foundation on Twitter, LinkedIn and Facebook.

Contact:
Neal Leavitt
Leavitt Communications
(760) 639-2900
neal@leavcom.com

Toward a Hardware-Agnostic World: HSA Foundation Releases Specification v1.1

Posted on June 2, 2016June 2, 2016 by mfrickie

By Jim Turley, EE Journal: http://www.eejournal.com/archives/articles/20160601-hsa/
I think there’s something great and generic about goldfish. They’re everybody’s first pet. – Paul Rudd
It’s finally happened: processors are now completely generic and interchangeable.
Might as well go home, CPU designers. There is no differentiation left to exploit. All of your processor architectures, instruction sets, pipelines, code profiling, register files, clever ALUs, bus interfaces – all of it is now as generic and substitutable as 80’s hair band drummers. Your entire branch of technology has been supplanted by some programmers.
Okay, so maybe it’s not quite that dire. But we’re getting there.
You have the HSA Foundation to thank for that. Their job is to make CPUs, DSPs, GPUs, VLIW machines (and pretty much anything else that can execute code) totally interchangeable. In the big SWOT analysis of hardware resources, the CPU becomes a “don’t care.” That is, HSA (which stands for Heterogeneous Systems Architecture) tries to make any code execute on any processor, regardless of its architecture, instruction set, or number of cores. They’ll let you run your operating system on a DSP, your graphics code on an integer CPU, and your signal-processing algorithms on a GPU. Hardware is hardware; just write your code and let HSA sort it out.
At least, that’s the promise the group has been making for the past few years. It’s what Steve Jobs would’ve called, “a big hairy audacious goal.” Hey, let’s treat all programming languages the same and all hardware engines the same. Programmers can write their source code in whatever language(s) they prefer, and let it run on whatever hardware they have lying around. Most of all, HSA allows you to mix different processor architectures together (that’s the “heterogeneous” part) so that you can, for example, run a multicore x86 processor alongside a cluster of ARM cores, next to a gaggle of nVidia GPUs. Pay no attention to how those processors are interconnected, or how many there are, or even what type of chip you’ve got. It’s all good! Throw ’em all together and let the software sort ’em out!
Sound like magic? Kind of. Sound like a bad idea that’s already been done to death by a thousand different university students who think they’ve stumbled on a fantastic (and original) idea? You’d be correct there, too. The idea of a universal hardware platform is hardly new, and the road to hardware independence is paved with other people’s venture capital. Java is about the only example of hardware-independent software that made any kind of a dent in the industry – but dents can be good or bad.
But wait a sec – isn’t Java already hardware agnostic (as in, “we don’t believe hardware exists”), and if so, why do we need another one? And for that matter, isn’t all code written in C++ or Python or BASIC or any decent language also platform-independent? Wasn’t that the whole idea of high-level languages? What problem are we actually solving here, and hasn’t it already been solved anyhow?
Well, yes and no. Java bytecode is (ahem) more or less transportable across different CPU architectures… assuming the architecture in question has its own bytecode interpreter or JIT or equivalent translator. And C code is certainly transportable… right up until it’s compiled. At that point, it’s very hardware-specific. But neither of these examples really ignores the underlying architecture of the chip you’re programming. Nobody writes C code without knowing if it’s intended for a conventional CPU, or a DSP, or a graphics processor. Same goes for any other programming language. You always want to know something about the processor it’s going to run on, even if you’re not bit-twiddling individual configuration registers.
So HSA wants to abstract-away that last vestige of processor prejudice. This is particularly important and useful in today’s systems that mix and match so many different kinds of processors. How cool would it be to write your C or Python and truly not care how many processors, or of what type, were ultimately going to host it?
The core of HSA’s technology, as with so many other “universal hardware platforms,” is an intermediate virtual machine. In other words, you’re writing code for an imaginary CPU, and HSA-compliant tools then convert that to actual machine code for the hardware you really have. It’s not too different in concept from any other compiler, and pretty similar to the way Java is compiled.
This intermediate layer is called HSAIL (HSA Intermediate Language), and it’s specified just like a real CPU with a real instruction set and everything. In fact, you can download the HSAIL specification for free and build your own HSA-compliant toolchain if you like. The HSA Foundation would probably be happy to encourage you.
The only hardware requirement to using HSA is that all the processors in your system must share a single, cache-coherent memory space. That’s important, and it’s non-negotiable. It’s the key feature that allows HSA tools to allocate and reallocate code segments among processors. When everyone shares a memory map, a pointer is a pointer, regardless of who created it or who dereferences it. Cache coherence is also mandatory, for much the same reason. The results of one processor’s calculations have to be universally accessible to all the other processors, without careful planning or message-passing.
In fact, that lack of planning and messaging is one of HSA’s strengths, though it’s hardly unique. The group recently ran some benchmarks comparing HSA-compliant code with OpenCL (which also tolerates heterogeneous hardware resources). In HSA’s testing, their code did far better, of course, and often by orders of magnitude.
An FIR filter, for example, ran about 10x to 100x faster than the equivalent OpenCL code. Pretty impressive. But can a toolchain really make that much difference? Depends what you’re comparing it to. Software FIR filters are very memory-intensive, and the OpenCL implementation handles its data structures in a “pass by value” method. In other words, it copies all of the data from one processor’s memory space to another’s. That wastes a huge amount of time (and consumes a lot of memory). HSA, in contrast, does “pass by reference.” Voila – you’ve saved a mountain of time with a different toolchain.
So who’s behind the HSA Foundation? Who stands to gain from this? Like many consortia, HSA draws its members from industry. On the CPU side, they’ve got support from AMD, ARM, and Imagination Technologies. So there’s x86, ARM, and MIPS represented, as well as Radeon, Mali, and PowerVR graphics. Toshiba, Texas instruments, Tensilica, Analog Devices, Ceva, Synopsys (with ARC), and other second-tier CPU vendors also participate. A lot of universities are contributing manpower, and several research laboratories are represented, too. So a good cross-section of interested parties overall.
Does it really work? It seems to, at least in early testing. The group has just released version 1.1 of its specification (also available for free download), and they’re adding support for more compilers and more processors. Compared to v1.0, HSA v1.1 is now more closely compatible with gcc. It’s a long and tricky process, but the HSA Foundation seems to be making real progress toward making CPU designers obsolete.