First Heterogeneous System Architecture 2016 Global Summit Kicked Off Today in Beijing, China

Posted on August 22, 2016 by mfrickie

BEIJING, Aug. 22, 2016 – The highly anticipated Heterogeneous System Architecture (HSA) 2016 Global Summit kicked off today here in Beijing. The two-day event (Aug. 22-23) is co-sponsored by the Heterogeneous System Architecture (HSA) Foundation and the China Semiconductor Industry Association (CSIA) and is drawing standing-room-only audiences at the Beijing Yizhuang Fengda International Hotel.
Dozens of influential IP suppliers, processor design companies, tools vendors, software vendors and operating system companies in China’s processor-related industrial chain are participating in the Summit, together with numerous mobile manufacturers, unmanned aerial vehicles and robotics application developers, universities and research institutes, and investment institutions.
The HSA Summit is also supported by the Beijing Economic and Technological Development Zone (E-Town), the Ministry of Industry and Information Technology of the People’s Republic of China (MIIT), and the Cyberspace Administration of China.
The HSA Summit is discussing topics surrounding heterogeneous system architecture, including various HSA applications in Artificial Intelligence, Deep Learning, Software Defined Radio, Internet-of-Things, and more.
Supporting Quotes
“We’re excited to be hosting the first HSA 2016 Global Summit here in Beijing. A few months ago we released the HSA 1.1 specification that greatly enhances the ability to integrate open and proprietary IP blocks in heterogeneous designs. We’re now seeing an array of HSA compliant solutions entering the market and during the summit HSA member companies will be presenting further technical details and demonstrating HSA compatible systems.”

Dr. John Glossner, President, HSA Foundation

“HSA is now allowing developers not only in China – but worldwide – to efficiently apply hardware resources – from CPUs to GPUs, DSPs to FPGAs – in today’s complex systems-on-chip (SoC). We’re seeing developments across numerous applications, some of which include mobile devices, Internet of Things (IoT), HPC, cloud computing, artificial intelligence, and much more. The HSA ecosystem is also growing rapidly in China and we look forward to further collaborative endeavors with our colleagues here. We are also thankful for AMD’s continued investment in HSA technologies and its open source efforts via the ROCm platform that bring rich HSA-enabled drivers, runtimes, compiler and tools to the global developer community.”

Greg Stoner, Chairman and Managing Director, HSA Foundation

“Today the market demand for high-performance parallel computing is exploding in the fields of Machine Vision, Artificial Intelligence, Cloud Computing, AR / VR, Software Defined Radio, and more. All of these systems are heterogeneous systems. HSA facilitates programming of these systems by enabling GPUs, DSPs and other accelerators to execute computationally expensive workloads in a complex SoC more effectively and efficiently than a CPU. We are very delighted that this event brought attention to a wide range of attendees including chip designers, hardware and software developers, programmers, and even system integrators. These companies will play a key role in building the HSA ecosystem.”

Kerry Li, CEO, HuaXia General Processor Technologies

“Heterogeneous processing represents the future of computing across a wide range of applications. At Imagination, our IP is already used extensively in heterogeneous SoCs. We are focused on making it as easy as possible for customers and developers to create and program next-generation SoCs which will be increasingly complex, and will undoubtedly use IP from multiple vendors. We are delighted to join with other industry experts at this first HSA Summit to discuss the critical issues surrounding the future of processing.”

James Liu, VP and General Manager China, Imagination Technologies

“As the market leader in mobile and home entertainment SOC products, MediaTek continues to deliver superior performance with high energy efficiency that provides exceptional user experiences through cutting-edge heterogeneous computing technologies such as Tri-cluster, deca-core architecture, Deep Learning initiatives and advanced multimedia features. We applaud the HSA Foundation’s efforts to further grow the ecosystem in China, and we wish the HSA 2016 Global Summit a great success.”

Ryan Chen, General Manager of Computing System Engineering, MediaTek

“As a sponsor member of HSA Foundation, AMD is committed to supporting an open ecosystem where developers can choose freely. As a feature-rich open-source software platform, ROCm helps to realize optimization in super-large-scale multi-GPU computing, and support a more inclusive software engineering community, so as to provide developers with an optimal and simple programming environment. We hope to promote more academic research and business innovation in open-source architecture. We also hope to utilize the open-source architecture to develop more user interfaces and tools together with our partners.”

Allen Lee, Corporate Vice President of Engineering and General Manager, China R&D Center, AMD

About the HSA Foundation
The HSA (Heterogeneous System Architecture) Foundation is a non-profit consortium of SoC IP vendors, OEMs, Academia, SoC vendors, OSVs and ISVs, whose goal is making programming for parallel computing easy and pervasive. HSA members are building a heterogeneous computing ecosystem, rooted in industry standards, which combines scalar processing on the CPU with parallel processing on the GPU, while enabling high bandwidth access to memory and high application performance with low power consumption. HSA defines interfaces for parallel computation using CPU, GPU and other programmable and fixed function devices, while supporting a diverse set of high-level programming languages, and creating the foundation for next-generation, general-purpose computing.
Follow the HSA Foundation on Twitter, Facebook and LinkedIn.

HSA Foundation, AMD Spearheading HSA Technologies Tutorial at 25th International Conference on Parallel Architectures and Compilation Techniques

Posted on August 13, 2016 by mfrickie

BEAVERTON, OR, Aug. 13, 2016 – The HSA (Heterogeneous System Architecture) Foundation and Foundation member AMD will be providing a tutorial on HSA technologies at next month’s 25th International Conference on Parallel Architectures and Compilation Techniques (PACT). The conference will be held from Sept. 11-15 in Haifa, Israel.
PACT brings together researchers from architecture, compilers, applications and languages to present and discuss innovative research.
The one-day tutorial, presented by AMD Fellow Paul Blinzer, will have a morning session on Platform and Hardware Requirements; the afternoon session will focus on Software and Toolchains. A snapshot of some of the topics:
Platform and Hardware Requirements

Rationale for HSA: GPUs, DSPs and more
Architecture pillars of HSA
Memory model of HSA
HSAIL, Finalizer, BRIG
Integration of HSA platform features
System architecture research opportunities

Software and Toolchains

HSA software toolchains: LLVM, GCC, HCC, Python
Integrating HSAIL into a new toolchain: experiences and gotchas using BRIG, HSAIL, code generation, debugging metadata
Debugging and profiling an HSA-enabled application using these toolchains with CodeXL or gdb
Application frameworks using HSA/ROCR: CAFFE, SPARK, node.js
HSA tool extension for ROCm and CodeXL
Software models, research opportunities

HSA is a standardized platform design supported by more than 40 technology companies and 17 universities that unlocks the performance and power efficiency of the parallel computing engines found in most modern electronic devices. It allows developers to easily and efficiently apply the hardware resources—including CPUs, GPUs, DSPs, FPGAs, fabrics and fixed function accelerators—in today’s complex systems-on-chip (SoCs).
The tutorial and other PACT sessions will be held at the Dan Carmel hotel in Haifa.
For more information on the tutorial and to register, please see http://pactconf.org/program/workshops-tutorials/hsa/
For more information, including a full list of speakers, supporting organizations and sponsors, please visit the PACT 2016 conference website.
About Paul Blinzer
Paul Blinzer works on a wide variety of Platform System Software architecture projects and specifically on the Heterogeneous System Architecture (HSA) System Software at Advanced Micro Devices, Inc. (AMD) as a Fellow in the System Software group. Living in the Seattle, WA area, during his career he has worked in various roles on system level driver development, system software development, graphics architecture, graphics & compute acceleration since the early ’90s. Paul is the chairperson of the “System Architecture Workgroup” of the HSA Foundation. He has a degree in Electrical Engineering (Dipl.-Ing) from TU Braunschweig, Germany.
https://www.linkedin.com/in/paul-blinzer-4523602
Contact:
Neal Leavitt
Leavitt Communications
(760) 639-2900
neal@leavcom.com

HSA Foundation Joins with China Semiconductor Industry Association to Hold the First Heterogeneous System Architecture Global Summit

Posted on August 9, 2016 by mfrickie

BEAVERTON, OR, Aug. 9, 2016 – The HSA (Heterogeneous System Architecture) Foundation is joining with the China Semiconductor Industry Association (CSIA) to host the HSA 2016 Global Summit, Aug. 22-23 in Beijing. AMD, Huaxia General Processor Technologies, Imagination Technologies, LG, and MediaTek are co-organizers of the event, which will focus on the future of heterogeneous processing technology in electronics systems across a broad array of applications.
At the Summit, HSA Foundation President Dr. John Glossner will outline recent Foundation developments. Allen Lee, corporate vice president, AMD, will present product updates. HSA member companies will present further technical details and demonstrate HSA compatible systems. A range of industry experts, government officials and academics will discuss recent developments and deliver their visions of the future of heterogeneous processing – focused on topics including:

Development of heterogeneous computing and the HSA ecosystem in China
China’s domestic processor development
Multi-core chips and architectures
Design trends and challenges
Tools, software and operating systems
Developments across applications including software defined radio, mobile devices, Internet of Things (IoT), high-precision satellite navigation and positioning, high-performance computing for smart grid and 5G, intelligent unmanned vehicles, cloud computing, tensor computing, artificial intelligence, deep learning and more

HSA is a standardized platform design supported by more than 40 technology companies and 17 universities that unlocks the performance and power efficiency of the parallel computing engines found in most modern electronic devices. It allows developers to easily and efficiently apply the hardware resources—including CPUs, GPUs, DSPs, FPGAs, fabrics and fixed function accelerators—in today’s complex systems-on-chip (SoCs).
The HSA Summit will be held at the Beijing Yizhuang Fengda International Hotel on August 22-23, 2016. The HSA Summit is supported by the Beijing Economic and Technological Development Zone (E-Town), the Ministry of Industry and Information Technology of the People’s Republic of China (MIIT), Cyberspace Administration of China, and the Government of Beijing Municipality. Mr. Sheng Lian, Beijing E-Town’s director of the administrative committee, will chair the summit.
For more information, including a full list of speakers and supporting organizations, please visit: http://www.hsafoundation.com/chinasummit/

HSA Foundation Aims for Broader Adoption of Coherent Memory Standard for Heterogeneous Processors

Posted on July 4, 2016 by mfrickie

July 4, 2016, BDTi: http://www.bdti.com/InsideDSP/2016/07/05/HSAFoundation
Modern SoCs increasingly contain a variety of processing resources: one or more CPU cores and a GPU, often with a DSP, programmable logic, or one or more special-purpose co-processors for tasks such as computer vision. Properly harnessed, such heterogeneous processors often deliver impressive performance at low cost and low power consumption. But mapping applications onto heterogeneous processors is challenging. OpenCL, a standard language and runtime specification from the Khronos Group, enables the development of code that utilizes processing elements within a heterogeneous single- or multi-chip system. However, any processing efficiency gains derived from specialized computing elements can easily be negated by the added latency (not to mention incremental power consumption) incurred by copying data between computing elements.
Memory coherency among these diverse processing elements enables them to more efficiently share data via pointer-passing and queue-updating operations, versus bundling data and moving it via clumsy I/O operations through complex device drivers. Memory coherency has been common for some time in multi-CPU implementations; expanding the concept to GPUs, DSPs and other dissimilar architectures, however, is more challenging. OpenCL and other heterogeneous programming standards such as OpenMP and C++ AMP don’t make any attempt to standardize memory coherency, which requires the implementation of specific hardware features in each processing element. That mission has been taken up by the HSA (Heterogeneous System Architecture) Foundation, an industry group that has its origins in AMD’s proprietary Fusion System Architecture program (Figure 1).
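The contrast between copying data and passing references through a queue can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `bytearray` stands in for a coherent memory region, the thread stands in for a second processing element, and the names `copy_dispatch`, `worker` and `run_shared` are hypothetical. None of this is the HSA runtime API.

```python
import queue
import threading


def copy_dispatch(data):
    """Copy-based model: the worker receives a private copy of the data
    and hands back a new buffer that the caller must copy into place."""
    private = bytes(data)                      # explicit copy in
    return bytes(b * 2 % 256 for b in private)


def worker(work_queue, memory):
    """Shared-memory model: the worker pulls (offset, length) packets off
    a queue and updates the shared region in place."""
    view = memoryview(memory)
    while True:
        packet = work_queue.get()
        if packet is None:                     # sentinel: no more work
            break
        offset, length = packet
        for i in range(offset, offset + length):
            view[i] = view[i] * 2 % 256


def run_shared(memory, chunk=4):
    """Dispatch work by queueing small packets that point into `memory`;
    the payload itself is never copied."""
    work_queue = queue.Queue()
    thread = threading.Thread(target=worker, args=(work_queue, memory))
    thread.start()
    for offset in range(0, len(memory), chunk):
        work_queue.put((offset, min(chunk, len(memory) - offset)))
    work_queue.put(None)
    thread.join()
```

Both paths compute the same result. In the second, only small (offset, length) packets cross the queue, which is the pattern the paragraph above describes as pointer-passing and queue-updating.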

Figure 1. The HSA Foundation boasts a sizeable, diverse membership list, but so far only AMD has chips implementing the organization’s standards.

Founded in mid-2012, the HSA Foundation released v1.0 of its specification suite in the spring of last year. And with newly announced, backward-compatible v1.1, according to foundation president Dr. John Glossner (who is also CEO of General Processor Technologies), the specification further expands beyond its AMD-centric foundations, supporting additional types of processor elements, as well as adding a number of requested features such as more flexible coherent memory access (Figure 2). SoC compatibility with the hardware aspects of the HSA specifications is becoming increasingly common, according to Glossner, and ARM for one agrees. In a recent briefing with Lead Mobile Strategist James Bruce and GPU Developer Tools Product Manager Anand Patel, the two ARM representatives noted that not only the latest Cortex-A73 and Mali-G71 but also the last several generations of ARM CPUs and GPUs are, in combination with newer CoreLink variants of ARM’s AMBA (Advanced Microcontroller Bus Architecture) interconnect, fully compatible with HSA’s memory coherency standards.

Figure 2. The initial v1.0 HSA specification was CPU- and GPU- specific, reflective of the AMD SoC platforms on which it was based (top), but the newer v1.1 spec is more vendor- and processor-agnostic, not to mention more flexible (bottom).

Hardware compatibility alone isn’t sufficient for full compliance with the HSA standards, however, which explains the current dearth of HSA-compliant SoCs in spite of significant industry backing for the HSA concept. At the core of HSA’s software scheme is HSAIL (the HSA Intermediate Language), an intermediate virtualized code abstraction created by HSA-cognizant compilers, which is then dynamically translated to a particular processor’s instruction set by a chip-vendor-supplied HSA Runtime layer. HSAIL-generating compilers are beginning to appear: AMD’s CLOC and the TUT (Tampere University of Technology) POCL both generate HSAIL from OpenCL source code, for example, while General Processor Technologies and Parmance have developed gccBrig, a front-end to GCC (the GNU Compiler Collection) for BRIG, the binary representation of HSAIL. Also, Continuum Analytics sponsors Numba, an open-source Python compiler with direct HSA support, specifically targeting GPU acceleration.
However, to date HSA Foundation creator AMD is the only member company to have developed an HSA Runtime, and then only for its latest Carrizo APU (accelerated processing unit, a CPU-GPU combo), which entered volume production at the end of last year. Even in AMD’s case, direct compilation to the end CPU and GPU instruction sets (versus to an HSAIL intermediate representation) is the preferred approach in AMD’s ROCm (Radeon Open Compute Platform). While we expect to see increased adoption of the HSA standards by AMD and other HSA Foundation member companies, some major chip suppliers are pursuing different approaches. Intel, for example, seems to prefer the Cilk scheme it’s championed, while NVIDIA continues to rely on its proprietary CUDA approach.
Researchers at Northeastern University recently validated that HSA, by removing the need for repeated data copy operations between heterogeneous processing elements, can dramatically improve algorithm performance, at least for a couple of algorithm examples (Figure 3). Three different memory access scenarios were considered: CL12 employs per-element buffers, while CL20 leverages a common albeit small shared virtual memory buffer; both employ only OpenCL. The full OpenCL-plus-HSA implementation, conversely, implements a unified memory space with fine-grained synchronization support, leverages regular pointers and doesn’t require copy operations. The evaluated FIR (finite impulse response) filter algorithm represents a memory-intensive streaming workload; AES (Advanced Encryption Standard) symmetric encryption and decryption conversely is a compute-intensive streaming workload. Glossner was also careful to point out that these results were measured on AMD’s Kaveri APU, which being a pre-HSA 1.0 device supports only limited coherent memory throughput.

Figure 3. Recent evaluations conducted by Northeastern University researchers highlight HSA’s performance-boosting potential in both memory-intensive (top) and compute-intensive (bottom) workloads, even with SoCs that aren’t fully HSA-optimized (A Comprehensive Performance Analysis of HSA and OpenCL 2.0, Proceedings of the 2016 International Symposium on Performance Analysis of Systems and Software, April 2016).

Any performance loss due to the HSAIL-plus-HSA Runtime multi-layer abstraction will, Glossner feels, be more than counterbalanced by the significant performance boost delivered by HSA’s support for full memory coherency between heterogeneous processing elements. AMD Carrizo APU-based systems are now shipping from PC OEMs such as ASUS, Dell and Lenovo, and Glossner anticipates additional HSA support announcements to arrive shortly from other SoC and IP core providers. Until then, though, HSA will remain an approach with industry-wide potential but limited deployment.
For more information on the HSA Foundation, see the following two videos from the May 2016 Embedded Vision Summit (Video 1 and 2).

AT&T adds fiber markets, Sprint weighs in on small cells … 5 things to know today

Posted on June 27, 2016 by mfrickie

By Martha DeGrasse, June 27th, RCR Wireless News: http://www.rcrwireless.com/20160627/carriers/att-adds-fiber-markets-sprint-weighs-in-on-small-cells-5-things-to-know-today-tag4
1. AT&T is promising business customers download and upload speeds of up to 1 gigabit per second. The company said today that it has expanded its AT&T Fiber service in Texas, Tennessee, South Carolina and Oklahoma. In addition, the carrier is expanding AT&T Fiber in several major cities, including Los Angeles, San Francisco, San Diego and Fresno, California; Miami; Dallas and El Paso, Texas; and Louisville, Kentucky.
AT&T also said it is launching nationwide voice-over-IP service through AT&T Business Fiber. The service is available in 180 U.S. cities, most of which are located in the South, Southeast and Central U.S., or on the West Coast.
2. Sprint said it is not concerned about the pace at which its small cell rollout is proceeding. The company’s top executives report that “the permitting and approval stage for its small cell deployment has been ahead of expectations,” according to Wells Fargo analyst Jennifer Fritzsche, who met with Sprint’s CEO, CFO and CTO last week. In a separate meeting, Mobilitie CEO Gary Jabara told Fritzsche’s team that Mobilitie is cooperating closely with local authorities as it deploys on Sprint’s behalf, and that there have been no cities that have put a complete stop to small cell deployments.
Sprint said last summer that it planned to deploy tens of thousands of small cells, and so far has not released a public update to that number. Jabara said this spring that Mobilitie has deployed fewer than 2,000 small cells to date.
3. Apple is now licensing patents from Huawei, according to The Wall Street Journal. Citing “a person familiar with the matter,” the paper did not specify what type of patent Huawei reportedly licensed to Apple. Noting that Huawei spent $9.2 billion on research and development last year vs. Apple’s $8.1 billion, the report said the Chinese company is also the world’s largest filer of international patent applications under the Patent Cooperation Treaty.
In the smartphone market, Huawei is a distant third behind Samsung and Apple, but the company has made it clear that it wants to compete head-on with the two market leaders. Huawei already dominates the market for wireless infrastructure, where it holds a leading position despite political pressures that keep Huawei’s gear out of U.S. wireless networks.
4. Huawei’s smartphone ambitions may include a proprietary operating system. The company has reportedly hired former Apple designer Abigail Brody to work on an operating system that would be an alternative to Android if Huawei’s relationship with Android developer Google takes a turn for the worse. Other smartphone makers have tried to develop proprietary operating systems, but so far application developers have been reluctant to invest time and money in apps for platforms other than Android and iOS.
5. The Heterogeneous System Architecture Foundation has released a specification the group says will make it easier to integrate digital solutions that use disparate hardware. HSA is a standardized platform design supported by more than 40 technology companies and 17 universities. The new spec adds multivendor architecture support, which means manufacturers will be able to combine IP blocks from more than one vendor. One of the group’s primary goals is to enable heterogeneous computing for vision-based “internet of things” systems.

New Infographic From HSA Foundation Details Importance, Benefits of Heterogeneous Systems

Posted on June 17, 2016 by mfrickie

Beaverton, Oregon, June 17, 2016 – The Heterogeneous System Architecture (HSA) Foundation today released a new infographic entitled, ‘HSA FOUNDATION, Harmonizing Hardware & Software Design for a Connected Future’. The infographic details why heterogeneous architectures are important for future electronic systems, and looks ahead at how heterogeneous architectures benefit end users.
HSA is a standardized platform design supported by more than 40 technology companies and 17 universities that unlocks the performance and power efficiency of the parallel computing engines found in most modern electronic devices. It allows developers to easily and efficiently apply the hardware resources—including CPUs, GPUs, DSPs, FPGAs, fabrics and fixed function accelerators—in today’s complex systems-on-chip (SoCs).
“We created the new infographic based on a survey of the HSA Foundation members, many of whom are designing, programming or delivering a wide range of heterogeneous systems – including those based on HSA. As such, it provides insight into key issues and trends affecting these systems that power the electronic devices across every aspect of our lives,” said Dr. John Glossner, HSA Foundation president.
To access the infographic, visit www.hsafoundation.com/infographic.


Toward a Hardware-Agnostic World: HSA Foundation Releases Specification v1.1

Posted on June 2, 2016 by mfrickie

By Jim Turley, EE Journal: http://www.eejournal.com/archives/articles/20160601-hsa/
I think there’s something great and generic about goldfish. They’re everybody’s first pet. – Paul Rudd
It’s finally happened: processors are now completely generic and interchangeable.
Might as well go home, CPU designers. There is no differentiation left to exploit. All of your processor architectures, instruction sets, pipelines, code profiling, register files, clever ALUs, bus interfaces – all of it is now as generic and substitutable as 80’s hair band drummers. Your entire branch of technology has been supplanted by some programmers.
Okay, so maybe it’s not quite that dire. But we’re getting there.
You have the HSA Foundation to thank for that. Their job is to make CPUs, DSPs, GPUs, VLIW machines (and pretty much anything else that can execute code) totally interchangeable. In the big SWOT analysis of hardware resources, the CPU becomes a “don’t care.” That is, HSA (which stands for Heterogeneous System Architecture) tries to make any code execute on any processor, regardless of its architecture, instruction set, or number of cores. They’ll let you run your operating system on a DSP, your graphics code on an integer CPU, and your signal-processing algorithms on a GPU. Hardware is hardware; just write your code and let HSA sort it out.
At least, that’s the promise the group has been making for the past few years. It’s what Steve Jobs would’ve called, “a big hairy audacious goal.” Hey, let’s treat all programming languages the same and all hardware engines the same. Programmers can write their source code in whatever language(s) they prefer, and let it run on whatever hardware they have lying around. Most of all, HSA allows you to mix different processor architectures together (that’s the “heterogeneous” part) so that you can, for example, run a multicore x86 processor alongside a cluster of ARM cores, next to a gaggle of nVidia GPUs. Pay no attention to how those processors are interconnected, or how many there are, or even what type of chip you’ve got. It’s all good! Throw ’em all together and let the software sort ’em out!
Sound like magic? Kind of. Sound like a bad idea that’s already been done to death by a thousand different university students who think they’ve stumbled on a fantastic (and original) idea? You’d be correct there, too. The idea of a universal hardware platform is hardly new, and the road to hardware independence is paved with other people’s venture capital. Java is about the only example of hardware-independent software that made any kind of a dent in the industry – but dents can be good or bad.
But wait a sec – isn’t Java already hardware agnostic (as in, “we don’t believe hardware exists”), and if so, why do we need another one? And for that matter, isn’t all code written in C++ or Python or BASIC or any decent language also platform-independent? Wasn’t that the whole idea of high-level languages? What problem are we actually solving here, and hasn’t it already been solved anyhow?
Well, yes and no. Java bytecode is (ahem) more or less transportable across different CPU architectures… assuming the architecture in question has its own bytecode interpreter or JIT or equivalent translator. And C code is certainly transportable… right up until it’s compiled. At that point, it’s very hardware-specific. But neither of these examples really ignores the underlying architecture of the chip you’re programming. Nobody writes C code without knowing if it’s intended for a conventional CPU, or a DSP, or a graphics processor. Same goes for any other programming language. You always want to know something about the processor it’s going to run on, even if you’re not bit-twiddling individual configuration registers.
So HSA wants to abstract away that last vestige of processor prejudice. This is particularly important and useful in today’s systems that mix and match so many different kinds of processors. How cool would it be to write your C or Python and truly not care how many processors, or of what type, were ultimately going to host it?
The core of HSA’s technology, as with so many other “universal hardware platforms,” is an intermediate virtual machine. In other words, you’re writing code for an imaginary CPU, and HSA-compliant tools then convert that to actual machine code for the hardware you really have. It’s not too different in concept from any other compiler, and pretty similar to the way Java is compiled.
This intermediate layer is called HSAIL (HSA Intermediate Language), and it’s specified just like a real CPU with a real instruction set and everything. In fact, you can download the HSAIL specification for free and build your own HSA-compliant toolchain if you like. The HSA Foundation would probably be happy to encourage you.
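To make the "imaginary CPU" idea concrete, here is a toy sketch in Python. It is emphatically not HSAIL: the four opcodes, the `PROGRAM` listing, and the two "targets" are invented for illustration. One program written against a made-up intermediate instruction set is translated ahead of time into steps for either a scalar back end or a list-at-a-time back end.

```python
import operator

# Toy intermediate program computing y = 3 * x + 1. Opcodes are invented.
PROGRAM = [
    ("load", "r0", "x"),
    ("mul", "r0", "r0", 3),
    ("add", "r0", "r0", 1),
    ("store", "y", "r0"),
]


def _move(dst, src):
    """load/store: bind one name to another's value."""
    def step(state):
        state[dst] = state[src]
    return step


def _scalar_binop(fn):
    """Scalar target: operate on a single value."""
    def make(dst, src, imm):
        def step(state):
            state[dst] = fn(state[src], imm)
        return step
    return make


def _vector_binop(fn):
    """Vector target: operate on every lane of a list."""
    def make(dst, src, imm):
        def step(state):
            state[dst] = [fn(v, imm) for v in state[src]]
        return step
    return make


# Each hypothetical target maps every opcode to its own implementation.
TARGETS = {
    "scalar": {"load": _move, "store": _move,
               "mul": _scalar_binop(operator.mul),
               "add": _scalar_binop(operator.add)},
    "vector": {"load": _move, "store": _move,
               "mul": _vector_binop(operator.mul),
               "add": _vector_binop(operator.add)},
}


def finalize(program, target):
    """Translate the intermediate program into steps for one target."""
    ops = TARGETS[target]
    steps = [ops[op](*args) for op, *args in program]

    def kernel(x):
        state = {"x": x}
        for step in steps:
            step(state)
        return state["y"]
    return kernel
```

Calling `finalize(PROGRAM, "scalar")` yields a kernel that takes one number, while `finalize(PROGRAM, "vector")` yields one that takes a list; the program text itself never changes, which is the point of the intermediate layer.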
The only hardware requirement to using HSA is that all the processors in your system must share a single, cache-coherent memory space. That’s important, and it’s non-negotiable. It’s the key feature that allows HSA tools to allocate and reallocate code segments among processors. When everyone shares a memory map, a pointer is a pointer, regardless of who created it or who dereferences it. Cache coherence is also mandatory, for much the same reason. The results of one processor’s calculations have to be universally accessible to all the other processors, without careful planning or message-passing.
In fact, that lack of planning and messaging is one of HSA’s strengths, though it’s hardly unique. The group recently ran some benchmarks comparing HSA-compliant code with OpenCL (which also tolerates heterogeneous hardware resources). In HSA’s testing, their code did far better, of course, and often by orders of magnitude.
An FIR filter, for example, ran about 10x to 100x faster than the equivalent OpenCL code. Pretty impressive. But can a toolchain really make that much difference? Depends what you’re comparing it to. Software FIR filters are very memory-intensive, and the OpenCL implementation handles its data structures in a “pass by value” method. In other words, it copies all of the data from one processor’s memory space to another’s. That wastes a huge amount of time (and consumes a lot of memory). HSA, in contrast, does “pass by reference.” Voila – you’ve saved a mountain of time with a different toolchain.
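The difference between the two styles can be sketched in a few lines of Python. This is only an analogy, assuming a hypothetical "device" that is really just a function call: `run_by_value` gives the device a private copy of the input buffer, while `run_by_reference` hands it a view of the caller's memory. It says nothing about real HSA or OpenCL performance.

```python
from array import array


def fir(samples, taps):
    """Direct-form FIR: y[n] = sum over k of taps[k] * x[n - k], with x[n] = 0 for n < 0."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out


def run_by_value(samples, taps):
    """Copy-in style: the (hypothetical) device works on its own copy of the data."""
    device_copy = array("d", samples)  # duplicates every sample
    return fir(device_copy, taps), device_copy


def run_by_reference(samples, taps):
    """Shared-memory style: the device reads the caller's buffer through a view."""
    view = memoryview(samples)  # no copy; same underlying memory
    return fir(view, taps), view
```

Both functions return the same filter output. The difference is what happens to the data: the copy is independent of the caller's buffer, while the view tracks later changes to it, which is the "a pointer is a pointer" behavior that a shared address space gives you.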
So who’s behind the HSA Foundation? Who stands to gain from this? Like many consortia, HSA draws its members from industry. On the CPU side, they’ve got support from AMD, ARM, and Imagination Technologies. So there’s x86, ARM, and MIPS represented, as well as Radeon, Mali, and PowerVR graphics. Toshiba, Texas Instruments, Tensilica, Analog Devices, Ceva, Synopsys (with ARC), and other second-tier CPU vendors also participate. A lot of universities are contributing manpower, and several research laboratories are represented, too. So a good cross-section of interested parties overall.
Does it really work? It seems to, at least in early testing. The group has just released version 1.1 of its specification (also available for free download), and they’re adding support for more compilers and more processors. Compared to v1.0, HSA v1.1 is now more closely compatible with gcc. It’s a long and tricky process, but the HSA Foundation seems to be making real progress toward making CPU designers obsolete.

HSA spec upgrade supports multivendor SoCs

Posted on June 1, 2016 by mfrickie

By Peter Clarke, EE Times Europe: http://www.electronics-eetimes.com/news/hsa-spec-upgrade-supports-multivendor-socs
The Heterogeneous System Architecture (HSA) Foundation – a grouping of chip, IP and software companies – has released the HSA 1.1 specification, claiming it takes developers closer to energy-efficient heterogeneous computing.
The specification update comes just over a year after v1.0 and enhances the ability to integrate proprietary IP blocks and blocks from multiple vendors in heterogeneous designs.
The specification is intended to allow developers to write software and efficiently apply it to hardware resources of multiple types – CPUs, GPUs, DSPs, FPGAs, fabrics and fixed-function accelerators. This can be done by writing in OpenCL 2.X, C++, Java and compiling to HSAIL, the HSA intermediate language.
The additions under HSA 1.1 include: multi-vendor support; improved interoperation with graphics, cameras and other image processors, and digital signal processors; a formal definition of the HSA memory model; support for system-level profiling; and run-time improvements, including the capability to wait on multiple signals and a non-temporal memory access that allows infrequently used values to be removed from a cache efficiently.
There is also an open-source LLDB-based debugger sponsored by Codeplay Ltd supporting kernels compiled using the open source CLOC compiler and the HSA assembler. All are available from the HSA’s GitHub repository.
“HSA is increasing traction, with HSA compliant systems now in the market, an increasing number of developer tools available, and now the ability to leverage IP blocks from different vendors,” said HSA Foundation president John Glossner.

HSA updated to v1.1 with new features

Posted on May 31, 2016 by mfrickie

by Charlie Demerjian, SemiAccurate: http://semiaccurate.com/2016/05/31/hsa-updated-v1-1-new-features/
The HSA Foundation is announcing v1.1 of their spec today with some important changes. Since SemiAccurate first brought you the news of HSA years ago, the slow progress forward has been picking up pace quickly.
Just over a year from the March 2015 launch of v1.0 of the HSA spec comes the new revision. Since fully HSA1.0 compatible devices are just hitting the market now (a Dell box is imminent with all the features enabled on a Carrizo desktop), how long will we have to wait for v1.1 hardware? That is the beauty of the v1.1 spec: there are no hardware changes required, so if you are HSA1.0 compatible you are HSA1.1 compatible.
In light of this it may not seem like v1.1 brings much to the table but that is not the case. Understanding those differences may take a pretty technical mind but most SemiAccurate readers will probably keep up. If you aren’t familiar with the current state of HSA, we won’t rehash it in detail but start here and here and here.
The basics don’t change with v1.1, which is no surprise in light of the hardware compatibility: you still compile your code to the HSAIL intermediate language and that still uses the hQ structure to pass data. Memory is still pinned rather than copied ad infinitum, and efficiency of data use is still a top priority. What the new spec offers are features that most users wanted earlier, along with wider support.
First up is the spreading of HSA to a wider range of hardware. Currently it only supports a CPU and GPU as targets because, well, that’s all the hardware that was out there. Looking at the breadth of HSA Foundation members it is clear that support will add a wider class of devices and v1.1 delivers just that. Image processors, DSPs, NICs, and a whole host of other ISAs are now officially supported, and nearly anything with a bunch of parallel processors can be added. Look for more to be added as soon as hardware from supporting vendors is officially announced.
Software and drivers are a key point in this type of multi-vendor interaction and HSA1.1 supports that in a few new ways. There are transparent features now and multi-vendor IP blocks are better integrated too. Also added are several key pieces of information that can be explicitly queried, so if you want to know where a page table resides you can actually get that answer directly. This was a big request in older versions; sometimes you need to find information that a generic abstraction can’t provide.
Speaking of generic abstractions in v1.1 we have a vendor neutral generic driver, something that makes sense for a virtual ISA like the one HSA provides. Think of this driver as the layer from the system to HSA, each hardware vendor still needs to provide a specific driver for their hardware. What this will hopefully do is simplify and standardize the coding process for the user and software writer. If you think about it, abstracting the hardware from the user perspective is a key enabler for multiple classes of hardware, and more direct polling of structures does most of the rest.
Another big gap in the 1.0 spec surrounds pausing of threads. As you might be aware the hQ structure and massively threaded programming does a lot of pausing of threads and HSA is no exception. HSA1.0 has mechanisms for pausing and restarting threads but 1.1 takes it a step further with multi-wait signals. A thread can now wait for more than one signal and apply some simple logic to the unpause process. This may seem basic but it wasn’t there and now is.
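For readers who want a feel for what "wait for more than one signal" means, here is a minimal sketch using ordinary Python threads. It is an analogy, not the HSA runtime API: the `SignalGroup` class, its method names, and the signal names in the usage note are all hypothetical.

```python
import threading


class SignalGroup:
    """A set of named integer signals sharing one condition variable (hypothetical)."""

    def __init__(self, names):
        self._cond = threading.Condition()
        self._values = {name: 0 for name in names}

    def store(self, name, value):
        """Set a signal's value and wake every waiter so it can re-check its condition."""
        with self._cond:
            self._values[name] = value
            self._cond.notify_all()

    def wait_any(self, names, timeout=None):
        """Block until at least one listed signal is nonzero.

        Returns the name of the first nonzero signal in `names`, or None on timeout.
        """
        with self._cond:
            fired = self._cond.wait_for(
                lambda: any(self._values[n] for n in names), timeout
            )
            if not fired:
                return None
            return next(n for n in names if self._values[n])

    def wait_all(self, names, timeout=None):
        """Block until every listed signal is nonzero; returns False on timeout."""
        with self._cond:
            return self._cond.wait_for(
                lambda: all(self._values[n] for n in names), timeout
            )
```

A worker thread could call `group.wait_any(["dma_done", "kernel_done"])` and resume as soon as either producer calls `group.store(...)`. The "any" and "all" predicates stand in for the simple logic the spec lets a waiter apply to the unpause process.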
Multi-wait signaling can be combined with so-called forward progress rules, also new. These do exactly what it sounds like: they guarantee a thread will make progress under certain conditions. Between the two new features you can implement QoS and service guarantees, a necessary step for many media applications, especially real-time ones. With HSA1.1 you don’t have to write it all yourself; the primitives are there for you and work across multiple hardware blocks.
Memory has a lot of new features too, starting with a formal memory model definition. If you think about it, HSA already has both big and little endian hardware supported and with new additions a formal model will make life a lot saner for programmers. Non-temporal memory accesses, basically cache eviction after use, and multi-agent sharing of images are also now official, the latter being effectively access to pinned memory spaces from multiple blocks.
That brings us to the tools and here we start with the finalizer. It now allows linkage to standard code objects and supports versioning. The first of these is pretty self explanatory, the second tends to be more familiar to the non-consumer space. Versioning effectively allows the code to specify which variant of the finalizer it needs and more importantly get it. This may seem trivial to non-programmers but you just need to look at modern shenanigans by Microsoft and forced march upgrades to realize how valuable flexible versioning is.
Moving farther down the tool chain there is a new profiling API that supports multiple agents. Massively threaded code running across multiple differing hardware blocks can be a tad problematic to optimize and that is where the new API comes in. Depending on the hardware it can work in realtime or just gather data for future examination. There is now direct access to hardware counters and timestamps as well. Gaming is an obvious example of where this will pay off but media work will see huge benefits as well. If nothing else this will save a lot of programmer hair from hitting the floor after being torn out in chunks.
Last up we have added language support, with the headliner being Python and the Numba NumPy-aware compiler. It is available on GitHub with direct HSA support and does automatic parallelization of supported functions. Java, JavaScript, OpenCL, and C++ were supported too but the new C++17 standard will probably include a standard template library for HSA as well. All the HSA1.0 tools and toolchains will work with v1.1 as well and most new features will likely be supported in the near future.
Those are the major points of the new HSA1.1 spec. It runs on the same hardware, adds a few languages and profiling tools, and brings a lot of new hardware possibilities into the mix. To support this there is a standard virtual ISA, explicit querying of some structures, and a much more robust signaling and wait model. To the user it should work better on more devices, to the programmer it should be simpler to support and easier to fix when things go wrong. For the hardware vendors it can only increase the available software for their offerings, what’s not to like?

SoC Spec Defines Core Interfaces: HSA version 1.1 defines IP block interfaces

Posted on May 31, 2016 by mfrickie

By Rick Merritt, EE Times: http://www.eetimes.com/document.asp?doc_id=1329786
SAN JOSE, Calif.—The Heterogeneous System Architecture (HSA) Foundation defined interfaces for third-party IP blocks so engineers can design SoCs compliant with its specs for shared coherent memory. To date, only Advanced Micro Devices, which initiated the group, ships a processor supporting the ad hoc standard for speeding up on-chip processes shared among different cores.
The HSA spec is positioned as a more open alternative to the SoC techniques and programming environments supported by Intel and Nvidia. The interfaces defined in HSA’s version 1.1 released today are already baked into a handful of CPU, GPU, DSP and fabric cores from at least three vendors including Arteris and Imagination Technologies.
“In our version 1.0, SoC makers didn’t need to define how, say, a DSP gets access to a shared memory-page table,” said John Glossner, HSA’s president. “Now what were opaque elements of the standard are specified in a way that gives us multivendor transparency, enabling vendor-neutral device drivers,” he said.
More than 40 companies are part of the group, including ARM and Mediatek which have said they will support the approach. The technology aims to serve a broad set of markets from mobile SoCs and desktops to high-performance computing systems.
“All our new products will support HSA,” said Glossner, who works for Optimum Semiconductor Technologies, a US-based IP licensing company that is part of a larger chip conglomerate in China.
Looking forward, “the next two years are about getting software up and working — this is the first multivendor spec we’ve released,” said Glossner, with a debug spec expected to be the biggest part of the work.
“Debugging multiple hetero cores from a single source is complex, and we want to make sure to get it right, trying to poll and set break points is difficult and needs higher level abstractions – a heterogenous debug tool suite should look like it is uni-threaded,” he said.
HSA already supports a basic form of debugging.
“We support passing high-level debug information to [an abstracted agent], and it is included in the generated code object,” said Glossner. “We are working towards making that support universal across tools just like we did for profiling,” he added.
The 1.1 spec already supports profiling across cores. Tools including open source compilers and a runtime environment for HSA are currently available with test results published by AMD and academics in the group.
The HSA spec is agnostic about programming environments although its main target has been OpenCL. A parallel version of C++ is due in 2017 that should be able to take advantage of the HSA techniques, Glossner said.
The group has not yet decided if future hardware will require additions that might generate a version 2.0 spec.