HSA Foundation China Regional Committee Wraps Up Successful 2nd Annual Symposium

Wide Array of Interfaces, Specs Discussed for Next Gen of Heterogeneous Computing, AI, SDR, and More
BEIJING, CHINA, DEC. 20, 2017 — The China Regional Committee (CRC) of the Heterogeneous System Architecture (HSA) Foundation has successfully concluded its 2nd Symposium in Beijing. The CRC was formed earlier this year; its mandate is to enhance the awareness of heterogeneous computing and promote the adoption of standards such as Heterogeneous System Architecture (HSA) in China.
More than 40 representatives of the CRC members and related companies, research institutes and universities throughout China attended the conference. HSA Foundation President Dr. John Glossner also participated in this benchmark meeting, at which attendees exchanged ideas on important topics including interfaces and specifications for the next generation of heterogeneous computing, the vector parallel computing model, system security and protection, artificial intelligence, software defined radio, Network-on-Chip (NoC), and programming of commercial HSA chips. The meeting was co-organized by the China Electronics Standardization Institute (CESI) and the HSA Foundation’s CRC, and sponsored by Huaxia General Processor Technologies.
Last year the HSA Foundation held its first Global Summit in Beijing. The CRC has actively carried out various work in conjunction with CESI for the development of global heterogeneous computing standards with a China focus.
At the meeting, each CRC working group shared its progress and insights on related key technologies:
Application & System Evaluation Working Group – “The application situation and development trend of artificial intelligence in China and typical rigid demands and key indicators of artificial intelligence” – presented by State Grid;
Virtual ISA Working Group – “Artificial intelligence instruction set design for heterogeneous computing and exploratory research of HSAIL artificial intelligence extended subset” – presented by Dr. Jun Han, Fudan University;
Interconnect Working Group – “Latest research results on network-on-chip in the heterogeneous computing SoCs, and the next step verification and standardization work arrangements” – presented by Dr. Zhiyi Yu, Sun Yat-sen University;
Compilation & Runtime LIB Working Group – “The latest research trends in vector computing models and related programming models, and basic recommendations for facilitating integration into HSA system architectures” – presented by Dr. Lei Wang, Huaxia General Processor Technologies;
System Architecture Working Group – “Using HSA to systematically address the basic views of software-defined communications, software-defined radio, heterogeneous multi-core chip architecture and application development” – presented by Wanting Tian, Sanechips Technology;
Security & Protection Working Group – “Research work and principles on adapting heterogeneous computing for security protection” – presented by Shaowei Chen, Nationz Technologies.
The CRC has been adding members since its first Symposium in May, including Huaqiao University, Hunan University, Jimei University, Tsinghua University, Xiamen University, Xiamen University of Technology and Zhejiang University.
Supporting quotes:
“The HSA Foundation CRC has been laying the groundwork for standardization progress in heterogeneous computing standards in China for almost a year. It is focused on supporting the needs of HSA Foundation members in China and helping to fulfill the mission of the Foundation, which is to make heterogeneous programming universally easier.”
Dr. John Glossner, HSA Foundation President
“Since its formation, the CRC has received the support and attention of many academic institutions, companies, and government authorities in China. The work product and coverage of the CRC has been expanding and developing rapidly, making it one of China’s first “innovative brands” for standardization of heterogeneous computing. In 2018 the CRC and HSAF will work towards adoption of the v1.2 specifications and extensions enabling the transformation of HSA chips and platform products in many applications.”
Dr. Xiaodong Zhang, HSA Foundation CRC Chair
“The main research direction of our team is Software Defined Radio. Due to the flexibility of SDR, it allows for implementation across a wide range of applications. The earliest SDR platforms were based on FPGAs and DSPs with large size and high-power consumption making generalized SDR systems problematic. However, the HSA platform provides new possibilities for SDR research. HSA has many advantages such as low power consumption, low cost, and high integration. Those are hard to find in traditional SDR platforms.”
Dr. Ming Zhao, Professor, Tsinghua University
“Micro-Processor Research and development Center (MPRC) of Peking University is the pioneer of innovating indigenous microprocessor (CPU) and computer systems in China. To minimize the digital gap between developed and developing countries, MPRC is committed to the development of computers with independently developed CPUs and heterogeneous SoCs. The advantage of a heterogeneous architecture is the ability to be adaptable. During the evolution from desktop computing to mobile computing to Big Data, systems that adapt are the ones that are most successful. MPRC will work together with other members in HSA Foundation to improve life with heterogeneous technology.”
Dr. Junlin Lu, Deputy director of MPRC, Peking University
About the HSA Foundation
The HSA (Heterogeneous System Architecture) Foundation is a non-profit consortium of SoC IP vendors, OEMs, Academia, SoC vendors, OSVs and ISVs, whose goal is making programming for parallel computing easy and pervasive. HSA members are building a heterogeneous computing ecosystem, rooted in industry standards, which combines scalar processing on the CPU with parallel processing on the GPU, while enabling high bandwidth access to memory and high application performance with low power consumption. HSA defines interfaces for parallel computation using CPU, GPU and other programmable and fixed function devices, while supporting a diverse set of high-level programming languages, and creating the foundation for next-generation, general-purpose computing.

Developing Heterogeneous Cache Coherent SoCs – and More! Q&A with ArterisIP's J.P. Loison, Corporate SoC Application Architect

Computing Now, HSA Connections: https://www.computer.org/portal/web/hsa-connections/content?g=54930593&type=article&urlTitle=developing-heterogeneous-cache-coherent-socs-and-more-

Editor’s Note: 
ArterisIP provides system-on-chip (SoC) interconnect IP to accelerate SoC semiconductor assembly for a wide range of applications from automobiles to mobile phones, IoT, cameras, SSD controllers, and servers for customers such as Samsung, Huawei / HiSilicon, Mobileye (Intel), Altera (Intel), and Texas Instruments. The company is located in Campbell, CA.
Describe in detail the various design challenges faced today in developing Heterogeneous Cache Coherent SoCs.
The first thing you need to do is understand customer requirements. This includes asking the right questions, some of which may include:
  • How many IPs need to connect to the heterogeneous system?
  • What kind of bandwidth does each IP require?
  • What kind of IP, and what kind of features, can you enable with interconnect IP?
The next step is to define heterogeneity, because many people use the word “heterogeneous” but mean different things by it. Some key tasks and guidelines:
  • You must have different types of processors within the same family;
  • Then you have to accommodate different types of processors that are available on the market.
  • Different processor types also have different cache structures.
    • An ARM CPU would use the same cache structure as another ARM core all over the processor.
  • A different CPU poses a different cache structure.
  • Accommodate different types of IPs as well:
    • CPUs, GPUs, and DSPs;
    • Then there are all the other types of IP that you combine into an SoC, like connectivity IP, USB, SATA, etc.
It’s also important to be able to accommodate different (cache) protocol systems in terms of coherent and non-coherent protocol. Some examples:
  • Flexible snoop filter capability accommodates different cache structures of different kinds of processors.
    • Snoop filter capabilities operate in two different directions to accommodate any cache structure of any processor that is available today.
    • Another challenge: Reduce the number of memory bits that you need to perform snoop filtering.
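To make the snoop filter idea above more concrete, here is a minimal sketch in Python. It is not ArterisIP's design; the class name `SnoopFilter`, its methods, the agent names and the 64-byte line size are all assumptions made for illustration. The filter remembers which agents may hold a copy of each cache line, so a write only triggers snoops to those agents instead of a broadcast to every cache.

```python
from collections import defaultdict


class SnoopFilter:
    """Toy directory-style snoop filter (hypothetical, for illustration only)."""

    def __init__(self, line_bytes=64):
        self.line_bytes = line_bytes      # assumed cache-line size
        self.sharers = defaultdict(set)   # line index -> agents that may hold it

    def _line(self, addr):
        # Map a byte address to the index of the cache line containing it.
        return addr // self.line_bytes

    def record_fill(self, agent, addr):
        # An agent has brought the line containing addr into its cache.
        self.sharers[self._line(addr)].add(agent)

    def record_evict(self, agent, addr):
        # The agent dropped the line; forget empty entries to save storage.
        line = self._line(addr)
        self.sharers[line].discard(agent)
        if not self.sharers[line]:
            del self.sharers[line]

    def targets_for_write(self, writer, addr):
        # Agents other than the writer that must be snooped (invalidated).
        return sorted(self.sharers.get(self._line(addr), set()) - {writer})


sf = SnoopFilter()
sf.record_fill("cpu0", 0x1000)
sf.record_fill("gpu", 0x1010)    # falls in the same 64-byte line as 0x1000
sf.record_fill("dsp", 0x2000)    # a different line
print(sf.targets_for_write("cpu0", 0x1008))  # only the GPU needs a snoop
```

A hardware filter would use a fixed-size, set-associative structure rather than an unbounded dictionary, which is where the challenge of reducing the number of memory bits comes from; the sketch only shows the bookkeeping, not that storage trade-off.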
How do you integrate IP that is not cache coherent and achieve better performance? Can you provide a brief example or two?
You need to understand what the customer requirements are in terms of the mix of non-coherency and coherency requirements.  Are they separated, a full merger of both domains or a customized mix? Arteris, for instance, developed a component called a non-coherent bridge.  Its purpose is to drive non-coherent accesses back into the coherent domain.  It also enables a differentiator between the non-coherent and coherent domains.
How do you create a cache-coherent system that is easily placed on a chip?
A few years ago, coherency systems were small and compact – a max of three to four different processors. Coherency was confined to CPU clusters, functionality was grouped under an application and all subsystems were connected to an application.
But coherency wasn’t necessarily distributed beyond a subsystem. Customer needs are changing: there is a need for greater processor performance, and companies are adding more and different types of processors. In addition:
  • SoC layouts are expanding tremendously;
  • Processors are growing larger;
  • Complex layouts affect the coherency domain;
  • The coherent domain is expanding all over the chip.
So how do you handle it?  First, you must make sure the infrastructure is designed to distribute coherency system-wide.  It has to be an interconnect technology that enables network packet transport and it also must accommodate a variety of topologies such as ring and mesh. The infrastructure must also be configurable and flexible because as design complexity continues to grow, designers must be able to understand which topologies are best suited for a particular chip layout. Having the proper tools that can predict where complexities might cause performance and power issues in the chip layout stage is critical to revising the layout and providing the best solution in terms of which topology might resolve these issues.
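As a rough illustration of why topology choice matters, the sketch below compares the average number of hops between nodes on a ring and on a mesh. It assumes a bidirectional ring, a 2D mesh without wraparound links, and uniform traffic between all node pairs; the function names are made up for this example. Hop count is only one input to a real decision, which also depends on floorplan, wiring and actual traffic patterns.

```python
from itertools import product


def ring_hops(a, b, n):
    """Shortest hop count between nodes a and b on a bidirectional ring of n nodes."""
    d = abs(a - b)
    return min(d, n - d)


def mesh_hops(a, b):
    """Hop count between (row, col) nodes on a 2D mesh with no wraparound links."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def average_ring_hops(n):
    # Average over every ordered pair of distinct nodes.
    pairs = [(a, b) for a in range(n) for b in range(n) if a != b]
    return sum(ring_hops(a, b, n) for a, b in pairs) / len(pairs)


def average_mesh_hops(rows, cols):
    # Average over every ordered pair of distinct nodes.
    nodes = list(product(range(rows), range(cols)))
    pairs = [(a, b) for a in nodes for b in nodes if a != b]
    return sum(mesh_hops(a, b) for a, b in pairs) / len(pairs)


print(f"16-node ring: {average_ring_hops(16):.2f} hops on average")
print(f"4x4 mesh:     {average_mesh_hops(4, 4):.2f} hops on average")
```

With 16 nodes, the ring averages about 4.27 hops and the 4x4 mesh about 2.67, which hints at why larger coherent domains tend to push designers away from simple rings.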
How can you optimize power consumption of complex systems? 
You first need to provide power-ready IP; once this is accomplished, you need to implement some well-known techniques, which may include voltage domains, power domains, clock gating and high-level clock gating.
If the IP is power-ready, it will also have connectivity to a power interface and can be controlled by an MPU in the system that decides when to shut down the IP when it is not in use or not needed by the system.  At the application level, this power-aware controller (MPU) can lower system power consumption by putting an IP on idle.
How long will it take to reasonably surmount some/all of the aforementioned issues?
Heterogeneous SoCs are still in development and haven’t yet matured. But processors in the coherent domain are now sharing data with each other. Other CPUs and GPUs have become cache coherent, although I’m confident we can do a lot more.
With data sharing, this is not only between processor and GPU, but between all of the IPs of the system; it’s a concept that is in progress. This IP must be pushed a little bit farther to achieve total coherency. Today there are still not too many non-coherent IPs sharing data with coherent IPs. But we’re now starting to see applications emerging that need coherency, and this will bring new requirements.
Are these design challenges currently hindering product development in select verticals?  If so, which ones?
Yes, one that comes to mind is ADAS (Advanced Driver-Assistance Systems) for automotive. Automotive applications will have a lot of requirements because of the need to add performance and share data with heterogeneous processors to achieve those requirements. We’ll see the introduction of new features to this market. Other markets will include artificial intelligence and machine learning.
A decade ago, mobile application processors were driving the need for cache coherency, and then data center systems started becoming the primary driver.  Now the automotive market is driving the need to extend cache coherency to all of the heterogeneous processing elements in SoCs.  In two or three years, a new trend will emerge to extend heterogeneous cache coherency even further, but designers will need flexibility, configurability and scalability to ensure that these systems are high-performance, low-in-latency and reasonable in terms of power consumption and cost.

Everything You Need to Know About Why AMD Open Sourced the OpenCL Driver Stack for ROCm

Computing Now, HSA Connections: https://www.computer.org/portal/web/hsa-connections/content?g=54930593&type=article&urlTitle=hsa-connectio-1
Introduction: AMD is a co-founder and member of the HSA Foundation.  This article is excerpted and edited from a blog post by Vincent Hindriksen, founder of Stream HPC, a Netherlands-based software development company.
Last May, AMD open sourced the OpenCL driver stack for ROCm. With this they kept their promise to open source (almost) everything. Earlier the hcc compiler, kernel-driver and several other parts were open sourced.
Why is this a big thing?
There are indeed several open source OpenCL implementations, but with one big difference: they’re secondary to the official compiler/driver. So, implementations like PortableCL and Intel Beignet play catch-up. AMD’s open source implementations are primary.
They contain:
  • OpenCL 1.2 compatible language runtime and compiler
  • OpenCL 2.0 compatible kernel language support with OpenCL 1.2 compatible runtime
  • Support for offline compilation right now – in-process/in-memory JIT compilation is to be added.
Performance of ROCm was mostly on par with AMD’s closed source drivers, with a few outliers. A few months ago ROCm 1.6 was released, where again performance was noticeably improved. For the next release performance improvements are expected again.
Why was it open sourced?
There were several reasons. AMD listened carefully to their customers in HPC, while taking note of where the industry was going.
Get deeper understanding of how functions are implemented
It’s useful to understand how functions are implemented. For instance, the difference between sin() and native_sin() can tell you a lot more about which is best to use. It doesn’t tell you how the functions are implemented on the GPU, but it does tell you which GPU functions are called.
Learning a new platform has never been so easy. Deep understanding is needed if you want to go beyond “it works”.
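The kind of trade-off behind sin() versus native_sin() can be sketched outside OpenCL. The OpenCL specification leaves the accuracy of native_ functions implementation-defined, so the example below does not reproduce any GPU's behaviour; `fast_sin` is a hypothetical stand-in (a fifth-order Taylor polynomial) that trades accuracy for fewer operations, and `max_abs_error` measures how far it drifts from the library sine on an assumed input range.

```python
import math


def fast_sin(x):
    """Hypothetical cheap sine: x - x^3/6 + x^5/120, reasonable only near [-pi/2, pi/2]."""
    x2 = x * x
    return x * (1.0 - x2 / 6.0 * (1.0 - x2 / 20.0))


def max_abs_error(samples=1001):
    """Largest absolute difference from math.sin over [-pi/2, pi/2]."""
    lo, hi = -math.pi / 2, math.pi / 2
    worst = 0.0
    for i in range(samples):
        x = lo + (hi - lo) * i / (samples - 1)
        worst = max(worst, abs(fast_sin(x) - math.sin(x)))
    return worst


print(f"max abs error of fast_sin on [-pi/2, pi/2]: {max_abs_error():.1e}")
```

Whether an error of this size is acceptable depends on the application, which is exactly the kind of judgement that becomes easier when you can read what the driver actually calls.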
Debug software deeper
Any software engineer has experience with libraries that don’t perform as promised or work as documented. Integration issues with “black box” libraries are therefore a typical reason for big project delays. If the library were open source, the debugger could step in and give all the information needed to solve the problem quickly.
When working with drivers it’s about the same. GPU drivers and compilers are extremely complex and inevitably your project hits that one bug nobody encountered before. With all open source drivers, you can step into the driver with the same debugger. Moreover, the driver can be recompiled with fixed code instead of having to write a less secure work-around.
Get bugs solved quicker
A trace now includes the driver-stack and the line-numbers. Even a suggestion for a fix can be given. This also helps reduce the time to get the fix for all steps. When a fix is suggested AMD only needs to test for regression to accept it. This makes the work for tools like CLsmith a lot easier.
A bonus of open source projects is that over time the code quality becomes better than projects where code is never seen by outsiders, which also adds to quicker solving of bugs.
Get low-priority improvements in the driver
Popular software like Blender and the LuxMark benchmark can expect to get attention from driver developers. For the rest of us, we have to hope our special code-constructions are comparable to one that is targeted. This results in many forums-comments and bug-reports being written, for which the compiler team doesn’t have enough time. This is frustrating for both sides.
Now everyone can help build a driver for everyone.
Get support for completely new things
Proprietary code needs official access and legal documents that have all kinds of restrictions, which open source code does not.
More often there is opportunity in what is not there yet, and research needs to be done to break the chicken-egg conundrum. Optimized 128-bit computing? Easy complex numbers in OpenCL? Native support for Halide as an alternative to OpenCL? All up-to-date driver-code is available to make these possible.
Nurture other projects
Code can be “borrowed” from AMD’s projects and be used in (un)expected places. This ranges from GPU-simulators to experimental compilers.
Currently the forks of the ROCm-driver are mostly used to fix bugs or are thousands of commits behind. Who knows what the future brings.
Get better support in more Linux distributions
It’s easier to include open source drivers in Linux distributions. These OpenCL drivers do need binary firmware (which has been disassembled and seems to do as advertised). There is a discussion about whether firmware can be seen as hardware and can be marked as “libre”, but the fact is that AMD’s contributions to the Linux 4.x kernel do get accepted.
Improve and increase university collaborations
If the software was protected, it was only possible under strict contracts to work on AMD’s compiler infrastructure. In the end it was easier to focus on the open source backends of LLVM than to go through the legal path.
Universities are very important for finding unexpected opportunities, integrating the latest research, bringing in potential new employees and carrying out research collaborations. Timour Paltashev (senior manager, Radeon Technology Group, GPU architecture and global academic connections) can be reached via timour dot paltashev at amd dot com for more info.
Final words
It probably makes total sense to open source the drivers. Most notably, the key advantages include reduced costs and increased control due to easier debugging and bug-solving.
AMD is now a modern hardware company that understands software is a crucial part of their products. They believe that open source software gives an edge over the competition and made this bold move to let everybody peek into their kitchen.

HSA Q&A with Dr. John Glossner

Computing Now: https://www.computer.org/web/computingnow/insights/content?g=53319&type=article&urlTitle=hsa-connections
HSA computing standards have progressed significantly since the HSA Foundation (HSAF) was established in 2012. Today, for instance, there are not only royalty free open specifications available but also fully operational production systems.

Pictured: Representatives from newly joined HSA Foundation members in China
In this Q&A, Dr. John Glossner, HSA Foundation president, provides additional insights on HSA-specific trends and issues:
What are the connections/differences between heterogeneous computing, general purpose computing and specialized computing? If heterogeneous computing is the future, what will happen to general purpose computing and specialized computing?
General purpose computing is what you find in a CPU. It is meant to be able to process any function, but streaming-data workloads such as artificial intelligence (AI) might not always be processed efficiently on a CPU.
Specialized computing would be a design made for one particular application such as AI, but it would not be intended to run general purpose code (sometimes called control code). The specialized accelerator typically has the advantage that it uses much less power to execute the special purpose application (e.g., AI).
Heterogeneous computing combines the best of both. It specifies how a CPU can talk to an accelerator, and the two are often integrated onto the same silicon die. Heterogeneous processors (meaning different types, such as CPUs, GPUs, DSPs, specialized accelerators and others) are all integrated together and cooperate to achieve an ideal balance of performance and power consumption for a given application.
What is the ultimate goal for the HSAF? What needs to be done to achieve this, and how?
The goal of the HSA Foundation is to make heterogeneous programming easier. That means creating standards that allow different types of processors to be programmed in the same language, using one single source file, and then automatically distributing parts of the application to the best processor to do the computing.
If research institutions and companies participate in establishing and promoting the standards of heterogeneous computing, will it affect their current development and solutions?
With open specifications and open source implementations of standards and tools, the Foundation’s hope is that it accelerates the pace of development and adoption of the technology. Corporations participating in HSAF enjoy royalty free access to all technologies developed.
The Foundation announced the formation of the China Regional Committee (CRC) in May. What were the motivations and goals in establishing the CRC, and what are the connections/differences between CRC standards and HSA standards?
While the HSA Foundation has made a lot of progress there are always regional considerations and research opportunities to improve current systems. Recently China has become a leader in AI and other semiconductor technologies. With the emergence of low latency applications such as AI and virtual reality (VR) the Foundation anticipates improvements to current specifications. As this is an area of research and development being led by China, it is natural to invite key scientists and companies from China to adopt and adapt technologies and specifications.
How many local organizations have joined the CRC? What are members’ perspectives?
More than 30 members have joined the CRC to date. They comprise semiconductor companies, research universities and institutes (e.g., Chinese Academy of Sciences), tools and algorithms designers, test verification, and China standardization groups.
What effects will in-depth research and development of heterogeneous computing standards and technologies have on promoting China’s semiconductor industry advances?
China has become a global leader in semiconductor development and algorithms such as AI that execute on semiconductor chips. Heterogeneous systems that are now emerging are expected to accelerate R&D throughout the global industry. The formation of the CRC and future global adoption of the work done by the CRC should advance China’s semiconductor industry as well as contribute to worldwide growth.
What are the implications of developing and promoting heterogeneous computing standards for the creation of China’s heterogeneous computing industry chain and ecosystem?
While the algorithms that the CRC is evaluating are of immediate concern within China, it is expected that the entire global community and ecosystem will benefit from the standardization work being performed by the CRC.
Do heterogeneous computing chips have a wide range of AI applications? What are the specific advantages?
Heterogeneous chips have the potential to dramatically reduce the electric power to perform AI applications. When programs are optimized for specialized heterogeneous systems, each processor in the system can execute code that is most power efficient for its own function. This provides higher performance at lower power than non-heterogeneous systems.
What should China do to rapidly cultivate the heterogeneous computing industry?
By participating in the HSAF CRC, China can adapt and adopt technologies related to heterogeneous systems for China-specific issues. However, it is anticipated that these enhancements will be integrated into global HSAF specifications because the problems are common to many semiconductor companies.

Parallel pleasure: deep-geek chip consortium opens test tool

By Adrian Bridgwater, ComputerWeekly UK: http://www.computerweekly.com/blog/Open-Source-Insider/Parallel-pleasure-deep-geek-chip-consortium-opens-test-tool

The HSA Foundation has made available to developers the HSA PRM (Programmer’s Reference Manual) conformance test suite as open source software.

HSA who?

Yes, sorry… the HSA (Heterogeneous System Architecture) Foundation is a non-profit consortium of SoC IP vendors, OEMs, Academia, SoC vendors, OSVs and ISVs, whose goal is making programming for parallel computing easy and pervasive.

The test suite is used to validate Heterogeneous System Architecture (HSA) implementations for both the HSA PRM Specification and HSA PSA (Platform System Architecture) specification.

But what is HSA?

HSA is a standardised platform design designed to unlock the performance and power efficiency of the parallel computing engines found in most modern electronic devices.

It allows developers to apply the hardware resources—including CPUs, GPUs, DSPs, FPGAs, fabrics and fixed function accelerators—in today’s complex systems-on-chip (SoCs).

“The HSA Foundation has always been a strong proponent of open source development tools directly and through its member companies,” said HSA Foundation chairman Greg Stoner. “Open sourcing worldwide the PRM conformance test suite is yet another example of an expanding array of development tools freely available supporting HSA.”

The HSA Foundation through its member companies and universities has also released many additional projects which are all available on the Foundation’s GitHub site.

Also published by Adrian Bridgwater on TechTarget USA: http://itknowledgeexchange.techtarget.com/open-source-insider/parallel-pleasure-deep-geek-chip-consortium-opens-test-tool/

Mixed Reality: Computer Vision Killer App Will Change How We Communicate, Collaborate

By Jeff Bier, Founder, Embedded Vision Alliance. Computing Now: https://www.computer.org/web/hsa-connections/content?g=54930593&type=article&urlTitle=mixed-reality-computer-vision-killer-app-will-change-how-we-communicate-collaborate
At this year’s Consumer Electronics Show, I walked many miles and saw countless demos. Several of these demos were memorable, but one in particular really got my mental gears turning: Microsoft’s HoloLens.
HoloLens will spur many “aha” moments, leading to accelerated innovation in wearable computer vision devices, low-power 3D computer vision, and mixed reality.
HoloLens, of course, is Microsoft’s “mixed reality” glasses product, which has been shipping in pre-production form for about a year. Previously, I would have used the term “augmented reality” to refer to HoloLens, which overlays computer-generated graphics on the user’s view of the physical world. But here I’m adopting Microsoft’s preferred term, “mixed reality,” which many people now use to describe systems in which “people, places, and objects from your physical and virtual worlds merge together.”
Over the past five years, I’ve seen many demos of virtual reality, augmented reality and mixed reality. Most of these showed promise—but the promise usually felt distant, because the demos weren’t sufficiently polished to feel “real,” and weren’t easy to use.
That was then, this is now: HoloLens has nailed both the “feels real” and ease-of-use aspects. Wearing HoloLens, I played a shoot-em-up video game against an army of robots, illustrated in this video. The experience was stunning, thanks to three key capabilities. First, HoloLens is a wearable, battery-powered device so I was able to move about the room to dodge hostile robots. Second, HoloLens accurately mapped the room I was in, enabling the robotic invaders to create what looked like real cracks in the actual walls of the room. And third, as I turned my head and shifted my position within the room, HoloLens adapted to these movements seamlessly so that the illusion of merged physical and virtual worlds was maintained.
Now that I’ve experienced robust mixed reality, I foresee many compelling applications for this technology beyond gaming: helping physicians see inside a body for safer, more accurate treatment; giving utility workers a clear view of underground pipes and cables; providing consumers with a realistic preview of how a room will look after redecorating it; and allowing museum visitors to see a skeleton transform into a fully formed, animated dinosaur. (The fact that HoloLens sells for $3,000 suggests that, for a while at least, this technology is more likely to be adopted by hospitals, utility companies and museums than by individual consumers.)
Of course, a convincing mixed reality (“MR”) experience—one in which the virtual and physical worlds interact in a realistic way—requires the MR device to maintain an accurate understanding of the surrounding physical world—and the user’s position within it—in three dimensions with very low latency. That is, it requires fast, highly accurate 3D computer vision.
Mixed reality doesn’t necessarily require a wearable device. Vehicle applications, for example, can use the windshield as a projection screen. And 8tree’s clever handheld device for quantifying surface damage projects information onto the surface being inspected. But in many cases, glasses are the most compelling way to deliver mixed reality. This is because they leave your hands free, because they know where you are looking, and because they have the ability to project information into your field of view wherever you’re looking. Packing all of the technology required for a convincing MR experience into a wearable device is a daunting challenge, however. With HoloLens, Microsoft has given us a hint of what’s possible. The HoloLens team has clearly put enormous effort into everything from custom chips to industrial design to create a device that’s reasonably comfortable to wear (though still bulky).
One of the key challenges for developers of products like HoloLens is harnessing the capabilities of heterogeneous compute resources—CPUs, GPUs, DSPs, FPGAs, and fixed-function accelerators—to deliver high performance with low cost and low energy consumption. HSA provides an approach that enables developers to easily and efficiently apply compute resources to demanding applications in today’s complex SoCs.
Learn more about heterogeneous computing for efficient computer vision at the upcoming Embedded Vision Summit. Marc Pollefeys, Director of Science for HoloLens and a pioneer in 3D computer vision, will be one of the keynote speakers.

The HSA Foundation expands its Academic Partnership Program

Entrepreneur Podcast Network: http://epodcastnetwork.com/the-hsa-foundation-expands-its-academic-partnership-program/
Dr. John Glossner, President of the Heterogeneous System Architecture (HSA) Foundation, a non-profit whose goal is making programming for parallel computing easy and pervasive, again joins Enterprise Radio to discuss the foundation, its overall benefit, and the new partnership.
Listen to host Eric Dye & guest Dr. John Glossner discuss the following:

  • Dr. Glossner, we last talked in early November. For the benefit of our listeners, can you please provide a brief synopsis again of what the HSA Foundation is?
  • In November, we also talked about what the Foundation calls Academic Centers of Excellence. Please elaborate again on what these are, and how does a higher educational institution become one?
  • You mentioned then that Northeastern University in Boston was the first of these; in early December, two leading German universities also became Academic Centers of Excellence. Tell us about each and elaborate on some of the innovative HSA projects they’re working on.
  • AMD, a founding member of the Foundation, recently provided a tutorial at an international conference on code generation and optimization. The title was ‘Updates in Heterogeneous Compute.’ Please share what you see as recent heterogeneous compute updates and developments.
  • It appears that heterogeneous compute will be applicable to an array of applications, everything from vision-based IoT systems to mobile devices, desktops, high-performance computing (HPC) systems, AR/VR environments, and servers. So how will heterogeneous compute improve performance and power efficiency?
  • How does HSA make life easier for IP and system designers?

John Glossner, Ph.D., is the President of the Heterogeneous System Architecture (HSA) Foundation, a non-profit consortium of SoC IP vendors, OEMs, academia, SoC vendors, OSVs and ISVs whose goal is making programming for parallel computing easy and pervasive.
HSA members are building a heterogeneous computing ecosystem, rooted in industry standards, which combines scalar processing on the CPU with parallel processing on the GPU, while enabling high bandwidth access to memory and high application performance with low power consumption.
HSA defines interfaces for parallel computation using CPU, GPU and other programmable and fixed function devices, while supporting a diverse set of high-level programming languages, and creating the foundation for next-generation, general-purpose computing.
Glossner currently serves as CEO of General Processor Technologies.
Website: www.hsafoundation.com
Social Media Links:
Facebook: facebook.com/thehsafoundation
Twitter: @hsafoundation

You’ll likely find the HSA software and toolchains quite useful and timeless

by Paul Blinzer, Embedded Computing Design: http://embedded-computing.com/guest-blogs/youll-likely-find-the-hsa-software-and-toolchains-quite-useful-and-timeless/#
Many people talk about hardware architecture as if it’s the most important part of a new platform. It’s true that hardware architecture is important for performance, which was discussed at length in a previous blog post. As a refresher, the pillars of the Heterogeneous System Architecture (HSA) are unified and shared virtual memory, user-mode dispatch, platform atomics, architected signals, strict memory model, quality of service, and cache coherency.
However, these features are not included in the platform architecture for their own sake; they allow software to be written easily and to run efficiently. Even more important, they enable existing software to be ported easily, and ideally automatically, to the new architecture.
While hardware typically has a limited lifespan of a few years at most, software may live almost forever. Sure, almost no one uses actual VT100 text terminals to communicate with the computer and the programs running back then, yet a lot of the software used today uses libraries and application frameworks that have their origin as far back as the 1970s. That software set the foundation of high-performance computing, the Internet, and security protocols used today, usually behind a shiny user interface. Even the good old VT100 terminal still lives on in the command lines of many popular operating systems (OSs) where the control sequences still behave as they did 40 years ago.
This is one reason why some platform architectures have endured over decades. While the underlying implementation may have changed substantially, the software-visible instruction set architecture (ISA) has endured and been incrementally extended without breaking backward compatibility with old programs. Other, more modern architectures were popular for a time but ultimately withered away as their performance advantage diminished: software-compatible platforms came close enough to their performance levels to make binary software compatibility the overwhelming factor. Good examples of enduring ISAs are the x86 ISA, the ARM instruction architecture, and IBM’s System/360 ISA, the latter celebrating its 53rd anniversary and still in use.
How do you ensure the long-term viability of a platform architecture? You ensure not only that software written for the traditional architectures runs well, and faster, on it, but also that the software development toolchain (compilers, linkers) and the development process stay familiar, so that the programmer doesn’t have to deal with two or more different toolchains to get performant software running on the platform.

Today’s extensive use of open-source software is an important factor, especially the GNU and LLVM-based compiler toolchains, readily available in open-source repositories, and open-source OSs, which are used as a foundation in embedded systems in various forms, sometimes “hidden away.” However, applications need to start and run without much delay, so it’s important that compilation and time-expensive code optimization for the accelerator don’t happen at the application’s load time (as often happens with many current accelerator APIs).
Most code optimization should happen once, when the application binary is produced; the result can then be readily loaded and mapped to the accelerator. This requires a portable, accelerator-neutral ISA that can be quickly transcribed to the target accelerator’s ISA instead of going through full compilation. Hence, it’s important to define a vendor-neutral ISA, which in the case of HSA is called HSA Intermediate Language (IL) or HSAIL. This IL represents a common ISA for compilers to target and is designed to be close to a data-parallel accelerator like a GPU, or other hardware.
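The split between one-time optimization and fast load-time translation can be illustrated with a toy model. The sketch below is not HSAIL or any real finalizer: the four-instruction IL, the two target names, the mnemonics, and the function name `finalize` are all invented for illustration. It only suggests why translating an already-optimized IL can be a cheap, per-instruction table lookup rather than a full compile.

```python
# Toy model of "optimize once, translate at load time".
# Everything here (IL ops, target names, mnemonics) is hypothetical,
# not real HSAIL or any vendor ISA.

# Portable IL, assumed to be produced once at build time after all
# expensive optimization has already been applied.
PORTABLE_IL = [
    ("load", "r0", "a"),
    ("load", "r1", "b"),
    ("add", "r2", "r0", "r1"),
    ("store", "out", "r2"),
]

# Per-target translation tables: each IL op maps to one target mnemonic.
TARGET_TABLES = {
    "gpu_x": {"load": "LD.G", "add": "V_ADD", "store": "ST.G"},
    "dsp_y": {"load": "ldw", "add": "addv", "store": "stw"},
}


def finalize(il, target):
    """Translate portable IL into a target's instructions by table lookup.

    No optimization happens here, so the cost is linear in the number
    of IL instructions.
    """
    table = TARGET_TABLES[target]
    return [" ".join([table[op], *operands]) for op, *operands in il]


if __name__ == "__main__":
    for target in TARGET_TABLES:
        print(target, finalize(PORTABLE_IL, target))
```

The same `PORTABLE_IL` list is reused for both hypothetical targets, which is the point of the design: the expensive work is done once, and only the cheap mapping step is repeated per device.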
Source code written in a common high-level language like C++ or Python, be it an application framework or a popular application, is then compiled to code defined in the IL. The compiler can apply all of its extensive optimization steps when generating the intermediate code, which can then be linked with other libraries, and even with modules written in different languages, such as C++, for some functions.
By integrating the IL, stored in an object format called BRIG, as a binary section in the application binary, the program loader can load both the host ISA and the accelerator code blocks in parallel and allow each to execute the program as written by the programmer, without the end user seeing a difference from a regular program load. The software engineer can either target the HSA run-time directly or use an application interface or framework sitting on top of it, such as OpenCL.
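To give a feel for what targeting a run-time directly can involve, the sketch below models, in plain Python, the general pattern of placing a work packet on a queue and waiting on a completion signal. This loosely mirrors the user-mode dispatch and architected signals pillars listed earlier. It is not the HSA run-time API: the `Signal` class, `agent_loop`, `vector_add`, and `run_example` are hypothetical names, and a Python thread stands in for the accelerator.

```python
# Toy model of queue-based dispatch with a completion signal.
# This is NOT the HSA run-time API; all names are hypothetical.
import queue
import threading


class Signal:
    """A counter the host can wait on until it reaches zero."""

    def __init__(self, value):
        self._value = value
        self._cond = threading.Condition()

    def decrement(self):
        with self._cond:
            self._value -= 1
            self._cond.notify_all()

    def wait_zero(self, timeout=None):
        with self._cond:
            return self._cond.wait_for(lambda: self._value == 0, timeout)


def agent_loop(packets):
    """Stand-in for an accelerator: run packets until a None sentinel."""
    while True:
        packet = packets.get()
        if packet is None:
            break
        kernel, args, signal = packet
        kernel(*args)
        signal.decrement()  # tell the host this packet is complete


def vector_add(a, b, out):
    for i in range(len(out)):
        out[i] = a[i] + b[i]


def run_example():
    packets = queue.Queue()
    agent = threading.Thread(target=agent_loop, args=(packets,))
    agent.start()

    a, b, out = [1, 2, 3], [10, 20, 30], [0, 0, 0]
    done = Signal(1)
    packets.put((vector_add, (a, b, out), done))  # enqueue the work
    done.wait_zero(timeout=5)  # block until the agent signals completion

    packets.put(None)  # shut the agent down
    agent.join()
    return out


if __name__ == "__main__":
    print(run_example())
```

A framework sitting on top of a run-time would typically hide the queue and signal handling shown here behind higher-level calls.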

But that’s not all. AMD has developed an open-source HSA run-time called Radeon Open Compute (ROCm) and added a portability layer called Heterogeneous-compute Interface for Portability (HIP) that allows source code using proprietary CUDA APIs to compile and run on top of the ROCm run-time while keeping source code compatibility. Alongside CodeXL, an open-source tool for profiling and debugging data-parallel applications, this is a powerful toolset for automatically porting and running large application frameworks. While this route doesn’t use all ROCm features, it’s an easy way to take advantage of AMD’s HSA implementation without refactoring legacy code.
More information can be found in the half-day HSA-focused tutorial at the HPCA/CGO conference in a couple of weeks.