HSA Foundation China Regional Committee & China Standard Group of Heterogeneous System Technical Symposium to Be Held in Hunan

YUEYANG, CHINA, May 22, 2018 — The Heterogeneous System Architecture (HSA) Foundation China Regional Committee (CRC) and China Standard Group of Heterogeneous System Technical Symposium – Hunan Session will be held on May 29 in Yueyang, Hunan.
The Symposium is sponsored by the China Electronic Standardization Institute (CESI), an HSA Foundation promoter member, and the HSA Foundation China Regional Committee (CRC). China Standard Group of Heterogeneous System and Hunan Institute of Technology are serving as co-organizers.
The Symposium will focus on the latest developments in China’s heterogeneous computing standards research, with discussion of heterogeneous computing in artificial intelligence, software-defined communications and other applications.
“The HSA Foundation CRC has led the way in nationalizing heterogeneous computing standards in China and continues to advance the mission of the Foundation, which is to make heterogeneous programming universally easier,” said Dr. John Glossner, the Foundation’s president.
 
AGENDA

  1. General Info
    1. Date/Time: May 29, 2018 | 9:00-18:00
    2. Venue: Hunan Institute of Technology, Yueyang, Hunan
    3. Participants: HSA Foundation Members, HSA Foundation China Regional Committee (CRC) Members, Related Universities, Research Institutes and Companies
  2. Topics
    a. Morning Session: 9:00-12:00

i. Interpreting the Industry Standard Establishment Process by the Ministry of Industry and Information Technology – Fang Liu, CESI (10 minutes)
ii. Interpreting the Draft Standard Recommendations by the CRC Working Groups (60 minutes)

– Application & System Evaluation Working Group
– Virtual ISA Working Group
– System Architecture Working Group
– Compilation & Runtime LIB Working Group
– OS & Multivendor Working Group
– Interconnect Working Group
– Security & Protection Working Group
– Conformance Test Working Group

iii. System Requirements and Standardization Concepts for Software-Defined Satellite Chip Platforms (30 minutes)
iv. 2018 Working Objectives and Execution Plan for the CRC – Dr. Xiaodong Zhang, Chair of the CRC (30 minutes)
v. Recommendations for Implementation of Variable-Length Vector Parallel Computing in Heterogeneous Computing Standards – Dr. Lei Wang, Huaxia (Beijing) General Processor Technologies (30 minutes)

    b. Afternoon Session: 14:00-18:00

i. Recommendations for Heterogeneous Computing Instruction Architecture for Convolutional Neural Network Parallel Computing – Dr. Jun Han, State Key Laboratory of ASIC, Fudan University (30 minutes)
ii. Recommendations for Software-Defined Radio Instruction Architecture based on Heterogeneous Computing Standards – Dr. Chaoyang Tian, Jiangsu Research Center of Software Digital Radio (30 minutes)
iii. Several Enhancements to Security and Protection Strategies for Heterogeneous Computing Systems – Shaowei Chen, Senior Expert, Nationz Technologies (30 minutes)
iv. Recommendations for New Heterogeneous Computing Network Interconnection Specification Compatible with International Mainstream Standards – Dr. Zhiyi Yu, Sun Yat-sen University (30 minutes)
v. HSA Lecture Series (120 minutes)
1. HSA Basic Knowledge Series
2. HSA Basic Knowledge Series – the Use of HSA Open Source Tools and Programming Practices
3. HSA Basic Knowledge Series – the Status of HSA Artificial Intelligence Open Source Library

 
About the HSA Foundation
The HSA (Heterogeneous System Architecture) Foundation is a non-profit consortium of SoC IP vendors, OEMs, Academia, SoC vendors, OSVs and ISVs, whose goal is making programming for parallel computing easy and pervasive. HSA members are building a heterogeneous computing ecosystem, rooted in industry standards, which combines scalar processing on the CPU with parallel processing on the GPU, while enabling high bandwidth access to memory and high application performance with low power consumption. HSA defines interfaces for parallel computation using CPU, GPU and other programmable and fixed function devices, while supporting a diverse set of high-level programming languages, and creating the foundation for next-generation, general-purpose computing.
Follow the HSA Foundation on Twitter, Facebook, LinkedIn and Instagram.

Proof-of-Concept C++17 Parallel STL Offloading for GCC/libstdc++

Computing Now, HSA Connections: https://www.computer.org/portal/web/hsa-connections/content?g=54930593&type=article&urlTitle=proof-of-concept-c-17-parallel-stl-offloading-for-gcc-libstdc-
Introduction:
Parmance and General Processor Technologies have been collaborating on C++17 Parallel STL offloading support based on HSA (Heterogeneous System Architecture) and GCC (GNU Compiler Collection). A working proof-of-concept has now been released and is available at https://github.com/parmance/par_offload. This post is a high-level overview of the project.
Heterogeneous Offloading and C++17
The C++17 standard, released in December 2017, adds execution policies to the algorithms of its standard template library (STL). An execution policy lets the programmer declare that an algorithm call, along with any user-defined functionality the call uses, is safe to execute in parallel. The standard refers to this user-defined functionality as “element access functions” (EAF).
The PSTL (Parallel Standard Template Library) of C++17 focuses on forward progress guarantees and their implications for parallelization safety on homogeneous processors.
However, there is no “parallel heterogeneous offloading execution policy” in the C++ standard yet; there seems to be an implicit assumption that parallel execution will occur on the same processor that invoked it. To make offloading decisions explicit, our implementation defines a new execution policy type ‘parallel_offload_policy’ (par_offload), which the programmer can use to declare “heterogeneous offload” or “multiple-ISA” safety for the involved user-defined functions.
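The mechanics behind such a policy can be sketched in plain C++17. This is illustrative only: the names below mirror the post, but the actual par_offload project defines its policy under std::execution::experimental inside its patched toolchain, and its real definition may differ.

```cpp
#include <algorithm>

namespace offload_sketch {  // hypothetical namespace, not the project's

// An execution policy is an empty tag type that overload resolution
// dispatches on: one algorithm overload per policy.
struct parallel_offload_policy {};
inline constexpr parallel_offload_policy par_offload{};

// A policy-taking transform overload. Here it simply falls back to
// sequential std::transform; a real implementation would instead
// enqueue the work on a heterogeneous agent via the HSA Runtime.
template <class In, class Out, class UnaryOp>
Out transform(parallel_offload_policy, In first, In last, Out out, UnaryOp op) {
  return std::transform(first, last, out, op);
}

}  // namespace offload_sketch
```

Dispatching on an empty tag type is the same mechanism standard library implementations use for std::execution::seq and std::execution::par.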
A call to the ‘transform’ PSTL function with this policy looks like the following:
std::transform(std::execution::experimental::par_offload,
               pixel_data.begin(), pixel_data.end(),
               pixel_data.begin(),
               [](char c) -> char {
                 return c * 16;
               });
In this case, a lambda function iterates over all the elements of pixel_data (a std::vector), with the processing offloaded to a heterogeneous device, if one is available.
Shared Virtual Memory
Explicit data management is problematic when offloading general-purpose C/C++ programs that assume a unified address space and allow passing pointers to functions without attached size information. A single coherent virtual address space shared by all the processors in a heterogeneous platform would remove a major obstacle and make programming such devices much simpler.
Heterogeneous System Architecture (HSA), whose 1.0 specification was published in March 2015, is a language-neutral standard targeting heterogeneous systems. It defines cache-coherent shared global virtual memory as a core feature. That is, an HSA heterogeneous platform supports data sharing across devices (called agents) as easily as “homogeneous” C/C++ multithreaded programming.
In the GCC PSTL offloading work we used the HSA Runtime as heterogeneous platform middleware and relied on the coherent system memory capabilities of the HSA Full Profile. HSA is interesting for this use case above all because its shared heterogeneous memory requirement is expected to work seamlessly with the C/C++ memory model. There is also a wide selection of open source components implementing different parts of the specifications: for example, the HSAIL intermediate language has both front-end and back-end support in upstream GCC, and implementations of the runtime API enable development and testing by offloading to CPU-based targets.
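The difference shared virtual memory makes can be illustrated with a small host-side sketch. Here device_run and sum_on_device are hypothetical stand-ins, not HSA Runtime calls.

```cpp
#include <cstddef>

// Stand-in for launching a kernel on a device agent; this sketch just
// runs on the host. Under the HSA Full Profile's shared virtual
// memory, a device could dereference the very same host pointers.
template <class Fn>
void device_run(Fn kernel) { kernel(); }

struct Node { int value; Node* next; };

// Sums a pointer-linked list "on the device". Without a unified
// address space, the raw next pointers carry no size information a
// runtime could use to serialize and copy the structure explicitly.
inline int sum_on_device(Node* head) {
  int sum = 0;
  device_run([&] {
    for (Node* n = head; n != nullptr; n = n->next)
      sum += n->value;  // pointers remain valid across agents under SVM
  });
  return sum;
}
```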
Implementation Status and Future Plans
We now have a proof-of-concept implementation that offloads several PSTL algorithms and supports multiple ways to define the user-specified functionality: lambda functors (with and without captures), C functions, std::functions wrapping C functions, function objects, and user-defined data types.
Next, we plan to properly integrate the prototype into libstdc++ and GCC, implement the rest of the algorithms, and finally optimize performance.
Links and references
The code on GitHub: https://github.com/parmance/par_offload
ISO/IEC 14882:2017 Programming languages — C++ Publication date 2017-12 https://www.iso.org/standard/68564.html
Heterogeneous Systems Architecture Foundation http://www.hsafoundation.com

Developing Heterogeneous Cache Coherent SoCs

Chip Design: http://eecatalog.com/chipdesign/2017/12/19/developing-heterogeneous-cache-coherent-socs/
Automotive and other customer needs are not what they once were.
As with other challenges, the task of successfully developing heterogeneous cache coherent SoCs demands an understanding of your customers’ requirements. Conducting interviews with your customers aids this understanding. For example, through the interview you should learn:

  • How many IPs need to connect to the heterogeneous system;
  • What kind of bandwidth each IP requires;
  • The types of IPs that are in the system;
  • What kind of features you would enable in the interconnect IP.

The next step is to define “heterogeneity” because, while many people use the word “heterogeneous,” it has a number of meanings. Some guidelines:

  • You must have different types of processors within the same system;
  • Different processor types also have different cache structures. For example, an Arm CPU would use the same cache structure as another Arm core, but a different CPU may have a different cache structure;
  • Different types of IPs must also be considered:
    • CPUs, GPUs, and DSPs
    • IPs that make up an SoC, such as those for connectivity, USB, SATA, etc.

A highly flexible snoop filter architecture accommodates different cache structures of different kinds of processors. It also reduces the number of memory bits required to perform snoop filtering.
Adapt to Changing Customer Needs
Understanding the customer’s requirements for non-coherency and coherency is a must. Are the coherent and non-coherent domains separate, fully merged, or a customized mix? ArterisIP, for instance, has developed a component called a non-coherent bridge. Its purpose is to drive non-coherent accesses into the coherent domain.
A few years ago, coherency systems were small and compact with a maximum of three to four different processors. Coherency was confined to CPU clusters, and functionality was grouped under an application. Coherency wasn’t necessarily distributed beyond a subsystem.
However, customer needs are changing, and today there is a need for greater processor performance. Companies are adding more and different types of processors. In addition:

  • SoC layouts are expanding tremendously;
  • Processors are growing larger;
  • Complex layouts are affecting the coherency domain;
  • Coherent domain is expanding all over the chip.

So how do you handle all of this? First, you must make sure the infrastructure is designed to distribute coherency system-wide. The interconnect technology must enable network packet transport and accommodate a variety of topologies, such as ring and mesh. The infrastructure must also be configurable and flexible: as design complexity continues to grow, designers need to understand which topologies best suit a particular chip layout. Having the proper tools to predict, at the chip-layout stage, where complexity might cause performance and power issues is critical to revising the layout and discovering which topology best resolves those issues.
Optimizing Power Consumption of Complex Systems
To optimize for power, first, you need to provide a power-ready IP. Once this is accomplished, you need to implement some tried and true techniques—these may include voltage domain, power domain, clock gating, and high-level clock gating.
When an IP is power-ready, it will have connectivity to a power interface and can be controlled by a PMU (Power Management Unit) in the system. The PMU will decide when to shut down the IP – i.e. when it is not in use or not needed by the system. At the application level, this power-aware controller (PMU) can lower system power consumption by putting an IP on idle.
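The idle-driven shutdown described above can be modeled with a toy sketch. PowerManagementUnit and its methods are illustrative names only, not ArterisIP’s or any vendor’s actual interface.

```cpp
#include <map>
#include <string>

// Toy model of a PMU gating power-ready IP blocks: the system reports
// an IP idle, and the PMU cuts its power until it is needed again.
class PowerManagementUnit {
 public:
  // A power-ready IP exposes a power interface the PMU can control.
  void register_ip(const std::string& name) { powered_[name] = true; }

  void mark_idle(const std::string& name)   { powered_[name] = false; }
  void mark_active(const std::string& name) { powered_[name] = true; }

  bool is_powered(const std::string& name) const {
    return powered_.at(name);
  }

 private:
  std::map<std::string, bool> powered_;  // true = powered, false = gated
};
```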
Maturing to Meet Challenges
Heterogeneous SoCs are still in development and haven’t yet matured. But processors in the coherent domain are now sharing data with each other, and CPUs and GPUs have become cache coherent, although I’m confident we can do a lot more.
Moreover, data sharing is not only between the processor and the GPU, but among all the IPs of the system, a concept that is still a work in progress. This idea must be pushed a little farther to achieve total coherency. Today, few non-coherent IPs share data with coherent IPs, but applications are emerging that need coherency, and this will bring new requirements.
Some of these design challenges are hindering product development, for example, for Advanced Driver-Assistance Systems (ADAS) for automotive. Automotive applications have demanding performance targets and need to share data among heterogeneous processors to meet them. We’ll see the introduction of new features to this market. Other markets will include artificial intelligence and machine learning.
A decade ago, mobile application processors were driving the need for cache coherency. Next, data center systems took over as the primary drivers. Now the automotive market is fuelling the race to extend cache coherency to all of the heterogeneous processing elements in SoCs. In two or three years, a new trend will emerge to extend heterogeneous cache coherency even further—but designers will need flexibility, configurability and scalability to ensure that these systems are high-performance, low-latency, and power- and cost-efficient.


J.P. Loison is Corporate SoC Application Architect, ArterisIP, which provides system-on-chip (SoC) interconnect IP to accelerate SoC semiconductor assembly for a wide range of applications. These applications include those spanning automobiles to mobile phones, IoT, cameras, SSD controllers, and servers for customers such as Samsung, Huawei / HiSilicon, Mobileye (Intel), Altera (Intel), and Texas Instruments. The company is located in Campbell, CA.

New Survey from HSA Foundation Highlights Importance, Benefits of Heterogeneous Systems

Beaverton, Oregon, Dec. 5, 2017 – The Heterogeneous System Architecture (HSA) Foundation today released key findings from a second comprehensive members survey. The survey reinforced why heterogeneous architectures are becoming integral for future electronic systems.
HSA is a standardized platform design supported by more than 70 technology companies and universities that unlocks the performance and power efficiency of the parallel computing engines found in most modern electronic devices. It allows developers to easily and efficiently apply the hardware resources—including CPUs, GPUs, DSPs, FPGAs, fabrics and fixed function accelerators—in today’s complex systems-on-chip (SoCs).
Some of the survey questions – and results:
Will the system have HSA features? 
Last year, 58.82% of the respondents answered affirmatively; this year, 100%!
Will it be HSA-compliant?
In 2016, 69.23% said it would; 2017 figures rose to 80%.
What is the top challenge in implementing heterogeneous systems?
27.27% responded in 2016 that it was a lack of standards for software programming models; the 2017 survey also identified this as the most important issue, but the numbers decreased to 7.69%. Also, half of the respondents last year said it was a lack of developer ecosystem momentum.
 Some remarks that further accentuate key survey findings:
“Many HSA Foundation members are currently designing, programming or delivering a wide range of heterogeneous systems – including those based on HSA,” said HSA Foundation President Dr. John Glossner. “Our 2017 survey provides additional insight into key issues and trends affecting these systems that power the electronic devices across every aspect of our lives.”
 Greg Stoner, HSA Foundation Chairman and Managing Director said that “the Foundation is developing resources and ecosystems conducive to its members’ various focuses on different application areas, including machine learning, artificial intelligence, datacenter, embedded IoT, and high-performance computing. The Foundation has also been making progress in support of these ecosystems, getting closer to taking normal C++ code and compiling to an HSA system.” 
 Stoner added that “ROCm 1.7 by AMD will port HSA for Caffe and TensorFlow; GPT, in the meantime, is releasing an open-sourced HSAIL-based Caffe library, with the first version already up and running – this permits early access for developers.” 
 Dr. Xiaodong Zhang, from Huaxia General Processor Technologies, who serves as chairman of the China Regional Committee (CRC; established by the HSA Foundation to enhance global awareness of heterogeneous computing), said that “China’s semiconductor industry is rapidly developing, and the CRC is building an ecosystem in the region to include technology, talent, and markets together with an open approach to take advantage of synergies among industry, academia, research, and applications.”

About the HSA Foundation

The HSA (Heterogeneous System Architecture) Foundation is a non-profit consortium of SoC IP vendors, OEMs, Academia, SoC vendors, OSVs and ISVs, whose goal is making programming for parallel computing easy and pervasive. HSA members are building a heterogeneous computing ecosystem, rooted in industry standards, which combines scalar processing on the CPU with parallel processing on the GPU, while enabling high bandwidth access to memory and high application performance with low power consumption. HSA defines interfaces for parallel computation using CPU, GPU and other programmable and fixed function devices, while supporting a diverse set of high-level programming languages, and creating the foundation for next-generation, general-purpose computing.
Follow the HSA Foundation on Twitter, Facebook, LinkedIn and Instagram.
 
Contact:
Neal Leavitt
Leavitt Communications
(760) 639-2900
neal@leavcom.com
 
 

Developing Heterogeneous Cache Coherent SoCs – and More! Q&A with ArterisIP's J.P. Loison, Corporate SoC Application Architect

Computing Now, HSA Connections: https://www.computer.org/portal/web/hsa-connections/content?g=54930593&type=article&urlTitle=developing-heterogeneous-cache-coherent-socs-and-more-

Editor’s Note: 
ArterisIP provides system-on-chip (SoC) interconnect IP to accelerate SoC semiconductor assembly for a wide range of applications from automobiles to mobile phones, IoT, cameras, SSD controllers, and servers for customers such as Samsung, Huawei / HiSilicon, Mobileye (Intel), Altera (Intel), and Texas Instruments. The company is located in Campbell, CA.
Describe in detail the various design challenges faced today in developing Heterogeneous Cache Coherent SoCs.
The first thing you need to do is understand customer requirements. This includes asking the right questions, such as:
  • How many IPs need to connect to the heterogeneous system;
  • What kind of bandwidth each IP requires;
  • What kinds of IP are in the system, and what features you can enable in the interconnect IP.
The next step is to define heterogeneity because, while many people use the word “heterogeneous,” it carries different meanings. Some key tasks and guidelines:
  • You must have different types of processors within the same family;
  • Then you have to accommodate different types of processors that are available on the market.
  • Different processor types also have different cache structures:
    • An ARM CPU would use the same cache structure as another ARM core, but a different CPU may have a different cache structure.
  • Accommodate different types of IPs as well:
    • CPUs, GPUs, and DSPs;
    • Then there are all the other types of IP that you combine into an SoC, such as connectivity IP, USB, SATA, etc.
It’s also important to be able to accommodate different (cache) protocol systems in terms of coherent and non-coherent protocol. Some examples:
  • Flexible snoop filter capability accommodates different cache structures of different kinds of processors.
    • Snoop filter capabilities operate in two different directions to accommodate any cache structure of any processor that is available today.
    • Another challenge: Reduce the number of memory bits that you need to perform snoop filtering.
How do you integrate IP that is not cache-coherent and achieve better performance? Can you provide a brief example or two?
You need to understand the customer’s requirements in terms of the mix of non-coherency and coherency. Are the domains separate, fully merged, or a customized mix? Arteris, for instance, developed a component called a non-coherent bridge. Its purpose is to drive non-coherent accesses back into the coherent domain, and it also provides a clear boundary between the non-coherent and coherent domains.
How do you create a cache-coherent system that is easily placed on a chip?
A few years ago, coherency systems were small and compact – a max of three to four different processors. Coherency was confined to CPU clusters, functionality was grouped under an application and all subsystems were connected to an application.
But coherency wasn’t necessarily distributed beyond a subsystem. Customer needs are changing: there is a need for greater processor performance, and companies are adding more and different types of processors. In addition:
  • SoC layouts are expanding tremendously;
  • Processors are growing larger;
  • Complex layouts affect the coherency domain;
  • Coherent domain is expanding all over the chip.
So how do you handle it?  First, you must make sure the infrastructure is designed to distribute coherency system-wide. It has to be an interconnect technology that enables network packet transport, and it must also accommodate a variety of topologies, such as ring and mesh. The infrastructure must also be configurable and flexible because, as design complexity continues to grow, designers must be able to understand which topologies best suit a particular chip layout. Having tools that can predict, at the chip-layout stage, where complexity might cause performance and power issues is critical to revising the layout and choosing the topology that best resolves those issues.
How can you optimize power consumption of complex systems? 
You first need to provide power-ready IP; once this is accomplished, you need to implement some well-known techniques, which may include voltage domains, power domains, clock gating and high-level clock gating.
A power-ready IP has connectivity to a power interface and can be controlled by a PMU (Power Management Unit) in the system, which decides when to shut down the IP, i.e., when it is not in use or not needed by the system. At the application level, this power-aware controller (PMU) can lower system power consumption by putting an IP on idle.
How long will it take to reasonably surmount some/all of the aforementioned issues?
Heterogeneous SoCs are still in development and haven’t yet matured. But processors in the coherent domain are now sharing data with each other, and CPUs and GPUs have become cache coherent, although I’m confident we can do a lot more.
Data sharing is not only between the processor and the GPU, but among all of the IPs of the system; it’s a concept that is still a work in progress. This idea must be pushed a little farther to achieve total coherency. Today there are still not many non-coherent IPs sharing data with coherent IPs, but we’re starting to see applications emerge that need coherency, and this will bring new requirements.
Are these design challenges currently hindering product development in select verticals?  If so, which ones?
Yes, one that comes to mind is ADAS (Advanced Driver-Assistance Systems) for automotive. Automotive applications will have a lot of requirements because of the need to add performance and to share data with heterogeneous processors to achieve those requirements. We’ll see the introduction of new features to this market. Other markets will include artificial intelligence and machine learning.
A decade ago, mobile application processors were driving the need for cache coherency; then data center systems became the primary driver. Now the automotive market is driving the need to extend cache coherency to all of the heterogeneous processing elements in SoCs. In two or three years, a new trend will emerge to extend heterogeneous cache coherency even further, but designers will need flexibility, configurability and scalability to ensure that these systems are high-performance, low-latency, and reasonable in terms of power consumption and cost.

Everything You Need to Know About Why AMD Open Sourced the OpenCL Driver Stack for ROCm

Computing Now, HSA Connections: https://www.computer.org/portal/web/hsa-connections/content?g=54930593&type=article&urlTitle=hsa-connectio-1
Introduction: AMD is a co-founder and member of the HSA Foundation.  This article is excerpted and edited from a blog post by Vincent Hindriksen, founder of Stream HPC, a Netherlands-based software development company.
Last May, AMD open sourced the OpenCL driver stack for ROCm. With this they kept their promise to open source (almost) everything. Earlier the hcc compiler, kernel-driver and several other parts were open sourced.
Why is this a big thing?
There are indeed several open source OpenCL implementations, but with one big difference: they’re secondary to the official compiler/driver. So, implementations like PortableCL and Intel Beignet play catch-up. AMD’s open source implementations are primary.
They contain:
  • OpenCL 1.2 compatible language runtime and compiler
  • OpenCL 2.0 compatible kernel language support with OpenCL 1.2 compatible runtime
  • Support for offline compilation right now – in-process/in-memory JIT compilation is to be added.
Performance of ROCm was mostly on par with AMD’s closed source drivers, with a few outliers. A few months ago ROCm 1.6 was released, where again performance was noticeably improved. For the next release performance improvements are expected again.
Why was it open sourced?
There were several reasons. AMD listened carefully to their customers in HPC, while taking note of where the industry was going.
Get deeper understanding of how functions are implemented
It’s useful to understand how functions are implemented. For instance, the difference between sin() and native_sin() can tell you a lot more about which is best to use. It doesn’t show how the functions are implemented on the GPU, but it does show which GPU functions are called.
Learning a new platform has never been so easy. Deep understanding is needed if you want to go beyond “it works”.
Debug software deeper
Any software engineer has experience with libraries that don’t perform as promised or work as documented. Integration issues with “black box” libraries are therefore a typical reason for big project delays. If the library is open source, the debugger can step in and provide all the information needed to solve the problem quickly.
When working with drivers, it’s much the same. GPU drivers and compilers are extremely complex, and inevitably your project hits that one bug nobody has encountered before. With fully open source drivers, you can step into the driver with the same debugger. Moreover, the driver can be recompiled with fixed code instead of having to write a less secure workaround.
Get bugs solved quicker
A trace now includes the driver stack and line numbers, and even a suggestion for a fix can be given. This helps reduce the turnaround time at every step: when a fix is suggested, AMD only needs to test for regressions before accepting it. This also makes the work for tools like CLsmith a lot easier.
A bonus of open source projects is that, over time, the code quality becomes better than in projects whose code is never seen by outsiders, which also helps bugs get solved more quickly.
Get low-priority improvements in the driver
Popular software like Blender and the LuxMark benchmark can expect attention from driver developers. The rest of us have to hope our special code constructions are comparable to one that is targeted. The result is many forum comments and bug reports, for which the compiler team doesn’t have enough time. This is frustrating for both sides.
Now everyone can help build a driver for everyone.
Get support for complete new things
Proprietary code requires official access and legal documents with all kinds of restrictions; open source code does not.
More often there is opportunity in what is not there yet, and research needs to be done to break the chicken-egg conundrum. Optimized 128-bit computing? Easy complex numbers in OpenCL? Native support for Halide as an alternative to OpenCL? All up-to-date driver-code is available to make these possible.
Nurture other projects
Code can be “borrowed” from AMD’s projects and be used in (un)expected places. This ranges from GPU-simulators to experimental compilers.
Currently the forks of the ROCm-driver are mostly used to fix bugs or are thousands of commits behind. Who knows what the future brings.
Get better support in more Linux distributions
It’s easier to include open source drivers in Linux distributions. These OpenCL drivers do need binary firmware (which has been disassembled and seems to do as advertised). There is a discussion about whether firmware can be considered hardware and marked as “libre,” but the fact is that AMD’s contributions to the Linux 4.x kernel do get accepted.
Improve and increase university collaborations
When the software was protected, it was only possible to work on AMD’s compiler infrastructure under strict contracts. In the end, it was easier to focus on the open source backends of LLVM than to go through the legal path.
Universities are very important for finding unexpected opportunities, integrating the latest research, bringing in potential new employees and doing research collaborations. Timour Paltashev (senior manager, Radeon Technology Group, GPU architecture and global academic connections) can be reached via timour dot paltashev at amd dot com for more info.
Final words
It probably makes total sense to open source the drivers. Most notably key advantages include reduced costs and increased control due to easier debugging and bug-solving.
AMD is now a modern hardware company that understands software is a crucial part of its products. It believes that open source software gives it an edge over the competition, and it made this bold move to let everybody peek into its kitchen.

HSA and ROCm Architectures to be Highlighted at Next Week’s CppCon

BEAVERTON, OR, Sept. 19, 2017– The HSA (Heterogeneous System Architecture) Foundation and Foundation member AMD will be providing a comprehensive session on HSA technologies and AMD’s ROCm architecture at next week’s CppCon. The conference will be held from Sept. 24-29 in Bellevue, WA at the Meydenbauer Conference Center.
CppCon is an annual gathering for the worldwide C++ community and is geared to appeal to anyone from C++ novices to experts.
The presentation by AMD Fellow Paul Blinzer is included as part of a session on ‘concurrency and parallelism’ running from 8:30-10 PM on Tuesday, Sept. 28 at the Meydenbauer Conference Center, Harvard, Room #406. Attendees will learn what allows these architectures to use computational hardware accelerators such as GPUs, DSPs and others with native C++, without resorting to proprietary APIs, programming libraries or limited language features.
Heterogeneous System Architecture (HSA) is a standardized platform design that unlocks the performance and power efficiency of the parallel computing engines found in most modern electronic devices. It provides an ideal mainstream platform for next-generation SoCs in a range of applications including artificial intelligence.
For more information on the presentation and to register, please see https://cppcon.org/registration/.
For more information, including a full list of speakers, supporting organizations and sponsors please visit: https://cppcon.org/cppcon-2017-program/
About Paul Blinzer
Paul Blinzer works on a wide variety of Platform System Software architecture projects, and specifically on Heterogeneous System Architecture (HSA) System Software, as a Fellow in the System Software group at Advanced Micro Devices, Inc. (AMD). Based in the Seattle, WA area, he has worked since the early ’90s in various roles spanning system-level driver development, system software development, graphics architecture, and graphics & compute acceleration. Paul is the chairperson of the “System Architecture Workgroup” of the HSA Foundation. He has a degree in Electrical Engineering (Dipl.-Ing.) from TU Braunschweig, Germany.
https://www.linkedin.com/in/paul-blinzer-4523602
About the HSA Foundation
The HSA (Heterogeneous System Architecture) Foundation is a non-profit consortium of SoC IP vendors, OEMs, Academia, SoC vendors, OSVs and ISVs, whose goal is making programming for parallel computing easy and pervasive. HSA members are building a heterogeneous computing ecosystem, rooted in industry standards, which combines scalar processing on the CPU with parallel processing on the GPU, while enabling high bandwidth access to memory and high application performance with low power consumption. HSA defines interfaces for parallel computation using CPU, GPU and other programmable and fixed function devices, while supporting a diverse set of high-level programming languages, and creating the foundation for next-generation, general-purpose computing.
 
Follow the HSA Foundation on Twitter, Facebook, LinkedIn and Instagram.
 
Contact:
Neal Leavitt
Leavitt Communications
(760) 639-2900
neal@leavcom.com

HSA Foundation, AMD Headlining HSA Technologies Tutorial at 26th International Conference on Parallel Architectures and Compilation Techniques

BEAVERTON, OR, Sept. 6, 2017 – The HSA (Heterogeneous System Architecture) Foundation and Foundation member AMD will provide a half-day tutorial on HSA technologies and AMD’s Radeon™ Open Compute at this week’s 26th International Conference on Parallel Architectures and Compilation Techniques (PACT). The conference will be held from Sept. 9-13 in Portland, OR.
PACT brings together researchers from architecture, compilers, applications and languages to present and discuss innovative research of common interest. PACT recently widened its scope to include insights useful for the design of machines and compilers drawn from applications such as, but not limited to, machine learning, data analytics and computational biology.
The tutorial, presented by AMD Fellow Paul Blinzer, runs from 9 AM to 12 PM on Saturday, Sept. 9th. Key elements will include an introduction to HSA and the Radeon™ Open Compute runtime, followed by an in-depth session focusing on HSA, its components and the software ecosystem.
Heterogeneous System Architecture (HSA) is a standardized platform design that unlocks the performance and power efficiency of the parallel computing engines found in most modern electronic devices. It provides an ideal mainstream platform for next-generation SoCs in a range of applications including artificial intelligence.
The tutorial and other PACT sessions will be held at the Doubletree by Hilton Hotel Portland.
For more information on the tutorial and to register, please see https://parasol.tamu.edu/pact17/rates-registration.
For more information, including a full list of speakers, supporting organizations and sponsors please visit: https://parasol.tamu.edu/pact17/main-conference.
About Paul Blinzer
Paul Blinzer works on a wide variety of Platform System Software architecture projects, and specifically on Heterogeneous System Architecture (HSA) System Software, as a Fellow in the System Software group at Advanced Micro Devices, Inc. (AMD). Based in the Seattle, WA area, he has worked since the early ’90s in various roles spanning system-level driver development, system software development, graphics architecture, and graphics & compute acceleration. Paul is the chairperson of the “System Architecture Workgroup” of the HSA Foundation. He has a degree in Electrical Engineering (Dipl.-Ing.) from TU Braunschweig, Germany.
https://www.linkedin.com/in/paul-blinzer-4523602
About the HSA Foundation
The HSA (Heterogeneous System Architecture) Foundation is a non-profit consortium of SoC IP vendors, OEMs, Academia, SoC vendors, OSVs and ISVs, whose goal is making programming for parallel computing easy and pervasive. HSA members are building a heterogeneous computing ecosystem, rooted in industry standards, which combines scalar processing on the CPU with parallel processing on the GPU, while enabling high bandwidth access to memory and high application performance with low power consumption. HSA defines interfaces for parallel computation using CPU, GPU and other programmable and fixed function devices, while supporting a diverse set of high-level programming languages, and creating the foundation for next-generation, general-purpose computing.
Follow the HSA Foundation on Twitter, Facebook, LinkedIn and Instagram.
Contact:
Neal Leavitt
Leavitt Communications
(760) 639-2900
neal@leavcom.com