C++ 23 Standard Won’t Have a Key Parallelism Feature

The next version of the C++ standard, due next year, won't include a key feature that makes it easier to write code for parallel computing environments.
The C++23 standard won't have an asynchronous execution feature called senders and receivers, which would allow code to run simultaneously across a system with multiple chips such as CPUs and GPUs.
“The goal there is maybe to try to get it into the working draft next year — the [C++ 26] working draft — so once it’s there, then people will take it a lot more seriously,” said Nevin Liber, a computer scientist at the Argonne Leadership Computing Facility at Argonne National Laboratory and a C++ committee member, during a breakout session at last month’s Supercomputing 2022 conference in Dallas.
Fundamental Changes
Software applications written in C++ are going through fundamental changes, with PCs, servers and mobile devices executing code simultaneously on multiple chips. The goal with senders and receivers is to bring the standard C++ framework up to date so programmers find it easier to write applications that take advantage of the new execution environments.
Programmers are increasingly writing code that targets both CPUs and accelerators like GPUs and AI chips, which speed up demanding applications.
“While the C++ Standard Library has a rich set of concurrency primitives … and lower level building blocks … we lack a Standard vocabulary and framework for asynchrony and parallelism that C++ programmers desperately need,” says the document that maps out the proposal, known as P2300.
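To make that gap concrete, here is a minimal sketch using std::async, one of the low-level building blocks the standard already provides. Note that composing many such tasks into a larger asynchronous pipeline is left entirely to the programmer — exactly the vocabulary the proposal says is missing:

```cpp
#include <future>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
    std::vector<int> data(1'000'000, 1);
    auto mid = data.begin() + data.size() / 2;

    // std::async runs one task on another thread, but offers no
    // standard way to chain, branch or schedule follow-up work.
    auto front = std::async(std::launch::async, [&] {
        return std::accumulate(data.begin(), mid, 0LL);
    });
    long long back = std::accumulate(mid, data.end(), 0LL);

    std::cout << "sum = " << front.get() + back << '\n';
}
```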
Senders and Receivers
Currently, C++ code has to be optimized for specific hardware. Senders and receivers would add an abstraction layer so that standard C++ code can run across multiple parallel environments. The goal is portability: the same code works across different installations.
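As a rough sketch of what the model looks like in practice, the example below uses the names from the P2300 proposal as implemented in NVIDIA's stdexec reference implementation (https://github.com/NVIDIA/stdexec); the exact spelling may still change before standardization:

```cpp
#include <stdexec/execution.hpp>
#include <exec/static_thread_pool.hpp>
#include <cstdio>
#include <utility>

int main() {
    // A scheduler abstracts "where work runs" -- here a thread pool,
    // but with a different scheduler it could be a GPU stream.
    exec::static_thread_pool pool(4);
    auto sched = pool.get_scheduler();

    // A sender merely describes a chain of asynchronous work;
    // nothing runs until the chain is started.
    auto work = stdexec::schedule(sched)
              | stdexec::then([] { return 6 * 7; })
              | stdexec::then([](int x) { return x + 1; });

    // sync_wait starts the work and blocks for the result.
    auto [result] = stdexec::sync_wait(std::move(work)).value();
    std::printf("result = %d\n", result);
}
```

The key design point is that the pipeline itself says nothing about hardware: the scheduler decides where the work runs, which is what makes the same code portable across parallel environments.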
“We certainly have ideas of how to connect that with algorithms. My hope would be that for C++ 26 we can do that. You have a good way of connecting these things and also have … algorithms being able to do asynchronous work,” said Christian Trott, a principal member of staff at Sandia National Laboratories and also a C++ standards committee member.
The asynchronous execution feature is largely being pushed by Nvidia, whose CUDA parallel programming framework is widely used in machine learning, which relies on the concurrency of CPUs and GPUs to reduce training time.
Nvidia has open-sourced its libcu++ C++ library. The company also last week released the CUDA 12.0 parallel programming framework, which supports the C++20 standard and host compilers such as GCC 10, Clang 11 and Arm C/C++ 22.x.
Senders/receivers may not make it into C++ 23, but it will make life easier for coders in the future, Stephen Jones, CUDA architect at Nvidia, told The New Stack.
“I feel pretty confident about 2026, but senders/receivers — it’s a big shift in C++. It’s a really very new thing for them to try and embrace asynchronous sort of pipeline execution,” Jones said.
Mature Technology Needed
While the delay of a key feature may not look good on paper, C++ committee members said it's better to wait for a technology to mature before standardizing it. Computing with accelerators is in its early days, with chip designs, memory and storage requirements changing constantly.
“I think we need to see more accelerators,” said James Reinders, a software engineer at Intel, adding, “I think that needs a little more time to play out.”
Intel provides a tool called SYCLomatic that makes code portable across hardware by converting CUDA code, which ties applications to Nvidia GPUs, into standard SYCL. Reinders said that GPUs won’t be the only accelerators available.
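As a rough illustration of the kind of translation such a tool performs, here is a hand-written sketch: the CUDA kernel shown in the comment becomes a portable SYCL 2020 parallel_for. SYCLomatic's actual generated code is more elaborate; this only shows the idea:

```cpp
// The CUDA original, tied to Nvidia hardware:
//
//   __global__ void scale(float* v, float s, int n) {
//       int i = blockIdx.x * blockDim.x + threadIdx.x;
//       if (i < n) v[i] *= s;
//   }
#include <sycl/sycl.hpp>
#include <vector>

int main() {
    std::vector<float> host(1024, 1.0f);
    sycl::queue q;  // picks any available device: CPU, GPU, ...

    {
        sycl::buffer<float> buf(host.data(), sycl::range<1>(host.size()));
        q.submit([&](sycl::handler& h) {
            sycl::accessor v(buf, h, sycl::read_write);
            // Equivalent of the CUDA per-thread index computation above.
            h.parallel_for(sycl::range<1>(host.size()),
                           [=](sycl::id<1> i) { v[i] *= 2.0f; });
        });
    }  // buffer destructor copies results back to the host vector
}
```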
Reinders also pointed to a vigorous debate over whether hooks for technologies like remote memory belong permanently in standard C++. Some are better as extensions, he said.
“Give it some time to play out and we’ll see if that’s the right thing to put into C++ or if it’s better as an extension. OpenMP has been very strong for a long time. It’s never been incorporated into Fortran or C. It’s appropriate to not overcomplicate a core language,” Reinders said.
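OpenMP illustrates the extension model Reinders describes: parallelism is expressed through #pragma directives layered on top of the language, so a compiler without OpenMP support still compiles the same code serially. A minimal example:

```cpp
#include <cstdio>

int main() {
    const int n = 1'000'000;
    double sum = 0.0;

    // The pragma is the entire extension: compilers that understand
    // OpenMP parallelize the loop; others simply ignore the directive.
    #pragma omp parallel for reduction(+ : sum)
    for (int i = 0; i < n; ++i)
        sum += 1.0 / (i + 1);

    std::printf("partial harmonic sum = %f\n", sum);
}
```

Built with OpenMP enabled (for example, -fopenmp in GCC or Clang), the loop runs across threads; without it, the program still compiles and produces the same answer — which is the argument for keeping such features out of the core language.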