SKA Telescope May Change Computing as Much as Astronomy

When the almost $2 billion Square Kilometer Array goes live in 2020, it will not only be the largest scientific structure in the world, its dishes arranged in “multiple spiral arm” configurations and adding up to roughly a square kilometer of collecting area; it will also be powered by a computer three times more powerful than the largest supercomputer currently in existence.
This radio telescope array, located mostly in the South African Karoo, with a second facility in the Australian outback and outliers in eight other African countries, will have over 250,000 separate antennae. The parallel processing needs of the project will require a computer with more than three times the power of China’s Tianhe-2 supercomputer, currently the world’s fastest at 33.86 petaFLOPS.
Even as NASA has shut down its Space Shuttle program and some have worried about a dimming curiosity about the universe, SKA, as it’s called, is only the most dramatic expression of a genuine space science renaissance in the 21st century, which has seen projects like CERN’s Large Hadron Collider in Switzerland and the MeerKAT precursor array in Africa.
According to scientists like Professor Nithaya Chetty, group executive for South Africa’s National Research Foundation, a project with the scope of the SKA is only possible now due to the advent of high-speed computing, the reduction in the price of connectivity, the proliferation of fibre optic cable, including submarine cable systems, and improvements in materials science.
But it is also a function of imagination on the part of scientists and a vision of possibility shared with enough government actors and public enthusiasts to find its footing.
Tim Cornwell, SKA Architect and former chief computer scientist for the project, told The New Stack, “There are a huge number of very ambitious astronomy projects happening these days. Around the world it’s a great time for astronomy.” The public, he said, is captivated by the beauty and power of space sciences.
The Hubble Telescope’s gorgeous photographs of the Whirlpool Galaxy, the Hourglass Nebula, and especially the Mystic Mountain have reawakened a public sense of wonder. SpaceX has done the same for its sense of possibility.
The Karoo site in South Africa will see 254 dishes, each 15 meters in diameter. The Australian site will host 96 dishes. Additionally, the project will feature 256,000 two-meter dipole antennae, arranged in 1,024 “clumps,” with each clump’s data processed as a unit. (The largest array currently operating is Chile’s ALMA, which “only” has 64 antennas.)
When it comes to radio astronomy, a baseline is roughly the equivalent of a pixel in a digital image. Each line between one antenna, dish, or clump and another is a baseline. The more baselines you have, the greater the resolution of the “picture” you’re taking of the universe.
Spread the antennae farther apart and you resolve finer detail, as if zooming in; cluster them closer together and you become more sensitive to larger, extended structures. For example, the dishes that make up the Very Large Array in New Mexico, which you probably know from the movie Contact, sit on railroad tracks; to change the resolution of the VLA, you move the dishes around on the rails. The SKA has both: a large number of receivers clustered in the Karoo and a maximum baseline of 3,000 miles.
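As a rough rule of thumb, an interferometer’s angular resolution scales with the observing wavelength divided by its longest baseline, which is why spreading receivers apart sharpens the detail. A minimal sketch of that relationship, using illustrative wavelengths and baseline lengths rather than SKA specifications:

    import math

    def angular_resolution_arcsec(wavelength_m, baseline_m):
        # Synthesized-beam resolution, roughly theta ~ wavelength / baseline (radians),
        # converted here to arcseconds.
        theta_rad = wavelength_m / baseline_m
        return math.degrees(theta_rad) * 3600

    # Illustrative values only: a 21 cm signal seen with a 36 km baseline (VLA-scale)
    # versus a 3,000 km, continent-scale baseline.
    for baseline_m in (36_000, 3_000_000):
        print(baseline_m, round(angular_resolution_arcsec(0.21, baseline_m), 4))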
As Prof. Scott Fisher, astronomy lecturer and outreach coordinator for the University of Oregon’s Department of Physics, explained to The New Stack, if you place a telescope in Hawai’i, another in Texas, and a third in the Caribbean (as the Very Long Baseline Array does), you essentially have a telescope as big as the world, but the picture is not very sensitive. If, on the other hand, you place, say, a dozen in close commerce, you can image distant, dimmer objects in greater detail.
When SKA is up and running it will have almost 527,000 baselines. “This will enable astronomers to see the period of the universe when stars were first formed,” said Fisher. In other words, dimmer light from much farther away, reflecting events in the more distant past, will be “visible” to the SKA.
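The baseline count itself is simple combinatorics: every unique pair of stations forms one baseline, so n stations yield n(n-1)/2 of them. A quick sketch (the 1,027-station figure below is back-solved from the number quoted above, not an official breakdown):

    def baseline_count(n_stations):
        # Every unique pair of stations forms one baseline.
        return n_stations * (n_stations - 1) // 2

    print(baseline_count(64))    # 2,016 baselines for a 64-antenna array like ALMA
    print(baseline_count(1027))  # 526,851, or "almost 527,000"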
When most people think of a telescope, they think of something like Palomar, a huge optical telescope with a giant mirror. Optical telescopes detect energy within a narrow band of frequencies, the ones that produce visible light. That band is pervasive, but it is not nearly as common as the lower frequencies emitted by colder material, which produce neither visible light nor infrared. To “see” those colder regions, which make up the majority of the universe, you need a radio receiver. That’s what a radio telescope is: a receiver.
Radio arrays work using interferometry: the process of observing waves at different receivers and sending the data to a central correlator, which uses software to adjust for the time differences, often very slight, between receivers in the array. The correlated data form a picture of the section of space being observed. That picture is not one of the stunning false-color beauties of the Hubble but more like an extremely detailed topographical map of a section of the universe.
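At its simplest, that correlation step means delaying one antenna’s data stream until both line up, then multiplying and averaging them. The toy sketch below illustrates the idea for a single pair of antennas; it is not the SKA’s correlator design, and the simulated signals are invented for the example:

    import numpy as np

    def correlate_pair(stream_a, stream_b, delay_samples):
        # Compensate a known arrival-time offset, then multiply and average
        # the two streams to form a single correlated measurement.
        aligned_b = np.roll(stream_b, -delay_samples)
        return np.mean(stream_a * np.conj(aligned_b))

    # Simulated data: the same noise-like source seen by two antennas,
    # with antenna B receiving it five samples later than antenna A.
    rng = np.random.default_rng(0)
    source = rng.standard_normal(4096)
    antenna_a = source
    antenna_b = np.roll(source, 5)

    print(correlate_pair(antenna_a, antenna_b, delay_samples=5))  # strong correlation
    print(correlate_pair(antenna_a, antenna_b, delay_samples=0))  # close to zero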
If it takes a good computer to do that correlating for an array with a dozen or two receivers, imagine how much work will need to be done, and how fast, by the SKA’s computer: at least one hundred thousand million million floating point operations per second, or 100 petaFLOPS, the equivalent of the processing power of 100 million laptops.
According to the SKA website, 160 Gigabits per second of data will be transmitted from each radio dish to a central processor via fibre optic cables. “The high frequency dishes alone will produce ten times the current global internet traffic.”
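A rough back-of-the-envelope check of those figures, using the dish counts quoted earlier and, for the laptop comparison above, an assumed gigaFLOPS per machine (an illustrative assumption, not an SKA number):

    dishes = 254 + 96            # Karoo and Australian dishes quoted above
    gbps_per_dish = 160
    print(dishes * gbps_per_dish / 1000)  # roughly 56 terabits per second from the dishes alone

    target_flops = 1e17          # "one hundred thousand million million" operations per second
    laptop_flops = 1e9           # assumed ~1 gigaFLOPS per laptop
    print(target_flops / laptop_flops)    # about 100 million laptops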
Mordecai Mark Mac Low, curator in the American Museum of Natural History’s Astrophysics Department, believes that getting the funding for the development, and meeting the anticipated launch date of 2020, are the X factors. Reaching the supercomputing goals, he told The New Stack, seems quite doable.
“Moore’s Law, that processing speed doubles roughly every two years, ended for single processor speeds some ten years ago,” said Mac Low, “but was picked up almost without pause by multiprocessor computers. Although programming hugely parallel machines efficiently is enormously difficult, the six years between now and 2020 should easily suffice for the raw power necessary to become available.”
Construction on the SKA will not begin until 2017. In the interim, Cornwell said, the group is working on prototypes and benchmarking the software to be used in the array. Almost all of the work until the groundbreaking will be focused on software design.
The hardware is likely to be “a large cluster with internode Infiniband (or similar) connections,” he said. This hardware will not require anything new to be invented.
“It seems unlikely that we will do anything other than use standard components such as blade systems from the standard vendors. I hope we don’t have to invent anything new,” said Cornwell. “The reason for this is that for maintenance and upgrade purposes we benefit most from having fairly generic systems, even if those systems are very large.”
The SKA stack is different from other large arrays “primarily in that it has to handle highly parallel data reduction, and to scale to a number of nodes, perhaps 10 million,” said Cornwell, something that has never before been attempted. “In addition, the data flow is roughly 1,000 times larger than the best attempted before. Also, the scientific performance of SKA is sufficiently enhanced over previous telescopes that novel processing algorithms will be necessary.”
Synchronizing the data will require clock stabilities on the order of one part in 10^12, roughly a picosecond of drift per second, and processing power as high as 100 petaFLOPS.
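To put that stability figure in context, one part in 10^12 corresponds to only a few nanoseconds of drift over an hour; the one-hour window below is just an illustration:

    fractional_stability = 1e-12      # one part in a trillion
    seconds_per_hour = 3600
    drift_seconds = fractional_stability * seconds_per_hour
    print(drift_seconds * 1e9)        # about 3.6 nanoseconds of drift per hour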
Much of the hard work by the SKA folks will go into developing this processing software and the algorithms it will run. This “totally bespoke” software development is being handled by one of the groups cooperating on the project, the Science Data Processor Consortium, based in Cambridge, England.
As work is done using the SKA, the data captured will be hosted in a shareable archive, accessible to academics around the world.
Ian Bird, a member of the committee that advises SKA on computing issues, is the project leader for the Worldwide Large Hadron Collider Computing Grid Project at CERN in Switzerland. The computing demands of the LHC were the closest in both size and speed to what the SKA will require. The area of computing that Bird oversees is the distribution of the results of experiments, the “science products,” from the LHC to scientists.
CERN has a multi-tiered approach to the distribution of the huge amount of data that comes out of the experiments run on the LHC. In addition to saving the data in storage at CERN, they send a copy of that data to “tier one” facilities, primarily national labs, like Brookhaven and Fermilab in the U.S. Those facilities provide the data at the request of “tier two” facilities, such as university physics departments.
“It is becoming cheaper to move data around than to store it,” Bird told The New Stack. “We spend most of our budget on disc space, and now we have to try to optimize between storage and transmission.”
CERN, in other words, is migrating to the cloud. And it is likely that the cloud will play a large role in SKA’s data sharing scheme.
Like a moon shot, building something as big as the SKA, or the LHC, requires a trajectory and dynamic reassessment, not just a static blueprint. The reason CERN is able to share its LHC data so well is that the technology to do so grew as the project did. The networking became much more “performant” than it was when CERN approved the LHC construction plan in late 1994. In fact, the LHC has used a “rolling technology forecast” to anticipate where the tech will be at any given point.
“I wouldn’t call it a leap of faith but you have to believe the tech will keep up,” said Bird.
The experience of SKA should help others in far different areas as well.
According to the website, SKA will advance signal processing algorithm development in two areas.
“Faster and better ways will be developed to make the high dynamic range (a ratio of 10^6:1) images required for SKA science. Effective radio frequency interference (RFI) mitigation algorithms will also be needed to enable observations across wide segments of the radio spectrum.”
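One common, generic approach to RFI mitigation, not necessarily the one the SKA will adopt, is to flag frequency channels whose power deviates wildly from a robust estimate of the rest of the band, for example using a median-absolute-deviation threshold:

    import numpy as np

    def flag_rfi(channel_power, threshold=5.0):
        # Flag channels whose power is a strong outlier relative to a robust
        # (median / MAD) estimate of the band: a simple, generic RFI cut.
        median = np.median(channel_power)
        mad = np.median(np.abs(channel_power - median)) + 1e-12
        deviation = np.abs(channel_power - median) / (1.4826 * mad)
        return deviation > threshold

    # Simulated spectrum: smooth noise with two strong interference spikes.
    rng = np.random.default_rng(1)
    spectrum = rng.normal(1.0, 0.05, 512)
    spectrum[[100, 300]] += 10.0
    print(np.nonzero(flag_rfi(spectrum))[0])  # should flag channels 100 and 300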
But according to Cornwell, these advancements will not be restricted to astronomy.
Because the project, if successful, will deal quickly with massive amounts of data, “the ability to do science with big data” will be advanced, according to Cornwell. “SKA is a telescope bolted onto a supercomputer, as an intrinsic part of the telescope. It’s an interesting model,” and one that may have an effect on currently limited but sought-after capabilities such as long-term, on-demand weather forecasting, anticipating storm interactions, hurricanes, tornadoes, perhaps even earthquakes.
SKA should affect “any systems that process large volumes of data from geographically dispersed sources,” said William Garnier, communications and outreach manager for the SKA. “Systems that carry out high speed detection and analysis could also benefit intelligent surveillance for the recognition of faces in a crowd, traffic monitoring and monitoring of financial and retail markets.”
Mac Low agrees, saying some of the SKA’s work may have wider use, “particularly for applications that require signal processing and Fourier transforms, which pervade real time computing.”
In the end, however, Cornwell and his colleagues at the SKA are focused on how much of the wonder of the universe will be unwrapped with this tool. Before, radio astronomers had to choose between seeing “weak sources” and “bright sources,” dim, wide pictures or narrow, bright images. The SKA could very well provide us with the ability to look at the universe with a species of three-dimensional sight we could not even have known we were lacking.
Image courtesy Square Kilometer Array