Hunting Down Asteroids with Machine Learning and a World of Programmers

Apr 11th, 2015 7:00am

There are millions of asteroids in space, and we know what can happen if one of them hits. It might disintegrate before reaching the earth’s surface, or it might be on the scale of Comet Shoemaker-Levy 9, which struck Jupiter in 1994 in fragments up to two kilometers in diameter, traveling at a speed of about 216,000 kilometers per hour. That’s about 134,000 mph.

That impact changed our view a bit about how to detect these objects, but our detection rate still has room to improve. And deadlines are fast approaching. In 2005, the U.S. Congress mandated that NASA find more than 90 percent of all near-earth objects larger than 140 meters in diameter by the year 2020. We’re getting there, but not just because we have brilliant researchers and powerful telescopes. We now have thousands of developers and amateur astronomers who can use programming to help do the research once done exclusively by people in white lab coats.

NASA decided to ask programmers for some help, and organized a contest with the TopCoder website. It was similar to the Netflix Prize, an open competition to improve the recommendations Netflix offers its customers.

The NASA and TopCoder Asteroid Data Hunter project ran from March 2014 through January 2015, with the goal of improving the algorithm used to detect asteroids. The competition winner achieved a 15 percent improvement over the current method of identifying asteroids that orbit between Mars and Jupiter.

The TopCoder Asteroid Data Hunter site offers a download that includes source code, which is also available as open source on GitHub. The 420 MB download includes images, docs and installers for Windows and Mac. There is also source code for Linux, but no installer yet. The main hunter software is written in Java with the algorithms coded in C++. The algorithm can run on a laptop or desktop.

How Does it Work?

The code splits into two distinct parts. The first part — data management, uploading, etc. — is all written in Java, and includes a local web server running off port 8080 for the browser user interface. The second part, which performs the comparisons, is written in C++.

Before you can understand how it works, you need to know a little about the Flexible Image Transport System (FITS) file format. This is a popular format for holding astronomical images and is endorsed by NASA and the International Astronomical Union. It’s a curious format, because it includes a human-readable header as well as the image data, and can have multiple sections of data, including non-image data. In short, you can put in any structured data you want. Modern telescopes can output data in FITS format, so you are encouraged to use the program with your own data as well.
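To make the "human-readable header" concrete, here is a minimal Python sketch that parses a header built in memory. It relies on the layout defined by the FITS standard: 80-character ASCII "cards" stored in 2,880-byte blocks and terminated by an END card. The helper names and the sample values (including the DATE-OBS time stamp) are made up for illustration, this is not the Java reader that ships with the Asteroid Data Hunter, and real data is better handled by a full FITS library.

```python
CARD_SIZE = 80      # every header card is exactly 80 ASCII characters
BLOCK_SIZE = 2880   # headers are padded to a multiple of 2,880 bytes


def make_card(keyword, value, comment=""):
    """Build one header card (hypothetical helper used to create test data)."""
    text = f"{keyword:<8}= {value:>20}"
    if comment:
        text += f" / {comment}"
    return text.ljust(CARD_SIZE).encode("ascii")


def parse_fits_header(raw):
    """Return {keyword: value string} for the value cards in a header.

    Simplified: ignores COMMENT/HISTORY cards and doubled quotes in strings.
    """
    header = {}
    for offset in range(0, len(raw), CARD_SIZE):
        card = raw[offset:offset + CARD_SIZE].decode("ascii")
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] != "= ":
            continue  # not a keyword/value card
        value = card[10:].strip()
        if value.startswith("'"):
            # Quoted string: keep what sits between the quotes.
            value = value[1:value.find("'", 1)].strip()
        else:
            # Anything after a slash is a comment.
            value = value.split("/", 1)[0].strip()
        header[keyword] = value
    return header


cards = [
    make_card("SIMPLE", "T", "conforms to FITS standard"),
    make_card("BITPIX", "16", "bits per pixel"),
    make_card("NAXIS", "2", "number of axes"),
    make_card("NAXIS1", "4", "image width"),
    make_card("NAXIS2", "3", "image height"),
    make_card("DATE-OBS", "'2014-03-10T08:15:00'", "made-up time stamp"),
    b"END".ljust(CARD_SIZE),
]
raw = b"".join(cards)
raw = raw.ljust(-(-len(raw) // BLOCK_SIZE) * BLOCK_SIZE)  # pad with spaces

header = parse_fits_header(raw)
print(header["NAXIS1"], header["NAXIS2"], header["DATE-OBS"])
```

Because the header is plain text, the same cards can be read by opening a FITS file in a text viewer, which is part of what makes the format convenient for archiving.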

Asteroid Data Hunter Screenshot

The Asteroid Data Hunter includes four examples of FITS image files and a Java class to read them. The algorithms written by the top five scorers each come in a header file, and vary in size from 1,021 to 14,328 lines long, though most are under 2,000 lines. Each of the five solutions approached the comparisons differently, though several relied on the same technique: the random forest classifier. This is a machine learning technique used in classification and regression. Classification, in this case, is determining whether the observed pixel is a star or an asteroid.
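For readers new to the idea, the sketch below shows the principle behind a random forest in plain Python: train many small trees, each on a random resample of the data and a randomly chosen feature, then let them vote. It is a heavily simplified stand-in (one-split "stumps" rather than full decision trees), and the two features and all training values are invented for illustration. It is not taken from any of the contest entries.

```python
import random
from collections import Counter


def train_stump(samples, labels, feature):
    """Find the single threshold on one feature that misclassifies the fewest samples."""
    best = None
    for threshold in sorted({s[feature] for s in samples}):
        left = [l for s, l in zip(samples, labels) if s[feature] <= threshold]
        right = [l for s, l in zip(samples, labels) if s[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(l != left_label for l in left) + sum(l != right_label for l in right)
        if best is None or errors < best[0]:
            best = (errors, feature, threshold, left_label, right_label)
    return best[1:]


def train_forest(samples, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap resample and one randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        picks = [rng.randrange(len(samples)) for _ in samples]
        boot_samples = [samples[i] for i in picks]
        boot_labels = [labels[i] for i in picks]
        feature = rng.randrange(len(samples[0]))
        forest.append(train_stump(boot_samples, boot_labels, feature))
    return forest


def predict(forest, sample):
    """Majority vote across all stumps."""
    votes = Counter(
        left if sample[feature] <= threshold else right
        for feature, threshold, left, right in forest
    )
    return votes.most_common(1)[0][0]


# Invented training data: (apparent speed, brightness) per detected object.
samples = [
    (0.0, 5.0), (0.1, 6.0), (0.05, 7.0), (0.0, 6.5),   # stars barely move
    (1.0, 2.0), (1.2, 3.0), (0.9, 2.5), (1.1, 1.5),    # asteroids drift
]
labels = ["star"] * 4 + ["asteroid"] * 4

forest = train_forest(samples, labels)
print(predict(forest, (0.03, 6.2)))
print(predict(forest, (1.05, 2.2)))
```

The voting step is what makes the approach robust: an individual stump trained on an unlucky resample can be wrong, but the majority rarely is.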

The contest winner pre-processed the input FITS images to generate a background image, taking the median value over an area around each pixel. He then filtered the results to detect objects that could be separated from the background. After that, his algorithm tried to link objects across different images, letting him remove stationary objects and hopefully leave the moving ones. Identifying the moving objects is done by looking at seven specific features related to the speed and direction of the movement:

  • Total (euclidean) speed.
  • Total (euclidean) movement.
  • RA-component (Right Ascension) speed.
  • RA-component movement.
  • DEC-component (declination) speed.
  • DEC-component movement.
  • Angle (atan2) of the movement.
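
The background step and the seven features above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the winner's C++ code: the window radius, the brightness threshold, the function names, and the sample coordinates are all made up, and RA/DEC are treated as flat coordinates with no cos(DEC) correction or RA wrap-around.

```python
import math
from statistics import median


def median_background(image, radius=1):
    """Background estimate: median of the window around each pixel (clipped at edges)."""
    rows, cols = len(image), len(image[0])
    return [
        [
            median(
                image[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def bright_pixels(image, background, threshold):
    """Pixels that stand out from the background by more than `threshold`."""
    return [
        (r, c)
        for r, row in enumerate(image)
        for c, value in enumerate(row)
        if value - background[r][c] > threshold
    ]


def movement_features(ra1, dec1, t1, ra2, dec2, t2):
    """The seven movement features for one object linked across two images."""
    dt = t2 - t1
    if dt <= 0:
        raise ValueError("second observation must be later than the first")
    d_ra, d_dec = ra2 - ra1, dec2 - dec1
    movement = math.hypot(d_ra, d_dec)
    return {
        "speed": movement / dt,            # total (euclidean) speed
        "movement": movement,              # total (euclidean) movement
        "ra_speed": d_ra / dt,
        "ra_movement": d_ra,
        "dec_speed": d_dec / dt,
        "dec_movement": d_dec,
        "angle": math.atan2(d_dec, d_ra),  # direction of travel, radians
    }


# Toy 5x5 frame: flat background of 10 with a single bright pixel.
image = [[10] * 5 for _ in range(5)]
image[2][2] = 100
background = median_background(image)
detections = bright_pixels(image, background, threshold=50)
print(detections)  # [(2, 2)]

# Same object seen 100 seconds apart (positions in degrees, invented values).
features = movement_features(10.000, 20.000, 0.0, 10.003, 20.004, 100.0)
print(features)
```

A median is a natural choice for the background because a single bright object inside the window barely shifts it, whereas a mean would be pulled upward.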

Having determined the path of a moving object on two images, his code verifies that it is an asteroid by comparing pixel values across two other images. FITS images include time stamps, so the position of the asteroid in the other two images can be determined by calculating how far it moved in a straight line in the time intervals.
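
The straight-line check described above amounts to linear extrapolation from two timed positions. A minimal Python sketch follows, assuming pixel coordinates, times in seconds, and a single background level per image. The function names, the threshold, and the sample values are hypothetical.

```python
def predict_position(first, second, target_time):
    """Extrapolate an (x, y) position along a straight line at constant speed.

    `first` and `second` are (x, y, time) tuples from the two linked images.
    """
    x1, y1, t1 = first
    x2, y2, t2 = second
    if t2 == t1:
        raise ValueError("the two observations need different time stamps")
    fraction = (target_time - t1) / (t2 - t1)
    return x1 + (x2 - x1) * fraction, y1 + (y2 - y1) * fraction


def confirms_object(image, position, background_level, threshold):
    """True if the pixel nearest the predicted position stands out from the background."""
    col, row = round(position[0]), round(position[1])
    if not (0 <= row < len(image) and 0 <= col < len(image[0])):
        return False  # the object would have left the frame
    return image[row][col] - background_level > threshold


# Two linked detections, 600 seconds apart (invented values).
first = (10.0, 20.0, 0.0)
second = (12.0, 21.0, 600.0)

# A third image taken at t = 1200 s, with a bright pixel where the object should be.
third_image = [[10] * 30 for _ in range(30)]
third_image[22][14] = 90

predicted = predict_position(first, second, 1200.0)
confirmed = confirms_object(third_image, predicted, background_level=10, threshold=50)
print(predicted, confirmed)  # (14.0, 22.0) True
```

If the predicted pixel is no brighter than the background in the other images, the candidate is more likely noise or a mismatched pair of stars than a real moving object.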

This isn’t a static program to detect asteroids, but an improved algorithm to learn from the input, and make predictions about detecting asteroids in other data sets.


The need to detect asteroids is as much about discovery as it is about protecting the earth as we know it. The interest has made it possible to think through how we might build a gravity tractor that would gently push a giant asteroid so it would not collide with earth. In the meantime, the simplification of machine learning technologies is allowing programmers to participate in new ways, so we can detect asteroids and avoid a Hollywood disaster such as a “Fire in the Sky,” or as astronomers like to say: “The Asteroid that Ate Phoenix.”

Feature image via BBC.
