Apache Sets Out On a Geospatial Voyage

22 Jan 2016 12:27pm

A handful of geospatial specialists are rallying to put together a geospatial track at the ApacheCon North America conference in May.

Some of them met at Apache Big Data Europe in September and started talking about increasing awareness of geospatial within Apache and coordinating collaboration among the projects.

“The intent is to start a dialog within Apache that is cross-cutting of projects,” says George Percivall, chief engineer at The Open Geospatial Consortium (OGC), based in Annapolis, MD.

It will be an opportunity to consider the use of open standards to increase interoperability and code reuse, according to the call for projects.

The instigators of this geospatial track include Percivall; Chris Mattmann, principal data scientist and chief architect in the Instrument and Data Systems section of the Jet Propulsion Laboratory in Pasadena, Calif.; Ram Sriharsha, a senior member of technical staff at Hortonworks; Sergio Fernández, software engineer for Redlink, Salzburg, Austria; and Martin Desruisseaux, a developer at Geomatys, Arles, France.

“There’s a range of projects, and some are aimed at a more complex kind of understanding of geospatial, to address some of the complexities. Some are more about even just using a point location. That’s all great. We just want to make sure it’s consistently done. Even just using a point location has caused problems – which comes first: longitude or latitude – believe it or not, is a continuing problem,” Percivall says. The OGC has been focused on creating consensus-based open standards.
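The axis-order problem Percivall describes is easy to reproduce. The following is a minimal Python sketch, not code from any Apache project: the `LatLon` type, the `parse_pair` helper and the approximate coordinates for Annapolis are all assumptions made for illustration. It shows how the same pair of numbers passes range validation under either ordering, which is why the mistake tends to go unnoticed.

```python
from typing import NamedTuple, Tuple


class LatLon(NamedTuple):
    """Hypothetical point type that names its axes explicitly."""
    lat: float
    lon: float


def parse_pair(pair: Tuple[float, float], order: str) -> LatLon:
    """Interpret a bare coordinate pair under an explicitly stated axis order."""
    if order == "lat,lon":
        lat, lon = pair
    elif order == "lon,lat":
        lon, lat = pair
    else:
        raise ValueError("order must be 'lat,lon' or 'lon,lat'")
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return LatLon(lat, lon)


# Approximate position of Annapolis, MD (assumed values for illustration).
raw = (38.98, -76.49)

intended = parse_pair(raw, "lat,lon")   # northern hemisphere, as meant
misread = parse_pair(raw, "lon,lat")    # also passes validation, but lands far south

print(intended)
print(misread)
```

Because both readings are within range, validation alone cannot catch the swap; the order has to be stated by the data format or by an agreed standard.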

Deloitte estimates that by 2020, location-based services will be a $1.3 trillion industry and the use of geo-location data, including GPS, will generate $500 billion in consumer value. A MarketsandMarkets report forecasts a 10 percent compound annual growth rate for GIS, with software accounting for 48 percent of that growth. It says the preference for open source software will be among the drivers of that growth.

And geospatial applications increasingly will require big data solutions and analytics technology, according to the 2016 industry report from Geospatial World magazine.

Among the projects:

The Spatial Information System (SIS) project is a free Java language library for developing geospatial applications. It can be used to represent coordinates for searching, data clustering, archiving and other spatial functions. The work has centered on implementing the OGC GeoAPI Implementation Specification 3.0 interfaces for use in desktop or server applications.

SIS has wide applications in meteorology, in land use and other areas, according to Desruisseaux, who says he’s working full time on it. The work on implementing ISO standards for metadata is almost complete, and the work on referencing by coordinates is under way. It will continue moving up the stack, with imaging next, he said.

Apache SIS also provides support for reading and writing Coordinate Reference System (CRS) objects from Geography Markup Language (GML) documents and performing map projections with those CRSes. He says it’s the first open source project to implement the new standard ISO 19162 for text representation of coordinate reference systems.
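To give a sense of what performing a map projection involves, here is a minimal Python sketch of the forward spherical Mercator formula. It is not Apache SIS code and does not use its API; the function name `spherical_mercator` is hypothetical, the radius is the WGS 84 semi-major axis, and the sample coordinates for Arles are approximate. A library such as SIS handles many projections, ellipsoidal formulas and the CRS metadata around them.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, in metres


def spherical_mercator(lat_deg: float, lon_deg: float) -> tuple:
    """Project geographic coordinates in degrees to spherical Mercator metres."""
    if not -90.0 < lat_deg < 90.0:
        # The projection is undefined at the poles.
        raise ValueError("latitude must lie strictly between -90 and 90 degrees")
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Roughly Arles, France (assumed values for illustration).
x, y = spherical_mercator(43.68, 4.63)
print(f"x = {x:.1f} m, y = {y:.1f} m")
```

Note that the function takes latitude first; a caller passing longitude first would get a plausible-looking but wrong result, which is the kind of ambiguity a formal CRS definition removes.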

“It would be important for me to understand how to use it if I were trying to create software by myself,” Desruisseaux said of the work. “OGC standards, to me, are like a skeleton, a blueprint which allows me to design Apache SIS in a way that will serve the public now as well as in five or 10 years.”

He works in a sandbox called Constellation, where he’s put together a demo. (For password/username, use demo/demo.)

Apache Marmotta aims to provide an open implementation of a Linked Data Platform for organizations that want to publish linked data or build custom applications on linked data.

“We come from a completely different world. We come from research in the semantic web and all this data from a high level,” said Fernández.

Last summer, a Google Summer of Code student implemented a version of GeoSPARQL in Marmotta, and it will ship in version 3.4 in February, Fernández said.

It uses KiWi, a high-performance transactional triple store backend for Sesame building on top of relational databases. It has optional support for rule-based reasoning and versioning. The new extension to KiWi makes use of the PostGIS extension in PostgreSQL.

Marmotta is most used in tourism applications, Fernández said.

“Say, if you are in a town, you could say, ‘Let me find the best restaurant where I could smoke and [it’s] five minutes walking or 10 minutes by car’ or whatever. It provides special features. We’re using little pieces of knowledge from different databases. We put them together in a data model, then we can query,” he said.

Open Climate Workbench is developing a software library that, among other things, makes it easier to evaluate climate models based on model and observational datasets in heterogeneous formats and resolutions from a variety of sources. Those sources might be the Earth System Grid Federation, the U.S. National Climate Assessment and others, as well as temporal/spatial scales with remote sensing data from NASA, NOAA and other agencies. The toolkit includes capabilities for data extraction and manipulation, metrics computation and visualization.

The Jet Propulsion Laboratory donated this technology from the previous Regional Climate Model Evaluation System (RCMES). The project then added in Apache Object Oriented Data Technology (OODT) to manage massive datasets, which has allowed the project to scale beyond climate model evaluation.

Finally, Magellan is a geospatial analytics engine designed to make it easy to parse and efficiently query spatial data sets at scale.

If a user searches for “canyon hotels,” for instance, in an application without location awareness, the results might be for hotels in Canyon, Texas, rather than near the Grand Canyon. This project aims to pair the search term with location context to deliver more relevant results.

It uses Spark SQL, DataFrames and Catalyst as the underlying engine.
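The “canyon hotels” scenario can be illustrated with a small, self-contained Python sketch. This is not Magellan code and makes no use of Spark; the hit list, the hotel names, the approximate coordinates and the `rank_by_distance` helper are all hypothetical. It simply re-orders text matches by great-circle distance from the searcher, using the standard haversine formula.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


# Hypothetical text matches for "canyon hotels"; coordinates are approximate.
HITS = [
    {"name": "Hotel in Canyon, Texas", "lat": 34.98, "lon": -101.92},
    {"name": "Lodge near the Grand Canyon", "lat": 36.05, "lon": -112.14},
]


def rank_by_distance(hits, user_lat, user_lon):
    """Return the hits sorted by distance from the user's location, nearest first."""
    return sorted(
        hits,
        key=lambda h: haversine_km(user_lat, user_lon, h["lat"], h["lon"]),
    )


# Assume the searcher is near Flagstaff, Ariz. (approximate coordinates).
ranked = rank_by_distance(HITS, 35.20, -111.65)
for hit in ranked:
    print(hit["name"])
```

A production system would combine distance with text relevance rather than sort on distance alone, and at scale would rely on spatial indexing instead of computing every distance.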

Feature Image: “Pushpins in a map over the U.S.A.” by Marc Levin, licensed under CC BY-SA 2.0.
