IBM Bulks up Bluemix with Apache Spark-based Analytics
IBM continues to expand its Bluemix Cloud Data Services portfolio. It has just added four new cloud and analytics tools for developers and analytics folks.
That portfolio now contains more than 25 services to help developers and data scientists work with data in the cloud and across hybrid cloud environments.
The new tools are all based on Apache Spark, in which Big Blue announced a heavy investment last summer. Derek Schoettle, general manager for IBM Analytics Platform and Cloud Data Services, said the aim is to create a one-stop shop to access, build, develop and explore data.
The additions also reflect a continuing commitment to open source technologies and tools, according to Adam Kocoloski, chief technology officer for IBM Analytics.
“It’s a [continued] elevation of data as a first-class citizen,” Kocoloski said. “It’s not so much about any individual technology, any execution engine, it’s rather saying, ‘What is the data you have and what do you want to do with it and we’ll figure out the right mix of technologies to bring to bear to be successful.’”
The data serves as a common foundation for data engineers, scientists and analysts to collaborate to solve problems for the business, he said.
The new tools include:
IBM Compose Enterprise
IBM Compose Enterprise is a managed platform that helps development teams deploy business-ready open source databases in minutes on their own dedicated cloud servers.
Built on IBM’s acquisition of database-as-a-service startup Compose, this service should help satisfy line-of-business folks who want to set up their own database, Kocoloski said.
“With Compose Enterprise, we can support self-service provisioning of MongoDB, Redis, Elasticsearch and Postgres [and others]…we can support all of that in a self-service way, so that under the hood, IT has one consistent way to containerize these databases, to scale them, to monitor them and so on. You can do a consolidation play and institute a governance model around this while still allowing the line of business to be agile and allow them flexibility …”
IBM Graph
IBM Graph is a fully managed graph database service built on Apache TinkerPop.
In the announcement, Marko A. Rodriguez, an Apache TinkerPop project management committee member, said he was pleased with IBM’s choice to use the Gremlin graph traversal language and TinkerPop, which provides graph computing capabilities for both graph databases (OLTP) and graph analytic systems (OLAP).
Kocoloski said graph databases are commonly used in recommendation engines, in fraud and risk analytics modeling and in routing, either in transportation or network routing. He said he believes there will be more new uses as the technology becomes more widely adopted.
“We think TinkerPop has the potential to do for graph databases what SQL did for relational databases. By putting a standard interface in front of these graph engines, we can encourage the pace of innovation under the hood without forcing users to rewrite their applications anytime they want to try a new engine, a new algorithm or new application.
“There are a lot of use cases for graph databases and clients haven’t necessarily extracted all the value they could from graphs. By removing concerns about installation and deployment, we feel we can encourage greater experimentation and ultimately greater adoption of graph technology,” he said.
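To make the recommendation-engine use case concrete, here is a minimal sketch in plain Python of the kind of traversal a graph query expresses: start at a customer, follow purchases to products, hop to other customers who bought the same products, and collect what else they bought. It is not IBM Graph or Gremlin code; the purchase data, the `BOUGHT` mapping and the `recommend` function are all hypothetical names made up for illustration.

```python
from collections import Counter

# Hypothetical purchase graph: each customer vertex has "bought" edges
# pointing at product vertices. The data is invented for illustration.
BOUGHT = {
    "alice": {"laptop", "mouse"},
    "bob": {"laptop", "keyboard", "monitor"},
    "carol": {"mouse", "monitor"},
    "dave": {"desk"},
}


def recommend(customer, bought=BOUGHT, limit=3):
    """Walk customer -> products -> other customers -> their products."""
    own = bought[customer]
    scores = Counter()
    for other, items in bought.items():
        if other == customer:
            continue
        shared = own & items
        if not shared:
            continue  # no path from this customer to the other one
        for item in items - own:
            # One point per shared product, i.e. per distinct path.
            scores[item] += len(shared)
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:limit]]


if __name__ == "__main__":
    print(recommend("alice"))
```

A graph database stores these vertices and edges natively, so the same three-hop walk becomes a single traversal query rather than application code that loops over every customer.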
IBM Predictive Analytics
This set of tools lets developers draw on a broad library of machine learning models and build them into predictive applications without the help of a data scientist.
“With the predictive analytics engine, we’re trying to bring more people into the world of machine learning. We’re trying to help them get started,” Kocoloski said. “With the predictive analytics service, people can take a dataset and proactively test different models against it. It will automatically select the one that is best fit for the data and then help the user understand why that one was selected. It’s a gentler introduction to the world of multivariate analysis and machine learning.”
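The idea of testing several models against a dataset and keeping the best fit can be sketched in a few lines of standard-library Python. This is not the IBM service or its API, only an illustration under simple assumptions: two hypothetical candidate models (a constant and a straight line), a holdout of the last few points, and mean squared error as the yardstick. The names `fit_mean`, `fit_line`, `CANDIDATES` and `select_model` are invented for this example.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Candidate 1: always predict the average of the training targets."""
    average = mean(ys)
    return lambda x: average


def fit_line(xs, ys):
    """Candidate 2: ordinary least-squares straight line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


# Hypothetical model library; a real service would offer many more.
CANDIDATES = {"constant": fit_mean, "linear": fit_line}


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given points."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def select_model(xs, ys, holdout=2):
    """Fit every candidate on the training part, score it on the holdout."""
    train_x, test_x = xs[:-holdout], xs[-holdout:]
    train_y, test_y = ys[:-holdout], ys[-holdout:]
    errors = {}
    for name, fit in CANDIDATES.items():
        model = fit(train_x, train_y)
        errors[name] = mse(model, test_x, test_y)
    best = min(errors, key=errors.get)
    return best, errors


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6]
    ys = [3.1, 4.9, 7.2, 9.0, 11.1, 12.8]
    best, errors = select_model(xs, ys)
    print(best, errors)
```

Returning the per-model errors alongside the winner mirrors the point in the quote about helping the user understand why a model was selected: the numbers show how much worse the rejected candidates did on data they had not seen.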
IBM Analytics Exchange
The IBM Analytics Exchange includes a catalog of more than 150 publicly available datasets that can be used for analysis or integrated into applications.
These public datasets cover a range of topics including geographic data, census data, military expenditures and a host of others.
“The exchange simplifies the process of feeding those datasets in for analysis and provides some structure around the licensing, making clear that they are good to use,” Kocoloski said.
In addition, it provides a foundation to start thinking about metadata management in the cloud.
“So often we find clients who are competent at managing databases, they’re competent at managing processing, but the organization has gotten so big and the datasets have gotten so varied, that it’s difficult to actually find the right data or what transformation was applied. That becomes a real challenge. This is providing a cloud-native approach to help clients manage this problem,” he said.
IBM is a sponsor of The New Stack.