IBM Bluemix Now Offers Python Processing for Streaming Data
As more workloads move to the cloud, cloud data platforms and offerings are becoming a major point of differentiation. To this end, IBM has introduced streaming analytics into its Data Science Experience portfolio of data science tools and services. Existing DSE users can now pipe streams of data into their Python applications and react to that data in real time.
IBM’s Data Science Experience is an ever-growing suite of data tools for developers, ranging from a packaged version of the Jupyter Notebook to RStudio, an IDE for the R statistical programming language. The platform also ties into the IBM Watson set of services, though direct machine learning tools are still a forthcoming addition to the IBM DSE.
For developers working with the DSE, streaming data will now be available for processing with Python in IBM’s Bluemix cloud platform. The Bluemix streaming analytics platform is based on Apache Spark. Combined with the workspaces provided by IBM DSE, developers can share and collaborate on stream processing code and embed that code into notebooks for later use, or for live experimentation.
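To make the pattern concrete, here is a minimal, self-contained sketch of the kind of real-time reaction logic a developer might embed in a notebook: a rolling-window check that flags readings that spike above the recent average. This is purely illustrative Python; it does not use the Bluemix streaming analytics or Spark APIs, and the function and parameter names are invented for the example.

```python
from collections import deque

def rolling_alert(readings, window=3, threshold=10.0):
    """Yield an alert for each reading that exceeds the rolling
    mean of the previous `window` readings by more than `threshold`."""
    recent = deque(maxlen=window)
    for value in readings:
        if len(recent) == window:
            mean = sum(recent) / window
            if value - mean > threshold:
                yield (value, mean)
        recent.append(value)

# Simulated sensor stream with one spike at 42.0.
stream = [1.0, 2.0, 3.0, 42.0, 4.0]
alerts = list(rolling_alert(stream))
# One alert is raised: 42.0 against a rolling mean of 2.0.
```

In a real Spark-based streaming pipeline, the same per-window logic would run continuously over incoming micro-batches rather than a finite list.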
This is part of a larger movement inside IBM to position its Watson services as a central differentiator for its cloud platform. Specifically, Willie Tejada, Chief Developer Advocate at IBM, calls this a movement toward cognitive development, where developers work with Watson to build applications that can think.
“No one thinks of themselves as a cognitive developer,” said Tejada. “They’re usually associated with a language or platform. The skillset for the future cognitive developer masters the intersection of data science and AI development to gain insights from both structured and unstructured data, and have a mastery of these AI services, like vision, speech, and even empathy.”
Move to the Cloud
The move is part of a larger trend of shifting “big data”-styled workloads into the cloud. According to Matei Zaharia, chief technologist at Databricks and creator of the Apache Spark project, the hottest new thing in big data is not running your own cluster.
With such a cluster hosted by one of the major cloud providers, teams can focus on the day-to-day work of big data: real-time analytics, stream processing, and machine learning. Zaharia said that the difficulties of running and managing a Hadoop cluster have driven many companies to use the cloud for this type of processing, leaving the on-premises Hadoop providers at something of a disadvantage in their whole-cluster sales engagements.
“What we see is this gap in skills. We’re really committed to taking the developer tools and communities they like today, and essentially bringing things that build on that mastery,” said Tejada. “Big engagements are still a huge part of IBM’s business, but essentially the modern developer wants a lot of self-service and wants access to these resources in these communities today. We’ve really put a huge focus on putting what we call a developer-first mentality in relation to the way we develop our services.”
Feature image via Pixabay. Inset image from IBM.