The New Stack welcomes stories that explain open source technologies. Is there an open source technology you would like to explain? Please contact us with your post and we will consider it for publication.
The world is moving toward NoSQL, and that requires us to learn new techniques and approaches to working with data. We have to spend more time engineering and designing schemas. We also have to know more about how our database works than we did with relational databases.
That gets us to the first difficulty of NoSQL and HBase — the lack of knowledge. What is HBase? How does it work? Why should I use it?
What is HBase?
Apache HBase is a column-oriented NoSQL database built on top of Hadoop (HDFS, to be exact). It is an open source implementation of the design described in Google’s Bigtable paper. HBase is a top-level Apache project and recently reached its 1.0 release after many years of development.
Data in HBase is broken into tables. Within these tables, a row’s data is broken into groupings called column families. These column families group similar or frequently accessed data together. A row key uniquely identifies a row’s data.
In the figure above, “user_data” and “pictures” are both column families in the “users” table. We divide data between these two column families because the “user_data” column family stores text data about the user, such as their name, password and email, while the “pictures” column family stores the user’s profile photo. To retrieve one of these rows, we would use a row key such as “email@example.com”.
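To make the layout concrete, here is a minimal Python sketch of the logical model described above: a table maps row keys to column families, and each column family holds individual columns. This is only an in-memory illustration, not the HBase client API. The ToyTable class, its put and get methods, and the column names “name”, “email” and “profile” are my own assumptions for the example, and the sketch ignores how HBase actually stores and versions data on HDFS.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for one HBase-style table (illustration only, not the HBase API)."""

    def __init__(self, name, column_families):
        self.name = name
        # Column families are fixed when the table is defined.
        self.column_families = set(column_families)
        # row key -> column family -> column name -> value
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        """Write one value into a column of a row, addressed by its row key."""
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        self._rows[row_key][family][column] = value

    def get(self, row_key, family=None):
        """Read a whole row, or just one column family of it, by row key."""
        row = self._rows.get(row_key)
        if row is None:
            return {}
        if family is None:
            return {fam: dict(cols) for fam, cols in row.items()}
        return dict(row.get(family, {}))


# The "users" table from the example above, with hypothetical column names.
users = ToyTable("users", ["user_data", "pictures"])
users.put("email@example.com", "user_data", "name", "Example User")
users.put("email@example.com", "user_data", "email", "email@example.com")
users.put("email@example.com", "pictures", "profile", b"placeholder image bytes")

# Reading only the text data never touches the (much larger) photo.
print(users.get("email@example.com", "user_data"))
```

Every read and write in this sketch goes through the row key, and asking for a single column family returns only that group of columns. That is the reason for keeping the small text fields apart from the profile photo.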
How Does it Work?
I like to use playing cards to show how HBase works. Take a look at this video I created. I encourage you to break out your own deck of cards and play along.
Why Should I Use It?
HBase is useful for Big Data problems that need random reads, random writes or both. Many companies run HBase in production with multi-petabyte databases, as mission-critical data stores.
There are many other use cases for HBase, and each year at HBaseCon, users talk about how they’re using it. Google’s original use case for Bigtable was storing massive amounts of data about the Internet and its users. Pinterest uses HBase to store graph data (graphs of connected items, not charts). Flipboard uses it to store content and personalize that content for its users. FINRA uses HBase to store trading graphs. Some companies use HBase for click stream data storage and analysis, and still others use it for time series analysis. Hadoop MapReduce is often used together with HBase to process this data efficiently.
There are many places to learn more about HBase. HBaseCon takes place this week, on May 7, and tickets are still available. I’ll be speaking there and giving an introduction to HBase.
This brief article gives you a little information about HBase. As with most Big Data technologies, you’ll need a much better understanding of it before you can use it to create effective solutions. You’ll really want some amount of training before embarking on this transformation.
As you’re exploring NoSQL solutions, HBase is a great one to look at. Choosing the right tool for a solution often means the difference between success and failure.