Data / Programming Languages

TiDB Brings Distributed Scalability to SQL

26 Apr 2017 1:00am, by

A rash of new databases have emerged, such as Google Spanner, FaunaDB, Cockroach and TimeScaleDB, that are focused on solving the problem of scale that plagues standard SQL. Now another entrant, the Beijing, China-based PingCap’s open source TiDB project, aims to make it as scalable as NoSQL systems while maintaining ACID transactions.

Its support for the MySQL protocol means users can reuse many MySQL tools and greatly reduce migration costs, according to PingCap co-founder and CEO Qi (Max) Liu. You can use it to replace MySQL for applications without changing any code in most cases. And it scales horizontally; increase the capacity simply by adding more machines.

Liu presented TiDB at the Percona Live conference in Amsterdam last October. The project was in beta then; it’s since evolved to release candidate. On Thursday, PingCap co-founder and Chief Technology Officer Edward Huang will be speaking about TiDB at the Percona Live event in Santa Clara, Calif.

They tout that TiDB offers the best of both the SQL and NoSQL worlds. They focused on making it:

  • easy to use;
  • ensuring that no data is ever lost; it is self-healing from failures;
  • cross-platform and can run in any environment;
  • and open source.

It also allows online schema changes, so the schema can evolve with your requirements. You can add new columns and indices without stopping or affecting operations in progress.

As an open source project, it has more than 100 contributors, Liu said in an email interview.

PingCap drew inspiration for TiDB from Google F1 distributed database and Spanner. Google built Spanner atop its own proprietary systems and it’s not open source, considered a downside to some.

“With Spanner, you’re making a commitment to running the service in Google Compute Engine (GCE) and probably running it there for the service’s lifetime. You’re not going to have an off-ramp if you choose to run your own stack,” Spencer Kimball, CEO of Cockroach Labs, told The New Stack previously.

Keeping Track of All the Bikes

TiDB takes a loose coupling approach. It consists of a MySQL Server layer and the SQL layer. Its foundation is the open source distributed transactional key-value database TiKV, another PingCap project, which uses the programming language Rust and the distributed protocol Raft. TiDB is written in Go. Inside TiKV are MVCC (multi-version concurrency control), Raft, and for local key-value storage, it uses RocksDB. It also uses the Spark Connector.

TiDB makes two distinctions from Spanner, Liu said:

While the bottom layer of Spanner relies on Google’s Colossus distributed file system, TiDB ensures that the log is safely stored in the Raft layer. TiDB does not depend on any distributed file system, which greatly lowers write latency.

“We also see great potential in SQL optimizer, but Google didn’t seem to go deep into this aspect in its F1 paper. When designing our project, we aimed to explore the optimizer’s capability,” he said.

Spanner gained attention for its use of atomic clocks to gain time synchronization among geographically distributed data centers. TiDB does not use atomic and GPS clocks. Instead, it relies on Timestamp Allocator introduced in Percolator, a paper published by Google in 2006.

It supports the popular containers such as Docker. And the team is working to make it work with Kubernetes, though, for this work, Liu pointed out difficulties there to the Amsterdam audience.

The biggest problem they’re working on now is latency, especially between geographically distributed data centers, he said, one he hopes to have resolved in the near future.

PingCAP was founded in April 2015 by Huang, a senior distributed system engineer; Cui Qiu, a senior system engineer; and Liu, also an infrastructure engineer. It has 48 engineers working in Beijing and others working remotely from elsewhere in China.

Its clients include mobile gaming provider GAEA, which uses TiDB to support its cross-platform real-time advertising system, which requires high-volume data capacity and experiences peak loads during certain periods. TiDB supports automatic sharding and the bottom layer, TiKV, automatically distributes data among the cluster, which helps GAEA cut the cost of operation and maintenance, Liu said.

Another customer is the cashless and station-free bike sharing platform Mobike which uses TiDB for data analysis and to replace a MySQL database for online orders, which now number more than 400 million.


A digest of the week’s most important stories & analyses.

View / Add Comments

Please stay on topic and be respectful of others. Review our Terms of Use.