Data / Storage / Sponsored / Contributed

Getting Started with Python and InfluxDB

21 Jan 2022 11:00am, by Rahul Banerjee

Rahul Banerjee is a computer engineering student who likes playing around with different libraries/APIs.

Although time-series data can be stored in a MySQL or PostgreSQL database, that’s not particularly efficient. If you store a reading every minute (more than half a million data points per source each year) from potentially thousands of different sensors, servers, containers, or devices, you’re inevitably going to run into scalability issues. Querying and aggregating that data in a relational database also leads to performance problems.

A time-series database (TSDB), on the other hand, is optimized to store time-series data points. This is particularly useful in situations like:

  • Analyzing financial trends in stock prices.
  • Sales forecasting.
  • Monitoring the logs and metrics of an API or web service.
  • Monitoring the sensor data from a car or a plane for safety purposes.
  • Tracking power usage across IoT systems such as a smart power grid.
  • Tracking an athlete’s vitals and performance during a game.

InfluxDB, from InfluxData, is an open source time-series database that makes it easier for developers to work with time-series data. This article shows you how to set up InfluxDB and use it from Python, working with stock data fetched from Yahoo Finance through the yfinance library.

You can access all the code written in this tutorial in this repo.

Why Use InfluxDB?

InfluxDB comes with a built-in dashboard where you can analyze your time-series data without much groundwork. InfluxData’s own benchmarks also report that it outperforms Elasticsearch and Cassandra on time-series workloads.

It has a free open source version you can run locally, and a managed cloud version, InfluxDB Cloud, that runs on major cloud providers such as AWS, Google Cloud and Azure.

Setting up InfluxDB with Python

Before getting started, make sure you have Python 3.6 or later installed on your computer. You’ll also need a virtual environment. This article uses venv, but you can use conda, pipenv or pyenv as well.

Finally, some experience with Flux, InfluxDB’s query language, will help.

This guide uses the module influxdb-client-python to interact with InfluxDB. The library only supports InfluxDB 2.x and InfluxDB 1.8+, and it requires Python 3.6 or later.

All set? Let’s get started installing and connecting the client library.

If you have Docker installed on your computer, you can simply run InfluxDB’s Docker Image using the following command:

If you don’t have Docker, download the software for your OS here and install it. If you’re running InfluxDB on a Mac, you can use Homebrew to install it:

If you’re running the Docker image, you can go directly to http://localhost:8086. If you downloaded and installed the software instead, you first need to start the InfluxDB server by running `influxd` in the command line.

When you open http://localhost:8086 in your browser, you should see the following screen:

Screenshot of the welcome page

Click **Get Started**, which redirects you to the following page:

Screenshot of the next page after the welcome message

For this tutorial, choose **Quick Start** and enter your information on this page:

Screenshot of the initial user setup page

You can create organizations and buckets later on as well, but for now, just pick an easy name for each of these fields.

After signing up, you should find yourself on the dashboard page. Click **Load your data** and then choose the **Python** client library.

Screenshot of load data screen

You should now see the screen below:

Screenshot of code sample options

Under **Token**, there should already be a token listed. However, if you’d like, you can generate a new token for this tutorial. Click **Generate Token** and select **All Access Token** since you will be updating and deleting data later in the tutorial.

Note that InfluxDB will raise a warning at this point, but you can ignore it for now.

Screenshot of generating an All Access Token

Now, you’ll have to set up a Python virtual environment. Create a new folder for the tutorial:

Then change your directory into the new folder:

Create a virtual environment:

Activate it.

Finally, install InfluxDB’s client library with `pip install influxdb-client`.

Create a new file named `__init__.py`, then go back to the InfluxDB UI.

Select the appropriate token and bucket, then copy the code snippet under **Initialize the Client** and paste it in your Python file. The code snippet will be automatically updated if you change your token/bucket selection.

Next, run your Python file:

If no error messages are shown in the terminal, you have successfully connected to InfluxDB.

To follow best practices, you can store your credentials in an .env file. Create a file named .env and store the following information:

Then install the python-dotenv module, which reads the variables in .env, with `pip install python-dotenv`.

Finally, update your Python file to load the data from the .env file:

Note that you will need to change the url parameter if you are using an InfluxDB Cloud account. The URL will depend on which cloud region you chose. The cloud URLs can be found in the docs here.

The lines importing the datetime module and the InfluxDB library will be needed later in the tutorial. It’s good practice to keep all your import statements together at the top of the file, although you can also import modules where you first need them.
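A minimal sketch of what that file could look like is below. It assumes your .env defines INFLUX_URL, INFLUX_TOKEN, INFLUX_ORG and INFLUX_BUCKET. These are hypothetical names, so match them to the keys you actually stored. The settings are read from a plain mapping, which lets you check the logic without a server. The python-dotenv and influxdb-client imports are deferred to connect().

```python
import os
from typing import Mapping

# Hypothetical variable names; use the keys you actually put in your .env file.
REQUIRED_KEYS = ("INFLUX_URL", "INFLUX_TOKEN", "INFLUX_ORG", "INFLUX_BUCKET")


def read_settings(environ: Mapping[str, str]) -> dict:
    """Collect the InfluxDB settings from an environment-like mapping."""
    missing = [key for key in REQUIRED_KEYS if not environ.get(key)]
    if missing:
        raise KeyError(f"missing settings: {', '.join(missing)}")
    return {key: environ[key] for key in REQUIRED_KEYS}


def connect():
    """Load .env into os.environ, then create a client.

    Requires python-dotenv and influxdb-client to be installed.
    """
    from dotenv import load_dotenv
    from influxdb_client import InfluxDBClient

    load_dotenv()  # looks for a .env file and loads its variables into os.environ
    settings = read_settings(os.environ)
    client = InfluxDBClient(
        url=settings["INFLUX_URL"],
        token=settings["INFLUX_TOKEN"],
        org=settings["INFLUX_ORG"],
    )
    return client, settings["INFLUX_BUCKET"]
```

Calling `client, bucket = connect()` then gives you a client and the bucket name to pass to later write calls.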

Alternatively, you can store your credentials in an .ini or .toml file and connect to InfluxDB with the InfluxDBClient.from_config_file method.
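For the .ini route, the influxdb-client README uses an [influx2] section with url, org and token keys. The sketch below writes such a file with the standard configparser module so you can see its layout. The file name config.ini and the values are placeholders. The client import is deferred so the helper that writes the file works without the package.

```python
import configparser


def write_influx_config(path: str, url: str, org: str, token: str) -> None:
    """Write an .ini file in the [influx2] layout read by InfluxDBClient.from_config_file."""
    config = configparser.ConfigParser()
    config["influx2"] = {"url": url, "org": org, "token": token}
    with open(path, "w") as handle:
        config.write(handle)


def client_from_config(path: str = "config.ini"):
    """Create a client from the config file (requires influxdb-client)."""
    from influxdb_client import InfluxDBClient

    return InfluxDBClient.from_config_file(path)
```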

CRUD Operations with influxdb-client-python

This article uses the yfinance module to gather historical stock data. Install it with `pip install yfinance`.

You can use the following code snippet to get the data:

Make sure to pass a filename parameter to the to_csv method; this will store the CSV locally so you can read the data later.
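If you want to script the download, here is a rough sketch using yfinance. The symbol, date range and file name are placeholders. The downloader is a parameter only so the function can be tried without network access; by default it uses yfinance.download. The columns written to the CSV can differ between yfinance versions, so check the header before parsing the file.

```python
from typing import Callable, Optional


def save_history(symbol: str, start: str, end: str, path: str,
                 download: Optional[Callable] = None) -> None:
    """Fetch daily prices for `symbol` and store them as a CSV file at `path`."""
    if download is None:
        import yfinance  # pip install yfinance

        download = yfinance.download
    frame = download(symbol, start=start, end=end)  # a pandas DataFrame indexed by date
    frame.to_csv(path)  # passing a path writes the file instead of returning a string


# Example (needs network access):
# save_history("MSFT", "2021-01-01", "2021-11-01", "MSFT.csv")
```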

Alternatively, you can get the CSV file from the GitHub repo.

Next, create a class and add the CRUD operations as its methods:

If you are using a cloud instance of InfluxDB, replace the URL parameter with the URL for your cloud region.

To create an instance of the class, use this command:
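The article’s own class lives in the linked repo. As a rough, hypothetical sketch of the same idea, the wrapper below takes an already-created InfluxDBClient plus a bucket and org. It forwards each operation to the client’s write_api, query_api and delete_api. The class and method names are illustrative. Passing the client in, rather than building it inside the class, keeps connection details in one place and makes the wrapper easy to exercise with stand-in objects.

```python
class InfluxClient:
    """Hypothetical CRUD wrapper around an influxdb_client.InfluxDBClient instance."""

    def __init__(self, client, bucket: str, org: str):
        self._client = client
        self._bucket = bucket
        self._org = org

    def write_data(self, data, write_options, **kwargs):
        """Write line protocol strings, Point objects or dicts to the bucket.

        Pass SYNCHRONOUS or ASYNCHRONOUS from influxdb_client.client.write_api;
        extra keyword arguments (e.g. write_precision) go straight to write().
        """
        write_api = self._client.write_api(write_options=write_options)
        return write_api.write(bucket=self._bucket, org=self._org, record=data, **kwargs)

    def query_data(self, query: str):
        """Run a Flux query and return the resulting list of FluxTable objects."""
        return self._client.query_api().query(query, org=self._org)

    def delete_data(self, start, stop, predicate: str):
        """Delete points between start and stop that match a delete predicate."""
        return self._client.delete_api().delete(
            start, stop, predicate, bucket=self._bucket, org=self._org
        )


# Usage sketch (requires influxdb-client and a running server):
# from influxdb_client import InfluxDBClient
# from influxdb_client.client.write_api import SYNCHRONOUS
# client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
# db = InfluxClient(client, bucket="TestBucket", org="my-org")
# db.write_data("stock,symbol=MSFT high=1.0", write_options=SYNCHRONOUS)
```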

Write Data

InfluxDBClient has a method called write_api, which returns a WriteApi object whose write() method writes data to a bucket. Below is the code snippet for the class’s write method:

The write API supports both synchronous and asynchronous writes (batching is the default), and you choose the mode through the write_options argument. For more information about asynchronous writes, see “How to use Asyncio in influxdb-client.”

The data parameter can be supplied in three different forms, shown below:

Line Protocol String

Note that the string has to follow a particular format: `measurementName,tagKey=tagValue fieldKey1="fieldValue1",fieldKey2=fieldValue2 timeStamp`

There’s a space between the tagValue and the first fieldKey, and another space between the last fieldValue and the timeStamp. The parser uses these spaces as separators, so you have to follow this layout exactly. Because spaces, commas and equals signs all have structural meaning, any of them that appear inside a tag key, tag value or field key must be escaped with a backslash. In this example, fieldValue1 is assumed to be a string and fieldValue2 a number, so fieldValue1 has to appear in double quotes.

The timestamp is optional. If none is provided, InfluxDB uses the system time (UTC) of its host machine. By default, timestamps are read as nanoseconds since the Unix epoch. You can read more about Line Protocol here.
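To make these rules concrete, here is a small stdlib-only formatter for one line of line protocol. It is a sketch, not a complete implementation of the spec. It escapes commas, equals signs and spaces in names, quotes and escapes string fields, adds the `i` suffix to integers and sorts tags by key. The function name to_line_protocol is hypothetical.

```python
from typing import Mapping, Optional, Union

FieldValue = Union[str, int, float, bool]


def _escape(text: str, special: str) -> str:
    """Prefix every character of `special` found in `text` with a backslash."""
    for char in special:
        text = text.replace(char, "\\" + char)
    return text


def _format_field(value: FieldValue) -> str:
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integer fields carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    if isinstance(value, str):
        return '"' + _escape(value, '\\"') + '"'  # escape backslashes, then quotes
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def to_line_protocol(measurement: str, tags: Mapping[str, str],
                     fields: Mapping[str, FieldValue],
                     timestamp_ns: Optional[int] = None) -> str:
    """Format one point as 'measurement,tags fields [timestamp]'."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    line = _escape(measurement, ", ")  # measurements escape commas and spaces
    for key in sorted(tags):  # tag keys/values also escape equals signs
        line += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")
    line += " " + ",".join(
        _escape(key, ",= ") + "=" + _format_field(value) for key, value in fields.items()
    )
    if timestamp_ns is not None:
        line += " " + str(timestamp_ns)
    return line


print(to_line_protocol("stock", {"symbol": "MSFT"}, {"name": "Microsoft", "high": 123.45},
                       1635465600000000000))
# stock,symbol=MSFT name="Microsoft",high=123.45 1635465600000000000
```

The resulting string is the kind of value you can pass as the record argument of write(). If you use second-precision timestamps instead of nanoseconds, tell the client with write_precision.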

Data Point Structure

If you don’t want to deal with the line protocol format directly, you can use the Point class, which serializes your data into line protocol for you.
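As a rough sketch of the Point builder, assuming influxdb-client is installed: the stock measurement, symbol tag and high field are illustrative choices and may differ from the ones in the article’s repo. The import sits inside the function so the snippet can be loaded without the package.

```python
from datetime import datetime, timezone
from typing import Optional


def make_point(symbol: str, high: float, when: Optional[datetime] = None):
    """Build a Point for one day's high price (hypothetical 'stock' schema).

    Requires influxdb-client; the import is deferred so this module loads without it.
    """
    from influxdb_client import Point, WritePrecision

    when = when or datetime.now(timezone.utc)
    return (
        Point("stock")                  # measurement
        .tag("symbol", symbol)          # tag set
        .field("high", float(high))     # field set
        .time(when, WritePrecision.S)   # timestamp stored with second precision
    )


# Example with a configured client:
# point = make_point("MSFT", 123.45, datetime(2021, 10, 29, tzinfo=timezone.utc))
# print(point.to_line_protocol())
# client.write_api(write_options=SYNCHRONOUS).write(bucket="TestBucket", record=point)
```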

Dictionary Style

In this example, you pass two data points as dictionaries and set the write option to ASYNCHRONOUS. This style feels natural in Python because each point is just a plain dictionary.
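The dictionary form uses the keys measurement, tags, fields and time. Below is a stdlib-only sketch that turns one CSV row into such a dictionary. It assumes the column names Date, High and Low, as in a typical yfinance CSV. The stock measurement and symbol tag are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict


def row_to_record(symbol: str, row: Dict[str, str]) -> dict:
    """Convert one CSV row into the dictionary shape accepted by write_api().write()."""
    # Keep only the YYYY-MM-DD part of the Date column and treat it as UTC midnight.
    day = datetime.strptime(row["Date"][:10], "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return {
        "measurement": "stock",  # hypothetical measurement name
        "tags": {"symbol": symbol},
        "fields": {"high": float(row["High"]), "low": float(row["Low"])},
        "time": day,  # a timezone-aware datetime
    }


# Writing two such records asynchronously (needs influxdb-client and a client):
# from influxdb_client.client.write_api import ASYNCHRONOUS
# write_api = client.write_api(write_options=ASYNCHRONOUS)
# write_api.write(bucket="TestBucket", record=[record1, record2])
```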

All the different ways to write the data are consolidated in the gist below:

Next, insert all the data for the MSFT and AAPL stocks. Since the data is stored in a CSV file, you can use the first method, a line protocol string, to write it:

You can insert the data for the AAPL stock by changing the file path and strings from MSFT to AAPL:
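A stdlib-only sketch of that loop follows. It reads a yfinance-style CSV, assuming Date, Open, High, Low, Close and Volume columns. It yields one line per day with a second-precision timestamp, so the write call needs write_precision=WritePrecision.S. The stock measurement, symbol tag and file names are placeholders. The same function handles MSFT and AAPL; only the symbol and path change.

```python
import csv
from datetime import datetime, timezone
from typing import Iterable, Iterator, List


def csv_to_line_protocol(symbol: str, rows: Iterable[dict]) -> Iterator[str]:
    """Yield one line protocol string per CSV row (hypothetical 'stock' schema)."""
    for row in rows:
        day = datetime.strptime(row["Date"][:10], "%Y-%m-%d").replace(tzinfo=timezone.utc)
        seconds = int(day.timestamp())  # Unix time in seconds
        fields = ",".join([
            f"open={float(row['Open'])}",
            f"high={float(row['High'])}",
            f"low={float(row['Low'])}",
            f"close={float(row['Close'])}",
            f"volume={int(float(row['Volume']))}i",  # integer field needs the i suffix
        ])
        yield f"stock,symbol={symbol} {fields} {seconds}"


def load_file(path: str, symbol: str) -> List[str]:
    """Read a CSV file from disk and convert every row."""
    with open(path, newline="") as handle:
        return list(csv_to_line_protocol(symbol, csv.DictReader(handle)))


# Writing the lines (needs influxdb-client and a configured write_api):
# from influxdb_client import WritePrecision
# write_api.write(bucket="TestBucket", record=load_file("MSFT.csv", "MSFT"),
#                 write_precision=WritePrecision.S)
```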

Reading the Data

InfluxDBClient also has a method called query_api, which returns a QueryApi object for reading data with Flux queries. You can use queries to filter your data by date, aggregate it over a time range, find the highest or lowest values in a period and more. Flux queries serve a similar purpose to SQL queries, although the syntax differs: each step is piped into the next with the |> operator.

The following code is for our class’s read method:

This method accepts a Flux query and executes it. The result is a list of FluxTable objects, and each table’s records attribute holds the FluxRecord objects that match your query. A FluxRecord has the following methods:
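Assuming query() returns the usual list of FluxTable objects, a small helper can flatten the results with the record methods get_time(), get_field() and get_value(). The helper name is hypothetical. It relies only on those methods and each table’s records attribute, so it also works with simple stand-in objects.

```python
from typing import Iterable, List, Tuple


def flatten_tables(tables: Iterable) -> List[Tuple]:
    """Turn FluxTable/FluxRecord results into (time, field, value) tuples."""
    rows = []
    for table in tables:
        for record in table.records:
            rows.append((record.get_time(), record.get_field(), record.get_value()))
    return rows


# Usage with a live client:
# tables = client.query_api().query(flux_query, org="my-org")
# for time, field, value in flatten_tables(tables):
#     print(time, field, value)
```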

Two query examples shown below demonstrate the query_data function in action. The first query returns the high values for MSFT since Oct. 1, 2021, and the second returns the high value for MSFT on Oct. 29, 2021.

Make sure you change the bucket name at the beginning of the query as needed. In my case, the bucket is named *TestBucket*.
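Here is a sketch of how those two queries could be built as strings. It assumes a measurement named stock, a symbol tag and a high field. These names are hypothetical, so match them to your schema. The bucket is TestBucket as above. Flux’s range() includes start and excludes stop, so the single-day query runs from midnight to the next midnight.

```python
from typing import Optional


def high_price_query(bucket: str, symbol: str, start: str,
                     stop: Optional[str] = None) -> str:
    """Build a Flux query for the 'high' field of one stock symbol."""
    time_range = f"start: {start}" if stop is None else f"start: {start}, stop: {stop}"
    return (
        f'from(bucket: "{bucket}")\n'
        f"  |> range({time_range})\n"
        '  |> filter(fn: (r) => r._measurement == "stock")\n'
        f'  |> filter(fn: (r) => r.symbol == "{symbol}")\n'
        '  |> filter(fn: (r) => r._field == "high")'
    )


since_october = high_price_query("TestBucket", "MSFT", "2021-10-01T00:00:00Z")
one_day = high_price_query("TestBucket", "MSFT", "2021-10-29T00:00:00Z", "2021-10-30T00:00:00Z")
```

Either string can then be passed to query_api().query().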

Updating the Data

Unlike writing and querying, updating has no dedicated API in InfluxDB. Instead, the InfluxDB documentation describes how duplicate data points are handled:

For points that have the same measurement name, tag set, and timestamp, InfluxDB creates a union of the old and new fieldsets. For any matching field keys, InfluxDB uses the field value of the new point.

To update a data point, write a new point with the same measurement name, tag set and timestamp. Any field you include replaces the old value, and fields you leave out keep their existing values.
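The duplicate-point rule can be modeled in a few lines of plain Python. This only illustrates the documented behavior, not how InfluxDB implements it. Points are keyed by measurement, tag set and timestamp. A write to an existing key merges the field sets, and new values win.

```python
from typing import Dict, FrozenSet, Tuple

PointKey = Tuple[str, FrozenSet[Tuple[str, str]], int]


def write_point(store: Dict[PointKey, dict], measurement: str, tags: dict,
                fields: dict, timestamp: int) -> None:
    """Merge `fields` into any existing point with the same series key and time."""
    key = (measurement, frozenset(tags.items()), timestamp)
    merged = dict(store.get(key, {}))
    merged.update(fields)  # matching field keys take the new value
    store[key] = merged


store: Dict[PointKey, dict] = {}
write_point(store, "stock", {"symbol": "MSFT"}, {"high": 1.0, "low": 0.5}, 1635465600)
write_point(store, "stock", {"symbol": "MSFT"}, {"high": 1.2}, 1635465600)  # the "update"
```

After the second write, the point holds high=1.2 and still keeps low=0.5, which is why a partial write works as an update.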

Deleting Data

You can delete data with the client’s delete_api, whose delete() method removes the points in a time range that match a predicate. Below is some code demonstrating how to delete data:

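The delete function identifies the data by its measurement value. InfluxDB’s delete API also requires a start and stop time, and only points in that range that match the predicate are removed. The following code shows a simple use case of the delete function: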
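Delete predicates have their own small syntax: equality comparisons on _measurement and tag keys, joined with AND, with values in double quotes. The sketch below builds such a predicate. The stock measurement and symbol tag are hypothetical. The commented call shows where the predicate plugs into delete_api().delete(), alongside the required start and stop times.

```python
from typing import Optional


def delete_predicate(measurement: str, tags: Optional[dict] = None) -> str:
    """Build a predicate such as: _measurement="stock" AND symbol="MSFT".

    Values are not escaped here, so avoid double quotes inside them.
    """
    clauses = [f'_measurement="{measurement}"']
    for key, value in (tags or {}).items():
        clauses.append(f'{key}="{value}"')
    return " AND ".join(clauses)


# With a configured client (start and stop are RFC3339 strings or datetimes):
# client.delete_api().delete("1970-01-01T00:00:00Z", "2022-01-01T00:00:00Z",
#                            delete_predicate("stock", {"symbol": "AAPL"}),
#                            bucket="TestBucket", org="my-org")
```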

InfluxDB’s documentation includes a list of best practices for writing data. There are also some best practices for data layout and schema design, which you should follow for the best results.

Some Practical Use Cases of Time Series Databases

This article examined a simple use case of a TSDB: storing stock values so you can analyze historical prices and forecast future values. However, you could also use one for IoT devices, sales data and any other time-varying data.

Some other practical use cases include:

  1. Time series forecasting using TensorFlow and InfluxDB
  2. Integrating InfluxDB with IFTTT to monitor your smart home
  3. Monitoring your internet speed

Conclusion

Hopefully, this guide empowered you to set up your own instance of InfluxDB. You learned how to build a simple app to perform CRUD operations using InfluxDB’s Python client library, and if you want to take a closer look at anything, you can find the repo with the entire source code here.

Check out InfluxDB’s open source TSDB. It’s got client libraries for ten programming languages including Python, C++, and JavaScript, and it’s also got a lot of built-in visualization tools so you can see exactly what your data is doing.

The New Stack is a wholly owned subsidiary of Insight Partners, an investor in the following companies mentioned in this article: Docker.

Featured image via Pixabay