Let SQL Do All the Work: User Defined Functions
SQL user-defined functions are one way to optimize application performance, namely by moving operations from the application layer to the database layer.
There are numerous ways to optimize the application layer. For applications with SQL database layers, in some cases, it’s possible to just do less. When working with SQL databases, user-defined functions (UDF) is the “do less” option. The work still gets done, but the database layer is leaned on heavier, which has the end result of optimizing application performance, noted Charlie Custer, Cockroach Labs‘ senior technical content marketer, in a great introductory blog post about UDFs.
SQL UDFs are functions created to execute in a SQL database that can limit the amount of work going on between layers and optimize the application layer. UDFs can mutate data intentionally and unintentionally.
What Are SQL UDFs?
UDFs are functions. And they’re very similar to functions in all other programming languages, except they are executed by your SQL database software. Their characteristics include:
- a name, needed for the function invocation.
- arguments, inputs and their types accepted by the function.
- an output/return statement.
- a function body that includes the operations performed on the arguments to reach the outcome.
A SQL UDF create function in some databases looks like this:
- CREATE FUNCTION add() lets the database software know a function named add is being created.
- a INT, b INT lets the database software know to expect two arguments, both integers when running this function.
- RETURNS INT lets the database software know that an integer is getting returned.
- AS ‘SELECT a + b’ is the operation running in the function body.
The invocation looks like this:
Eight is the target output.
Here’s where the optimization kicks in. UDFs remove business logic from the application layer to the database which can optimize the application layer, “do less”, but still perform necessary operations. Reducing the amount and frequency of data traveling between application layers is also key for enhancing performance.
Consider that SQL UDF add() is invoked in the database with two values that exist in the database (a = 3, b = 5) making only one value returned to the application layer, 8. Without the UDF, two values would get returned and then the operation would take place in the application layer. More steps, less efficient.
But how useful is a function that adds two numbers and returns one? Not very. The function below is a UDF created by CockroachDB user Saquib Ali that takes timestamp data and converts into a more human-friendly, “x hours ago” format.
Similarly to other functions, UDFs can mutate data in SQL tables in unintended ways. Because of this, some databases offer space to provide information about it’s volatility, the extent to which the function will impact other data.
IMMUTABLE refers to the function not mutating any other data and LEAKPROOF means no side effects.
How to Create UDFs in SQL
Different SQL systems have different syntax but Custer provided some sample UDFs for various database systems:
CockroachDB looks similar to PostgreSQL because CockroachDB is compatible with the Postgres wire protocol.
This is but a scratch on the topic of SQL UDFs. They aren’t only used to optimize the application layer but the database layer as well. Here are some other great articles on UDFs although the complexity jumps off pretty steeply from here. Google and Microsoft also have some posts (Google BigQuery, Microsoft UDF) that include some specs geared towards their own database systems as well.