
Codon Is a Faster Python Compiler (for Some Workloads)

Created as a way to speed processing of genomic data, Codon can run some Python programs in a fraction of the time the standard Python interpreter takes.
May 11th, 2023 3:00am by

Python has a new compiler that comes with the claim that it makes the language faster than C/C++. But is it worth its salt?

Codon is a new Python compiler that aims to address some of the language’s less favorable characteristics, among them the global interpreter lock and type checking at runtime.

Codon’s own documentation states that the high-performance compiler turns Python code into native machine code with no runtime overhead. It also says the typical speedups over Python are on the order of 100x or more on a single thread. (A new blog post on USENIX also goes into great detail on the topic.)

Is Codon too good to be true?

The quick answer: no, Codon isn’t too good to be true, but it won’t answer all your prayers either. Codon is effective, but it isn’t an all-purpose solution. It was created for a specific purpose, as a domain-specific language (DSL) for working with genomics data, and although the project’s scope was eventually widened, some of its features still reflect the original goals. Codon’s trade-off is that it gives up some of the capabilities of Python 3.10 in exchange for speed.

Today, the creators claim Codon is great for genomic and financial data, but also can provide faster performance for applications that use GPUs or are targeted for WebAssembly, as well as for speeding up Python libraries.

Here is straight Python code…

[Code listing not reproduced]

and the time it takes to run the code on Python, and on Codon (according to Codon benchmarks):

[Benchmark figure not reproduced]

Features of Codon

Codon was derived from Seq, a DSL that works with genomics data. Programs or scripts that analyze genomics data deal with massive amounts of data because a single sequenced genome consists of a five-gigabyte index and tens of gigabytes for tables. That meant the early goals for this project closely aligned with the following:

  • Allow researchers without programming training to use Pythonic syntax.
  • Support parallel execution.
  • Include specific features helpful for working with genome sequences.

As the story goes, the scope was eventually widened, and the more general-purpose Codon we have today came to be.

The Codon pipeline: starting with the Pythonic code, Codon parses it into an abstract syntax tree, performs static type checking, converts the AST into Intermediate Representation, and performs optimizations or includes DSL extensions, before using LLVM for the conversion to native code.
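The first stage of that pipeline, turning source text into an abstract syntax tree, can be seen with the `ast` module in Python's standard library. This is only an illustration using CPython's own parser; Codon has its own parser and IR, which are not shown here, and the `add` function is a made-up example.

```python
import ast

# Hypothetical source text to parse; any small function would do.
SOURCE = """
def add(a, b):
    return a + b
"""

# Parse the source into an abstract syntax tree (CPython's parser, not Codon's).
tree = ast.parse(SOURCE)

# Collect the kind of every node in the tree to get a feel for its structure.
node_kinds = [type(node).__name__ for node in ast.walk(tree)]

# The first top-level statement is the function definition.
func = tree.body[0]
arg_names = [arg.arg for arg in func.args.args]

print(type(func).__name__, func.name, arg_names)
print(ast.dump(func.body[0]))
```

A compiler works on a tree like this one rather than on raw text. In Codon's case, the later stages attach types to the nodes and lower the tree toward IR.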

Static Type System

Even though Codon uses a static type system, programmers write Codon code without type annotations, just as they would in any other Python use case. Codon then infers the types at compile time, applying the duck-typing rule (“if it looks like a duck and quacks like a duck, then it must be a duck”) so that checks Python would perform at runtime are performed during compilation instead. Shifting the type checking left reduces runtime overhead and surfaces type errors earlier.
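A small sketch shows the kind of code this affects. It runs under standard Python. The comments about Codon reflect its documentation, which lists adding objects of different types to a collection among the dynamic features it disallows; treat them as an assumption rather than a tested result. The function and variable names are made up for illustration.

```python
def total_length(items):
    # No annotations: anything whose elements support len() is accepted.
    return sum(len(item) for item in items)


# Homogeneous collections: each call sees a single element type, so a static
# checker can settle on one type for `items` per call site.
words = ["codon", "seq"]
rows = [[1, 2, 3], [4]]
print(total_length(words))  # 8
print(total_length(rows))   # 4

# CPython resolves len() per element at runtime, so a mixed list works here.
# Assumption: Codon would reject this list at compile time, because its
# elements do not share one type.
mixed = ["codon", [1, 2, 3]]
print(total_length(mixed))  # 8
```

The first two calls are the style that carries over unchanged. The last is the style an existing script may need to rewrite.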

LLVM Backend

Codon uses LLVM, a compiler infrastructure, as its backend and as a general framework for optimizations. Codon’s engineers chose LLVM because of its flexibility across software and hardware systems. The compiler begins by parsing the input file, then uses a set of rules to convert the code into an abstract syntax tree (AST). Later stages of the Codon compiler perform type checking and convert the AST into an intermediate representation (IR), which is optimized and then converted to machine code through LLVM.

Multiple Threads

Codon uses OpenMP, an API for shared-memory multiprocessing, to run code on multiple threads. Placing the @par decorator on a loop marks that loop as a candidate for multithreaded execution. @par accepts several parameters similar to the pragmas used in C++.
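As a rough sketch of where @par goes, the loop below counts primes. A decorator on a `for` loop is Codon syntax and a syntax error in standard Python, so the @par line is left as a comment and the block runs sequentially under CPython. The parameter names `schedule`, `chunk_size` and `num_threads` follow Codon's documentation and should be read as an assumption here; `is_prime` and `count_primes` are made-up names.

```python
def is_prime(n):
    # Trial division: deliberately simple, CPU-bound work.
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True


def count_primes(limit):
    total = 0
    # Under Codon, uncommenting the next line would mark the loop as parallel
    # (assumed parameters, taken from Codon's docs; not valid in CPython):
    # @par(schedule='dynamic', chunk_size=100, num_threads=16)
    for i in range(2, limit):
        if is_prime(i):
            total += 1
    return total


print(count_primes(1000))
```

The loop body is independent from one iteration to the next apart from the running total, which is the shape of loop that suits this kind of annotation.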

Just-in-Time

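Adding the @codon.jit decorator to a function in an ordinary Python program has Codon compile that function just in time, so functions that would benefit from native compilation or parallel execution can be accelerated without converting the whole program.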
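A minimal sketch of the decorator in use follows. It assumes Codon's `codon` Python package is installed. If it is not, the `jit` fallback (a stand-in added for this example, not part of Codon) leaves the function as plain Python so the snippet still runs. `sum_of_squares` is a made-up example function.

```python
try:
    import codon  # Codon's Python package; assumed to be installed separately
    jit = codon.jit
except ImportError:
    def jit(fn):
        # Hypothetical fallback: without Codon, return the function unchanged.
        return fn


@jit
def sum_of_squares(n):
    # A tight numeric loop, the kind of code a JIT can usually help with.
    total = 0
    for i in range(n):
        total += i * i
    return total


print(sum_of_squares(10))  # 285
```

Because only the decorated function is handed to Codon, the rest of the program can keep using Python features that Codon does not support.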

What’s the Catch?

“Codon is not the same as Python,” the USENIX post stressed. Codon is a working dev tool, but it’s not a 1:1 match for other Python compilers. Testing revealed some soft spots: code that relies on duck typing can cause problems when existing scripts are compiled with Codon. The USENIX post also confirmed that Python 3.10 features weren’t fully implemented.

Codon takes Python as input and produces executables, making the distribution of the code simpler while avoiding disclosure of the source. The LLVM backend makes it a potentially great solution for people wanting to use Python for embedded projects. Here’s a Y Combinator thread with a ton of different opinions.

Conclusion

Codon engineers created a new way to support Python based on their own needs. The end result is a new compiler with unique optimizations. The developers of Codon formed Exaloop, a company to further the development of Codon. Commercial use of any Codon release less than three years old requires a license, but non-commercial users are welcome to experiment.
