Excel: The Functional Programming Tool You Didn’t Know You Had
Haskell creator Simon Peyton Jones recently announced he was leaving Microsoft Research. His work there has focused on educational computing, building on his functional programming research. In recent years he’s taken an interesting diversion, working with the Excel functions team to extend the capabilities of the functions programming environment.
Thanks to functions, Excel has become one of the most popular development tools. Many businesses are built on top of Excel spreadsheets, with code that’s been written by folk who don’t know that they’re developers, let alone that they’re using what at heart is a functional programming language. As Peyton Jones noted in a 2003 paper, the structure of a spreadsheet is somewhat unique, forcing development to take functional routes.
We call Excel code “functions,” because they implement mathematical operations on cells.
Excel as Code
If you lay out those cell formulae in a text file, you can quickly see its code. Cell numbers are variables and you’re writing code that links cell to cell, with changes rippling through your spreadsheet.
Traditional programs are perhaps best thought of as single-dimensional, with each step in a procedure making changes to variables.
The structure of a spreadsheet changes that, giving you a two-dimensional structure, where operations fan out across your programming surface.
It’s a very simple version of the actor/message pattern. Each cell is an individual actor containing a very simple functional program, where you’re limited to the Excel-provided functions, and any change to a cell becomes a message that triggers a function in another cell (or in several other cells).
Peyton Jones’ early papers on the subject, from 2003, show the direction he’s guided the development of the Excel formula language, filling in core primitives that have allowed Microsoft to extend the language without breaking the billions of cells of existing code. It’s a set of changes that started with new data types, giving cells a more object-like behavior while still allowing the original strings and numbers to work.
Brian Jones, who is head of product for Excel at Microsoft, describes this process as “thinking through what are the core primitives that are missing, that would make [Excel] robust, less error-prone, and actually make it so people can do more.”
Those new data types have helped extend Excel’s role as a data conversion tool.
If you’ve used recent versions of its data import tools, you should be familiar with these new features, as they make it easier to take data from many different sources and work with it in the familiar spreadsheet.
I’ve recently been using it to analyze large-scale JSON data structures, converting what would have been MongoDB or similar JSON documents into spreadsheets and then using formulae to process that data.
Extended data types have allowed Microsoft to extend Excel into other data platforms, for example with its partnership with Wolfram. Here data in a spreadsheet can be linked to Wolfram’s curated content, allowing it to fill in additional relevant data. Geographic data can be linked directly to related demographic information, or food to nutritional data, with over a hundred data types.
…Then Named Variables
Adding new types is only part of what’s required to make Excel’s built-in programming tools more of a general-purpose programming language. Over the last year or so Microsoft has begun to add those features, first adding a new LET directive to provide in-formula variables that aren’t tied to cell contents that can be shared across the formulae in a spreadsheet.
LET provides two important features. Firstly it gives you a tool to name your expression’s values, rather than referring to them as cell references. That makes your Excel code much easier to read, as users don’t have to keep looking back at a cell to understand the context of the code they’re working with. Secondly, it speeds operations up. Without LET, if you copy an expression into another cell, it calculates it again, even if you’re only intending to reuse the result in a different calculation. Using a LET, it’s only calculated once and can be referred to many times across your application. If an expression is used a lot, this can add up to a significant performance boost.
A LET statement is structured as a set of key-value pairs, associated with a calculation. The key is the name, the value is a standard Excel cell reference. This does change the structure of an expression, but overall, the benefits outweigh the extra typing.
…Finally LAMBDAs and Excel Is Turing Complete at L`ast!
While the LET statement makes Excel formulae clearly more code-like, it was the introduction of LAMBDA that changed Excel programming completely, allowing you to take your functions and wrap them up as repeatable, reusable functions. Now code can be written once and used across a spreadsheet without the risk of copying errors or typos. What’s more, LAMBDAs are recursive, so a formula can call itself — or better still, call another LAMBDA, allowing you to quickly build complex program structures in a few cells. As Jones notes, that means “Excel is now Turing complete!”
A LAMDA has a very simple structure. It takes one or more inputs and then applies a calculation on those values, returning a response in its cell. You can test a LAMBDA by adding cell identifiers to your formula (these can be removed when you’re ready to publish it as a named function). So a basic LAMBDA to add two numbers would look like this:
To test it on values in cells A1 and A2 you simply add cell references.
Once you’re sure that your LAMBDA is ready for use, you’ll need to step out of the spreadsheet cells for a moment to create a named function using Excel’s Name Manager.
Create a new name, add a scope, any comments, and your LAMBDA code (without test cell references). Once assigned to a name you can then call the function by the new name, passing cell values at the same time, treating it like any of Excel’s built-in functions.
Name Manager isn’t the best user experience for application development, but for now it fits into the way Excel does things. That’s perhaps the one downside of the Excel functional programming experience, that sometimes you have to step out of the flow of in-worksheet development and use other tools.
But it’s also the first attempt at a new way of doing things, so we can expect changes in the future that will make Excel development more natural, with better debugging and code sharing options as well.
The combination of LET and LAMBDA fills out a big hole in Excel programming, giving us the tools we need to quickly build complex custom functions.
The result is a language that offers many of the features we expect from functional programming, with the accessibility we expect from productivity tools.