How LLMs Helped Me Build an ODBC Plugin for Steampipe

I’d written my first two Steampipe plugins (Hypothesis, Mastodon) in the pre-LLM era, so I was eager to work with my team of assistants on my next project: a plugin for ODBC (Open Database Connectivity).
Steampipe nominally maps APIs to database tables. When you select * from aws_sns_topic
, Steampipe winds up calling the AWS ListTopics API. Many Steampipe plugins work like that: a table corresponds to a specific API call.
But some plugins work in a more general way. The net_http_request table of the Net plugin turns Steampipe into an HTTP client. The exec plugin creates a SQL facade over the universe of shell commands, and the Terraform plugin does the same for infrastructure-as-code config files. By broadening the definition of what counts as an API, Steampipe keeps extending its embrace of structured data in all forms.
Databases present another kind of API. Steampipe plugins for databases can’t use fixed schemas, but rather must discover them on the fly. When the plugin SDK added support for dynamic schemas, the CSV plugin was the first to use the feature. So it became one inspiration for an ODBC plugin that would create a SQL facade over any database with an ODBC driver.
Another inspiration was the Postgres plugin from Jose Reyes. (Which, to be clear, is just a small part of his deep dive into Steampipe.) The Postgres plugin enables Steampipe queries against remote Postgres tables
See These Examples? Do the Analogous Thing for ODBC.
That was the dream. Hey, it never hurts to ask, right? But that wasn’t a great use of my team. I wasn’t able to get ChatGPT, Sourcegraph Cody, or GitHub Copilot to extrapolate from the examples to anything close to a working plugin. Instead, as usual, we broke things down into manageable chunks. And as usual that worked well.
Here’s a small example of useful assistance. The plugin requires a config file that defines ODBC data sources and table names. Those definitions are written in HCL. It took some iteration to arrive at a format that would work with the Steampipe config schema. Back-and-forth dialogue with the team helped me arrive at this ODBC connection format.
1 2 3 4 5 6 7 8 |
connection "odbc" { plugin = "odbc" data_sources = [ "SQLite:foo", "PostgreSQL:jose" ] } |
Given that, the LLMs were then able to write the boilerplate code needed for the plugin’s config machinery. Small things like that add up.
Showstopper and Workaround
ODBC is a portal to a universe of data sources. First, you install a driver manager (like unixODBC on Linux), and then you add drivers that talk to SQLite or Postgres or sources that aren’t even databases (they’re portals to other universes of data sources). One such portal is CData, which offers a broad set of ODBC drivers, some that overlap with Steampipe plugins and many that don’t. That sounded like an interesting first test for the plugin so I installed CData drivers for RSS and Slack and began working on getting the plugin to discover their schemas.
When I tried calling the ODBC driver from the plugin’s initialization phase, though, nothing worked; and there were ominous messages in the log about low-level OS signal handling. It was debugging that was beyond me — was it Steampipe? CData? unixODBC? a combination? — but I wanted to make progress if possible. So I tried a few workarounds: guarding the plugin’s use of the ODBC driver with a mutex, fiddling with timing, and — what finally worked — running schema discovery after initialization and caching schemas on the file system. It’s “a bit hacky,” said ChatGPT. But the speed with which I was able to iterate through those options, with its help, made all the difference.
Schema Discovery
Steampipe plugins are written in Go, and they rely heavily on its ecosystem of data-source SDKs. The best option for the ODBC plugin was github.com/alexbrainman/odbc. It works well and supports some introspection, but the most universal method seems to be the dumbest: select one row of data, capture column names, and try to sniff their types. The LLMs (mostly ChatGPT) made quick work of implementing that strategy.
We did discuss its flaws. For example, what if the sample’s first row contains nulls? That isn’t a fatal flaw, though, it just means the column will always be a string type, and the Steampipe query author will have to write where number::int > 1
instead of where number > 1
, which isn’t the end of the world. We also agreed that if the plugin survives and matures, it might be worth investing in a way for users of the plugin to provide hints that activate database-specific discovery mechanisms. But meanwhile, the dumb method was good enough to keep moving.
Implementing SQL-to-SQL Pushdown
Here’s a query to find open issues assigned to you.
1 2 3 4 5 6 7 8 |
select repository_full_name, number title from github_my_issue where state = 'OPEN'; |
If the GitHub plugin didn’t implement pushdown, Steampipe would map the query to the GitHub API that lists all your issues and return a table with all of them. Steampipe’s Postgres engine would then apply the WHERE condition to filter results to just open issues.
Of course, you’d rather push that filtering down into the API where possible. So what actually happens here is that the plugin defines state
as an optional key column (aka qualifier or “qual”). When the query includes where state = 'OPEN'
, the plugin adjusts the API call to include that filter condition.
The same idea works with plugins where the API is SQL. You can see that happening here in the Postgres plugin. The table definition‘s List
function sets up all the columns in each discovered schema as optional key columns, so any or all can be mentioned in the Steampipe WHERE clause and pushed down into the WHERE clause handled by the remote Postgres.
It’s the same with the ODBC plugin. You can’t tell just by looking at this query.
1 2 3 4 5 6 7 8 |
select name, number, _metadata from odbc.sqlite_foo where number = 1 |
1 2 3 4 5 |
+------+--------+-------------------------------------------+ | name | number | _metadata | +------+--------+-------------------------------------------+ | jon | 1 | {"connection_name":"odbc","dsn":"sqlite"} | +------+--------+-------------------------------------------+ |
But under the covers, because the plugin implements pushdown, its debug log shows that the WHERE filter is handled by SQLite, not by Steampipe.
ChatGPT didn’t get this right on the first try. Despite the clear example of the Postgres plugin, it offered a partial solution that correctly adjusted the SQL passed to SQLite but neglected to define the optional key columns. That was an easy fix, and together we implemented the feature much more easily than I’d have done on my own.
Test Strategy
I asked my team to discuss ways to test the plugin, and the responses were pretty good all around. Copilot proposed a reasonable test for the getSchemas
function, but after resolving hallucinations there were still issues getting it to run. Logging was problematic, as was mocking the database connection.
Working through these issues was far easier with LLM assistance than it would be otherwise. So easy, in fact, that I lost the thread. It was interesting to learn details about how to interact with the plugin SDK’s logging mechanism, and how to work with mocked connections to ODBC drivers. With the ability to rapidly iterate through solutions, I made rapid progress. But as the test code grew more complex, it all started to feel like too much effort for too little return.
So I made the executive decision to switch to an end-to-end testing strategy: populate various ODBC sources with sample data, and run Steampipe queries against them. I’ve found LLMs to be spectacularly good at generating test data. In this case, for starters, that meant writing a standalone program to populate a SQLite database. All three assistants did that easily, but ChatGPT’s version was the most interesting. Given our discussion of the first-row-sampling strategy, it “knew” the first row should contain nulls.
Post Mortem: Review and Explain
Finally, I invited the team to review the code and explain how it all works. ChatGPT, which had plenty of context, did a great job. Because Cody and Copilot hadn’t participated as much they had less context, and I think that made this a useful test. The ability of LLMs to help you orient to unfamiliar code is a key strength.
Both Cody and Copilot produced useful explanations. Given that both can see the local repo where the code lives, though, I was surprised to see Copilot hallucinate the names of files and functions that Cody got right.
I then asked Cody and Copilot to evaluate the strategy for schema discovery. I’d already discussed this extensively with ChatGPT and had concluded the obvious flaw — that sampling the first row risks finding nulls for some columns — was an acceptable risk for a first version of the plugin that might later be enhanced with database-specific logic.
Cody’s more complete and cogent response noted the key flaw, Copilot’s skimpier response missed it.
In general, I’m finding it helpful to ask LLMs to review both code and prose. When the rubber duck talks back, the responses may or may not be useful and accurate. But either way, the interaction can prompt you to think differently about what you’re trying to do. It feels intrinsically valuable.