Why Debugging Doesn’t Need to Be So Complex or Outdated
VMware sponsored this post.
await. Kotlin deserves special mention here, because it supports coroutines through a single syntax element: suspend functions. Either way, with reactive programming or with coroutines, the result can be a convoluted execution graph.
In this environment, developers are no longer able to see the forest for the trees. The most common refrain I get is, “how can I better support debugging?” Developers are persuaded by the benefits, but can’t figure out how to get started. There’s no clarity around the debugging workflow. Of course, you can still put breakpoints in the pipeline, but it’s a little more involved.
If you understand what’s going on under the hood, debugging is second nature. But if you’ve been abstracted away from the depths of the frameworks (which most software developers now are), debugging becomes quite tricky. Most folks struggle to map their mental model against what the debugger is showing them.
In the reactive world, you don’t throw exceptions; you return a reactive type and handle errors through it. The question then becomes: “That exception, at best, contains the trace of one thread of execution. How do I get the assembly trace across all of the different threads?” We can do that, but it’s expensive at runtime to keep all of that data. An assembly trace is like a stack trace, but it spans the whole pipeline.
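To make the idea of an assembly trace concrete, here is a minimal sketch in plain Java using only `CompletableFuture`. It is not how Reactor or any other framework implements the feature; `AssemblyTrace`, `TracedPipeline`, and `traced` are hypothetical names invented for illustration. The idea is to capture a stack trace when a pipeline stage is declared, then attach it to whatever exception that stage later throws on a worker thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.function.Function;

// Hypothetical marker exception: its stack trace records where a stage was declared.
final class AssemblyTrace extends RuntimeException {
    AssemblyTrace(String stageName) {
        super("stage assembled here: " + stageName);
    }
}

final class TracedPipeline {
    // Wraps a stage so that any failure carries the assembly-time trace as a suppressed exception.
    static <A, B> Function<A, B> traced(String stageName, Function<A, B> stage) {
        // Captured eagerly for every stage, even ones that never fail: this is the runtime cost.
        AssemblyTrace assembledAt = new AssemblyTrace(stageName);
        return input -> {
            try {
                return stage.apply(input);
            } catch (RuntimeException e) {
                e.addSuppressed(assembledAt);
                throw e;
            }
        };
    }

    // Runs a two-stage pipeline on the common pool and returns the failure's cause (or null on success).
    static Throwable runAndCaptureFailure() {
        Function<String, Integer> parse = traced("parse", Integer::parseInt);
        CompletableFuture<Integer> result = CompletableFuture
                .supplyAsync(() -> "not a number")
                .thenApplyAsync(parse);
        try {
            result.join();
            return null;
        } catch (CompletionException e) {
            return e.getCause();
        }
    }

    public static void main(String[] args) {
        Throwable failure = runAndCaptureFailure();
        if (failure != null) {
            // Prints the worker-thread stack trace plus the suppressed assembly-time trace.
            failure.printStackTrace();
        }
    }
}
```

The worker-thread trace alone only shows pool internals and the parse call; the suppressed `AssemblyTrace` points back to the line where the stage was wired into the pipeline.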
Spring has 19 years of history and has released initial reactive support, in the form of integrated support in Spring WebFlux and elsewhere, over the past three years. I hope that everyone starts their new projects with reactive programming.
Project Loom, well, looms over us and promises us lightweight “green threads” in Java, kind of like goroutines in Go. Thousands and thousands of threads being scheduled onto a few “real” threads, without knocking your computer down: that’s awesome!
However, there’s a tension between reactive programming and Project Loom. Reactive programming requires you to change the way you write code: you chain function calls together. In exchange, you get many more opportunities for your code to yield the thread that it’s using back to the runtime. This in turn means you’re handling more requests with the same computer, so you’re getting better scalability. In Java, this requires explicit changes to your code. In Kotlin, with its coroutine support, you get a very nice syntax for expressing asynchronous computation while still writing what looks like synchronous code.
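As a rough illustration of what “chaining function calls” looks like in Java, here is a minimal sketch using the JDK’s `CompletableFuture` rather than a reactive library. The `fetchUser` and `fetchOrders` lookups are hypothetical stand-ins; a real service would perform non-blocking I/O there.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

final class ChainingSketch {
    // Hypothetical asynchronous lookups, simulated on the common pool.
    static CompletableFuture<String> fetchUser(int id) {
        return CompletableFuture.supplyAsync(() -> "user-" + id);
    }

    static CompletableFuture<List<String>> fetchOrders(String user) {
        return CompletableFuture.supplyAsync(() -> List.of(user + "/order-1", user + "/order-2"));
    }

    // Each step is a callback chained onto the previous one. No thread is parked
    // between steps; the next stage runs when the previous one completes.
    static CompletableFuture<Integer> orderCount(int id) {
        return fetchUser(id)
                .thenCompose(ChainingSketch::fetchOrders)
                .thenApply(List::size);
    }

    public static void main(String[] args) {
        // join() blocks only here, at the edge of the program, to print the result.
        System.out.println(orderCount(42).join());
    }
}
```

The control flow now lives in the chain of callbacks instead of in consecutive statements, which is exactly what makes stepping through it in a debugger less direct.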
What’s in a scheduler? Quite a bit. Being able to switch one boolean value, and now my web server can handle tens of thousands of threads? That’s awesome! But it doesn’t solve the control flow issue.
We’re heading to a world where you’re doing reactive programming (or coroutines, in Kotlin), or Project Loom and green threads, or, hopefully, both!
And, of course, what’s old is new again. The first version of Java had its own scheduler — it had green threads — but it introduced considerable complexity, and it stifled the ability to port to different operating systems. Programmers said, “screw it, Moore’s law is still a thing. In the meantime, we can add more RAM and more CPU.”
Now, here we are, more than 20 years later, and we’ve circled back around. Java is moving back to green threads.
In Go, you get these super cheap, lightweight green threads. The compromise is that if you want to benefit from them, you have this one paradigm for concurrency: goroutines. If you’re willing to express your algorithm in terms of those APIs, then you get thousands and thousands of threads for free.
Java threads are an opaque box and if you just put something in there, you get the run of the system and can pretend that your brand new thread gives you a computer to yourself. It doesn’t compel the developer to change the way they write their code.
JVM developers took a different approach: what if we introduced a green scheduler? And then rewrote all of the input-output stuff to provide hooks, to see when we’re blocking? The goal is to maintain the contracts. Of course, that only works for the code the JDK team has updated. You do a blocking read, and it will yield to another thread. It’ll end up being an asynchronous read.
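The sketch below shows what “maintaining the contracts” could look like from the application side: ordinary blocking code, submitted to an executor, with no reactive types involved. It assumes the virtual-thread executor that became final in JDK 21 (`Executors.newVirtualThreadPerTaskExecutor()`); the reflective lookup and the platform-thread fallback are only there so the sketch still runs on older JDKs, and `LoomSketch`, `blockingTask`, and `runAll` are hypothetical names. On JDK 21 or later you would simply call the factory method directly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class LoomSketch {
    // Prefers one virtual thread per task; falls back to a bounded pool of
    // platform threads when the JDK has no (enabled) virtual-thread support.
    static ExecutorService newExecutor() {
        try {
            return (ExecutorService) Executors.class
                    .getMethod("newVirtualThreadPerTaskExecutor")
                    .invoke(null);
        } catch (ReflectiveOperationException | UnsupportedOperationException e) {
            return Executors.newFixedThreadPool(200);
        }
    }

    // Plain blocking code: nothing here knows which kind of thread runs it.
    static int blockingTask(int n) {
        try {
            Thread.sleep(20); // stands in for a blocking read
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return n * 2;
    }

    // Submits taskCount blocking tasks and sums their results.
    static long runAll(int taskCount) {
        ExecutorService executor = newExecutor();
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                final int n = i;
                futures.add(executor.submit(() -> blockingTask(n)));
            }
            long sum = 0;
            for (Future<Integer> future : futures) {
                sum += future.get();
            }
            return sum;
        } catch (Exception e) {
            throw new IllegalStateException("task execution failed", e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runAll(1_000));
    }
}
```

With virtual threads, each `Thread.sleep` parks the virtual thread and frees its carrier for other work, so the code keeps its synchronous shape while the runtime handles the yielding.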
Why would you go all-in on GraalVM (a Java JIT replacement and an ahead-of-time compiler) today if you know something new is on the horizon? The reality is that GraalVM is here today. Twitter is using GraalVM as a JIT, but not for native image compilation.
It’s not like there’s an alternative, because the alternative is the status quo. As we go into this containerized world, packing more microservices into containers, GraalVM is a big deal. Project Leyden, which looks to incorporate ideas from GraalVM into the standard JDK, is also a big deal.
Another exciting thing is potentially coming from Oracle: Project Panama, announced at the same time as Loom. It’s like JNI, but much nicer. Imagine GraalVM native images that talk to the kernel and have tens of thousands of native threads. Now we’re starting to erase some of the perceived benefits of native code, all while still writing code for the JVM.
Pulling the Threads Together
In order to deal with this increasingly complex world, we’ve built elements of Spring to do ahead-of-time compilation. We have a native image feature. GraalVM needs to be forewarned about any “funny business”: any dynamic loading of anything, any dynamic proxies, any URL resource loading, any reflection, etc. All of these are things that the compiler needs to know about upfront.
You need to capture all of that information for the native image compiler, and that work is not for the faint of heart.
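To see why reflection counts as “funny business,” consider this minimal sketch. `ReflectionSketch` and `invokeNoArg` are hypothetical names, and the `sketch.class` system property is an assumption made for illustration. Because the class name only arrives at run time, an ahead-of-time analysis has nothing at this call site to tell it which class must be kept in the image; with GraalVM that information is typically supplied separately as reachability metadata (for example, a `reflect-config.json` file).

```java
import java.lang.reflect.Method;

final class ReflectionSketch {
    // Instantiates the named class through its no-arg constructor and invokes a
    // public no-arg method on it. Neither name is visible to a static analysis.
    static Object invokeNoArg(String className, String methodName) {
        try {
            Class<?> type = Class.forName(className);
            Object instance = type.getDeclaredConstructor().newInstance();
            Method method = type.getMethod(methodName);
            return method.invoke(instance);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflective call failed: " + className, e);
        }
    }

    public static void main(String[] args) {
        // The target class is chosen when the program starts, not when it is compiled.
        String className = System.getProperty("sketch.class", "java.util.ArrayList");
        System.out.println(invokeNoArg(className, "size"));
    }
}
```

On a regular JVM this works for any class on the classpath. In a native image, the same call can fail at run time unless the target class was registered for reflection when the image was built.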
Between the return of threads and the exponential abstraction of developers from the frameworks holding up these threads, the process of debugging in Java is only becoming more complicated. As a result, I predict that we will see many more companies begin to adopt solutions that provide live Software Failure Replay technology. This world isn’t slowing down — it’s only speeding up — and there simply aren’t enough superhuman developers in the world to keep up. Without the ability to see the entire execution in development and production, software failures or failures-in-the-making will only pile up.