
Rethinking Java @Scheduled Tasks in Kubernetes

A new pattern for scaling Java scheduled tasks in Kubernetes using three open source dependency injection frameworks: Spring Boot, Micronaut and Guice with Java Spark.
Jan 20th, 2023 6:45am

Historically, most scheduled tasks in Java applications I’ve worked on have used Spring’s scheduling feature. Spring handles methods that you annotate with @Scheduled in the background of the application. This works fine if only one instance of the application is running.

However, applications are increasingly becoming containerized and are being run in container orchestration platforms, such as Kubernetes, to take advantage of horizontal scaling so that multiple instances of an application are running. This creates a problem in the way scheduled tasks have been used historically: Because scheduled tasks are run in the background of the application, we have duplicated (and possibly competing) scheduled tasks as we horizontally scale the application.

To address this problem of scaling Java scheduled tasks in Kubernetes, I’ve created a new pattern that works with three popular open source dependency injection frameworks: Spring Boot, Micronaut, and Guice with Java Spark. Let’s walk through the scenario below to understand the pattern.

The Scenario

Let’s say we have a requirement to run some business logic that lives in the service layer of a Spring Boot API as a scheduled task. For the purposes of this article, let’s say the service looks like this:
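The original listing is not reproduced here, so the following is a minimal sketch of what such a service might look like; the `hello` method name and return value are illustrative assumptions:

```java
import org.springframework.stereotype.Service;

@Service
public class HelloService {

    // Stand-in for the business logic we want to run on a schedule.
    public String hello() {
        return "Hello, World!";
    }
}
```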

Historically, we would accomplish this by writing a class in the Spring Boot API that calls the service logic and annotate a method with @Scheduled, like so:
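A representative sketch of that scheduled-task class follows; the cron expression and class name are assumptions chosen for illustration:

```java
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class HelloScheduledTask {

    private final HelloService helloService;

    public HelloScheduledTask(HelloService helloService) {
        this.helloService = helloService;
    }

    // Spring's task scheduler invokes this in the background of the
    // running application - here, at the top of every hour.
    @Scheduled(cron = "0 0 * * * *")
    public void runHello() {
        helloService.hello();
    }
}
```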

While this solution is straightforward, it limits our ability to scale the application horizontally in a modern container orchestration platform like Kubernetes. As this API horizontally scales to 2, 3, 4 … n pods, we’ll have 2, 3, 4 … n copies of the same scheduled task, which can cause duplicated work, race conditions and inefficient use of resources.

There are solutions like ShedLock and Quartz that address this problem. Both ShedLock and Quartz use an external database to allow only one of the scheduled tasks in the n pods to execute at a given time. While this approach works, it requires an external database. Also, an instance of the scheduled task still runs in each pod, which consumes application/pod memory, even though only one of them will execute its business logic. We can improve these solutions by eliminating the multiple scheduled task instances altogether.

Is There a Better Way to Schedule Tasks in Kubernetes?

Yes, with Kubernetes CronJob. We can overcome these disadvantages by separating the concerns of running the scheduled task and serving the application. This requires us to expose the service logic as an API endpoint by writing a controller that calls the service logic, like this:
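A minimal sketch of such a controller, assuming the `/hello` path and the `hello()` service method used in the earlier examples:

```java
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class HelloController {

    private final HelloService helloService;

    public HelloController(HelloService helloService) {
        this.helloService = helloService;
    }

    // Exposes the service logic so an external scheduler can trigger it.
    @GetMapping("/hello")
    public String hello() {
        return helloService.hello();
    }
}
```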

Next, we create a CronJob resource that will call this new endpoint on a set schedule:
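A sketch of such a CronJob manifest; the names, schedule and in-cluster service URL are illustrative assumptions:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello-cronjob
spec:
  schedule: "0 * * * *"   # every hour, on the hour
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: hello-caller
              image: curlimages/curl:latest
              # Calls the new endpoint via the API's in-cluster Service.
              args: ["curl", "http://hello-api/hello"]
          restartPolicy: OnFailure
```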

Now we have a horizontally scalable solution.

However, what if we have a regulation that prevents us from exposing HelloService as an API endpoint? Or what if the security team said that we need to retrieve a JSON Web Token (JWT) and put it in the curl request’s Authorization header before calling the API endpoint? At best, it would require more time and shell expertise than the team might have and, at worst, this would make the above solution infeasible.

Is There an Even Better Way to Schedule Tasks in Kubernetes? 

Yes. We can alleviate these concerns by using Java’s multiple entry points feature.

However, the unique challenge in our case is that the service logic lives in a Spring Boot API, so certain Spring dependency injection logic needs to execute so that the service layer and all its dependencies are instantiated before an alternative entry point is executed.

How can we give Spring Boot the time it needs to configure the application before we run the alternative entry point? I found that the code below accomplishes this:
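A sketch of the approach in Spring Boot; the `ENTRY_POINT` environment variable name and its `hello` value are illustrative assumptions:

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.ConfigurableApplicationContext;

@SpringBootApplication
public class Application {

    public static void main(String[] args) {
        if ("hello".equals(System.getenv("ENTRY_POINT"))) {
            // Let Spring build the full ApplicationContext first, so the
            // service layer and all its dependencies are instantiated.
            try (ConfigurableApplicationContext context =
                     SpringApplication.run(Application.class, args)) {
                context.getBean(HelloService.class).hello();
            }
            // Exit explicitly so the app does not keep serving HTTP requests.
            System.exit(0);
        }
        // No entry-point variable set: run as a normal Spring Boot API.
        SpringApplication.run(Application.class, args);
    }
}
```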

This pattern also works with other Java frameworks such as Micronaut and Guice with Java Spark, so it is relatively framework agnostic. Below is the same pattern using Micronaut:
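A sketch of the Micronaut equivalent, under the same assumed `ENTRY_POINT` environment variable:

```java
import io.micronaut.context.ApplicationContext;
import io.micronaut.runtime.Micronaut;

public class Application {

    public static void main(String[] args) {
        if ("hello".equals(System.getenv("ENTRY_POINT"))) {
            // Micronaut#run stands in for SpringApplication#run.
            try (ApplicationContext context =
                     Micronaut.run(Application.class, args)) {
                context.getBean(HelloService.class).hello();
            }
            System.exit(0);
        }
        Micronaut.run(Application.class, args);
    }
}
```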

The only major difference is that the class does not need an annotation, and the Micronaut equivalents of the Spring methods are used (e.g., Micronaut#run).

Here is the same pattern using Guice and Java Spark:
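A sketch of the Guice and Java Spark version; the module and route names are illustrative assumptions:

```java
import com.google.inject.Guice;
import com.google.inject.Injector;
import spark.Spark;

public class Application {

    public static void main(String[] args) {
        // Guice builds the object graph up front, analogous to the
        // ApplicationContext in Spring and Micronaut.
        Injector injector = Guice.createInjector(new HelloModule());

        if ("hello".equals(System.getenv("ENTRY_POINT"))) {
            injector.getInstance(HelloService.class).hello();
            System.exit(0);
        }
        run(injector);
    }

    // All controller endpoints live in this run method rather than
    // in a separate controller class.
    static void run(Injector injector) {
        HelloService helloService = injector.getInstance(HelloService.class);
        Spark.get("/hello", (request, response) -> helloService.hello());
    }
}
```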

The main differences are that you retrieve the beans from the Guice Injector rather than from an ApplicationContext object like in Spring and Micronaut, and that there is a run method that contains all the controller endpoints rather than there being a controller class.

You can see these code samples and run them by following the directions in this repo’s README.

In each of these examples, you’ll notice that I control whether the alternative entry point’s logic is invoked by checking if an environment variable exists and, if it does exist, what its value is. If the environment variable does not exist or its value is not what we expect, then the HelloService bean will not be retrieved from the ApplicationContext or the Injector (depending on the framework being used) and will not be executed. While this is not exactly an alternative entry point, it functions in a similar way. Instead of using multiple main methods like traditional alternative entry points, this pattern uses a single main method and uses environment variables to control the logic that is executed.

Note that when using Spring and Micronaut, the applicationContext is closed using try-with-resources, regardless of whether the service method call executes successfully or throws an Exception. This guarantees that if an alternative entry point is specified, the application will always exit, preventing the Spring Boot application from continuing to run and serving HTTP requests through the controller API endpoints.

Lastly, we always exit the JVM if an alternative entry point environment variable is detected. This prevents Spring Boot from throwing an Exception because the ApplicationContext is closed but the JVM is still running.

Effectively, this solution allows dependency injection to occur before the entry point routing logic occurs.

This solution allows us to write a Kubernetes CronJob resource that uses the same docker image that we would use if we were to run the Spring Boot application as an API, but we simply add an environment variable in the spec as seen below.
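A sketch of that CronJob spec; the image name, schedule and the `ENTRY_POINT` variable are illustrative assumptions carried over from the earlier examples:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello-cronjob
spec:
  schedule: "0 * * * *"   # every hour, on the hour
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: hello
              # Same image the API Deployment uses - only the env differs.
              image: my-registry/hello-api:latest
              env:
                - name: ENTRY_POINT
                  value: "hello"
          restartPolicy: OnFailure
```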

By using a Kubernetes CronJob, we can guarantee that only one scheduled task is running at any given time (provided that the task is scheduled with sufficient time between invocations). In addition, we did not expose HelloService through an API endpoint or need to use shell scripting — everything was implemented in Java. We also eliminated duplicated scheduled tasks instead of managing them.

I like to visualize this pattern as making a jar act like a Swiss Army knife: Each entry point is like a tool in the Swiss Army knife that runs the jar’s logic in a different way. Just as a Swiss Army knife has different tools, like a screwdriver, knife, scissors, etc., so does this pattern make a jar act on its embedded business logic as a RESTful API, scheduled task, etc.



Wouldn’t it be easier to write a @Scheduled method and disable it based on some configuration property?


First, it’s worth considering that other frameworks like Micronaut do not have the ability to disable a @Scheduled method. Moreover, Java Spark cannot schedule tasks. On the other hand, the pattern described in this article (I’ll call it the Swiss Army knife pattern) works across more frameworks than just Spring.

But even if your project does use Spring, one of the main disadvantages I see in using @Scheduled in general is that we’re requiring the Spring app to run 24/7 in order for the Spring task scheduler to run and invoke the @Scheduled task based on the cron schedule. This would require a Kubernetes pod that’s running 24/7 with the Spring app running inside it. I see this use of resources (and probably money) as unnecessary because Kubernetes provides its own task scheduler that we can take advantage of by creating a CronJob resource. Kubernetes resources will only be used for the life of the CronJob rather than having a pod running at all times with the @Scheduled task inside it.

In other words, I liken the @Scheduled and CronJob options to this: We wouldn’t spin up an EC2 instance and create a cronjob on the EC2 instance that invokes a Lambda function because we can invoke a Lambda function with a CloudWatch cron rule. One of the reasons why we don’t do this is because the EC2 instance would be more expensive compared to the free CloudWatch rule. Like the EC2 instance in this example, I see a @Scheduled pod as an unnecessary provisioning of resources because we already have a scheduling tool available in Kubernetes’ CronJob (which is like CloudWatch cron rules).


Does this pattern work in a multicluster environment?


This pattern has not been tested in a multicluster environment, and it likely would not work because this pattern does not include a way for a scheduled task running in Cluster A to be aware of another instance of the scheduled task running in Cluster B. Quartz and ShedLock use an external, centralized database to orchestrate these multicluster scheduled tasks. This pattern does not include an external database.
