Wikipedia defines hell as “a place of torment and punishment in the afterlife.” It also defines Dependency hell, but if you ask me, that definition should start with “a place of torment and punishment for lazy and reckless programmers.”
Dependencies — From 0 to ∞
The first language I learned, back in the early 1980s, was Applesoft BASIC, supplied with the Apple II computers. BASIC is, well, basic. There was no way for a program to benefit from ready-made functionality beyond what the language itself provided. Everything else had to be written inside each and every program!
The language I learned back in the early 1990s, to be able to consider myself part of the “advanced programmers” caste, was Turbo Pascal. It had units, which allowed one to write some functionality, save it in a file and then use it from an application or from other units. One thing I clearly remember from those days is that in Turbo Pascal 5.5 there were no out-of-the-box units for dealing with mice. Every developer I knew had a big box of floppy disks under their desk, and buried in there was a unit that provided this functionality. Yet many developers preferred to inline assembly code instead. Programmers simply weren’t comfortable depending on other people’s code without understanding what was “hidden” within the implementation.
Fast forward two decades and we meet the programmers who spend hours, if not days, searching for a library that performs a specific task. Then they spend a few more hours (or, god forbid, days) seeking confirmation from other programmers on Slack that said library indeed does the things it claims it does. My local Maven repo is 12GB and a colleague’s node_modules directories add up to 11GB. “Hello world” written in Spring Boot now produces a 15MB JAR file!
We favor Spring Boot for “an opinionated view of … third-party libraries.” We favor containers for an opinionated view of a runtime. We favor container orchestrators for an opinionated view of the network. We favor cloud for an opinionated view of infrastructure. None of this is necessarily wrong. It’s just that it covers up a fundamental problem by applying the (deliberately misquoted) fundamental theorem of software engineering:
“All problems in computer science can be solved by another level of indirection … except for the problem of too many layers of indirection.”
Consider the pseudocode of this MakeMeCoffee app:
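The pseudocode is not reproduced here, but a minimal sketch of the idea, importing nothing except the individual functionalities the app actually uses (checkWaterLevel, addWater and makeCoffee, names taken from the coffee machine analogy discussed below), might look like this:

```
import checkWaterLevel
import addWater
import makeCoffee

if checkWaterLevel() < CUP_OF_WATER:
    addWater(CUP_OF_WATER)
makeCoffee()
```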
If you could run this, your code would depend solely on what it needs, assuming all those functionalities follow the same minimalistic approach. What renders this concept useless is distribution concerns (storing, updating and finding the right functionalities). Thus the obvious solution is to group the functionalities into larger units. So we import CoffeeMachine and then consume its functionalities, like CoffeeMachine.makeCoffee() for example. Of course, that also means one gets tons of other functionalities that are not needed, like cleanCoffeeMachine, replacePartX, etc.
Now put yourself in the shoes of the coffee machine maker. All its coffee machines use the same water tank, so it makes sense to have a WaterTank unit to which the checkWaterLevel and addWater functionalities naturally belong. Take into account all the other common parts, and having one CoffeeMachineVendor module would make perfect sense. Walk down through all the other vendor modules that CoffeeMachine depends on and you get the famous “download the Internet” effect or, as I prefer to call it, WYSINWYG (What You See Is Not What You Get) dependencies. You don’t get what you see — you get a hell of a lot more! Often without even being aware of it.
From a consumer perspective it is best to depend on single functionalities but from a producer’s perspective, this is an unmanageable approach.
Let’s oversimplify everything for a second and say that:
- methods provide functionalities
- classes are collections of methods
- packages are collections of classes
- JAR files are collections of packages
- applications are collections of JAR files
From a consumer perspective, it does not make sense to depend on anything above “uniquely named classes.” From a provider perspective, it does not make sense to independently distribute anything below “JAR files.” The solution to satisfy both is right in between: packages, perhaps Java’s most ignored, misunderstood and misused concept.
Packages must group classes in a coherent way. The keyword here is “coherent”. Grouping classes into coffeemachines and watertanks, for example, might seem great to the producer but consumers would appreciate homeuser and technician more.
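For example, here is a hypothetical consumer-oriented grouping (all package and functionality names are illustrative, following the coffee machine analogy): the home user’s package contains only what a home user calls, while maintenance functionality lives elsewhere.

```
org.vendor.homeuser      contains  makeCoffee, addWater, checkWaterLevel
org.vendor.technician    contains  cleanCoffeeMachine, replacePartX
```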
Thinking of packages this way highlights how important and powerful a concept they are. Properly designed packages do not prevent producers from packing everything into one JAR file. They do, however, allow consumers to pick only the packages that make sense for them and to avoid deploying others that may cause unwanted side effects.
It takes some time and experience to realize that dependencies between packages make much more sense than dependencies between JAR files.
Modules are an abstract concept that not only groups packages (and other abstract things, like services, that we’ll not talk about here) but also provides important meta-information about them and their relationships. A module’s meta-information is where dependencies (ideally between packages) should be defined. This may sound strange at first, but if all Java libraries had adopted this concept years ago, the external dependency model that made Maven Central so popular wouldn’t be needed! The code really is the only source of truth!
Of course, for this to work, you need something that can process this meta information and wire things together: module systems. Before Java 9, the standard module system in Java was OSGi. Since Java 9 we have an additional standard, JPMS (a.k.a. Jigsaw), which is used to modularize Java itself. There are also other module systems, like JBoss Modules for example.
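As a sketch of what such metadata looks like in JPMS (the module and package names are reused from this article’s examples and are purely illustrative), a module-info.java descriptor declares what a module exports and what it requires:

```java
// module-info.java (illustrative names only)
module some.great.api {
    exports org.awesome.logging;  // only exported packages are visible to other modules
    requires java.logging;        // note: JPMS expresses dependencies on modules, not packages
}
```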
While module systems differ significantly in the assumptions they make and the restrictions they impose, they all serve the same purpose: to ensure that everything is properly wired. In a sense, module systems are your last and most reliable layer of integration tests!
What the JPMS wiring in Java 10 (in its boot layer) looks like can easily be visualized. There is a similar visualization showing an OSGi runtime with some common modules; in that case, the visualization only shows the wiring at the moment of the check, due to the dynamic nature of OSGi. What is important in both cases is that the graphs are generated solely from the information provided in the modules’ metadata!
While both JPMS and OSGi can tell if a given set of modules can be wired together, the difference is in how they use metadata to empower external tools and to prevent things from breaking at runtime. But to understand that, we need to explore some more generic concepts first.
As functionalities evolve, there must be a way to manage changes. The universal solution is to use versions. We’ve been doing this for decades at the wrong level: JAR files. Versioning a JAR file does not tell the consumer what has changed inside it. The functionality the consumer cares about could have changed in every single version of the JAR, or it could have remained the same for years. Versioning packages, on the other hand, provides precious information to the consumer.
Another important aspect is where the version information is stored. We are so used to using external tools for that purpose, that for many people the idea that versions should be declared in the module’s metadata seems strange.
Versions are great, except when they’re not. Consider these two versions: 2016.03 and 2016.09. What do they tell you? Taken on their own, not much. Now imagine that somehow a tool knows that 2016.09 contains several new features but everything else remains exactly the same as in 2016.03. Let’s also imagine the tool knows the consumer does not use the new features. It could then confidently decide to assemble a runtime with the older version, saving not only disk space but also preventing potential classloading conflicts.
Tools can know this if programmers use semantic versioning:
- A version has four parts: major.minor.micro.qualifier
- A version policy states that a change in:
  - major indicates a breaking change
  - minor indicates a backward-compatible change (backward-compatible API changes)
  - micro indicates a bug fix (no API changes)
  - qualifier indicates a new build (optional and not significant in version comparison)
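As a sketch of how a tool might apply this policy (a hypothetical helper, not any real tool’s API), comparing two versions only looks at the three numeric parts and ignores the qualifier:

```java
public class SemVerCompare {

    // Returns the numeric part at index i, or 0 if the version omits it.
    private static int part(String[] parts, int i) {
        return i < parts.length ? Integer.parseInt(parts[i]) : 0;
    }

    // Compares major.minor.micro numerically; the qualifier is not significant.
    public static int compare(String a, String b) {
        String[] x = a.split("\\."), y = b.split("\\.");
        for (int i = 0; i < 3; i++) {
            int d = Integer.compare(part(x, i), part(y, i));
            if (d != 0) return d;
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(compare("1.10.0", "1.9.0"));     // positive: numeric, not lexicographic
        System.out.println(compare("1.2.0.beta", "1.2.0")); // 0: the qualifier is ignored
    }
}
```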
Semantically versioning individual functionalities (a method, a class, etc.) is impractical. Semantically versioning JAR files or modules alone is useless, as the scope is too broad. Semantically versioning packages is the middle ground, which over the years has proven to be very powerful.
Consumers and Providers
If the nit-picky part of you reacted with “breaking who?” to the previous section, you’re spot-on.
Imagine a logging API with a Logger interface in an org.awesome.logging package at version 1.0.0, to which a new method is then added. The few implementations of that API out there will break the moment someone calls the new method. However, the hundreds of libraries using the logging API will not be affected at all; they can continue using the methods already in place.
Now consider a scheduler API with a Task interface in an org.awesome.scheduling package at version 1.0.0. If a new method is added to Task, it will break the hundreds of libraries that implement that interface in order to register tasks for execution at a given time.
This example demonstrates why identical changes can’t be handled in the same way. Logger is an example of a provider type, while Task is an example of a consumer type. This distinction is very important when determining what constitutes a breaking change. The new version of org.awesome.logging should be 1.1.0 — the minor version update indicates the change is NOT breaking for consumers. The new version of org.awesome.scheduling should be 2.0.0 — the major version update indicates that the change breaks everyone.
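This rule can be sketched as code (a hypothetical helper, not a real library): a change to a consumer type bumps the major version because it breaks everyone, while a change to a provider type bumps only the minor version because it breaks only the few implementations.

```java
public class SemanticBump {

    // Given the current version, returns the next version after an API change,
    // depending on whether the changed type is a consumer type or a provider type.
    public static String bump(String version, boolean consumerTypeChanged) {
        String[] parts = version.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        return consumerTypeChanged
                ? (major + 1) + ".0.0"               // e.g. org.awesome.scheduling: 1.0.0 -> 2.0.0
                : major + "." + (minor + 1) + ".0";  // e.g. org.awesome.logging:    1.0.0 -> 1.1.0
    }

    public static void main(String[] args) {
        System.out.println(bump("1.0.0", false)); // 1.1.0
        System.out.println(bump("1.0.0", true));  // 2.0.0
    }
}
```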
Maven supports version ranges but, given the lack of metadata, all it can offer is some heuristics. On the other hand, semantically versioned packages, compiling against APIs (if there is no API, the implementation is the API) and knowing how changes impact providers and consumers allow us to confidently express dependencies with version ranges.
Say my.great.module uses some.great.api at version 1.0.0. Here is what we know for sure:
- if some.great.api contains provider types and my.great.module is a provider (it implements or extends those types), it can work with any implementation of some.great.api from 1.0.0 (included) to 1.1.0 (excluded).
- if some.great.api contains provider types and my.great.module is a consumer, it can work with any implementation of some.great.api from 1.0.0 (included) to 2.0.0 (excluded).
- if some.great.api contains only consumer types, then my.great.module can work with any implementation of some.great.api from 1.0.0 (included) to 2.0.0 (excluded).
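These rules are mechanical enough that a tool can derive the range automatically. Here is a sketch (a hypothetical helper; real OSGi tooling derives this via Bnd’s bytecode analysis):

```java
public class ImportRange {

    // Derives the version range for a dependency on an API package, given the
    // version compiled against and whether the module is a provider of that API.
    public static String range(String buildVersion, boolean isProvider) {
        String[] parts = buildVersion.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        String ceiling = isProvider
                ? major + "." + (minor + 1) + ".0"  // the next minor already breaks providers
                : (major + 1) + ".0.0";             // consumers survive until the next major
        return "[" + buildVersion + "," + ceiling + ")";
    }

    public static void main(String[] args) {
        System.out.println(range("1.0.0", true));  // [1.0.0,1.1.0)
        System.out.println(range("1.0.0", false)); // [1.0.0,2.0.0)
    }
}
```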
By using a module’s metadata from dependent APIs and applying the above rules, tools can resolve dependencies and update the metadata of our own module.
Untangling Dependencies in JPMS
Unfortunately, JPMS decided to not only ignore versioning but also to totally ignore package dependencies and rely on module dependencies instead. JPMS authors claim this is easier but I think they’re confusing ease with familiarity. The bottom line is that JPMS does not help much in escaping from dependency hell. JPMS will report a missing module in the same way that Maven reports a missing JAR file. But JPMS will not guarantee that wired modules can, in fact, work together, as this small demo illustrates. On the flip side, its design introduces constraints, the first casualties of which have been reported:
The "bytelist" library, a single-class library that provides a byte wrapper similar to StringBuilder, is our first #JPMS casualty. It split the org.jruby.util package, but since it's part of our public API we can't move it. So it has been absorbed back into JRuby.
— Charles Nutter (@headius) September 14, 2018
Untangling Dependencies in OSGi
Handling dependencies in OSGi requires a change in mindset. It is crucial to understand the topics discussed above and have the right tools at hand. Then it all boils down to two basic things:
Provide proper and complete metadata.
Very hard and time-consuming if done by hand but super easy with proper tooling. Package versions are typically provided via @Version. Semantic versioning is enforced via Bnd’s baselining (which is also available as a Maven plugin). Exported packages are marked with @Export and providers and consumers with @ProviderType and @ConsumerType. Version ranges of dependent packages are automatically generated by build tools using Bnd for bytecode analysis.
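As a sketch of what this looks like in practice (the package name is reused from the logging example), the version and export annotations typically live in a package-info.java file; the API types themselves would carry @ProviderType or @ConsumerType:

```java
// package-info.java: version and export metadata for the package
@Version("1.1.0")
@Export
package org.awesome.logging;

import org.osgi.annotation.bundle.Export;
import org.osgi.annotation.versioning.Version;
```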
Understand the resolver errors.
This is what deters most programmers. It’s often hard to make sense of error messages, let alone figure out a fix. However, in most cases the reason is one of the following:
- Missing or incorrect metadata: This is often the case with legacy libraries. It can be solved by augmenting or wrapping the JAR. Alternatively, there are repositories of already modularized popular JARs.
- Missing module that provides what’s expected: As with JPMS, finding the missing modules is the developer’s task. Unlike JPMS, however, thanks to verbose metadata and package-level dependencies, OSGi will not allow incompatible modules to pretend it’s all good.
“One Man’s Dream … Is Another Man’s Nightmare”
Module systems are not something one can learn overnight and are likely to cause a lot of frustration at some point. Many people think the equation “problems module systems solve — problems module systems introduce” results in a negative number.
JPMS bets on familiarity and therefore appears easier at first glance. However, it allows for the wiring of systems that will surely break at runtime even though all modules follow the rules. On the other hand, OSGi tries really hard to prevent things it knows will break at runtime from running. Once mastered it is really powerful and becomes hard to give up on, but the road to that level is full of frustrations and ridiculous error messages.
So you’ll have to pick your own poison! Whichever path you choose, I hope this article helps you escape from Dependency hell!