A Brilliant Use for Graph Databases: Mapping Legacy Software

Your team is modernizing an application in a long-established legacy environment. The software has been around for years, maybe even decades, and your company relies on it for daily operations. But the software has become so complex over time that it’s too difficult to add new features, functions, and capabilities — which is frustrating because you know you could easily add them to an application you built yourself with contemporary resources and methods.
Where do you start? First, you’ll need to map the venerable old software’s operations so you can clearly understand how it does its work. But, of course, that’s hard to do because developers at the time often didn’t differentiate between layers of responsibility such as data management, user interface, and business logic. Instead, they would often combine functionality into one mass of code. (And even if there is documentation, sadly, it is rarely complete, up-to-date, or accurate.)
You’ll need to analyze the monolithic software so you can clearly understand its relationships, paths, and data sources. Then you can rebuild those functions so they’re handled by independent libraries and logical frameworks.
Graph databases can help your team make quicker work of this difficult first step.
I came across this idea when Stephan La Rocca from PITSS GMBH, an Oracle Partner, showed how he uses Oracle Graph to find linked software modules, clustered modules, and data flow patterns within legacy software. This article provides an overview of how and why to use graphs for this purpose.
Why Use a Graph?
Graphs contain nodes, edges, and properties, all of which are used to represent and store data and its relationships in a way that relational databases are not well equipped to do. Graph analytics is another commonly used term; it refers specifically to analyzing data in a graph format, using data points as nodes and relationships as edges. Graph analytics requires a database that can support graph formats. This could be a dedicated graph database such as Neo4j, or a converged database that supports graphs along with multiple other data models.
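To make the node, edge, and property vocabulary concrete, here is a minimal sketch in plain Python. It is not Oracle Graph or any real graph database API; the PropertyGraph class and the module and table names (ORDER_FORM, PRICING_PKG, ORDERS, INVOICE_RPT) are hypothetical and exist only for illustration. It shows the kind of transitive question ("what depends on this table, directly or indirectly?") that is natural to ask of a graph.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """A tiny in-memory property graph: nodes and edges both carry properties."""
    nodes: dict = field(default_factory=dict)  # node id -> properties
    edges: list = field(default_factory=list)  # (source, target, properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, target, **props):
        self.edges.append((source, target, props))

    def neighbors(self, node_id, kind=None):
        """Targets one edge away from node_id, optionally filtered by edge kind."""
        return [t for s, t, p in self.edges
                if s == node_id and (kind is None or p.get("kind") == kind)]

    def upstream(self, node_id):
        """Every node that reaches node_id by following edges, directly or not."""
        seen, queue = set(), deque([node_id])
        while queue:
            current = queue.popleft()
            for source, target, _ in self.edges:
                if target == current and source not in seen:
                    seen.add(source)
                    queue.append(source)
        return seen


# Hypothetical legacy artifacts, invented for this example.
graph = PropertyGraph()
graph.add_node("ORDER_FORM", type="form")
graph.add_node("PRICING_PKG", type="package")
graph.add_node("INVOICE_RPT", type="report")
graph.add_node("ORDERS", type="table")
graph.add_edge("ORDER_FORM", "PRICING_PKG", kind="calls")
graph.add_edge("ORDER_FORM", "ORDERS", kind="writes")
graph.add_edge("PRICING_PKG", "ORDERS", kind="reads")
graph.add_edge("INVOICE_RPT", "PRICING_PKG", kind="calls")

print(graph.neighbors("ORDER_FORM", kind="writes"))  # ['ORDERS']
print(sorted(graph.upstream("ORDERS")))
# ['INVOICE_RPT', 'ORDER_FORM', 'PRICING_PKG']
```

The upstream query is a simple impact analysis: INVOICE_RPT never touches ORDERS itself, but it shows up because it calls a package that does.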
Bounded Context
A bounded context, a concept from domain-driven design, is a cluster of closely related functionality and relationships with a clear boundary around it. Breaking a monolithic application down into bounded contexts enables you to develop and deploy the application’s functionality in a way that affects only a few clusters, minimizing changes outside of those contexts. The more contextually independent the changes to the application, the easier the effort, and the easier the testing later.
In Oracle Graph, for example, there are pre-built algorithms that give cluster advice based on the complexity of the application. This helps define bounded contexts even without domain knowledge. With this information, you can start to understand the sub-domains and the reasoning behind them, and draw borders while forming a bounded context for each cluster.
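I don’t know which algorithms Oracle Graph applies internally for its cluster advice, so the sketch below uses a deliberately simple stand-in: strongly connected components, computed with Tarjan’s algorithm. Modules that call each other in a cycle cannot be separated cleanly, which makes each component a reasonable first guess at a cluster. The package names and the dependency list are hypothetical.

```python
from collections import defaultdict


def strongly_connected_components(edges):
    """Tarjan's algorithm: group nodes that can all reach one another."""
    graph = defaultdict(list)
    nodes = set()
    for source, target in edges:
        graph[source].append(target)
        nodes.update((source, target))

    index = {}    # discovery order of each node
    lowlink = {}  # smallest discovery index reachable from the node
    stack, on_stack = [], set()
    components = []

    def visit(node):
        index[node] = lowlink[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for nxt in graph[node]:
            if nxt not in index:
                visit(nxt)
                lowlink[node] = min(lowlink[node], lowlink[nxt])
            elif nxt in on_stack:
                lowlink[node] = min(lowlink[node], index[nxt])
        if lowlink[node] == index[node]:
            # node is the root of a component: pop its members off the stack
            component = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            components.append(component)

    for node in sorted(nodes):
        if node not in index:
            visit(node)
    return components


# Hypothetical "module A depends on module B" pairs.
dependencies = [
    ("CUSTOMER_PKG", "ADDRESS_PKG"),
    ("ADDRESS_PKG", "CUSTOMER_PKG"),  # mutual dependency
    ("ORDER_PKG", "PRICING_PKG"),
    ("PRICING_PKG", "ORDER_PKG"),     # mutual dependency
    ("ORDER_PKG", "CUSTOMER_PKG"),    # one-way dependency across clusters
    ("REPORT_PKG", "ORDER_PKG"),
]

clusters = sorted(sorted(c) for c in strongly_connected_components(dependencies))
for cluster in clusters:
    print(cluster)
# ['ADDRESS_PKG', 'CUSTOMER_PKG']
# ['ORDER_PKG', 'PRICING_PKG']
# ['REPORT_PKG']
```

The one-way edges that remain between components are the candidate borders. A real codebase would likely need a community detection algorithm that tolerates a few stray cross-references, but the idea of letting the dependency structure propose the clusters is the same.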

Figure 1: Illustration shows how bounded context is identified based on dependencies.
Below is a system flow of an application’s logic between different modules. The modules are clustered based on bounded context. Each cluster has its own bounded context that manages the cluster. As you can see in this scenario, the ‘Customers’ cluster consumes data from many clusters, while its own data is consumed by only a few other clusters.
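Once modules are assigned to clusters, the observation about the ‘Customers’ cluster can be checked with a simple count. The sketch below is an illustration under assumptions: the cluster assignments, module names, and data flows are invented, and cluster_coupling is a hypothetical helper rather than an Oracle Graph function.

```python
from collections import defaultdict


def cluster_coupling(cluster_of, data_flows):
    """Count, per cluster, how many other clusters it consumes data from
    and how many other clusters consume its data.

    data_flows is a list of (producer_module, consumer_module) pairs.
    Returns {cluster: (consumes_from_count, consumed_by_count)}.
    """
    consumes_from = defaultdict(set)
    consumed_by = defaultdict(set)
    for producer, consumer in data_flows:
        source, destination = cluster_of[producer], cluster_of[consumer]
        if source != destination:  # flows inside one cluster are not coupling
            consumes_from[destination].add(source)
            consumed_by[source].add(destination)
    return {
        cluster: (len(consumes_from[cluster]), len(consumed_by[cluster]))
        for cluster in sorted(set(cluster_of.values()))
    }


# Hypothetical module-to-cluster assignment.
cluster_of = {
    "CUSTOMER_FORM": "Customers",
    "CUSTOMER_PKG": "Customers",
    "ORDER_PKG": "Orders",
    "INVOICE_PKG": "Billing",
    "STOCK_PKG": "Inventory",
}

# Hypothetical data flows as (producer, consumer) pairs.
data_flows = [
    ("ORDER_PKG", "CUSTOMER_FORM"),
    ("INVOICE_PKG", "CUSTOMER_FORM"),
    ("STOCK_PKG", "CUSTOMER_PKG"),
    ("CUSTOMER_PKG", "INVOICE_PKG"),
    ("CUSTOMER_PKG", "CUSTOMER_FORM"),  # internal to Customers, ignored
]

coupling = cluster_coupling(cluster_of, data_flows)
for cluster, (fan_in, fan_out) in coupling.items():
    print(f"{cluster}: consumes from {fan_in}, consumed by {fan_out}")
# Billing: consumes from 1, consumed by 1
# Customers: consumes from 3, consumed by 1
# Inventory: consumes from 0, consumed by 1
# Orders: consumes from 0, consumed by 1
```

A cluster that consumes from many others but is consumed by few, like Customers here, depends heavily on the rest of the system while little depends on it.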
Foreign Key Relationships
Many legacy applications lack well-defined foreign key relationships, so it’s necessary to run analytics on the data model. For example, Oracle Graph allows you to change the data model and assign foreign keys between tables, which helps to identify the tables the user interface links with a ‘where’ clause, or tables that are joined in Data Manipulation Language (DML) statements. Using this information, you can identify where the user interface creates a master-detail declaration between two tables. So, even without knowing the foreign keys yourself, you can still understand the relationships between tables and identify important tables in the data model.
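The same idea can be approximated by hand: if the schema declares no foreign keys, the ‘where’ clauses in the application’s SQL still reveal which columns are used to join tables. The sketch below is a rough regex heuristic, not how Oracle Graph does it. It assumes old-style comma joins with optional aliases, the table and column names are hypothetical, and production SQL would need a proper parser.

```python
import re
from collections import Counter

# Join predicates such as "o.customer_id = c.id"
JOIN_PREDICATE = re.compile(r"(\w+)\.(\w+)\s*=\s*(\w+)\.(\w+)")
# Everything between FROM and WHERE (assumes a comma-separated table list)
FROM_CLAUSE = re.compile(r"\bfrom\b(.*?)\bwhere\b", re.IGNORECASE | re.DOTALL)


def table_aliases(sql):
    """Map each alias (and each table name) to its table name."""
    aliases = {}
    match = FROM_CLAUSE.search(sql)
    if match:
        for item in match.group(1).split(","):
            parts = item.split()
            if parts:
                table = parts[0].lower()
                aliases[parts[-1].lower()] = table  # the alias, if present
                aliases[table] = table
    return aliases


def candidate_relationships(statements):
    """Count how often each pair of table columns is equated in a join."""
    counts = Counter()
    for sql in statements:
        aliases = table_aliases(sql)
        for a_ref, a_col, b_ref, b_col in JOIN_PREDICATE.findall(sql):
            a_table = aliases.get(a_ref.lower())
            b_table = aliases.get(b_ref.lower())
            if a_table and b_table and a_table != b_table:
                # Sort so that "a = b" and "b = a" count as the same pair
                pair = tuple(sorted([(a_table, a_col.lower()),
                                     (b_table, b_col.lower())]))
                counts[pair] += 1
    return counts


# Hypothetical SQL harvested from forms and packages.
statements = [
    "SELECT o.id, c.name FROM orders o, customers c WHERE o.customer_id = c.id",
    "SELECT * FROM order_lines l, orders o "
    "WHERE l.order_id = o.id AND o.status = 'OPEN'",
    "SELECT c.name FROM customers c, orders o WHERE c.id = o.customer_id",
]

relationships = candidate_relationships(statements)
for (left, right), seen in relationships.most_common():
    print(f"{left[0]}.{left[1]} <-> {right[0]}.{right[1]} (seen {seen}x)")
# customers.id <-> orders.customer_id (seen 2x)
# order_lines.order_id <-> orders.id (seen 1x)
```

Column pairs that recur across many statements are strong foreign key candidates, and tables that appear in many pairs are likely the important ones in the data model.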

Figure 3: Bounded context and clusters defined using Oracle Graph.
Importantly, this information can help you understand user behavior. Oracle Graph helps visualize and analyze these complex relationships to identify user interactions and business processes, and how these run through the application.
Conclusion
I hope this article demonstrates how graph databases and tools can help you understand a legacy monolithic application by providing a clear dependency analysis of various contexts, and by helping the team identify and visualize business processes. I cover this in more detail and look at another helpful use case here.