
How Graph Databases Uncover Patterns to Break Up Organized Crime

May 31st, 2014 11:31am by
Editor’s Note: An example of how graph databases detect events to help predict an outcome — AW.

Banks and insurance companies lose billions of dollars every year to fraud. Traditional methods of fraud detection play an important role in minimizing these losses. However, increasingly sophisticated fraudsters have developed a variety of ways to elude discovery, both by working together and by constructing false identities.

Graph databases offer new methods of uncovering fraud rings and other sophisticated scams with a high level of accuracy, and they are capable of stopping advanced fraud scenarios in real time.

While no fraud prevention measures can ever be perfect, significant improvement is possible by looking beyond individual data points to the connections that link them. These connections often go unnoticed until it is too late, which is unfortunate, because they frequently hold the best clues.

Understanding the connections among data, and deriving meaning from these links, doesn’t necessarily mean gathering new data. Significant insights can be drawn from one’s existing data, simply by reframing the problem and examining it in a new way: as a graph.

Unlike most other ways of displaying data, graphs are designed to express relatedness. Graph databases can uncover patterns that are difficult to detect when using traditional representations such as tables. An increasing number of companies are using graph databases to solve a multitude of connected data problems, including fraud detection.

Example: First-Party Bank Fraud

First-party fraud involves fraudsters who apply for credit cards, loans, overdrafts, and unsecured banking credit lines with no intention of paying any of them back. It is a serious problem for banking institutions. American banks lose tens of billions of dollars every year to first-party fraud, which is estimated to account for one-quarter or more of total consumer credit charge-offs in the United States. It is further estimated that 10-20 percent of unsecured bad debt at leading US and European banks is misclassified and is actually first-party fraud.

The surprising magnitude of these losses is likely the result of two factors. The first is that first-party fraud is extremely difficult to detect: fraudsters behave much like legitimate customers until the moment they “bust out,” cleaning out all their accounts and promptly disappearing. The second factor, explored in greater detail later, is the combinatorial relationship between the number of participants in a fraud ring and the overall dollar value the operation controls. Organized crime often exploits this connected explosion. However, while this characteristic makes these schemes potentially very damaging, it also makes them particularly susceptible to graph-based methods of fraud detection.

Typical Scenario

While the exact details of first-party fraud collusion vary from operation to operation, the pattern below illustrates how fraud rings commonly operate:

  1. A group of two or more people organize into a fraud ring.
  2. The ring shares a subset of legitimate contact information, for example, phone numbers and addresses, combining them to create a number of synthetic identities.
  3. Ring members open accounts using these synthetic identities.
  4. New accounts are added to the original ones: unsecured credit lines, credit cards, overdraft protection, personal loans, and so on.
  5. The accounts are used normally, with regular purchases and timely payments.
  6. Banks increase the revolving credit lines over time due to the observed responsible credit behavior.
  7. One day, the ring “busts out,” coordinating the members’ activity, maxing out all of the ring’s credit lines, and disappearing.
  8. Sometimes fraudsters go a step further and bring all of their balances to zero using fake checks immediately before the bust-out, doubling the damage.
  9. Collections processes ensue, but agents are never able to reach the fraudsters.
  10. The uncollectible debt is written off.

To illustrate this scenario, let’s take a (small) ring of two people colluding to create synthetic identities:

  • Tony Bee lives at 123 NW 1st Street, San Francisco, CA 94101 (his real address) and gets a prepaid phone at 415-123-4567
  • Paul Favre lives at 987 SW 1st Ave, San Francisco, CA 94102 (his real address) and gets a prepaid phone at 415-987-6543

Sharing only a phone number and an address (two pieces of data), they can combine these to create \(2^2 = 4\) synthetic identities with fake names, as described in Diagram 1 below.
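The combination step can be sketched in a few lines of Python with the standard library's itertools.product. This is a minimal illustration, assuming each synthetic identity takes exactly one phone number and one address from the ring's shared pool; which fake name is paired with which combination is arbitrary here and purely illustrative.

```python
from itertools import product

# Real contact details contributed by the two ring members in the example.
phones = ["415-123-4567", "415-987-6543"]
addresses = [
    "123 NW 1st Street, San Francisco, CA 94101",
    "987 SW 1st Ave, San Francisco, CA 94102",
]

# Fake names from the example; the name-to-combination pairing is illustrative.
fake_names = ["John Smith", "Frank Vero", "Mike Grat", "Vincent Pourcent"]

# Every (phone, address) pair is a distinct identity: 2 x 2 = 4 combinations.
synthetic_identities = [
    {"name": name, "phone": phone, "address": address}
    for name, (phone, address) in zip(fake_names, product(phones, addresses))
]

for identity in synthetic_identities:
    print(identity)
```

Each real phone number and each real address ends up attached to two of the four identities, which is exactly the kind of overlap a graph makes visible.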


Diagram 1 shows the resulting fraud ring, with 4–5 accounts for each synthetic identity, for a total of 18 accounts. Assuming an average of $4K in credit exposure per account, the bank’s loss could be as high as $72K.

As in the process outlined above, the phone numbers are dropped after the bust-out, and when investigators come to check these addresses, both Tony Bee and Paul Favre (the fraudsters, who really live there) deny ever knowing John Smith, Frank Vero, Mike Grat, or Vincent Pourcent.

Detecting the Crime

Catching fraud rings and stopping them before they cause damage is a challenge. One reason is that traditional methods of fraud detection are not geared to look for the right thing: in this case, the rings created by shared identifiers. Standard instruments, such as flagging a deviation from normal purchasing patterns, work on discrete data rather than on connections. Discrete methods are useful for catching fraudsters acting alone, but they fall short when it comes to detecting rings. Furthermore, many such methods are prone to false positives, which hurt customer satisfaction and cost revenue opportunities.

Gartner proposes a layered model for fraud prevention, which can be seen below:


It starts with simple discrete methods (at the left) and progresses to more elaborate “big picture” types of analysis. The rightmost layer, “Entity Link Analysis,” uses connected data to detect organized fraud. As the following sections show, collusion of the type described above can be uncovered easily, and with very high accuracy, by using a graph database to carry out entity link analysis at key points in the customer life cycle.

Entity Link Analysis

We discussed earlier how fraudsters use multiple identities to increase the overall size of their criminal takings. It’s not just the dollar value of the impact that increases as the fraud ring grows, it’s also the computational complexity required to catch the ring. The full magnitude of this problem becomes clear as one considers the combinatorial explosion that occurs as the ring grows. In the diagram below, one can see how adding a third person to the ring expands the number of synthetic identities to nine:


A ring of \(n\) people (\(n \ge 2\)) sharing \(m\) elements of data (such as name, date of birth, phone number, address, SSN, etc.) can create up to \(n^m\) synthetic identities, where each synthetic identity (represented as a node) is linked to \(m(n-1)\) other nodes, for a total of \(\frac{n^m \cdot m \cdot (n-1)}{2}\) relationships.
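One reading of this formula that reproduces its counts treats two synthetic identities as linked when they differ in exactly one data element (for \(m = 2\), that is the same as sharing either the phone number or the address). Under that assumption, the Python sketch below enumerates every identity for a given \(n\) and \(m\), builds the links, and compares the counts with the formula. The function name synthetic_identity_graph is hypothetical.

```python
from itertools import product


def synthetic_identity_graph(n: int, m: int):
    """Enumerate all n**m identities and link those differing in one element.

    An identity is a tuple of m positions; each position holds the index
    (0..n-1) of the ring member who contributed that data element.
    """
    identities = list(product(range(n), repeat=m))
    edges = set()
    for ident in identities:
        for pos in range(m):
            for value in range(n):
                if value != ident[pos]:
                    neighbour = ident[:pos] + (value,) + ident[pos + 1:]
                    # frozenset makes the link undirected, so each is stored once.
                    edges.add(frozenset((ident, neighbour)))
    return identities, edges


for n, m in [(2, 2), (3, 2), (4, 2), (3, 3)]:
    nodes, edges = synthetic_identity_graph(n, m)
    formula = n**m * m * (n - 1) // 2
    print(f"n={n}, m={m}: {len(nodes)} identities, "
          f"{len(edges)} links (formula: {formula})")
```

With two shared elements, going from two to three to four people takes the ring from 4 to 9 to 16 identities, matching the figures in the text.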

Likewise, four people can control 16 identities and so on. The potential loss in a ten-person fraud bust-out is $1.5M, assuming 100 false identities and three financial instruments per identity, each with a $5K credit limit.
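The arithmetic behind these figures can be checked with a short Python sketch. It assumes, as the 100-identity figure implies, that the ten people share two data elements (10 × 10 = 100 identities), and that every identity holds the same number of instruments, each maxed out at bust-out. The function name potential_loss is hypothetical.

```python
def potential_loss(n_people: int, m_shared: int,
                   instruments_per_identity: int, credit_limit: int) -> int:
    """Upper-bound exposure in dollars for a ring of n_people sharing m_shared
    data elements, assuming every one of the n**m synthetic identities holds
    the same number of instruments, each maxed out at its credit limit."""
    identities = n_people ** m_shared
    return identities * instruments_per_identity * credit_limit


# Ten people sharing two data elements, three instruments at $5K each.
print(potential_loss(10, 2, 3, 5_000))  # 1500000

# How exposure grows with ring size under the same per-identity assumptions.
for n in range(2, 11):
    print(n, potential_loss(n, 2, 3, 5_000))
```

Because the identity count is \(n^m\), adding a single member grows the exposure far faster than linearly, which is what makes these rings so damaging.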

How Graph Databases Can Help

Uncovering rings with traditional relational database technologies requires modeling the graph above as a set of tables and columns and then carrying out a series of complex joins and self-joins. Such queries are incredibly complex to build and expensive to run. Scaling them in a way that supports real-time access poses significant technical challenges, with performance becoming exponentially worse not only as the size of the ring increases but also as the total data set grows.
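To make the join problem concrete, here is a minimal sketch using Python's standard-library sqlite3 module. The table name account_holder, its columns, the extra holder Jane Doe, and the mapping of fake names to phone/address combinations are all hypothetical. A single self-join finds holders who directly share a phone number or an address; John Smith and Vincent Pourcent belong to the same ring but share nothing directly, so they only surface after a second self-join, and every further hop needs yet another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account_holder (
    holder_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    phone     TEXT NOT NULL,
    address   TEXT NOT NULL
);
""")

# Hypothetical data: four synthetic identities plus one unrelated customer.
rows = [
    (1, "John Smith",       "415-123-4567", "123 NW 1st Street"),
    (2, "Frank Vero",       "415-123-4567", "987 SW 1st Ave"),
    (3, "Mike Grat",        "415-987-6543", "123 NW 1st Street"),
    (4, "Vincent Pourcent", "415-987-6543", "987 SW 1st Ave"),
    (5, "Jane Doe",         "415-555-0100", "1 Market St"),
]
conn.executemany("INSERT INTO account_holder VALUES (?, ?, ?, ?)", rows)

# One hop: pairs of holders sharing a phone number or an address.
one_hop = conn.execute("""
    SELECT a.name, b.name
    FROM account_holder AS a
    JOIN account_holder AS b
      ON a.holder_id < b.holder_id
     AND (a.phone = b.phone OR a.address = b.address)
    ORDER BY a.holder_id, b.holder_id
""").fetchall()

for pair in one_hop:
    print(pair)
```

The query returns four direct pairs but never the full ring; assembling it requires chaining joins to an unknown depth, which is the cost the paragraph above describes.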

Graph databases have emerged as an ideal tool for overcoming these hurdles. Query languages such as Cypher provide simple semantics for detecting rings in the graph, navigating connections in memory and in real time.
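The idea of finding rings by walking the graph can be illustrated in plain Python. This sketch is not Cypher and not how a graph database implements traversal internally; it assumes a hypothetical in-memory graph in which account holders link to the identifiers they use. A breadth-first walk from each holder collects everyone reachable through shared identifiers, and any group of two or more holders is reported as a candidate ring.

```python
from collections import defaultdict, deque

# Hypothetical records: account holder -> identifiers used on the application.
holders = {
    "John Smith":       ["phone:415-123-4567", "addr:123 NW 1st Street"],
    "Frank Vero":       ["phone:415-123-4567", "addr:987 SW 1st Ave"],
    "Mike Grat":        ["phone:415-987-6543", "addr:123 NW 1st Street"],
    "Vincent Pourcent": ["phone:415-987-6543", "addr:987 SW 1st Ave"],
    "Jane Doe":         ["phone:415-555-0100", "addr:1 Market St"],
}

# Undirected graph: holder nodes and identifier nodes, linked by usage.
graph = defaultdict(set)
for holder, identifiers in holders.items():
    for ident in identifiers:
        graph[holder].add(ident)
        graph[ident].add(holder)


def find_rings(graph, holders, min_size=2):
    """Return groups of holders connected through shared identifiers."""
    seen = set()
    rings = []
    for start in holders:
        if start in seen:
            continue
        # Breadth-first walk over both holder and identifier nodes.
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            if node in holders:
                component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        if len(component) >= min_size:
            rings.append(sorted(component))
    return rings


print(find_rings(graph, holders))
```

The four synthetic identities come back as one group even though no single pair of them shares both identifiers, while the unrelated customer is left out.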

The graph data model in Diagram 4 below represents how the data actually look to the graph database and illustrates how one can find rings by simply walking the graph:



Existing fraud detection infrastructure can be augmented to support ring detection by running entity link analysis queries in a graph database at critical stages of the customer and account life cycle, such as:

  1. At the time the account is created.
  2. During an investigation.
  3. As soon as a credit balance threshold is hit.
  4. When a check is bounced.
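Each of these checkpoints can be treated as an event that triggers a bounded traversal. As a hypothetical sketch, assuming an in-memory graph in which holders link to the identifiers they use, an account-creation hook might look like the following. The names check_new_application and connected_holders, the four-hop limit, and the threshold of two linked holders are illustrative choices, not recommendations.

```python
from collections import defaultdict, deque


def build_graph(holders):
    """Undirected graph linking each holder to the identifiers it uses."""
    graph = defaultdict(set)
    for holder, identifiers in holders.items():
        for ident in identifiers:
            graph[holder].add(ident)
            graph[ident].add(holder)
    return graph


def connected_holders(graph, holders, identifiers, max_hops=4):
    """Existing holders reachable from the given identifiers within max_hops."""
    seen = set(identifiers)
    queue = deque((ident, 0) for ident in identifiers)
    found = set()
    while queue:
        node, depth = queue.popleft()
        if node in holders:
            found.add(node)
        if depth == max_hops:
            continue  # bound the walk so the check stays cheap
        # .get avoids inserting unseen identifiers into the defaultdict.
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return found


def check_new_application(graph, holders, identifiers, threshold=2):
    """Flag an application that links to `threshold` or more existing holders."""
    linked = connected_holders(graph, holders, identifiers)
    return {"flag": len(linked) >= threshold, "linked_holders": sorted(linked)}


holders = {
    "John Smith":       ["phone:415-123-4567", "addr:123 NW 1st Street"],
    "Frank Vero":       ["phone:415-123-4567", "addr:987 SW 1st Ave"],
    "Mike Grat":        ["phone:415-987-6543", "addr:123 NW 1st Street"],
    "Vincent Pourcent": ["phone:415-987-6543", "addr:987 SW 1st Ave"],
}
graph = build_graph(holders)

# A new applicant reusing one of the ring's phone numbers.
print(check_new_application(graph, holders,
                            ["phone:415-123-4567", "addr:55 Elm St"]))
```

Bounding the walk with max_hops keeps the check cheap enough to run inline at each event; the right depth and threshold would have to be tuned against real data.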

Real-time graph traversals tied to the right kinds of events can help banks identify probable fraud rings: during or even before the bust-out occurs.


Sophisticated criminals have learned to attack systems where they are weak. Traditional technologies, while still suitable and necessary for certain types of prevention, are not designed to detect elaborate fraud rings.

This is where graph databases can add value. Uncovering fraud rings is an important part of any fraud detection strategy. Connected analysis using graph databases is a useful technique for uncovering rings: not only after the fact but also in real time.

An increasing number of companies are using graph databases to solve a variety of connected data problems, including fraud detection.

Philip Rathle is vice president of products for Neo4j. Neo4j is a leading graph database, with a ten-year history of 24×7 production deployments.

Gorka Sadowski is founder and CEO of akalak, which provides technology and cybersecurity solutions and services.
