Build a Movie Database with Neo4j’s Knowledge Graph Sandbox

A lot has been written about the rise of the citizen developer or even the citizen data scientist, and when it comes to the creation and use of knowledge graphs, they can do some pretty amazing things.
Definitions of knowledge graphs vary, but for the purposes of this article in which we walk you through the build of a Neo4j graph, a knowledge graph is a visualization of the connected nature of different data sets. It can be thought of as an augmented model view of a master data-management solution that shows how different groups, objects or other data points are connected.
For this tutorial, we’ll show how to bring together otherwise disparate data on actors, movies and directors and show how they are connected. The idea is to show how simple graph tools such as those that Neo4j provides can be to use. Specifically, we’ll use Neo4j’s Sandbox with Neo4j’s Cypher query language to visualize movies released after 2000 as a data graph, while limiting the results to a specific number, such as five movies. Actor, producer and other connections to those movies are also visualized. This data graph can be generated with just a few clicks on the Sandbox site, using Cypher to query the Neo4j database.
Once these databases have been selected, Neo4j automates the sandbox’s build. It’s really that simple, as we’ll see. So let’s get started creating our movie database by first accessing Neo4j’s Sandbox page and either registering or logging in.
Select Neo4j Sandbox Under Get Started:
Sign up or create an account with Google, Twitter or LinkedIn:
Select “For Developers,” which will automatically deselect “For Data Scientist,” and then select the Movies Dataset:
Select Create at the bottom left-hand of the screen:
Select Open with Browser:
MATCH (m:Movie)
WHERE m.released > 2000
RETURN m LIMIT 5
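As a small variation on the query above, and only as a sketch, the same filter can return selected properties instead of whole nodes, which tends to read more easily in Table mode. It assumes each Movie node in the Movies dataset carries the title and released properties that the other queries in this walkthrough rely on; the aliases title and released are just illustrative column names.

```cypher
// Movies released after 2000, newest first, capped at five rows
MATCH (m:Movie)
WHERE m.released > 2000
RETURN m.title AS title, m.released AS released
ORDER BY released DESC
LIMIT 5
```

Sorting before LIMIT makes the five rows predictable; without ORDER BY, the database is free to return any five matching movies.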
The next steps in the Sandbox offer a summary of Cypher and related descriptions of knowledge graph terms such as Nodes and Relationships, Labels and Properties and how they are used. Click Next for each:
–Use the CREATE clause to create your personal node:
CREATE (p:Person {name: 'John Doe'})
RETURN p
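If you would rather not leave a practice node in the graph, something like the following should remove it again. This is a minimal sketch that assumes the node was created with the name 'John Doe' as above and that no real person in the dataset shares that name.

```cypher
// Remove the practice node; DETACH also drops any relationships attached to it
MATCH (p:Person {name: 'John Doe'})
DETACH DELETE p
```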
–Use the MATCH clause to find the node for actor Tom Hanks:
MATCH (p:Person {name: 'Tom Hanks'})
RETURN p
You can also use a WHERE clause with the MATCH clause, which allows for more complex filtering with operators such as >, <, STARTS WITH and ENDS WITH:
MATCH (p:Person)
WHERE p.name = "Tom Hanks"
RETURN p
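To illustrate one of the other operators mentioned above, here is a hedged sketch that uses STARTS WITH rather than an exact match. The prefix 'Tom' is only an example value, and the query assumes Person nodes store their full name in the name property, as in the queries above.

```cypher
// People whose name begins with "Tom" (string matching is case-sensitive)
MATCH (p:Person)
WHERE p.name STARTS WITH 'Tom'
RETURN p.name AS name
ORDER BY name
```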

Here, we find the movie Cloud Atlas by its title with the MATCH clause (movies released between 2010 and 2015 can be found in a similar way by adding a WHERE clause on the released property):
MATCH (m:Movie {title: "Cloud Atlas"})
RETURN m
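The release-year range search mentioned above is not shown in the original walkthrough, so the following is one plausible way to write it rather than the Sandbox's own query. It assumes released is stored as an integer year and treats both 2010 and 2015 as inclusive bounds.

```cypher
// Movies released from 2010 through 2015, inclusive
MATCH (m:Movie)
WHERE m.released >= 2010 AND m.released <= 2015
RETURN m.title AS title, m.released AS released
ORDER BY released
```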

–Write a query using MERGE to create a movie node with the title “Greyhound.” As noted in the Sandbox’s sidebar documentation, if the node does not exist, then set its released property to 2020 and its lastUpdatedAt property to the current time stamp. If the node already exists, then only set lastUpdatedAt to the current time stamp. Return the movie node:
MERGE (m:Movie {title: 'Greyhound'})
ON CREATE SET m.released = 2020, m.lastUpdatedAt = timestamp()
ON MATCH SET m.lastUpdatedAt = timestamp()
RETURN m
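One way to convince yourself that MERGE did not create a duplicate is to run the MERGE query twice and then look the movie up. The check below is a sketch under the assumption that no Greyhound node existed before the first run; if MERGE behaved as described, it should return a single row, with lastUpdatedAt reflecting the most recent run.

```cypher
// Expect one row: MERGE matches the existing node on the second run instead of creating another
MATCH (m:Movie {title: 'Greyhound'})
RETURN m.title AS title, m.released AS released, m.lastUpdatedAt AS lastUpdatedAt
```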
Relationships have a direction, either outgoing or incoming, denoted in Cypher by the arrows --> or <--. In this query, the Person node (an actor such as Tom Hanks) has an outgoing relationship and the Movie node has an incoming relationship:
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
RETURN p,r,m
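The query above returns every actor and movie in the graph. To narrow it to a single actor, a sketch along these lines should work. It assumes the ACTED_IN relationship in the Movies dataset carries a roles property listing the characters played; if your copy of the dataset lacks it, that column will simply come back as null.

```cypher
// Tom Hanks's movies, oldest first, with the roles stored on each ACTED_IN relationship
MATCH (p:Person {name: 'Tom Hanks'})-[r:ACTED_IN]->(m:Movie)
RETURN m.title AS movie, m.released AS released, r.roles AS roles
ORDER BY released
```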

The results are zoomed out.

The results are zoomed in.
Find the Person and Movie nodes that are connected by a REVIEWED relationship that is outgoing from the Person node and incoming to the Movie node:
MATCH (p:Person)-[r:REVIEWED]->(m:Movie)
RETURN p,r,m
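Relationships can hold properties of their own, which makes a tabular version of the review query useful. The sketch below assumes the REVIEWED relationship in the Movies dataset stores a numeric rating property; treat that property name as an assumption to check against your sandbox.

```cypher
// Who reviewed what, highest rating first
MATCH (p:Person)-[r:REVIEWED]->(m:Movie)
RETURN p.name AS reviewer, m.title AS movie, r.rating AS rating
ORDER BY rating DESC
```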
Find all actors who have co-acted with Tom Hanks in any movie, and view the results in Table mode:
MATCH (tom:Person {name: "Tom Hanks"})-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(p:Person)
RETURN p.name
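Because an actor who shared several movies with Tom Hanks appears once per movie in the result above, it can help to group the rows. The following is a sketch of one way to do that; the aliases coActor and sharedMovies are illustrative names, not part of the dataset.

```cypher
// One row per co-actor, with the number of movies shared with Tom Hanks
MATCH (tom:Person {name: "Tom Hanks"})-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(p:Person)
RETURN p.name AS coActor, count(m) AS sharedMovies
ORDER BY sharedMovies DESC, coActor
```

Naming the movie node m, instead of leaving it anonymous, is what allows count(m) to tally the shared movies per co-actor.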
A Range of Possibilities
These are just a sample of the commands the Sandbox offers for the Movies database. The Sandbox’s databases, each with accompanying explanations to help you get started using Cypher for Neo4j, total 20 in all: 13 are oriented toward developers and the rest are geared toward data scientists. The datasets range from the Offshore Leaks dataset and guide from the International Consortium of Investigative Journalists (ICIJ) to Stack Overflow questions, including answers, tags and comments and the relationships between them.
Most of all, have fun!