Week 12 - Graph Data Science
A. Search Algorithms
Scenario: Siri is currently in Arad, and she wants to drive to Bucharest.

Your Turn:
What is the solution for BFS?
What is the solution for A*? The heuristic function is the straight-line distance to Bucharest.
B. Neo4j
Now, let's play with the search functions in Neo4j. In the lecture, the demo graph version is 5.3.0 with Graph Data Science Library is 2.4.6.
Dataset
The dataset we are using is a database of countries. Each country has a latitude
and longitude
, and are connected by a "road" with a distance property
(just like the transport graph from the lectures, though this one is ROAD
rather than EROAD
and is much larger).
countries.csv
: Each row represents a country, with acountry_code
,latitude
,longitude
, andname
.roads.csv
: Each row connectscountry_1
withcountry_2
and stores thedistance
between them (in km).
The dataset is sourced from both Google and Wikipedia:
Country data was sourced from Google. Some minor preprocessing was done on some of the country names to convert them to the same names in Wikipedia.
Roads data was generated via a Python script. To do this we obtained a list of bordering countries from Wikipedia, did some cleaning/preprocessing etc, and used the
geopy
Python library to calculate the geodesic distance between each bordering country.
Please note the data may not be completely accurate and is just for demonstration purposes.
Task 1. Creating a graph DB and enabling Graph Data Science
First, create a new database for this week in Neo4j. Once you've created it, you'll need to activate the Graph Data Science (GDS) plugin. To do this, click on the database in the list of databases, and in the panel that appears on the right click on "Plugins". The Graph Data Science library should be in the plugins list - click on it, then click "Install".
Task 2. Exploring the data and loading it into Neo4j
Before we run any graph data science algorithms, we need to load our data into Neo4j. Remember, you'll need to place the csv files into the import
directory of your Neo4j database.
Next, you'll need to write LOAD CSV
statements to import both countries and roads into your database. Here is one for countries to get you started:
// LOAD DATA FROM CSV
LOAD CSV WITH HEADERS FROM "file:///countries.csv" AS row
CREATE (c:Country)
SET c.country = row.country,
c.name = row.name,
c.latitude = toFloat(row.latitude),
c.longitude = toFloat(row.longitude)
Note how we have converted row.latitude
and row.longitude
to floats - this is important, otherwise Neo4j will treat them as strings.
Your turn: write a LOAD CSV
statement that imports the roads into your database. Remember, the Countries have already been created, so after LOAD CSV
you'll need to run a MATCH
statement to find the countries listed in each row of roads.csv
, and only CREATE
a road between those two countries. Don't forget the distance
property, which should also be a float.
Task 3. Projecting our graph for GDS
Before we can run any GDS algorithms, we must first project our graph for the Graph Data Science library. Here is the syntax for the projection:
// PROJECT GRAPH FOR GDS
CALL gds.graph.project("Countries", "Country",
{ ROAD: {orientation: "UNDIRECTED"} },
{ relationshipProperties: "distance",
nodeProperties: ["latitude", "longitude"] }
)
The first argument, "Countries", is the name of our GDS graph.
The second argument, "Country", is the name of the node label types we want to project.
The third argument specifies options for the relationships. In this case, we want the
ROAD
relationship to be undirected (as roads are two-way).The fourth argument specifies the properties that we want to include in our GDS graph. We need the
distance
property from the road relationships, and thelatitude
andlongitude
from the countries, for the pathfinding algorithms.
Note: If you ever need to delete your projection and start over, you can run:
// DELETE GRAPH
CALL gds.graph.drop("Countries")
Task 4: Running Graph Data Science Algorithms
4.1 Breath First Search and Depth First Search
Let's try running the Breadth First Search and Depth First Search algorithms on our data.
Here is an example BFS query:
// BFS
MATCH (source: Country {name: "Spain"})
CALL gds.bfs.stream("Countries", {
sourceNode: source
})
YIELD path
RETURN path
Try running this query in Neo4j. If the resulting graph visualisation looks a bit messy because of the roads between countries, uncheck the connect result nodes
option in the Options list and run the query again. You should see multiple Country nodes connected by an orange relationship called NEXT
.
Your turn: change the query to be a depth-first search (DFS) and run it. Are the results the same, or different - and why?
4.2 A* Source-target Shortest Path
Your turn: try running the A* source-target shortest path algorithm to find the shortest path between Spain and India. You can see the syntax in the lecture slides or read the following documentation page:
Now try deleting the relationshipWeightProperty: "distance"
line in the query and run the algorithm again. Has the shortest path changed now that the algorithm is fully relying on a heuristic?
4.3 Minimum Spanning Tree
The final algorithm we'll be running today is Minimum Spanning Tree (MST), which starts from a given node and finds all its reachable nodes and the set of relationships that connect the nodes together with the minimum possible weight.
Here's the syntax for MST taken from Neo4j documentation:
// Mimum Spanning Tree
MATCH (n:Place{id: 'D'})
CALL gds.beta.spanningTree.write('graph', {
startNode: id(n),
relationshipWeightProperty: 'cost',
writeRelationshipType: 'MINST',
writeProperty: 'writeCost'
})
YIELD preProcessingMillis, computeMillis, writeMillis, effectiveNodeCount
RETURN preProcessingMillis,computeMillis, writeMillis, effectiveNodeCount;
Your turn: Create relationships of the type MINST
between Canada and every country reachable from Canada. Write a Cypher query to display the minimum spanning tree. Hint: the relationship type is MINST
, not ROAD
!.
Last updated