This week's lab focuses on practicing advanced cipher features and using the APOC procedures and functions introduced in the lecture. APOC is not mandatory for Project 2 due to the lack of support for APOC in cloud Neo4j, and queries without APOC can still effectively address Project 2 questions. However, if you choose to incorporate APOC into your Project 2, you're more than welcome to do so.
In this lab, the solutions are deliberately hidden inside the expandable blocks. You are strongly encouraged to attempt the queries yourself first before expanding to check the sample answers.
A. Preparation
downloading the dataset: The dataset is the people-city dataset we used in the lecture demo, about relationships between people and cities people live in.
installing APOC: create a new database, and install the APOC plugin before starting the database.
Note: After installing APOC, it is necessary to restart your database.
To check if APOC is functioning correctly, execute the following code:
If the code runs without errors, proceed to the next step. If you encounter the error mentions "apoc.conf" settings, there are TWO solutions.
Solution 1 (Note: This solution only works for a database with version 4.xx.xx)
a. go to settings:
b. use "Ctrl+F" to search for "apoc," then add "apoc.import.file.enabled=true" in the indicated location. Click "Apply" and then restart your database.
c. Execute the above code again; you should now be able to use APOC.
Solution 2 (Note: This solution works for a database with version 4.xx.xx AND 5.xx.xx)
a. Download the apoc.conf file.
b. Open the conf folder.
c. Move the downloaded apoc.conf file to the conf folder.
d. Restart the database.
Thanks, Mr. Peter Millitz offered the solution!
B. Populating the Graph Database
Let's clear your database first:
MATCH (n) DETACH DELETE n
Make use of APOC dynamic label and dynamic relationship loading to import the three csv files and build a graph database.
Step 1: Import people and set a static label Person as well as dynamic label according to Occupation (people.csv)
//Step 1: Load people into the database
//with a dynamic label based on their occupation
LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
CALL apoc.create.node(["Person", row.occupation], row)
YIELD node
RETURN node
Step 2: Dynamically load relationship between people using APOC (relationships.csv)
//Step 2: Load relationships between people into the database
LOAD CSV WITH HEADERS FROM 'file:///relationships.csv' AS row
MATCH (p1:Person {name: row.person_1}), (p2:Person {name: row.person_2})
CALL apoc.create.relationship(p1, row.relationship, {}, p2)
YIELD rel
RETURN rel
Step 3: Create cities as nodes (promoting a column as nodes) and establish relationships with people (cities.csv)
//Step 3: Create cities and relationships between people and cities
LOAD CSV WITH HEADERS FROM 'file:///cities.csv' AS row
MATCH (p:Person {name: row.person})
MERGE (c:City {name: row.city})
CREATE (p)-[:LIVES_IN]->(c)
C. Advanced Cypher and APOC
C.1. Aggregation and Data Profiling in Cypher
List the number of followers
// List number of followers
MATCH (p1:Person)-[:FOLLOWS]->(p2:Person)
RETURN p2.name, COUNT(p1) as NumOfFollowers
ORDER BY NumOfFollowers DESC
List the name of followers
// List name of followers
MATCH (p1:Person)-[:FOLLOWS]->(p2:Person)
RETURN p2.name, COLLECT(p1.name)
FOAF - Find Friend of A Friend for a person
// FOAF
MATCH (d {name: "Etta"})--(otherPerson)-->()
WITH otherPerson, count(*) AS foaf
WHERE foaf > 1
RETURN otherPerson.name
C.2. Path Expansion
Path Expansion in Cypher - note the syntax that starts with a * in pattern matching
//PATH Expansion
MATCH (me)-[:KNOWS*1..2]-(remote_friend)
WHERE me.name = "Kellsie"
RETURN me, remote_friend.name
Path Expander in APOC
//PATH Expansion in APOC
MATCH (me:Person {name: "Kellsie"})
CALL apoc.path.expand(me, "KNOWS", null, 1,2)
YIELD path
RETURN path, count(path) as hops
ORDER BY hops;
Find the Mechanic that "Kellsie" KNOWS or who KNOWS "Kellsie" (no specified direction) who are one or two hops away
//PATH Expansion in APOC Mechanic
MATCH (me:Person {name: "Kellsie"})
CALL apoc.path.expand(me, "KNOWS", "+Mechanic", 1,2)
YIELD path
RETURN path, count(path) as hops
ORDER BY hops;
Find people "Kellsie" FOLLOWS and KNOWS as well as those who KNOWS "Kellsie" that are 1 or 2 hops away.
//PATH Expansion in APOC More Relations
MATCH (me:Person {name: "Kellsie"})
CALL apoc.path.expand(me, "KNOWS|FOLLOWS>", null, 1,2)
YIELD path
RETURN path, count(path) as hops
ORDER BY hops;
For more information about APOC path finding, please refer to this APOC documentation:
C.3. List and Pattern Comprehension
Use pattern comprehension to find the average, minimum and maximum number of relationships of all nodes.
// What kind of nodes exist
// Find all nodes, reporting on relationship counts per node.
MATCH (n)
WITH n, [(n)-[r]-() | r] as all_rels
RETURN
DISTINCT labels(n),
avg(size(all_rels)) as Avg_RelationshipCount,
min(size(all_rels)) as Min_RelationshipCount,
max(size(all_rels)) as Max_RelationshipCount
The queries below allow you test out integrating pattern comprehension with other clauses (e.g. UNWIND) and functions (e.g. collect()). What is UNWIND:
Find a Cook named "Jodi", use Pattern Comprehension to find the occupations of all persons related to him. Store it in a list and attach to entity "Jodi" as a property, make sure the list contain only `DISTINCT` occupations.
First let's find the list of occupations of all persons related to Jodi
//Pattern Comprehension and Storing for persons
MATCH (jobi:Cook {name: 'Jobi'})
WITH jobi, [(jobi)-->(b:Person) | b.occupation] AS jobi_circle_occupation
SET jobi.circle_occupations=jobi_circle_occupation
RETURN jobi.circle_occupations
However, the occupation list is not unique, let's try to use UNWIND clause and collect() function together to get the unique list of occupations and reset the property.
MATCH (jobi:Cook {name: 'Jobi'})
UNWIND jobi.circle_occupations AS occupations
WITH collect(DISTINCT occupations) AS distinct_occupations, jobi
SET jobi.circle_occupations = distinct_occupations
RETURN jobi
Modify the above query to add the set of occupations of one's relationship circles.
//Pattern Comprehension and Storing for ALL persons
MATCH (n:Person)
WITH n, [(n)-->(b:Person) | b.occupation] AS my_circle_occupations
UNWIND my_circle_occupations AS occupations
WITH collect(DISTINCT occupations) AS distinct_occupations, n
SET n.circle_occupations = distinct_occupations
RETURN n LIMIT 10
YOUR TURN: Combine the two above into one block of queries to reduce redundant code.
YOUR TURN: Modify the third code chunk and try to list the people with the number of occupation types related to them in desending order. e.g. (if the circle_occupation of Alice is ["Mechanic", "Actor"], the result should be Alice 2)
C.4. Virtual Relationships and Virtual Nodes (Graph Projections)
Virtual Nodes and Relationships do not exist in the graph, they are only returned to the UI/user for representing a graph projection. They can be visualised or processed. All virtual nodes and relationships have negative id’s.
For example, the following code creates a virtual node for all people living in Perth and created a virtual relationship between this virtual node with the city node Perth. Note how we can add the population count as a property when creating the virtual relationship.
MATCH (p:Person)-[:LIVES_IN]->(c:City{name:"Perth"})
WITH collect(p.name) AS residents, c, count(p) as count
CALL apoc.create.vNode(residents,{name:residents}) yield node as a
CALL apoc.create.vRelationship(a,"LIVES_IN",{count:count},c) yield rel
RETURN a, c, rel
YOUR TURN, adapt the above code to create such virtual nodes for all cities.
For more information on Virtual Node and Virtual Relationships, please read the APOC documentation:
importing CSVs: place the CSV files in the import directory of the database. If you do not know how to do it, refer to for instructions.