# Week 11 - Advanced Cypher and APOC

This week's lab focuses on practising advanced cipher features and using the APOC procedures and functions introduced in the lecture.

{% hint style="warning" %}
APOC is prohibited for CSV loading and queries (a)–(f), but must be used in at least one self-designed query.
{% endhint %}

{% hint style="info" %}
In this lab, the solutions are deliberately hidden inside the expandable blocks. You are strongly encouraged to attempt the queries yourself first before expanding to check the sample answers.&#x20;
{% endhint %}

## A. Preparation

1. downloading the dataset: The dataset is the `people-city dataset` we used in the lecture demo, about relationships between people and cities people live in.&#x20;

{% file src="/files/Je9GGncy5X2grSawegiJ" %}
Week 11 Lab Dataset
{% endfile %}

2. Installing APOC: create a new instance, and install the APOC plugin before creating a new database. Note: Restart the instance after installing.

<figure><img src="/files/nXglWixCtYrIzrgdumqB" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/ivqUvp6nVJQ80hsuZeVf" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/S41oikUutIv1DDFW1ZGg" alt=""><figcaption></figcaption></figure>

{% hint style="info" %}
**Note**: After installing APOC, it is necessary to restart your database.
{% endhint %}

3. Create a new database
4. importing CSVs: place the CSV files in the import directory of the database. If you do not know how to do it, refer to [Week 9 - Import data from a relational database into Neo4j](/data-warehousing-lab-sheets/week-9-import-data-from-a-relational-database-into-neo4j.md) for instructions.
5. To check if APOC is functioning correctly, execute the following code:

```cypher
CALL apoc.import.csv([{fileName: 'file:/people.csv', labels: ['Person']}], [], {})
```

If the code runs without errors, proceed to the next step. If you encounter the error that mentions "apoc.conf" settings, there is a solution.&#x20;

**Solution**

a. Download the `apoc.conf` file.

{% file src="/files/GSOSiJL9ua7b7QbyveV6" %}

b. Open the `conf` folder (click the path folder icon to open the folder).

<figure><img src="/files/jNhzYnXViARqgICE7qLb" alt=""><figcaption></figcaption></figure>

c. Move the downloaded `apoc.conf` file to the `conf` folder.

d. Restart the database.

## B. Populating the Graph Database

Let's clear your database first:

```cypher
MATCH (n) DETACH DELETE n
```

Make use of APOC dynamic label and dynamic relationship loading to import the three csv files and build a graph database.&#x20;

<details>

<summary>Step 1: Import people and set a static label <code>Person</code> as well as dynamic label according to Occupation (people.csv)</summary>

```cypher
//Step 1: Load people into the database
//with a dynamic label based on their occupation

    LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
    CALL apoc.create.node(["Person", row.occupation], row)
    YIELD node
    RETURN node
```

</details>

<details>

<summary>Step 2: Dynamically load relationship between people using APOC (relationships.csv)</summary>

```cypher
//Step 2: Load relationships between people into the database

    LOAD CSV WITH HEADERS FROM 'file:///relationships.csv' AS row
    MATCH (p1:Person {name: row.person_1}), (p2:Person {name: row.person_2})
    CALL apoc.create.relationship(p1, row.relationship, {}, p2)
    YIELD rel
    RETURN rel
```

</details>

<details>

<summary>Step 3: Create cities as nodes (promoting a column as nodes) and establish relationships with people (cities.csv)</summary>

```cypher
//Step 3: Create cities and relationships between people and cities

    LOAD CSV WITH HEADERS FROM 'file:///cities.csv' AS row
    MATCH (p:Person {name: row.person})
    MERGE (c:City {name: row.city})
    CREATE (p)-[:LIVES_IN]->(c)
```

</details>

Key APOC procedures for checking metadata in Neo4j:

**Schema / Metadata Overview**

```cypher
CALL apoc.meta.schema()
```

**Quick Stats**

```cypher
CALL apoc.meta.stats()
```

**Sampling-Based Schema (faster on large graphs)**

```cypher
CALL apoc.meta.graph()
```

## C. Advanced Cypher and APOC

### C.1. Aggregation and Data Profiling in Cypher

<details>

<summary>List the number of followers</summary>

```cypher
// List number of followers
MATCH  (p1:Person)-[:FOLLOWS]->(p2:Person)
RETURN  p2.name, COUNT(p1) as NumOfFollowers
ORDER BY NumOfFollowers DESC
```

</details>

<details>

<summary>List the name of followers</summary>

```cypher
// List name of followers
MATCH  (p1:Person)-[:FOLLOWS]->(p2:Person)
RETURN  p2.name, COLLECT(p1.name)
```

</details>

<details>

<summary>FOAF - Find Friend of A Friend for a person</summary>

```cypher
// FOAF
MATCH (d {name: "Etta"})--(otherPerson)-->()
WITH otherPerson, count(*) AS foaf
WHERE foaf > 1
RETURN otherPerson.name 

```

</details>

### C.2. Path Expansion

<details>

<summary>Path Expansion in Cypher - note the syntax that starts with a * in pattern matching</summary>

```cypher
//PATH Expansion
MATCH (me)-[:KNOWS*1..2]-(remote_friend)
WHERE me.name = "Kellsie"
RETURN me, remote_friend.name 
```

</details>

<details>

<summary>Path Expander in APOC</summary>

```cypher
//PATH Expansion in APOC
MATCH (me:Person {name: "Kellsie"})
CALL apoc.path.expand(me, "KNOWS", null, 1,2)
YIELD path
RETURN path, count(path) as hops
ORDER BY hops; 
```

</details>

<details>

<summary>Find the Mechanic that "Kellsie" KNOWS or who KNOWS "Kellsie" (no specified direction) who are one or two hops away</summary>

```cypher
//PATH Expansion in APOC Mechanic
MATCH (me:Person {name: "Kellsie"})
CALL apoc.path.expand(me, "KNOWS", "+Mechanic", 1,2)
YIELD path
RETURN path, count(path) as hops
ORDER BY hops; 
```

</details>

<details>

<summary>Find people "Kellsie" FOLLOWS and KNOWS as well as those who KNOWS "Kellsie" that are 1 or 2 hops away. </summary>

<pre class="language-cypher"><code class="lang-cypher">//PATH Expansion in APOC More Relations
MATCH (me:Person {name: "Kellsie"})
CALL apoc.path.expand(me, "KNOWS|FOLLOWS>", null, 1,2)
YIELD path
RETURN path, length(path) as hops
ORDER BY hops; 


//Compare the results with the below query
MATCH (me:Person {name: "Kellsie"})
CALL apoc.path.expand(me, "KNOWS|FOLLOWS>", null, 1,2)
YIELD path
RETURN path, count(path) as hops
ORDER BY hops; 

//What is your finding?
<strong>//length(path) → how many hops (relationships) in this path
</strong>//count(path) → how many rows/paths exist (aggregation) — not what you want here
</code></pre>

</details>

For more information about APOC path finding, please refer to this APOC documentation:

{% embed url="<https://neo4j.com/labs/apoc/4.4/graph-querying/expand-paths/>" %}

### C.3. List and Pattern Comprehension

<details>

<summary>Use pattern comprehension to find the average, minimum and maximum number of relationships of all nodes. </summary>

```cypher
// What kind of nodes exist
// Find all nodes, reporting on relationship counts per node.
MATCH (n) 
WITH n, [(n)-[r]-() | r] as all_rels
RETURN
DISTINCT labels(n),
avg(size(all_rels)) as Avg_RelationshipCount,
min(size(all_rels)) as Min_RelationshipCount,
max(size(all_rels)) as Max_RelationshipCount
```

</details>

&#x20;The queries below allow you test out integrating pattern comprehension with other clauses (e.g. `UNWIND`) and functions (e.g. `collect()`). What is UNWIND:

{% embed url="<https://neo4j.com/docs/cypher-manual/current/clauses/unwind/?utm_source=Google&utm_medium=PaidSearch&utm_campaign=Evergreen&utm_content=APAC-Search-SEMCE-DSA-None-SEM-SEM-NonABM&utm_term=&utm_adgroup=DSA&gad_source=1&gclid=Cj0KCQjwxeyxBhC7ARIsAC7dS39yS0ESK7v4fG1S0G0XMLVLFjukR5Dx6bt8EPfTZT3_zJ-VcHHP1BwaAgigEALw_wcB>" %}

<details>

<summary>Find a Cook named "Jobi", use Pattern Comprehension to find the occupations of all persons related to him. Store it in a list and attach to entity "Jobi" as a property, make sure the list contain only `DISTINCT` occupations. </summary>

First let's find the list of occupations of all persons related to Jodi

```cypher
//Pattern Comprehension and Storing for persons
MATCH (jobi:Cook {name: 'Jobi'})
WITH jobi, [(jobi)-->(b:Person) | b.occupation] AS jobi_circle_occupation
SET jobi.circle_occupations=jobi_circle_occupation
RETURN jobi.circle_occupations
```

However, the occupation list is not unique, let's try to use `UNWIND` clause and `collect()` function together to get the unique list of occupations and reset the property.

```cypher
MATCH (jobi:Cook {name: 'Jobi'})
UNWIND jobi.circle_occupations AS occupations
WITH collect(DISTINCT occupations) AS distinct_occupations, jobi
SET jobi.circle_occupations = distinct_occupations
RETURN jobi
```

</details>

<details>

<summary>Modify the above query to add the set of occupations of one's relationship circles.</summary>

```cypher
//Pattern Comprehension and Storing for ALL persons
MATCH (n:Person)
WITH n, [(n)-->(b:Person) | b.occupation] AS my_circle_occupations
UNWIND my_circle_occupations AS occupations
WITH collect(DISTINCT occupations) AS distinct_occupations, n
SET n.circle_occupations = distinct_occupations
RETURN n LIMIT 10
```

</details>

{% hint style="warning" %}
**YOUR TURN:** Combine the two above into one block of queries to reduce redundant code.&#x20;
{% endhint %}

{% hint style="warning" %}
**YOUR TURN:**  Modify the third code chunk and try to list the people with the number of occupation types related to them in descending order. e.g. (if the circle\_occupation of Alice is \["Mechanic", "Actor"], the result should be Alice 2)
{% endhint %}

### C.4. Virtual Relationships and Virtual Nodes (Graph Projections)

Virtual Nodes and Relationships do not exist in the graph, they are only returned to the UI/user for representing a graph projection. They can be visualised or processed. All virtual nodes and relationships have negative id’s.

<figure><img src="/files/9fKrUiIOhMhHzE4GT4fn" alt=""><figcaption></figcaption></figure>

For example, the following code creates a virtual node for all people living in `Perth` and created a virtual relationship between this virtual node with the city node `Perth`. Note how we can add the population count as a property when creating the virtual relationship.&#x20;

```cypher
MATCH (p:Person)-[:LIVES_IN]->(c:City{name:"Perth"})
WITH collect(p.name) AS residents, c, count(p) as count
CALL apoc.create.vNode(residents,{name:residents}) yield node as a
CALL apoc.create.vRelationship(a,"LIVES_IN",{count:count},c) yield rel
RETURN a, c, rel
```

{% hint style="warning" %}
**YOUR TURN:** Adapt the above code to create such virtual nodes for all cities.&#x20;
{% endhint %}

For more information on Virtual Node and Virtual Relationships, please read the APOC documentation:

{% embed url="<https://neo4j.com/labs/apoc/4.4/virtual/virtual-nodes-rels/>" %}


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://csse-uwa.gitbook.io/data-warehousing-lab-sheets/week-11-advanced-cypher-and-apoc.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
