Week 9 - Import data from a relational database into Neo4j
Last updated
Last updated
In this lab, we'll learn how to import data from a relational database into Neo4j. By the end of this session, you should be able to design and create your graph for Project 2.
It is a star schema of a sales warehouse with one fact table and two dimension tables:
Each table contains a type of node.
Each row in a table is a node.
The foreign key and the primary key relations are relationships.
Your schema might look like this:
Relationship labels and property names are self-designed based on the data. Feel free to use different names.
Let's implement it in a Python dataframe.
After comparing the graph schema (Figure 1) with the data, we can find that "shop" nodes come from the "shops_table" and "product" nodes come from the "products_table". However, the "sale" nodes require some ETL processing:
Now, we have all CSV files for nodes. Let's import them into Neo4j.
The screenshots below are from Neo4j Desktop. If you're using the cloud version, the steps are similar.
Once the project is created, rename it to "week9" or other names of your choice. Add a local DBMS and configure the name and password:
After clicking "import", a folder should pop up. Copy your "sales_node.csv", "products.csv", and "shops.csv" into this folder. Close the pop-up folder and then click the "start" button:
Once the database status changes to "ACTIVE", click "Open".
Let's start by creating "Shop" nodes. Write the code below and click the "run" button on the right. The variable name "row" represents each row in the imported CSV file, in this case, "shops.csv". You can rename "row" to another variable name if you prefer. Use "row.HEADER_NAME" to refer to each header. For instance, row: "Shop ID" refers to the "Shop ID" column in "shops.csv". The HEADER_NAME must exactly match the CSV header, including case sensitivity. The properties before the ":" are the ones you've named in your schema design (Figure 1).
Click on the icon in red, and you should be able to see the "Shop" under "Node labels". Click on "Shop" under "Node labels" to explore further.
Follow a similar process for creating "Product" and "Sale" nodes:
Based on the graph schema (Figure 1), we would like to create the relationship "CONTAINS" from "Sale" to "Product", "SOLD_IN" from "Sale" to "Shop":
Now that you've created your graph database from a relation database, you might wonder if you can have additional node labels. For instance, separating "country" from "shop" or "category" and "subcategory" from "product". The answer is "Yes". There's no fixed answer in design; it depends on how you intend to use your graph.
Next week, we'll explore querying graph databases using Cypher.
Step 1: Design a graph database schema using the that reflects the warehouse's star schema. Think about the general principles that
Your turn: Assign different colours to each label. You can refer to the Week 8 lab sheet for guidance. For example:
Copy "rel_sold_in.csv" and "rel_contains.csv" into the "import" folder (as in ) and run the following code. "->" indicates the direction of the relationship. If you want to create an undirected relationship, use CREATE (s)-[:CONTAINS]-(p)
.