Week 5 - Mini End-to-End Data Warehouse Project
This lab contributes 3% to your final grade.
In the Week 2 lab, we loaded the dataset into PostgreSQL, and in Week 3 we practised ETL with databases. In this lab, we will (Layers 3 and 4 in the OLAP Architecture below):
go through the process of designing a new data warehouse using Power BI or Tableau
build a cube for drill-down and roll-up analysis in Power BI or Tableau
visualize the data using Tableau or Power BI
Woolworths operates several shopping centres in Perth, where daily sales of various products take place. Higher management is facing difficulties in decision making because the data is not integrated, so they cannot analyse it according to their requirements. They therefore need a system that supports quick decision making and provides a return on investment (ROI). Before we start the data warehouse design, we need to follow a few steps. First, identify the dimension tables, the fact table(s), and the primary key/foreign key relationships. One possible design is shown in the following figure. You need to identify the fact table and the dimension tables, together with their relationships, from the diagram below.
If you use the local solution, please download the dataset.
If you use the Docker solution, please use the AdventureworksDWDemo database.
As mentioned in Lab 4, there are two options for creating a database.
Option 1: Using the pgAdmin 4 GUI tool.
Right-click the server name to expand the list.
Select "Create" then "Database...".
Enter AdventureworksDWDemo as the database name.
Option 2: Using the pgAdmin 4 query tool
Click 'Query Tool Workspace' to open the PostgreSQL editor.
Select an existing server from the list, then click 'Connect & Open Query Tool'. Leave all other parameters at their default values.
Finally, you can check the Object Explorer panel to find the created database.
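For reference, a minimal statement you can run in the Query Tool to create the database (quoting the name preserves its capitalisation):

-- Create the lab database; quotes keep the mixed-case name as-is
CREATE DATABASE "AdventureworksDWDemo";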
The next step is to create dimension tables and fact table(s). You have two options for setting up your database tables. Option 1 is to create tables directly in PostgreSQL and insert data row by row. Option 2 is to create tables and then import data from structured CSV files.
Step 1: Manually define and create each table in your PostgreSQL environment using SQL commands: define the structure of each table, then insert data into it. Below are some examples of creating a fact table and dimension tables.
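As a starting point, here is a minimal sketch of a few dimension tables and the fact table. The column names and types are assumptions based on the schema figure in this lab, so adjust them to your own design and add the remaining dimensions (e.g., dimsalesperson) in the same way.

-- Dimension table: customer (customerid is generated automatically by SERIAL)
CREATE TABLE dimcustomer (
    customerid   SERIAL PRIMARY KEY,
    customername VARCHAR(100),
    customercity VARCHAR(50)
);

-- Dimension table: product
CREATE TABLE dimproduct (
    productid   SERIAL PRIMARY KEY,
    productname VARCHAR(100),
    productcost NUMERIC(10,2)
);

-- Dimension table: store
CREATE TABLE dimstores (
    storeid   SERIAL PRIMARY KEY,
    storename VARCHAR(100)
);

-- Fact table: one row per sale, with foreign keys to the dimension tables
CREATE TABLE factproductsales (
    transactionid         SERIAL PRIMARY KEY,
    customerid            INT REFERENCES dimcustomer (customerid),
    productid             INT REFERENCES dimproduct (productid),
    storeid               INT REFERENCES dimstores (storeid),
    quantity              INT,
    productsalestotalcost NUMERIC(12,2)
);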
Step 2: After creating the table, you can insert data into it. Here's an example of inserting a single record into the dimcustomer table:
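A sketch, assuming the dimcustomer columns from the example above:

-- customerid is omitted because SERIAL generates it automatically
INSERT INTO dimcustomer (customername, customercity)
VALUES ('John Smith', 'Maninagar');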
There is no need to insert a value for customerid when using the SERIAL type in PostgreSQL.
Import data from pre-structured CSV files directly into PostgreSQL. This can be faster if the CSV files are properly formatted with headers that match the intended table schema.
Step 1: Follow the same steps as in Option 1 to create a table. Make sure the table schema matches the structure of the CSV file you want to import.
Step 2: The COPY command is used when you have access to the server's file system where PostgreSQL is running. The syntax is:
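A general form, with a placeholder file path (set HEADER to match whether your CSV file contains a header row):

-- Server-side bulk load: the path is read by the PostgreSQL server process,
-- so this form typically requires superuser (or pg_read_server_files) rights
COPY dimcustomer FROM '/path/to/DimCustomer.csv' WITH (FORMAT csv, HEADER true);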
Alternatively, you can use the \copy command in psql (it does not require superuser privileges).
Please Note:
Replace dimcustomer with the name of the table you are loading.
File names are case-sensitive.
Replace '/path/to/DimCustomer.csv' (or '/path/to/Dimcustomer.csv') with the actual path to your CSV file to bulk insert the data.
For example, \copy dimcustomer FROM '/tmp/DimCustomer.csv' WITH (FORMAT csv, HEADER false);
After importing the data, it is crucial to verify if the data has been imported correctly. Execute some SELECT queries to check if the data looks correct.
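For example, a couple of quick checks (using dimcustomer; substitute your own table names):

-- Confirm the number of rows that were loaded
SELECT COUNT(*) FROM dimcustomer;

-- Inspect the first few rows
SELECT * FROM dimcustomer LIMIT 5;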
We can generate the schema for the fact and dimension tables using pgAdmin 4 GUI. Below is the expected database schema. You can modify both the table details and their layout as needed. (use ERD For Database to view this diagram)
In PostgreSQL, the ROLLUP and CUBE functions are used in the GROUP BY clause to produce aggregated results that include subtotals and grand totals. These functions are particularly useful for generating reports that require multi-level aggregation across several dimensions, such as sales data analytics. Here are some examples of both queries:
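The queries below are sketches only; the table and column names follow the assumed schema from earlier in this lab, so adapt them to your design.

-- ROLLUP: per (store, product) totals, per-store subtotals, and a grand total
SELECT s.storename, p.productname,
       SUM(f.productsalestotalcost) AS total_sales
FROM factproductsales f
JOIN dimstores  s ON f.storeid   = s.storeid
JOIN dimproduct p ON f.productid = p.productid
GROUP BY ROLLUP (s.storename, p.productname);

-- CUBE: subtotals for every combination of store and product, plus a grand total
SELECT s.storename, p.productname,
       SUM(f.productsalestotalcost) AS total_sales
FROM factproductsales f
JOIN dimstores  s ON f.storeid   = s.storeid
JOIN dimproduct p ON f.productid = p.productid
GROUP BY CUBE (s.storename, p.productname);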
Step 1: Connect to the Database:
Open Power BI. Select 'Get Data' and choose 'PostgreSQL database'. Enter the server details and credentials to connect.
Step 2: Load Data:
Select the tables you've created and load them into Power BI.
Step 3: Create Relationships:
Go to the 'Modeling' view in Power BI.
Ensure all relationships are correctly set up between the fact and dimension tables as per your schema.
Step 4: Create Visualizations:
Drag dimensions (e.g., productname) and measures (e.g., productsalestotalcost) to the X and Y axes of a chart.
Choose appropriate chart types, e.g., bar charts for sales data, pie charts for demographic distributions.
Step 5: Publish and Share
Publish the report to the Power BI service.
Share it with stakeholders for interactive access.
In the Power BI service, select Export > Export to PDF from the menu bar.
Step 1: Connect to the Database
Start Tableau and go to the start page. Under the "Connect" section on the left, choose "More..." under "To a Server" and then select "PostgreSQL". Enter the details of your PostgreSQL database, and click "Sign In" to establish the connection.
Step 2: Import Data
After connecting, Tableau will display the list of tables in the database.
Drag the tables (factproductsales, dimcustomer, dimproduct, dimsalesperson, dimstores) into the join area.
Set up the joins according to the foreign keys defined in your database schema. For instance, join factproductsales to dimstores on storeid, and so on.
Once the joins are configured, click on "Sheet 1" to start creating visualizations.
Step 3: Create Visualizations
Step 4: Publish, Share, and Export
Once your dashboard is ready, you can share it with stakeholders in multiple ways:
Publish to Tableau Server or Tableau Online: Go to the "Server" menu, select "Publish Workbook," choose the server you're connected to, configure the permissions, and publish your dashboard to share it with others.
Export as a PDF: If you need to share your dashboard as a document, you can generate a PDF file. Go to the "File" menu, select "Print", and then select "Save as PDF". This is useful for presentations.
Please Note:
If the system shows that no printer is available, please add a printer first (a virtual printer also works).
Demonstrate your ability to design and work with a star schema data model, query data for specific business questions, and visualize results using Tableau or Power BI. Ensure correct understanding and application of concepts learned during Weeks 1–5.
Reproduce the schema on your machine.
Draw a starnet with footprints to illustrate the business queries below:
What is the total sales cost of Nirma Soap in Maninagar?
What is the total product actual cost of Arial Washing Powder 1kg sold by salesperson Jacob at X-Mart stores?
Write SQL queries using the GROUP BY clause with CUBE to answer the business queries above.
Create visualisations to display the query results.
Present your solutions to a lab facilitator during Week 5 or Week 6 lab sessions.
Multiple submissions are permitted for feedback and improvement.
Lab facilitators will provide guidance to help you correct any errors.
Marking Rubric
Assessment tasks:
Full marks: all 5 tasks have been completed correctly.
Partial marks: missing any 1 task.
Due date/time: 23:59 on Friday, 28 March.
Submission Procedure: During weeks 5 and 6 labs.
Submissions for lab demos through cssubmit, LMS, emails, and Teams will not be accepted.