# Week 5 - Mini End-to-End Data Warehouse Project

This lab contributes **5%** to your final grade.

In the Week 2 lab, we have already loaded the dataset to PostgreSQL. In Week 3, we practised ETL with databases. In this Lab, we will (**Layers 3 and 4 in the OLAP Architecture below**):

1. go through the process of designing a new data warehouse using Power BI or Tableau
2. build a cube for drill-down and roll-up analysis in Power Bi or Tableau
3. visualize the data using Tableau or Power BI

## Part 1: OLAP Architecture

<figure><img src="https://374096590-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FE6tM8okJTaOtct7O9mvr%2Fuploads%2FyyB0TilPjk18meYGNy7H%2FNew%20DW%20structure.png?alt=media&#x26;token=e871cfab-ffdc-43d8-a8d3-28f3e433f5bb" alt=""><figcaption><p>OLAP Architecture</p></figcaption></figure>

## Part 2 : Business Requirement

Woolworths have different shopping centers in Perth, where daily sales take place for various products. Higher management is facing an issue while decision making due to non-availability of integrated data they can’t do study on their data as per their requirement. So they need to design a system which can help them quickly in decision making and provide Return on Investment (ROI). Let us start designing of the data warehouse, we need to follow a few steps before we start our data warehouse design. First, try to identify the dimension table, fact tables and the primary key, foreign key relationships. One of the possible designs is shown in the following figure. You need to identify the Fact table and the dimension tables with their relationships from the diagram below.

<figure><img src="https://374096590-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FE6tM8okJTaOtct7O9mvr%2Fuploads%2FkcYnoI4Xo01Q46a78VcT%2Fimage.png?alt=media&#x26;token=f45b9405-f953-43e0-a19e-0b145fce7be8" alt=""><figcaption><p>What is this schema?</p></figcaption></figure>

## Part 3: Data Warehouse Creation using PostgreSQL

{% hint style="info" %}
If you are using the Docker solution, **please skip this sections 3.0, 3.1 and 3.2**. We have already completed this part with Docker.
{% endhint %}

### 3.0 Dataset for Week 5 Lab

* If you use the local solution, please download the dataset.

{% file src="<https://374096590-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FE6tM8okJTaOtct7O9mvr%2Fuploads%2FqSWICGrAF2rB6oVkobjj%2FAdventureworksDWDemo.zip?alt=media&token=3ece108e-9d69-49b3-bbda-ab9450987a75>" %}
Week 5 Lab Dataset
{% endfile %}

* If you use the Docker solution, please use the `AdventureworksDWDemo` database.

### 3.1 Create a PostgreSQL Database

As we mentioned in Lab4, there are two options for creating a database.

* **Option 1: Using the pgAdmin 4 GUI tool.**
  * Right-click the **server name** to expand the list.
  * Select "**Create**" then "**Database...**".&#x20;
  * Enter `AdventureworksDWDemo` as the database name.
* **Option 2: Using the pgAdmin 4 query tool**

  * Click '**Query Tool Workspace**' to open the PostgreSQL editor.
  * Select an existing server from the list, then click '**Connect & Open Query Tool'**. Leave all other parameters at their default values.

  Finally, you can check the **Object Explore** panel to find the created database.

<figure><img src="https://374096590-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FE6tM8okJTaOtct7O9mvr%2Fuploads%2FtAWw8u3r8fsosWXM9nl6%2Fimage.png?alt=media&#x26;token=9cf38574-a621-4053-a133-4cec411b0aef" alt=""><figcaption></figcaption></figure>

### 3.2 Create Database Tables and Insert Data

The next step is to create dimension tables and fact table(s). You have two options for setting up your database tables. Option 1 is to create tables directly in PostgreSQL and **insert data row by row**. Option 2 is to create tables and then **import data from structured CSV files**.

#### Option 1: Create Tables Directly in PostgreSQL and I**nsert Data Row By Row**

**Step 1:** Manually define and create each table within your PostgreSQL environment using SQL commands, which involves defining the structure of each table using SQL commands and then inserting data into those tables. Those are some examples for create fact table and dimension tables.

```sql
-- Create dimension table for customers
CREATE TABLE dimcustomer (
    customerid INTEGER PRIMARY KEY,
    customeraltid VARCHAR(10),
    customername VARCHAR(50),
    gender VARCHAR(2)
);

-- Create dimension table for products
CREATE TABLE dimproduct (
    productkey INTEGER PRIMARY KEY,
    productaltkey VARCHAR(10),
    productname VARCHAR(100)
);

-- Create dimension table for stores
CREATE TABLE dimstores (
    storeid INTEGER PRIMARY KEY,
    storealtid VARCHAR(10),
    storename VARCHAR(100),
    storelocation VARCHAR(100),
    city VARCHAR(100),
    state VARCHAR(100),
    country VARCHAR(100)
);

-- Create dimension table for salespersons
CREATE TABLE dimsalesperson (
    salespersonid INTEGER PRIMARY KEY,
    salespersonaltid VARCHAR(10),
    salespersonname VARCHAR(100),
    storeid INTEGER,
    city VARCHAR(100),
    state VARCHAR(100),
    country VARCHAR(100)
);

-- Create fact table for product sales
CREATE TABLE factproductsales (
    transactionid BIGINT PRIMARY KEY,
    salesinvoicenumber INTEGER,
    storeid INTEGER,
    customerid INTEGER,
    productid INTEGER,
    salespersonid INTEGER,
    quantity DOUBLE PRECISION,
    salestotalcost NUMERIC,
    productactualcost NUMERIC,
    deviation DOUBLE PRECISION,
    FOREIGN KEY (storeid) REFERENCES dimstores(storeid),
    FOREIGN KEY (customerid) REFERENCES dimcustomer(customerid),
    FOREIGN KEY (productid) REFERENCES dimproduct(productkey),
    FOREIGN KEY (salespersonid) REFERENCES dimsalesperson(salespersonid)
);
```

**Step2 :** After creating the table, you can insert data into it. Here's an example of inserting a single record into the `dimcustomer` table:

```sql
INSERT INTO dimcustomer (customeraltid, customername, gender) 
VALUES ('CT123', 'John Doe', 'M');
```

There is no need to insert a value for `customerid` when using the `SERIAL` type in PostgreSQL.

#### Option 2: Create Tables Directly in PostgreSQL and Upload Structured CSV Files

Import data from pre-structured CSV files directly into PostgreSQL. This can be faster if  the CSV files are properly formatted with headers that match the intended table schema.

**Step 1:** Follow the same steps as in Option 1 to create a table. Make sure the table schema matches the structure of the CSV file you want to import.

**Step 2:** The `COPY` command is used when you have access to the server's file system where PostgreSQL is running. The syntax is：

```
COPY dimcustomer FROM '/path/to/DimCustomer.csv' WITH (FORMAT csv, HEADER false)
```

or you can use \copy Command in psql (does not require superuser privileges)

```
\copy dimcustomer FROM '/path/to/DimCustomer.csv' WITH (FORMAT csv, HEADER false);
```

{% hint style="warning" %}
Please Note:

1. replace your table name with ***`dimcustomer`***&#x20;
2. ***File names are case-sensitive.***
3. replace `'/path/to/DimCustomer.csv'`  or `'/path/to/Dimcustomer.csv'`with the actual path to your CSV file to bulk insert data.
4. For example, `\copy dimcustomer FROM '/tmp/DimCustomer.csv' WITH (FORMAT csv, HEADER false);`
   {% endhint %}

After importing the data, it is crucial to verify if the data has been imported correctly. Execute some SELECT queries to check if the data looks correct.

<figure><img src="https://374096590-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FE6tM8okJTaOtct7O9mvr%2Fuploads%2FMRyI9qZjqwaIpma67jrn%2Fimage.png?alt=media&#x26;token=6a41740f-e5ce-48fd-a57e-ad5193da59e8" alt=""><figcaption></figcaption></figure>

{% hint style="warning" %}
If you are not familiar with this step, please refer to [Week 4 lab: 1.1](https://csse-uwa.gitbook.io/data-warehousing-lab-sheets/week-4-designing-a-new-data-warehouse-using-postgresql#id-1.2.2-bulk-insert).
{% endhint %}

### 3.3 Create a schema

We can generate the schema for the fact and dimension tables using pgAdmin 4 GUI. Below is the expected database schema. You can modify both the table details and their layout as needed. (use ***ERD For Database*** to view this diagram)

<figure><img src="https://374096590-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FE6tM8okJTaOtct7O9mvr%2Fuploads%2FBfJ4cKrrlH4ZyWiQ64pl%2Fspaces%252FE6tM8okJTaOtct7O9mvr%252Fuploads%252FINFA074nq9D3GyT89LPI%252Fimage.webp?alt=media&#x26;token=dfacb6bc-9e2c-47cf-acf3-35b0be4db43e" alt=""><figcaption></figcaption></figure>

## Part 4: Use PostgreSQL Queries for Rollup and Cube Aggregation

In PostgreSQL, the `ROLLUP` and `CUBE` functions are used in the `GROUP BY` clause to produce aggregated results that include subtotals and grand totals. These functions are particularly useful in generating reports that require multi-level aggregation across several dimensions, such as sales data analytics. Here's some examples of both queries:

```
-- Rollup to see sales by store and product with subtotals
SELECT storeid, productid, SUM(salestotalcost) AS total_sales
FROM factproductsales
GROUP BY ROLLUP (storeid, productid);
```

```
-- Cube to see sales by store, product and salesperson with all possible aggregates
SELECT storeid, productid, salespersonid, SUM(salestotalcost) AS total_sales
FROM factproductsales
GROUP BY CUBE (storeid, productid, salespersonid);
```

<figure><img src="https://374096590-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FE6tM8okJTaOtct7O9mvr%2Fuploads%2FzVZ8LANC4gFpuZ2oWr8H%2Fimage.png?alt=media&#x26;token=ec898de8-bc08-4949-a909-78c496138e55" alt=""><figcaption><p>Sales data summary using SQL ROLLUP with store/product grouping and totals.</p></figcaption></figure>

<figure><img src="https://374096590-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FE6tM8okJTaOtct7O9mvr%2Fuploads%2FXQSEmGSyUlVYhLgR2qWr%2Fimage.png?alt=media&#x26;token=f04be64d-d15f-4512-818c-ab931a9534f1" alt=""><figcaption><p>Sales data analysis using SQL CUBE function.</p></figcaption></figure>

## Part 5: Visualising the Multidimensional Solution using Power BI/Tableau

### 5.1  Perform Visualization in Power BI

#### Step 1 : **Connect to PostgreSQL Database:**

Open Power BI. Select '**Get Data**' and choose '**PostgreSQL database**'. Enter the credentials and the server details to connect. &#x20;

<figure><img src="https://374096590-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FE6tM8okJTaOtct7O9mvr%2Fuploads%2FmIfQaSPnS3qHxUVdwLst%2Fimage.png?alt=media&#x26;token=d2136f92-1c58-4b66-8dd3-3f4c016f44bd" alt=""><figcaption></figcaption></figure>

<figure><img src="https://374096590-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FE6tM8okJTaOtct7O9mvr%2Fuploads%2FiIx8C9yHsZ6zsYOatBLk%2Fimage.png?alt=media&#x26;token=a0e5fc41-e794-4be5-8fe0-a17b2bb06849" alt=""><figcaption></figcaption></figure>

<figure><img src="https://374096590-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FE6tM8okJTaOtct7O9mvr%2Fuploads%2FVt7e0tMsy6pYhHqtIyf3%2F1742741170830.jpg?alt=media&#x26;token=fd9fa3ef-1c75-48db-a980-f7d900e5127d" alt=""><figcaption></figcaption></figure>

<figure><img src="https://374096590-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FE6tM8okJTaOtct7O9mvr%2Fuploads%2FM4mUfo904vzsB1dgcBcS%2F1742741308283.jpg?alt=media&#x26;token=e15f73bb-8739-46d1-b0ba-997e7765c473" alt=""><figcaption></figcaption></figure>

{% hint style="warning" %}
For more details, please refer to [Week 4 Lab: Task 2](https://csse-uwa.gitbook.io/data-warehousing-lab-sheets/week-4-designing-a-new-data-warehouse-using-postgresql#task2.-connecting-to-power-bi-tableau-for-visualization).
{% endhint %}

**Step 2 : Load Data:**

* Select the tables you've created and load them into Power BI.

<figure><img src="https://374096590-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FE6tM8okJTaOtct7O9mvr%2Fuploads%2FzD1YFcVjU0ip3sUaFTx4%2Fimage.png?alt=media&#x26;token=1ae30235-b6d6-4d5d-a21c-baec5ef42234" alt=""><figcaption></figcaption></figure>

**Step 3 : Create Relationships:**

* Go to the 'Modeling' view in Power BI.
* Ensure all relationships are correctly set up between the fact and dimension tables as per your schema.

<figure><img src="https://374096590-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FE6tM8okJTaOtct7O9mvr%2Fuploads%2Fkk83EfcTPDt9zkloK8tO%2Fimage.png?alt=media&#x26;token=9192dcf0-f1db-4471-8c99-563956b0f944" alt=""><figcaption></figcaption></figure>

**Step 4 : Create Visualizations:**

* Drag dimensions (e.g., `productmane`) and measures (e.g., `productsalestotalcost`) to the X and Y axes of a chart.
* Choose appropriate chart types, e.g., bar charts for sales data, pie charts for demographic distributions.

<figure><img src="https://374096590-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FE6tM8okJTaOtct7O9mvr%2Fuploads%2FWMGluQSgRVzEcG87fiIe%2Fimage.png?alt=media&#x26;token=c51a32d5-06f2-4068-beec-039c232cadd7" alt=""><figcaption></figcaption></figure>

**Step 5 : Publish and Share**

* Publish the report to the Power BI service.
* Share it with stakeholders for interactive access.
* In the Power BI service, select **Export** > **Export to PDF** from the menu bar.

<figure><img src="https://374096590-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FE6tM8okJTaOtct7O9mvr%2Fuploads%2FQvLprLSg5Oix7sw12l1w%2Fimage.png?alt=media&#x26;token=8308476f-70a9-4e1b-9790-9cb15b2c3c7c" alt=""><figcaption></figcaption></figure>

### 5.2  Perform Visualization in Tableau

#### &#x20;Step 1: Connect to PostgreSQL Database

Start Tableau and go to the start page. Under the "**Connect**" section on the left, choose "**More...**" under "To a Server" and then select "**PostgreSQL**". Enter the details of your PostgreSQL database, and click "Sign In" to establish the connection.&#x20;

{% hint style="warning" %}
For more details, you can refer to [Week 4 Lab: 2.3](https://csse-uwa.gitbook.io/data-warehousing-lab-sheets/week-4-designing-a-new-data-warehouse-using-postgresql#id-2.3-visualise-the-multidimension-solution-with-tableau).
{% endhint %}

<figure><img src="https://374096590-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FE6tM8okJTaOtct7O9mvr%2Fuploads%2FXx3yVdqlBJNeU8KbfngC%2Fimage.png?alt=media&#x26;token=d78cbed1-16fd-4857-9626-4ecd3bc92196" alt=""><figcaption></figcaption></figure>

**Step 2: Import Data**

* After connecting, Tableau will display the list of tables in the database.
* Drag the tables (factproductsales, dimcustomer, dimproduct, dimsalesperson, dimstores) into the join area.
* Set up the joins according to the foreign keys defined in your database schema. For instance, join `factproductsales` to `dimstores` on `storeid`, and so on.
* Once the joins are configured, click on "Sheet 1" to start creating visualizations.

<figure><img src="https://374096590-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FE6tM8okJTaOtct7O9mvr%2Fuploads%2FSzwuiIJfHg60fQFsnroM%2Fimage.png?alt=media&#x26;token=70c1e002-4100-4931-8a92-14e2d8da5d9e" alt=""><figcaption></figcaption></figure>

<figure><img src="https://374096590-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FE6tM8okJTaOtct7O9mvr%2Fuploads%2FE6AOA66QwBrwRygwMEiN%2Fimage.png?alt=media&#x26;token=b322e03b-11c5-4e3d-9863-0275b2bfa2e1" alt=""><figcaption></figcaption></figure>

**Step 3: Create Visualizations**

* [x] **Build a Dashboard:**
  * Drag and drop different visualization components from the "Sheets" onto the dashboard area.
  * For example, to analyze sales performance:

    * **Sales by Store:** Use a bar chart to display total sales and quantity per store.

    <figure><img src="https://374096590-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FE6tM8okJTaOtct7O9mvr%2Fuploads%2FrFCZXE71V4R1j5agMc05%2Fimage.png?alt=media&#x26;token=c6d8f542-1020-4917-ad83-4fb20e9c61aa" alt=""><figcaption></figcaption></figure>
* [x] **Filter and Drill-Down Capabilities:**

  * Add filters for dimensions like `storename`, `productname`, `date`, etc., to allow interactive exploration of the data.
  * Enable drill-down features by setting up hierarchies in the product and store dimensions, allowing more detailed analysis at different levels (e.g., city, state).

  <figure><img src="https://374096590-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FE6tM8okJTaOtct7O9mvr%2Fuploads%2F2PtiPglocsXRrtp9bQhN%2Fimage.png?alt=media&#x26;token=8fa45aff-98ce-4269-92e5-f61d78226415" alt=""><figcaption></figcaption></figure>
* [x] **Tool Tips and Info:**
  * Customize tool tips to show additional information when hovering over parts of the visualizations, such as exact sales figures, differences between periods, or details about the salesperson.

**Step 4: Publish, Share, and Export**

Once your dashboard is ready, you can share it with stakeholders in multiple ways:

* **Publish to Tableau Server or Tableau Online**:\
  Go to the "Server" menu, select "Publish Workbook," choose the server you're connected to, configure the permissions, and publish your dashboard to share it with others.
* **Export as a PDF**:\
  If you need to share your dashboard as a document, you can generate a PDF file. Go to the "File" menu, select "Print", and then select "Save as PDF". This is useful for presentations.

<figure><img src="https://374096590-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FE6tM8okJTaOtct7O9mvr%2Fuploads%2FSvp8BVrnI5Z7vVhk4LRG%2Fimage.png?alt=media&#x26;token=bac22d63-2bdf-4c38-bc8f-cbb02dcc960b" alt=""><figcaption></figcaption></figure>

<figure><img src="https://374096590-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FE6tM8okJTaOtct7O9mvr%2Fuploads%2FDVgL8S3VoLpq93odBaPy%2Fimage.png?alt=media&#x26;token=451fa464-31fc-44be-812d-7d29714d4c1e" alt=""><figcaption></figcaption></figure>

{% hint style="warning" %}
**Please Note:**

If the system shows that no printer is available, please add a printer first (virtual printer also works.
{% endhint %}

## Part 6: Assessment

Demonstrate your ability to design and work with a star schema data model, query data for specific business questions, and visualize results using Tableau or Power BI. Ensure correct understanding and application of concepts learned during Weeks 1–5.

1. Create a database and populate it with [sample data](#id-3.0-dataset-for-week-5-lab).
2. Reproduce the schema on your machine.

<figure><img src="https://374096590-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FE6tM8okJTaOtct7O9mvr%2Fuploads%2F5x6znzsbRSGoOe5InLQh%2Fimage.png?alt=media&#x26;token=5f24e839-42e7-4aff-a9bb-77c9ce7798da" alt=""><figcaption></figcaption></figure>

3. Draw a starnet with footprints to illustrate the business queries below:

* *What is the total sales cost of Nirma Soap in Maninagar?*
* *What is the total product actual cost of Arial Washing Powder 1kg sold by salesperson Jacob at X-Mart stores?*

4. Write SQL queries using `GROUP BY CUBE` to answer the business queries above.
5. Create visualisations to display the query results
6. Instead of the current design, Salesperson and Store could each be a separate dimension, with Location extracted into its own independent dimension. What are the advantages and disadvantages of this alternative approach compared to the existing design?

{% hint style="danger" %}

* Present your solutions to a lab facilitator during Week 5 or Week 6 lab sessions.
* Multiple submissions are permitted for feedback and improvement.
* Lab facilitators will provide guidance to help you correct any errors.

&#x20;:smile:
{% endhint %}

**Marking Rubric**

| Items                | Completed (5)                          | Uncompleted (0)     |
| -------------------- | -------------------------------------- | ------------------- |
| **Assessment tasks** | 6 tasks have been completed correctly. | Missing any 1 task. |

Due date/time: 23:59 on 2 April, Thursday.

Submission Procedure: During weeks 5 and 6 labs.

{% hint style="danger" %} <mark style="color:red;">**Submissions for lab demos through cssubmit, LMS, emails, and Teams will not be accepted.**</mark>
{% endhint %}


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://csse-uwa.gitbook.io/data-warehousing-lab-sheets/week-5-mini-end-to-end-data-warehouse-project.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
