Post

MANAGING BIG DATA WITH MYSQL WORKBENCH | A RELATIONAL DATABASE PROJECT

The following article is part of my Big Data Management RDBMS Project, which is written in SQL and executed in MySQL Workbench. The project’s requirements included acquiring a relatively large dataset, constructing a relational database system, and performing data analysis queries using SQL.

Data Collection

The Adventure works dataset is obtained from Kaggle, an open-source platform which provides a large number of datasets and problems for users to work on, and it also provides tools for users to submit and evaluate their models.
The Adventure Works dataset consists of 18,000 customers, both women and men are involved in purchasing the company’s products. Each of the customers' data recorded consists of their personal information including customer Key, first Name, last Name, email address, and annual income.
The dataset also involves products and its subcategories that includes different brands of bikes that are categorized into four groups: bikes, components, clothing, and accessories. Since the company manufactures its own components, accessories, and clothing. The primary selling products of AW are several different brands of bikes including mountain, road, and touring bikes.

It also has more than 56,000 sales records from years 2015, 2016, and 2017 with their returned orders and the territories where it has shipped from and into.

Business Background

Adventure Works Cycles is a large, multinational manufacturing company on which the Adventure Works sample databases are based. The company manufactures and sells metal and composite bicycles to ten territories, including North American, European, and Asian commercial markets. They focus on four different categories: Bikes, Components, Clothing, and Accessories. While their base operation is located in Bothell, several regional sales teams are situated throughout their market base.
In early 2001, Adventure Works Cycles purchased a small manufacturing plant in Mexico. Most of the products were produced by outside vendors. In such instances, AW acts as a reseller. The company began manufacturing several critical subcategories for AW Cycles' products. These subcategories are shipped to those vendors' locations for final product assembly. In late 2001, the company's stock began storing different assembled products, which are then shipped to customers throughout the United States, Canada, France, Germany, Australia, and the United Kingdom.

Entity Relationship Diagram (ERD)

After collecting data, the first step involved data cleaning to ensure that the dataset was prepared for the creation of the Adventure Works Database Schema. The data cleaning process included the following tasks:

  • Ensuring the correct data types for all columns. For example, DATE type columns were converted to the ‘YYYY-MM-DD’ format.

  • Removing special characters or symbols from data, such as the ‘$’ symbol in the ‘Annual Income’ column, to ensure it matches the integer data type. This step was crucial for enabling statistical queries and ordering data in ascending or descending order.

  • Ensuring that primary keys were unique. If any duplicated IDs were found, they were renamed to ensure uniqueness.

Once the data cleaning process was completed, we proceeded to create the Adventure Works Database Schema. This schema combined various attributes, including different types of products and their subcategories with the four main product categories. Additionally, we collected sales data over a three-year period, including sales returns, and associated it with sales territories such as regions and countries. Furthermore, customer data, including personal information such as names and email addresses, along with annual income, was integrated into the database for comprehensive analysis.

For more understanding, The relationships of the tables can be summarized in Diagram: 1 with the Primary Key (PK), Foreign Key (FK) and in Table: 1 sequentially.

Figure 1 Diagram 1: Entity Relationship Diagram

Diagram above shows the relational schema diagram of the Adventure Works database, which elucidates that the selected database consists of ten relations tables namely, customers, products, products_subcategories, product_categories, calendar, territories, sales_2015, sales_2016, sales_2017 and returns.

In each of these sales tables, we had to change the order numbers in the OrderNumber table due to duplicates found in the dataset, and as it is a primary key it must involves unique values. These sales table have combined most of the other tables primary keys as a foreign key’s in it, as well as sharing one-to-many relations from them.

No.SchemaContains objects related toExamples
1ProductsProduct Keys (PK)Products table
  Product subcategories (FK)Subcategories table
  Product names, model namesCategories table
  Product price 
2Sales returnReturn quantityReturn table
  Product key and Territory keyProducts table
   Territories table
3Sales in 2015Order Numbers (PK)Sales_2015 table
 Sales in 2016Customer Keys (PK)Sales_2016 table
 Sales in 2017Territory Keys (FK)Sales_2017 table
  Product Keys (FK)Customer table
  Order Date (FK)Products table
   Territories table
   Calendar table
4CustomersCustomer Key (PK)Sales in 2016
  Customers personal dataCustomer table
   Products table
   Territories table
   Calendar table

The SQL queries in defining the database and the entities are shown below:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
/*CREATE THE DATABASE AdventureWorks*/
DROP DATABASE IF EXISTS adventureworks;
CREATE DATABASE AdventureWorks;

/*Use Database*/
USE AdventureWorks;

/*Table structure for table calendar*/
DROP TABLE IF EXISTS `calendar`;
CREATE TABLE `calendar` (
  `OrderDate` DATE NOT NULL,
  PRIMARY KEY (`OrderDate`)
);

/*Define entity customers*/
DROP TABLE IF EXISTS `customers`;
CREATE TABLE `customers` (
  `CustomerKey` INT NOT NULL,
  `Prefix` VARCHAR(100) NOT NULL,
  `FirstName` VARCHAR(100) NOT NULL,
  `LastName` VARCHAR(100) NOT NULL,
  `BirthDate` date NOT NULL,
  `MaritalStatus` VARCHAR(100) NOT NULL,
  `Gender` VARCHAR(100) NOT NULL,
  `EmailAddress` VARCHAR(100) NOT NULL,
  `AnnualIncome` INT NOT NULL,
  `TotalChildren` INT NOT NULL,
  `EducationLevel` VARCHAR(100) NOT NULL,
  `Occupation` VARCHAR(100) NOT NULL,
  `HomeOwner` VARCHAR(100) NOT NULL,
  PRIMARY KEY (`CustomerKey`)
)

-- Define entity product_categories
DROP TABLE IF EXISTS `product_categories`;
CREATE TABLE `product_categories` (
  `ProductCategoryKey` INT NOT NULL,
  `CategoryName` VARCHAR(50) DEFAULT NULL,
  PRIMARY KEY (`ProductCategoryKey`)
);

-- Create Table `product_subcategories`
DROP TABLE IF EXISTS `product_subcategories`;
CREATE TABLE `product_subcategories` (
  `ProductSubcategoryKey` INT NOT NULL,
  `SubcategoryName` VARCHAR(100) NOT NULL,
  `ProductCategoryKey` INT NOT NULL,
  PRIMARY KEY (`ProductSubcategoryKey`),
  KEY `ProductCategoryKey` (`ProductCategoryKey`),
  CONSTRAINT `product_subcategories_ibfk_1` FOREIGN KEY (`ProductCategoryKey`) REFERENCES `product_categories` (`ProductCategoryKey`)
); 

-- Table structure for table products
DROP TABLE IF EXISTS `products`;
CREATE TABLE `products` (
  `ProductKey` INT NOT NULL,
  `ProductSubcategoryKey` INT NOT NULL,
  `ProductSKU` VARCHAR(100) NOT NULL,
  `ProductName` VARCHAR(100) NOT NULL,
  `ModelName` VARCHAR(100) NOT NULL,
  `ProductDescription` VARCHAR(250) NOT NULL,
  `ProductColor` VARCHAR(100) NOT NULL,
  `ProductSize` VARCHAR(100) NOT NULL,
  `ProductStyle` VARCHAR(100) NOT NULL,
  `ProductCost` DECIMAL(10,4) NOT NULL,
  `ProductPrice` DECIMAL(10,4) NOT NULL,
  PRIMARY KEY (`ProductKey`),
  KEY `ProductSubcategoryKey` (`ProductSubcategoryKey`),
  CONSTRAINT `products_ibfk_1` FOREIGN KEY (`ProductSubcategoryKey`) REFERENCES `product_subcategories` (`ProductSubcategoryKey`)
);

-- Table structure for table territories
DROP TABLE IF EXISTS `territories`;
CREATE TABLE `territories` (
  `TerritoryKey` INT NOT NULL,
  `Region` VARCHAR(100) NOT NULL,
  `Country` VARCHAR(100) NOT NULL,
  `Continent` VARCHAR(100) NOT NULL,
  PRIMARY KEY (`TerritoryKey`)
);

-- Table structure for table returns
DROP TABLE IF EXISTS `returns`;
CREATE TABLE `returns` (
  `ReturnDate` DATE NOT NULL,
  `TerritoryKey` INT NOT NULL,
  `ProductKey` INT NOT NULL,
  `ReturnQuantity` INT NOT NULL,
  KEY `ProductKey` (`ProductKey`),
  KEY `TerritoryKey` (`TerritoryKey`),
  CONSTRAINT `returns_ibfk_1` FOREIGN KEY (`ProductKey`) REFERENCES `products` (`ProductKey`),
  CONSTRAINT `returns_ibfk_2` FOREIGN KEY (`TerritoryKey`) REFERENCES `territories` (`TerritoryKey`)
);

-- Table structure for table sales_2015
DROP TABLE IF EXISTS `sales_2015`;
CREATE TABLE `sales_2015` (
  `OrderDate` DATE NOT NULL,
  `StockDate` DATE NOT NULL,
  `OrderNumber` VARCHAR(100) NOT NULL,
  `ProductKey` INT NOT NULL,
  `CustomerKey` INT NOT NULL,
  `TerritoryKey` INT NOT NULL,
  `OrderLineItem` INT NOT NULL,
  `OrderQuantity` INT NOT NULL,
  PRIMARY KEY (`OrderNumber`),
  KEY `ProductKey` (`ProductKey`),
  KEY `CustomerKey` (`CustomerKey`),
  KEY `TerritoryKey` (`TerritoryKey`),
  KEY `OrderDate` (`OrderDate`),
  CONSTRAINT `sales_2015_ibfk_1` FOREIGN KEY (`ProductKey`) REFERENCES `products` (`ProductKey`),
  CONSTRAINT `sales_2015_ibfk_2` FOREIGN KEY (`CustomerKey`) REFERENCES `customers_details` (`CustomerKey`),
  CONSTRAINT `sales_2015_ibfk_3` FOREIGN KEY (`TerritoryKey`) REFERENCES `territories` (`TerritoryKey`),
  CONSTRAINT `sales_2015_ibfk_4` FOREIGN KEY (`OrderDate`) REFERENCES `calendar` (`OrderDate`)
);

-- Table structure for table sales_2016
DROP TABLE IF EXISTS `sales_2016`;
CREATE TABLE `sales_2016` (
  `OrderDate` DATE NOT NULL,
  `StockDate` DATE NOT NULL,
  `OrderNumber` VARCHAR(100) NOT NULL,
  `ProductKey` INT NOT NULL,
  `CustomerKey` INT NOT NULL,
  `TerritoryKey` INT NOT NULL,
  `OrderLineItem` INT NOT NULL,
  `OrderQuantity` INT NOT NULL,
  PRIMARY KEY (`OrderNumber`),
  KEY `ProductKey` (`ProductKey`),
  KEY `CustomerKey` (`CustomerKey`),
  KEY `TerritoryKey` (`TerritoryKey`),
  KEY `OrderDate` (`OrderDate`),
  CONSTRAINT `sales_2016_ibfk_1` FOREIGN KEY (`ProductKey`) REFERENCES `products` (`ProductKey`),
  CONSTRAINT `sales_2016_ibfk_2` FOREIGN KEY (`CustomerKey`) REFERENCES `customers_details` (`CustomerKey`),
  CONSTRAINT `sales_2016_ibfk_3` FOREIGN KEY (`TerritoryKey`) REFERENCES `territories` (`TerritoryKey`),
  CONSTRAINT `sales_2016_ibfk_4` FOREIGN KEY (`OrderDate`) REFERENCES `calendar` (`OrderDate`)
);

-- Table structure for table sales_2017
DROP TABLE IF EXISTS `sales_2017`;
CREATE TABLE `sales_2017` (
  `OrderDate` DATE NOT NULL,
  `StockDate` DATE NOT NULL,
  `OrderNumber` VARCHAR(100) NOT NULL,
  `ProductKey` INT NOT NULL,
  `CustomerKey` INT NOT NULL,
  `TerritoryKey` INT NOT NULL,
  `OrderLineItem` INT NOT NULL,
  `OrderQuantity` INT NOT NULL,
  PRIMARY KEY (`OrderNumber`),
  KEY `ProductKey` (`ProductKey`),
  KEY `CustomerKey` (`CustomerKey`),
  KEY `TerritoryKey` (`TerritoryKey`),
  KEY `OrderDate` (`OrderDate`),
  CONSTRAINT `sales_2017_ibfk_1` FOREIGN KEY (`ProductKey`) REFERENCES `products` (`ProductKey`),
  CONSTRAINT `sales_2017_ibfk_2` FOREIGN KEY (`CustomerKey`) REFERENCES `customers_details` (`CustomerKey`),
  CONSTRAINT `sales_2017_ibfk_3` FOREIGN KEY (`TerritoryKey`) REFERENCES `territories` (`TerritoryKey`),
  CONSTRAINT `sales_2017_ibfk_4` FOREIGN KEY (`OrderDate`) REFERENCES `calendar` (`OrderDate`)
);

When attempting to import data for each individual entity into MySQL Workbench, I initially tried to manually import Excel datasets. Unfortunately, this method often failed due to the large volume of data. After numerous attempts, I eventually converted the CSV datasets to JSON format and successfully imported them into the database using the JSON format.

Later, after completing the project, I discovered a more efficient and time-saving method for bulk data transfer. I resorted to using a custom Python script, which significantly reduced the time and effort required for the data import process. The Python script for importing data from the CSV file into the ‘sales_2015’ table is provided below:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
from tqdm import tqdm
import csv
import sys
import MySQLdb

conn = MySQLdb.connect(host="127.0.0.1", user="username", 
password="password", database="adventureworks")

cursor = conn.cursor()
csv_data = csv.reader(open('AdventureWorks_Sales_2015.csv'))
header = next(csv_data)

print('Importing the CSV Files')
for row in tqdm(csv_data):
    cursor.execute(
        "INSERT INTO sales_2015 (OrderDate, StockDate, OrderNumber, ProductKey, CustomerKey, TerritoryKey, OrderLineItem, OrderQuantity ) \
        VALUES (%s, %s, %s, %s, %s, %s, %s, %s)",
        row)
   
conn.commit()
cursor.close()
conn.close()
print('Done')

Replace ‘username’ and ‘password’ with your own database credentials. ‘database’ should be the name of your target database. If you have multiple CSV files to import, make sure to repeat this process for each of them.

Statistical queries

Statistical queries were performed for a better understanding of the selected dataset. We have created and performed statistical queries to examine the dataset which is able to produce count, mean, median, and average values. For instance, average sales for the most purchased products in 2015, 2016, and 2017, Median value of annual income of our customers and mean value of product sales from years 2015, 2016, and 2017.
1
2
3
4
5
6
-- 1/ COUNT all sales in each year
SELECT '2015' AS Year, COUNT(*) AS Total_Sales FROM sales_2015
UNION ALL
SELECT '2016' AS Year, COUNT(*) AS Total_Sales FROM sales_2016
UNION ALL
SELECT '2017' AS Year, COUNT(*) AS Total_Sales FROM sales_2017;

Figure 2 Figure 2: staistical queries 1

1
2
3
4
5
6
7
8
9
-- 2/ MAX return quantity in each year
SELECT '2015' AS Year, SUM(ReturnQuantity) AS Total_Returns FROM returns
WHERE ReturnDate BETWEEN '2015-01-01' AND '2015-12-31' 
UNION ALL
SELECT '2016' AS Year, SUM(ReturnQuantity) AS Total_Returns FROM returns
WHERE ReturnDate BETWEEN '2016-01-01' AND '2016-12-31' 
UNION ALL 
SELECT '2017' AS Year, SUM(ReturnQuantity) AS Total_Returns FROM returns
WHERE ReturnDate BETWEEN '2017-01-01' AND '2017-12-31';

Figure 3 Figure 3: staistical queries 2

1
2
3
-- 3/ Calculate the average age of all customers
SELECT AVG(EXTRACT( YEAR FROM DATE('2023-01-17')) - (EXTRACT(YEAR FROM BirthDate))) AS average_age
FROM customers;

Figure 4 Figure 4: staistical queries 3

1
2
3
4
-- 4/ Find minimum product profit
SELECT products.ProductName, ProductPrice - ProductCost as profit
FROM products
WHERE (ProductPrice - ProductCost) = (SELECT MIN(ProductPrice - ProductCost) FROM products);

Figure 5 Figure 5: staistical queries 4

Data Analysis SQL Queries

Question 1: Find all the products and identify them by their unique key values in ascending order.*/

1
SELECT * FROM products ORDER BY ProductKey ASC;

Figure 6 Figure 6: Output from Question 1

Question 2: Find all the products profit and identify them by their names in ascending order.

1
2
3
SELECT ProductName, ProductCost, ProductPrice, 
ProductPrice-ProductCost AS Profit
FROM products ORDER BY profit DESC;

Figure 7 Figure 7: Output from Question 2

Question 3: Find the 10 most expensive products in descending order.

1
2
SELECT ProductName, ProductPrice FROM products
ORDER BY ProductPrice DESC LIMIT 10;

Figure 8 Figure 8: Output from Question 3

Question 4: Find the 10 cheapest products in ascending order:

1
2
 SELECT ProductName, ProductPrice FROM products
 ORDER BY ProductPrice ASC LIMIT 10;

Figure 9 Figure 9: Output from Question 4

Question 5: Find the average price from products and products greater than the average:

1
2
3
4
SELECT  ProductName, ProductPrice FROM products
HAVING ProductPrice >  (SELECT
 AVG(ProductPrice) FROM products)
ORDER BY ProductPrice ASC;

Figure 10 Figure 10: Output from Question 5

Question 6: List all products whose size is medium, red in color and the product cost less than 800:

1
2
3
4
5
6
7
 SELECT ProductKey, ProductName,
 ProductSize,
 ProductColor, ProductCost
 FROM products
 WHERE ProductSize > 20 
 AND ProductColor='red' 
 AND ProductCost < 800;

Figure 11 Figure 11: Output from Question 6

Question 7: List all products based on subcategories:

1
2
3
4
5
SELECT ProductKey, ProductName, subcategoryName
FROM products
JOIN product_subcategories 
ON products.ProductSubcategoryKey
=product_subcategories.ProductSubcategoryKey;

Figure 12 Figure 12: Output from Question 7

Question 8: List all customers who owns house by gender by DESC order of Annual Income:

1
2
3
4
5
SELECT gender, FirstName, LastName, 
AnnualIncome, HomeOwner
FROM customers
WHERE HomeOwner = 'Y'
ORDER BY AnnualIncome DESC;

Figure 13 Figure 13: Output from Question 8

Question 9: Find married customers that own a house and their occupation by ascending order of birth date:

1
2
3
4
5
6
SELECT FirstName,BirthDate, MaritalStatus,
EducationLevel, Occupation
FROM customers 
WHERE MaritalStatus= 'm' 
AND HomeOwner ='Y'
ORDER BY BirthDate ASC;

Figure 14 Figure 14: Output from Question 9

Question 10: Find customers that are single and whose annual income is greater than 50,000 in ascending order:

1
2
3
4
5
6
SELECT FirstName, LastName,
MaritalStatus, AnnualIncome 
FROM customers
WHERE MaritalStatus='s' AND
AnnualIncome > 50000 
ORDER BY AnnualIncome ASC;

Figure 15 Figure 15: Output from Question 10

Question 11: Among the female customers who are married, find the ones that have houses and their annual income is greater than average income:

1
2
3
4
5
6
7
8
SELECT CustomerKey, FirstName,
LastName, MaritalStatus,gender, HomeOwner, AnnualIncome
FROM customers
WHERE MaritalStatus = 'M' 
AND gender = 'F' 
AND HomeOwner='Y'
AND AnnualIncome > (select avg(AnnualIncome)from customers)
ORDER BY AnnualIncome;

Figure 16 Figure 16: Output from Question 11

Question 12: List all the customers that their annual income is less than 20,000 and bought products in 2015:

1
2
3
4
5
6
7
SELECT  FirstName, LastName, 
AnnualIncome, ProductName,
YEAR(OrderDate) AS Year
FROM sales_2015
JOIN products ON sales_2015.ProductKey = products.ProductKey
JOIN customers ON sales_2015.CustomerKey = customers.CustomerKey
HAVING AnnualIncome < 20000;

Figure 17 Figure 17: Output from Question 12

Question 13: List all sales from 2015 in ascending order by order Number, product key and customer Key and in day/month/year format:

1
2
3
4
5
6
7
8
9
SELECT OrderNumber,products.ProductKey, 
customers.CustomerKey,DAY(OrderDate) AS Day, 
MONTH(OrderDate) AS MONTH, YEAR(OrderDate) AS Year,
sales_2015.OrderQuantity * products.ProductPrice AS Sales
from sales_2015 join products 
ON sales_2015.ProductKey = products.ProductKey
JOIN customers ON
sales_2015.CustomerKey = customers.CustomerKey 
ORDER BY Sales ASC;

Figure 18 Figure 18: Output from Question 13

Question 14: List all sales from 2016 order by orderNumber and in day/month/year format:

1
2
3
4
5
6
7
8
9
SELECT OrderNumber,products.ProductKey, 
customers.CustomerKey,DAY(OrderDate) AS Day, 
MONTH(OrderDate) AS MONTH, YEAR(OrderDate) AS Year,
sales_2016.OrderQuantity * products.ProductPrice AS Sales
from sales_2016 join products 
ON sales_2016.ProductKey = products.ProductKey
JOIN customers ON
sales_2016.CustomerKey = customers.CustomerKey 
ORDER BY Sales DESC;

Figure 19 Figure 19: Output from Question 14

Question 15: List all sales from 2017 order by orderNumber and in day/month/year format:

1
2
3
4
5
6
7
8
9
SELECT OrderNumber,products.ProductKey, 
customers.CustomerKey,DAY(OrderDate) AS Day, 
MONTH(OrderDate) AS MONTH, YEAR(OrderDate) AS Year,
sales_2017.OrderQuantity * products.ProductPrice AS Sales
from sales_2017 join products 
ON sales_2017.ProductKey = products.ProductKey
JOIN customers ON
sales_2017.CustomerKey = customers.CustomerKey 
ORDER BY Sales DESC;

Figure 20 Figure 20: Output from Question 15

Question 16: List all the customers that purchased the most sold products in the year that has higher sales 2017:

1
2
3
4
5
6
7
8
9
10
SELECT customers.CustomerKey, 
FirstName, LastName, 
ProductName, OrderQuantity, OrderDate
FROM sales_2017
JOIN customers ON 
sales_2017.CustomerKey = customers.CustomerKey
JOIN products ON 
sales_2017.ProductKey = products.ProductKey
WHERE OrderQuantity > (SELECT AVG(OrderQuantity)
FROM sales_2017);

Figure 21 Figure 21: Output from Question 16

Question 17: Count the products that purchased the same item in 2016:

1
2
3
4
5
6
SELECT count(*) as quantity_sold, ProductName
FROM sales_2016
JOIN customers ON sales_2016.CustomerKey = customers.CustomerKey
JOIN products ON sales_2016.ProductKey = products.ProductKey
GROUP BY ProductName
ORDER BY quantity_sold DESC;

Figure 22 Figure 22: Output from Question 17

Question 18: List all products that have been returned based on continent, country and region and order by the return date:

1
2
3
4
5
6
7
8
9
SELECT products.ProductKey,
ProductName,ReturnDate,
Continent, Country, Region
FROM returns
JOIN products ON 
returns.ProductKey = products.ProductKey
JOIN territories ON 
returns.TerritoryKey = territories.TerritoryKey
ORDER BY ReturnDate;

Figure 23 Figure 23: Output from Question 18

Question 19: Count the returned products group by region:

1
2
3
4
5
SELECT count(*) AS Total_Return, Region
FROM returns
JOIN territories ON
returns.TerritoryKey = territories.TerritoryKey
GROUP BY region;

Figure 24 Figure 24: Output from Question 19

Question 20: Find out the profit of the top 5 products for 2017:

1
2
3
4
5
SELECT products.ProductKey, ProductName,ProductCost,
ProductPrice, ProductPrice - ProductCost AS Profit, OrderDate
FROM sales_2017
JOIN products ON sales_2017.ProductKey = products.ProductKey
LIMIT 5;

Figure 25 Figure 25: Output from Question 20

Question 21: Find the average returns in each year:

1
2
3
4
5
6
7
8
SELECT '2017' AS Year, AVG(ReturnQuantity) AS Average_returns FROM returns
WHERE ReturnDate BETWEEN '2017-01-01' AND '2017-12-31'
UNION ALL
SELECT '2016' AS Year, AVG(ReturnQuantity) AS Average_returns FROM returns
WHERE ReturnDate BETWEEN '2016-01-01' AND '2016-12-31'
UNION ALL
SELECT '2015' AS Year, AVG(ReturnQuantity) AS Average_returns FROM returns
WHERE ReturnDate BETWEEN '2015-01-01' AND '2015-12-31';

Figure 26 Figure 26: Output from Question 21

Question 22: Find the total quantities orded in each year and at all times within each region:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
WITH cte2015 AS (
    SELECT Region, territories.TerritoryKey, territories.Country, SUM(OrderQuantity) as total_quantity
    FROM territories
    JOIN sales_2015
    ON territories.TerritoryKey = sales_2015.TerritoryKey
    GROUP BY Region, territories.Country, territories.TerritoryKey
), cte2016 AS (
    SELECT Region, territories.TerritoryKey, territories.Country, SUM(OrderQuantity) as total_quantity
    FROM territories
    JOIN sales_2016
    ON territories.TerritoryKey = sales_2016.TerritoryKey
    GROUP BY Region, territories.Country, territories.TerritoryKey
), cte2017 AS (
    SELECT Region, territories.TerritoryKey, territories.Country, SUM(OrderQuantity) as total_quantity
    FROM territories
    JOIN sales_2017
    ON territories.TerritoryKey = sales_2017.TerritoryKey
    GROUP BY Region, territories.Country, territories.TerritoryKey
), cte_all_times AS (
    SELECT Region, territories.TerritoryKey, territories.Country, SUM(OrderQuantity) as total_quantity
    FROM territories
    JOIN (SELECT * FROM sales_2015
    UNION ALL
    SELECT * FROM sales_2016
    UNION ALL
    SELECT * FROM sales_2017) s
    ON territories.TerritoryKey = s.TerritoryKey
    GROUP BY Region, territories.Country, territories.TerritoryKey
)
SELECT cte2015.Region, cte2015.TerritoryKey,
       MAX(cte2015.total_quantity) as total_quantities2015,
       MAX(cte2016.total_quantity) as total_quantities2016,
       MAX(cte2017.total_quantity) as total_quantities2017,
       MAX(cte_all_times.total_quantity) as total_quantities_all_times
FROM cte2015
JOIN cte2016
ON cte2015.Region = cte2016.Region and cte2015.TerritoryKey = cte2016.TerritoryKey
JOIN cte2017
ON cte2016.Region = cte2017.Region and cte2016.TerritoryKey = cte2017.TerritoryKey
JOIN cte_all_times
ON cte2017.Region = cte_all_times.Region and cte2017.TerritoryKey = cte_all_times.TerritoryKey
GROUP BY cte2015.Region, cte2015.TerritoryKey
ORDER BY total_quantities2015 DESC;

Figure 27 Figure 27: Output from Question 22

The script for these queries can be found in this repo and the Adventure work database is free to download from here.

This post is licensed under CC BY 4.0 by the author.