TRANSITIONING FROM SQL TO NOSQL | MONGODB DOCUMENT-BASED DATABASE PROJECT
The following article is a part of my Big Data Management Project for NoSQL via MongoDB. The first part of the project was done entirely in SQL and the details for that can be accessed here.
Introduction
After completed the Adventure works relational database in revious project,and applied SQL queries based on a given scenario that was proposed, now we’d like to develop a No-SQL document-based database using MongoDB Compass application and apply similar aggregations.
Several terms need to be defined clearly to get a clear understanding on the operations of MongoDB, which are,
- Documents: Similar to records or rows in a relational database tables
- Collections: A collection holds one or more documents, similar like a tables
- Database: Stores one or more collections of documents
For reference, Figure 1 illustrates the Entity Relationship Diagram (ERD) used in the previous SQL Project.
Figure 1: Entity Relationship Diagram
From Figure 1, it is understood that there are 10 tables and the process of converting these tables from MySQL into MongoDB Compass can be broken down into the steps below,
The very first step was exporting our tables as a JSON format from ‘adventureworks’ database stored in MySQL Workbench graphical tool.
The following steps would be to import the exported data from MySQL into MongoDB Compass, start by creating a database in MongoDB Compass with the specified name. Next, establish a collection within this database. After setting up the database and collection, proceed to import the data in JSON format into the newly created collection. Repeat these steps for each collection. These instructions are illustrated in Figure 2 and Figure 3.
Figure 2: Create Database and Collection in MongoDB Compass
Figure 2: Import .json files into Collections
Document design
The tables “sales_2015”, “sales_2016” and “sales_2017” have been stored as separate documents in a “sales” collection and referenced from the “customers” and products” collections.
However, as we had ten different tables in SQL database, now we only have five collections in our document-based database named “adventureworks” that involves our nine tables we discussed above. These collections can be summarized in the Figure 1. Below
Figure 4: Total of five Collections
Aggregation Pipeline Code
- Question One:
- Find all the products profit and identify them by their names in ascending order
adventureworks.products.aggregate([
{
$project: {
ProductName: 1,
ProductCost: 1,
ProductPrice: 1,
Profit: {
$subtract: [
"$ProductPrice",
"$ProductCost",
],
},
},
},
{
$sort: {
Profit: -1,
},
},
])
- Question Two:
- List all the customers that their annual income is less than 20,000 and bought products in 2015.
adventureworks.sales.aggregate([
{
$lookup: {
from: "products",
localField: "ProductKey",
foreignField: "ProductKey",
as: "product_info",
},
},
{
$unwind: "$product_info",
},
{
$lookup: {
from: "customers",
localField: "CustomerKey",
foreignField: "CustomerKey",
as: "customer_info",
},
},
{
$unwind: "$customer_info",
},
{
$match: {
"customer_info.AnnualIncome": {
$lt: 20000,
},
},
},
{
$addFields: {
OrderDate: {
$toDate: "$OrderDate",
},
},
},
{
$project: {
FirstName: "$customer_info.FirstName",
LastName: "$customer_info.LastName",
AnnualIncome: "$customer_info.AnnualIncome",
ProductName: "$product_info.ProductName",
Year: {
$year: "$OrderDate",
},
},
},
])
- Question 3:
- List all customers and their order quantities in the year 2017
adventureworks.sales.aggregate([
{
$lookup: {
from: "customers",
localField: "CustomerKey",
foreignField: "CustomerKey",
as: "customer_info",
},
},
{
$lookup: {
from: "products",
localField: "ProductKey",
foreignField: "ProductKey",
as: "product_info",
},
},
{
$unwind: "$customer_info",
},
{
$unwind: "$product_info",
},
{
$addFields: {
OrderDate: {
$toDate: "$OrderDate",
},
},
},
{
$match: {
OrderDate: {
$gte: ISODate("2017-01-01"),
$lt: ISODate("2018-01-01"),
},
},
},
{
$group: {
_id: "$customer_info.CustomerKey",
FirstName: {
$first: "$customer_info.FirstName",
},
LastName: {
$first: "$customer_info.LastName",
},
ProductName: {
$first: "$product_info.ProductName",
},
OrderQuantity: {
$sum: "$OrderQuantity",
},
OrderDate: {
$first: "$OrderDate",
},
},
},
{ $sort: { OrderQuantity: -1 } },
{
$project: {
FirstName: 1,
LastName: 1,
ProductName: 1,
OrderQuantity: 1,
Year: {
$year: "$OrderDate",
},
_id: 0,
},
},
])
- Question 4:
- Count the products that purchased the same item in all years.
adventureworks.sales.aggregate([
{
$lookup: {
from: "customers",
localField: "CustomerKey",
foreignField: "CustomerKey",
as: "customer_info",
},
},
{
$lookup: {
from: "products",
localField: "ProductKey",
foreignField: "ProductKey",
as: "product_info",
},
},
{
$unwind: "$customer_info",
},
{
$unwind: "$product_info",
},
{
$group: {
_id: "$product_info.ProductName",
quantity_sold: {
$sum: "$OrderQuantity",
},
},
},
{
$project: {
_id: 0,
ProductName: "$_id",
quantity_sold: 1,
},
},
{
$sort: {
quantity_sold: -1,
},
},
])
- Question 5:
- Count the returned products group by region.
adventureworks.returns.aggregate([
{
$lookup: {
from: "territories",
localField: "TerritoryKey",
foreignField: "TerritoryKey",
as: "territory_info",
},
},
{
$unwind: "$territory_info",
},
{
$group: {
_id: "$territory_info.Region",
Total_Return: {
$sum: 1,
},
},
},
{
$sort: {
Total_Return: -1,
},
},
{
$project: {
Total_Return: 1,
Region: "$_id",
_id: 0,
},
},
])
Question 6:
adventureworks.sales.aggregate([
{
$addFields: {
OrderDate: {
$toDate: "$OrderDate",
},
},
},
{
$match: {
OrderDate: {
$gte: Date("2017-01-01"),
$lt: Date("2018-01-01"),
},
},
},
{
$lookup: {
from: "products",
localField: "ProductKey",
foreignField: "ProductKey",
as: "product_info",
},
},
{
$unwind: "$product_info",
},
{
$addFields: {
Profit: {
$subtract: [
"$product_info.ProductPrice",
"$product_info.ProductCost",
],
},
},
},
{
$project: {
ProductKey: "$product_info.ProductKey",
ProductName: "$product_info.ProductName",
ProductCost: "$product_info.ProductCost",
ProductPrice: "$product_info.ProductPrice",
Profit: 1,
Year: {
$year: "$OrderDate",
},
_id: 0,
},
},
{
$limit: 5,
},
])
- Question 7:
- Find the total returns in each year (2015, 2016, 2017)
adventureworks.returns.aggregate([
{
$addFields: {
ReturnDate: {
$toDate: "$ReturnDate",
},
},
},
{
$facet: {
year_2015: [
{
$match: {
ReturnDate: {
$gte: new Date("2015-01-01"),
$lte: new Date("2015-12-31"),
},
},
},
{
$group: {
_id: null,
Total_Returns: {
$sum: "$ReturnQuantity",
},
},
},
{
$project: {
Year: {
$literal: "2015",
},
Total_Returns: 1,
_id: 0,
},
},
],
year_2016: [
{
$match: {
ReturnDate: {
$gte: new Date("2016-01-01"),
$lte: new Date("2016-12-31"),
},
},
},
{
$group: {
_id: null,
Total_Returns: {
$sum: "$ReturnQuantity",
},
},
},
{
$project: {
Year: {
$literal: "2016",
},
Total_Returns: 1,
_id: 0,
},
},
],
year_2017: [
{
$match: {
ReturnDate: {
$gte: new Date("2017-01-01"),
$lte: new Date("2017-12-31"),
},
},
},
{
$group: {
_id: null,
Total_Returns: {
$sum: "$ReturnQuantity",
},
},
},
{
$project: {
Year: {
$literal: "2017",
},
Total_Returns: 1,
_id: 0,
},
},
],
},
},
{
$project: {
results: {
$concatArrays: [
"$year_2015",
"$year_2016",
"$year_2017",
],
},
},
},
{
$unwind: "$results",
},
{
$replaceRoot: {
newRoot: "$results",
},
},
])
Data-Models Discussion
The data files, MongoDB queries and output files mentioned in this article are hosted in my Github Repo