OLIST E-COMMERCE REPORTING DASHBOARD
This article is a component of my Power BI project, which centers around OLIST, a fictitious Brazilian e-commerce platform. It utilizes Microsoft Power BI to create dashboards covering various aspects of OLIST’s Brazilian e-commerce operations, including Executive Insights, Exploratory Analysis, Customer Investigation and Satisfaction, Delivery Analysis, and Forecasting.
Executive Summary
- Data synopsis: Brazilian E-commerce Public Dataset: Retail datasets of 100k orders placed on Olist spanning between October’2016 and September’2018 across several states. Information is trickled with price, orders, order status, payment, freight and user review along with many other parameters.
Introduction
Company Background
The Olist store is an e-commerce business headquartered in Sao Paulo, Brazil. This firm acts as a single point of contact between various small businesses and the customers who wish to buy their products. Recently, they uploaded a dataset on Kaggle that contains information about 100k orders made at multiple marketplaces between 2016 to 2018. What we purchase on e-commerce websites is affected by the reviews which we read about the product posted on that website. This firm can certainly leverage these reviews to remove those products which consistently receive negative reviews. It could also advertise those items which are popular amongst the customers.
Formulation of business problem
Organizational structure (Stakeholder)
Dataset
The data consist of almost 100 000 customer id and order id. To summarize, the data consist of order detail, customer detail, product detail, seller detail, payment detail, geolocation detail and review detail. The entity-relationship model is shown in Figure 1 to help understand how the data interrelate with each other. Kaggle
Figure 1: Entitiy-relationship model diagram of Olist Dataset
Table | Description |
---|---|
olist_orders_dataset | Connected to 4 other tables; used for order details. |
olist_order_items_dataset | Contains item details: shipping date, price, etc. |
olist_order_reviews_dataset | Contains customer reviews on purchased products. |
olist_products_dataset | Product info: ID, category, measurements. |
olist_order_payments_dataset | Payment details related to orders. |
olist_customers_dataset | Customer base information for the firm. |
olist_sellers_dataset | Information on registered sellers. |
olist_geolocation_dataset | Geographical data for sellers and customers. |
Objectives of data stories
- How many customers, orders, and orders per customer does the company have?
- What is the number of customers by state?
- What is the number of orders by month?
- What are the top and bottom 5 product categories?
- Visualize the company’s customers’ demographics, sales trend, orders by categories, orders changes by year
Visualization Analytics
Data model
Entity | Relationship Type | Related Entity | Key |
---|---|---|---|
Sellers | One-to-Many | Geolocation | geolocation_zip_code_prefix |
Sellers | One-to-Many | Orders | orderID |
Sellers | Many-to-One | Brazil State | StateID |
Brazil State | One-to-Many | Customers & sellers | StateID |
Geolocation | Many-to-One | Sellers | zip code prefix |
Geolocation | Many-to-Many | Customers | zip code prefix |
Customers | One-to-One | Order Dates | customerID |
Products | Many-to-One | Product Category | ProductID |
Products | One-to-Many | Orders | ProductID |
Orders | Many-to-One | Order Dates | OrderID |
Order Dates | One-to-Many | Payments | OrderID |
Order Dates | One-to-Many | Reviews | ReviewID |
Order Dates | Many-to-One | Weekday | Date |
Table analysis
I will dive into some of these tables that were added, I have renamed them to make it easier catch and this will obtain as the final data model that allows us to make analytics.
- Orders Table is at the center of the data model. It represents the most important part of the dataset describing best what can be obtained from the data. Thus, in this case, it about the orders made by a customer buying a specific product. This table contains only primary keys that are necessary to infer knowledge about the data. I have created two formula expression using DAX through this table namely “% of sales” to divide the price of the total price and shows a percentage as a profit ratio, and “qt ordered” where this counts the unique orders.
1
% of sales = DIVIDE([price],SUM([price]),0)
1
qt order id ordered = CALCULATE(COUNT(orders[order_id]),ALLEXCEPT(orders,orders[order_id]))
The geolocation table is the Space table represented by geolocation that is now connected to the data model in order to use the information it contains for the two other tables: olist_customer and olist_seller. Indeed, with the olist_geolocation table, we are able to create secondary keys about customers and sellers to obtain knowledge about their postal code, city or state that were previously impossible to understand. Moreover, we created a hierarchy in order to go from the country and drill down to the state then the city
Order dates have a precise temporal relationship with the data time. Thanks to this table, it is possible to gain insights about temporal data by classifying it chronologically. In the “orders_date” table, we can distinguish the time difference in days between orders, allowing us to group them by the order date and obtain indicators such as deliveries or purchases. Furthermore, this table enables us to drill down from the year to the quarter, month, and day.
I have created a “delivery_days” expression that calculates the difference between the estimated date and the delivered date in days.
1
delivery_days = DATEDIFF('orders_dates'[order_estimated_DATE],'orders_dates'[order_delivered_DATE],DAY)
Additionally, there is a “delivery indicator” expression formula that indicates whether the delivery occurred before or after the estimated date. By utilizing both the “delivery_days” and “delivery indicator” expressions, we can determine how many days it takes for a delivery and how many deliveries took more than 100 days, for example, using the following DAX formula:
1
2
3
4
5
delivery_indicator = IF('orders_dates'[delivery_days]>0,"In advance",
IF('orders_dates'[delivery_days]=0,
"On time",
"Late"
))
Another clever set of expressions includes “time_day” and “time_hour,” which serve to differentiate the time between approval, delivery, and orders in days and hours, respectively.
1
time_day (approved vs delivered) = (DATEDIFF(orders_dates[order_approved_at],orders_dates[order_delivered_customer_date],DAY))
- In the review table, the goal was to generate meaningful indicators that could be analyzed to derive insights about the products. Therefore, I have introduced the “review_indicator,” which determines whether a comment has been made after a purchase or not. This indicator will provide us with information about how customers feel about the product, whether they liked it or not, and if they are inclined to share their experiences with others. The “review_indicator” categorizes the reviews based on recurrent keywords. This will enable us to identify the most descriptive aspects of the products when customers share similar opinions.
1
review_indicator = IF(reviews[review_comment_message]=="" || reviews[review_comment_message]=="-","No Comment","With Comment")
- The Brazil state table displays the state ID and city names and is modeled in a one-to-many relationship with both the sellers and customers tables. By utilizing this distinct table, we can perform analyses exclusively within Brazil, leveraging its extensive dataset, while excluding the rest of the world for descriptive analysis.
Dashboards
Figure 5: Executive Insights by Decisive Data
Frequently, the question “How are we performing?” can lead to a cascade of further questions, spinoffs, and investigative research. This is especially true for globally-oriented companies. I aimed to create a report that proactively addresses this kind of exploration. The purpose of this report is to facilitate data-driven decision-making while emphasizing user flexibility and visual analysis. As a result, this dashboard can adapt to the evolving needs of the global business.
The Executive Insights page highlights the strong focus of this dashboard on sales and customers, with the goal of fulfilling objectives, increasing customer satisfaction, and boosting sales by uncovering insights from the dashboard. This is achieved after creating three unique formulas to serve in this dashboard:
1
Total sales = SUM('orders'[price]) + SUM('orders'[freight_value])
1
count customer(unique ) = DISTINCTCOUNT(customers[customer_unique_id])
1
count orders = COUNT(orders[order_id])
Upon revisiting the figures, it becomes clear that Sao Paulo consistently leads in total sales. Despite a rapid increase in sales over the past three years, the waterfall graph does not indicate a decrease in values.
It is evident that a significant number of both late and early deliveries can be observed in the data. As illustrated, there are five orders that took more than 100 days to deliver, which undoubtedly had an impact on customer satisfaction.
Figure 6: Descriptive Analytics
Next, let’s delve deeper into descriptive analytics. We have multiple options at our disposal, such as selecting a specific day, month, or year from the slicer at the top of the page.
The table graph breaks down the product categories based on the features of their average price, the sum of prices (revenue), the profit ratio, and the number of quantities customers ordered in each category. The top-performing category is “health and beauty,” which has received the most orders. This isn’t surprising, given that females are known to spend more on fashion. This category has generated $772,238 in revenue with a profit ratio of 6%.
On the descriptive analysis page, we can observe that we’ve achieved approximately $16 million in sales, served 94,000 customers, and processed more than 100,000 orders.
Figure 6: Customer Investigation
Every business has wondered about the recent additions to their customer base. Customers are the driving force behind organizational growth. With their support, they can increase revenue, and without them, sustaining growth becomes a challenge. This is why conducting customer investigations is of paramount importance. This page illustrates that among nearly 100,000 customers.
Returning to the visuals in Figure 7, it’s observed that the majority of new customers tend to join between May and August over the course of three years. The preferred payment methods are credit card and boleto payment.
The top three customers have made purchases of more than 21 items, amounting to approximately four thousand dollars. On Mondays, the highest order quantities are observed, while on Sundays, the highest average spending per order is recorded. This pattern aligns with the common tendency for people to spend more on weekends.
Figure 7: Customer Satisfaction
When we examine the charts in Figure 8 above, it becomes evident that the overall average rating appears quite positive, with an average of 4 stars from approximately 99.5k customers. However, there is a notable decrease in the number of quantities ordered in December, following a slight increase three months prior.
Among the top five selling categories, there are approximately 40k orders out of the total 99k, while the bottom five categories haven’t exceeded 60 orders. This disparity underscores the variation in product popularity.
Furthermore, the ratio of recurrent customers has decreased by 1% compared to the two years prior. This may be attributed to the significant increase in orders during the last year, and there is no conclusive evidence to suggest that customers have become less loyal. The formula used for calculating the recurrent customers ratio is provided below:
1
recurrent customers ratio = CALCULATE(DIVIDE([recurrent count], [count customer(unique )]))
The delivery page in Figure 8, provides insight into the process of delivering products from the seller’s location to the customer’s desired destination. The freight value appears acceptable, and there were 96.4 orders delivered out of the total orders, even considering the canceled orders.
We can observe that the average review score is not very high during the first quarter. This could be attributed to longer delivery times during this period. In contrast, the third quarter exhibits higher review scores despite a relatively shorter delivery duration.
Figure 9: Forecast The question on the minds of Olist’s leaders is about their expected annual growth in the upcoming years. In any case, the annual growth appears promising, as indicated by the highest order quantities in all three years and the increased predictions for new customer acquisitions.
Insights
Recommendations
Monitor and analyze customer reviews regularly to identify trends and areas for improvement. This could involve using data analysis tools to identify common themes in customer feedback and using this information to make changes and improve the customer experience.
Investigate the causes of undelivered orders. This could involve analyzing the undelivered orders to identify common themes or factors that may be contributing to the problem. For example, are certain regions or customer demographics more likely to have undelivered orders? Are there particular products or types of orders that are more likely to be undelivered?
Communicate with customers about the delivery process. Olist should be transparent with customers about the delivery process and provide them with regular updates on the status of their orders. This will help to build trust and create a positive customer experience. It will also give customers the opportunity to provide feedback on their experiences with the delivery process, which can be used to identify areas for improvement.
Overall, the key is to continue providing high-quality products and services, while also being responsive to customer feedback and working to improve areas that may need attention. Also regularly monitor and review the delivery success rate and communicate with customers about the process.
Suggestions:
Special offerings to boost overall sales on low sales period .
Improve bottom selling categories by providing advertisements or promotions.
Outsourcing drivers for delivery during Sales or Festival periods.
Investigate and Review the partner company with low review score.
analysis customers comments and reviews provided in the dataset with NLP or any kind of language processing models.