Descriptive analytics: Finding the problem

Link to datasets: https://www.kaggle.com/olistbr/brazilian-ecommerce

Looking into delivery

Focus on delivered orders only

Focus on 2017 and 2018 only, since there is very little data from 2016.
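A minimal sketch of the two filters above, run on a tiny stand-in table. The column names `order_status` and `order_purchase_timestamp` are assumed to follow the Kaggle orders file, and `filter_orders` is a hypothetical helper name, not necessarily how the notebook does it.

```python
import pandas as pd

# Tiny stand-in for the orders table (column names assumed from the Kaggle schema).
orders = pd.DataFrame({
    "order_id": ["a", "b", "c", "d"],
    "order_status": ["delivered", "delivered", "canceled", "delivered"],
    "order_purchase_timestamp": pd.to_datetime(
        ["2016-10-04", "2017-05-10", "2017-06-01", "2018-03-15"]
    ),
})


def filter_orders(df, years=(2017, 2018)):
    """Keep only delivered orders purchased in the given years."""
    is_delivered = df["order_status"] == "delivered"
    in_years = df["order_purchase_timestamp"].dt.year.isin(years)
    return df[is_delivered & in_years].copy()


delivered = filter_orders(orders)
print(delivered["order_id"].tolist())
```

Filtering on the purchase year (rather than the delivery year) is a choice made for this sketch; an order bought in late 2017 and delivered in 2018 counts as 2017 here.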

Overall sales increased in 2018.

Counting delivery speed and delay
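One way to compute both numbers is sketched below. It assumes delivery duration is measured from purchase to customer delivery, and that delay is the delivered date minus the estimated date, so a negative delay means the order arrived early. The column names are assumed from the Kaggle orders file and `add_delivery_metrics` is a hypothetical helper.

```python
import pandas as pd

# Stand-in rows; column names assumed from the Kaggle orders schema.
orders = pd.DataFrame({
    "order_purchase_timestamp": pd.to_datetime(["2017-05-01", "2018-03-01"]),
    "order_delivered_customer_date": pd.to_datetime(["2017-05-13", "2018-03-10"]),
    "order_estimated_delivery_date": pd.to_datetime(["2017-05-25", "2018-03-21"]),
})


def add_delivery_metrics(df):
    """Add delivery duration and delay (both in whole days) plus purchase year."""
    out = df.copy()
    delivered = out["order_delivered_customer_date"]
    # Duration: purchase -> delivered to customer (assumed definition).
    out["delivery_days"] = (delivered - out["order_purchase_timestamp"]).dt.days
    # Delay: delivered - estimated; negative values mean early delivery.
    out["delay_days"] = (delivered - out["order_estimated_delivery_date"]).dt.days
    out["year"] = out["order_purchase_timestamp"].dt.year
    return out


metrics = add_delivery_metrics(orders)
yearly = metrics.groupby("year")[["delivery_days", "delay_days"]].mean()
print(yearly)
```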

Problem identified: On average, delivery duration decreased in 2018, and the average delay is beyond -10 days, meaning orders arrive more than 10 days ahead of the estimated date. Still, at an average of 9 days per delivery, there is room for improvement.

Descriptive analytics: Risks of long delivery time

Risk: None of the customers were returning customers.

Because every customer placed only one order and none returned, the revenue generated by each customer was mostly only around 50 to 150.

I used Google Translate to translate the reviews from Portuguese to English, then saved the output as a local HTML file, so I am scraping the translations from that HTML file.

Based on the number of scraped comments, I can assume that all comments were translated and scraped successfully.

Because Google Translate only accepts .xlsx, I have to read the Excel file back into a dataframe to map each translation back to its review id.

While the average review score is above 3, it tends to be lower for reviews related to delivery. Just as delivery speed improved somewhat, these delivery-related scores improved slightly too.
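The comparison can be sketched with a simple keyword flag on the translated comments. This is only an assumption about how delivery-related reviews might be identified: the column `review_comment_en`, the keyword pattern, and the helper name are all hypothetical.

```python
import pandas as pd

# Stand-in reviews; `review_comment_en` is a hypothetical name for the
# translated comment column.
reviews = pd.DataFrame({
    "review_score": [5, 1, 2, 4],
    "review_comment_en": [
        "Great product",
        "Delivery was late",
        "Still not delivered",
        None,  # review without a comment
    ],
})

# Hypothetical keyword list; a real analysis would need a more careful one.
DELIVERY_PATTERN = r"deliver|arriv|shipping"


def flag_delivery_reviews(df, pattern=DELIVERY_PATTERN):
    """Mark reviews whose translated comment matches a delivery keyword."""
    out = df.copy()
    out["mentions_delivery"] = out["review_comment_en"].str.contains(
        pattern, case=False, na=False, regex=True
    )
    return out


flagged = flag_delivery_reviews(reviews)
is_delivery = flagged["mentions_delivery"]
avg_delivery = flagged.loc[is_delivery, "review_score"].mean()
avg_other = flagged.loc[~is_delivery, "review_score"].mean()
print(avg_delivery, avg_other)
```

Reviews with no comment are treated as not mentioning delivery (`na=False`), which keeps them in the comparison group rather than dropping them.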

Average rating for reviews that mention delivery has increased.

Average days taken to deliver has decreased for reviews that mention delivery.

Diagnostic analytics: Why did the problem happen?

It usually takes around 45 minutes for an order to be approved. This duration has increased in 2018.

The majority of payments were made by credit card, which was also the fastest to be approved. Boleto takes around 5 hours to be approved, followed by debit cards at more than 50 minutes. In fact, more customers are using Boleto in 2018.
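A rough sketch of how approval time per payment method can be measured, assuming approval time is `order_approved_at` minus `order_purchase_timestamp` and that payment types come from a separate payments table joined on `order_id` (column names assumed from the Kaggle files). Using the median as the "typical" value is a choice made for this sketch, and `median_approval_minutes` is a hypothetical helper.

```python
import pandas as pd

# Stand-in tables; column names assumed from the Kaggle orders/payments files.
orders = pd.DataFrame({
    "order_id": ["a", "b", "c"],
    "order_purchase_timestamp": pd.to_datetime(
        ["2018-01-01 10:00", "2018-01-01 12:00", "2018-01-02 09:00"]
    ),
    "order_approved_at": pd.to_datetime(
        ["2018-01-01 10:15", "2018-01-01 17:00", "2018-01-02 09:45"]
    ),
})
payments = pd.DataFrame({
    "order_id": ["a", "b", "c"],
    "payment_type": ["credit_card", "boleto", "credit_card"],
})


def median_approval_minutes(orders, payments):
    """Median minutes from purchase to approval, per payment type."""
    # An order can have several payment rows; keep one row per (order, type).
    types = payments[["order_id", "payment_type"]].drop_duplicates()
    merged = orders.merge(types, on="order_id", how="inner")
    waited = merged["order_approved_at"] - merged["order_purchase_timestamp"]
    merged["approval_minutes"] = waited.dt.total_seconds() / 60
    return merged.groupby("payment_type")["approval_minutes"].median()


approval = median_approval_minutes(orders, payments)
print(approval)
```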

Although approval for Boleto payments can take as much as 5 hours, the majority of a customer's waiting time goes into the delivery process.

Translating product category name

There is no 'others' category, so missing category names are treated as 'others'.
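A small sketch of the translation step with the 'others' fallback, assuming a products table and a translation table keyed on `product_category_name` with an English column `product_category_name_english` (names assumed from the Kaggle files). `translate_categories` is a hypothetical helper.

```python
import pandas as pd

# Stand-in tables; column names assumed from the Kaggle products and
# category-translation files.
products = pd.DataFrame({
    "product_id": ["p1", "p2", "p3"],
    "product_category_name": ["moveis_decoracao", None, "beleza_saude"],
})
translation = pd.DataFrame({
    "product_category_name": ["moveis_decoracao", "beleza_saude"],
    "product_category_name_english": ["furniture_decor", "health_beauty"],
})


def translate_categories(products, translation):
    """Attach English category names; missing or unmapped ones become 'others'."""
    out = products.merge(translation, on="product_category_name", how="left")
    out["product_category_name_english"] = (
        out["product_category_name_english"].fillna("others")
    )
    return out


translated = translate_categories(products, translation)
print(translated["product_category_name_english"].tolist())
```

The left join keeps every product, so categories that exist but have no entry in the translation table also end up as 'others' in this sketch.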

The heaviest and largest (by volume) product categories are furniture categories.

Order weight and volume are not correlated with delivery time.

The majority of customers live in Sao Paulo.

The majority of sellers are also in Sao Paulo.

Custom function to get the distance between two coordinates
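The exact function used here is not shown, so the following is a sketch using the haversine formula, a common choice for great-circle distance between two latitude/longitude points. The function name and the Earth radius constant are assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Example: approximate city-centre coordinates for Sao Paulo and Rio de Janeiro.
distance = haversine_km(-23.5505, -46.6333, -22.9068, -43.1729)
print(round(distance), "km")
```

This gives straight-line distance over the Earth's surface, not road distance, which may be one reason distance alone says little about delivery time.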

Distance is not a reliable predictor of delivery time.