Analyzing Sales Performance Using SQL

Proejct Goals:

In this SQL project, we need to answer allowing questions:

  1. What is the most popular genre in USA?
  2. Which new albums the record store should purchase?
  3. Who is the most productive sales support agent?
  4. How much our customers spent on average per order in different countries?
  5. Shall we keep both purchase options open? Albums vs Individual Tracks

We’ll use the Chinook database in this project and here is a copy of the database schema is below:

Step 1.Connent to the Database

conda install -yc conda-forge ipython-sql
Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.


Note: you may need to restart the kernel to use updated packages.
%%capture
%load_ext sql
%sql sqlite:///chinook.db

Step 2. Overview of the Data And Views

To run SQL queries in Jupyter Notebook, we have to add %%sql on its own line to the start of our query. So to execute the query above:

%%sql
SELECT
    name,
    type
FROM sqlite_master
WHERE type IN ("table","view");
 * sqlite:///chinook.db
Done.
name type
album table
artist table
customer table
employee table
genre table
invoice table
invoice_line table
media_type table
playlist table
playlist_track table
track table

%%sql

WITH usa_track_sold AS
   (SELECT il.* from invoice_line il
    INNER JOIN invoice i on il.invoice_id = i.invoice_id
    INNER JOIN customer c on i.customer_id = c.customer_id
    WHERE c.country = "USA"
   )
    
SELECT g.name genre, 
       count(uts.invoice_line_id) track_sold,
       cast(count(uts.invoice_line_id) AS FLOAT)/(
         SELECT COUNT(*) from usa_track_sold) percentage_sold
from usa_track_sold uts
INNER JOIN track t on t.track_id = uts.track_id
INNER JOIN genre g on g.genre_id = t.genre_id 
GROUP BY genre
ORDER BY track_sold DESC;

 * sqlite:///chinook.db
Done.
genre track_sold percentage_sold
Rock 561 0.5337773549000951
Alternative & Punk 130 0.12369172216936251
Metal 124 0.11798287345385347
R&B/Soul 53 0.05042816365366318
Blues 36 0.03425309229305423
Alternative 35 0.03330161750713606
Pop 22 0.02093244529019981
Latin 22 0.02093244529019981
Hip Hop/Rap 20 0.019029495718363463
Jazz 14 0.013320647002854425
Easy Listening 13 0.012369172216936251
Reggae 6 0.005708848715509039
Electronica/Dance 5 0.004757373929590866
Classical 4 0.003805899143672693
Heavy Metal 3 0.0028544243577545195
Soundtrack 2 0.0019029495718363464
TV Shows 1 0.0009514747859181732

Q2. Which new albums the record store should purchase?

Record store has just signed a deal with a new record label, and we need to select the first three albums that will be added to the store, from a list of four.

Artist Name Genre
Regal Hip-Hop
Red Tone Punk
Meteor and the Girls Pop
slim Jim Bites Blues

Based on the sales of tracks accross different genres in the USA, we should purchase the new albums by the following artists:

  • Red Tone (Punk)
  • Slim Jim Bites (Blues)
  • Meteor and the Girls (Pop)

These three genres only accounts 17% of total sales, so we should lookout for artists and albums for the ‘rock’ genre, which make up 53% of sales.

Step 4. Analyzing Employee Sales Performance

Q3. Who is the most productive sales agent?

%%sql

WITH customer_support_rep_sales AS
    (
     SELECT
         i.customer_id,
         c.support_rep_id,
         SUM(i.total) total
     FROM invoice i
     INNER JOIN customer c ON i.customer_id = c.customer_id
     GROUP BY 1,2
    )

SELECT
    e.first_name || " " || e.last_name employee,
    e.hire_date,
    SUM(csrs.total) total_sales
FROM customer_support_rep_sales csrs
INNER JOIN employee e ON e.employee_id = csrs.support_rep_id
GROUP BY 1;
 * sqlite:///chinook.db
Done.
employee hire_date total_sales
Jane Peacock 2017-04-01 00:00:00 1731.5099999999998
Margaret Park 2017-05-03 00:00:00 1584.0000000000002
Steve Johnson 2017-10-17 00:00:00 1393.92

ane Peacock is the most productive employee in the store, and there is a 20% difference in sales between Jane and Steve. However, there is six months difference in their hiring date. This may explain why Steve ranks at the bottom.

Step 5. Analyzing Sales by Country

Q4. How much our customers spent on average per order in different countries?

We need to analyze the sales data for customers from each different country on:

  • total number of customers
  • total value of sales
  • average value of sales per customer
  • average order value

Since some countries with only one customer, we will group these customers as “Other.” In this project, we want to keep “Others” to the bottom of our results. Inside the subquery, we selected all the values from invoce_line (il) and customer_id and added a country column using a case statement before sorting using that new column in the outer query.

%%sql

WITH country_or_other AS
    (
     SELECT
       CASE
           WHEN (
                 SELECT count(*)
                 FROM customer
                 where country = c.country
                ) = 1 THEN "Other"
           ELSE c.country
       END AS country,
       c.customer_id,
       il.*
     FROM invoice_line il
     INNER JOIN invoice i ON i.invoice_id = il.invoice_id
     INNER JOIN customer c ON c.customer_id = i.customer_id
    )
    
SELECT
        country,
        count(distinct customer_id) customers,
        SUM(unit_price) total_sales,
        SUM(unit_price) / count(distinct customer_id) customer_lifetime_value,
        SUM(unit_price) / count(distinct invoice_id) average_order,
        CASE
            WHEN country = "Other" THEN 1
            ELSE 0
        END AS sort
    FROM country_or_other
    GROUP BY country
    ORDER BY sort ASC, total_sales DESC
 * sqlite:///chinook.db
Done.
country customers total_sales customer_lifetime_value average_order sort
USA 13 1040.490000000008 80.03769230769292 7.942671755725252 0
Canada 8 535.5900000000034 66.94875000000043 7.047236842105309 0
Brazil 5 427.68000000000245 85.53600000000048 7.011147540983647 0
France 5 389.0700000000021 77.81400000000042 7.781400000000042 0
Germany 4 334.6200000000016 83.6550000000004 8.161463414634186 0
Czech Republic 2 273.24000000000103 136.62000000000052 9.108000000000034 0
United Kingdom 3 245.52000000000078 81.84000000000026 8.768571428571457 0
Portugal 2 185.13000000000022 92.56500000000011 6.3837931034482835 0
India 2 183.1500000000002 91.5750000000001 8.72142857142858 0
Other 15 1094.9400000000085 72.99600000000056 7.448571428571486 1

Based on the data, there maybe opportunities in the following countries:

  • Czech Republic
  • United Kingdom
  • India

It’s worth keeping in mind that the amount of data from each of these countries is relatively small. Thus, we should be cautious about spending too much money on new marketing campaigns, as the sample size is not large enough to give us high confidence. A better approach would be to run small campaigns in these countries, collecting and analyzing new customers to make sure that these trends hold with new customers.

Step 6. Albums vs Individual Tracks

Q5. Shall we keep both purchase options open?

%%sql

WITH invoice_first_track AS
    (
     SELECT
         il.invoice_id invoice_id,
         MIN(il.track_id) first_track_id
     FROM invoice_line il
     GROUP BY 1
    )

SELECT
    album_purchase,
    COUNT(invoice_id) number_of_invoices,
    CAST(count(invoice_id) AS FLOAT) / (
                                         SELECT COUNT(*) FROM invoice
                                      ) percent
FROM
    (
    SELECT
        ifs.*,
        CASE
            WHEN
                 (
                  SELECT t.track_id FROM track t
                  WHERE t.album_id = (
                                      SELECT t2.album_id FROM track t2
                                      WHERE t2.track_id = ifs.first_track_id
                                     ) 

                  EXCEPT 

                  SELECT il2.track_id FROM invoice_line il2
                  WHERE il2.invoice_id = ifs.invoice_id
                 ) IS NULL
             AND
                 (
                  SELECT il2.track_id FROM invoice_line il2
                  WHERE il2.invoice_id = ifs.invoice_id

                  EXCEPT 

                  SELECT t.track_id FROM track t
                  WHERE t.album_id = (
                                      SELECT t2.album_id FROM track t2
                                      WHERE t2.track_id = ifs.first_track_id
                                     ) 
                 ) IS NULL
             THEN "yes"
             ELSE "no"
         END AS "album_purchase"
     FROM invoice_first_track ifs
    )
GROUP BY album_purchase;
 * sqlite:///chinook.db
Done.
album_purchase number_of_invoices percent
no 500 0.8143322475570033
yes 114 0.18566775244299674

Album purchases account for 18.6% of sales. Based on the above table. We should encourage customers to purchase only tracks from the album and keep both purchase options available for potential customers.

Conclusion:

Based on our data, Rock is the most popular genre in the United States, and we recommend record store lookout for artists and albums for the ‘rock’ genre. The business should select the following new record labels based on historical sales data: Red Tone, Slim Jim Bites & Meteor and the Girls.

In terms of employee performance, Jane Peacock is the top-performing agent. We recommend giving her a promotion and letting her open a workshop to share her experience with other employees. Besides that, our record store might have opportunities to expand into the Czech Republic, UK and India. However, we recommend conducting more market research before spending a large campaign budget in these countries.

Finally, we would recommend the record store keeping both ‘tracks and the album’ purchase options available for our customers.