!pip install sqlalchemy matplotlib

Requirement already satisfied: sqlalchemy in /home/kafa46/workspace/web-apps/deepshark/venv/lib/python3.10/site-packages (2.0.38)
Requirement already satisfied: matplotlib in /home/kafa46/workspace/web-apps/deepshark/venv/lib/python3.10/site-packages (3.10.1)
Requirement already satisfied: typing-extensions>=4.6.0 in /home/kafa46/workspace/web-apps/deepshark/venv/lib/python3.10/site-packages (from sqlalchemy) (4.12.2)
Requirement already satisfied: greenlet!=0.4.17 in /home/kafa46/workspace/web-apps/deepshark/venv/lib/python3.10/site-packages (from sqlalchemy) (3.1.1)
Requirement already satisfied: pyparsing>=2.3.1 in /home/kafa46/workspace/web-apps/deepshark/venv/lib/python3.10/site-packages (from matplotlib) (3.2.3)
Requirement already satisfied: pillow>=8 in /home/kafa46/workspace/web-apps/deepshark/venv/lib/python3.10/site-packages (from matplotlib) (11.2.1)
Requirement already satisfied: fonttools>=4.22.0 in /home/kafa46/workspace/web-apps/deepshark/venv/lib/python3.10/site-packages (from matplotlib) (4.57.0)
Requirement already satisfied: python-dateutil>=2.7 in /home/kafa46/workspace/web-apps/deepshark/venv/lib/python3.10/site-packages (from matplotlib) (2.9.0.post0)
Requirement already satisfied: packaging>=20.0 in /home/kafa46/workspace/web-apps/deepshark/venv/lib/python3.10/site-packages (from matplotlib) (25.0)
Requirement already satisfied: numpy>=1.23 in /home/kafa46/workspace/web-apps/deepshark/venv/lib/python3.10/site-packages (from matplotlib) (2.2.5)
Requirement already satisfied: cycler>=0.10 in /home/kafa46/workspace/web-apps/deepshark/venv/lib/python3.10/site-packages (from matplotlib) (0.12.1)
Requirement already satisfied: kiwisolver>=1.3.1 in /home/kafa46/workspace/web-apps/deepshark/venv/lib/python3.10/site-packages (from matplotlib) (1.4.8)
Requirement already satisfied: contourpy>=1.0.1 in /home/kafa46/workspace/web-apps/deepshark/venv/lib/python3.10/site-packages (from matplotlib) (1.3.2)
Requirement already satisfied: six>=1.5 in /home/kafa46/workspace/web-apps/deepshark/venv/lib/python3.10/site-packages (from python-dateutil>=2.7->matplotlib) (1.17.0)

# Configure your Chinook DB path here
CHINOOK_DB = "./chinook.db"  # <-- CHANGE THIS

assert CHINOOK_DB and CHINOOK_DB.endswith(".db"), "Please set CHINOOK_DB to your chinook.db path"

# Imports
import pandas as pd
from sqlalchemy import create_engine, inspect, text
import matplotlib.pyplot as plt

# Create Engine
engine = create_engine(f"sqlite:///{CHINOOK_DB}", future=True)
print(f"sqlalchemy engine created >>> {engine}")

# Create Inspector
inspector = inspect(engine)

# List all tables
tables = inspector.get_table_names()
for index, table in enumerate(tables):
    print(f"{index+1}: {table}")

sqlalchemy engine created >>> Engine(sqlite:///./chinook.db)
1: albums
2: artists
3: customers
4: employees
5: genres
6: invoice_items
7: invoices
8: media_types
9: playlist_track
10: playlists
11: tracks

def load_core_tables(engine):
    with engine.connect() as conn:
        tracks = pd.read_sql(text("SELECT * FROM tracks"), conn)
        invoice_items = pd.read_sql(text("SELECT * FROM invoice_items"), conn)
        invoices = pd.read_sql(text("SELECT * FROM invoices"), conn)
        customers = pd.read_sql(text("SELECT * FROM customers"), conn)
        albums = pd.read_sql(text("SELECT * FROM albums"), conn)
        artists = pd.read_sql(text("SELECT * FROM artists"), conn)
        genres = pd.read_sql(text("SELECT * FROM genres"), conn)

    return {
        "tracks": tracks,
        "invoice_items": invoice_items,
        "invoices": invoices,
        "customers": customers,
        "albums": albums,
        "artists": artists,
        "genres": genres,
    }

tables = load_core_tables(engine)

# 각 테이블 이름과 (행, 열) 형태 출력
print("\nTable Shapes:")
for name, df in tables.items():
    print(f"{name:15s}\t→\t{df.shape}")

Table Shapes:
tracks         	→	(3503, 9)
invoice_items  	→	(2240, 5)
invoices       	→	(412, 9)
customers      	→	(59, 13)
albums         	→	(347, 3)
artists        	→	(275, 2)
genres         	→	(25, 2)

# tracks 테이블의 마지막 5개 행 미리보기
print("\nTracks Table Sample:")
tables['tracks'].tail()

Tracks Table Sample:

def build_dim_track(df: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """
    Build a 'track dimension table' by joining multiple Chinook tables:
    tracks, albums, artists, and genres.

    Parameters
    ----------
    df : dict[str, pd.DataFrame]
        A dictionary containing DataFrames for each Chinook table,
        such as 'tracks', 'albums', 'artists', and 'genres'.

    Returns
    -------
    pd.DataFrame
        A new DataFrame containing track-level information
        enriched with album title, artist name, and genre name.
    """

    # Copy the original 'tracks' DataFrame to avoid modifying the input directly
    tracks = df["tracks"].copy()

    # Select relevant columns from 'albums' and rename 'Title' → 'AlbumTitle'
    albums = (
        df["albums"][["AlbumId", "Title", "ArtistId"]]
        .rename(columns={"Title": "AlbumTitle"})
    )

    # Rename the artist name column from 'Name' → 'ArtistName'
    artists = df["artists"].rename(columns={"Name": "ArtistName"})

    # Rename the genre name column from 'Name' → 'GenreName'
    genres = df["genres"].rename(columns={"Name": "GenreName"})

    # Merge all related information:
    #   1. tracks + albums  → adds album title and artist id
    #   2. + artists        → adds artist name
    #   3. + genres         → adds genre name
    dim_track = (
        tracks.merge(albums, on="AlbumId", how="left")
              .merge(artists, on="ArtistId", how="left")
              .merge(genres[["GenreId", "GenreName"]], on="GenreId", how="left")
    )

    return dim_track

# Build the dimension table
dim_track = build_dim_track(tables)

# Display the first few rows to verify the merge
dim_track.tail()

## Top 10 Tracks by Revenue

# 1. Copy relevant columns from the invoice_items table
invoice_items_df = tables["invoice_items"][["InvoiceId", "TrackId", "UnitPrice", "Quantity"]].copy()

# 2. Compute line-level revenue (price × quantity)
invoice_items_df["LineRevenue"] = invoice_items_df["UnitPrice"] * invoice_items_df["Quantity"]

# 3. Merge with track-related info (title, artist, album, genre)
track_sales_df = (
    invoice_items_df
    .merge(
        dim_track[["TrackId", "Name", "ArtistName", "AlbumTitle", "GenreName"]],
        on="TrackId",
        how="left"
    )
)
track_sales_df.head()

by_genre = (invoice_items_df.merge(dim_track[["TrackId","GenreName"]], on="TrackId", how="left")
              .groupby("GenreName", as_index=False)["LineRevenue"].sum()
              .sort_values("LineRevenue", ascending=False))
by_genre.head(10)

# Visualize revenue by genre
plt.figure(figsize=(8,4))
plt.bar(
    x=by_genre["GenreName"],
    height=by_genre["LineRevenue"]
)
plt.title("Revenue by Genre")
plt.xlabel("Genre")
plt.ylabel("Revenue")
plt.xticks(rotation=45, ha="right")
plt.tight_layout()
plt.show()

## Revenue by Country

# 1. Select essential columns from the invoices table
#    Includes invoice ID, billing country, and invoice date.
invoices_df = tables["invoices"][["InvoiceId", "BillingCountry", "InvoiceDate"]].copy()

# 2. Select relevant columns from the invoice_items table
#    Each line item contains unit price and quantity for a given invoice.
invoice_items_df = tables["invoice_items"][["InvoiceId", "UnitPrice", "Quantity"]].copy()

# 3. Compute line-level revenue (UnitPrice × Quantity)
#    This gives the total sales amount for each individual item.
invoice_items_df["LineRevenue"] = invoice_items_df["UnitPrice"] * invoice_items_df["Quantity"]

# 4. Merge invoice header info (country, date) with line-level revenue
#    Use inner join on InvoiceId to align each invoice item with its country.
merged_sales_df = invoice_items_df.merge(invoices_df, on="InvoiceId", how="inner")

# 5. Group data by BillingCountry and sum up total revenue per country
#    Set as_index=False to keep 'BillingCountry' as a column rather than an index.
country_revenue_df = (
    merged_sales_df
    .groupby(by="BillingCountry", as_index=False)["LineRevenue"]
    .sum()
    .sort_values(by="LineRevenue", ascending=False)
)

# 6. Display the top 10 countries by revenue
country_revenue_df.head(10)

plt.figure(figsize=(10,5))
plt.bar(
    x=country_revenue_df["BillingCountry"],
    height=country_revenue_df["LineRevenue"]
)
plt.title("Revenue by Country")
plt.xlabel("Country")
plt.ylabel("Revenue")
plt.xticks(rotation=60, ha="right")
plt.tight_layout()
plt.show()

	TrackId	Name	AlbumId	MediaTypeId	GenreId	Composer	Milliseconds	Bytes	UnitPrice
3498	3499	Pini Di Roma (Pinien Von Rom) \ I Pini Della V...	343	2	24	None	286741	4718950	0.99
3499	3500	String Quartet No. 12 in C Minor, D. 703 "Quar...	344	2	24	Franz Schubert	139200	2283131	0.99
3500	3501	L'orfeo, Act 3, Sinfonia (Orchestra)	345	2	24	Claudio Monteverdi	66639	1189062	0.99
3501	3502	Quintet for Horn, Violin, 2 Violas, and Cello ...	346	2	24	Wolfgang Amadeus Mozart	221331	3665114	0.99
3502	3503	Koyaanisqatsi	347	2	10	Philip Glass	206005	3305164	0.99

	TrackId	Name	AlbumId	MediaTypeId	GenreId	Composer	Milliseconds	Bytes	UnitPrice	AlbumTitle	ArtistId	ArtistName	GenreName
3498	3499	Pini Di Roma (Pinien Von Rom) \ I Pini Della V...	343	2	24	None	286741	4718950	0.99	Respighi:Pines of Rome	226	Eugene Ormandy	Classical
3499	3500	String Quartet No. 12 in C Minor, D. 703 "Quar...	344	2	24	Franz Schubert	139200	2283131	0.99	Schubert: The Late String Quartets & String Qu...	272	Emerson String Quartet	Classical
3500	3501	L'orfeo, Act 3, Sinfonia (Orchestra)	345	2	24	Claudio Monteverdi	66639	1189062	0.99	Monteverdi: L'Orfeo	273	C. Monteverdi, Nigel Rogers - Chiaroscuro; Lon...	Classical
3501	3502	Quintet for Horn, Violin, 2 Violas, and Cello ...	346	2	24	Wolfgang Amadeus Mozart	221331	3665114	0.99	Mozart: Chamber Music	274	Nash Ensemble	Classical
3502	3503	Koyaanisqatsi	347	2	10	Philip Glass	206005	3305164	0.99	Koyaanisqatsi (Soundtrack from the Motion Pict...	275	Philip Glass Ensemble	Soundtrack

	InvoiceId	TrackId	UnitPrice	Quantity	LineRevenue	Name	ArtistName	AlbumTitle	GenreName
0	1	2	0.99	1	0.99	Balls to the Wall	Accept	Balls to the Wall	Rock
1	1	4	0.99	1	0.99	Restless and Wild	Accept	Restless and Wild	Rock
2	2	6	0.99	1	0.99	Put The Finger On You	AC/DC	For Those About To Rock We Salute You	Rock
3	2	8	0.99	1	0.99	Inject The Venom	AC/DC	For Those About To Rock We Salute You	Rock
4	2	10	0.99	1	0.99	Evil Walks	AC/DC	For Those About To Rock We Salute You	Rock

	GenreName	LineRevenue
17	Rock	826.65
12	Latin	382.14
13	Metal	261.36
1	Alternative & Punk	241.56
22	TV Shows	93.53
11	Jazz	79.20
2	Blues	60.39
6	Drama	57.71
4	Classical	40.59
15	R&B/Soul	40.59

	BillingCountry	LineRevenue
22	USA	523.06
5	Canada	303.96
10	France	195.10
4	Brazil	190.10
11	Germany	156.48
23	United Kingdom	112.86
7	Czech Republic	90.24
19	Portugal	77.24
13	India	75.26
6	Chile	46.62

Chinook Analysis — pandas + SQLAlchemy¶

Requirements¶

Usage¶

Load core tables into DataFrames¶

Helper: Build a track dimension table (`track` + `album` + `artist` + `genre`)¶

Analysis: Top 10 Most in terms of Revenue¶

Revenue by genre¶

Revenue by country¶

Chinook Analysis — pandas + SQLAlchemy¶

Requirements¶

Usage¶

Load core tables into DataFrames¶

Helper: Build a track dimension table (track + album + artist + genre)¶

Analysis: Top 10 Most in terms of Revenue¶

Revenue by genre¶

Revenue by country¶

Helper: Build a track dimension table (`track` + `album` + `artist` + `genre`)¶