Final Project: Spotify Song Genre Prediction

UC Berkeley MIDS Program

Shuo Wang, Ivan Escalona, Daisy Khamphakdy, Iris Lew, Amanda Teschko

December 2022

Project Overview

This project builds machine learning models that let a music app automatically recognize a song’s genre when the song is added to its database, rather than relying on manual genre classification.

Dataset

Kaggle link: https://www.kaggle.com/datasets/mrmorj/dataset-of-songs-in-spotify

Dimension: 42,305 rows x 22 columns

Features: 'danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness', 'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo', 'type', 'id', 'uri', 'track_href', 'analysis_url', 'duration_ms', 'time_signature', 'genre', 'song_name', 'Unnamed: 0', 'title'

Data dictionary:

https://developer.spotify.com/documentation/web-api/reference/#/operations/get-audio-features

https://docs.google.com/document/d/1LWl88F8wGY1WkkOSzzVzwSYij1yeMR8hFXllRpkMyrI/edit

EDA

Underground rap was the most popular genre
Imbalance in the records per genre
Some numeric features are on different scales
Some fields have high levels of missing data
Some rows are duplicated
Some tracks are mapped to more than one genre
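
The checks behind these observations can be sketched in a few lines. This is a minimal, dependency-free illustration rather than the code from Master.ipynb: the toy rows and the helper names (`genre_counts`, `missing_fraction`, `duplicate_row_count`, `multi_genre_ids`) are assumptions made for the example, and only the column names come from the dataset.

```python
from collections import Counter, defaultdict

# Hypothetical toy rows standing in for genres_v2.csv; the values are made up.
rows = [
    {"id": "a1", "genre": "Underground Rap", "song_name": "x", "title": None},
    {"id": "a1", "genre": "Underground Rap", "song_name": "x", "title": None},  # exact duplicate
    {"id": "b2", "genre": "Dark Trap", "song_name": "y", "title": None},
    {"id": "b2", "genre": "Underground Rap", "song_name": "y", "title": None},  # same track, second genre
    {"id": "c3", "genre": "techno", "song_name": None, "title": "mix 1"},
]


def genre_counts(rows):
    """Count records per genre to expose class imbalance."""
    return Counter(r["genre"] for r in rows)


def missing_fraction(rows):
    """Fraction of missing (None) values in each column."""
    columns = rows[0].keys()
    return {c: sum(r[c] is None for r in rows) / len(rows) for c in columns}


def duplicate_row_count(rows):
    """Number of rows that exactly repeat an earlier row."""
    seen = Counter(tuple(r.items()) for r in rows)
    return sum(n - 1 for n in seen.values())


def multi_genre_ids(rows):
    """Track ids that appear under more than one distinct genre."""
    genres_by_id = defaultdict(set)
    for r in rows:
        genres_by_id[r["id"]].add(r["genre"])
    return {track_id for track_id, genres in genres_by_id.items() if len(genres) > 1}


print(genre_counts(rows).most_common())
print(missing_fraction(rows))
print(duplicate_row_count(rows), multi_genre_ids(rows))
```

On the real data the same four questions are commonly answered with a dataframe library; the plain-Python version is used here only to keep the sketch self-contained.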

Feature Engineering

Models

Baseline model: always predict the most popular genre in the raw data (Underground Rap)
Random Forest
XGBoost
Neural Networks
K-Means
K-Nearest Neighbors
Logistic Regression
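
The baseline in the list above amounts to a majority-class predictor. The following is a minimal sketch under that reading, not the project's actual code: the function names and the toy labels are assumptions chosen for illustration.

```python
from collections import Counter


def fit_majority_baseline(train_labels):
    """Return the most frequent label seen in the training labels."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical toy labels; the real labels come from the 'genre' column.
train_labels = ["Underground Rap"] * 3 + ["Dark Trap"] * 2 + ["techno"]
test_labels = ["Underground Rap", "techno", "Dark Trap", "Underground Rap"]

majority = fit_majority_baseline(train_labels)
predictions = [majority] * len(test_labels)  # the same genre for every song
print(majority, accuracy(test_labels, predictions))
```

Because this predictor ignores the audio features entirely, its accuracy equals the share of the majority genre in the evaluation data, which gives a floor against which the other models can be compared.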

Results and Discussion

Feature scaling and data balancing are crucial for some models
The benefits of different feature engineering techniques vary from model to model
Establishing a baseline gave us a better appreciation for our model, even though its accuracy wasn’t objectively high
More investigation into balancing techniques could be helpful
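
To make the scaling and balancing points concrete, here is a minimal sketch of two common techniques: z-score standardization and random oversampling of minority classes. These are assumptions for illustration; this README does not state which scaling or balancing methods the notebook used, and the helper names and toy values below are hypothetical.

```python
import random
from collections import Counter, defaultdict
from statistics import mean, pstdev


def standardize(column):
    """Rescale a numeric column to zero mean and unit (population) variance."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]  # constant column carries no information
    return [(x - mu) / sigma for x in column]


def oversample(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until every class matches the largest."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row, label in zip(rows, labels):
        by_label[label].append(row)
    target = max(len(group) for group in by_label.values())
    out_rows, out_labels = [], []
    for label, group in by_label.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical values: 'tempo' and 'loudness' live on very different scales.
tempo = [90.0, 120.0, 150.0]
loudness = [-12.0, -8.0, -4.0]
scaled_tempo, scaled_loudness = standardize(tempo), standardize(loudness)

rows = [[0.1], [0.2], [0.3], [0.9]]
labels = ["Underground Rap", "Underground Rap", "Underground Rap", "techno"]
balanced_rows, balanced_labels = oversample(rows, labels)
print(scaled_tempo, scaled_loudness, Counter(balanced_labels))
```

In practice, the scaling statistics should be computed on the training split only and reused for validation and test data, and oversampling should be applied to the training split only, so that duplicated rows do not leak into evaluation.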

Helpful Information

Environment

Google Colaboratory


