Week 6 (02.11.-06.11.)
- Topic: Databases
- Lessons: Postgres (from Python), Data Modelling, (Advanced) SQL Queries, Joins & Foreign Keys, (Cloud) Databases, Cloud Computing
- Project: Build a dashboard on top of a Postgres database that runs in the AWS cloud
- Dataset: Northwind
- Code: GitHub
This was another week packed with new information and experiences! In only five days, I managed to set up a Postgres database, deploy it on AWS RDS, connect it to Metabase to create a dashboard and in turn connect it to AWS EC2 to make the dashboard run continuously.
Getting the data
The Northwind database is a sample database created by Microsoft for tutorials and testing purposes. It contains information about the business of “Northwind Trades”, a fictional company who sells various food products. It includes information about orders, inventory, purchasing, suppliers, shipping, and employees. For my project, to make things a bit more complicated, I got the data in the form of several .csv files from here. I also got a .csv file with the country names and their ISO_A2 codes from here, in order to plot country data on a world map (I only realized this when creating the dashboard, so now you can spare yourself the frustration).
Setting up a Postgres database
Next, I installed Postgres (and pgAdmin, to query easier than in the command line), set up a server, and created my local northwind database.
First, I created tables for each of the files: each table was named like the .csv file it was designed for (e.g. order_details.csv file -> order_details table); in each table, I added columns that designated the columns in the respective .csv file. Here it’s important to note that the column names in the .csv files are written in camel case (e.g. orderID, shipCountry), and in PostgreSQL unquoted names are case-insensitive, but quoted names are case-sensitive. This means that you have to either specify the column names in quotes, or rename them in lowercase only, in order for the data to be imported; I chose the latter option and renamed the columns (e.g. order_id, ship_country).
Next, I imported the .csv files into the newly created tables and set primary and foreign keys for the tables, in order to be able to query across them:
- A primary key is the main key, like the index, of a table.
- A foreign key points to another table to connect.
Hosting the project in the cloud
First, I deployed my local database to RDS, which basically means that I sent my database to run on a bigger, faster, stronger computer in Frankfurt. In RDS, you can select from multiple locations around the world, and though this doesn’t alter the way your database is run, you might want to choose a location closest to your actual physical location, so that it runs faster, and also consider the data privacy measures in different countries.
Next, I set up an EC2 instance, which basically means that I rented a virtual machine on which to run my dashboard continuously.
Creating a dashboard
Now to the creative part! I connected Metabase to my remote database and EC2 instance and started creating a dashboard that represents KPIs of Northwind Trades. I focused only on three areas from my data and split it into two parts:
- The Sales part includes data from the Orders and Products tables. I visualized a gauge with the total number of orders, a trend count of the total order value compared to the previous year, a line plot of the number of orders per day, a bar chart of the best-selling products, and two choropeth maps of the number of orders in the world and in the US states.
- The Team part includes data from the Employees, Customers, and Orders tables. I visualized a table with the name of the employee who supported most customers, one with the employee who filled most orders, and one with the names, roles, birthdays, and seniority in the company of all employees.
Friday Lightning Talk
A the end of this week, I present my dashboard and talked about the challenges I’ve encountered while creating it, as well as solutions and possible improvements.