Project completed in week 1 (28.09.-02.10.20) of the Data Science Bootcamp at Spiced Academy in Berlin.

Our first bootcamp project was creating an animated scatterplot, using the libraries matplotlib or seaborn and imageio. The scatterplot illustrates the relationship between life expectancy and fertility rate of world’s countries from 1960 to 2015, based on the Gapminder data set:

country year population life_expectancy fertility_rate continent
0 Afghanistan 1800 3280000.0 28.21 7.0 Asia

The animated scatterplot is basically made of several overlapping static plots. The animation consists of four steps:

1. Create static scatterplots for each year in the data set.

Thescatterplots depict life_expectancy on the x axis and fertility_rate on the y rate. To make the plots even more insightful, the size of the points illustrates the population number and the color of the points illustrates the continent.

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

sns.scatterplot(x='life_expectancy',
y='fertility_rate',
hue='continent',
size='population',
sizes=(10, 1000),
legend=False,
data=gapminder_df.loc[gapminder_df['year']==year],
alpha=0.7,
palette='Set2')

plt.title(f'{year}', loc='center', fontsize=20, color='black', fontweight='bold')

plt.xlabel('Life expectancy')
plt.ylabel('Fertility rate')

2. Export the scatterplot images to a designated folder.

import imageio
import os

images = []
folder='/path/to/folder/images'

if not os.path.exists(folder):
os.mkdir(folder)

3. Join the individual images in chronological order.

filename = f'lifeexp_{year}.png'
plt.savefig(os.path.join(folder,filename))

4. Export the scatterplots sequence as a gif.
The fps (frames per second) parameter sets the speed of the animation.

imageio.mimsave(os.path.join(folder,'scatterplot.gif'), images, fps=20)

Now, putting everything together, here’s the full code and the animated scatterplot:

import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import imageio
import os

images = []
folder='/home/lorena/Documents/bootcamp/W1/images'

if not os.path.exists(folder):
os.mkdir(folder)

for year in range(1960, 2016):

plt.axis((0, 100, 0, 10))

sns.scatterplot(x='life_expectancy',
y='fertility_rate',
hue='continent',
size='population',
sizes=(10, 1000),
legend=False,
data=gapminder_df.loc[gapminder_df['year']==year],
alpha=0.7,
palette='Set2')

plt.title(f'{year}', loc='center', fontsize=20, color='black', fontweight='bold')
#plt.title(f'inspired by Hans Rosling', loc='right', fontsize=10, color='grey', style='italic', pad=-20)

#plt.legend(bbox_to_anchor=(0.74, 0.85), loc='center')

plt.xlabel('Life expectancy')
plt.ylabel('Fertility rate')
#plt.annotate({country}, )

filename = f'lifeexp_{year}.png'

plt.savefig(os.path.join(folder,filename))

plt.figure()

imageio.mimsave(os.path.join(folder,'scatterplot.gif'), images, fps=20) Friday Lightning Talk

Each week, we get a main dataset and several tasks to apply the concepts learned throughout the week. On Fridays, we present in 5 minutes a particular finding from our weekly challenge project, a chart, new library, (un)solved bugs, or anything that is worth sharing and helpful for others. This Friday, I chose to talk about five new bash commands for checking the installed Python libraries, their versions and dependencies.

function command
to list all installed libraries pip list
to list only outdated libraries pip list -o (or --outdated)
to list only the latest / up to date libraries pip list -u (or --uptodate)
to show all information about a library pip show <package-name>
to list all libraries installed in a specific environment conda list -n <environment-name>