Week 1/12 #DataScienceBootcamp

September 28th, 2020 – the end of a month, the beginning of a new challenge: today I start the Data Science Bootcamp at Spiced Academy in Berlin. For the next 12 weeks, I’ll be learning about machine learning, NLP, data infrastructure, and software engineering, working on several small projects, and presenting my final big project on December 21st.

I feel it’s going to be an exciting (yet stressful) couple of months, so I decided to start a series of blog posts in which I share my weekly progress: topics I learn, projects I work on, people I meet, and the overall bootcamp experience! It’s a good way to keep myself accountable and track my development, with highs and lows. And most importantly, I would like this bootcamp diary to be helpful, inspiring, or motivational for my readers 🙂

Week 1 (28.09.-02.10.)

  • Topic: Visual Data Analysis
  • Lessons: bash, git, pandas, seaborn, descriptive stats
  • Dataset: Gapminder
  • Project: Create an animated scatter plot in Hans Rosling style.
  • Code: GitHub

I started the first day of the bootcamp in the comfort of my home! Due to the COVID-19 restrictions, the other 11 students and I were split into two groups, alternating daily between online and on-site attendance. It’s not the best experience, but so far it has gone pretty well. The first day started at 9:30 sharp with a short welcome and information presentation from our project manager and teachers.

We then introduced ourselves, and I was surprised at how diverse our Stochastic Sage cohort was, including people with backgrounds in Physics, Math, Finance, and Social Sciences, from coding newbies to experts. Another surprising fact was the even split between Linux, Mac, and Windows users! I’m curious whether this will change by the end of the course (come to the Linux side)…

Soon afterwards, the actual coding (and bugs) started – right from the basics. One day we spent several hours writing a FizzBuzz function. On another day, we needed two hours just to clone two repos and create a branch. There was another class on how to use the command line, and several sessions on setting up Python and Jupyter Notebooks.
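For anyone who hasn’t met FizzBuzz yet: count upwards, say "Fizz" for multiples of 3, "Buzz" for multiples of 5, and "FizzBuzz" for multiples of both. A minimal sketch in Python could look like this (one common way to write it, not necessarily the version written in class):

```python
def fizzbuzz(n):
    """Return the FizzBuzz word for n, or n itself as a string."""
    if n % 15 == 0:  # divisible by both 3 and 5
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


# Print the sequence for 1 to 15
for i in range(1, 16):
    print(fizzbuzz(i))
```

Checking the "both" case first matters: if the test for 3 came first, 15 would come out as "Fizz" only.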

Then we dived into descriptive statistics and data visualization with pandas and seaborn. Our challenge for the week was to create an animated scatter plot with matplotlib/seaborn and imageio, depicting the relationship between life expectancy and fertility rate across the world’s countries from 1960 to 2015, using combined data from the Gapminder datasets. Here’s the result:

Friday Lightning Talk

Each week, we get a main dataset and several tasks to apply the concepts learned throughout the week. On Fridays, we each give a 5-minute presentation on a particular finding from our weekly challenge project: a chart, a new library, (un)solved bugs, or anything else that is worth sharing and helpful for others too. This Friday, I chose to play hacker and talk about five new bash commands for checking the installed Python libraries, their versions and dependencies.

Function                                                   Command
to list all installed libraries                            pip list
to list only outdated libraries                            pip list -o (or --outdated)
to list only the latest / up-to-date libraries             pip list -u (or --uptodate)
to show all information about a library                    pip show <package-name>
to list all libraries installed in a specific environment  conda list -n <environment-name>
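Similar information is also available from inside Python, which can be handy in a notebook. The sketch below uses the standard library's importlib.metadata module (Python 3.8+) to approximate pip list and a small part of pip show. The helper names installed_packages and package_info are made up for this example, and the output covers only the environment the interpreter is running in.

```python
from importlib import metadata


def installed_packages():
    """Return {name: version} for installed distributions, similar to `pip list`."""
    return {
        dist.metadata["Name"]: dist.version
        for dist in metadata.distributions()
        if dist.metadata["Name"]  # skip entries with broken metadata
    }


def package_info(name):
    """Return version and declared dependencies, a small subset of `pip show`."""
    return {
        "name": name,
        "version": metadata.version(name),
        "requires": metadata.requires(name) or [],
    }


packages = installed_packages()
# Show the first few packages in alphabetical order
for name in sorted(packages, key=str.lower)[:5]:
    print(name, packages[name])
```

Asking package_info for a library that is not installed raises metadata.PackageNotFoundError.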

Overall…

The first week was technically underwhelming: no brain teasers, no Stack Overflow-worthy bugs, no unrealistic deadlines, no all-nighters. But it’s always good to review basic concepts.

What I found most enriching was the exchange with my fellow students, which showed me different perspectives on a dataset and different approaches to solving a problem. For example, a colleague looked in particular at the life expectancy for countries like Rwanda, China, and Cambodia, and explained the sharp dips at certain points in time with specific historical events like the Rwandan genocide.
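Dips like these can be located with a few lines of pandas. The following is only a sketch of one possible approach, not my colleague's code: the DataFrame holds made-up numbers for two placeholder countries (not real Gapminder values), and the column names are assumptions.

```python
import pandas as pd

# Made-up numbers for two placeholder countries, NOT real Gapminder data
df = pd.DataFrame({
    "country": ["X", "X", "X", "Y", "Y", "Y"],
    "year": [1990, 1994, 1998, 1990, 1994, 1998],
    "life_expectancy": [50.0, 30.0, 48.0, 70.0, 71.0, 72.0],
})

# Row label of the lowest life expectancy within each country
lowest_idx = df.groupby("country")["life_expectancy"].idxmin()

# Look up the full rows to see in which year each minimum occurred
lowest = df.loc[lowest_idx, ["country", "year", "life_expectancy"]].set_index("country")
print(lowest)
```

The years that come out of such a query are only the starting point; connecting them to historical events is the part that needs a human.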

Also, I’ve learned from our teachers that bugs and mistakes are worth talking about! I used to panic anytime I got an error, my code didn’t work, or I didn’t get the expected results. But these are all issues that come up regularly in real-world projects and are part of every developer’s work. It’s important to discuss them and the steps you took to (try to) overcome them. Maybe it’s just a missing comma, maybe it’s a serious problem – you won’t know until you present it to someone else.
