It is a truth universally acknowledged that a data scientist in possession of a good portfolio must be in want of a job. A curated selection of your projects is the best way to showcase your work, interests, and thinking to potential employers. From my own experience and discussions with colleagues, I have found four aspects that make a data science portfolio impressive:
✔ 𝗤𝘂𝗮𝗹𝗶𝘁𝘆 > 𝗤𝘂𝗮𝗻𝘁𝗶𝘁𝘆: It’s better to have only two complex or specialized projects than dozens of repos that are incomplete or full of errors. Psychologically, too, people tend to get stressed and lose interest when they are given too many options (the paradox of choice). The point is to show employers your potential and to prove that you can carry a project from idea to presentation. For example, I have 16 repos on GitHub, but I only showcase 4-6 of them, all of which are completed projects.
✔ 𝗢𝗿𝗶𝗴𝗶𝗻𝗮𝗹𝗶𝘁𝘆: Wherever you studied data science, you most probably learned to predict Boston house prices and to classify Iris flowers and Titanic survivors. Though these projects are a good start for learning the basics, they don’t impress anyone anymore, because they are so common. Instead, explore a new dataset that interests you, apply different models, and answer questions that you find insightful. I chose projects that reflect my interests in linguistics (exploring a dataset on world languages), literature (exploring my Goodreads library), and NLP (doing sentiment analysis on product reviews).
✔ 𝗥𝗘𝗔𝗗𝗠𝗘: You wouldn’t buy a book or read a paper without first checking its summary or abstract to see if it’s interesting and worth your time. The same goes for data science projects. Add a README with a table of contents, a short description of the project, and maybe the key findings. For example, I added a README describing the theory, summary, and main tools used in my psych-verb project.
✔ 𝗣𝗲𝗿𝘀𝗼𝗻𝗮𝗹𝗶𝘇𝗮𝘁𝗶𝗼𝗻: The more personal you make your projects and profile, the bigger their impact. It doesn’t have to be anything elaborate or too informal, but it should express your personality. On GitHub this is really easy to do with the special profile README and a custom status. I added a short description of my work interests and coding-related activities, and I update my status depending on what I’m working on.
Creating a good data science portfolio takes time! Don’t rush it; take your time to learn and to polish both your coding and your presentation skills.