Introduction

Getting my first data science job was hard.

It’s especially hard to break into data science when companies typically require a Master’s degree and a minimum of 2–3 years of experience. That said, I came across a number of great resources that I want to share with you.

In this article, I’m going to give you three ways to get practical data science experience on your own. By completing these projects, you’ll develop a strong understanding of SQL, Pandas, and machine learning modeling.

First, I’m going to provide you with real-life SQL case studies in which you’re given a business problem and are required to query databases to diagnose the problem and formulate a solution.
Second, I’m going to provide you with dozens of practice problems for Pandas, a library in Python meant for data manipulation and analysis. This will help you develop the skills that are required for data wrangling and data cleaning.
Lastly, I’m going to provide you with a variety of machine learning problems where you can develop a machine learning model to make predictions. By doing so, you’ll learn how to approach a machine learning problem, as well as the fundamental steps required to develop a machine learning model from start to finish.

With that said, let’s dive into it!

1. SQL Case Studies

If you want to be a data scientist, you have to have strong SQL skills. Mode provides three practical SQL case studies that simulate real-life business problems, as well as an online SQL editor where you can write and run queries.

To open Mode’s SQL editor, go to this link and click on the hyperlink where it says ‘Open another window to Mode’.

Learning SQL

If you’re new to SQL, I would start with Mode’s SQL tutorials, where you can learn basic, intermediate, and advanced SQL techniques. Feel free to skip this if you already have a good understanding of SQL.

Case Study 1: Investigating a Drop in User Engagement

Link to the case.

The objective of this case is to determine the cause for a drop in user engagement for Yammer’s project. Before diving into the data, you should read the overview of what Yammer does here. There are 4 tables that you should work with.

The link to the case will provide you with much more detail pertaining to the problem, the data, and the questions that should be answered.
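
As a starting point for this kind of investigation, one option is to count distinct active users per week and look for the week where the drop begins. Here is a minimal sketch using Python’s built-in sqlite3 module and a tiny in-memory table. The table name events, its columns, and the sample rows are hypothetical and are not Mode’s actual schema.

```python
import sqlite3

# Hypothetical events table; Mode's real tables and columns differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (user_id INTEGER, occurred_at TEXT, event_type TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (1, "2014-07-21", "engagement"),
        (2, "2014-07-22", "engagement"),
        (3, "2014-07-23", "engagement"),
        (1, "2014-07-28", "engagement"),
        (2, "2014-07-29", "engagement"),
        (1, "2014-08-04", "engagement"),
        (1, "2014-08-05", "engagement"),
    ],
)

# Count distinct engaged users per week (weeks keyed by year and week number).
query = """
SELECT strftime('%Y-%W', occurred_at) AS week,
       COUNT(DISTINCT user_id) AS weekly_active_users
FROM events
WHERE event_type = 'engagement'
GROUP BY week
ORDER BY week
"""
weekly_active = conn.execute(query).fetchall()
for week, users in weekly_active:
    print(week, users)
```

With these sample rows, the weekly count falls from 3 to 2 to 1. Once you know which week the decline starts, you can break the same query down by device, event type, or user cohort to narrow down the cause.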

Check out how I approached this case study here if you’d like guidance.

Case Study 2: Understanding Search Functionality

Link to the case.

This case is more focused on product analytics. Here, you’ll be required to dive into the data and determine whether the user experience is good or bad. What makes this case interesting is that it’s up to you to determine what ‘good’ and ‘bad’ mean and how the user experience will be evaluated.

Case Study 3: Validating A/B Test Results

Link to the case.

One of the most practical data science applications is performing A/B tests. In this case study, you’ll dive into the results of an A/B test where there was a 50% difference between the control and treatment groups. Your task for this case is to validate or invalidate the results after a thorough analysis.
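
One common check when validating an A/B test is whether the difference between the groups is distinguishable from random noise. Below is a minimal sketch of a two-proportion z-test using only the standard library. The counts are made up and the helper name two_proportion_z_test is hypothetical. The case itself may call for a different metric and test, so treat this as an illustration of the general technique.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for the difference between two proportions."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled proportion under the null hypothesis that both groups share one rate.
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: 200 of 2,000 control users and 260 of 2,000 treatment users converted.
z, p_value = two_proportion_z_test(200, 2000, 260, 2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value only tells you the difference is unlikely to be chance. It’s still worth checking that users were assigned to the groups correctly and that the metric was logged the same way for both.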

2. Pandas Practice Problems

When I first started developing machine learning models, I found that my lack of Pandas skills was a big limitation to what I could do. Unfortunately, there aren’t many resources on the internet that allow you to practice your Pandas skills, unlike Python and SQL…

A few weeks ago, however, I came across this resource, a repository full of practice problems specifically for Pandas. By completing these practice problems, you’ll know how to:

Filter and sort your data
Group and aggregate data
Use .apply() to manipulate data
Merge datasets
And much more.
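
To give a feel for these operations, here is a short sketch that touches each one. The two DataFrames and their column names are made up for illustration and do not come from the repository.

```python
import pandas as pd

# Made-up order and customer data for illustration only.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5],
    "customer_id": [10, 11, 10, 12, 11],
    "amount": [25.0, 40.0, 15.0, 60.0, 10.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 11, 12],
    "region": ["East", "West", "East"],
})

# Filter and sort: orders of at least 15, largest first.
large_orders = orders[orders["amount"] >= 15].sort_values("amount", ascending=False)

# Group and aggregate: total spend per customer.
spend = orders.groupby("customer_id", as_index=False)["amount"].sum()

# Use .apply() to derive a new column from an existing one.
spend["tier"] = spend["amount"].apply(
    lambda total: "high" if total >= 50 else "standard"
)

# Merge datasets: attach each customer's region to their total spend.
summary = spend.merge(customers, on="customer_id", how="left")
print(summary)
```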

If you can complete these practice problems, you should be able to confidently say that you know how to use Pandas for data science projects. It will also help you out significantly for the next section.

3. Machine Learning Modeling

One of the best ways to get data science experience is by creating your own machine learning models. This means finding a public dataset, defining a problem, and solving the problem with machine learning.

Kaggle is one of the world’s largest data science communities with hundreds of datasets that you can choose from. Below are a couple of ideas that you can use to get started.

Predicting Wine Quality

Dataset here.

Photo by Terry Vlisidis on Unsplash

This dataset contains data on various wines, their composition, and their quality. This can be a regression or classification problem depending on how you frame it. See if you can predict the quality of a red wine given 11 inputs (fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulfates, and alcohol).
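
To make the two framings concrete, the sketch below takes a handful of made-up quality scores and builds the target for each: the raw score for regression, and a binary label for classification. The cutoff of 7 for a ‘good’ wine is an assumption chosen for illustration, not something defined by the dataset. It also computes a naive baseline for each framing, which any model you train should beat.

```python
from collections import Counter
from statistics import mean

# Made-up quality scores; the real dataset has one score per wine.
quality = [5, 6, 5, 7, 6, 5, 8, 6, 5, 7]

# Regression framing: predict the score itself.
# Baseline: always predict the mean score, evaluated with mean absolute error.
mean_quality = mean(quality)
baseline_mae = mean(abs(q - mean_quality) for q in quality)

# Classification framing: predict whether a wine is "good".
GOOD_CUTOFF = 7  # assumed threshold, chosen for illustration
is_good = [int(q >= GOOD_CUTOFF) for q in quality]

# Baseline: always predict the most common class.
majority_class, majority_count = Counter(is_good).most_common(1)[0]
baseline_accuracy = majority_count / len(is_good)

print(f"Regression baseline MAE: {baseline_mae:.2f}")
print(f"Classification baseline accuracy: {baseline_accuracy:.2f}")
```

Notice that the classification baseline already looks decent simply because most wines fall below the cutoff. With imbalanced classes like this, accuracy alone can be misleading, so consider precision and recall as well.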

If you’d like some guidance creating a machine learning model for this dataset, check out my approach here.

Used Car Price Estimator

Dataset here.

Craigslist is the world’s largest collection of used vehicles for sale. This dataset is composed of data scraped from Craigslist and is updated every few months. Using this dataset, see if you can create a model that predicts whether a car listing is over or underpriced.
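
One way to frame ‘over or underpriced’ is to predict a fair price from a listing’s attributes and then compare it with the asking price. The sketch below fits a one-variable least-squares line (price against vehicle age) on made-up listings and flags any listing whose asking price differs from the prediction by more than an assumed 10%. The data, the single feature, the threshold, and the helper name label_listing are all illustrative; a real model would use many more columns from the dataset.

```python
# Made-up listings: (vehicle age in years, asking price in dollars).
listings = [(1, 19000), (2, 18000), (3, 16000), (4, 12000), (5, 12000), (6, 10000)]

ages = [age for age, _ in listings]
prices = [price for _, price in listings]
n = len(listings)
mean_age = sum(ages) / n
mean_price = sum(prices) / n

# Ordinary least squares for a single feature: price = intercept + slope * age.
slope = sum((a - mean_age) * (p - mean_price) for a, p in listings) / sum(
    (a - mean_age) ** 2 for a in ages
)
intercept = mean_price - slope * mean_age

TOLERANCE = 0.10  # assumed: within 10% of the predicted price counts as fair


def label_listing(age, price):
    """Compare an asking price with the model's predicted price."""
    predicted = intercept + slope * age
    if price > predicted * (1 + TOLERANCE):
        return "overpriced"
    if price < predicted * (1 - TOLERANCE):
        return "underpriced"
    return "fair"


labels = [label_listing(age, price) for age, price in listings]
for (age, price), label in zip(listings, labels):
    print(age, price, label)
```

This sketch scores the same listings it was fitted on to keep things short. In practice, you would fit the model on one set of listings and label a separate held-out set.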

Thanks for Reading!

I hope that you find these resources and ideas helpful in your data science journey. :)

Terence Shin

Founder of ShinTwin | Let’s connect on LinkedIn | Project Portfolio is here.

3 Ways to Get Real-Life Data Science Experience Before Your First Job
