Grocery Basket Analysis for Instacart.

  • Background.

    Instacart, an online grocery company, is seeking to discover more about its sales patterns with the end goal of developing a targeted marketing strategy. To achieve this goal, the company requires a data analyst to get them started by deriving insights and suggesting marketing strategies based on initial data and exploratory analysis.

  • Context.

    This project was completed as part of the CareerFoundry curriculum.

    Topics covered were Python, data wrangling, data merging, deriving variables, grouping data, aggregating data, reporting in Excel, and population flows.

    This knowledge was used to provide guidance on a marketing strategy for Instacart.

    Note: while Instacart is a real company, the data set was fabricated for the purpose of this project.

  • Tools.

    Python - a programming language that can be used to manipulate data frames and create visualizations.

The Path to Guiding a Marketing Strategy.

 
  1. Familiarized myself with the data frames, so that the next steps (wrangling, cleaning, etc.) could be carried out effectively.

Sample code from data exploration phase.
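As an illustration of this kind of first look at a data frame, here is a minimal pandas sketch. It is not the project's actual code: the `orders` frame, its column names, and its values are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical stand-in for one of the project's data frames.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "user_id": [10, 10, 11, 12],
    "order_hour_of_day": [8, 14, 14, 20],
    "days_since_prior_order": [None, 7.0, None, 30.0],
})

shape = orders.shape              # (number of rows, number of columns)
dtypes = orders.dtypes            # data type of each column
summary = orders.describe()       # count, mean, min, max, quartiles per numeric column
missing = orders.isnull().sum()   # number of missing values per column

print(shape)
print(dtypes)
print(summary)
print(missing)
```

Checks like these show early on which columns may need retyping, renaming, or cleaning in later steps.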

 

Sample table of wrangling steps.

2. Wrangled the data frames.

  • Decided which columns to drop (delete) to make the data frames easier to use.

  • Renamed columns for consistency and ease of reference.

  • Ensured each column’s data type was consistent and correct for future use.

  • Transposed data, so that it was in the correct format for future steps, such as cleaning and merging.
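The wrangling steps above could look roughly like the following pandas sketch. The data frames, column names, and values are assumptions made for illustration, not the project's actual data.

```python
import pandas as pd

# Hypothetical orders table (column names are assumptions).
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "user_id": [10, 10, 11],
    "eval_set": ["prior", "prior", "train"],
    "order_dow": [0, 3, 6],
})

# Drop a column that is not needed for the analysis.
orders = orders.drop(columns=["eval_set"])

# Rename a column so its meaning is clear at a glance.
orders = orders.rename(columns={"order_dow": "order_day_of_week"})

# Identifiers are labels, not quantities, so store them as strings.
orders["order_id"] = orders["order_id"].astype("str")

# Hypothetical lookup table stored "wide": one column per department ID.
departments_wide = pd.DataFrame(
    {"1": ["frozen"], "2": ["other"], "3": ["bakery"]},
    index=["department"],
)

# Transpose so that each department becomes a row instead of a column.
departments = departments_wide.T
departments.index.name = "department_id"

print(orders)
print(departments)
```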

 

3. Cleaned the data frames to ensure the analysis is accurate.

  • Identified any missing values, inconsistencies, and duplicates.

Sample code from data cleaning phase.
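As an illustration of these consistency checks, here is a minimal pandas sketch. It is not the project's actual code: the `products` frame and its values are hypothetical, and only the "prices" column name comes from the project.

```python
import pandas as pd

# Hypothetical products table with one missing name, one exact duplicate row,
# and one price accidentally stored as text.
products = pd.DataFrame({
    "product_id": [1, 2, 2, 3, 4],
    "product_name": ["Bananas", "Limes", "Limes", None, "Whole Milk"],
    "prices": [1.2, 0.5, 0.5, 3.0, "4.1"],
})

# Missing values per column.
missing = products.isnull().sum()

# Inconsistencies: columns whose non-missing values have more than one Python type.
mixed_type_columns = [
    col for col in products.columns
    if products[col].dropna().map(type).nunique() > 1
]

# Full-row duplicates (every occurrence after the first).
duplicates = products[products.duplicated()]

# One possible way to address the findings: drop rows without a name,
# drop duplicates, and give the price column a single numeric type.
clean = products.dropna(subset=["product_name"]).drop_duplicates().copy()
clean["prices"] = clean["prices"].astype(float)

print(missing)
print(mixed_type_columns)
print(duplicates)
print(clean)
```

Whether to drop, impute, or keep problem rows depends on the data and the analysis; dropping is only one option.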

 

Population flow from stakeholders’ report.

4. Merged the data frames, so that values originally spread across multiple data frames could be referenced in one data frame.

  • Determined which columns to merge the data frames on.

  • Identified which type of join was appropriate for merging the datasets.
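One way to make these two decisions is to run a trial outer merge with pandas' `indicator=True` option, which reports how many rows match on the chosen key before a join type is chosen. The sketch below uses hypothetical frames and assumes `product_id` is the shared key.

```python
import pandas as pd

# Hypothetical frames sharing a "product_id" key column.
orders_products = pd.DataFrame({
    "order_id": [1, 1, 2, 3],
    "product_id": [100, 101, 100, 999],
})
products = pd.DataFrame({
    "product_id": [100, 101, 102],
    "product_name": ["Bananas", "Limes", "Strawberries"],
})

# Trial outer merge: the "_merge" column marks each row as
# "both", "left_only", or "right_only".
check = orders_products.merge(products, on="product_id", how="outer", indicator=True)
match_counts = check["_merge"].value_counts()
print(match_counts)

# If only fully matched rows are wanted, an inner join keeps just those.
merged = orders_products.merge(products, on="product_id", how="inner")
print(merged)
```

A left join would instead keep every ordered item, including those without product details, which may be preferable when unmatched rows still carry useful information.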

 

5. Derived new variables from existing columns.

  • Used new variables to make the data easier to understand (e.g., new price tags) and to build customer profiles (e.g., profiles based on customer loyalty).
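A minimal sketch of how such variables might be derived with pandas follows. The "prices" and "price_tag" names come from the project; the price thresholds, the `loyalty_flag` name, its labels, and the order-count cutoff are assumptions chosen only for illustration.

```python
import pandas as pd

# Hypothetical merged data: one row per ordered item.
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "order_number": [1, 2, 3, 1, 2],
    "prices": [3.5, 12.0, 20.0, 4.9, 15.0],
})

# Derive "price_tag" from "prices". The bin edges (5 and 15) are assumed;
# each bin includes its upper edge, so a price of 15.0 falls in "mid".
df["price_tag"] = pd.cut(
    df["prices"],
    bins=[0, 5, 15, float("inf")],
    labels=["low", "mid", "high"],
)

# Derive a loyalty profile from each customer's highest order number.
df["max_order"] = df.groupby("user_id")["order_number"].transform("max")
df["loyalty_flag"] = "new customer"
df.loc[df["max_order"] >= 3, "loyalty_flag"] = "loyal customer"  # cutoff is assumed

print(df)
```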

Bar chart of the new variable “price_tag”, which was derived from “prices” column.

Visualization made with Python.

6. Created visualizations in Python.

  • Included visualizations in final report to communicate findings to stakeholders.

  • Based visualizations on grouped and aggregated data.
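Grouping, aggregating, and charting can be combined in a few lines of pandas and matplotlib. The sketch below is illustrative only: the data and the "department" column name are hypothetical, and the choice of a bar chart of item counts per department is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical ordered items with their department and price.
df = pd.DataFrame({
    "department": ["produce", "produce", "dairy eggs", "pets", "dairy eggs", "produce"],
    "prices": [1.2, 3.0, 4.1, 9.5, 2.3, 0.8],
})

# Group by department, then aggregate: item count and mean price per department.
dept_summary = (
    df.groupby("department")["prices"]
      .agg(["count", "mean"])
      .sort_values("count", ascending=False)
)
print(dept_summary)

# Bar chart of the number of items ordered per department.
fig, ax = plt.subplots()
dept_summary["count"].plot.bar(ax=ax)
ax.set_xlabel("Department")
ax.set_ylabel("Number of items ordered")
ax.set_title("Items ordered per department")
fig.tight_layout()
# fig.savefig("items_per_department.png")  # uncomment to write the chart to disk
```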

7. Developed a report for stakeholders in Excel.

  • Documented a population flow, wrangling steps, consistency checks, column derivations, visualizations, and recommendations to provide a detailed understanding of the data.

Image from stakeholders’ report

Key Findings.

  1. Ads should be targeted towards the following audiences: middle-income people, married people, and existing customers.

  2. Ads, promotions, and recommendations for products in the dairy/eggs and produce departments will likely go over well. In particular, ads, promotions, and recommendations for organic whole milk, limes, strawberries, large lemons, organic avocados, organic baby spinach, organic strawberries, bags of organic bananas, and bananas may do well.

  3. Bulk, pets, alcohol, and international are departments where sales can be much improved. It could be beneficial to focus on building sales in these departments.

Image by AbsolutVision.

Deliverables.