Flight Delays from San Diego International Airport.
-
Background.
In 2021, 14.5% of flights departing from San Diego International Airport on Southwest Airlines, American Airlines, United Airlines, and Delta Air Lines were delayed. According to Statista*, these four airlines held the largest shares of the domestic market in 2021.
The purpose of this project is to explore flight delays out of San Diego International Airport for Southwest Airlines, American Airlines, United Airlines, and Delta Air Lines, and to discover the main causes of those delays from 2017 to 2021. Identifying the leading causes of flight delays can point to areas where these airlines can improve.
*Statista: https://www.statista.com/statistics/250577/domestic-market-share-of-leading-us-airlines/
-
Context.
This project was completed as part of the CareerFoundry curriculum.
Topics covered were sourcing data, geographical visualizations, supervised machine learning (regression), unsupervised machine learning (clustering), time-series analysis, and dashboards.
This knowledge was used to explore and analyze flight delay data.
-
Tools.
Python - a programming language that can be used to manipulate data frames and create visualizations.
Tableau - software for data visualization.
The Path to Uncovering Causes of Delays.
1. Sourced open data from the US Bureau of Transportation Statistics.
Decided which datasets to use in the project.
Profiled datasets and documented related information, such as collection method, contents, limitations and relevance.
2. Wrangled, cleaned, and merged datasets.
Determined which variables from each dataset were necessary for analysis.
3. Derived new variables from the main dataset.
Formulated conditions through which new variables should be generated.
4. Became more familiar with the data through exploratory analysis.
Created initial visualizations (heat maps, scatterplots, pairplots, histograms, and catplots) to guide the analysis and deepen understanding of the data.
Developed an understanding of the data that allowed later advanced analysis to be conducted smoothly.
5. Constructed a hypothesis and additional questions to guide later analysis.
6. Designed a choropleth map to locate flight destinations, using folium, a library for geospatial visualization, and a JSON file containing the coordinates of states.
7. Used supervised and unsupervised machine learning to evaluate trends in data.
Established which type of algorithm best fits the needs of the data.
Analyzed the results of the algorithms.
8. Built a Tableau storyboard and GitHub repository to document project findings and process.
Incorporated the art of storytelling to convey findings to audience.
Selected the visualization type and format for the storyboard, resulting in a cohesive presentation.
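Steps 2 and 3 above (cleaning the data and deriving new variables) can be illustrated with a minimal sketch. It is not the project's actual code: the column names (`carrier`, `dep_delay`), the sample rows, and the 15-minute cutoff for counting a flight as delayed are all assumptions made for illustration. Plain Python is used so the sketch runs without extra packages.

```python
from collections import defaultdict

# Hypothetical sample rows; column names and values are illustrative assumptions,
# not taken from the project's dataset.
flights = [
    {"carrier": "WN", "dep_delay": 25},
    {"carrier": "WN", "dep_delay": 0},
    {"carrier": "AA", "dep_delay": 40},
    {"carrier": "AA", "dep_delay": 10},
    {"carrier": "UA", "dep_delay": 16},
    {"carrier": "DL", "dep_delay": 5},
]

# Assumed cutoff: a flight counts as delayed at 15 or more minutes late.
DELAY_THRESHOLD_MIN = 15


def add_delay_flag(rows, threshold=DELAY_THRESHOLD_MIN):
    """Return copies of the rows with a derived boolean 'is_delayed' variable."""
    return [{**row, "is_delayed": row["dep_delay"] >= threshold} for row in rows]


def delay_rate_by_carrier(rows):
    """Return {carrier: fraction of that carrier's flights flagged as delayed}."""
    totals = defaultdict(int)
    delayed = defaultdict(int)
    for row in rows:
        totals[row["carrier"]] += 1
        delayed[row["carrier"]] += int(row["is_delayed"])
    return {carrier: delayed[carrier] / totals[carrier] for carrier in totals}


flagged = add_delay_flag(flights)
rates = delay_rate_by_carrier(flagged)
```

Here `rates` maps each carrier code to its share of delayed flights, which is the kind of figure quoted in the Background section.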
Key Findings.
Flight destination offerings and their popularity out of San Diego International Airport vary by airline carrier.
The number of flights offered out of San Diego International Airport also varies by airline carrier.
The four airlines investigated in this project had similar overall delay trends out of San Diego International Airport from 2017 to 2021.
The top three causes of delayed flights for all airlines are delays caused by the carrier, delays caused by a late aircraft arrival, and delays due to the national aviation system.
Each airline struggles with each type of delay to a different degree for flights out of San Diego International Airport.
Longer delays are likely happening due to causes that do not occur during the flight.
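As a rough illustration of how findings like the ranking of delay causes and the per-airline differences could be computed, the sketch below totals delay minutes by cause and converts them to shares per airline. The cause names, the row layout, and every number in the sample data are hypothetical and chosen only to make the example run; they are not the project's results.

```python
from collections import defaultdict

# Assumed cause categories; the names are illustrative.
CAUSES = ("carrier", "late_aircraft", "nas", "weather", "security")

# Hypothetical delay minutes per flight, keyed by cause (missing keys mean 0 minutes).
delays = [
    {"airline": "Southwest", "carrier": 20, "late_aircraft": 5},
    {"airline": "Southwest", "nas": 10},
    {"airline": "Delta", "carrier": 25, "late_aircraft": 15},
    {"airline": "Delta", "weather": 5, "nas": 5},
]


def top_causes(rows, n=3):
    """Return the n causes with the most total delay minutes, largest first."""
    totals = dict.fromkeys(CAUSES, 0)
    for row in rows:
        for cause in CAUSES:
            totals[cause] += row.get(cause, 0)
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [cause for cause, _ in ranked[:n]]


def cause_shares(rows):
    """Return {airline: {cause: share of that airline's total delay minutes}}."""
    minutes = defaultdict(lambda: dict.fromkeys(CAUSES, 0))
    for row in rows:
        for cause in CAUSES:
            minutes[row["airline"]][cause] += row.get(cause, 0)
    shares = {}
    for airline, by_cause in minutes.items():
        total = sum(by_cause.values())
        shares[airline] = {
            cause: (mins / total if total else 0.0) for cause, mins in by_cause.items()
        }
    return shares


ranking = top_causes(delays)
shares = cause_shares(delays)
```

Comparing `shares` across airlines shows which delay type weighs most heavily on each carrier, while `ranking` gives the overall order of causes.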
Deliverables.
-
Tableau Storyboard
-
GitHub Repository
Next Step.
While the above information fulfills the requirements of the project, a possible next step is to work with the airlines to develop a plan that addresses the types of delays they struggle with, and then to monitor the impact of that plan.