跳过正文

数据库-3作业

·386 字· loading · loading ·
Masterlong
作者
Masterlong
熬夜,但是世界之夜
目录
数据库系统 - 这篇文章属于一个选集。
§ 5: 本文

数据库作业 - chapter 3
#

使用了On_Time_Reporting_Carrier_On_Time_Performance_1987_present_2022_2.csv中0~999条数据作为样本

各机场与航班延误的关系
#

with a as(with delay1 as (select Origin from airline where DepDelay > 0)
select *, count(*) as cnt1 from delay1 group by Origin),
b as (with delay2 as (select Dest from airline where ArrDelay > 0)
select *, count(*) as cnt2 from delay2 group by Dest)
select * from a, b where Origin = Dest order by cnt1 desc;
image-20230314202618693

注意到机场产生的航班延误中,产生终点延误多的机场常常也作为始发地具有较多的延误。

其中,JFK、ATL、LGA延误次数最多。除了JFK、CHS、EWR、SBN以外,大部分机场延误都是终点延误。

占比关系:

航班目的地与始发地对延误的影响
#

image-20230314205644815

注意到纽约、亚特兰大、伯灵顿都具有很高的航班延误次数。另外,纽瓦克与夏洛茨维尔在终点延误次数发生上也很多。

航班航行距离与平均延误时间的关系
#

with a as(with delay1 as (select * from airline where DepDelay > 0)
select Distance, avg(DepDelayMinutes) as t1 from delay1 group by Distance),
b as (with delay2 as (select * from airline where ArrDelay > 0)
select Distance, avg(ArrDelayMinutes) as t2 from delay2 group by Distance)
select * from a, b where a.Distance = b.Distance order by t1 desc;
image-20230314204509099 image-20230314205207918

航班延误与距离没有明显线性相关关系 ,以出发延误为例,在距离200~1200的范围内,大部分航班在延误在1小时以内。而延误较长的航班通常都超过2小时。

导致延误次数最多的是哪一个已知因素
#

以出发延误为例:

with delay1 as (select * from airline where DepDelay > 0)
select Year,
count(Year), 
sum(case when CarrierDelay>0 then 1 else 0 END)/count(Year),
sum(case when WeatherDelay>0 then 1 else 0 END)/count(Year),
sum(case when NASDelay>0 then 1 else 0 END)/count(Year),
sum(case when SecurityDelay>0 then 1 else 0 END)/count(Year),
sum(case when LateAircraftDelay>0 then 1 else 0 END)/count(Year)
from delay1 group by Year;
image-20230314211904080

注意到影响大小的先后顺序为NAS>LateAircraft>Carrier>Weather>Security。

with delay1 as (select * from airline where DepDelay > 0)
select OriginCityName, 
sum(case when CarrierDelay>0 then 1 else 0 END),
sum(case when WeatherDelay>0 then 1 else 0 END),
sum(case when NASDelay>0 then 1 else 0 END),
sum(case when SecurityDelay>0 then 1 else 0 END),
sum(case when LateAircraftDelay>0 then 1 else 0 END)
from delay1 group by OriginCityName;
image-20230314211053665

对于不同城市,它们延误情况的主要影响因素各不相同。

航班的日期与延误的影响
#

with a as(with delay1 as (select DayOfWeek from airline where DepDelay > 0)
select *, count(*) as cnt1 from delay1 group by DayOfWeek),
b as (with delay2 as (select DayOfWeek from airline where ArrDelay > 0)
select *, count(*) as cnt2 from delay2 group by DayOfWeek)
select a.DayOfWeek, (cnt1+cnt2) as cnt from a, b where a.DayOfWeek = b.DayOfWeek order by cnt desc;
image-20230314213939984

显然从周日、周一,周四、周五航班延误次数最多,周三延误最少。这可能和大多数人的出差/通勤习惯相关。

with a as(with delay1 as (select DayOfMonth from airline where DepDelay > 0)
select *, count(*) as cnt1 from delay1 group by DayOfMonth),
b as (with delay2 as (select DayOfMonth from airline where ArrDelay > 0)
select *, count(*) as cnt2 from delay2 group by DayOfMonth)
select a.DayOfMonth, (cnt1+cnt2) as cnt from a, b where a.DayOfMonth = b.DayOfMonth order by cnt desc;
image-20230314214138984

从每月的情况来看,可以观察到大致以一周七天为一个周期的延误次数变化。越靠近月末,延误次数就越多。

数据库系统 - 这篇文章属于一个选集。
§ 5: 本文

相关文章

数据库-c1&2作业
·251 字· loading · loading
数据库-1
·125 字· loading · loading
数据库-2
·64 字· loading · loading