文章目录
  1. 1. 日期补全
    1. 1.1. 长度优化
  2. 2. 月份数据补全

日期补全

在数据统计时,有时候会出现某天没有数据的情况,比如某天某类产品销量为0,这样就没有真实订单。我们统计的时候就可能出现下面的数据:

1
2
3
4
5
    dt     | val
----------------
2023-05-06 | 12
2023-05-08 | 80
2023-05-10 | 110

我们看到在上面的数据里,5月7日和5月9日的数据是缺失的。业务上往往要求,我们要把缺失的补0.也就是需要得到下面的结果:

1
2
3
4
5
6
7
    dt     | val
----------------
2023-05-06 | 12
2023-05-07 | 0
2023-05-08 | 80
2023-05-09 | 0
2023-05-10 | 110

把5月7日和5月9日的数据补上,那么怎么实现呢?先看完整的SQL:

1
2
3
4
5
6
7
8
9
10
11
12
with a as (select '2023-05-06' as dt, 12 as val union select '2023-05-10',110 union select '2023-05-08',80),
b as (select max(dt) max_dt,min(dt) min_dt from a),
z as (select DATE(DATE_SUB(CURDATE(),INTERVAL rn day)) as dt from
(SELECT ROW_NUMBER() over() as rn
FROM
(SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) a
,(SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) b
,(SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) c
) t
),
t as (select z.* from z,b where dt>=b.min_dt and dt <= b.max_dt )
select t.dt,ifnull(a.val,0) as val from t left join a on t.dt = a.dt order by t.dt;

看到上面的SQL 是不是有一种要放弃的感觉。太长了,确实太长了。我们来解析一下 都是干嘛的。我们这里用的是MySQL with 的语法,这个语法的好处就是可以把SQL写的贼长。然后还能年的懂。

先看:

1
with a as (select '2023-05-06' as dt, 12 as val union select '2023-05-10',110 union select '2023-05-08',80)

这一段就是我们造三行数据,实际业务中应该用真实的业务表数据替换。接下来

1
b as (select max(dt) max_dt,min(dt) min_dt from a),

这个就是找到业务里最大的日期,和最小的日期。以方便后面的补全。然后就是特别长的 z

1
2
3
4
5
6
7
8
z as (select DATE(DATE_SUB(CURDATE(),INTERVAL rn day)) as dt from
(SELECT ROW_NUMBER() over() as rn
FROM
(SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) a
,(SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) b
,(SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) c
) t
),

这一段就是 获取最近1000天的日期。 这里用三个 select 0 一直 union 到9的表,三个表每个表有10个数字。101010 就等于1000. 然后通过 ROW_NUMBER() over() 获得行号,也就是1 至 1000 。再通过 DATE(DATE_SUB(CURDATE(),INTERVAL rn day)) as dt 转成日期。因为我们不需要这么的日期,所以需要限定一下日期的数据,也就是

1
t as (select z.* from z,b where dt>=b.min_dt and dt <= b.max_dt ) 

到了这里 就只会返回 最小日期和最大日期之间的日期了。然后 做一下表关联和排序。也就是

1
select t.dt,ifnull(a.val,0) as val from t left join a on t.dt = a.dt order by t.dt;

这样我们就得到了想要的,日期补全后的数据了。

长度优化

看到这里,有的朋友要说了,这个SQL也太长了,能不能精简一点。我们看上面的 1 到 1000 这一步,确实太长了。我们可以用视图来优化一下。我们先创建一个视图:

1
2
3
4
5
create view vn as SELECT ROW_NUMBER() over() as rn
FROM
(SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) a
,(SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) b
,(SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) c

这样一个 1 到 1000 的视图就创建好了。前面的SQL 也可以简化为:

1
2
3
4
5
with a as (select '2023-05-06' as dt, 12 as val union select '2023-05-10',110 union select '2023-05-08',80),
b as (select max(dt) max_dt,min(dt) min_dt from a),
z as (select DATE(DATE_SUB(CURDATE(),INTERVAL rn day)) as dt from vn),
t as (select z.* from z,b where dt>=b.min_dt and dt <= b.max_dt )
select t.dt,ifnull(a.val,0) from t left join a on t.dt = a.dt order by t.dt;

月份数据补全

对于数据:

1
2
3
4
5
    dt     | val
----------------
2023-04 | 12
2023-06 | 110
2023-07 | 80

我们可以通过下面的SQL来对期进行补全:

1
2
3
4
5
with a as (select '2023-04' as dt, 12 as val union select '2023-06',110 union select '2023-07',80),
b as (select max(dt) max_dt,min(dt) min_dt from a),
z as (select date_format(DATE_SUB(CURDATE(),INTERVAL rn month),'%Y-%m') as dt from vn),
t as (select z.* from z,b where dt>=b.min_dt and dt <= b.max_dt )
select t.dt,ifnull(a.val,0) from t left join a on t.dt = a.dt order by t.dt;

运行上面的SQL,得到:

1
2
3
4
5
6
    dt     | val
----------------
2023-04 | 12
2023-05 | 0
2023-06 | 110
2023-07 | 80

对于 月份的处理,我们用到了 date_format(DATE_SUB(CURDATE(),INTERVAL rn month),’%Y-%m’) 来把月份变成 2023-05 这样的一个格式。
这个方法,各位觉得怎么样?欢迎留言讨论

文章目录
  1. 1. 日期补全
    1. 1.1. 长度优化
  2. 2. 月份数据补全