Date Tags sql / tutorial (3 min read)

Did you know that SQL queries can be recusive? I didn't! Now I do. It's glorious.

The problem

I had some data in the following format:

thing sub_thing_id start_time end_time
1 1 Sep 1 Sep 3
1 2 Sep 1 Sep 7

And I wanted it in the following format:

thing date number_of_sub_things
1 Sep 1 2
1 Sep 2 2
1 Sep 3 2
1 Sep 4 1
1 Sep 5 1
1 Sep 6 1
1 Sep 7 1

How do?!

The solution: A recursive common table expression

Ah ha!

A recursive CTE allows you to join all the levels of a hierarchy without knowing in advance how many levels there are.

tl;dr, the query that worked was:

select
  exp.thing,
  to_date(exp.start_time) as date,
  count(*) as number_of_sub_things
  from
    (with cte
      as (select
              thing,
              sub_thing,
              start_time,
              end_time
          from some_wide_table where start_time is not null
          union all
          select thing, sub_thing,
                dateadd(day, 1, start_time),
                end_time
          from cte
          where to_date(start_time) <= (to_date(end_time)-1))
    select *
      from cte) as exp
group by 1, 2
order by dte asc;

But whaaat is happening here? I shall explain:

  1. First, create a recursive common table expression (CTE): (with cte ... from cte) as exp
  2. In that CTE, create your "anchor clause" (select thing ... from some_wide_table) and union all it to your "recursive clause" (select thing ... from cte). Your anchor clause is the first, base set of rows you will be comparing your next select statement against. Your recursive clause adds its resulting rows back to that base, and then compares that new "anchor" against the same "recursive clause" select statement.
  3. It keeps doing this until the where to_date(start_time) <= (to_date(end_time)-1)) statement is no longer satisfied. So it's (kinda) recuuuursive! Magic! And, like all recursive functions, there is a risk that you fall into an infinite loop if your where clause never catches.
  4. Once you've completed recursing around, you select * from cte - this is selecting all the resulting rows that were found as the anchor and recursive clauses did their dance.
  5. I then needed to group by the top-level thing and count(*) how many sub-things were present on each date.

Voila!

The next problem

So I actually ended up not being able to use this, because I wanted to do this arbitrary recursion over thousands of dates, millions of things, and tens of millions of sub-things. I quickly got to a max iteration count reached error from the database; a safety mechanism it has in place to prevent infinite recursions. It wasn't infinite, but it was certainly a bit of a combinatorial explosion.

So what do now?!

tl;dr: There was a much more efficient, non-recursive way to do this. Basically, make a table of dates on the fly and join against that. D'oh.

select
  exp.thing,
  to_date(exp.this_date) as dte,
  count(*) as number_of_sub_things
  from
    (select
      x.thing,
      x.sub_thing,
      dates.this_date
    from some_wide_table as x
    right join
      (select
        dateadd(day, seq4(), '2011-01-01') as this_date
        from table(generator(rowcount => 9*365))) as dates
    on to_date(x.start_timee) <= to_date(dates.this_date) and to_date(x.end_time) >= to_date(dates.this_date)
    ) as exp
  group by 1, 2
;

I don't love this solution, namely because I need to decide (and hard-code) the range of dates to compare against. For now, I'm creating a list of dates from 2011-01-01 to 9 years hence (9*365). It feels hacky. But it took a fraction of the time and no combinatorial problems! So it's good enough.