Simple (Date) Range Overlap Detection

I started down this particular rabbit hole when (a) I had to check for date range overlaps when worrying about scheduling appraisals; and (b) my attempts to use SQL BETWEEN resulted in complicated, hard-to-read SQL.

Now you would think that BETWEEN should give you easy-to-understand SQL for date ranges. Maybe in the hands of others BETWEEN does, but for me by the time I accounted for all of the cases the BETWEEN-based SQL date range overlap detection got more complicated than I thought it needed to be. So I drew up some diagrams, which eventually led me to the realization that this:

dateA.start <= dateB.end
        AND
dateB.start <= dateA.end

is all that you need. No BETWEEN needed, simple to read, and should work in any useful dialect of SQL.

But there is an even simpler way to think about this (though the written expression is more complicated). In plain English, "there is no overlap if one range starts after the other one ends." In SQL that overlap detection looks like:

NOT (dateA.start > dateB.end)
        AND
NOT (dateB.start > dateA.end)

So the above SQL formulation for range overlap detection:

dateA.start <= dateB.end
        AND
dateB.start <= dateA.end

is simpler but is harder to reason about (IMHO).

As a (tiny) bonus, in ASCII art the no-overlap case always looks similar to:

| range A |
                        | range B |

There are several overlap cases (drawing these is left as an exercise to the gentle reader).

3 Comments

The way I prefer to think of it is

NOT (
rangeB.end < rangeA.start
OR
rangeB.start > rangeA.end
)

The prose version of that is the the most natural way I can think of to express this logic in human language:

“The ranges can’t overlap if range B either ends before or starts after range A.”

The implication is “The ranges must overlap if it does neither.” But I find this phrasing just a little more cumbersome to follow.

The latter phrasing is almost the prose version of your NOT (…) AND NOT (…) formulation, except your phrasing additionally switches the place of the ranges between the terms, in order to keep the relation the same. I find that makes it a little harder to follow still.

The first phrasing is the one that I think makes it most intuitively and immediately obvious that this logic is correct.

Note well however that the reason why any of this works is because it all relies on an invariant that is usually overlooked: reverse ranges must be disallowed, i.e. assert(range.start <= range.end).

If this was not the case then you would have to write out the complicated full-shebang logic. But by assuming this precondition you can, essentially, factor out most of the logic to it. This is what allows you to arrive at this simple two-condition check that avoids the gyrations of a full-blown formulation of all conditions.

Your definition of overlap seems to miss the case where one range is entirely contained within another.

   |          range A           |
         | range B       |

To detect that you unfortunately need at least two more tests :-(

Woops - I stand corrected:

type        a||'-'||b   c||'-'||d   a <= d AND c >= b
----------  ----------  ----------  -----------------
Below       1-3         4-8         0
Intersect   2-5         4-8         1
Contained   5-7         4-8         1
Intersect   7-9         4-8         1
Above       9-10        4-8         0
Within      4-8         5-7         1

Leave a comment

About Mark Leighton Fisher

user-pic Perl/CPAN user since 1992.