Today I am enthused to write about the DateTime and SmallDateTime datatypes in SQL Server 2005 (and possibly this also applies to 2008, although that has additional date and time types). I am driven to write this because I have seen a number of issues relating to their use in queries and one in particular that is a real annoyance to me – even if I have to admit that it is completely and utterly pedantic (most of the time).
So, what are DateTime and SmallDateTime? Both a datatypes to store date-and-time data unsurprisingly.
- Stored in 8 bytes;
- First 4 bytes store number of days before or after 1st January 1900;
- Second 4 bytes represent the time of day, measured in three and one-third millisecond units since midnight*;
- Allowed date range is 1st January 1753 to 31st December 9999 [Updated from 31st December 1999 which was an error on my part].
- Stored in 4 bytes;
- First 2 bytes store number of days after 1st January 1900;
- Second 2 bytes represent the time of day, measured in minutes since midnight;
- Allowed date range is 1st January 1900 to 6th June 2079.
Looking Inside the Internal Format of DateTime
The following snippet of SQL (heavily based on code from ‘Inside Microsoft SQL Server 2005: The Storage Engine’ by Kalen Delaney) gives an interesting insight on the internal structure of a DateTime:
DECLARE @today datetime SET @today = getdate() SELECT @today as 'Today' SELECT CONVERT(varbinary(8), @today) as 'DateTime in Hex' SELECT CONVERT(int, SUBSTRING(CONVERT(varbinary(8), @today), 1, 4)) as 'no. days since 1/1/1900' SELECT CONVERT(int, SUBSTRING(CONVERT(varbinary(8), @today), 5, 4)) as '@no. of 3.33ms since midnight'
In most places in plain SQL, a string literal formatted appropriately will happily be converted into a DateTime. You should know by now that a format like ’06/04/09′ will mean different things to different people (principally Americans vs. Europeans) where is may mean ’6th April 2009′ (European) or ‘June 4th 2009′ (American). Therefore, it tends to be safer to use a format that is not subject to such fickle fortune… so I prefer:
[yy]yymmdd[ hh:mi[:ss][.mmm]] e.g. ’20090406 13:42′ (where mmm is milliseconds)
Remembering that the resolution of a DateTime – time variable is three-and-one-third milliseconds, it should be clear that when we display DateTimes as a string, with a command such as CONVERT(VARCHAR, getdate(), 113), I may get a result ’30 Aug 2009 11:03:19:040′, but the milliseconds value may have been rounded. Similarly, for a SmallDateTime rounding is to the nearest minute so you will never see any value in the seconds or milliseconds.
Run these two statements together repeatedly to get a sense for the rounding of the SmallDateTime type:
select convert(varchar, getdate(), 113) select convert(varchar, CAST(getdate() as SMALLDATETIME), 113)
You will see output something like:
30 Aug 2009 11:05:19:037 30 Aug 2009 11:05:00:000 ... and run it again 20 seconds later ... 30 Aug 2009 11:05:39:037 30 Aug 2009 11:06:00:000
Rounding will also happen when converting string representations to their DateTime internal types. For example, what output might you expect from these three statements? (for reference, we have to specifically cast them to DateTimes in a select list, otherwise they will just be output as strings)
select cast('20090830 23:59:59' as DateTime) select cast('20090830 23:59:59.997' as DateTime) select cast('20090830 23:59:59.999' as DateTime)
What do you think?
2009-08-30 23:59:59.000 2009-08-30 23:59:59.997 2009-08-31 00:00:00.000 (note how this has been rounded to the next day)
For these reasons of rounding, if you want to compare dates for a specific day, or days, there is really only one right and safe way, as we shall now see.
Too often I see tests like this:
-- 'Bad' sample, don't do this! WHERE dateCol >= '20090830' AND dateCol <= '20090830 23:59:59'
Hey, what happened to all the stuff that happened during the 60th second of 23:59? You'll never see it, is what. This may be 'off' by just one second but it is still off. I'm sure you wouldn't welcome your bank telling you that your pay did not register as being deposited to your account this month because it came in at 23:59:59.003.
So how about we make sure that we cover that last second:
-- 'Bad' sample, don't do this! WHERE dateCol >= '20090830' AND dateCol <= '20090830 23:59:59.999'
Ooops. As we saw above, that '23:59:59.999' will get rounded up to the next day, so now your query will return too much data (because it will include values from midnight the next day)! You may ask, how about if we coded that time as '23:59:59.997', we have already seen that this does not get rounded - and my answer is that this is an implementation-specific issue that might - albeit 'conceptually' - change.
The simple fix is to do the following comparison:
-- 'Good' sample, DO this! WHERE dateCol >= '20090830' AND dateCol < '20090831'
It is also substantially sub-optimal to do something like this:
-- 'Bad' sample, it's probably better if you don't do this! WHERE CONVERT(VARCHAR, dateCol, 112) = '20090830' -- Format 112 is 'yyyymmdd'
Undoubtedly, it is attractive to do these sorts of comparisons because you only have to type 'dateCol' once and it solves all the issues noted above with that pesky last second of the day. But as I hope to show in my next post, it is bad for reasons of performance tuning (and, I might add, bad because the Sql Server Date formats are unintelligible without a comment).
It is actually misleading to say that the number of 3.33milliseconds since midnight is stored. Though it's kind-of true depending on how you look at it. Let's consider a date like 2nd January 1900. As a float is is represented by 1.0, so 6pm on 2nd January 1900 is 1.75. Now let's think about 30th December 1899. Well, frankly the only way to think about this is in a diagram:
So, perhaps contrary to your guess, the conversion is -1.25. What happened to the 0.75? The summary is, 18:00 or 6pm is 0.75 of a day above and below 0, but it is added to the integer 'day number' mathematically, not by concatenation. So -2 + 0.75 = -1.25
This fits in very nicely with a blog post by Eric Lippert, where he discusses the date format of Excel and some other applications, and some bugs associated with it. It's a fascinating read, but I do recommend you follow his recommended link to understand the background to it.
In contrast to the situation he found himself in, you will notice that that the approach used in Sql Server allows you to add 3 to the lower day number to get the higher day number. This is Good, as they are exactly 72 hours apart in time (whereas the date system Eric discusses, this is not true!).
If you'd like to double-check all this in Sql Server, try the following SQL:
DECLARE @dt datetime DECLARE @dtAsFloat float SET @dt = '19000102 18:00:00' SET @dtAsFloat = CAST(@dt AS FLOAT) SELECT @dtAsFloat as '2nd January 1900, 18:00 as a Float' SELECT FLOOR(@dtAsFloat) as 'Floored Value', CAST(FLOOR(@dtAsFloat) AS DATETIME) as 'Floored and converted back to date' SET @dt = '18991230 18:00:00' SET @dtAsFloat = CAST(@dt AS FLOAT) SELECT @dtAsFloat as '30th Dec 1899, 18:00 as a Float' SELECT FLOOR(@dtAsFloat) as 'Floored Value', CAST(FLOOR(@dtAsFloat) AS DATETIME) as 'Floored and converted back to date'
And the (simplified) results:
2nd January 1900, 18:00 as a Float 1.75 Floored Value Floored and converted back to date 1 1900-01-02 00:00:00.000 30th Dec 1899, 18:00 as a Float -1.25 Floored Value Floored and converted back to date -2 1899-12-30 00:00:00.000