Monday, October 3, 2011

MySQL Bug #8523 - A bug we all would love to see fixed

In this article, I am going to explain with an example, a well known and long unresolved bug in MySQL (as of today): 


MySQL does not retain fractional second precision in TIME, DATETIME, and TIMESTAMP fields.


If you are not patient enough to read the entire article just skip to the video at the bottom of this page.  That should summarize what I am going to explain here.


Example situation:


You support a popular Online Shopping Website which uses MySQL database.  There is a special offer where the first customer who orders will get a 50% off.  The offer opened today exactly at 09:00 AM.


You are notified immediately that there were 5 customers who were the first to order and all of them got the discount.  Your client tells you that it is unlikely that the orders came in at the same instant and they ask you to investigate.  


Root cause analysis:


When you manually run the query to get the row with the lowest TIMESTAMP value, 5 rows are retrieved.  This means the problem is not with the script.  


Query: 


SELECT * FROM prod_db.orders
WHERE order_ts = (SELECT MIN(order_ts) FROM prod_db.orders)
AND order_type = "offer";


Result:



+----------+-------------+------------+---------------------+
| order_ID | customer_ID | order_type | order_TS            |
+----------+-------------+------------+---------------------+
|        1 |           5 | offer      | 2011-10-03 09:00:00 |
|        2 |           9 | offer      | 2011-10-03 09:00:00 |
|        3 |           2 | offer      | 2011-10-03 09:00:00 |
|        4 |           7 | offer      | 2011-10-03 09:00:00 |
|        5 |           4 | offer      | 2011-10-03 09:00:00 |
+----------+-------------+------------+---------------------+
5 rows in set (0.04 sec)



Okay, the TIMESTAMP values are the same till the last second.  But there is a very slim chance of TIMESTAMP values being the same at microsecond level.  So you run another query.


Query:


SELECT order_ID, MICROSECOND(order_TS) FROM prod_db.orders;


Result:



+----------+-----------------------+
| order_ID | MICROSECOND(order_TS) |
+----------+-----------------------+
|        1 |                     0 |
|        2 |                     0 |
|        3 |                     0 |
|        4 |                     0 |
|        5 |                     0 |
+----------+-----------------------+
5 rows in set (0.00 sec)



This is interesting.   It looks like all the 5 orders were received at 09:00:00.000000.  But that is very unlikely.


So you decide to run a test in your test database.


Query:


INSERT INTO test_db.orders 
(order_ID, customer_ID, order_type, order_TS) VALUES 
(1, 9, 'offer', '2011-10-03 09:00:00.123456'),
(2, 4, 'offer', '2011-10-03 09:00:00.123453');


Result:



Query OK, 2 rows affected (0.02 sec)
Records: 2  Duplicates: 0  Warnings: 0



Below is going to be the last query you will run before you discover something big:


Query:


SELECT order_ID, MICROSECOND(order_TS) FROM test_db.orders;


Result:



+----------+-----------------------+
| order_ID | MICROSECOND(order_TS) |
+----------+-----------------------+
|        1 |                     0 |
|        2 |                     0 |
+----------+-----------------------+
2 rows in set (0.01 sec)



What?  MySQL does not retain fractional second precision?  


Yes, you got it right.


For more information and current status on this bug see:
http://bugs.mysql.com/bug.php?id=8523


Don't forget to view the video below.  Hitler also seems to have a similar situation.  I just wanted to get my point across in a funny way.


Please press the CC button on the lower right to turn on Captions.




2 comments:

  1. This is not a bug. Adding fractional precision into DATETIME, TIME, or TIMESTAMP would require an increase in the number of bytes needed to store that value per record. This will bloat your table, indexing, and even waste storage. This makes a big difference when you're working with millions/billions of records. It even makes a difference when working with merely thousands of records when you need every last drop of performance you can get your hands on.

    Create a new datatype that supports fractional precision? Perhaps. But most would never use it.

    More importantly, you should never rely on a timestamp for contesting. There's too many variables that will trip you up. Even with fractional precision it would still be possible have multiple records created with the exact same time stamp. One very easy way to solve your problem is to look at the lowest value in the auto-incremented primary key, based on your date range. Since you have a unique index, you won't have collision. Another option is to determine the winner at run time rather than calculate later. To do that you'll need to create your own mutex, which MySQL will let you do via GET_LOCK(). See http://dev.mysql.com/doc/refman/5.5/en/miscellaneous-functions.html. This of course becomes far more challenging if you're running replication depending on the topology you've chosen.

    ReplyDelete