Seven Deadly SQL Sins Part III
68Very Basic MySQL Tutorial
Introduction
continued from part two
This is the third in a series of four excerpts from the MySQL presentation I have given live on a number of occasions. It was inspired Arjen Lentz' outline entitled The Deadly Sins of MySQL. Parts one and two dealt with some pretty elementary errors. Today I have some slightly more advanced problems for you.
If you feel you need some more practice with the basics before continuing, check out the video on moving around in databases and creating tables.
Sin #7 What gets replaced?
CREATE TABLE tab (id INT PRIMARY KEY, name CHAR(40), address CHAR(40));
INSERT tabVALUES(1,'Colleen','Hollywood');
Oops, we meant Tony, not Colleen. (Colleen can't afford to live in Hollywood.) Let's do a replace. Replace is so much more clever than testing to see if the key exists, doing an update if it does and an insert if it doesn't in the application. SQL does that for us in the REPLACE logic so the application doesn't have to. Good idea!
REPLACE tab(id,name) VALUES(1,'Tony');
That's gonna replace "Colleen" with "Tony", right?
SELECT * from tab;
+----+---------+---------+| id | name | address |+----|---------+---------+| 1 | Tony | NULL | <=Oops!+----+---------+---------+
It deletes, then replaces the whole row. Since you didn't give a value for address, it defaults to NULL. Luckily there is a way to solve this that will also demonstrate prepared statements.
Let's say we want to log pagehits. The unique key is constructed from the page id and the user id. So the first time user 4 hits page 1 we want to do an insert and set the hitcount to 1. If the record already exists we want to do an update, updating hitcount with hitcount+1.
CREATE TABLE pagehitlog(pageid INT UNSIGNED NOT NULL, pagename VARCHAR(20), userid INT UNSIGNED NOT NULL, hitcount INT UNSIGNED NOT NULL, PRIMARY KEY(pageid,userid));
PREPARE bumphit FROM 'INSERT pagehitlog(pageid,pagename,userid,hitcount) VALUES (?,?,?,?)ON DUPLICATE KEY UPDATE hitcount=hitcount+1; ';
When you know you are going to do a query several times you can prepare it -- kind of a server variable. It saves bandwidth of repeatedly sending the query to the server, but more important, it saves the server the bother of repeatedly setting up the same query over and over. Note that this is an INSERT with a special case for duplicate key.. a kind of prebuilt trigger, if you will. It contains instructions for what to update in the case of a duplicate key, but will not wipe out the other data in the record. Where the question marks are, you stuff those values into the prepared query as follows:
SET @p1=1;SET @pn='Foo';SET @ui=4;SET @hc=1;
EXECUTE bumphit USING @p1,@pn,@ui,@hc;
Normally you would prepare the query and stuff application values with whatever API you are using to interface with prepared statements and not feed them directly to the raw client as shown here. I'll leave that as a search exercise for the reader. Let's execute the prepared query a few more times and see what we get.
EXECUTE bumphit USING @p1,@pn,@ui,@hc;EXECUTE bumphit USING @p1,@pn,@ui,@hc;EXECUTE bumphit USING @p1,@pn,@ui,@hc;SELECT * from pagehitlog;+--------+-----------+--------+----------+ | pageid | pagename | userid | hitcount | +--------+-----------+--------+----------+ | 1 | Foo | 4 | 4 | +--------+-----------+--------+----------+
See that user #4 hit the Foo page 4 times, even though you were feeding a 1 to hitcount? Poof, problem solved.
Sin #8 Too much math
Let's assume you have a table with a field called d, and d is a DATETIME or a DATE. You would like to get all the rows with dates in May of the year 2004. At first the query below seems like a reasonable way to go about it:
SELECT ... WHERE MONTH(d) = 5 AND YEAR(d) = 2004;
This works fine until your table starts getting sizeable. After the table grows to be a large table, you start to see sluggish response. What is the problem? You have required it to do a math function on every d in the table not once, but twice (once for the month and once for the year) so you can compare the results to constants. Better would be:
SELECT ...WHERE d BETWEEN '2004-05-01' AND '2004-05-31';
which works for a DATE, or
SELECT ...WHERE d BETWEEN
'2004-05-01 00:00:00' AND '2004-05-31 23:59:59';
which takes into account the time as well on a DATETIME.
Here you set up the range as a constant. So you are not doing two date calculations PLUS a compare on every d.
Sin # 9 Expecting it to just "know " what you want
Let's say we have a stock table with item numbers. We have each dealer who sells each item and we track his/her price. So we'd like to find the dealer who has the lowest price for each item.
INSERT INTO shop
(article,dealer,price) VALUES
(1,'Foo',20), (1,'Bar',10), (2,'Foo',30);
SELECT article,dealer,MIN(price)FROM shop GROUP BY article;+---------+--------+------------+ | article | dealer | min(price) |+---------+--------+------------+ | 1 | Foo | 10 | What the? | 2 | Foo | 30 | +---------+--------+------------+
How does this happen? Bar is selling Article 1 for 10, not Foo! The RDBMS finds the min price for each group of records but when we GROUP BY, the dealer will just be the first dealer that happened to match, not necessarily the one that goes with the minimal price. I've fallen for this thinking trap myself--it's easy to do. With subselects, we can tell it much more clearly what we REALLY mean, as follows:
SELECT article,dealer,price FROM shop t1
WHERE price =
(SELECT MIN(t2.price) FROM shop t2
WHERE t1.article = t2.article);
+---------+--------+------------+| article | dealer | min(price) |+---------+--------+------------+| 1 | Bar | 10 | | 2 | Foo | 30 |+---------+--------+------------+
That's more like it. Notice we don't need the GROUP BY any more because now we are only selecting records where the price matches the pre-calculated min. Notice also we are using aliases for the different instances of the table to disambiguate which one we mean.
Conclusion
That's it for today, sinner. These are advanced sins, only the truly wicked can commit them. If you have not yet attained that capacity for wickedness and can't follow these sins, you might consider reading the first two articles in this series. Even if you are sufficiently wicked, my goal in this article was to make you scratch your head a bit before saying "Aha!" The "Aha!" is always more satisfying if you had to scratch first, is it not? Otherwise it's too much like a "Duh!" Please check back soon. The next article of this series should make you feel righteous indeed.
About the author...
Colleen Dick has been a database techie, php developer, security analyst, and social networker for a long time. She has single-handedly developed a dynamic joke site, a IT certification practice exam repository, social network for interactive video gamers, a community calendar, and an online ticket sales ecommerce site. When she comes off her extended break from programming she might be looking into ning and dojo. Thank you for reading; If you found this article interesting, useful, or cool, please reward me with a a stumble, digg, or whatever you do to articles you like. If you didn't find it any of the above, please leave a comment letting me know how I could improve it and earn your seal of approval.
PrintShare it! — Rate it: up down flag this hub










Elena. says:
11 months ago
I'm shaking my head and laughing at sin #9 !! :-)
I'm not really a programmer myself, but I deal with enough of them on a daily basis to have been delighted with the sentence in Part II, and to be about to fall off the chair with laughter at sin #9, 'cause it actually reminds me of sin #10, which isn't really an SQL sin, it's just a very plain human sin! Here it is:
Expecting non programmers to understand what you are talking about!!
Ay ay ay, what a laugh! Considering the nature of this series, all techie and SQLish, I think a laugh deserves a big, HUGE, thumbs up!