Monday, May 17, 2010

Sql Chat: Avoid the 'OR'

Sql.  Structured Query Language.  The joy and bane of so many developers' lives.  It can make or break you in the development game, and have you up at three in the morning saying "but WHY am I getting duplicates?!"



I have friends who try very hard to avoid using it.  They are the products of modern programming.  Data Entities that automatically update your data when you call a save method.  Code layers that remove the complexity of data from the developer.  Blah.  Blah.  Blah.

This is really nice-sounding and lots of guys I know swear they don't need to know Sql.  Then their boss wants a report, tabulated and summed and pivoted by all kinds of crazy things.  Entity objects suddenly become Satan's little minions.  If the only tool you have is a hammer, soldering that tiny wire back into place is gonna be one nasty job.



If you're using a database in your application, and you don't know Sql, here's some free advice:  Man up, get a book (or look online) and learn it.  You'll never regret it.  It always is a bonus knowing at least a little Sql.  It impresses database admins, and who doesn't want to impress them, right?  Plus you can lord it over your other geek friends who DON'T know it.


Over the coming weeks and months, I'll be writing a regular blog on Sql.  This will NOT be an introduction to Sql.  If you want an intro to Sql, here you go:
http://www.w3schools.com/sql/sql_intro.asp

What I'll be focusing on is some of those tricks and traps in Sql which will set you apart from the average Sql nerd.  


************************************************************************************************************


With that said, let's get stuck in.  Today, we'll be talking about the  OR statement, and why you should avoid this like the damned plague.


At first glance, an OR statement would seem like a reasonably good idea. 




select * from ObjectTable where ObjectId = 1234 or ObjectId = 2345




Get one OR the other, right?  Sure.  Good idea in simple scenarios such as this one, and fine if you are dealing with primary keys.  Nice and clean.  However, let's go a little further with this example:


select * from ObjectTable where ObjectTypeCode = 'Code1' or  ObjectTypeCode = 'Code2' 


Now we aren't talking about a primary key any more.  We're talking about a string hanging off the side which is a property.  Now the database is scanning to find which records have ObjectTypeCode of 'Code1' OR have 'Code2'.  This is a lot of scanning.  Your db server is saying:


"This one?  No.  This one?  Yes.  This one?  No.  This one?  Yes"


This is time consuming, and at all costs we wish to consume less time on the db.  Less time means less CPU being used, and less CPU means...well less CPU.  Yes, indexing can (and will help), but indexing for indexing sake can actually end up biting you in the ass.  Too many indexes is worse than too few, as the system wastes ridiculous amounts of time updating all these indexes that are only there to help you out in those rare scenarios where you want to pull off an OR statement.




And for all you smart-asses out there, don't think this trick will help you either:


select * from ObjectTable where ObjectTypeCode in ('Code1','Code2')


This is basically the same thing, just in prettier shorthand.  Trust me.  I tried it.


So how do we avoid this nasty OR statement (if we can help it)?  Obviously, there are times we NEED an OR statement.   What are our alternatives.


Remember that SQL is a set-based language.  When you make a query in Sql, we are  trying to group together things that meet the criteria in your joins and where clause.  The tighter this joining is, the less time your  database server will use trying to find it.  So here are some good rules:


1. If you query for these values OFTEN, maybe you need to rethink your data structure.  Rather than this:




You might consider this:
























Note the column 'GroupingCode'.  You can group your ObjectTypeCode values using the GroupingCode as in:







ObjectTypeCode GroupingCode
Code1 Group1
Code2 Group1
Code3 Group2
Code4 Group2
Code5 Group1






Now you have a usable set which can grab records quickly.


select * from ObjectTable ecot
join ecObjectType ecott on ecot.ObjectTypeId = ecott.ObjectTypeId
 where ecott.GroupingCode = 'Group1'




This is going to be far faster, because we are joining on a single key, not multiples that require 2x the table searching.




2. If you can, try the reverse.  For example, if you have 3 values in ObjectType, our two above and another one called 'Bleck', try this instead:


select * from ObjectTable where ObjectTypeCode <> 'Bleck'




3. Like this?
select * from ObjectTable where ObjectTypeCode like 'Counter%'


This doesn't work in all flavors of Sql, but in transact-sql it can be lightning fast, particularly if you full-text index the column.



That's it!  Hope it helps.  Remember, dodge that OR.  There is usually a solution around it.






No comments:

Post a Comment