At work I've got a SQL view that joins together about 20 tables, and aside from renaming a few variables doesn't appear to do very much else. If I run a query that only requires variables in, say, 5 of those tables, is the server smart enough to only join those 5 together, or does it still plough ahead and join all 20 before giving me my subset?
This is Microsoft SQL Server 2008, I believe. The particular operation I'm doing is very slow, and I'm wondering if spending the time to bypass the view and query the base tables directly is likely to be worthwhile.
I don't know the answer (I can guess, but I won't), but if you use 'Query Analyzer' it will tell you for sure; it's in the Tools menu of Management Studio, but probably has its own shortcut somewhere too.
You may only be bringing back fields from a few tables, but by joining the other tables into the view you may be excluding records, so it should be using all the tables, otherwise it could affect the result set. It depends what information you're after as to whether you create your own query with just the 5 tables.
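Rough illustration with made-up table names: even though nothing is selected from the joined table, an inner join in the view still throws away unmatched rows:

    -- Hypothetical view: no columns come from dbo.Status, but the
    -- inner join still drops any Orders row without a matching Status row
    CREATE VIEW dbo.vOrders AS
    SELECT o.OrderID, o.Amount
    FROM dbo.Orders AS o
    INNER JOIN dbo.Status AS s ON s.StatusID = o.StatusID;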
I would expect all of the tables to be joined.
General tips to improve speed are:
Use inner joins where possible
Index on joined fields if possible / sensible
Use temporary tables or staging tables if really big data sets are involved (rough sketch below)
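A quick sketch of the last two tips, with made-up table and column names:

    -- Index the join key if it isn't indexed already
    CREATE INDEX IX_Orders_CustomerID ON dbo.Orders (CustomerID);

    -- Stage the big intermediate result in a temp table, then work off that
    SELECT o.OrderID, o.CustomerID, o.Amount
    INTO #Staged
    FROM dbo.Orders AS o
    WHERE o.OrderDate >= '20120101';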
It's as Jon1973 says. The join to the other tables could also be restricting the data set for you even though you aren't looking for any of their fields. You might be able to recreate it using fewer tables, but you really need to understand the joins first 🙁
It really depends how those tables are joined. If you had 10 tables chained together on matching fields and wanted results from the first and last tables, the query would fetch data from all 10 tables and also exclude any record from the result if a matching record didn't exist in any of the 10 tables.
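Something like this, with made-up tables, shows the chain in miniature:

    -- Only columns from a and c are wanted, but every inner join in the
    -- chain is still evaluated, and a row of a with no match in b (or c)
    -- disappears from the result
    SELECT a.ID, c.Value
    FROM dbo.a
    INNER JOIN dbo.b ON b.aID = a.ID
    INNER JOIN dbo.c ON c.bID = b.ID;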
Hints and answers here: http://use-the-index-luke.com/sql/table-of-contents
Well worth a read.
As to the actual question: no idea. fubar's suggestion of using the query analyzer is a great one; it should tell you.
Do you need to run the query live? I got round a very slow query by creating a view from the query and an overnight run. The other thing: do you need all the fields being brought back? Is there anything you can dump?
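Something along these lines (table and view names made up), scheduled as an overnight job:

    -- Materialise the slow view once overnight, then query the plain table
    IF OBJECT_ID('dbo.ViewSnapshot', 'U') IS NOT NULL
        DROP TABLE dbo.ViewSnapshot;
    SELECT * INTO dbo.ViewSnapshot FROM dbo.MyBigView;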
Thanks for the responses so far.
Query analyser sounds like a useful tool that I wasn't aware of, so I'll need to give it a try. In practice I'm executing this query from SAS, sent to the server via an ODBC connection, but I can experiment in Management Studio first.
The view I'm talking about only contains left joins and outer joins, so it shouldn't be removing any records that I wouldn't want returned (in theory it could be adding rows of nulls for mismatches, but I'd just throw these away later anyway).
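From what I've read since, SQL Server can sometimes eliminate a left join completely when nothing is selected from the joined table and the join key on that side is unique, so the answer to my original question may be "sometimes". A made-up example I can check in the execution plan:

    -- If Customers.CustomerID is unique, the optimiser can drop this join
    -- entirely because no Customers column is referenced
    SELECT o.OrderID
    FROM dbo.Orders AS o
    LEFT JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID;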
In reality I'm only going to be running the query about 5 times across the next 2-3 weeks and then probably never again, it'd just speed up my work a bit if I could get some results in under 9 hours each time I need to re-run!
Do you know which columns are indexed? Joining and filtering on them, even if you don't strictly need them, should speed things up. I suppose if you are using a view, either you don't have access to the source or you don't understand it well enough to just write a query that gets the fields you want.
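You can list what's indexed with something like this (table name made up):

    -- Shows the indexes (and their key columns) on a table
    EXEC sp_helpindex 'dbo.Orders';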
9 hours?! That must be a lot of data, a very slow server or some badly designed indexes.
In Management Studio, try right-clicking and "Display Estimated Execution Plan" to see what it's doing and which bit takes the most time.
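From a query window you can also get per-statement timings and IO, something like this (view name made up):

    -- Reports reads and CPU/elapsed time for each statement that follows
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;
    SELECT TOP 100 * FROM dbo.MyBigView;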
Funnily enough I was just having a look at how you get data into SAS, and it seems on the face of it like a lot of work client side. Given you quote 9 hours, do you think that might be the bottleneck? It certainly sounds like a network issue.
Could it be worth running the query as a make-table on the SQL server, timing it, and pulling the results of that, potentially filtered at source, into SAS?
It's been a while since I properly used SQL Server, or SAS for that matter. Anything joining 20 tables says to me have a rethink and process your data differently, but then it really does depend on the specifics. I usually cringe when looking at joining more than three.
It sounds like it's important to identify the bottleneck. A properly indexed database should give you results in a very short period of time, and a query analyser will help with this. To answer your original question: most databases have a query optimiser which is generally good at using the right indexes, but it can also be thrown by choices you make. If you start using text indexes or functions, for example, you can upset everything. MySQL has some documentation on this, iirc.
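For example (made-up table), wrapping an indexed column in a function usually stops the optimiser from using the index:

    -- Likely a full scan: the function hides the indexed column
    SELECT * FROM dbo.Customers WHERE UPPER(LastName) = 'SMITH';
    -- Can use an index seek on LastName
    SELECT * FROM dbo.Customers WHERE LastName = 'Smith';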
There is one other thing. If you are using ODBC the indexes will often be processed inefficiently. Your best bet may be to grab the data out of ODBC as soon as you can and work with it independently.