Any DBAs in the hou...
 

  You don't need to be an 'investor' to invest in Singletrack: 6 days left: 95% of target - Find out more

[Closed] Any DBAs in the house

14 Posts
4 Users
0 Reactions
38 Views
Posts: 91000
Free Member
Topic starter
 

Can anyone help me with this query?

We have a table of account numbers which is loded from a flat file daily, more or less. Each time this is done, an entry is created in the load log and a new incremental load ID is assigned. The account table is not cleared out, but with each new load account numbers are associated with the latest load id. I need to check if an account number is in the table for the most recent load. There are two types of data, ABC and XYZ. The load IDs for each data type are different, since they are loaded separately. So I need to check the latest load ID for each data type.

I'm sure this query is horribly inefficient - any way to improve it? Does my description make sense?

SELECT
COUNT(Account_Unique_ID) from Account A, RefData_Load_Log B
WHERE
(A.Account_Unique_ID = "value1" AND
A.LoadId = B.LoadId AND
B.LoadId = (SELECT MAX LoadId FROM RefData_Load_Log WHERE b.data_type = "ABC"))
OR
(A.Account_Unique_ID = "value1" AND
A.LoadId = B.LoadId AND
B.LoadId = (SELECT MAX LoadId FROM RefData_Load_Log WHERE b.data_type = "XYZ"))


 
Posted : 26/03/2014 6:14 pm
 Aidy
Posts: 2941
Free Member
 

Hard to make a call on efficiency, but that I think that query could be written more simply:

SELECT COUNT(Account_Unique_ID)
FROM Account A JOIN RefData_Load_Log B ON A.LoadId = B.LoadID
WHERE A.Account_Unique_ID = 'value_1'
AND B.LoadId IN ( SELECT MAX(LoadId) FROM RefData_Load_Log WHERE B.data_type IN ('ABC', 'XYZ' ) );

Probably need to modify the join type for your data structure, though.


 
Posted : 26/03/2014 6:24 pm
Posts: 91000
Free Member
Topic starter
 

I don't think that's the same, is it? That only checks the latest load ID for either data type. I need to check BOTH loads for both data types, the latest of each one.


 
Posted : 26/03/2014 6:32 pm
 gary
Posts: 534
Full Member
 

A CTE might be a bit more efficient again if using a recent version of SQL server. Presumably you need data type in the result too?

Something like this (untested)

;WITH latestLoadIds (LoadId, data_type) AS
(
SELECT MAX LoadId, data_type FROM RefData_Load_Log
WHERE b.data_type IN ( "ABC", "XYZ" )
GROUP BY data_type
)
SELECT latestLoadIds.data_type, COUNT(Account_Unique_ID)
FROM Account A
JOIN latestLoadIds ON latestLoadIds.loadId = A.LoadId


 
Posted : 26/03/2014 6:41 pm
Posts: 91000
Free Member
Topic starter
 

Ok you've just gone beyond my SQL pay grade 🙂 I'll have to google the WITH thing.

This is DB2 btw.


 
Posted : 26/03/2014 6:46 pm
 Aidy
Posts: 2941
Free Member
 

I don't think that's the same, is it? That only checks the latest load ID for either data type. I need to check BOTH loads for both data types, the latest of each one.

'IN', rather than '='.


 
Posted : 26/03/2014 6:50 pm
Posts: 91000
Free Member
Topic starter
 

So your select max will return two values?


 
Posted : 26/03/2014 6:52 pm
 Aidy
Posts: 2941
Free Member
 

Er, no - good point. Throw a GROUP BY in there.


 
Posted : 26/03/2014 6:54 pm
Posts: 0
Free Member
 

Would like to help but can't make head ner tail of the problem:

There are two tables Account, RefData_Load_Log?

They are joined by LoadId

When you say types of data you mean ??? Which table what...

Why are we counting? This is not Oracle anyway so I'm out.


 
Posted : 26/03/2014 7:00 pm
 gary
Posts: 534
Full Member
 

'IN', rather than '='.

Yes but you will only get one loadId, not one for each type with your query. You can do that if you group by the type.

Not touched DB2 in years, but a quick google suggests it has a similar CTE syntax. Its probably overkill in this case and you could actually rewrite it like this (I rewrote both in terms of tables in a db here and they each give the same results).

There's a reasonable chance it will help the query optimiser do a decent job and it is at least more readable when you come back to it in 12 months 🙂

If you do use the CTE version, it needs the group by statement at the bottom of this query adding to it. Add an additional WHERE clause to filter the accounts as per your initial post.

SELECT latestLoadIds.data_type, COUNT(Account_Unique_ID)
FROM Account A
JOIN
(
SELECT MAX LoadId, data_type FROM RefData_Load_Log
WHERE b.data_type IN ( "ABC", "XYZ" )
GROUP BY data_type
) latestLoadIds ON latestLoadIds.loadId = A.LoadId
GROUP BY latestLoadIds.data_type


 
Posted : 26/03/2014 7:03 pm
Posts: 91000
Free Member
Topic starter
 

When you say types of data you mean ???

ok perhaps 'source' would be a better word. We load data twice - one lot of accounts from ABC and one lot from XYZ.

ABC and XYZ are exclusive btw, so I don't think I need data_type in the select, gary.... do I?

PS don't know what CTE is 🙂


 
Posted : 26/03/2014 7:07 pm
 gary
Posts: 534
Full Member
 

You only need the specific data types if there are other types you are excluding from the query (e.g. "DEF")

CTE = Common Table Expression = the WITH bit. Its just funky syntax for generating a dynamic table ahead of the main query. In this example you can just do it inside the query as per the second example. It comes in quite handy if you need to evaluate hierarchies and then join against those.


 
Posted : 26/03/2014 7:12 pm
Posts: 0
Free Member
 

I'd probably focus on an efficient query for one 'type', nail that then repeat it for the next type with a union between to return two rows.


 
Posted : 26/03/2014 7:18 pm
 gary
Posts: 534
Full Member
 

But in this case the efficiency is probably mostly from working out the 2 load ids in one go (albeit that is me second guessing the optimiser on a DB I don't work with 🙂 ). Once you have that, the rest of the query is trivial and just needs the appropriate indexes on loadId if (as it sounds) the tables are large.


 
Posted : 26/03/2014 7:31 pm
Posts: 91000
Free Member
Topic starter
 

You only need the specific data types if there are other types you are excluding from the query

There are only two data types, but there are lots of datatype/loadid combinations. Tables are large, yes.


 
Posted : 26/03/2014 8:30 pm

6 DAYS LEFT
We are currently at 95% of our target!