I'm all but convinced this is a flat out bug in MS Access's inner join. Which is really scary since it's what I'm supposed to be using for app dev. And no, it is not the intended behavior of an inner join - I've gotten 3 different people (counting myself) to confirm.
So the query is implementing the "greatest-n-of-group" via a max query to get the highest priority, then using that result as a filter back onto the original to retrieve the required non-group fields. (Side question - is "First" on an ordered table really stable enough to use to do it via a single pass?) Said non-grouped field is part of a key on a third table.
A particular configuration of this query drops half the records in a particular subset of my data (that I've zoomed in on). If I switch the joins of the third table to the "max filter" table instead of the base table, suddenly it works correctly. (The fields in question appear on all 3 tables, and are joined from the "base" to the "max filter" already.)
My group by does run over 11 fields. And swapping which copy of the joined fields does change my query execution plan around. But the version without a sort just doesn't seem to work. I have no idea how inner joins and group bys could interperate into an erroneous query plan, but I seem to have managed it.
I know there is no actual sql code [now posted], but it is spread over multiple queries and might require the data I've got as well. Is there a way to upload an actual .mdb file so people can play with the copy producing the bug? I'm new to posting, but I've yet to see a post with a .mdb on it, so I'm not sure how best to get the details needed for this one.
Or, if someone has an idea of what might be going on (unsorted joins have a 10 field limit?) that might cause this I'll take a further look on Monday. I've isolated memory unleaks in C before, so I'd like to think I know a thing or two, but this one is just plain befuddling - I really don't like to think I might have to implement inner join in vba because I can't trust Access (and can't get the green light on better tools).
[Further edit: I'm sole designer of this process, so for once it isn't someone else's code - mine is being the problem all on its own.]
2013-12-30 edit - adding the wall of SQL Wall of SQL:
Bad query returns only 8 of 16 results on the test subset:
SELECT
Allocations.Allocation_ID,
SLJ.Exp_Sign,
SLJ.Proj_ID,
SLJ.Doc_ID,
Allocations.Expense_Catagory,
SLJ.Prod_ID,
SLJ.OrigP_Type,
SLJ.OrigP,
SLJ.Class_ID,
Sum(SLJ.Total*[share]) AS Resposible_Doc_Count,
MP.MaxOfPriority,
SLJ.Priority,
MP.OrigP,
MP.Exp_Sign,
IIf(MP.exp_sign=SLJ.exp_sign,"true","false") AS What
FROM
(MP INNER JOIN SLJ
ON (MP.Exp_Sign = SLJ.Exp_Sign)
AND (MP.E_Delivery = SLJ.E_Delivery)
AND (MP.Expense_Type = SLJ.Expense_Type)
AND (MP.Proj_ID = SLJ.Proj_ID)
AND (MP.Doc_ID = SLJ.Doc_ID)
AND (MP.Doc_Typ_ID = SLJ.Doc_Typ_ID)
AND (MP.Prod_ID = SLJ.Prod_ID)
AND (MP.OrigP_Type = SLJ.OrigP_Type)
AND (MP.Class_ID = SLJ.Class_ID)
AND (MP.OrigP = SLJ.OrigP)
AND (MP.MaxOfPriority = SLJ.Priority)
AND (MP.Invested = SLJ.Invested))
INNER JOIN Allocations
ON (SLJ.Alloc_ID = Allocations.Allocation_ID)
AND (SLJ.Invested = Allocations.Invested)
AND (SLJ.E_Delivery = Allocations.E_Delivery)
AND (SLJ.Expense_Type = Allocations.Expense_Type)
GROUP BY
Allocations.Allocation_ID,
SLJ.Exp_Sign,
SLJ.Proj_ID,
SLJ.Doc_ID,
Allocations.Expense_Catagory,
SLJ.Prod_ID,
SLJ.OrigP_Type,
SLJ.OrigP,
SLJ.Class_ID,
MP.MaxOfPriority,
SLJ.Priority,
MP.OrigP,
MP.Exp_Sign,
IIf(MP.exp_sign=SLJ.exp_sign,"true","false");
Good query returns all 16 results:
SELECT Allocations.Allocation_ID,
SLJ.Exp_Sign,
SLJ.Proj_ID,
SLJ.Doc_ID,
Allocations.Expense_Catagory,
SLJ.Prod_ID,
SLJ.OrigP_Type,
SLJ.OrigP,
SLJ.Class_ID,
Sum(SLJ.Total*[share]) AS Resposible_Doc_Count,
MP.MaxOfPriority,
SLJ.Priority,
MP.OrigP,
MP.Exp_Sign,
IIf(MP.exp_sign=SLJ.exp_sign,"true","false") AS What
FROM
(MP INNER JOIN SLJ
ON (MP.Invested = SLJ.Invested)
AND (MP.MaxOfPriority = SLJ.Priority)
AND (MP.OrigP = SLJ.OrigP)
AND (MP.Class_ID = SLJ.Class_ID)
AND (MP.OrigP_Type = SLJ.OrigP_Type)
AND (MP.Prod_ID = SLJ.Prod_ID)
AND (MP.Doc_Typ_ID = SLJ.Doc_Typ_ID)
AND (MP.Doc_ID = SLJ.Doc_ID)
AND (MP.Proj_ID = SLJ.Proj_ID)
AND (MP.Expense_Type = SLJ.Expense_Type)
AND (MP.E_Delivery = SLJ.E_Delivery)
AND (MP.Exp_Sign = SLJ.Exp_Sign))
INNER JOIN Allocations
ON (Allocations.Expense_Type = MP.Expense_Type)
AND (Allocations.E_Delivery = MP.E_Delivery)
AND (Allocations.Invested = MP.Invested)
AND (SLJ.Alloc_ID = Allocations.Allocation_ID)
GROUP BY Allocations.Allocation_ID,
SLJ.Exp_Sign,
SLJ.Proj_ID,
SLJ.Doc_ID,
Allocations.Expense_Catagory,
SLJ.Prod_ID,
SLJ.OrigP_Type,
SLJ.OrigP,
SLJ.Class_ID,
MP.MaxOfPriority,
SLJ.Priority,
MP.OrigP,
MP.Exp_Sign,
IIf(MP.exp_sign=SLJ.exp_sign,"true","false");
Difference in query designer - Joined "Allocations" against MP instead of SLJ for the three fields that exist on both... allocation_id is only on SLJ. (Max_Priority and Stats_Loose_Join, respectively.)
Underlying queries/tables:
MP:
SELECT
Max(SLJ.Priority) AS MaxOfPriority,
SLJ.Exp_Sign,
SLJ.Invested,
SLJ.E_Delivery,
SLJ.Expense_Type,
SLJ.Proj_ID,
SLJ.Doc_ID,
SLJ.Doc_Typ_ID,
SLJ.Prod_ID,
SLJ.OrigP_Type,
SLJ.OrigP,
SLJ.Class_ID
FROM SLJ
GROUP BY
SLJ.Exp_Sign,
SLJ.Invested,
SLJ.E_Delivery,
SLJ.Expense_Type,
SLJ.Proj_ID,
SLJ.Doc_ID,
SLJ.Doc_Typ_ID,
SLJ.Prod_ID,
SLJ.OrigP_Type,
SLJ.OrigP,
SLJ.Class_ID;
SLJ - filtered (via where) to a test set of 4 records:
SELECT
KS.Alloc_ID,
KS.Priority,
KS.Exp_Sign,
StatsT.*
FROM
KS,
StatsT
WHERE
(
(
(KS.Proj_ID) Is Null
Or (KS.Proj_ID)=[StatsT].[proj_id]
)
AND (
(KS.Doc_ID) Is Null
Or (KS.Doc_ID)=[StatsT].[doc_id]
)
AND (
(KS.Doc_Typ_ID) Is Null
Or (KS.Doc_Typ_ID)=[StatsT].[doc_typ_id]
)
AND (
(KS.Prod_ID) Is Null
Or (KS.Prod_ID)=[StatsT].[prod_id]
)
AND (
(KS.OrigP_Type) Is Null
Or (KS.OrigP_Type)=[StatsT].[OrigP_Type]
)
AND (
(KS.OrigP) Is Null
Or (KS.OrigP)=[StatsT].[OrigP]
)
AND (
(KS.Class_ID) Is Null
Or (KS.Class_ID)=[StatsT].[class_id]
)
AND ((StatsT.Doc_ID)=10437)
AND ((StatsT.Prod_ID)=104));
SLJ results:
Alloc_ID Priority Exp_Sign Invested E_Delivery Expense_Type Proj_ID Doc_ID Doc_Typ_ID Prod_ID OrigP_Type OrigP Class_ID Total
1 10 Gross Y N Inforce 2013_199 10437 5 104 Fund_Doc 40 -1 1
1 10 Gross Y N Inforce 2013_199 10437 5 104 Fund_Doc 43 -1 1
-1 10 Reimb Y N Inforce 2013_199 10437 5 104 Fund_Doc 40 -1 1
-1 10 Reimb Y N Inforce 2013_199 10437 5 104 Fund_Doc 43 -1 1
StatsT:
SELECT
E2_In__Bound_Stats__Long.Invested,
E2_In__Bound_Stats__Long.E_Delivery,
E2_In__Bound_Stats__Long.Expense_Type,
E2_In__Bound_Stats__Long.Proj_ID,
E2_In__Bound_Stats__Long.Doc_ID,
E2_In__Docs_List__Doc.Doc_Typ_ID,
E2_In__Bound_Stats__Long.Prod_ID,
E2_In__Bound_Stats__Long.OrigP_Type,
E2_In__Bound_Stats__Long.OrigP,
E2_In__Bound_Stats__Long.Class_ID,
E2_In__Bound_Stats__Long.Total
FROM
E2_In__Bound_Stats__Long
INNER JOIN
E2_In__Docs_List__Doc
ON
E2_In__Bound_Stats__Long.Doc_ID = E2_In__Docs_List__Doc.Doc_ID;
KS table (Kitchen_Sink) - 119 Rows - null entries prevent a primary key:
Alloc_ID Priority Exp_Sign Proj_ID Doc_ID Doc_Typ_ID Prod_ID OrigP_Type OrigP Class_ID
Docs_List table - 12465 rows - Primary Key is Doc_ID:
Doc_ID Doc_Typ_ID
Bound_Stats table - 1289 rows (or 100k or so), only 4 survive filter on SLJ - Primary Key originally not set, should be everything except Total, setting did not remove bug:
Invested E_Delivery Expense_Type Proj_ID Doc_ID Prod_ID OrigP_Type OrigP Class_ID Total
Access query plans:
Bad plan:
--- Matrix_Math_AAA ---
Inputs to Query
Table 'KS'
Table 'E2_In__Bound_Stats__Long'
Table 'E2_In__Docs_List__Doc'
Using index 'PrimaryKey'
Having Indexes:
PrimaryKey 12465 entries, 22 pages, 12465 values
which has 1 column, fixed, unique, primary-key, no-nulls
Doc_Type_ID1 12465 entries, 15 pages, 14 values
which has 1 column, fixed
Doc_ID1 12465 entries, 22 pages, 12465 values
which has 1 column, fixed
Table 'KS'
Table 'E2_In__Bound_Stats__Long'
Table 'E2_In__Docs_List__Doc'
Using index 'PrimaryKey'
Having Indexes:
PrimaryKey 12465 entries, 22 pages, 12465 values
which has 1 column, fixed, unique, primary-key, no-nulls
Doc_Type_ID1 12465 entries, 15 pages, 14 values
which has 1 column, fixed
Doc_ID1 12465 entries, 22 pages, 12465 values
which has 1 column, fixed
Table 'Allocations'
End inputs to Query
- Restrict rows of table E2_In__Bound_Stats__Long by scanning testing expression "StatsT.Prod_ID=104 And StatsT.Doc_ID=10437"
- Inner Join result of '01)' to table 'E2_In__Docs_List__Doc' using index 'E2_In__Docs_List__Doc!PrimaryKey' join expression "E2_In__Bound_Stats__Long.Doc_ID=E2_In__Docs_List__Doc.Doc_ID"
- Sort table 'Allocations'
- Inner Join result of '02)' to result of '03)' using temporary index join expression "SLJ.Expense_Type=Allocations.Expense_Type And SLJ.E_Delivery=Allocations.E_Delivery And SLJ.Invested=Allocations.Invested"
- Sort result of '04)'
- Inner Join table 'KS' to result of '05)' using temporary index join expression "SLJ.Alloc_ID=Allocations.Allocation_ID" then test expression "(KS.Class_ID Is Null Or KS.Class_ID=StatsT.class_id) And ((KS.OrigP Is Null Or KS.OrigP=StatsT.OrigP) And ((KS.OrigP_Type Is Null Or KS.OrigP_Type=StatsT.OrigP_Type) And ((KS.Prod_ID Is Null Or KS.Prod_ID=StatsT.prod_id) And ((KS.Doc_Typ_ID Is Null Or KS.Doc_Typ_ID=StatsT.doc_typ_id) And ((KS.Doc_I"
- Restrict rows of table E2_In__Bound_Stats__Long by scanning testing expression "StatsT.Prod_ID=104 And StatsT.Doc_ID=10437"
- Inner Join result of '07)' to table 'E2_In__Docs_List__Doc' using index 'E2_In__Docs_List__Doc!PrimaryKey' join expression "E2_In__Bound_Stats__Long.Doc_ID=E2_In__Docs_List__Doc.Doc_ID" store result in temporary table
- Inner Join table 'KS' to result of '08)' using X-Prod join then test expression "(KS.Class_ID Is Null Or KS.Class_ID=StatsT.class_id) And ((KS.OrigP Is Null Or KS.OrigP=StatsT.OrigP) And ((KS.OrigP_Type Is Null Or KS.OrigP_Type=StatsT.OrigP_Type) And ((KS.Prod_ID Is Null Or KS.Prod_ID=StatsT.prod_id) And ((KS.Doc_Typ_ID Is Null Or KS.Doc_Typ_ID=StatsT.doc_typ_id) And ((KS.Doc_I"
- Group result of '09)'
- Inner Join result of '06)' to result of '10)' using temporary index join expression "SLJ.Exp_Sign=MP.Exp_Sign And SLJ.Invested=MP.Invested And SLJ.E_Delivery=MP.E_Delivery And SLJ.Expense_Type=MP.Expense_Type And SLJ.Proj_ID=MP.Proj_ID And SLJ.Doc_ID=MP.Doc_ID And SLJ.Doc_Typ_ID=MP.Doc_Typ_ID And SLJ.Prod_ID=MP.Prod_ID And SLJ.OrigP_Type=MP.OrigP_Type" then test expression "MP.Class_ID=SLJ.Class_ID And (MP.OrigP=SLJ.OrigP And MP.MaxOfPriority=SLJ.Priority)"
- Group result of '11)'
Good Plan:
--- Matrix_Math_AAA_Good ---
Inputs to Query
Table 'KS'
Table 'E2_In__Bound_Stats__Long'
Table 'E2_In__Docs_List__Doc'
Using index 'PrimaryKey'
Having Indexes:
PrimaryKey 12465 entries, 22 pages, 12465 values
which has 1 column, fixed, unique, primary-key, no-nulls
Doc_Type_ID1 12465 entries, 15 pages, 14 values
which has 1 column, fixed
Doc_ID1 12465 entries, 22 pages, 12465 values
which has 1 column, fixed
Table 'KS'
Table 'E2_In__Bound_Stats__Long'
Table 'E2_In__Docs_List__Doc'
Using index 'PrimaryKey'
Having Indexes:
PrimaryKey 12465 entries, 22 pages, 12465 values
which has 1 column, fixed, unique, primary-key, no-nulls
Doc_Type_ID1 12465 entries, 15 pages, 14 values
which has 1 column, fixed
Doc_ID1 12465 entries, 22 pages, 12465 values
which has 1 column, fixed
Table 'Allocations'
Using index 'PrimaryKey'
Having Indexes:
PrimaryKey 312 entries, 4 pages, 312 values
which has 5 columns, fixed, unique, primary-key, no-nulls
Allocation_ID 312 entries, 1 page, 6 values
which has 1 column, fixed
End inputs to Query
- Restrict rows of table E2_In__Bound_Stats__Long by scanning testing expression "StatsT.Prod_ID=104 And StatsT.Doc_ID=10437"
- Restrict rows of table E2_In__Bound_Stats__Long by scanning testing expression "StatsT.Prod_ID=104 And StatsT.Doc_ID=10437"
- Inner Join result of '02)' to table 'E2_In__Docs_List__Doc' using index 'E2_In__Docs_List__Doc!PrimaryKey' join expression "E2_In__Bound_Stats__Long.Doc_ID=E2_In__Docs_List__Doc.Doc_ID" store result in temporary table
- Inner Join table 'KS' to result of '03)' using X-Prod join then test expression "(KS.Class_ID Is Null Or KS.Class_ID=StatsT.class_id) And ((KS.OrigP Is Null Or KS.OrigP=StatsT.OrigP) And ((KS.OrigP_Type Is Null Or KS.OrigP_Type=StatsT.OrigP_Type) And ((KS.Prod_ID Is Null Or KS.Prod_ID=StatsT.prod_id) And ((KS.Doc_Typ_ID Is Null Or KS.Doc_Typ_ID=StatsT.doc_typ_id) And ((KS.Doc_I"
- Group result of '04)'
- Inner Join table 'KS' to result of '05)' using temporary index join expression "SLJ.Exp_Sign=MP.Exp_Sign" then test expression "MP.MaxOfPriority=SLJ.Priority"
- Sort result of '06)'
- Inner Join result of '01)' to result of '07)' using temporary index join expression "SLJ.E_Delivery=MP.E_Delivery And SLJ.Expense_Type=MP.Expense_Type And SLJ.Proj_ID=MP.Proj_ID And SLJ.Doc_ID=MP.Doc_ID And SLJ.Prod_ID=MP.Prod_ID And SLJ.OrigP_Type=MP.OrigP_Type And SLJ.Class_ID=MP.Class_ID And SLJ.OrigP=MP.OrigP And SLJ.Invested=MP.Invested" then test expression "(KS.Class_ID Is Null Or KS.Class_ID=StatsT.class_id) And ((KS.OrigP Is Null Or KS.OrigP=StatsT.OrigP) And ((KS.OrigP_Type Is Null Or KS.OrigP_Type=StatsT.OrigP_Type) And ((KS.Prod_ID Is Null Or KS.Prod_ID=StatsT.prod_id) And ((KS.Doc_ID Is Null Or KS.Doc_ID=StatsT.doc_id) And (KS.Proj_ID Is Null Or"
- Inner Join result of '08)' to table 'Allocations' using index 'Allocations!PrimaryKey' join expression "SLJ.Alloc_ID=Allocations.Allocation_ID And MP.Invested=Allocations.Invested And MP.E_Delivery=Allocations.E_Delivery And MP.Expense_Type=Allocations.Expense_Type"
- Inner Join result of '09)' to table 'E2_In__Docs_List__Doc' using index 'E2_In__Docs_List__Doc!PrimaryKey' join expression "E2_In__Bound_Stats__Long.Doc_ID=E2_In__Docs_List__Doc.Doc_ID" then test expression "MP.Doc_Typ_ID=SLJ.Doc_Typ_ID And (KS.Doc_Typ_ID Is Null Or KS.Doc_Typ_ID=StatsT.doc_typ_id)"
- Group result of '10)'
End 2013-12-30 edit