0
votes

I'm all but convinced this is a flat out bug in MS Access's inner join. Which is really scary since it's what I'm supposed to be using for app dev. And no, it is not the intended behavior of an inner join - I've gotten 3 different people (counting myself) to confirm.

So the query is implementing the "greatest-n-of-group" via a max query to get the highest priority, then using that result as a filter back onto the original to retrieve the required non-group fields. (Side question - is "First" on an ordered table really stable enough to use to do it via a single pass?) Said non-grouped field is part of a key on a third table.

A particular configuration of this query drops half the records in a particular subset of my data (that I've zoomed in on). If I switch the joins of the third table to the "max filter" table instead of the base table, suddenly it works correctly. (The fields in question appear on all 3 tables, and are joined from the "base" to the "max filter" already.)

My group by does run over 11 fields. And swapping which copy of the joined fields does change my query execution plan around. But the version without a sort just doesn't seem to work. I have no idea how inner joins and group bys could interperate into an erroneous query plan, but I seem to have managed it.

I know there is no actual sql code [now posted], but it is spread over multiple queries and might require the data I've got as well. Is there a way to upload an actual .mdb file so people can play with the copy producing the bug? I'm new to posting, but I've yet to see a post with a .mdb on it, so I'm not sure how best to get the details needed for this one.

Or, if someone has an idea of what might be going on (unsorted joins have a 10 field limit?) that might cause this I'll take a further look on Monday. I've isolated memory unleaks in C before, so I'd like to think I know a thing or two, but this one is just plain befuddling - I really don't like to think I might have to implement inner join in vba because I can't trust Access (and can't get the green light on better tools).

[Further edit: I'm sole designer of this process, so for once it isn't someone else's code - mine is being the problem all on its own.]

2013-12-30 edit - adding the wall of SQL Wall of SQL:

Bad query returns only 8 of 16 results on the test subset:

SELECT 
    Allocations.Allocation_ID, 
    SLJ.Exp_Sign, 
    SLJ.Proj_ID,  
    SLJ.Doc_ID,  
    Allocations.Expense_Catagory,  
    SLJ.Prod_ID,  
    SLJ.OrigP_Type,  
    SLJ.OrigP,  
    SLJ.Class_ID,  
    Sum(SLJ.Total*[share]) AS Resposible_Doc_Count,  
    MP.MaxOfPriority,  
    SLJ.Priority,  
    MP.OrigP,  
    MP.Exp_Sign,  
    IIf(MP.exp_sign=SLJ.exp_sign,"true","false") AS What  
FROM  
    (MP INNER JOIN SLJ  
        ON (MP.Exp_Sign = SLJ.Exp_Sign)  
        AND (MP.E_Delivery = SLJ.E_Delivery)  
        AND (MP.Expense_Type = SLJ.Expense_Type)  
        AND (MP.Proj_ID = SLJ.Proj_ID)  
        AND (MP.Doc_ID = SLJ.Doc_ID)  
        AND (MP.Doc_Typ_ID = SLJ.Doc_Typ_ID)  
        AND (MP.Prod_ID = SLJ.Prod_ID)  
        AND (MP.OrigP_Type = SLJ.OrigP_Type)  
        AND (MP.Class_ID = SLJ.Class_ID)  
        AND (MP.OrigP = SLJ.OrigP)  
        AND (MP.MaxOfPriority = SLJ.Priority)  
        AND (MP.Invested = SLJ.Invested))  
    INNER JOIN Allocations  
        ON (SLJ.Alloc_ID = Allocations.Allocation_ID)   
        AND (SLJ.Invested = Allocations.Invested)   
        AND (SLJ.E_Delivery = Allocations.E_Delivery)   
        AND (SLJ.Expense_Type = Allocations.Expense_Type)  
GROUP BY   
    Allocations.Allocation_ID,    
    SLJ.Exp_Sign,    
    SLJ.Proj_ID,    
    SLJ.Doc_ID,    
    Allocations.Expense_Catagory,    
    SLJ.Prod_ID,    
    SLJ.OrigP_Type,    
    SLJ.OrigP,    
    SLJ.Class_ID,    
    MP.MaxOfPriority,    
    SLJ.Priority,    
    MP.OrigP,    
    MP.Exp_Sign,    
    IIf(MP.exp_sign=SLJ.exp_sign,"true","false");

Good query returns all 16 results:

SELECT Allocations.Allocation_ID, 
    SLJ.Exp_Sign,  
    SLJ.Proj_ID,  
    SLJ.Doc_ID,  
    Allocations.Expense_Catagory,  
    SLJ.Prod_ID,  
    SLJ.OrigP_Type,  
    SLJ.OrigP,  
    SLJ.Class_ID,  
    Sum(SLJ.Total*[share]) AS Resposible_Doc_Count,  
    MP.MaxOfPriority,  
    SLJ.Priority,  
    MP.OrigP,  
    MP.Exp_Sign,  
    IIf(MP.exp_sign=SLJ.exp_sign,"true","false") AS What  
FROM  
    (MP INNER JOIN SLJ  
        ON (MP.Invested = SLJ.Invested)   
        AND (MP.MaxOfPriority = SLJ.Priority)    
        AND (MP.OrigP = SLJ.OrigP)    
        AND (MP.Class_ID = SLJ.Class_ID)    
        AND (MP.OrigP_Type = SLJ.OrigP_Type)    
        AND (MP.Prod_ID = SLJ.Prod_ID)    
        AND (MP.Doc_Typ_ID = SLJ.Doc_Typ_ID)    
        AND (MP.Doc_ID = SLJ.Doc_ID)    
        AND (MP.Proj_ID = SLJ.Proj_ID)    
        AND (MP.Expense_Type = SLJ.Expense_Type)    
        AND (MP.E_Delivery = SLJ.E_Delivery)    
        AND (MP.Exp_Sign = SLJ.Exp_Sign))    
    INNER JOIN Allocations    
        ON (Allocations.Expense_Type = MP.Expense_Type)    
        AND (Allocations.E_Delivery = MP.E_Delivery)    
        AND (Allocations.Invested = MP.Invested)    
        AND (SLJ.Alloc_ID = Allocations.Allocation_ID)  
GROUP BY Allocations.Allocation_ID,    
    SLJ.Exp_Sign,     
    SLJ.Proj_ID,     
    SLJ.Doc_ID,     
    Allocations.Expense_Catagory,     
    SLJ.Prod_ID,     
    SLJ.OrigP_Type,     
    SLJ.OrigP,     
    SLJ.Class_ID,     
    MP.MaxOfPriority,     
    SLJ.Priority,     
    MP.OrigP,     
    MP.Exp_Sign,     
    IIf(MP.exp_sign=SLJ.exp_sign,"true","false");

Difference in query designer - Joined "Allocations" against MP instead of SLJ for the three fields that exist on both... allocation_id is only on SLJ. (Max_Priority and Stats_Loose_Join, respectively.)

Underlying queries/tables:

MP:

SELECT 
        Max(SLJ.Priority) AS MaxOfPriority,  
        SLJ.Exp_Sign,  
        SLJ.Invested,  
        SLJ.E_Delivery,  
        SLJ.Expense_Type,  
        SLJ.Proj_ID,  
        SLJ.Doc_ID,  
        SLJ.Doc_Typ_ID,  
        SLJ.Prod_ID,  
        SLJ.OrigP_Type,  
        SLJ.OrigP,  
        SLJ.Class_ID  
FROM SLJ  
GROUP BY  
        SLJ.Exp_Sign,  
        SLJ.Invested,  
        SLJ.E_Delivery,  
        SLJ.Expense_Type,  
        SLJ.Proj_ID,  
        SLJ.Doc_ID,  
        SLJ.Doc_Typ_ID,  
        SLJ.Prod_ID,  
        SLJ.OrigP_Type,  
        SLJ.OrigP,  
        SLJ.Class_ID;

SLJ - filtered (via where) to a test set of 4 records:

SELECT  
        KS.Alloc_ID,  
        KS.Priority,  
        KS.Exp_Sign,  
        StatsT.*  
FROM  
        KS,  
        StatsT  
WHERE  
        ( 
            ( 
                (KS.Proj_ID) Is Null  
                Or (KS.Proj_ID)=[StatsT].[proj_id] 
            )  
                AND ( 
                        (KS.Doc_ID) Is Null  
                        Or (KS.Doc_ID)=[StatsT].[doc_id] 
                    )  
                AND ( 
                        (KS.Doc_Typ_ID) Is Null  
                        Or (KS.Doc_Typ_ID)=[StatsT].[doc_typ_id] 
                    )  
                AND (
                        (KS.Prod_ID) Is Null  
                        Or (KS.Prod_ID)=[StatsT].[prod_id] 
                    )  
                AND ( 
                        (KS.OrigP_Type) Is Null  
                        Or (KS.OrigP_Type)=[StatsT].[OrigP_Type] 
                    )  
                AND ( 
                        (KS.OrigP) Is Null  
                        Or (KS.OrigP)=[StatsT].[OrigP] 
                    )  
                AND ( 
                        (KS.Class_ID) Is Null  
                        Or (KS.Class_ID)=[StatsT].[class_id] 
                    )  
                AND ((StatsT.Doc_ID)=10437)  
                AND ((StatsT.Prod_ID)=104));

SLJ results:

Alloc_ID  Priority  Exp_Sign  Invested  E_Delivery  Expense_Type  Proj_ID   Doc_ID  Doc_Typ_ID  Prod_ID OrigP_Type  OrigP   Class_ID    Total  
1         10    Gross     Y     N           Inforce       2013_199  10437   5   104 Fund_Doc    40  -1  1  
1         10    Gross     Y     N           Inforce       2013_199  10437   5   104 Fund_Doc    43  -1  1  
-1        10    Reimb     Y     N           Inforce       2013_199  10437   5   104 Fund_Doc    40  -1  1  
-1        10    Reimb     Y     N           Inforce       2013_199  10437   5   104 Fund_Doc    43  -1  1

StatsT:

SELECT 
    E2_In__Bound_Stats__Long.Invested,  
    E2_In__Bound_Stats__Long.E_Delivery,  
    E2_In__Bound_Stats__Long.Expense_Type,  
    E2_In__Bound_Stats__Long.Proj_ID,  
    E2_In__Bound_Stats__Long.Doc_ID,  
    E2_In__Docs_List__Doc.Doc_Typ_ID,  
    E2_In__Bound_Stats__Long.Prod_ID,  
    E2_In__Bound_Stats__Long.OrigP_Type,  
    E2_In__Bound_Stats__Long.OrigP,  
    E2_In__Bound_Stats__Long.Class_ID,  
    E2_In__Bound_Stats__Long.Total  
FROM  
    E2_In__Bound_Stats__Long  
INNER JOIN  
    E2_In__Docs_List__Doc  
ON  
    E2_In__Bound_Stats__Long.Doc_ID = E2_In__Docs_List__Doc.Doc_ID;

KS table (Kitchen_Sink) - 119 Rows - null entries prevent a primary key:

Alloc_ID    Priority    Exp_Sign    Proj_ID Doc_ID  Doc_Typ_ID  Prod_ID OrigP_Type  OrigP   Class_ID

Docs_List table - 12465 rows - Primary Key is Doc_ID:

Doc_ID  Doc_Typ_ID

Bound_Stats table - 1289 rows (or 100k or so), only 4 survive filter on SLJ - Primary Key originally not set, should be everything except Total, setting did not remove bug:

Invested    E_Delivery  Expense_Type    Proj_ID Doc_ID  Prod_ID OrigP_Type  OrigP   Class_ID    Total

Access query plans:

Bad plan:
--- Matrix_Math_AAA ---

Inputs to Query

Table 'KS'  
Table 'E2_In__Bound_Stats__Long'  
Table 'E2_In__Docs_List__Doc'  
    Using index 'PrimaryKey'  
    Having Indexes:  
    PrimaryKey 12465 entries, 22 pages, 12465 values  
      which has 1 column, fixed, unique, primary-key, no-nulls  
    Doc_Type_ID1 12465 entries, 15 pages, 14 values  
      which has 1 column, fixed  
    Doc_ID1 12465 entries, 22 pages, 12465 values  
      which has 1 column, fixed  
Table 'KS'  
Table 'E2_In__Bound_Stats__Long'  
Table 'E2_In__Docs_List__Doc'  
    Using index 'PrimaryKey'  
    Having Indexes:  
    PrimaryKey 12465 entries, 22 pages, 12465 values  
      which has 1 column, fixed, unique, primary-key, no-nulls  
    Doc_Type_ID1 12465 entries, 15 pages, 14 values  
      which has 1 column, fixed  
    Doc_ID1 12465 entries, 22 pages, 12465 values  
      which has 1 column, fixed  
Table 'Allocations'

End inputs to Query

  1. Restrict rows of table E2_In__Bound_Stats__Long by scanning testing expression "StatsT.Prod_ID=104 And StatsT.Doc_ID=10437"
  2. Inner Join result of '01)' to table 'E2_In__Docs_List__Doc' using index 'E2_In__Docs_List__Doc!PrimaryKey' join expression "E2_In__Bound_Stats__Long.Doc_ID=E2_In__Docs_List__Doc.Doc_ID"
  3. Sort table 'Allocations'
  4. Inner Join result of '02)' to result of '03)' using temporary index join expression "SLJ.Expense_Type=Allocations.Expense_Type And SLJ.E_Delivery=Allocations.E_Delivery And SLJ.Invested=Allocations.Invested"
  5. Sort result of '04)'
  6. Inner Join table 'KS' to result of '05)' using temporary index join expression "SLJ.Alloc_ID=Allocations.Allocation_ID" then test expression "(KS.Class_ID Is Null Or KS.Class_ID=StatsT.class_id) And ((KS.OrigP Is Null Or KS.OrigP=StatsT.OrigP) And ((KS.OrigP_Type Is Null Or KS.OrigP_Type=StatsT.OrigP_Type) And ((KS.Prod_ID Is Null Or KS.Prod_ID=StatsT.prod_id) And ((KS.Doc_Typ_ID Is Null Or KS.Doc_Typ_ID=StatsT.doc_typ_id) And ((KS.Doc_I"
  7. Restrict rows of table E2_In__Bound_Stats__Long by scanning testing expression "StatsT.Prod_ID=104 And StatsT.Doc_ID=10437"
  8. Inner Join result of '07)' to table 'E2_In__Docs_List__Doc' using index 'E2_In__Docs_List__Doc!PrimaryKey' join expression "E2_In__Bound_Stats__Long.Doc_ID=E2_In__Docs_List__Doc.Doc_ID" store result in temporary table
  9. Inner Join table 'KS' to result of '08)' using X-Prod join then test expression "(KS.Class_ID Is Null Or KS.Class_ID=StatsT.class_id) And ((KS.OrigP Is Null Or KS.OrigP=StatsT.OrigP) And ((KS.OrigP_Type Is Null Or KS.OrigP_Type=StatsT.OrigP_Type) And ((KS.Prod_ID Is Null Or KS.Prod_ID=StatsT.prod_id) And ((KS.Doc_Typ_ID Is Null Or KS.Doc_Typ_ID=StatsT.doc_typ_id) And ((KS.Doc_I"
  10. Group result of '09)'
  11. Inner Join result of '06)' to result of '10)' using temporary index join expression "SLJ.Exp_Sign=MP.Exp_Sign And SLJ.Invested=MP.Invested And SLJ.E_Delivery=MP.E_Delivery And SLJ.Expense_Type=MP.Expense_Type And SLJ.Proj_ID=MP.Proj_ID And SLJ.Doc_ID=MP.Doc_ID And SLJ.Doc_Typ_ID=MP.Doc_Typ_ID And SLJ.Prod_ID=MP.Prod_ID And SLJ.OrigP_Type=MP.OrigP_Type" then test expression "MP.Class_ID=SLJ.Class_ID And (MP.OrigP=SLJ.OrigP And MP.MaxOfPriority=SLJ.Priority)"
  12. Group result of '11)'

Good Plan:
--- Matrix_Math_AAA_Good ---

Inputs to Query

Table 'KS'  
Table 'E2_In__Bound_Stats__Long'  
Table 'E2_In__Docs_List__Doc'  
    Using index 'PrimaryKey'  
    Having Indexes:  
    PrimaryKey 12465 entries, 22 pages, 12465 values  
      which has 1 column, fixed, unique, primary-key, no-nulls  
    Doc_Type_ID1 12465 entries, 15 pages, 14 values  
      which has 1 column, fixed  
    Doc_ID1 12465 entries, 22 pages, 12465 values  
      which has 1 column, fixed  
Table 'KS'  
Table 'E2_In__Bound_Stats__Long'  
Table 'E2_In__Docs_List__Doc'  
    Using index 'PrimaryKey'  
    Having Indexes:  
    PrimaryKey 12465 entries, 22 pages, 12465 values  
      which has 1 column, fixed, unique, primary-key, no-nulls  
    Doc_Type_ID1 12465 entries, 15 pages, 14 values  
      which has 1 column, fixed  
    Doc_ID1 12465 entries, 22 pages, 12465 values  
      which has 1 column, fixed  
Table 'Allocations'  
    Using index 'PrimaryKey'  
    Having Indexes:  
    PrimaryKey 312 entries, 4 pages, 312 values  
      which has 5 columns, fixed, unique, primary-key, no-nulls  
    Allocation_ID 312 entries, 1 page, 6 values  
      which has 1 column, fixed

End inputs to Query

  1. Restrict rows of table E2_In__Bound_Stats__Long by scanning testing expression "StatsT.Prod_ID=104 And StatsT.Doc_ID=10437"
  2. Restrict rows of table E2_In__Bound_Stats__Long by scanning testing expression "StatsT.Prod_ID=104 And StatsT.Doc_ID=10437"
  3. Inner Join result of '02)' to table 'E2_In__Docs_List__Doc' using index 'E2_In__Docs_List__Doc!PrimaryKey' join expression "E2_In__Bound_Stats__Long.Doc_ID=E2_In__Docs_List__Doc.Doc_ID" store result in temporary table
  4. Inner Join table 'KS' to result of '03)' using X-Prod join then test expression "(KS.Class_ID Is Null Or KS.Class_ID=StatsT.class_id) And ((KS.OrigP Is Null Or KS.OrigP=StatsT.OrigP) And ((KS.OrigP_Type Is Null Or KS.OrigP_Type=StatsT.OrigP_Type) And ((KS.Prod_ID Is Null Or KS.Prod_ID=StatsT.prod_id) And ((KS.Doc_Typ_ID Is Null Or KS.Doc_Typ_ID=StatsT.doc_typ_id) And ((KS.Doc_I"
  5. Group result of '04)'
  6. Inner Join table 'KS' to result of '05)' using temporary index join expression "SLJ.Exp_Sign=MP.Exp_Sign" then test expression "MP.MaxOfPriority=SLJ.Priority"
  7. Sort result of '06)'
  8. Inner Join result of '01)' to result of '07)' using temporary index join expression "SLJ.E_Delivery=MP.E_Delivery And SLJ.Expense_Type=MP.Expense_Type And SLJ.Proj_ID=MP.Proj_ID And SLJ.Doc_ID=MP.Doc_ID And SLJ.Prod_ID=MP.Prod_ID And SLJ.OrigP_Type=MP.OrigP_Type And SLJ.Class_ID=MP.Class_ID And SLJ.OrigP=MP.OrigP And SLJ.Invested=MP.Invested" then test expression "(KS.Class_ID Is Null Or KS.Class_ID=StatsT.class_id) And ((KS.OrigP Is Null Or KS.OrigP=StatsT.OrigP) And ((KS.OrigP_Type Is Null Or KS.OrigP_Type=StatsT.OrigP_Type) And ((KS.Prod_ID Is Null Or KS.Prod_ID=StatsT.prod_id) And ((KS.Doc_ID Is Null Or KS.Doc_ID=StatsT.doc_id) And (KS.Proj_ID Is Null Or"
  9. Inner Join result of '08)' to table 'Allocations' using index 'Allocations!PrimaryKey' join expression "SLJ.Alloc_ID=Allocations.Allocation_ID And MP.Invested=Allocations.Invested And MP.E_Delivery=Allocations.E_Delivery And MP.Expense_Type=Allocations.Expense_Type"
  10. Inner Join result of '09)' to table 'E2_In__Docs_List__Doc' using index 'E2_In__Docs_List__Doc!PrimaryKey' join expression "E2_In__Bound_Stats__Long.Doc_ID=E2_In__Docs_List__Doc.Doc_ID" then test expression "MP.Doc_Typ_ID=SLJ.Doc_Typ_ID And (KS.Doc_Typ_ID Is Null Or KS.Doc_Typ_ID=StatsT.doc_typ_id)"
  11. Group result of '10)'

End 2013-12-30 edit

1
Of course there is SQL code, and it's likely that code that is causing the problem -- especially since you weren't the one who made it! With the query you can switch into Design View, which has a SQL view. Please post the code of the queries involved so we can advise better. Access doesn't tend to implement complicated queries well by itself, you should be writing the SQL yourself. - serakfalcon
It is a chain of 4 (or so) queries built up off of two tables. Since access is so cranky about real sql (and I use longish field names) I normally use the query designer, but I have edited the sql directly too. I was trying to say (and will clarify in the OP) I have not posted the sql code because it is so long and was hoping for a better way to transmit that info vs. just a wall of sql. - Et314159
OK yeah, I know the code may be longish but it's hard to really see what's going on otherwise. I've done absurdly convoluted queries in access and there usually is a way to get it to work out in the end, a lot of pain can happen because the joins aren't joining the way you think they are. - serakfalcon
Re: "Is there a way to upload an actual .mdb file so people can play with the copy producing the bug?" - Yes, you could upload it to a file sharing service like wikisend.com and then post a comment here with the link to the file. - Gord Thompson

1 Answers

0
votes

Proposed answer - can anyone confirm this makes sense?

The bound_stats table has a 11 field primary key. The "loose" join brings the conceptual key on SLJ up to 12 fields. MP being a group by to get a max for flitering (greatest-n-of-group) is has the 11 field key but must also join back onto SLJ on the priority (the greatest n). Thus there is a 12 field join, of which the 11th field happens to have some distingushing data (erroneously, I think).

In the case of the bad query plan, step 11 is:
11) Inner Join result of '06)' to result of '10)' using temporary index

I think this temporary index suffers from the access limitation of indecies to 10 fields. Thus fields after the first 10 are producing incorrect results. The good query plan makes use of a sort, which while maybe slower is presumably bypassing the 10 field limit on the index (or is just lucky).

Solution: redesign system to avoid keys greater then length 8.
Possible alternative soution: use explicit sort + "first" access function to impliment the single-pass solution to greatest-n-per-group. Depends on "order by" avoiding the 10 field limit - I do not know if that is true.

So - did I actually find the solution, or am I distracted by something irrelivant? Sure would be nice if access had thrown an error on the join instead of executing it incorrectly.