
I have an Excel file connected to an Access database. I have created a query through Power Query that simply brings the target table into the file and does a couple of minor things to it. I don't load this to a worksheet but maintain a connection only.

I then have a number of other queries linking to the table created in the first query.

In one of these linked queries, I apply a variety of filters to exclude certain products, customers and so on. This reduces the 400,000 records in the original table in the first query down to around 227,000 records. I then load this table to a worksheet to do some analysis.

Finally, I have a couple of queries looking at the 227,000-record table. However, I notice that when I refresh these queries and watch the progress in the right-hand pane, they still go through 400,000 records, as if they are looking through to the original table.

Is there any way to stop this happening? I would expect that to speed up queries that refer to datasets which have themselves already been filtered.

Alternatively is there a better way to do what I’m doing?

Thanks

It's probably doing query folding behind the scenes; you may or may not be able to improve performance using Table.Buffer. These may be useful to you: Link 1, Link 2. – Alexis Olson
Thank you! I'll have a look tomorrow and see if Table.Buffer improves matters. – mickeyt
Hi. I tried this and I think in another circumstance it would maybe work, but it ground my computer to a halt in this case. I suspect my performance issue is another query I'm running, which I will post about in a separate question. But I think Table.Buffer is the answer. Thank you. – mickeyt
Bear in mind that buffering means that you have to do computations locally on your machine rather than pushing them back to the server, so in many cases it's not a good idea. – Alexis Olson
Hi. In my case I have the Access database saved on my computer too. Basically, if I filter out some information to reduce the data by almost half, I don't understand why dependent queries then need to go all the way back to the database. To me it conceptually defeats the point of doing the filtering. – mickeyt
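
For reference, a minimal sketch of the Table.Buffer idea discussed above ("FilteredProducts" is a placeholder for the filtered staging query's name):

    let
        // The filtered staging query that downstream queries reference
        Source = FilteredProducts,
        // Table.Buffer pulls the filtered rows into memory for this
        // evaluation; it also stops the steps above from being folded
        // back to the Access source
        Buffered = Table.Buffer(Source)
    in
        Buffered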

1 Answer


First: How are you refreshing your queries? If you execute them one at a time then yes, they're all independent. However, when using Excel 2016 on a workbook where "Fast Data Load" is disabled on all queries, I've found that a Refresh All does cache and share query results with downstream queries!

Failing that, you could try the following:

  1. Move the query that makes the 227,000-row table into its own group called "Refresh First"
  2. Place your cursor in your 227,000-row table and click Data - Get & Transform - From Table to create a new staging query that reads from that worksheet table (the generated query is sketched after this list).
  3. Change all of your queries to pull from this new query rather than the source.
  4. Create another group called "Refresh Second" that contains every query that
    • is downstream of the query you created in step 2, and
    • loads data to the workbook
  5. Move any remaining queries that load to the workbook into "Refresh First", "Refresh Second", or some other group. (By the way: I usually also have a "Connections" group that holds every query that doesn't load data to the workbook.)
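
For reference, the staging query generated in step 2 typically looks something like this ("FilteredTable" stands in for whatever Excel named your 227,000-row worksheet table):

    let
        // Read the already-filtered rows straight from the worksheet
        // table instead of reaching back through the original queries
        Source = Excel.CurrentWorkbook(){[Name="FilteredTable"]}[Content]
    in
        Source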

Unfortunately, once you do this, "Refresh All" would have to be done twice to ensure all source changes are fully propagated, because those 227,000 rows will be used before they've been updated from the 400,000. If you're willing to put up with that and refresh manually, then you're all set: right-click and refresh the first group, wait for it to finish, then right-click and refresh the second one.

For a more idiot-proof way of refreshing... you could try automating it with VBA, but queries normally refresh in the background, so it will take some extra work to ensure that the second group of queries isn't started before every query in your "Refresh First" group has completed.
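
As a rough, untested sketch of that idea (Excel names Power Query connections "Query - <query name>"; the two names below are placeholders, and you'd repeat the pattern for each query in each group):

    Sub RefreshInTwoStages()
        ' Stage 1: refresh the staging query and block until it finishes
        With ThisWorkbook.Connections("Query - Staging").OLEDBConnection
            .BackgroundQuery = False   ' wait rather than refresh in the background
            .Refresh
        End With
        ' Stage 2: only now refresh the downstream query
        With ThisWorkbook.Connections("Query - Analysis").OLEDBConnection
            .BackgroundQuery = False
            .Refresh
        End With
    End Sub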

Or... I've learned to strike a balance between real-world fidelity and development speed by doing the following:

  1. Create a query called "ProductionMode" that returns true if you want full data, or false if you're just testing each query. This can just be a parameter if you like.
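
     A blank query works here; its entire definition can be a single literal (minimal sketch):

        // ProductionMode: set to false while developing, true for full data
        true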

  2. Create a query called "fModeSensitiveQuery" defined as

    let
        // This runs once, when the function itself is retrieved and cached, OUTSIDE of what happens each time the returned function is executed
        queryNameSuffix = if ProductionMode then
                ""
            else
                " Cached",
        // We can now use the pre-rendered queryNameSuffix value as a private variable that's not computed each time it's called
        returnedFunction = (queryName as text) as table => Expression.Evaluate(
            Expression.Identifier(
                queryName & queryNameSuffix
            ),
            #shared
        )
    in
        returnedFunction
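
     (For context: #shared is the built-in record of every name defined in the workbook, including all of your queries; that's what lets Expression.Evaluate resolve "YourQueryName Cached" from a text name.)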
    
  3. For each slow query ("YourQueryName") that loads to the table,

    1. Create "YourQueryName Cached" as a query that pulls straight from results the table.
    2. Create "modeYourQueryName" as a query defined as fModeSensitiveQuery("YourQueryName")
    3. Change all queries that use YourQueryName to use modeYourQueryName instead.
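
    Putting sub-steps 1 and 2 together for a hypothetical slow query named "YourQueryName" (assuming its results table in the worksheet carries the same name):

        // "YourQueryName Cached" -- the same From Table pattern as before,
        // reading the rows that "YourQueryName" last loaded to the worksheet
        let
            Source = Excel.CurrentWorkbook(){[Name="YourQueryName"]}[Content]
        in
            Source

        // "modeYourQueryName" -- the entire query is this one expression
        fModeSensitiveQuery("YourQueryName")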

Now you can flip ProductionMode to true and changes propagate completely, or flip ProductionMode to false and test small changes quickly; refreshing just one query no longer recomputes the entire upstream chain. Plus, I don't know why, but I'm pretty sure it also speeds up a Refresh All even when ProductionMode is true!

This method has three caveats that I'm aware of:

  • Be sure to update your "YourQueryName Cached" query any time the resulting columns of "YourQueryName" are added, removed, renamed, or retyped. Or better yet, delete and recreate both queries; you can do that safely because of the next point:
  • Power Query won't recognize your "YourQueryName" and "YourQueryName Cached" queries as dependencies of "modeYourQueryName". The Query Dependencies diagram won't be quite right, you'll be able to delete "YourQueryName" or "YourQueryName Cached" without Power Query stopping you, and renaming "YourQueryName" will break things instead of Power Query automatically updating all of your other queries accordingly.
  • While faster, the user experience is a rougher ride, too! The UI gets a little jerky because (and I'm totally guessing here) this technique seems to cause many more queries to finish simultaneously, flooding Excel with too many repaint requests at the same time. (This isn't a problem, really, but it sure looks like one when you aren't expecting it!)