2
votes

I'm using Power Query in Excel to reference a table within the same workbook. I want to select specific columns within that table. I know that can be accomplished by loading the table into Power Query and then choosing the columns I want to use. The resulting query is:

let
    Source = Excel.CurrentWorkbook(){[Name="Legend_Data_Merged"]}[Content],
    #"Removed Other Columns" = Table.SelectColumns(
        Source,
        {
            "Observation number",
            "First Sales Offer - Products",
            "Middle Sales Offer(s) - Products",
            "Last Sales Offer - Products"
        }
    )
in
    #"Removed Other Columns"`

So, here's my question/issue:

I think this way is first pulling the entire table into Power Query, then stripping down from there. What I want to do is define the source table as the "Legend_Data_Merged" table, but choose which columns to pull from that table in the same operation. This way, it never has to load the entire table into Power Query. The reason is the table itself is about 120 columns long, and I only need three columns, and I have about 20 of these similar queries and it's starting to hog memory. Am I wrong in my logic here? And if not, anyone have an idea on what the query would be?

Could there maybe be a way to define the columns in the [content] part of the source operation ?

Thanks.

3

3 Answers

1
votes

It may be a very simple attempt, but why not add a Worksheet "DataTransfer" where you set only references to the columns you need and read this small table with power query ?

If your columns are close together you could also set a named range and read only this range with powerquery.

But anyway, when the workbook is open, your big table is already in memory. There should not be much memory allocation, when reading the table with powerquery and selecting the three columns.

0
votes

It's possible there's some problem in Excel or Power Query. How much memory are you seeing used by the excel.exe and Microsoft.Mashup.Container.NetFX40.exe process?

The only way to directly remove the columns from [Content] is to modify the actual data of the Excel table. You could try that to see if it makes a difference, but Power Query generally tries to be smart about only loading columns it needs.

If your query is using a lot of memory, you might get performance saving your data in a more efficient format (I'd try CSV). In any case, try turning off the "load to worksheet" and instead just load to data model.

0
votes

You can refer to my question and answer here.

What you will want to do is use the Table.SelectColumns method instead of Remove.

let
    db = Sql.Databases("sqlserver.database.url"){[Name="DatabaseName"]}[Data],
    Sales_vDimCustomer = Table.SelectColumns(
        db{[Schema="Sales",Item="vDimCustomer"]}[Data],
        {
            "Name",
            "Representative",
            "Status",
            "DateLastModified",
            "UserLastModified",
            "ExtractionDate"
        }
    )
in
    Sales_vDimCustomer

When viewing the raw sql using Express Profiler it will be done in one statement where

SELECT
       $Table.Name,
       $Table.Representative,
       $Table.Status,
       $Table.DateLastModified, 
       $Table.UserLastModified,
       $Table.ExtractionDate
FROM
       Sales.vDimCustomer as $Table

PowerBi and Power Query will also now show an error/ warning message with this recommendation when trying to import a large number of columns.