0
votes

I'm having a csv file as a source dataset. Currently in the table there is a column that I would like to use Python to loop and extract data from string in each cell. For example, in a cell:

"Quantity changed by 10, Price changed by 90."

I would like to use Python and extract "Quantity, Price" and "10, 90" to create new table with those properties and values. Then use PowerBI Visual to create visuals instead of using Python visual. How should I do that? And is that possible at all?

Edit: Due to all the confusions, I'm adding a screenshot of the column I'm working with.

enter image description here

I would like to go through all rows in the Properties column, get data in each cell, then extract them to create a new table. For example, in this case, the new table will be like:

Properties | Value Unconnected Height | -2800 Area | -39.883

Regarding code, I have nothing yet, just the base Python snippet PowerBI created:

# The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script: 

# dataset = pandas.DataFrame(Properties)
# dataset = dataset.drop_duplicates()

# Paste or type your script code here:

So PowerBI is using pandas and matplotlib to get data and plot data. But I only want to get data and output as a new table. If Python is not possible, is there a way to use Query?

1
You should post any code that you have, particularly to "create a new table". I assume that you want to augment the existing data with columns for quantity and price (and perhaps to remove the column that contained the quantity & price strings), but it's not clear what you want.mhawke
I'm new with PowerBI Python so I have no code yet. I'll edit the question with the table. Sorry for the confusion.user3648721
So the properties column can contain anything in unpredictable formats? E.g. "Quantity changed by 10, Price changed by 90." can be parsed as I show in my answer, but the same code will not be able to parse the data shown in the screen shot. You need to clarify that, and show examples of all possible formats for the properties column, otherwise it's not possible to advise. Adding new columns to an existing data frame is possible, but the problem seems to be in reliably parsing the data.mhawke
The data in the specified cell mention above is string, so the code you posted still works. The only thing I'm looking for is to actually get the data in the cell in the table, then create a new table and put the data extracted in. Edit: Sorry I was confused. Yes the format is unpredictable. One cell it can be 2 things, another will be 3, 4 etc. But in the end I just need a list of all those things, so it doesn't matter regarding the format.user3648721
No, the code I've posted will not work with different string formats. I don't know enough about Power BI to assist you further.mhawke

1 Answers

0
votes

Extracting the data from the column is fairly easy if the data follows the same format:

s = "Quantity changed by 10, Price changed by 90."
l, sep, r = s.partition(',')
if sep:
    quantity = int(l.split()[-1])
    price = int(r.split()[-1][:-1])
    print(quantity, price)

#10 90

The string contained in the cell is partitioned by the comma to separate the quantity and price strings, then each string it processed to extract the value.