
I am trying to automate the importing of a county file and convert it to csv. The issue I am having is that the account number is getting converted to scientific notation.

Import-Excel BacktaxRaw_FL_Flagler2.xlsx | Export-Csv ExportTest.csv

I also tried formatting the number fields as text fields using:

C:\testdata> Import-Excel BacktaxRaw_FL_Flagler.xlsx | Export-Excel ExportTest.xlsx -Show -AutoSize -NumberFormat '@'

The problem here is that the last few digits of the account number are replaced with zeros. For example, the original number is "1914310000010100016". The import converts it to "1.9143100000101E+18", and the export finally writes it as "1914310000010100000". As you can see, the 16 at the end is replaced with 00.

Ironically, if I import the data into a new Excel file through Excel's own import and define the column type as text, I can then use Import-Excel to convert it to CSV with no issue. This tells me that Import-Excel is struggling to import the data from the original Excel file. Can someone help me with this issue, please?

I have attached the import file: BacktaxRaw_FL_Flagler.xlsx

After doing some research, and thanks to Ron's suggestions, I took a deeper look at the raw source of the above file. I unzipped the contents of the BacktaxRaw_FL_Flagler.xlsx file and navigated to the \xl\worksheets\sheet1.xml file. I opened it in Notepad and saw first hand that none of the values I am looking for in column A are stored in scientific notation. They are indeed stored uncorrupted, which means every attempt I have made to retrieve the data has resulted in data corruption. Even using Excel's own data import wizard will alter the values.

Screenshot (so you don't need to download the file)

If you want, download the raw xml file here: Raw XML file

I'm confused. Which is the original file (Excel or CSV), and what is the desired result? If the CSV is the original, you need to tell Excel that the data it is importing is Text. If the Excel file is the original, then the data in your linked file already has the error you are trying to avoid. - Ron Rosenfeld
The Excel file is the original file. I am trying any which way to convert it to CSV. The real problem is that in the original Excel file there is a column called "Account Number" that has a value of "1914310000010100016". No matter what method I use to convert it (programmatically, that is), the end result is a CSV file that has either "1.9143100000101E+18" or "1914310000010100000". Notice that the last two digits are replaced with 00. When I convert to CSV, the value "1914310000010100016" should remain "1914310000010100016". While I would like to use Import-Excel, I will use anything. - D1gitalDR
In the file you attached, I do not see any values like 1914310000010100016. The numeric values in the Account Number column have already been converted to their numeric values. See Rows 3,23,25,28,30 and higher. - Ron Rosenfeld
If you open the file directly in Excel without doing anything, you will see 1.91431E+18 in A3 (Original File Screen Shot). If you use Excel's data import wizard and set column A as text, you will see the correct values (Fixed File Screen Shot). But when I import using Import-Excel, I get a corrupted value (Converted File Screen Shot). - D1gitalDR
Yes, but when you "use Excels data import wizard", where are you importing it from??? - Ron Rosenfeld

2 Answers

0
votes

Importing CSV files in Excel can drive you nuts sometimes.

Best thing to do is download the original CSV file and make adjustments in there before opening in Excel.

The trick is to make Excel interpret the values for the "Account Number" column as strings instead of numeric values. This can be done by prefixing all values in that column with a Tab character ("`t").

$csvFileIn  = 'D:\Test\Report.csv'
$csvFileOut = 'D:\Test\CorrectedReport.csv'
$ColumnName = 'Account Number'

# import the csv file you have downloaded and format the "Account Number"
# column by prefixing the values with a TAB character ("`t").
# this will effectively force Excel NOT to interpret the value as numeric.
$csv = Import-Csv -Path $csvFileIn
foreach ($item in $csv) { $item.$ColumnName = "`t" + $item.$ColumnName }

# save the updated csv file
# the '-UseCulture' switch makes sure the delimiter used is the same that Excel will use on the same system
$csv | Export-Csv -Path $csvFileOut -UseCulture -NoTypeInformation

Now you can simply double-click the 'D:\Test\CorrectedReport.csv' file to open in Excel and this should be what it looks like:

[Screenshot: the corrected CSV file opened in Excel]

0
votes

The behavior seems odd, especially since an Excel cell that is stored as a text string will retain that property when the file is simply opened in Excel.

So I delved into the Office Open XML specifications a bit more closely.

Examining the XML of the document closely, it appears that the document was created incorrectly. I suspect the xlsx workbook was NOT created by Excel, but rather by some other program.

If I am correct, the value for the cell in question, A3, is stored as a number and formatted as General.

<c r="A3" s="2"><v>1914310000010100016</v></c>

s="2" points to a General format in the style table, and the value is stored directly.

If it were stored as a string, with a format of text, the entry would look something like:

<c r="A3" s="1" t="s"><v>10247</v></c>

where t="s" means the value is an index into the shared strings table, here the entry at index 10247 (0-based counting). That sharedStrings table doesn't even exist in your original xlsx, but it will be created if you format A3 as text, enter the account number for that entry, and then save the file.

So rather than a problem with the import process, I am thinking there is a problem with the actual xlsx file creation, and that is something we have no control over.
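To illustrate why a numerically stored cell loses its last digits, here is a minimal sketch. It assumes the reader parses the cell into a double-precision number, which is how Excel itself handles numeric cells; whether Import-Excel takes exactly this path internally is an assumption on my part. The variable names are only for illustration.

```powershell
# The account number exactly as it appears in sheet1.xml
$original = '1914310000010100016'

# Assumption: a numeric cell ends up as a double. A double carries only
# about 15 to 17 significant decimal digits, so a 19-digit value cannot
# be represented exactly.
$asDouble = [double]$original

# Convert the double back to text through [decimal] to avoid scientific notation
$roundTrip = ([decimal]$asDouble).ToString()

# Parsing straight into [decimal] (28 to 29 significant digits) keeps every digit
$viaDecimal = ([decimal]$original).ToString()

"Original     : $original"
"Via [double] : $roundTrip"
"Via [decimal]: $viaDecimal"
```

The value that went through [double] no longer matches the original, while the one parsed as [decimal] (or simply left as a string) does. That is why the account number has to be treated as text from the moment it is read.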

If the file creation problem cannot be corrected, you will need a non-Excel tool that can edit the XML files to correct the problem.
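As a rough sketch of that approach, the raw cell text can be pulled out of the sheet XML as strings, without ever converting it to a number. Both function names below (Get-SheetXml and Get-RawCellText) are hypothetical helpers I made up for this illustration, and the sample XML is a cut-down stand-in for the real sheet1.xml. The sketch only handles cells whose value is stored directly in the v element; cells with t="s" would need a lookup in the shared strings table, which is not shown here.

```powershell
# Hypothetical helper: read xl/worksheets/sheet1.xml out of an xlsx (which is a zip archive)
function Get-SheetXml {
    param([Parameter(Mandatory)] [string] $Path)
    Add-Type -AssemblyName System.IO.Compression.FileSystem
    $zip = [System.IO.Compression.ZipFile]::OpenRead($Path)
    try {
        $entry  = $zip.GetEntry('xl/worksheets/sheet1.xml')
        $reader = New-Object System.IO.StreamReader($entry.Open())
        try { $reader.ReadToEnd() } finally { $reader.Dispose() }
    }
    finally { $zip.Dispose() }
}

# Hypothetical helper: return the raw text of every cell in one column, as strings
function Get-RawCellText {
    param(
        [Parameter(Mandatory)] [string] $SheetXml,
        [Parameter(Mandatory)] [string] $Column
    )
    $doc = [xml]$SheetXml
    # local-name() avoids having to register the SpreadsheetML namespace
    foreach ($cell in $doc.SelectNodes("//*[local-name()='c']")) {
        $ref = $cell.GetAttribute('r')
        if ($ref -notmatch "^$Column\d+$") { continue }
        $value = $cell.SelectSingleNode("*[local-name()='v']")
        [pscustomobject]@{
            Cell = $ref
            Text = if ($value) { $value.InnerText } else { '' }
        }
    }
}

# Cut-down sample standing in for the real sheet1.xml
$sampleXml = @'
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheetData>
    <row r="3">
      <c r="A3" s="2"><v>1914310000010100016</v></c>
      <c r="B3" s="2"><v>42</v></c>
    </row>
  </sheetData>
</worksheet>
'@

$cells = @(Get-RawCellText -SheetXml $sampleXml -Column 'A')
$cells
```

Against the real file you would pass the output of Get-SheetXml for BacktaxRaw_FL_Flagler.xlsx instead of the sample string, and then pipe the result to Export-Csv with -NoTypeInformation. Because the values stay strings the whole way, nothing gets a chance to turn them into floating-point numbers.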

Note: If the county can provide a CSV file, instead of an improperly created Excel file, you should be able to import that without difficulty. Note: I see that one of the options on the page is to download a CSV file. Just use that option and then you'll be able to import with no issues at all!