I have some data in a CSV file, but the decimal separator is ',', as we use in Brazil. I tried to read my file using the read_csv function with the decimal parameter set to ',', but when I check the type of the returned values it is str, and I thought it would be float.

So, what does the decimal parameter do? What is the best way to deal with this? Should I convert the data manually? I'm using Python 3 and Pandas 0.19.2.

Below is a data sample and the code I'm using:

import pandas as pd

# Get raw data from file
file_name = 'dados.csv'
dados = pd.read_csv(file_name, sep=";", decimal=",", thousands=".")

ANO;COD_SEG;Codi_Saude;COD_UB;MES;SB_CONS;SB_ESCO;SB_TRAT;SB_URGE;SB_GEST;POP;ICONSB;IESCO;IRESOL;IURG
2012;4;10;19712;4;28;164;3;16;0;5274;0,530906333;3,109594236;0,107142857;0,303375047
2012;4;10;19712;5;13;0;6;23;0;5274;0,246492226;0;0,461538462;0,436101631
2012;4;10;19712;6;8;135;7;12;0;5274;0,151687524;2,559726962;0,875;0,227531286
2012;4;10;19712;7;0;0;0;0;0;5274;0;0;;0
How do you expect it to tell the difference between a ',' used to separate columns and a ',' used as a decimal separator? - Craig
I defined the separator as ";" and the decimal as "," - daniboy000
Please add a sample from the file you are trying to parse and the python code you are using to load the file. - Craig
I've just tested your code and data and it works correctly for me. Is it possible that you have a non-numeric value somewhere in your input file? When I add a text character to one of the columns of data, it treats the whole column as text and returns the numbers as strings. - Craig
It worked. There was an invalid value in my dataset. Thank you, @Craig. - daniboy000

1 Answer

You are handling the data correctly. There is no need to convert the data manually; the read_csv function can handle this itself.

In Brazil, numbers are conventionally written with a comma "," as the decimal separator, so columns in CSV files are separated by a semicolon ";" instead (contrary to what the CSV file extension suggests).
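
As a quick check, here is a minimal sketch that reads two rows taken from the sample in the question (with only a few of the columns) from an in-memory string instead of a file. The use of io.StringIO and the reduced column set are my own simplifications, not part of the original code.

```python
import io

import pandas as pd

# Two rows from the question's sample, reduced to a few columns.
sample = (
    "ANO;MES;POP;ICONSB;IESCO\n"
    "2012;4;5274;0,530906333;3,109594236\n"
    "2012;5;5274;0,246492226;0\n"
)

# sep=";" splits the columns, decimal="," parses "0,53" as 0.53.
dados = pd.read_csv(io.StringIO(sample), sep=";", decimal=",")

# Columns holding decimal values should come back as floats.
print(dados.dtypes)
```

If every field in a column is a valid number, the column should be parsed as a float column without any manual conversion.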

It is recommended to read the documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

pandas is reading the column as strings because at least one field in it contains text or some other invalid character.
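
One possible way to track down such a field is sketched below. The data is hypothetical (the "n/d" entry is invented to reproduce the problem), and the sketch assumes a reasonably recent pandas version, since the regex argument of str.replace is not available in 0.19.2.

```python
import io

import pandas as pd

# Hypothetical data: the "n/d" entry is invented to reproduce the problem.
sample = (
    "MES;ICONSB\n"
    "4;0,530906333\n"
    "5;n/d\n"
    "6;0,151687524\n"
)

dados = pd.read_csv(io.StringIO(sample), sep=";", decimal=",")

# Because of the text entry, the column is not parsed as float.
print(dados["ICONSB"].dtype)

# Swap the decimal comma for a dot, then attempt the conversion.
# errors="coerce" turns anything that cannot be parsed into NaN.
as_text = dados["ICONSB"].astype(str).str.replace(",", ".", regex=False)
converted = pd.to_numeric(as_text, errors="coerce")

# Rows that had a value in the file but failed to convert.
bad_rows = dados[converted.isna() & dados["ICONSB"].notna()]
print(bad_rows)
```

Once the offending rows are fixed in the file (or handled explicitly), reading it again with decimal="," should give float columns.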