
I am trying to get the data type of a column in a PySpark DataFrame.

Here is some sample code:

    print(training_data.schema)
    print('fields')
    print(training_data.schema.fields)
    print('names')
    print(training_data.schema.names)

The above code prints:

    StructType(List(StructField(id,LongType,true),StructField(text,StringType,true),StructField(label,DoubleType,true)))
    fields
    [StructField(id,LongType,true), StructField(text,StringType,true), StructField(label,DoubleType,true)]
    names
    ['id', 'text', 'label']

But how can I get the data type of the `label` column? Thanks a lot for your time.

Regards


3 Answers

For a pandas DataFrame, one option is:

    df['col label'].dtype

Edit: to get the dtype name as a string:

    name_dtype = df['col label'].dtype.name

Here is a copy-paste example of how to get the column names and column types of a pandas DataFrame:

    import pandas as pd

    rows = [['Tom', 34, 45.5], ['Jack', 23, 60.5]]  # avoid shadowing the built-in `list`
    df = pd.DataFrame(rows, columns=["Name", "Age", "Pay"])

    for column in df.columns:
        print("Column", column, "is dtype:", df[column].dtype.name)

Result:

    Column Name is dtype: object
    Column Age is dtype: int64
    Column Pay is dtype: float64

Thanks for all the responses. I found the solution below; I hope it helps anyone looking for the answer:

    datatype = None  # stays None if colname is not a column of df
    for f in df.schema.fields:
        if f.name == colname:  # each StructField carries its own column name
            datatype = f.dataType
            break