I need to remove utf formatting and convert a column to Integer type.
Below is what I have done to remove the utf format
>>auction_data = auction_raw_data.map(lambda line: line.encode("ascii","ignore").split(","))
>>auction_Data.take(2)
>>[['8211480551', '52.99', '1.201505', 'hanna1104', '94', '49.99', '311.6'], ['8211480551', '50.99', '1.203843', 'wrufai1', '90', '49.99', '311.6']]
But, when I create a dataframe with for the same data using the schema, and try to retrieve particular data, I get the data prefixed with a " u' ".
>>schema = StructType([ StructField("auctionid", StringType(), True),
StructField("bid", StringType(), True),
StructField("bidtime", StringType(), True),
StructField("bidder", StringType(), True),
StructField("bidderrate", StringType(), True),
StructField("openbid", StringType(), True),
StructField("price", StringType(), True)])`
>>xbox_df = sqlContext.createDataFrame(auction_data,schema)
>>xbox_df.registerTempTable("auction")
>>first_line = sqlContext.sql("select * from auction where auctionid=8211480551").collect()
>>for i in first_line:
>> print i
>>Row(auctionid=u'8211480551', bid=u'52.99', bidtime=u'1.201505', bidder=u'hanna1104', bidderrate=u'94', openbid=u'49.99', price=u'311.6')
>>Row(auctionid=u'8211480551', bid=u'50.99', bidtime=u'1.203843', bidder=u'wrufai1', bidderrate=u'90', openbid=u'49.99', price=u'311.6')
How to remove the u' infront of the values, also I want to convert the bid value into an Integer. When I directly change in schema definition, I get an error saying " TypeError: IntegerType can not accept object in type ".Show less