1
votes

I'm seeking a solution on how to write string as jsonb type in postgresql. So DynamicFrame has a string column that holds json data. When trying to save to postgres

DataSink0 = glueContext.write_dynamic_frame.from_catalog(frame = Transform0, database = "cms", table_name = "cms_public_listings", transformation_ctx = "DataSink0")

I get the following error:

An error was encountered:

An error occurred while calling o1623.pyWriteDynamicFrame.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 134.0 failed 4 times, most recent failure: Lost task 0.3 in stage 134.0 (TID 137, ip-172-31-27-18.ec2.internal, executor 24): java.sql.BatchUpdateException: Batch entry 0 INSERT INTO "public".listings ([REMOVED_COLUMNS]) VALUES ([REMOVED_VALUES]) was aborted: ERROR: column "schema" is of type jsonb but expression is of type character varying
  Hint: You will need to rewrite or cast the expression.
  Position: 207  Call getNextException to see other errors in the batch.

I can't change the schema to hold a string, so it is either I use AWS Glue ETL or would have to craft Python Shell Job. I would prefer to find a way to use PySpark with AWS Glue.

1

1 Answers

1
votes

I prefer to use native spark dataframe, because it allows me more customization.I can use stringtype property to cast json field from dataframe to jsonb field in the table. For this case, my dataframe has two fields.

from pyspark import SparkConf

sc = SparkContext.getOrCreate(SparkConf())
spark = SparkSession(sc)

df = spark.read.format('csv') \
               .option('delimiter','|') \
               .option('header','True') \
               .load('your_path') 

##some transformation...

url = 'jdbc:postgresql://your_host:5432/your_databasename'
properties = {'user':'*****',
              'password':'*****',
              'driver': "org.postgresql.Driver",
              'stringtype':"unspecified"}
        
df.write.jdbc(url=url, table='your_tablename', mode='append', properties=properties)

Before to execute the above script, you should create the table in postgresql, because the property mode is setted as append. This as follow:

create table your_tablename
(
    my_json_field jsonb,
    another_field int
)