I am trying to insert Spark Streaming data directly into an Amazon Redshift cluster but am not able to find the right way.
Below is the code I have, but it first writes the data to S3 and then copies it into Redshift:
# The connector stages the DataFrame in S3, then issues a Redshift COPY.
REDSHIFT_JDBC_URL = "jdbc:redshift://%s:5439/%s" % (REDSHIFT_SERVER, DATABASE)

df.write \
    .format("com.databricks.spark.redshift") \
    .option("url", REDSHIFT_JDBC_URL) \
    .option("dbtable", TABLE_NAME) \
    .option("tempdir", "s3n://%s:%s@%s" % (ACCESS_KEY, SECRET, S3_BUCKET_PATH)) \
    .mode("overwrite") \
    .save()
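For context, this is roughly how I would wire that batch write into Structured Streaming with foreachBatch (a sketch only; stream_df, CHECKPOINT_DIR, and Spark 2.4+ are my assumptions, not tested code):

# Hypothetical sketch: stream_df and CHECKPOINT_DIR are placeholders.
def write_to_redshift(batch_df, batch_id):
    # Each micro-batch is staged to S3 and COPYed into Redshift by the connector.
    batch_df.write \
        .format("com.databricks.spark.redshift") \
        .option("url", REDSHIFT_JDBC_URL) \
        .option("dbtable", TABLE_NAME) \
        .option("tempdir", "s3n://%s:%s@%s" % (ACCESS_KEY, SECRET, S3_BUCKET_PATH)) \
        .mode("append") \
        .save()

query = stream_df.writeStream \
    .foreachBatch(write_to_redshift) \
    .option("checkpointLocation", CHECKPOINT_DIR) \
    .start()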
Does the S3 staging step impact streaming or insertion performance? Or is there another way to do this?
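For comparison, this is the kind of direct JDBC write I mean (REDSHIFT_USER, REDSHIFT_PASSWORD, and the driver class are placeholders, assuming the Redshift JDBC driver is on the classpath). As I understand it, plain INSERTs over JDBC are much slower on Redshift than the S3 + COPY path, so I'm not sure it's viable at streaming volumes:

# Hypothetical sketch: direct JDBC insert, skipping the S3 staging step.
# REDSHIFT_USER / REDSHIFT_PASSWORD are placeholder credentials.
df.write \
    .format("jdbc") \
    .option("url", REDSHIFT_JDBC_URL) \
    .option("dbtable", TABLE_NAME) \
    .option("user", REDSHIFT_USER) \
    .option("password", REDSHIFT_PASSWORD) \
    .option("driver", "com.amazon.redshift.jdbc42.Driver") \
    .mode("append") \
    .save()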