0
votes

I'm trying to load data from PubSub messages to GCS files. Simple pipeline: PubSub source -> JSON Parser -> GCS sink.

Since PubSub only accepts the data argument as utf-8, how can I decode it in CDAP? Should I build a custom plugin that implements a decode function, or is it better to pass my data as a string in the attributes of the PubSub message instead of in 'data'?

1
The standard encoding for JSON is utf-8. I may be entirely misunderstanding your question, but it seems as if you are thinking of JSON and utf-8 as two mutually-exclusive things... but "PubSub only accept the data argument as utf-8," makes data perfectly suitable for a JSON payload. - Michael - sqlbot
Hi Michael, thanks for your time. The issue was with the CDAP plugins; publishing to PubSub worked fine. The problem was retrieving the message as plain text: with the CDAP plugins I could only read the PubSub message as bytes or as ASCII code numbers, not as text. - Luca Natali

1 Answer

1
vote

I solved the issue by using a Projector plugin instead of the JSON Parser between the PubSub source and the GCS sink. The Projector casts the message field of the PubSub source, which arrives as bytes, to a string (plain text).
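
For anyone wondering why the payload showed up as numbers: the message data is a byte array, and it needs an explicit UTF-8 decode before it reads as text. Below is a minimal sketch of that conversion in plain Python. It is not CDAP plugin code, only an illustration of the bytes-to-string cast; the function names and the sample payload are made up for the example.

```python
import json


def payload_to_text(raw: bytes) -> str:
    """Decode a raw message payload (bytes) into plain text, assuming UTF-8."""
    return raw.decode("utf-8")


def payload_to_record(raw: bytes) -> dict:
    """Decode the payload and parse it as a JSON object."""
    return json.loads(payload_to_text(raw))


# Hypothetical payload, encoded the way a publisher would send it.
raw = '{"city": "Zürich", "count": 3}'.encode("utf-8")

# Without decoding, the payload is just a sequence of byte values,
# which is what "reading only the ASCII numbers" looks like.
as_numbers = list(raw[:4])  # [123, 34, 99, 105]

# After decoding, it is ordinary text that a JSON parser can handle.
text = payload_to_text(raw)
record = payload_to_record(raw)
```

The cast has to happen before any JSON parsing step, which is presumably why placing a bytes-to-string conversion right after the source made the rest of the pipeline work.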