I have data streaming into a topic in Google PubSub. I can see that data using simple Python code:
from datetime import datetime
...
def callback(message):
    # message.data is bytes in google-cloud-pubsub, so decode it before printing
    print(datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f") + ": message = '" + message.data.decode("utf-8") + "'")
    message.ack()

future = subscriber.subscribe(subscription_name, callback)
future.result()
The above Python code receives data from the Google PubSub topic (via the subscription subscription_name) and writes it to the terminal, as expected. I would like to stream the same data from the topic into PySpark (as an RDD or DataFrame), so I can do other streaming transformations such as windowing and aggregations in PySpark, as described here: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html.
That link has documentation for reading from other streaming sources (e.g. Kafka), but not Google PubSub. Is there a way to stream from Google PubSub into PySpark?
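For reference, this is roughly the kind of windowed aggregation I want to end up with. The sketch below is purely illustrative: it assumes the Pub/Sub messages were relayed by some separate process into a Kafka topic (the topic name pubsub_bridge and the broker localhost:9092 are made-up placeholders), since Kafka is a source the linked guide does document. What I'm really asking is whether I can get an equivalent streaming DataFrame directly from Pub/Sub, without such a bridge.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pubsub-windowing").getOrCreate()

# Hypothetical bridge: messages relayed from Pub/Sub into a Kafka topic.
# (Running this requires the spark-sql-kafka package on the classpath.)
messages = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "pubsub_bridge")
    .load()
    .selectExpr("CAST(value AS STRING) AS value", "timestamp")
)

# Windowed count per message value, as in the structured-streaming guide.
windowed_counts = (
    messages
    .withWatermark("timestamp", "10 minutes")
    .groupBy(F.window("timestamp", "5 minutes"), F.col("value"))
    .count()
)

query = (
    windowed_counts.writeStream
    .outputMode("update")
    .format("console")
    .start()
)
query.awaitTermination()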