So I'm trying to fetch Reddit post data using PRAW and turn it into a JSON Lines file.
What I need is something like this:
{"context": ["Cross your redstone wires - Snapshot 20w18a is out", "But how will people get a blood spot effect now if the redstone default is a cross again?"], "response": ["Debug Stick?"], "id": "gabsj3"}
{"context": ["Cross your redstone wires - Snapshot 20w18a is out", "But how will people get a blood spot effect now if the redstone default is a cross again?", "Debug Stick?"], "response": ["My guess is the dot is flat out gone\n\nThere's no way for it to exist so why would they leave it in"], "id": "gabsj3"}
{"context": ["Cross your redstone wires - Snapshot 20w18a is out", "But how will people get a blood spot effect now if the redstone default is a cross again?", "Debug Stick?", "My guess is the dot is flat out gone\n\nThere's no way for it to exist so why would they leave it in"], "response": ["No, it's still in the game. Use the debug stick to set all sides to `none`"], "id": "gabsj3"}
So context contains ["POST TITLE", "FIRST LEVEL COMMENT", "SECOND LEVEL COMMENT", "ETC..."] and response contains the last-level comment, i.e. the reply to the final entry in context. For this post on Reddit, the next line should be:
{"context": ["Cross your redstone wires - Snapshot 20w18a is out", "But how will people get a blood spot effect now if the redstone default is a cross again?", "Debug Stick?", "My guess is the dot is flat out gone\n\nThere's no way for it to exist so why would they leave it in", "No, it's still in the game. Use the debug stick to set all sides to `none`"], "response": ["Huh, alright"], "id": "gabsj3"}
But the output of my code is something like this:
{"context": ["Cross your redstone wires - Snapshot 20w18a is out", "But how will people get a blood spot effect now if the redstone default is a cross again?"], "response": ["Debug Stick?", "I think we can still use resource packs to change it back into a dot, I don't know so don't quote me on that", "I honestly think the cross redstone looks a bit more like a splatter."], "id": "gabsj3"}
Here's my code:
import praw
import jsonlines

reddit = praw.Reddit(client_id='-', client_secret='-', user_agent='user_agent')
max = 1000
sequence = 1
for post in reddit.subreddit('minecraft').new(limit=max):
    data = []
    title = []
    comment = []
    response = []
    post_id = post.id
    titl = post.title
    # print("https://www.reddit.com/"+post.permalink)
    print("Fetched " + str(sequence) + " posts .. ")
    title.append(titl)
    try:
        submission = reddit.submission(id=post_id)
        submission.comments.replace_more(limit=None)
        sequence = sequence + 1
        for top_level_comment in submission.comments:
            cmnt_body = top_level_comment.body
            comment.append(cmnt_body)
            for second_level_comment in top_level_comment.replies:
                response.append(second_level_comment.body)
            context = [title[0], comment[0]]
            data.append({"context": context, "response": response, "id": post_id})
            response = []
            # print(data[0])
            with jsonlines.open('2020-04-30_12.jsonl', mode='a') as writer:
                writer.write(data.pop())
            comment.pop()
        title.pop()
    except Exception:
        pass
import jsonlines, and import praw, and a definition of reddit. It also has a few syntax errors, which makes it cumbersome to run and debug. - jarhill0