I currently have a Python Reddit bot, written using PRAW, that gets all of the comments from a specific subreddit, looks at each comment's author, and checks whether that author has at least 3 comments in the subreddit. Authors with 3+ comments are added to a text file of users approved to post submissions. My code currently "works", but honestly, it's so bad that I'm not even sure how. What is a better way of accomplishing my goal? What I currently have:

def get_approved_posters(reddit):

   subreddit = reddit.subreddit('Enter_Subreddit_Here')
   subreddit_comments = subreddit.comments(limit=1000)
   dictionary_of_posters = {}
   unique_posters = {}
   commentCount = 1
   duplicate_check = False
   unique_authors_file = open("unique_posters.txt", "w")

   print("Obtaining comments...")
   for comment in subreddit_comments:
      dictionary_of_posters[commentCount] = str(comment.author)
      for key, value in dictionary_of_posters.items():
        if value not in unique_posters.values():
            unique_posters[key] = value
    for key, value in unique_posters.items():
        if key >= 3:
            commentCount += 1
        if duplicate_check is not True:
            commentCount += 1
            print("Adding author to dictionary of posters...")
            unique_posters[commentCount] = str(comment.author)
            print("Author added to dictionary of posters.")
            if commentCount >= 3:
                duplicate_check = True

   for x in unique_posters:
      unique_authors_file.write(str(unique_posters[x]) + '\n')

   total_comments = open("total_comments.txt", "w")
   total_comments.write(str(dictionary_of_posters))

   unique_authors_file.close()
   total_comments.close()

   unique_authors_file = open("unique_posters.txt", "r+")
   total_comments = open("total_comments.txt", "r")
   data = total_comments.read()
   approved_list = unique_authors_file.read().split('\n')
   print(approved_list)
   approved_posters = open("approved_posters.txt", "w")
   for username in approved_list:
      count = data.count(username)
      if(count >= 3):
        approved_posters.write(username + '\n')
      print("Count for " + username + " is " + str(count))

   approved_posters.close()
   unique_authors_file.close()
   total_comments.close()
Is that whole section of code in one function? You should indent so we can tell what's in the function and what isn't. - jarcobi889
it is all one function; I'll edit it - CsCody
What kind of improvements are you hoping to gain? Speed? Accuracy? What's "wrong" with this code for you? There isn't a whole lot of guidance on how we can help you out - jarcobi889
I'm currently using 3 different text files just to figure out who has posted three times in a subreddit. It's also not always accurate. I was originally using it for a specific subreddit, where it was working fine, and then I tested the code on a second subreddit and ended up with the same name in my output file 3 times in a row instead of it being unique - CsCody
For your text files, I would store your data in a JSON file instead. You would add your data to one single dictionary that's subdivided by key into the three types of data you need to track, then convert it to a JSON string and dump it all into one file. There are quite a few tutorials on it; personally, I import json and use json.load() and json.dump(). I'll keep looking through your code to see what else would help you out - jarcobi889
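
A minimal sketch of the single-JSON-file idea from the last comment, in case it helps to see it written out. The key names ("comment_counts", "approved_posters"), the helper functions, and the file name are all hypothetical choices for illustration, not anything from the original bot; only json.dump() and json.load() come from the comment itself.

```python
import json
import os
import tempfile


def save_comment_data(path, comment_counts, minimum_comments=3):
    """Write the per-author counts and the approved list into one JSON file."""
    data = {
        # Hypothetical key names; pick whatever suits your bot.
        "comment_counts": comment_counts,
        "approved_posters": sorted(
            name
            for name, count in comment_counts.items()
            if count >= minimum_comments
        ),
    }
    with open(path, "w") as handle:
        json.dump(data, handle, indent=2)


def load_comment_data(path):
    """Read the dictionary back from the JSON file."""
    with open(path) as handle:
        return json.load(handle)


# Example with made-up usernames, written to a temporary location.
example_path = os.path.join(tempfile.gettempdir(), "subreddit_posters.json")
save_comment_data(example_path, {"alice": 3, "bob": 1})
loaded = load_comment_data(example_path)
print(loaded["approved_posters"])
```

Because the counts and the approved list live in the same dictionary, there is no need to re-read one text file in order to search another.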

1 Answer


Maybe it's just me being slow this morning, but I'm struggling to follow/understand your use of commentCount and unique_posters. Actually, it probably is me.

I would get all the comments from the subreddit, like you did, and for each comment, do the following:

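dictionary_of_posters = {}  # maps username -> number of comments seen
approved_authors = []

for comment in subreddit_comments:
    # str() turns the author into a plain username string, as in your code
    # (a deleted account has author None, which becomes "None")
    author = str(comment.author)
    try:
        dictionary_of_posters[author] += 1
    except KeyError:
        # first comment seen from this author
        dictionary_of_posters[author] = 1

for username, comment_count in dictionary_of_posters.items():
    if comment_count >= 3:
        approved_authors.append(username)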

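This method takes advantage of the fact that a dictionary cannot hold the same key twice, so each author ends up with exactly one entry and one running count. That way, you don't have to do a duplicate check or anything. If it makes you feel better, you can go list(set(approved_authors)) and that will get rid of any stray duplicates, though there shouldn't be any, since each username is appended at most once.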
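
As a variation on the counting loop above, the standard library's collections.Counter does the same tallying without the try/except. This is only a sketch under some assumptions: find_approved_authors is a hypothetical helper name, and the list of usernames stands in for what you would build from PRAW with something like str(comment.author) for each comment.

```python
from collections import Counter


def find_approved_authors(author_names, minimum_comments=3):
    """Return the sorted names that appear at least minimum_comments times."""
    # Counter tallies how many times each name occurs in the iterable.
    counts = Counter(author_names)
    return sorted(
        name for name, count in counts.items() if count >= minimum_comments
    )


# Stand-in data with made-up usernames; in the bot this would be the
# author name of every comment fetched from the subreddit.
sample_authors = ["alice", "bob", "alice", "carol", "alice", "bob", "bob", "bob"]
approved = find_approved_authors(sample_authors)
print(approved)
```

Writing the result out is then a single loop over approved into approved_posters.txt, with no intermediate files needed.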