First off, I'm a complete beginner, so my apologies if this is too easy or trivial.

I have some big Twitter JSON datasets from archive.org (https://archive.org/details/archiveteam-twitter-stream-2017-01 for example), which I would like to filter on certain hashtags and make somewhat readable using Python. As of now, I can't seem to open the file with Python or Jupyter, and I can't seem to order the file at all.

An example of how the files look:

{"created_at":"Sun Oct 22 06:30:00 +0000 2017","id":921986981168422912,"id_str":"921986981168422912","text":"RT @hypebizzle: \"Tell your dog to leave me alone, it's annoying\"\n\nFirst off all, get out of my house","source":"\u003ca href=\"http://twitter.com/download/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":421547249,"id_str":"421547249","name":"Cris","screen_name":"crisbeltran98","location":"Cajeme, Sonora","url":"http://Instagram.com/cristinabeltraan","description":"il futuro non \u00e8 scritto // Lic.inPsicology on my way. \\ \u201cCristina saludos, un beso\" LFHP.","translator_type":"none","protected":false,"verified":false,"followers_count":1498,"friends_count":1383,"listed_count":6,"favourites_count":3174,"statuses_count":39135,"created_at":"Sat Nov 26 02:51:49 +0000 2011","utc_offset":-25200,"time_zone":"Arizona","geo_enabled":true,"lang":"es","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http://pbs.twimg.com/profile_background_images/768201074/3b0047f4eb39cd54a3a82a2d62fa715a.png","profile_background_image_url_https":"https://pbs.twimg.com/profile_background_images/768201074/3b0047f4eb39cd54a3a82a2d62fa715a.png","profile_background_tile":true,"profile_link_color":"000088","profile_sidebar_border_color":"FFFFFF","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http://pbs.twimg.com/profile_images/919935822694047745/nm6uOnr3_normal.jpg","profile_image_url_https":"https://pbs.twimg.com/profile_images/919935822694047745/nm6uOnr3_normal.jpg","profile_banner_url":"https://pbs.twimg.com/profile_banners/421547249/1508164767","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_status":

Does anyone know which steps to take? I can't seem to find a solution online.

Have you tried using the json module? - asongtoruin
You should show us what you've tried so far - Xay
Welcome to StackOverflow! What do you mean by "I can't seem to open the file with python"? Do you have any code that you could share? It's quite difficult to see what's gone wrong if we can't see the code. Please have a look at how to create a Minimal, Complete and Verifiable example. Post the code you have tried and the errors you have received. Be as specific as possible as it will lead to better answers. - José Luis
Sure! I've tried multiple tutorials and steps; this seems to me to be one of the simpler ones: import json; twitter_test = open('dertig.json', 'rU'); json_data = json.load(twitter_test); print(json_data). This is pretty much the first step, and when I run it, it gives me: JSONDecodeError: Extra data: line 2 column 1 (char 4856) - pimwel

1 Answer

Welcome to Stack Overflow! What have you tried so far? When I open JSON in Python, this is what I do:

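import json
from pprint import pprint  # import the function, not just the module

# json.load expects the file to contain exactly one JSON document
with open('YOUR JSON DATA') as f:
    df = json.load(f)
pprint(df)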

Once this is done, you can access a field of your data by doing something like:

df["created_at"]
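
One note on the error from the comments: "Extra data: line 2 column 1" usually means the file holds more than one JSON document, which would fit a stream dump with one tweet per line, while json.load expects a single document. Below is a minimal sketch of reading such a file line by line and filtering on hashtags. It assumes one JSON object per line and that hashtags sit under entities -> hashtags -> text, as in Twitter's standard tweet format; the sample in the question is cut off before that field, so treat this as an assumption. The function names (iter_tweets, has_hashtag, filter_tweets) and the sample records are made up for illustration.

```python
import json


def iter_tweets(lines):
    """Yield one parsed record (a dict) per non-empty line of JSON text."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed lines
        if isinstance(record, dict):
            yield record


def has_hashtag(tweet, wanted):
    """Return True if the tweet carries any hashtag in `wanted` (a lowercase set).

    Assumes the hashtags live under tweet["entities"]["hashtags"][i]["text"].
    """
    tags = (tweet.get("entities") or {}).get("hashtags") or []
    return any((tag.get("text") or "").lower() in wanted for tag in tags)


def filter_tweets(lines, wanted):
    """Return (created_at, text) pairs for tweets matching the wanted hashtags."""
    wanted = {w.lower().lstrip("#") for w in wanted}
    return [
        (tweet.get("created_at"), tweet.get("text"))
        for tweet in iter_tweets(lines)
        if has_hashtag(tweet, wanted)
    ]


# Made-up in-memory records standing in for lines of the real file.
sample = [
    '{"created_at": "Sun Oct 22 06:30:00 +0000 2017", "text": "hello #Python", '
    '"entities": {"hashtags": [{"text": "Python"}]}}',
    '{"some_other_record": true}',  # records without hashtags are ignored
    '{"created_at": "Sun Oct 22 06:30:01 +0000 2017", "text": "no tags", '
    '"entities": {"hashtags": []}}',
]
matches = filter_tweets(sample, ["#python"])
print(matches)
```

For the real data, something like `with open('dertig.json', encoding='utf-8') as f: matches = filter_tweets(f, ['python'])` should work the same way, since a file object can be iterated line by line and the whole file never has to be loaded into memory at once.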