Extract a specific string from a text file and then create an HTTP request

Question

I'm trying to extract a specific string value from a text file (file1.txt) and then send an HTTP GET request to the extracted string (a URL). The HTTP response should be saved as a new HTML file in the directory. The string I'm trying to extract is the value of a specific key.

For example: "display_url":"test.com" (extract "test.com" and then send an HTTP request to it)

My txt file content:

{"created_at":"Thu Nov 15 11:35:00 +0000 2018","id":15292802,"id_str":325802","text":"test8 https://t.co/ZtCsuk7Ek2 #osining","source":"\u003ca href=\"http://twitter.com\" rel=\"nofollow\"\u003eTwitter Web Client\u003c/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":961508561217052675,"id_str":"961508561217052675","name":"Online S","screen_name":"osectraining","location":"Israel","url":"https://www.test.co.il","description":"test","translator_type":"none","protected":false,"verified":false,"followers_count":2,"friends_count":51,"listed_count":0,"favourites_count":0,"statuses_count":7,"created_at":"Thu Feb 08 07:54:39 +0000 2018","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"000000","profile_background_image_url":"http://abs.twimg.com/images/themes/theme1/bg.png","profile_background_image_url_https":"https://abs.twimg.com/images/themes/theme1/bg.png","profile_background_tile":false,"profile_link_color":"1B95E0","profile_sidebar_border_color":"000000","profile_sidebar_fill_color":"000000","profile_text_color":"000000","profile_use_background_image":false,"profile_image_url":"http://pbs.twimg.com/profile_images/961510231346958336/d_KhBeTD_normal.jpg","profile_image_url_https":"https://pbs.twimg.com/profile_images/961510231346958336/d_KhBeTD_normal.jpg","profile_banner_url":"https://pbs.twimg.com/profile_banners/961508561217052675/1518076913","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"osectraining","indices":[33,46]}],"urls":[{"url":"https://t.co/ZtCsuk7Ek2","expanded_ur
l":"http://test.com","display_url":"test.com","indices":[7,30]}],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en","timestamp_ms":"1542281700508"}

My code:

import re
with open('file1') as f:
found = []
for line in f.readlines():
    found += re.findall(r'"display_url":\s(\w+)\s', line)
print(found)

And what have you tried so far? Please post your code as it is. — Matt Morgan
Does your indentation actually look like what you posted? If not, you should fix it, it matters in Python. — Matt Morgan

Matt Morgan · Accepted Answer · 2018-11-18T15:10:20

Please note that indentation is critical in Python. It's not clear to me if you have made a mistake in your code indentation, or just a mistake in formatting your posted question. Having said that...

You need to do four things to accomplish the task:

1. Read file1.txt from disk.
2. Parse the contents of the file to find the display_url.
3. Call the URL to get a response.
4. Write the response to disk.

Your code attempts steps 1 and 2, but there are a few problems. The first is that your text file has an error in it: the value of the "id_str" key is missing its opening quotation mark. The pair should read "id_str":"325802".
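
One way to locate an error like this is to let the JSON parser report where it stops. The following is only a sketch: the helper name find_json_error is made up for illustration, and the two strings are shortened stand-ins for the start of the file, not its full content.

```python
import json


def find_json_error(text):
    """Return (line, column, message) for the first JSON syntax error, or None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return err.lineno, err.colno, err.msg
    return None


# Shortened stand-ins for the start of the file (assumed, not the full content).
broken = '{"id":15292802,"id_str":325802","text":"test8"}'
fixed = '{"id":15292802,"id_str":"325802","text":"test8"}'

print(find_json_error(broken))  # a (line, column, message) tuple
print(find_json_error(fixed))   # None
```

The reported column points at the place where parsing failed, which is usually at or just after the actual typo.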

Once that is fixed, you need to fix the indentation of your code so that the loop runs inside the with block, where f is still open. Finally, I don't think regex is really the way to go here, since the file contains JSON.
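
For illustration, here is why the pattern from the question finds nothing even with the indentation repaired. The adjusted pattern is an assumption about what was intended, not a recommendation over parsing the JSON.

```python
import re

# The relevant fragment of the file; there is no whitespace after the colon.
line = '"expanded_url":"http://test.com","display_url":"test.com","indices":[7,30]'

# Pattern from the question: \s requires a whitespace character after the colon,
# and \w+ cannot match the quotation marks or the dot in "test.com".
original = re.findall(r'"display_url":\s(\w+)\s', line)

# Adjusted pattern (assumed intent): optional whitespace, then capture
# everything between the quotation marks.
adjusted = re.findall(r'"display_url":\s*"([^"]+)"', line)

print(original)  # []
print(adjusted)  # ['test.com']
```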

You can read the file and parse it into a Python dictionary easily. Finding the information you want requires knowing the structure of the JSON. Here is one way you could do it:

import json

# Read the file and parse its contents as one JSON document.
with open('./file1.txt', 'r') as f:
    dictionary = json.load(f)

# Walk the nesting: entities -> urls (a list) -> first entry -> display_url
entities = dictionary.get('entities')
urls = entities.get('urls')[0]
display_url = urls.get('display_url')
print(display_url)
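
The lookup above takes only the first entry of urls and raises an error if that list is empty or a key is missing. As a rough sketch, assuming you want every display_url and an empty list when there is none (the helper name display_urls is made up for illustration, and the sample is a trimmed-down stand-in for the file):

```python
import json


def display_urls(tweet):
    """Return every display_url under entities.urls, or [] if a key is absent."""
    entities = tweet.get("entities") or {}
    return [
        entry["display_url"]
        for entry in entities.get("urls") or []
        if "display_url" in entry
    ]


# Trimmed-down sample with the same nesting as the file in the question.
sample = json.loads(
    '{"entities": {"urls": [{"url": "https://t.co/ZtCsuk7Ek2",'
    ' "expanded_url": "http://test.com", "display_url": "test.com"}]}}'
)

print(display_urls(sample))                      # ['test.com']
print(display_urls({"entities": {"urls": []}}))  # []
```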

Now you need to figure out steps 3 and 4, which are really the easy part compared to step 2.
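
Steps 3 and 4 could look roughly like this, using only the standard library. The helper names, the output filename response.html, and the choice to prepend http:// are all assumptions: display_url has no scheme ("test.com"), and the question does not name the output file.

```python
import urllib.request
from pathlib import Path


def build_url(display_url, scheme="http"):
    """Prefix a scheme when display_url lacks one ('test.com' -> 'http://test.com')."""
    if display_url.startswith(("http://", "https://")):
        return display_url
    return f"{scheme}://{display_url}"


def fetch(url, timeout=10):
    """Send an HTTP GET request and return the raw response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def save_html(body, path="response.html"):
    """Write the response bytes to disk unchanged and return the path."""
    out = Path(path)
    out.write_bytes(body)
    return out
```

With display_url from the parsing code, the whole chain would be save_html(fetch(build_url(display_url))). The expanded_url field in the same entry already includes the scheme, so reading that key instead would avoid having to guess one.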
