I'm having trouble getting my code to run. The traceback points at the line main_df = main_df.join(df), and I don't understand why it fails.

Any help is much appreciated.

import quandl
import pandas as pd

# API key was removed
api_key = 'X'
fiddy_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states',flavor='html5lib')

main_df = pd.DataFrame()

for abbv in fiddy_states[0][0][1:]:
    query = "FMAC/HPI_"+str(abbv)
    df = quandl.get(query, authtoken=api_key)

    if main_df.empty:
        main_df = df
    else:
        main_df = main_df.join(df)

print(main_df.head())

I get this error:

Traceback (most recent call last):
  File "C:/Users/Dave/Documents/Python Files/helloworld.py", line 17, in <module>
    main_df = main_df.join(df)
  File "C:\Python35\lib\site-packages\pandas\core\frame.py", line 4385, in join
    rsuffix=rsuffix, sort=sort)
  File "C:\Python35\lib\site-packages\pandas\core\frame.py", line 4399, in _join_compat
    suffixes=(lsuffix, rsuffix), sort=sort)
  File "C:\Python35\lib\site-packages\pandas\tools\merge.py", line 39, in merge
    return op.get_result()
  File "C:\Python35\lib\site-packages\pandas\tools\merge.py", line 223, in get_result
    rdata.items, rsuf)
  File "C:\Python35\lib\site-packages\pandas\core\internals.py", line 4445, in items_overlap_with_suffix
    to_rename)
ValueError: columns overlap but no suffix specified: Index(['Value'], dtype='object')

What are you trying to do? Append the new data to the dataframe in each iteration? DataFrame.join does SQL-style joins, which is probably not what you are looking for here. Try main_df = main_df.append(df). - bananafish
I'm looking to join rather than append in this case. Appending does work, but doesn't give me what I'm looking for. Do you know why it isn't working in this case? - wowdavers
Well, I don't know what you are looking for. Can you give an example input/output? - bananafish
Should be a data frame with Date as the index, then 50 columns (each one should be a state abbreviation) with data corresponding to a specific date. - wowdavers

1 Answer

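The join fails because every dataframe returned by quandl.get has a single column named Value, so from the second state onward the column names overlap and no suffix is given to tell them apart. You can avoid the loop entirely: pass a list of codes to quandl.get and you get back one dataframe with a column for each code. Code: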

import quandl
import pandas as pd

fiddy_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states', flavor='html5lib')
data = quandl.get(["FMAC/HPI_"+s for s in fiddy_states[0][0][1:]])
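
If you would rather keep the original loop, one option is to rename each frame's Value column to the state abbreviation before joining, so that no two columns share a name. The sketch below is only an illustration: it uses two small hand-made frames (hypothetical numbers, not Quandl data) in place of the quandl.get results so it runs without an API key, and it assumes each result has a Date index and one Value column, as the traceback suggests.

```python
import pandas as pd

# Hypothetical stand-ins for two quandl.get() results. The values are made up;
# the shape (Date index, single "Value" column) is assumed from the traceback.
dates = pd.DatetimeIndex(["2000-01-31", "2000-02-29"], name="Date")
frames = {
    "AL": pd.DataFrame({"Value": [100.0, 101.0]}, index=dates),
    "AK": pd.DataFrame({"Value": [200.0, 202.0]}, index=dates),
}

# Joining the raw frames reproduces the ValueError from the question,
# because both frames have a column called "Value".
try:
    frames["AL"].join(frames["AK"])
    join_error = None
except ValueError as exc:
    join_error = str(exc)

# Renaming "Value" to the state abbreviation gives every column a unique name,
# so the same loop as in the question now joins cleanly on the Date index.
main_df = pd.DataFrame()
for abbv, df in frames.items():
    df = df.rename(columns={"Value": abbv})
    if main_df.empty:
        main_df = df
    else:
        main_df = main_df.join(df)

print(main_df)
```

In the real loop, the rename would go right after the quandl.get call. Passing lsuffix or rsuffix to join would also silence the error, but it leaves columns named after the suffix rather than the state, which is probably not what you want for 50 states.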