0
votes

I have a dataframe and a function to generate random dates.

import random
from datetime import date, timedelta

import pandas as pd

def dates(start_date, end_date):
    start_date = date(start_date[0], start_date[1], start_date[2])
    end_date = date(end_date[0], end_date[1], end_date[2])
    
    days_delta = (end_date - start_date).days
    
    return start_date + timedelta(days=random.randrange(days_delta))


df = pd.DataFrame(index=range(100))

df['MOVE_OUT_DATE'] = date(9999, 12, 31)
df['MOVE_IN_DATE'] = [dates((2021, 1, 1), (2021, 6, 30)) for _ in range(df.shape[0])]

To get the difference in days I do this,

df['days_diff'] = df['MOVE_OUT_DATE'] - df['MOVE_IN_DATE']

and this works fine in VS Code, but it throws "Python int too large to convert to C long" in Databricks. [Screenshot of the error]

Any help or suggestion is appreciated. Thank you.

1
please provide a reproducible example, your current code is not giving a DataFrame - mozway

1 Answer

0
votes

I was able to get everything to work, and I believe this is what you are trying to accomplish with your code:

df = pd.DataFrame(pd.date_range('2021-01-01', '2021-06-01', freq = 'D'), columns = ['START_DATE'])
df['MOVE_OUT_DATE'] = '2260-12-31'
df['START_DATE'] = pd.to_datetime(df['START_DATE'])
df['MOVE_OUT_DATE'] = pd.to_datetime(df['MOVE_OUT_DATE'])
df['DAYS_DIFF'] = df['MOVE_OUT_DATE'] - df['START_DATE']
df

However, note that 'MOVE_OUT_DATE' is only set to the year 2260, because any date much later than that produced an error for being too large. Could you take this and generate the results you want (for example, by converting it into a function)?
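
If the 9999-12-31 placeholder has to stay, one possible workaround is to avoid pandas datetime arithmetic and subtract the plain Python date objects row by row. The year limit noted above is likely tied to pandas' default nanosecond timestamps, whose upper bound (pd.Timestamp.max) falls in the year 2262. This is only a minimal sketch: the two sample move-in dates and the name SENTINEL are made up for illustration, and it has not been tried on Databricks.

```python
from datetime import date

import pandas as pd

# Hypothetical placeholder meaning "has not moved out yet".
SENTINEL = date(9999, 12, 31)

# Two illustrative move-in dates; a list of date objects is stored
# as an object-dtype column, not as datetime64.
df = pd.DataFrame({"MOVE_IN_DATE": [date(2021, 1, 1), date(2021, 6, 30)]})
df["MOVE_OUT_DATE"] = [SENTINEL] * len(df)

# Subtract with the standard library instead of pandas datetime arithmetic,
# so the out-of-range year never has to fit into a nanosecond timestamp.
df["DAYS_DIFF"] = [
    (move_out - move_in).days
    for move_out, move_in in zip(df["MOVE_OUT_DATE"], df["MOVE_IN_DATE"])
]

print(df)
```

This gives an integer number of days rather than a Timedelta column, and it will be slower than vectorised subtraction on large frames, so it is a trade-off rather than a drop-in replacement.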