
I am really new to Dask. I want to create a Dask DataFrame from a Python list of tuples. In pandas, you can use DataFrame.from_records to convert a list of tuples to a DataFrame. Which function gives me the same functionality in Dask? My data looks a bit like this:

[(21262, 'booking', 'NULL'), (21262, 'booking', 'NULL'), (21262, 'booking', 'NULL'), (21262, 'booking', ''), (21262, 'booking', 'NULL')]
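For reference, here is a minimal sketch of the pandas call mentioned above, applied to the sample rows. The column names are the ones used in the question's code below.

```python
import pandas as pd

records = [(21262, 'booking', 'NULL'), (21262, 'booking', 'NULL'),
           (21262, 'booking', 'NULL'), (21262, 'booking', ''),
           (21262, 'booking', 'NULL')]

# from_records creates one row per tuple; columns supplies the labels.
df = pd.DataFrame.from_records(records, columns=['id', 'status', 'reg_entry'])
print(df.shape)  # (5, 3)
```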

I am currently using this code to perform the task. Is this the correct way of doing it?

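import pandas as pd
import dask
import dask.dataframe as dd

names = ['id', 'status', 'reg_entry']
# cursor is an open DB-API cursor; fetchall() runs eagerly, before Dask is involved
dfs = dask.delayed(pd.DataFrame.from_records)(cursor.fetchall(), columns=names)

df = dd.from_delayed(dfs)  # a single Delayed object becomes one partition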
Welcome to SO. Please read How to Ask a Good Question. Can you provide code samples of what you have already tried? - Florian
@Florian Sorry for not being clearer the first time. I am new to this forum and still learning. Thanks for correcting me. - Ali. K

1 Answer


You can create a Dask DataFrame from an existing pandas DataFrame. Building the pandas frame first lets you use any pandas constructor, including DataFrame.from_records:

import pandas as pd
import dask.dataframe as dd
df = pd.DataFrame([(21262, 'booking', 'NULL'), (21262, 'booking', 'NULL'), (21262, 'booking', 'NULL'), (21262, 'booking', ''), (21262, 'booking', 'NULL')], columns=['id', 'status', 'reg_entry'])
ddf = dd.from_pandas(df, npartitions=2)