1
votes

Need help with this, since I'm struggling to understand both how to convert my data stored in .csv to pandas df, and how to parse data for bokeh.

My .csv file looks similar to this

6:22,30,30,31,31
6:38,30,30,32,32
6:53,30,30,31,32
7:05,30,30,32,32
7:39,31,31,33,33

What I do:

df = pd.DataFrame.from_csv('D:\\Job\\GoogleDrive\\Job\\chatwars.csv')

This gives me df with beautiful five columns of data:

       30  30.1  31  31.1
6:22                     
6:38   30    30  32    32
6:53   30    30  31    32
7:05   30    30  32    32
7:39   31    31  33    33

Though I have no idea why first rows are formatted like that, I suggest I just need to add sort of header to the .csv (like 'time, low, open, close, high').

My question is: how do I parse this data correctly, so the bokeh could draw candlestick chart for me?

I've copied this code for candlestick chart from the tutorial, but can hardly read it (I'm only a week into Python, please bear with me). It does exactly everything I need it to, except reading my df:

df["date"] = pd.to_datetime(df["date"])

inc = df.close > df.open
dec = df.open > df.close
w = 12*60*60*1000 # half day in ms

TOOLS = "pan,wheel_zoom,box_zoom,reset,save"

p = figure(x_axis_type="datetime", tools=TOOLS, plot_width=1000, title = "MSFT Candlestick")
p.xaxis.major_label_orientation = pi/4
p.grid.grid_line_alpha=0.3

p.segment(df.date, df.high, df.date, df.low, color="black")
p.vbar(df.date[inc], w, df.open[inc], df.close[inc], fill_color="#D5E1DD", line_color="black")
p.vbar(df.date[dec], w, df.open[dec], df.close[dec], fill_color="#F2583E", line_color="black")

output_file("candlestick.html", title="candlestick.py example")

show(p)  # open a browser

After applying suggested code from the commentary, I found it easier to modify my .csv like this:

2017-05-19 06:22:00,30,30,31,31
2017-05-19 06:38:00,30,30,32,32
2017-05-19 06:53:00,30,30,31,32

After that, I've changed 'date' on 'time', and the code worked as supposed too! Awesome! :D

2

2 Answers

1
votes

The following should fix your first problem. You can set header to None and specify column names so all lines in the csv will be loaded as data.

df = pd.read_csv('D:\\Job\\GoogleDrive\\Job\\chatwars.csv',header=None,names=['time', 'low', 'open', 'close', 'high'])
0
votes

Final code looks like this:

from math import pi

import pandas as pd

from bokeh.plotting import figure, show, output_file
df = pd.read_csv('D:\\Job\\GoogleDrive\\Job\\chatwars.csv',
                 header=None,names=['time', 'low', 'open', 'close', 'high'])

#print(df.time)

df['time'] = pd.to_datetime(df['time'])

inc = df.close > df.open
dec = df.open > df.close
w = 15*60*1000 # 15 minutes ms

TOOLS = "pan,wheel_zoom,box_zoom,reset,save"

p = figure(x_axis_type="datetime", tools=TOOLS,
           plot_width=1000, title = "MSFT Candlestick")
p.xaxis.major_label_orientation = pi/4
p.grid.grid_line_alpha=0.3

p.segment(df.time, df.high, df.time, df.low, color="black")
p.vbar(df.time[inc], w, df.open[inc], df.close[inc],
       fill_color="#D5E1DD", line_color="black")
p.vbar(df.time[dec], w, df.open[dec], df.close[dec],
       fill_color="#F2583E", line_color="black")

output_file("candlestick.html", title="candlestick.py example")

show(p)  # open a browser