This is my first post, so sorry if it looks like a wall of text. I'm hoping someone can understand my question and point me to a module that can do this, or some code to try. I am working with time-series CSV data; example rows are below, and the columns are (datetime, O, H, L, C):

1999-10-26 21:00:00 68.81 68.83 68.07 68.19
1999-10-27 21:00:00 68.19 68.2 66.83 67.43
1999-10-28 21:00:00 67.43 68.06 66.91 68.06
1999-10-29 21:00:00 68.06 68.11 66.31 66.66
1999-01-11 22:00:00 66.66 67.15 66.09 66.63
1999-02-11 22:00:00 66.63 67.38 66.42 66.58
1999-03-11 22:00:00 66.58 67.73 66.42 67.48
1999-04-11 22:00:00 67.48 67.81 66.54 66.76
1999-05-11 22:00:00 66.76 68.2 66.54 67.87

Each row is what is known as a candlestick in financial price data, and can be represented visually like so:

[Image: candlestick chart of the sample rows]

Now, what I am trying to achieve is to automatically plot a line that crosses as many wicks (the vertical lines on the candlesticks) as possible, while crossing no more than a specified tolerance number of candle bodies (the green or red areas).

I hope this makes sense. For example, the line crossing as many wicks as possible (in this case below the candles) would start at position 4, around 66.4, and slope upward until position 9, around 66.55.

What I imagine the line to look like based on the picture above

Is there a module that can do this if I provide a starting position? Basically, I want something like a line of best fit, but one that actually crosses all the points (each "point" is a vertical line, so a range on the y axis) while avoiding the red/green areas. So essentially a trendline (not curved).

If this is too complicated because the line is diagonal, an example for a horizontal line (just one value on the y axis) would do as well. For example, a value of 66.54 works too: it crosses all the wicks and touches the last two, as seen in the L data (4th column) of the last two records.

See yellow line crossing or touching all low wicks

In this case the value (y axis) of the starting position candle does not matter and was adjusted too.

The line will start at a pre-defined point (for example, I may choose position 5 in the image, which is the lowest point) and continue until the best result is achieved, so it can stop anywhere. I then plan to extend this line using the angle given by these two points.

So the output I am looking for is really just the end position of the line: (x, y) for a diagonal trendline, or simply the y-axis value for a horizontal line. The starting point (A) will be chosen, and the end point (B) should come from the optimisation: the line should cross as many vertical lines of the candles as possible (looking only to the right of the starting point) while keeping the number of candle bodies it crosses below the accepted tolerance threshold, until no better solution can be found.

I've looked at Bresenham's line algorithm and a bunch of others, but I don't see how exactly to implement this in Python on such time-series data. I wish it were as easy as using fbprophet.

Real CSV files weigh up to 500 MB and contain up to 6 million rows. I would prefer an exhaustive algorithm as opposed to a genetic one, which gives a different result each time.

I have a few ideas on how to try this in the meantime, but could not find any appropriate module with examples to speed up the process.

If this is still unclear: I want to maximise the number of times the line crosses the vertical lines in the middle of the candles (the white lines in the picture), while keeping the number of times it crosses green/red areas within a chosen limit.

I used pyqtgraph for the visualisation. Example code to make the plot: http://www.pyqtgraph.org/downloads/0.10.0/pyqtgraph-0.10.0-deb/pyqtgraph-0.10.0/examples/customGraphicsItem.py

1 Answer

If I understand correctly, you want to maximize the number of intersections of an oblique line with a set of equally spaced vertical line segments (the allowance/disallowance of crossing candle bodies makes the question a little fuzzy).

And if I also understand correctly, there can be millions of segments at a time.

If the slope of the line is fixed, say m, you can shear space in such a way that the line becomes horizontal: the endpoints of segment i are translated vertically by an amount m·i (assuming unit spacing).
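A minimal sketch of the shear in Python, assuming the segments are given as two parallel lists of lows and highs at positions 0, 1, 2, ... (the function name `shear_segments` is just made up for illustration). The line y = m·i + b hits segment i exactly when low_i - m·i <= b <= high_i - m·i, so after the shear each segment becomes an interval of admissible intercepts b:

```python
def shear_segments(lows, highs, m):
    """Shear the segments so that a line of slope m becomes horizontal.

    Segment i (at x = i, unit spacing assumed) spans [lows[i], highs[i]].
    The returned interval for segment i is the set of intercepts b for
    which the line y = m * i + b passes through that segment.
    """
    return [(lo - m * i, hi - m * i)
            for i, (lo, hi) in enumerate(zip(lows, highs))]
```

With m = 0 the intervals are just the original segments, which is the horizontal-line case from the question.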

Then, sorting the endpoints by increasing ordinate, you can easily obtain the number of segments the horizontal line overlaps as you translate it. This takes O(N log N) operations for the sort, and O(N) operations to construct the counting function.
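The sort-and-count step could look roughly like this. It is only a sketch: `max_overlap` is a hypothetical name, the input is a list of (low, high) intervals already sheared for the chosen slope, and I assume that merely touching an endpoint counts as a crossing. It ignores the candle-body tolerance; a second set of events for the bodies could be swept in the same pass.

```python
def max_overlap(intervals):
    """Return (count, level): the largest number of intervals that one
    horizontal line can hit, and a level at which that happens."""
    events = []
    for lo, hi in intervals:
        events.append((lo, 0))  # 0 = interval opens
        events.append((hi, 1))  # 1 = interval closes
    # Opens sort before closes at the same ordinate, so touching counts.
    events.sort()

    best, best_level, active = 0, None, 0
    for level, kind in events:
        if kind == 0:
            active += 1
            if active > best:
                best, best_level = active, level
        else:
            active -= 1
    return best, best_level
```

The running value of `active` between consecutive events is the counting function; here only its maximum is kept.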

Now for any slope, you can obtain the maximum number of possible crossings. If this function of the slope is smooth enough, you can search for its maximum by the Golden ratio method or similar.
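Putting it together, here is a rough sketch of the slope search, with hypothetical names `best_count` and `golden_max`. Note the caveat: golden-section search only finds the true maximum of a unimodal function, and the crossing count is a step function of the slope, so on real data this is a heuristic. A plain scan over a grid of slopes, calling `best_count` for each one, is the deterministic fallback.

```python
import math

def best_count(lows, highs, m):
    """Maximum number of segments hit by any line of slope m
    (shear by m * i, then sweep the sorted endpoints)."""
    events = []
    for i, (lo, hi) in enumerate(zip(lows, highs)):
        events.append((lo - m * i, 0))  # interval opens
        events.append((hi - m * i, 1))  # interval closes
    events.sort()
    best = active = 0
    for _, kind in events:
        active += 1 if kind == 0 else -1
        best = max(best, active)
    return best

def golden_max(f, a, b, tol=1e-4):
    """Golden-section search for a maximiser of f on [a, b].
    Only reliable when f is unimodal on that interval."""
    inv_phi = (math.sqrt(5) - 1) / 2
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc >= fd:
            # Maximum assumed to lie in [a, d]; old c becomes new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Maximum assumed to lie in [c, b]; old d becomes new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2
```

A call such as `golden_max(lambda m: best_count(lows, highs, m), m_min, m_max)` would then return a candidate slope, where `m_min` and `m_max` are slope bounds you choose yourself.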