This is my first post, so apologies if it reads like a wall of text. I am hoping someone can understand my question and point me to a module that can do this, or give me some code to try. I am working with time-series CSV data. Example rows are below; the columns are (datetime, O, H, L, C):
1999-10-26 21:00:00 68.81 68.83 68.07 68.19
1999-10-27 21:00:00 68.19 68.2 66.83 67.43
1999-10-28 21:00:00 67.43 68.06 66.91 68.06
1999-10-29 21:00:00 68.06 68.11 66.31 66.66
1999-01-11 22:00:00 66.66 67.15 66.09 66.63
1999-02-11 22:00:00 66.63 67.38 66.42 66.58
1999-03-11 22:00:00 66.58 67.73 66.42 67.48
1999-04-11 22:00:00 67.48 67.81 66.54 66.76
1999-05-11 22:00:00 66.76 68.2 66.54 67.87
Each row is what is known as a candlestick in financial price data, and it can be represented visually like so:
What I am trying to achieve is to automatically plot a line that crosses as many wicks (the vertical lines on the candlesticks) as possible, while not crossing more than a specified tolerance number of candle bodies (the green or red areas).
I hope this makes sense. As an example, the line crossing the most wicks (in this case below the candles) would start at position 4, around 66.4, and slope upwards until position 9, around 66.55.
What I imagine the line to look like, based on the picture above:
Is there a module that can do this if I provide a starting position? Basically, I want something like a line of best fit, but one that actually crosses all the points (a point here is a vertical line, so a range on the y axis) while avoiding the red/green areas. In other words, a straight trendline, not a curve.
If the diagonal case is too complicated, an example for a horizontal line (just a single price value on the y axis) would help too. For example, a value of 66.54 would work: it crosses all the wicks and touches the last two, as seen in the L column (4th column) of the last two records. See the yellow line crossing or touching all the low wicks:
In this case the price value of the starting-position candle does not matter, and it was adjusted too.
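To make the horizontal case concrete, here is a minimal brute-force sketch in plain Python. The names `classify` and `best_horizontal` are made up for illustration, not taken from any module. It assumes that touching a wick tip or a body edge counts as a wick hit rather than a body crossing, and that positions are 0-based (so position 4 in the picture is `start=3`).

```python
def classify(y, candle):
    """Classify a price level y against one (O, H, L, C) candle."""
    o, h, l, c = candle
    body_lo, body_hi = min(o, c), max(o, c)
    if body_lo < y < body_hi:
        return "body"   # strictly inside the coloured area
    if l <= y <= h:
        return "wick"   # on a wick, or touching its tip or a body edge
    return "miss"


def best_horizontal(candles, start, tol):
    """Try every O/H/L/C price from `start` onwards as a candidate level.

    Returns (wick_hits, level, index_of_last_wick_hit).
    """
    levels = sorted({p for candle in candles[start:] for p in candle})
    best = (0, None, None)
    for y in levels:
        wicks = bodies = 0
        last = None
        for i in range(start, len(candles)):
            kind = classify(y, candles[i])
            if kind == "body":
                bodies += 1
                if bodies > tol:
                    break  # tolerance exceeded, the line stops here
            elif kind == "wick":
                wicks += 1
                last = i
        if wicks > best[0]:  # strict '>' keeps the lowest level on ties
            best = (wicks, y, last)
    return best


# The nine sample rows from the question as (O, H, L, C)
candles = [
    (68.81, 68.83, 68.07, 68.19),
    (68.19, 68.20, 66.83, 67.43),
    (67.43, 68.06, 66.91, 68.06),
    (68.06, 68.11, 66.31, 66.66),
    (66.66, 67.15, 66.09, 66.63),
    (66.63, 67.38, 66.42, 66.58),
    (66.58, 67.73, 66.42, 67.48),
    (67.48, 67.81, 66.54, 66.76),
    (66.76, 68.20, 66.54, 67.87),
]
result = best_horizontal(candles, start=3, tol=0)
print(result)
```

On the sample rows with `start=3` and `tol=0` this picks 66.54, hitting six wicks and no bodies. It is exhaustive and deterministic, but it is quadratic in the number of candles, so on millions of rows it would probably need a capped look-ahead window.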
The line will start at a pre-defined point (for example, I may choose position 5 in the image, which is the lowest point) and continue until the best result is achieved, so it can stop anywhere. I then plan to extend the line using the angle defined by these two points.
So the output I am looking for is really just the end position of the line: (x, y) for a diagonal trendline, or simply the y axis (price) value for a horizontal line. The starting point (A) will be chosen by me. The end point (B) should come from the optimisation: the line should cross as many candle wicks as possible (looking only to the right of the starting point) while keeping the number of candle bodies it crosses within the accepted tolerance, and it should stop once no better solution can be found.
I've looked at Bresenham's line algorithm and a bunch of others, but I don't see how exactly to implement this in Python on time-series data like this. I wish it were as easy as using fbprophet.
Real CSV files weigh up to 500 MB and contain up to 6 million rows. I would prefer an exhaustive algorithm over a genetic one, which gives a different result each time.
I have a few ideas on how to try this in the meantime, but I could not find an appropriate module with examples to speed up the process.
If this is still unclear: I want to maximise the number of times the line crosses the vertical lines seen in the middle of the candles (the white lines in the picture), while keeping the number of times it crosses green/red areas within a chosen limit.
I used pyqtgraph for the visualisation. Example code to make the plot: http://www.pyqtgraph.org/downloads/0.10.0/pyqtgraph-0.10.0-deb/pyqtgraph-0.10.0/examples/customGraphicsItem.py