As a marvellous intro to this domain, you could try Stuart REID's "10 Misconceptions about Neural Networks".
While this is a highly general question, these would be my points:
- fast learning curve
  ( countless time spent with "products" does not get easily justified. A product may quote speed, but beware that it speaks about saving milliseconds / seconds in the production phase, when doing predictions on an already well-pre-trained ANN. Your time will be mostly allocated to other activities than this -- first to learning and in-depth understanding of the product, its capabilities and weaknesses, then to ANN-model prototyping and, most of all, to ANN-Feature-Engineering, then to design-candidate-ANN hyperparametrisation grid-searches for tuning its best generalisation and CrossValidation properties )

- support and tools for rapid prototyping
  ( once going beyond a scholarly elaborated ANN-mimicking-XOR, the very prototyping phase is the innovative playground and, as such, it is very costly both in time and in CPU-resources )

- support for smart automated feature scaling
  ( inevitable for evolutionary ( genetic et al ) search-processing for robust networks, reduced to a just-enough scale ( to achieve computability within an acceptable time-frame ) under given precision targets )

- support for automated hyperparametric controls for high-bias / overfitting tuning

- support for both fully-meshed orthogonal, gradient-driven and random processing of "covering" the (hyper-)parametrisation-space ( a minimal sketch of feature scaling plus a randomised hyperparameter search follows this list )

- support for local vectorised processing and means for fast distributed processing
  ( not a marketing-motivated babble, but a fair & reasonable architecture: GPGPU I/O latencies do not help much on trained networks. Prediction is finally a low-computing-intensity task, nothing more than a set of sumproduct decisions, so the high GPU IO-bound latency masking does not cover the immense delays, and thus GPU "help" can even become disastrous [quantitative citations available], compared to plain, well-configured, CPU-based ANN-computing )
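A minimal sketch of the scaling / hyperparameter-search points above, assuming a recent sklearn installation ( the MLPRegressor choice, the parameter ranges and the placeholder data are illustrative only, not a recommendation ):

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit

X = np.random.randn( 1000, 50 )                          # placeholder FEATURE matrix
y = np.random.randn( 1000 )                              # placeholder TARGET vector

aPIPELINE = Pipeline( [ ( "scaler", StandardScaler() ),  # automated feature scaling
                        ( "ann",    MLPRegressor( max_iter = 500 ) )
                        ] )

aPARAM_SPACE = { "ann__hidden_layer_sizes": [ ( 10, ), ( 20, 10 ), ( 50, ) ],
                 "ann__alpha":              [ 1e-1, 1e-2, 1e-3, 1e-4, 1e-5 ],   # L2-penalty for bias / overfit tuning
                 "ann__learning_rate_init": [ 1e-2, 1e-3, 1e-4 ],
                 }

aSEARCH = RandomizedSearchCV( aPIPELINE,                 # random "covering" of the parametrisation-space
                              aPARAM_SPACE,
                              n_iter  = 25,
                              cv      = TimeSeriesSplit( n_splits = 5 ),  # respect the time-ordering of FX data
                              scoring = "neg_mean_squared_error",
                              )
aSEARCH.fit( X, y )
print( aSEARCH.best_params_ )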
AI/ML NN-Feature Engineering! ...forget about 6:?[:?[:?]]:1 architectures.
This is The Key.
Whatever AI/ML-Predictor you choose, be it an ANN or an SVM or even an ensemble-based "weak"-learner, the major issue is not the engine but the driver -- the predictive power of the set of Features.
Do not forget what complexity the FOREX multi-instrument Marketplace exhibits in real-time. It is definitely many orders of magnitude more complex than 6:1. And you aspire to create a Predictor capable of predicting what happens.
How to make it within a reasonable computability cost?
smart tools exist:
from sklearn import feature_selection                          # older sklearn call-signature shown; recent versions dropped estimator_params

feature_selection.RFECV( estimator,                            # loc_PREDICTOR ( must expose feature-importance info, e.g. .coef_ )
                         step             = 1,                 # remove 1 FEATURE at a time
                         cv               = None,              # perform 3-FOLD CrossValidation <- opt. sklearn.cross_validation
                         scoring          = None,              # <- opt. aScoreFUN with call-signature aScoreFUN( estimator, X, y )
                         estimator_params = None,              # {}-params for sklearn.grid_search.GridSearchCV()
                         verbose          = 0
                         )
# Feature ranking with recursive feature elimination
# and cross-validated selection of the best number of features.
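A minimal usage sketch of the same tool ( hypothetical X, y; the Lasso estimator and sizes are placeholders only ):

import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.linear_model import Lasso

X = np.random.randn( 500, 292 )                  # placeholder FEATURE matrix ( 292 candidate FEATUREs )
y = np.random.randn( 500 )                       # placeholder TARGET vector

loc_PREDICTOR = Lasso( alpha = 0.01 )            # any estimator exposing .coef_ or .feature_importances_

aSELECTOR = RFECV( loc_PREDICTOR,
                   step = 1,                     # remove 1 FEATURE at a time
                   cv   = 3                      # 3-FOLD CrossValidation
                   )
aSELECTOR.fit( X, y )

print( aSELECTOR.n_features_ )                   # how many FEATUREs survived the elimination
print( np.where( aSELECTOR.support_ )[0] )       # which FEATURE-columns survived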
|>>> aFeatureImportancesMAP_v4( loc_PREDICTOR, X_mmap )
0. 0.3380673 _ _ _____________________f_O.............RICE_: [216]
1. 0.0147430 _ _ __________________________________f_A...._: [251]
2. 0.0114801 _ _ ___________________f_............ul_5:3_8_: [252]
3. 0.0114482 _ _ ______________________________H......GE_1_: [140]
4. 0.0099676 _ _ ______________________________f_V....m7_4_: [197]
5. 0.0083556 _ _ ______________________________f.......7_3_: [198]
6. 0.0081931 _ _ ________________________f_C...........n_0_: [215]
7. 0.0077556 _ _ ______________________f_Tr..........sm5_4_: [113]
8. 0.0073360 _ _ _____________________________f_R.......an_: [217]
9. 0.0072734 _ _ ______________________f_T............m5_3_: [114]
10. 0.0069267 _ _ ______________________d_M.............0_4_: [ 12]
11. 0.0068423 _ _ ______________________________f_......._1_: [200]
12. 0.0058133 _ _ ______________________________f_......._4_: [201]
13. 0.0054673 _ _ ______________________________f_......._2_: [199]
14. 0.0054481 _ _ ______________________f_................2_: [115]
15. 0.0053673 _ _ _____________________f_.................4_: [129]
16. 0.0050523 _ _ ______________________f_................1_: [116]
17. 0.0048710 _ _ ________________________f_..............1_: [108]
18. 0.0048606 _ _ _____________________f_.................3_: [130]
19. 0.0048357 _ _ ________________________________d_......1_: [211]
20. 0.0048018 _ _ _________________________pc.............1_: [ 86]
21. 0.0047817 _ _ ________________________________d.......3_: [212]
22. 0.0045846 _ _ ___________________f_K..................8_: [260]
23. 0.0045753 _ _ _____________________f_.................2_: [131]
1st.[292]-elements account for 100% Importance Score ________________
1st. [50]-elements account for 60%
1st. [40]-elements account for 56%
1st. [30]-elements account for 53% . . . . . . . . . . . . . . . . .
1st. [20]-elements account for 48%
1st. [10]-elements account for 43%
Precision?
Assembler guys and C gurus will object at first sight, however let me state that numerical (im)precision does not make an issue in FX/ANN solutions.
The dimensionality curse does... O(2) & O(3) class problems are not seldom.
These are doable with a smart / efficient ( read: fast ... ) representation, even for nanosecond-resolution time-stamped HFT data-stream I/O hoses.
Sometimes there is even a need to reduce the numerical "precision" of inputs ( sub-sampling and blurring ) to avoid the adverse effects of ( unreasonably computationally expensive ) high dimensionality and also to avoid a tendency to overfitting, so as to benefit from the better generalisation abilities of a well adjusted AI/ML-Predictor.
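A minimal sketch of such input "blurring" and sub-sampling ( the tick-vector, the rounding decimals and the stride are purely illustrative ):

import numpy as np

aTickPriceVECTOR  = 1.10000 + 0.00001 * np.random.randint( 0, 500, size = 100000 )  # hypothetical EURUSD-like 5-digit ticks

aBlurredVECTOR    = np.round( aTickPriceVECTOR, 4 )   # "blurring":     drop the least-significant decimal
aSubSampledVECTOR = aBlurredVECTOR[::10]              # "sub-sampling": keep every 10th tick, coarser time-resolution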
(im)PRECISION JUST-RIGHT FOR UNCERTAINTY LEVELs MET ON .predict()-s
( the "?" marks the first digit already beyond the mantissa's reach )

 dtype      | format           | mantissa                    | exponent | max exact mantissa | example quote
 -----------+------------------+-----------------------------+----------+--------------------+-------------------
 np.float16 | Half   precision | 10 bits mantissa + sign bit |  5 bits  |               1023 | 1.02?
 np.float32 | Single precision | 23 bits mantissa + sign bit |  8 bits  |            8388607 | 12345.6?  ( DAX )
 np.float64 | Double precision | 52 bits mantissa + sign bit | 11 bits  |   4503599627370495 |
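A quick numpy cross-check of the table above ( a minimal sketch; the commented values show the expected IEEE-754 rounding behaviour ):

import numpy as np

print( np.finfo( np.float16 ).nmant,          # 10 mantissa bits
       np.finfo( np.float32 ).nmant,          # 23 mantissa bits
       np.finfo( np.float64 ).nmant )         # 52 mantissa bits

aDaxLikeQUOTE = 12345.6
print( np.float16( aDaxLikeQUOTE ) )          # ~12344.0 -- half   precision cannot hold a DAX-scale quote
print( np.float32( aDaxLikeQUOTE ) )          #  12345.6 -- single precision ( ~7 significant digits ) is enough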
Anyway, it is a charming FX project; let me know if you are onboarding.
For another, un-biased view of a top-down situation similar to yours, reported by another person, one might want to read and think a bit about this experience, and just count the weeks and months in the estimates needed to master the list presented there, top-down and complete, before making one's final decision.