I use ExtraTreesClassifier for training and predicting. I run the same source code on the same dataset on Windows 10 and on Ubuntu 16.04 and, surprisingly, I get a huge difference in execution time: training is more than 20 times slower on Ubuntu, and prediction more than 100 times slower.
The results (times in seconds):
+---------------+-----------+----------+----------+---------+
| Dataset (MB)  | Win Train | Win Pred | Ub Train | Ub Pred |
+---------------+-----------+----------+----------+---------+
| 430           | 104       | 11       | 2420     | 2019    |
+---------------+-----------+----------+----------+---------+
| 530           | 122       | 14       | 2948     | 2162    |
+---------------+-----------+----------+----------+---------+
| 699           | 140       | 18       | 3672     | 2500    |
+---------------+-----------+----------+----------+---------+
Note: the time spent loading the CSV file and creating the DataFrame is negligible.
The source code:
import time
import pandas as pd
import datatable as dt
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier


class Model:  # enclosing class; its real name is not shown in this excerpt
    def __init__(self):
        self.ExTrCl = ExtraTreesClassifier()

    def train_with_dt(self, csv_file_path):
        # Read the CSV into a datatable Frame and time the load.
        start_0_time = time.time()
        data_arn = dt.fread(csv_file_path)
        end_time = time.time()
        print(" time Read_csv file : ", end_time - start_0_time, " s")
        # Split off the label column, then drop it from the features.
        data_classe = np.ravel(data_arn[:, "familyId"])
        del data_arn[:, "familyId"]
        # Time the fit alone; the Frame is passed directly to scikit-learn.
        start_time_train = time.time()
        self.ExTrCl.fit(data_arn, data_classe)
        end_time = time.time()
        print(" train only time : ", end_time - start_time_train, " s")

    def test_groupe_score_dt(self, test_matrix, list_classes):
        # self.list_motifs (column names) is set elsewhere, not in this excerpt.
        start_0_time = time.time()
        dt_dftest = dt.Frame(np.array(test_matrix), names=self.list_motifs)
        end_time = time.time()
        print(" time creating dt Frame = ", end_time - start_0_time)
        result = self.ExTrCl.predict(dt_dftest)
        end_time = time.time()
        # Measured from start_0_time, so this includes the Frame creation above.
        print(" Time pred = ", end_time - start_0_time, " s")
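The manual start/end pairs in the methods above could also be wrapped in a small helper so that each phase (reading, conversion, fit, predict) is timed on its own. This is only a minimal sketch using the standard library: the `timed` helper and the `timings` dict are hypothetical names, and the two workloads are stand-ins for the real datatable and scikit-learn calls.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}

with timed("convert", timings):
    # Stand-in for turning the input into an array (hypothetical workload).
    rows = [[i * j for j in range(50)] for i in range(200)]

with timed("fit", timings):
    # Stand-in for self.ExTrCl.fit(...) (hypothetical workload).
    total = sum(sum(row) for row in rows)

for label, seconds in timings.items():
    print("{:>8}: {:.4f} s".format(label, seconds))
```

Timing each phase separately makes it easier to see whether the time goes into the estimator itself or into preparing its input. The sketch avoids f-strings so that it also runs on Python 3.5.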
The OS information and the library versions used are in the table below. I have updated all the libraries used.
+---------------------------------------+-------------------------------------------+
| Windows 10                            | Ubuntu 16.04                              |
| Intel i7-8550U CPU @ 1.80Ghz 1.99Ghz  | Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz |
| cpu cores : 4                         | cpu cores : 1                             |
| 64 bit OS                             | 64 bit OS                                 |
| RAM 16 GB                             | RAM 1007 GB                               |
+---------------------------------------+-------------------------------------------+
| Python 3.7.7                          | Python 3.5.2                              |
| -----------------                     | -------------                             |
| biopython==1.77                       | biopython==1.73                           |
| datatable==0.11.0a0+pr2536.12         | datatable==0.10.1                         |
| numpy==1.19.0                         | numpy==1.18.5                             |
| pandas==1.0.5                         | pandas==0.24.2                            |
| pyahocorasick==1.4.0                  | pyahocorasick==1.4.0                      |
| scikit-learn==0.23.1                  | scikit-learn==0.22.2.post1                |
| scipy==1.5.0                          | scipy==1.4.1                              |
| suffix-trees==0.3.0                   | suffix-trees==0.3.0                       |
+---------------------------------------+-------------------------------------------+
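Since the two machines differ in CPU count and library versions, it may help to print what the Python process itself reports on each one. The sketch below is an assumption-laden illustration: `describe_environment` and `library_versions` are hypothetical helper names, `os.cpu_count()` reports logical CPUs (which may not match the "cpu cores" figure above), and a library's version is read from its conventional `__version__` attribute when it has one.

```python
import importlib
import os
import platform
import sys


def describe_environment():
    """Collect basic facts about the interpreter and the machine."""
    return {
        "os": platform.platform(),
        "machine": platform.machine(),
        "python": sys.version.split()[0],
        "logical_cpus": os.cpu_count(),
    }


def library_versions(names):
    """Map each importable module name to its __version__, if available."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = "not installed"
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


env = describe_environment()
libs = library_versions(["numpy", "pandas", "scipy", "sklearn", "datatable"])

for key in sorted(env):
    print("{:<14}{}".format(key, env[key]))
for key in sorted(libs):
    print("{:<14}{}".format(key, libs[key]))
```

Running the same script on both machines gives two listings that can be compared line by line.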
Output of cProfile:
         1619734 function calls (1589052 primitive calls) in 6495.451 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     4828 6248.349    1.294 6248.349    1.294 {built-in method numpy.core.multiarray.array}
      100  130.458    1.305  130.458    1.305 {method 'build' of 'sklearn.tree._tree.DepthFirstTreeBuilder' objects}
        1   48.288   48.288   48.288   48.288 {built-in method datatable.lib._datatable.gread}
        2   21.834   10.917   25.749   12.874 Main.py:40(get_matrix_nbOcrrs_listStr_AhoCorasick)
        2   20.747   10.374 2570.626 1285.313 model.py:233(test_groupe_score_dt)
     4365    6.476    0.001    6.476    0.001 {method 'reduce' of 'numpy.ufunc' objects}
        1    5.851    5.851 6492.121 6492.121 Main.py:309(main)
     6710    3.705    0.001    3.705    0.001 {method 'copy' of 'list' objects}
      400    2.548    0.006    2.548    0.006 {method 'predict' of 'sklearn.tree._tree.Tree' objects}
        1    2.288    2.288 6495.453 6495.453 Main.py:1(<module>)
        1    1.334    1.334 3889.596 3889.596 model.py:189(train_with_dt)
      400    0.827    0.002    3.628    0.009 _classes.py:880(predict_proba)
        4    0.522    0.131 4936.793 1234.198 _forest.py:591(predict)
      400    0.354    0.001    3.982    0.010 _forest.py:442(_accumulate_prediction)
   376662    0.150    0.000    0.150    0.000 {method 'add_word' of 'ahocorasick.Automaton' objects}
      803    0.120    0.000    0.120    0.000 {built-in method marshal.loads}
2272/2260    0.070    0.000    0.144    0.000 {built-in method builtins.__build_class__}
   1081/1    0.069    0.000 6495.453 6495.453 {built-in method builtins.exec}
  143/119    0.064    0.000    0.116    0.001 {built-in method _imp.create_dynamic}
        2    0.046    0.023    0.046    0.023 {method 'make_automaton' of 'ahocorasick.Automaton' objects}
   ... etc.
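For reference, a listing of this kind can be produced with the standard library's `cProfile` and `pstats` modules. The sketch below is an illustration rather than the exact command used here: `workload()` is a hypothetical stand-in for `main()`, and the report is sorted by internal time (`tottime`), which yields the "Ordered by: internal time" header.

```python
import cProfile
import io
import pstats


def workload():
    # Hypothetical stand-in for main(); any function can be profiled this way.
    return sum(i * i for i in range(100000))


profiler = cProfile.Profile()
profiler.enable()
result = workload()
profiler.disable()

# Write the report into a string buffer so it can be printed or saved.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("tottime").print_stats(10)  # ten most expensive entries
report = buffer.getvalue()
print(report)
```

Sorting by `"cumulative"` instead of `"tottime"` can help show which caller is responsible for an expensive built-in call.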
Thank you for your help.