4
votes

I submitted a training job to Cloud ML, but it can't find the CSV file, even though the file is in the bucket. This is the code:

# Use scikit-learn to grid search the batch size and epochs
import numpy
from sklearn.model_selection import GridSearchCV
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier

def create_model():
    model = Sequential()
    model.add(Dense(12, input_dim=11, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='nadam', metrics=['accuracy'])
    return model
seed = 7
numpy.random.seed(seed)

FIL = "gs://bubbly-hexagon-112008-ml/dataset/mixed.csv"
dataset = numpy.loadtxt(FIL, delimiter=",")
X = dataset[:,0:11]
Y = dataset[:,11]

model = KerasClassifier(build_fn=create_model, verbose=1)
batch_size = [10, 20, 40, 60, 80, 100]
epochs = [10, 50, 100, 500, 1000]
param_grid = dict(batch_size=batch_size, nb_epoch=epochs)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
grid_result = grid.fit(X, Y)
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

After submitting the job, I get this error:

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 18, in <module>
    dataset = numpy.loadtxt(FIL, delimiter=",")
  File "/root/.local/lib/python2.7/site-packages/numpy/lib/npyio.py", line 803, in loadtxt
    fh = iter(open(fname, 'U'))
IOError: [Errno 2] No such file or directory: 'gs://bubbly-hexagon-112008-ml/dataset/mixed.csv'

-The file is in the specified bucket, and its permissions include Cloud ML as a reader.

-I also ran gcloud beta ml init-project to initialize the project.

-I also created a new bucket and put the file there, but got the same error.

-My bucket is in the same region as the submitted job.

Thanks

3 Answers

0
votes

file_io from TensorFlow works great:

from io import BytesIO
from tensorflow.python.lib.io import file_io
import numpy as np
import json

To read a numpy array:

# path_npx is a gs:// path to a saved .npy file
with file_io.FileIO(path_npx, 'rb') as f:
    np_arr = np.load(BytesIO(f.read()))
    print(np_arr)

To read a json file:

with file_io.FileIO(path_json, 'r') as f:
    print(json.loads(f.read()))
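
For the CSV in the question, the same idea should carry over: read the bytes through file_io and hand them to numpy.loadtxt, which accepts file-like objects as well as paths. Here is a minimal sketch of that. The helper names open_binary and load_csv are made up for illustration, and the TensorFlow import is done lazily so the local-file branch runs without TensorFlow installed. On newer TensorFlow versions, tf.io.gfile.GFile is the public equivalent of file_io.FileIO.

```python
from io import BytesIO

import numpy as np


def open_binary(path):
    """Return a binary file object for a local path or a gs:// URI.

    Hypothetical helper: only the gs:// branch needs TensorFlow.
    """
    if path.startswith("gs://"):
        # Imported lazily so local paths work without TensorFlow installed.
        from tensorflow.python.lib.io import file_io
        return file_io.FileIO(path, "rb")
    return open(path, "rb")


def load_csv(path, delimiter=","):
    """Read the whole file into memory and parse it as numeric CSV."""
    with open_binary(path) as f:
        return np.loadtxt(BytesIO(f.read()), delimiter=delimiter)
```

In the question's script this would replace the numpy.loadtxt(FIL, delimiter=",") call with dataset = load_csv(FIL); the X and Y slicing stays the same.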
0
votes

You can't read directly from GCS like that; you need to go through an I/O library that understands gs:// paths.

from io import BytesIO
import tensorflow as tf
import numpy as np
from tensorflow.python.lib.io import file_io

FIL = "gs://bubbly-hexagon-112008-ml/dataset/mixed.csv"
f = BytesIO(file_io.read_file_to_string(FIL, binary_mode=True))
data = np.loadtxt(f, delimiter=",")  # mixed.csv is text, so use loadtxt; np.load expects .npy/.npz
-1
votes

I don't think you can read GCS files directly with NumPy.
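
The traceback in the question backs this up: numpy.loadtxt passes the string straight to the built-in open(), which treats "gs://..." as a local path. A small standard-library sketch of that failure is below. The helper name try_plain_open is made up for illustration, and I'm assuming a typical Linux machine, where the result should be errno.ENOENT (the same Errno 2 as in the traceback).

```python
import errno

GCS_URI = "gs://bubbly-hexagon-112008-ml/dataset/mixed.csv"


def try_plain_open(path):
    """Return the errno raised by the built-in open(), or None on success."""
    try:
        with open(path):
            return None
    except IOError as exc:  # IOError is an alias of OSError on Python 3
        return exc.errno


code = try_plain_open(GCS_URI)
# Compare against ENOENT ("No such file or directory").
print(code == errno.ENOENT)
```

So bucket permissions and regions don't come into it; the URI never reaches Cloud Storage at all.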