0 votes

Is there a way to determine the number of nodes and hidden layers based on the shape of the data? Also, is there a way to determine the best activation function based on the topic?

For example, I'm making a model for fake-news prediction. My features are the number of words in the text, the number of words in the title, the number of questions, the number of capital letters, etc. My dataset has 22 features and around 35,000 rows. My output should be 0 or 1.

Based on that, how many layers and nodes should I use, and which activation functions are best for this?

This is my net:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras import optimizers

model = Sequential()
model.add(Dense(100, input_dim = features.shape[1], activation = 'relu')) # input layer requires input_dim param
model.add(Dense(100, activation = 'relu'))
model.add(Dense(100, activation = 'relu'))
model.add(Dropout(0.1))
model.add(Dense(1, activation='sigmoid')) # sigmoid instead of relu for final probability between 0 and 1

sgd = optimizers.SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss="mean_squared_error", optimizer=sgd, metrics=['accuracy'])


# fit the model to the training data (trains the network)
model.fit(x_train, y_train, epochs = 10, shuffle = True, batch_size=32, validation_data=(x_test, y_test), verbose=1)


scores = model.evaluate(features, results)
print(model.metrics_names[1],  scores[1]*100)
I'm trying it right now, but auto-keras looks like it helps with this? You can find it on GitHub and look through the examples. – Lostsoul
Do you know how I can fix this? I installed autokeras with pip, and I'm using Jupyter Notebook. – taga

2 Answers

1 vote

Selecting these requires prior experience; otherwise we wouldn't need so many ML engineers trying different architectures and writing papers.

But for a start, I would recommend you take a look at AutoKeras. It will help with your problem, since it is a well-known kind of problem (text classification): you only need to structure your data as input (X and y) and then feed it to their TextClassifier, which will try different models (you can specify how many) to choose the one that fits your case best.

You can find more examples in the docs here: https://autokeras.com/tutorial/text_classification/

import autokeras as ak

# Initialize the text classifier.
clf = ak.TextClassifier(max_trials=10) # It tries 10 different models
# Feed the text classifier with training data.
clf.fit(x_train, y_train)
# Predict with the best model.
predicted_y = clf.predict(x_test)
# Evaluate the best model with testing data.
print(clf.evaluate(x_test, y_test))
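
If it is useful afterwards, the best pipeline AutoKeras finds can be exported as a plain Keras model (a small sketch using AutoKeras's export_model; the file name is just an example):

# Export the best pipeline found during the search as a regular Keras model.
best_model = clf.export_model()
best_model.summary()
# Save it for later use; the path is just an example.
best_model.save("fake_news_model", save_format="tf")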
0 votes

The answer is no and no.

These are hyperparameters as well. You can select a set of candidate values and try all of them to get a rough idea of which gives you the best result. The same holds for the activation function.
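
A minimal sketch of that kind of brute-force search, assuming the same x_train/y_train/x_test/y_test arrays from the question; the candidate values below are only illustrative, not recommendations:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

layer_options = [1, 2, 3]              # number of hidden layers to try (illustrative)
unit_options = [32, 64, 128]           # nodes per hidden layer (illustrative)
activation_options = ['relu', 'tanh']  # candidate activation functions

best = None
for n_layers in layer_options:
    for units in unit_options:
        for act in activation_options:
            model = Sequential()
            model.add(Dense(units, input_dim=x_train.shape[1], activation=act))
            for _ in range(n_layers - 1):
                model.add(Dense(units, activation=act))
            model.add(Dense(1, activation='sigmoid'))
            model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
            model.fit(x_train, y_train, epochs=10, batch_size=32, verbose=0)
            _, acc = model.evaluate(x_test, y_test, verbose=0)
            if best is None or acc > best[0]:
                best = (acc, n_layers, units, act)

print("best (accuracy, layers, units, activation):", best)

Ideally you would compare candidates on a separate validation split and keep the test set untouched until the very end.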

You can use more layers than you need and then apply regularization to keep the model from overfitting. Conversely, if the network is too small, you can clearly see the underfitting behaviour in the loss curve as a high training error.
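
For instance, a sketch of an intentionally larger network with dropout and an L2 weight penalty added to counter overfitting (same binary setup as in the question; the sizes and rates are just examples):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.regularizers import l2

model = Sequential()
model.add(Dense(256, input_dim=x_train.shape[1], activation='relu',
                kernel_regularizer=l2(1e-4)))  # L2 penalty on the weights
model.add(Dropout(0.3))                        # drop 30% of units during training
model.add(Dense(256, activation='relu', kernel_regularizer=l2(1e-4)))
model.add(Dropout(0.3))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# The gap between training loss and validation loss shows whether the
# model is overfitting (large gap) or underfitting (both losses high).
history = model.fit(x_train, y_train, validation_data=(x_test, y_test),
                    epochs=10, batch_size=32, verbose=1)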

There is no formula for determining any of these. You have to try different things based on the problem at hand, and you will find that some of them work better than others.

For the output, a softmax layer would be good, as it gives you a probability for each class, which you can easily convert back to a 0/1 prediction.
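
A sketch of that output setup for the two classes in the question, assuming the 0/1 labels are first converted to one-hot vectors:

import numpy as np
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

y_train_onehot = to_categorical(y_train, num_classes=2)  # 0/1 labels -> one-hot vectors

model = Sequential()
model.add(Dense(100, input_dim=x_train.shape[1], activation='relu'))
model.add(Dense(2, activation='softmax'))  # one probability per class, summing to 1
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train_onehot, epochs=10, batch_size=32, verbose=1)

probs = model.predict(x_test)               # shape (n_samples, 2)
predicted_class = np.argmax(probs, axis=1)  # back to 0/1 class labels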