I know that a perceptron (no hidden layers) with a linear output normally has no local minima in its error surface. But can a perceptron with a sigmoid output get stuck in a local minimum, since the sigmoid is not linear?

I'm using functions.MultilayerPerceptron in WEKA (sigmoid activation function, trained with backpropagation) with no hidden layers. I train it on a linearly separable dataset with 4 classes. When I change the seed of the random generator (used for the initial weights of the nodes), most of the time it classifies only 60% of the instances correctly, so it doesn't fully learn the target concept. But I found one specific seed where it classifies 90% correctly, which is the optimum.

I have already played with momentum, training time and learning rate, but that doesn't change anything. It seems like it gets stuck in a local minimum. Or what else could the explanation be?
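The seed experiment described above can be sketched outside WEKA to see whether the initial weights alone produce a spread in accuracy. This is a simplified stand-in, not WEKA's implementation: it assumes one sigmoid output unit per class, a squared-error loss, plain online gradient descent without momentum, and a made-up toy dataset of four separable clusters. All names (`make_data`, `train_and_score`, `init_scale`) are hypothetical. The `init_scale` parameter is there to compare small initial weights with large ones, since large weights push sigmoid outputs toward 0 or 1, where the gradient is small.

```python
import math
import random


def sigmoid(z):
    # Clamp the input to avoid overflow in math.exp for large |z|.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def make_data(n_per_class=25, data_seed=0):
    """Four well-separated 2-D clusters, one per class (assumed toy data)."""
    rng = random.Random(data_seed)
    centers = [(2.0, 2.0), (-2.0, 2.0), (-2.0, -2.0), (2.0, -2.0)]
    data = []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            point = (cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1))
            data.append((point, label))
    return data


def predict(weights, x):
    # One sigmoid unit per class; the third weight of each unit is the bias.
    outputs = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in weights]
    return outputs.index(max(outputs)), outputs


def train_and_score(seed, data, epochs=200, lr=0.3, init_scale=0.05):
    """Train from seed-dependent initial weights; return training accuracy."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-init_scale, init_scale) for _ in range(3)]
               for _ in range(4)]
    order = list(data)
    for _ in range(epochs):
        rng.shuffle(order)
        for x, label in order:
            _, outputs = predict(weights, x)
            for k, w in enumerate(weights):
                target = 1.0 if k == label else 0.0
                # Gradient of 0.5 * (out - target)^2 through the sigmoid.
                delta = (outputs[k] - target) * outputs[k] * (1.0 - outputs[k])
                w[0] -= lr * delta * x[0]
                w[1] -= lr * delta * x[1]
                w[2] -= lr * delta
    correct = sum(1 for x, label in data if predict(weights, x)[0] == label)
    return correct / len(data)


data = make_data()
for init_scale in (0.05, 5.0):
    accuracies = [train_and_score(seed, data, init_scale=init_scale)
                  for seed in range(5)]
    print(init_scale, accuracies)
```

If the accuracies vary with the seed here as well, that points at the initialization and the sigmoid/squared-error combination rather than at anything specific to the WEKA setup. If they do not, the dataset or the WEKA options would be the next thing to check.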
I'd be thankful for any help.