Neural Network Normalization of Nominal Data for 1 Output Neuron

Question

I am new to machine learning and AI and started with NN recently.

I have already found some information here on Stack Overflow, but at the moment I don't understand the logic behind everything I've gathered.

Let's take 4 nominal (but not ordinal) values [A, B, C, D] and 2 numerical values that are already normalized [0.35, 0.55], so 2 input neurons: one for the nominal value and one for the numerical value. In the NN literature I mostly see that you have to use 4 input neurons to encode the nominal values. But I don't need to predict the nominal values themselves. I have only one output neuron, which represents at most a relationship, much as if I were using an expert system with rules.

If I normalized them to [0.2, 0.4, 0.6, 0.8], for example, wouldn't the NN be able to distinguish between them? To the NN it's only a number, isn't it?

Naive approach and thinking:

A with 0.35 numerical leads to ideal 1.
B with 0.55 numerical leads to ideal 0.
C with 0.35 numerical leads to ideal 0.
D with 0.55 numerical leads to ideal 1.

Is there a mistake in my way of thinking about this approach?

Additional info (edit): Those nominal values are part of the decision making (they are significant when measured with statistical tools in combination with the numerical values), depending on whether they are true or not. I know they can be encoded in binary, but the list of nominal values is a little bit larger.

Other example:

Symptom A with blood test 1 leads to diagnosis X (the ideal).
Symptom B with blood test 1 leads to diagnosis Y (the ideal).

Actually, expert systems are used. Symptoms are nominal values, but in combination with the blood test value you get the diagnosis. The main question, finally: do I have to encode symptoms in a binary way, or can I replace symptoms with numbers? If I can't replace them with numbers, why is binary representation the only way to use them in a NN?

Please be more clear about the structure of your network. First you write "so 2 input neurons" and then "that those 4 input neurons". So which one is it? How many layers? And what's your problem precisely? Yes, inputs on neurons are just numbers, like everything in a computer, but I'm guessing that that's not exactly your problem. — BartoszKP
Sorry, I hope it's clearer now. I mean that if you want to encode 4 nominal values, you should encode them in a binary way. In the literature I see that with one-of-n normalization for nominal data you want to identify a pattern based on the nominal input data (but encoded in binary, with more input neurons needed as the number of nominal values grows). 3 layers are used (I-H-O), but I need to know the theory behind nominal data and whether it's still possible to encode nominal (not ordinal) data in a single input neuron. — Duplex
I really don't get your "nominal/ordinal" data confusion. It really doesn't matter to the NN how YOU interpret the inputs, so ordinal data doesn't make any sense in this context. Either you have nominal data and encode it in binary form to solve a classification task, or you have numerical data to solve a classification or prediction task. — BartoszKP
Well, is it possible to use nominal data for prediction tasks where the output neuron is numerical? — Duplex
Not sure if we define "prediction" in the same way. I meant: en.wikipedia.org/wiki/Time_series_prediction. Sure, you can try feeding in nominal data and aim to predict future values. — BartoszKP

BartoszKP · Accepted Answer · 2013-10-02T08:39:39

INPUTS

Theoretically, it doesn't really matter how you encode your inputs. As long as different samples are represented by different points in the input space, it is possible to separate them with a line, and that is what the input layer (if it's linear) does: it combines the inputs linearly. However, the way the data is laid out in the input space can have a huge impact on convergence time during learning. A simple way to see this: imagine a set of lines passing through the origin in 2D space. If your data is scattered around the origin, it is likely that some of these lines will already separate the data into parts, and few "moves" will be required, especially if the data is linearly separable. On the other hand, if your input data is dense and far from the origin, most of the initial discrimination lines won't even "hit" the data. It will then take a large number of weight updates to reach the data, and a large number of precise steps to "cut" it into the initial categories.
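As a rough illustration of the point about lines through the origin, here is a minimal Python sketch. It assumes that each initial discrimination line has zero bias (so it passes exactly through the origin) and a random direction; the helper name `fraction_of_lines_hitting`, the cloud size and the shift of 10 are all made up for the example and are not part of any library.

```python
import math
import random

def fraction_of_lines_hitting(points, n_lines=1000, seed=0):
    """Fraction of random lines through the origin with points on both sides."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_lines):
        angle = rng.uniform(0.0, math.pi)
        # Normal vector of a line through the origin (assumption: no bias term).
        nx, ny = math.cos(angle), math.sin(angle)
        sides = [nx * x + ny * y > 0 for x, y in points]
        # The line "hits" the data if it leaves points on both sides.
        if any(sides) and not all(sides):
            hits += 1
    return hits / n_lines

rng = random.Random(1)
# Hypothetical data: 50 points scattered around the origin.
centered = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(50)]
# The same cloud, moved far away from the origin.
shifted = [(x + 10.0, y + 10.0) for x, y in centered]

centered_fraction = fraction_of_lines_hitting(centered)
shifted_fraction = fraction_of_lines_hitting(shifted)
print(centered_fraction, shifted_fraction)
```

Under these assumptions nearly every random line splits the centered cloud, while only a small fraction of lines touch the shifted one at all, which is one way to picture why centering inputs around zero tends to help learning get started.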

OUTPUTS

If you have categories, then encoding them as binary is quite important. Imagine that you have three categories: A, B and C. If you encode them with three neurons as 1;0;0, 0;1;0 and 0;0;1, then during learning, and later with noisy data, a point about which the network is "not sure" can end up as 0.5;0.0;0.5 on the output layer. That makes sense if it is really something conceptually between A and C, but surely not B. If you instead chose one output neuron and encoded A, B and C as 1, 2 and 3, then in the same situation the network would give an output that is the average of 1 and 3, which is 2! So the answer would be "definitely B", which is clearly wrong!
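The arithmetic behind this can be checked with a small Python sketch. It does not train a network; it only models an "unsure between A and C" output as the plain average of the two target encodings, which is an assumption made for illustration. The helper names `one_hot`, `scalar_code` and `blend` are hypothetical.

```python
labels = ["A", "B", "C"]

def one_hot(label):
    """One output neuron per category: A -> [1, 0, 0], B -> [0, 1, 0], ..."""
    return [1.0 if label == name else 0.0 for name in labels]

def scalar_code(label):
    """Single output neuron: A -> 1, B -> 2, C -> 3."""
    return float(labels.index(label) + 1)

def blend(u, v):
    """Element-wise average, standing in for an 'unsure' network output."""
    return [(a + b) / 2 for a, b in zip(u, v)]

# Three output neurons: the uncertainty stays between A and C, B stays at 0.
uncertain_one_hot = blend(one_hot("A"), one_hot("C"))

# One output neuron: the average of 1 and 3 lands exactly on the code for B.
uncertain_scalar = (scalar_code("A") + scalar_code("C")) / 2
decoded = labels[round(uncertain_scalar) - 1]

print(uncertain_one_hot)          # [0.5, 0.0, 0.5]
print(uncertain_scalar, decoded)  # 2.0 B
```

The same reasoning applies to nominal inputs: putting A, B, C, D on one neuron as 0.2, 0.4, 0.6, 0.8 imposes an order and a distance between categories that the data does not actually have.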

Reference: ftp://ftp.sas.com/pub/neural/FAQ.html
