
I'm trying to write a neural network for binary classification in PyTorch, and I'm confused about the loss function.

I see that BCELoss is a common function specifically geared for binary classification. I also see that an output layer of N outputs for N possible classes is standard for general classification. However, for binary classification it seems like it could be either 1 or 2 outputs.

So, should I have 2 outputs (1 for each label) and then convert my 0/1 training labels into [1,0] and [0,1] arrays, or use something like a sigmoid for a single-variable output?

Here are the relevant snippets of code:

self.outputs = nn.Linear(NETWORK_WIDTH, 2) # 1 or 2 dimensions?


def forward(self, x):
  # other layers omitted
  x = self.outputs(x)           
  return F.log_softmax(x)  # <<< softmax over multiple vars, sigmoid over one, or other?

criterion = nn.BCELoss() # <<< Is this the right function?

net_out = net(data)
loss = criterion(net_out, target) # <<< Should target be an integer label or 1-hot vector?

Thanks in advance.


2 Answers


For binary classification you can use a single output unit:

self.outputs = nn.Linear(NETWORK_WIDTH, 1)

Then apply a sigmoid activation to map the output to the range (0, 1) (your training labels need to be 0/1 values to match):

def forward(self, x):
    # other layers omitted
    x = self.outputs(x)           
    return torch.sigmoid(x)  

Finally, you can use torch.nn.BCELoss:

criterion = nn.BCELoss()

net_out = net(data)
loss = criterion(net_out, target)

This should work fine for you.
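For completeness, here is a minimal runnable version of the above with made-up sizes and random data (NETWORK_WIDTH = 16, the 4 input features, and the batch of 8 are all hypothetical placeholders):

```python
import torch
import torch.nn as nn

NETWORK_WIDTH = 16  # hypothetical width

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(4, NETWORK_WIDTH)   # 4 input features (made up)
        self.outputs = nn.Linear(NETWORK_WIDTH, 1)  # single output unit

    def forward(self, x):
        x = torch.relu(self.hidden(x))
        return torch.sigmoid(self.outputs(x))       # probability in (0, 1)

net = Net()
criterion = nn.BCELoss()

data = torch.randn(8, 4)                      # batch of 8 random samples
target = torch.randint(0, 2, (8, 1)).float()  # 0/1 labels as floats, same shape as output

net_out = net(data)                           # shape (8, 1)
loss = criterion(net_out, target)
```

Note that BCELoss expects the target to be a float tensor with the same shape as the output, not integer class indices.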

You can also use torch.nn.BCEWithLogitsLoss. This loss function already includes the sigmoid, so you can leave it out of your forward.
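As a sketch of that variant (layer size and data are made up): the model returns raw scores, and BCEWithLogitsLoss applies the sigmoid internally, which is also more numerically stable than a separate sigmoid + BCELoss:

```python
import torch
import torch.nn as nn

layer = nn.Linear(16, 1)             # final layer: one raw score (logit) per sample
criterion = nn.BCEWithLogitsLoss()   # sigmoid is applied inside the loss

x = torch.randn(8, 16)
target = torch.randint(0, 2, (8, 1)).float()

logits = layer(x)                    # raw scores; no sigmoid in the forward pass
loss = criterion(logits, target)

# Equivalent (up to floating point) to applying sigmoid yourself and using BCELoss:
loss_manual = nn.BCELoss()(torch.sigmoid(logits), target)
```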

If you want to use 2 output units, that is also possible. But then you need torch.nn.CrossEntropyLoss instead of BCELoss; the softmax activation is already included in that loss function, and the targets should be integer class labels rather than one-hot vectors.
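A minimal sketch of the 2-output alternative (input width and batch size are made up). Note the target format here: plain integer class indices of shape (N,), so no one-hot conversion is needed:

```python
import torch
import torch.nn as nn

layer = nn.Linear(16, 2)            # one score per class (made-up input width)
criterion = nn.CrossEntropyLoss()   # log-softmax + NLL in one step

x = torch.randn(8, 16)
target = torch.randint(0, 2, (8,))  # integer class indices 0 or 1, NOT one-hot

logits = layer(x)                   # shape (8, 2), raw scores; no softmax in forward
loss = criterion(logits, target)
```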


Edit: I just want to emphasize that there is a real difference between these choices. Using 2 output units gives you twice as many weights in the final layer compared to using 1 output unit, so the two alternatives are not equivalent.


Some theoretical background:

For binary classification (say class 0 and class 1), the network should have only 1 output unit. Its output is interpreted as the probability of class 1: a target of 1 means class 1 is present (class 0 absent), and a target of 0 means class 0 is present (class 1 absent).

For the loss calculation, first pass the output through a sigmoid and then through binary cross-entropy (BCE). The sigmoid transforms the network's output into a probability (between 0 and 1), and minimizing BCE then maximizes the likelihood of the desired output.
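The sigmoid-then-BCE computation is short enough to spell out by hand; here is a plain-Python sketch with a hypothetical raw output value of 2.0:

```python
import math

def sigmoid(z):
    # Map a raw score to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def bce(p, y):
    # Binary cross-entropy for one sample: -[y*log(p) + (1-y)*log(1-p)].
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

z = 2.0                   # hypothetical raw network output
p = sigmoid(z)            # ~0.88, treated as P(class 1)
loss_correct = bce(p, 1)  # small loss: confident and right
loss_wrong = bce(p, 0)    # large loss: confident and wrong
```

Confident correct predictions give a loss near 0, while confident wrong predictions are penalized heavily, which is exactly what drives the likelihood maximization described above.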