0
votes

I'm using the following model for the Kaggle competition "intel-mobileodt-cervical-cancer-screening". The labels fall into 3 categories (1, 2, 3).

When I run training I get the output shown below.

Model:

import torch
import pretrainedmodels

# ResNet50 pretrained on ImageNet, with the classifier head replaced for 3 classes
resnet50 = pretrainedmodels.__dict__["resnet50"](num_classes=1000, pretrained='imagenet')
resnet50.last_linear = torch.nn.Linear(in_features=2048, out_features=3, bias=True)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = resnet50.to(device)
loss_cross = torch.nn.CrossEntropyLoss().cuda()

Optimizer: Adam
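
The optimizer construction isn't shown in the post; a minimal sketch of an Adam setup consistent with the code above (the learning rate lr=1e-4 is an assumed placeholder, not a value from the post):

# Assumed Adam configuration; lr=1e-4 is a placeholder, tune as needed
optim = torch.optim.Adam(model.parameters(), lr=1e-4)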

DataLoader:

import cv2
from torch.utils.data import Dataset
from torchvision import transforms

class Kaggle_Cancer(Dataset):
  def __init__(self, root_path, transform=None, preprocessing=None, resize=216):
    self.path = root_path            # list of image file paths
    self.transform = transform
    self.preprocessing = preprocessing
    self.resize = resize

  def __len__(self):
    return len(self.path)

  def __getitem__(self, idx):
    p = self.path[idx]
    image1 = cv2.imread(p)
    # the label is encoded in the parent directory name, e.g. .../Type_1/img.jpg
    label = p.split("/")[-2].split("_")[-1]
    image1 = cv2.cvtColor(image1, cv2.COLOR_BGR2RGB)
    if self.transform:
      image1 = self.transform(image=image1)['image']   # albumentations-style call
    image1 = transforms.ToPILImage()(image1)
    image1 = transforms.ToTensor()(image1)
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    image1 = normalize(image1)
    return image1, int(label)
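
The training_set used in the train loop below is never defined in the post; a minimal sketch, assuming the dataset above is wrapped in a standard DataLoader (the paths list and batch size are assumptions):

from torch.utils.data import DataLoader

train_paths = [...]  # list of image file paths; not shown in the original post
train_dataset = Kaggle_Cancer(train_paths)
training_set = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=4)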

Train:

from tqdm import tqdm

def train(epoch_number, model, optim, loss):
  model.train()
  all_loss = 0
  correct = 0
  tqdm_loader = tqdm(training_set)
  for index, (img, target) in enumerate(tqdm_loader):
    img = img.float().cuda()
    target = target.long().cuda()

    optim.zero_grad()
    out = model(img)
    print(out, " target ", target)
    loss1 = loss(out, target)
    print(loss1)
    loss1.backward()
    optim.step()

    # running averages of the loss and of the per-batch accuracy
    all_loss += loss1.item()
    avg_loss = all_loss / (index + 1)
    pred = out.argmax(dim=1, keepdim=True)
    correct += pred.eq(target.view_as(pred)).sum().item() / len(target)
    avg_acc = correct / (index + 1)
    tqdm_loader.set_description("Epoch {} train loss={:4} acc={:4}".format(epoch_number, round(avg_loss, 4), round(avg_acc, 4)))

  return avg_loss, avg_acc
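
The validation figures quoted further down come from an evaluation loop the post doesn't show; a minimal sketch of what such a loop might look like (the names validate and val_loader are assumptions):

def validate(model, loss, val_loader):
  model.eval()
  all_loss, correct, total = 0, 0, 0
  with torch.no_grad():
    for img, target in val_loader:
      img = img.float().cuda()
      target = target.long().cuda()
      out = model(img)
      all_loss += loss(out, target).item() * len(target)
      correct += out.argmax(dim=1).eq(target).sum().item()
      total += len(target)
  return all_loss / total, correct / total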

Output:

print(out," target ",target)

tensor([[ ....., ....., ..... ],
        [ 6.1667e-02, -3.9864e-01, -4.1212e-01],
        [-2.3100e-01, -3.7821e-01, -2.8159e-01],
        [-2.9442e-01, -5.0409e-01, -3.1046e-01],
        [ 1.4866e-01, -2.8496e-01, -1.7643e-01],
        [-2.4554e-01, -2.5063e-01, -6.7061e-01],
        [-7.1597e-02, -3.5376e-01, -5.7830e-01],
        [-2.1527e-01, -4.0284e-01, -4.5993e-01],
        [ 1.2050e-02, -5.5684e-01, -1.6044e-01],
        [-3.7750e-02, -5.3680e-01, -4.3820e-01],
        [-1.1966e-01, -2.5146e-01, -4.9405e-01],
        [-2.3308e-01, -6.3452e-01, -3.9821e-01],
        [-3.6530e-01, -1.5242e-01, -2.6457e-01],
        [-1.8864e-01, -6.0979e-01, -5.5342e-01],
        [-2.4755e-01, -4.7011e-01, -2.6204e-01],
        [-3.1907e-01, -4.2680e-01, -3.4576e-01],
        [-2.1872e-01, -5.3857e-01, -2.9729e-01],
        [-7.1475e-02, -4.0458e-01, -3.2042e-01],
        [-2.8925e-01, -4.3376e-02, -4.9899e-01],
        [-4.8227e-02, -1.8701e-01, -2.2106e-01],
        [ 1.7829e-02, -6.5816e-01, -4.0141e-01],
        [-2.7450e-01, -3.9498e-01, -2.3189e-01],
        [-1.8847e-01, -6.8187e-01, -2.0631e-01],
        [-3.5251e-01, -5.3258e-01, -6.3298e-01],
        [-6.5548e-02, -2.5093e-01, -5.4346e-01],
        [ 2.3848e-01, -3.6152e-01, -1.6380e-01],
        [-2.1488e-01, -6.4888e-01, -7.7022e-01],.....



target  tensor([2, 2, 2, 1, 1, 2, 2, 2, 2, 1, 3, 2, 3, 2, 2, 2, 2, 3, 2, 1, 3, 3, 2, 2,
        3, 2, 3, 2, 3, 1, 3, 3, 1, 2, 3, 2, 1, 1, 3, 1, 1, 2, 3, 2, 2, 2, 2, 2, .....
        1, 2, 3, 3, 1, 3, 1, 3, 3, 2, 3, 3, 2, 3, 2, 3], device='cuda:0')

print(loss1)

tensor(1.0870, device='cuda:0', grad_fn=<NllLossBackward>)

Number of epochs = 10/20/30, same result:

val loss=1.2 acc=0.4, train loss=0.6 acc=0.65

What am I doing wrong?

1
It's pretty normal to get a lower training loss than validation loss; this is generally an indication of overfitting. Usually this is improved by doing one or more of the following: more training data, a different network architecture, changed training params (e.g. batch size, number of epochs), changed optimizer hyperparams (type, learning rate, learning rate schedule, weight decay, etc.), data augmentation, or a change of domain (i.e. mapping the input to a different domain which more effectively distinguishes the classes). These are mostly empirical, so you need to try lots of things to figure out which is best. – jodag
Ideally someone has already given a good baseline on the dataset/problem you're trying to solve; usually that's a good starting point. – jodag

1 Answer

0
votes

When the validation loss is larger than the training loss, it is usually a sign of overfitting. There are a few things you can do:

  1. Add Dropout or Batch Normalisation:

This makes the model more robust (see the sketch after this list).

  2. Make the model deeper:

Add more layers to the model for a better comprehension of the patterns.

  3. Use better optimizers:

Adaptive optimizers such as Adam, Adagrad and RMSprop are usually effective.
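
For point 1, a minimal sketch of how Dropout could be added in front of the final classifier of the ResNet50 from the question (the dropout probability p=0.5 is an assumed placeholder; tune it empirically):

import torch

# Replace the single Linear head with Dropout followed by Linear.
# p=0.5 is an assumed dropout probability, not a prescribed value.
resnet50.last_linear = torch.nn.Sequential(
    torch.nn.Dropout(p=0.5),
    torch.nn.Linear(in_features=2048, out_features=3, bias=True),
)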