2
votes

I have already learned about classification with CNNs, e.g. on MNIST. But recently I received a dataset that consists of a set of vectors. A normal image dataset (MNIST) has shape n x c x w x h; the one I received is (w*h) x 1 x c. The goal is to train a network to classify these pixels (as I understand it, pixel-wise classification). The labels come from a ground-truth picture.

I am a little confused about this task. As I understand it, in image processing we use CNNs with different receptive fields to perform the convolution operations, so that features representing the image can be obtained. But in this case the image has already been expanded into a set of pixels. Why is a convolutional neural network still suitable?
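One way to see why a convolution can still make sense here: with data shaped (w*h) x 1 x c, a Conv1d slides its kernel along the length-c axis of each pixel's vector, so it learns local patterns across neighbouring channels/bands rather than neighbouring pixels. A minimal sketch (the batch size 8 and vector length 200 are just assumed for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical example: each "sample" is one pixel's feature vector of
# length 200, shaped (batch, 1, 200).
x = torch.randn(8, 1, 200)

# The Conv1d kernel slides along the length-200 axis: without padding,
# the output length is 200 - 3 + 1 = 198.
conv = nn.Conv1d(in_channels=1, out_channels=32, kernel_size=3)
y = conv(x)

print(y.shape)  # torch.Size([8, 32, 198])
```

Whether a local pattern along that axis is meaningful depends on what the c values are (e.g. spectral bands would have a natural ordering, arbitrary features would not).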

I am still not sure about the task, but I started to try. I used 1D convolutions instead of 2D ones in the network. After four Conv1d layers, the output is passed through a softmax layer and then fed to a cross-entropy loss function. It seems I have some problems with the output dimensions, so the network is not able to train.

I use PyTorch for the implementation. Below is the network I tried to build. The dimensions do not match those required by the CrossEntropyLoss function. 122500 is the number of samples, so I think the convolution is performed along the 1-200 direction.

First, I want to know: is it right to implement this with Conv1d when I want to classify the pixels?

If this approach is right, how can I feed the features to the loss function?

If it is wrong, could I have some similar examples for this kind of work? I am new to Python, so if there are any silly mistakes, please point them out.

Thanks all.

import torch
import torch.nn as nn
import torch.nn.functional as F

class network(nn.Module):
    """Building the network."""

    def __init__(self):
        super(network, self).__init__()
        self.conv1 = nn.Conv1d(in_channels=1, out_channels=32, stride=1, kernel_size=3)
        self.conv2 = nn.Conv1d(in_channels=32, out_channels=64, stride=1, kernel_size=3)
        self.conv3 = nn.Conv1d(in_channels=64, out_channels=128, stride=1, kernel_size=3)
        self.conv4 = nn.Conv1d(in_channels=128, out_channels=256, stride=1, kernel_size=3)
        self.fc = nn.Linear(13, 2)

    def forward(self, s):
        s = self.conv1(s)
        s = F.relu(F.max_pool1d(s, 2))
        s = self.conv2(s)
        s = F.relu(F.max_pool1d(s, 2))
        s = self.conv3(s)
        s = F.relu(F.max_pool1d(s, 2))
        s = self.conv4(s)
        s = F.relu(F.max_pool1d(s, 2))

        s = self.fc(s)

        s = F.softmax(s, 1)
        return s


model = network()
loss_fn = nn.CrossEntropyLoss()

output = model(input)
loss = loss_fn(output, labels)

1 Answer

1
votes

I guess what you're supposed to do is image segmentation, and given the shape of the labels you got, the last dimension of 200 corresponds to 200 possible pixel categories (that sounds like a lot to me, but without more context I cannot judge). The problem of image segmentation is far too broad to explain in an SO answer, but I suggest you check resources such as this tutorial and look at the influential papers in this field.
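To make the per-pixel framing concrete, here is a hedged sketch of what cross-entropy over pixels looks like under the answer's reading (200 assumed classes; the pixel count of 6 is arbitrary): each pixel gets one row of class scores, and the loss is averaged over pixels.

```python
import torch
import torch.nn.functional as F

num_pixels, num_classes = 6, 200
logits = torch.randn(num_pixels, num_classes)        # one score row per pixel
labels = torch.randint(0, num_classes, (num_pixels,))  # one class index per pixel

# F.cross_entropy reduces over the pixel dimension, giving a scalar loss.
loss = F.cross_entropy(logits, labels)
print(loss.dim())  # 0
```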