
I am building and training a CNN for a binary classification task. I have extracted images (frames) from a labelled video database. The database claims the videos were recorded with active IR illumination. The frames I have extracted contain 3-channel information.

The resulting trained CNN model would be deployed on an embedded board, which would take a video feed from a standard RGB USB camera and operate on it frame by frame.

Question PART-1: Correct me if I am wrong, but I am concerned: since my knowledge suggests that the data distribution of actively IR-illuminated videos would differ from that of a standard RGB feed, would this model classify frames from the RGB feed with comparable precision?

Note 1: Although the videos in the database look greyscale (a visible grey tone, possibly due to the active IR illumination), upon processing they were found to contain 3-channel information.

Note 2: The per-pixel differences between the three channel values are considerably higher in normal RGB images than in the frames extracted from the database. In a normal RGB image, a randomly chosen pixel might have channel values such as (128, 32, 98) or (34, 209, 173) — note the large spread between the three channels. In the frames extracted from the database videos, the three channel values of a pixel DO NOT vary nearly as much: they are along the lines of (112, 117, 109), (231, 240, 235), or (32, 34, 30). I suppose this is because the videos are greyish overall — similar to a black-and-white filter, though not exactly black and white.
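The channel spread described above can be quantified rather than eyeballed. Below is a minimal sketch (the function name `channel_spread` and the sample values are my own, taken from the examples in Note 2) that measures the mean absolute deviation of each pixel's channels from that pixel's mean — near-grey frames score close to 0, colourful frames score much higher:

```python
import numpy as np

def channel_spread(img):
    """Mean absolute deviation of each pixel's channels from the pixel mean.

    img: H x W x 3 array. Near-grey images (channels almost equal)
    score close to 0; colourful images score higher.
    """
    img = img.astype(np.float64)
    per_pixel_mean = img.mean(axis=2, keepdims=True)
    return float(np.abs(img - per_pixel_mean).mean())

# Toy 4x4 frames built from the pixel values quoted in Note 2.
gray_ish = np.tile(np.array([[[112, 117, 109]]], dtype=np.uint8), (4, 4, 1))
colorful = np.tile(np.array([[[128, 32, 98]]], dtype=np.uint8), (4, 4, 1))
```

Running `channel_spread` over a sample of training frames and a sample of RGB camera frames would give a concrete picture of how far apart the two distributions are in this one respect.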

Question PART-2: Would it be fair to convert the RGB images to greyscale and replicate the single channel three times, essentially making each a three-channel image again?
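For reference, the conversion asked about in PART-2 can be sketched as follows. This is only an illustration of the idea, not a recommendation; the function name is my own, and it assumes standard ITU-R BT.601 luma weights for the greyscale conversion:

```python
import numpy as np

def rgb_to_pseudo_gray3(rgb):
    """Convert an RGB frame to luminance (ITU-R BT.601 weights) and
    replicate it across three channels, so the output has the same
    H x W x 3 shape as the original. Assumes rgb is H x W x 3 uint8.
    """
    weights = np.array([0.299, 0.587, 0.114])
    gray = rgb.astype(np.float64) @ weights       # per-pixel weighted sum -> H x W
    gray = np.clip(gray, 0, 255).astype(np.uint8)
    return np.repeat(gray[:, :, None], 3, axis=2)  # duplicate into 3 channels

# Usage on a small hypothetical frame:
frame = np.tile(np.array([[[100, 150, 200]]], dtype=np.uint8), (2, 2, 1))
pseudo = rgb_to_pseudo_gray3(frame)
```

Note that even after this conversion the result only imitates the *channel structure* of the IR frames; visible-light luminance and IR reflectance are still different physical signals.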


1 Answer


Part 1: the neural net will perform best on images with the more contrasted channels it was trained on, and a model trained on one type of image will perform poorly on the other type.

Part 2: an RGB image has three distinct channels. It would be nonsense to make the channels equal and throw away that useful information.


Most probably, your IR images are not true greyscale; they are packed as an RGB image for viewing. Because the three channels are very similar to each other, the colours are very desaturated, i.e. nearly grey.

And, sorry to say, capturing three near-identical IR channels is of little use.