I have the following code. The data set can be downloaded here or here. The data set contains images categorized as cat or dog.
The task of this code is for training cats and dogs image data. So that given a picture, it can tell whether it's cat's or dog. It is motivated by this page. Below is the fully running code:
library(keras)
# Organize dataset --------------------------------------------------------
options(warn = -1)
# Ths input
original_dataset_dir <- "data/kaggle_cats_dogs/original/"
# Create new organized dataset directory ----------------------------------
base_dir <- "data/kaggle_cats_dogs_small/"
dir.create(base_dir)
train_dir <- file.path(base_dir, "train")
dir.create(train_dir)
validation_dir <- file.path(base_dir, "validation")
dir.create(validation_dir)
test_dir <- file.path(base_dir, "test")
dir.create(test_dir)
train_cats_dir <- file.path(train_dir, "cats")
dir.create(train_cats_dir)
train_dogs_dir <- file.path(train_dir, "dogs")
dir.create(train_dogs_dir)
validation_cats_dir <- file.path(validation_dir, "cats")
dir.create(validation_cats_dir)
validation_dogs_dir <- file.path(validation_dir, "dogs")
dir.create(validation_dogs_dir)
test_cats_dir <- file.path(test_dir, "cats")
dir.create(test_cats_dir)
test_dogs_dir <- file.path(test_dir, "dogs")
dir.create(test_dogs_dir)
# Copying files from original dataset to newly created directory
fnames <- paste0("cat.", 1:1000, ".jpg")
dum <- file.copy(file.path(original_dataset_dir, fnames),
file.path(train_cats_dir))
fnames <- paste0("cat.", 1001:1500, ".jpg")
dum <- file.copy(file.path(original_dataset_dir, fnames),
file.path(validation_cats_dir))
fnames <- paste0("cat.", 1501:2000, ".jpg")
dum <- file.copy(file.path(original_dataset_dir, fnames),
file.path(test_cats_dir))
fnames <- paste0("dog.", 1:1000, ".jpg")
dum <- file.copy(file.path(original_dataset_dir, fnames),
file.path(train_dogs_dir))
fnames <- paste0("dog.", 1001:1500, ".jpg")
dum <- file.copy(file.path(original_dataset_dir, fnames),
file.path(validation_dogs_dir))
fnames <- paste0("dog.", 1501:2000, ".jpg")
dum <- file.copy(file.path(original_dataset_dir, fnames),
file.path(test_dogs_dir))
options(warn = -1)
# Making model ------------------------------------------------------------
conv_base <- application_vgg16(
weights = "imagenet",
include_top = FALSE,
input_shape = c(150, 150, 3)
)
model <- keras_model_sequential() %>%
conv_base %>%
layer_flatten() %>%
layer_dense(units = 256, activation = "relu") %>%
layer_dense(units = 1, activation = "sigmoid")
summary(model)
length(model$trainable_weights)
freeze_weights(conv_base)
length(model$trainable_weights)
# Train model -------------------------------------------------------------
train_datagen = image_data_generator(
rescale = 1/255,
rotation_range = 40,
width_shift_range = 0.2,
height_shift_range = 0.2,
shear_range = 0.2,
zoom_range = 0.2,
horizontal_flip = TRUE,
fill_mode = "nearest"
)
# Note that the validation data shouldn't be augmented!
test_datagen <- image_data_generator(rescale = 1/255)
train_generator <- flow_images_from_directory(
train_dir, # Target directory
train_datagen, # Data generator
target_size = c(150, 150), # Resizes all images to 150 × 150
batch_size = 20,
class_mode = "binary" # binary_crossentropy loss for binary labels
)
validation_generator <- flow_images_from_directory(
validation_dir,
test_datagen,
target_size = c(150, 150),
batch_size = 20,
class_mode = "binary"
)
# Compile model -----------------------------------------------------------
model %>% compile(
loss = "binary_crossentropy",
optimizer = optimizer_rmsprop(lr = 2e-5),
metrics = c("accuracy")
)
# Evaluate ---------------------------------------------------------------
history <- model %>% fit_generator(
train_generator,
steps_per_epoch = 100,
epochs = 30,
validation_data = validation_generator,
validation_steps = 50
)
# Plot --------------------------------------------------------------------
plot(history)
The example above require an external model to be downloaded (convnet trained ImageNet) dataset with VGG16 architecture, to be fully functional.
conv_base <- application_vgg16(
weights = "imagenet",
include_top = FALSE,
input_shape = c(150, 150, 3)
)
model <- keras_model_sequential() %>%
conv_base %>%
layer_flatten() %>%
layer_dense(units = 256, activation = "relu") %>%
layer_dense(units = 1, activation = "sigmoid")
With cat/dog dataset this is reasonable. Because the VGG16 also contain
cats and dogs.
In actuality I am performing the same task classifying two types of biological cell instead of two animals. So the code is suitable except, it it seem's I can't use VGG16 as the model because the content are different.
How can I write the code (reasonable model) without conv_base?
If I still have to use external model, what the reasonable alternatives to that?
ImageNetis from generic image. Not very specific biological data such as cell. - pdubois