I am working on a project where I deal with:
- 70,000 JPG images totalling 1 GB
- Each file is ~15 KB.
- Each image is 424x424.
My current solution for working with these files is to take each image, crop it to 150x150, and save it into a NumPy memmap array. I end up with one large memmap file with dimensions 70,000 x 150 x 150 x 3 (colour images).
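For reference, the pre-processing step looks roughly like this: a minimal sketch with NumPy only, where a random array stands in for each decoded JPG (a real pipeline would use PIL/Pillow or similar to read the files), and the small sizes (`N = 4`) are placeholders for the real 70,000 x 150 x 150 x 3 array:

```python
import numpy as np
import tempfile, os

# Hypothetical small sizes for illustration; the real array is 70,000 images.
N, SRC, CROP = 4, 424, 150
path = os.path.join(tempfile.mkdtemp(), "crops.dat")

# One large on-disk array of uint8 pixels, written once up front.
mm = np.memmap(path, dtype=np.uint8, mode="w+", shape=(N, CROP, CROP, 3))

off = (SRC - CROP) // 2  # centre-crop offset: (424 - 150) // 2 = 137
for i in range(N):
    # Stand-in for decoding one 424x424 JPG into an array.
    img = np.random.randint(0, 256, (SRC, SRC, 3), dtype=np.uint8)
    mm[i] = img[off:off + CROP, off:off + CROP]  # crop and write to disk
mm.flush()
```

Each image occupies a fixed 150 * 150 * 3 = 67,500-byte slot, so the file is laid out contiguously, one image after another.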
My next step is to loop through this memmap array and randomly sample image patches from it. However, my code is currently running very slowly and, most annoyingly, it only uses about 10% of the CPU, with a disk read speed of 1-5 MB/s. That is probably even slower than skipping the pre-computed memmap entirely and re-reading the JPGs every time.
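The sampling step I'm describing is essentially this (a minimal sketch; the patch size of 32 and the dummy all-ones memmap are assumptions for illustration):

```python
import numpy as np
import tempfile, os

N, CROP, PATCH = 8, 150, 32
path = os.path.join(tempfile.mkdtemp(), "crops.dat")

# Build a small stand-in for the pre-computed memmap file.
mm = np.memmap(path, dtype=np.uint8, mode="w+", shape=(N, CROP, CROP, 3))
mm[:] = 1
mm.flush()

# Re-open read-only, as the sampling loop would.
data = np.memmap(path, dtype=np.uint8, mode="r", shape=(N, CROP, CROP, 3))
rng = np.random.default_rng(0)

def sample_patch():
    i = rng.integers(N)                 # random image -> random file offset (a seek on an HDD)
    y = rng.integers(CROP - PATCH + 1)  # random top-left corner within the 150x150 crop
    x = rng.integers(CROP - PATCH + 1)
    # Copy the slice out of the mmap so later reads don't touch the disk page again.
    return np.array(data[i, y:y + PATCH, x:x + PATCH])

patch = sample_patch()
```

Every call picks an image index uniformly at random, which on a spinning disk translates into one seek per patch rather than a sequential scan.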
What can I do to make better use of my system resources here?
System Information
- Mac OS X
- MacBook Pro with HDD
Thanks!