Thoughts and Tips


TensorFlow

Installation:

Out of all the installation instructions I tried, I found NVIDIA's the most helpful: https://www.nvidia.com/en-us/data-center/gpu-accelerated-applications/tensorflow/
They cover configuring the NVIDIA drivers, CUDA, cuDNN, etc.



Performance:

I ran into a huge drop in performance (at least a 33% decline) after about 50 iterations of training a CNN. It turns out this can be a memory allocation issue. It is a known issue and can (in some cases) be remedied by using TCMalloc, a memory allocator created by Google. In my case, switching to it made a huge difference.

There's a discussion about it here: https://github.com/tensorflow/tensorflow/issues/3009#issuecomment-235993119

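First, install google-perftools.
Then, run your code from the terminal with LD_PRELOAD pointing at the TCMalloc library (the path may differ on your system), like this:
LD_PRELOAD="/usr/lib/libtcmalloc.so" python main.py
Alternatively, run export LD_PRELOAD="/usr/lib/libtcmalloc.so" once to apply it to every command started from the current shell session.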
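
To confirm the preload actually took effect, one option is to check from inside the Python process whether a TCMalloc library is mapped. The sketch below is a minimal check, assuming a Linux system where /proc/self/maps lists the shared libraries of the current process; the function names are my own and not part of any library.

```python
import os


def tcmalloc_in_maps(maps_text):
    """Return True if any mapped file in the given maps text looks like TCMalloc."""
    return any("libtcmalloc" in line for line in maps_text.splitlines())


def tcmalloc_is_loaded(maps_path="/proc/self/maps"):
    """Check the current process's memory map (Linux only) for TCMalloc.

    Returns False when the maps file does not exist (e.g. on macOS),
    which means "could not confirm", not "definitely absent".
    """
    if not os.path.exists(maps_path):
        return False
    with open(maps_path) as f:
        return tcmalloc_in_maps(f.read())


if __name__ == "__main__":
    print("TCMalloc preloaded:", tcmalloc_is_loaded())
```

Calling tcmalloc_is_loaded() near the top of main.py makes it easy to notice when the library path in LD_PRELOAD is wrong for the machine.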


I was also able to get a 15% decrease in training time by installing TensorFlow from source. The TensorFlow documentation is helpful for this: https://www.tensorflow.org/install/install_sources




Problems I ran into training a CNN from scratch

My prediction accuracy was terrible despite reasonable dev accuracy. So I decided to save copies of my images to disk after preprocessing (cropping, resizing, flipping, etc.). They looked like pure static!
- I hadn't thought to undo the color channel normalization before saving, so the images I inspected looked corrupted. Remember to remove the normalization before inspecting images!
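
Undoing the normalization before saving can be sketched as follows. This assumes the preprocessing applied a per-channel (pixel - mean) / std normalization to channel-last images on a 0-255 scale; the function name and the mean/std values in the usage note are hypothetical, so substitute whatever your own pipeline uses.

```python
import numpy as np


def denormalize(image, mean, std):
    """Invert a per-channel (x - mean) / std normalization.

    image: float array of shape (height, width, channels).
    mean, std: per-channel values used by the original normalization.
    Returns a uint8 array suitable for saving as an ordinary image.
    """
    image = np.asarray(image, dtype=np.float32)
    mean = np.asarray(mean, dtype=np.float32)
    std = np.asarray(std, dtype=np.float32)
    restored = image * std + mean
    # Round and clip so small float errors do not wrap around in uint8.
    return np.clip(np.rint(restored), 0, 255).astype(np.uint8)
```

For example, denormalize(batch[0], mean=[100.0, 110.0, 120.0], std=[50.0, 60.0, 70.0]) would give back an array that can be written to disk with any image library and inspected by eye.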

My prediction images (uploaded from my iPhone) were always rotated by 90 degrees!
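- It turns out the iPhone camera stored the images rotated even though they did not look rotated on screen. As far as I can tell, the phone records the intended orientation as metadata (the EXIF orientation tag) and corrects for it only when displaying the image. See Zeiler and Fergus, 2013 for how rotation can impact performance.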
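
One way to correct the rotation on the server is to read the EXIF orientation tag and rotate the pixels before prediction. The sketch below assumes the Pillow library is installed and handles only the pure-rotation orientation values (1, 3, 6, 8), not the mirrored ones; the function names are hypothetical.

```python
EXIF_ORIENTATION_TAG = 0x0112  # standard EXIF tag id for "Orientation"

# EXIF orientation value -> counter-clockwise degrees needed to show the image upright.
ROTATION_CCW = {1: 0, 3: 180, 6: 270, 8: 90}


def upright_rotation(orientation):
    """Return the counter-clockwise rotation (degrees) for an EXIF orientation value.

    Unknown or mirrored orientation values fall back to 0 (no rotation).
    """
    return ROTATION_CCW.get(orientation, 0)


def load_upright(path):
    """Open an image and rotate it according to its EXIF orientation tag."""
    from PIL import Image  # imported lazily so the mapping above works without Pillow

    img = Image.open(path)
    orientation = img.getexif().get(EXIF_ORIENTATION_TAG, 1)
    # Image.rotate takes counter-clockwise degrees; expand=True keeps the full frame.
    return img.rotate(upright_rotation(orientation), expand=True)
```

Pillow also provides ImageOps.exif_transpose, which covers the mirrored cases as well and is probably the simpler choice if a recent Pillow version is available.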

My predictions always seemed to lag by about 1 attempted prediction!
- I have a Flask server running the predictions. It receives the image from the iPhone, saves it to disk, and then the TensorFlow graph reads it. It turns out TensorFlow was prefetching about 1 "batch" (1 prediction round) of images, so each prediction ran on the previously uploaded image. This was unfortunate, as I wanted to reuse my entire preprocessing flow, which has the prefetch in it (and setting prefetch to zero causes the script to seize up). My lazy solution was to just update my X's twice, which bypasses the issue. Inefficient, but it works :)
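
The one-prediction lag and the "pull twice" workaround can be illustrated with a toy model. This is not TensorFlow code and does not reflect TensorFlow internals; PrefetchingReader and predict_fresh are hypothetical names for a stand-in pipeline that always keeps one item buffered ahead of the consumer.

```python
class PrefetchingReader:
    """Toy input pipeline with a one-element prefetch buffer."""

    def __init__(self, read_fn):
        self._read_fn = read_fn
        self._buffer = read_fn()  # the first item is fetched eagerly at start-up

    def next(self):
        """Return the buffered item and immediately prefetch the next one."""
        item = self._buffer
        self._buffer = self._read_fn()
        return item


def predict_fresh(reader):
    """Pull twice: discard the stale prefetched item, return the fresh one."""
    reader.next()
    return reader.next()


if __name__ == "__main__":
    disk = {"image": "img_0"}                # stands in for the file saved by Flask
    reader = PrefetchingReader(lambda: disk["image"])
    disk["image"] = "img_1"                  # a new upload arrives
    print(reader.next())                     # stale: the image prefetched earlier
    disk["image"] = "img_2"                  # another upload arrives
    print(predict_fresh(reader))             # fresh: the image just uploaded
```

The second pull costs an extra pass through the pipeline for every request, which is the inefficiency mentioned above.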