Foreword

This is not a tutorial. It is a description of my first dive into deep learning with practically no relevant background experience. I am not an expert in deep learning, and the following most likely contains errors and misinterpretations. If you find some, please let me know. All the source code can be found at: https://github.com/Miksu82/DigitRecognizer

Background

I've worked as a professional software developer for nearly 10 years, and most of that time I've been developing software for the Android and iOS platforms. For a long time I've also been interested in data and how to extract valuable information from the heaps of data we create every day. During that time I've read some machine learning blog posts and watched a few lectures and YouTube videos. More recently I've heard more and more about deep learning, but until now I hadn't taken any practical steps to try out the technologies I'd read about. Around a month ago I found this article on /r/programming and finally decided to try out some deep learning.

The Project

How to start learning new things? Instead of going through textbooks, for me the best way to learn is to get my hands dirty (I usually turn to textbooks and documentation after I hit a problem, or when I have something working but no idea why it works). To get started I needed to choose a project. I had two main requirements:

1. I need to be able to train my deep learning model on my two-year-old MacBook Pro in reasonable time.
2. I want to try out the model in a real-world situation and not just trust the training/test data split.

The first requirement ruled out huge data sets, and the second ruled out all the data sets where I could not easily generate new data. I had already heard about the MNIST data set and thought it would be a good candidate, since the images are very small and it is easy to write an Android app that draws to the screen.
Then I also found Deeplearning4j, which makes it possible to import Keras models (more about that later) into Java. I thought I was ready to start my handwritten digit recogniser app for Android.

Setting up the environment

First things first. I needed to set up my machine with some deep learning frameworks. As I already mentioned, I started the whole thing by reading the Learning AI if you suck at math blog posts. From there I learned how to set up a deep learning environment by using a Docker image, and I decided to do just that. By cloning the GitHub repo mentioned in the link above and following the instructions, I had my environment set up in no time without any glitches. Unfortunately, as explained in the instructions, I couldn't use the GPU to train my models, but I figured it wouldn't matter at this point.

Building the model

The examples in the Learning AI if you suck at math blog posts use Keras to build the models, so I decided to go with that as well. Keras is an abstraction layer for the TensorFlow and Theano frameworks that makes it easier to describe the layers of a deep learning network. After describing the layers, the model is built using either TensorFlow or Theano as a backend. Keras also supports persisting the models to a file, and Deeplearning4j can import those models to be used in a Java app. Thus it seemed to be a really good fit.

Learning to use Keras

Before starting with the MNIST data I wanted to have some kind of idea of how Keras is used. Again I fired up Google to see what I could find. I came upon this article, which seemed like a really good start. After a few hours of trying to figure out whether a Pima Indian will have diabetes or not, I decided it was time to try out the MNIST data.

Training with MNIST data

I was pretty sure that if I Googled "MNIST Keras" I would find tons of examples describing how to train a model to recognise handwritten digits. I think the examples in the Keras repo also include one.
But that would have been like cheating, so I decided to try something else. I had already played around with a model that had a binary outcome: a person could either have diabetes or not. But in the MNIST case I could have 10 different outputs (a number between 0 and 9). Obviously a very different problem. From the same site that had the first tutorial I also found a Multi-Class Classification tutorial. I tried to follow it but couldn't get anywhere. The reason is that in the tutorial the input data are 1x4 vectors describing the lengths of different parts of an iris flower, whereas in MNIST the input data are 28x28 images. In hindsight this is obvious, but when I first started playing around with the tutorial it wasn't.

So back to square one. How to get started without finding the whole answer? In the end the problem is about classifying images, and I remembered that the Learning AI if you suck at math blog had something like that. I checked it out again, and part 5 of that blog has an example of image classification using ImageNet data. I decided to use that as my starting point.

After dabbling with the input matrices, I got the MNIST data into the correct form so that it could be fed to Keras. I first tried the layers exactly as defined in the example:

model = Sequential()
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
                        border_mode='valid', input_shape=input_shape))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

but that didn't work out too well. I only got around 10% accuracy. I decided to do what I do when debugging: make things simpler. I started to remove layers. I left only the two convolution layers, the pooling layer and the last dense layer.
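For context on the input-matrix dabbling mentioned above, here is a minimal sketch (my own illustration, not code from the repo) of the kind of reshaping MNIST data typically needs before a Keras convolutional layer will accept it, assuming a channels-last input_shape of (28, 28, 1):

```python
import numpy as np

def prepare_mnist(images, labels, nb_classes=10):
    """Reshape raw MNIST-style arrays for a convolutional network.

    images: uint8 array of shape (n, 28, 28); labels: int array of shape (n,).
    Assumes a channels-last input_shape of (28, 28, 1).
    """
    # Add the single greyscale channel and scale pixel values to [0, 1].
    x = images.reshape(-1, 28, 28, 1).astype("float32") / 255.0
    # One-hot encode the digit labels for the softmax output layer.
    y = np.eye(nb_classes)[labels]
    return x, y

# Two fake 28x28 "images" stand in for real MNIST data here.
fake_images = np.zeros((2, 28, 28), dtype=np.uint8)
fake_labels = np.array([3, 7])
x, y = prepare_mnist(fake_images, fake_labels)
```

Getting this shape wrong (for example, leaving the data as flat 784-element vectors, or forgetting the channel dimension) is exactly the kind of thing that stalls a first attempt.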
The last dense layer is always needed to limit the number of different categories, which in this case is 10. I didn't really understand why there are two convolution layers or what the point of the pooling layer is, but reading the blog post I kind of got the idea that they are useful. So I had this:

model = Sequential()
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
                        border_mode='valid', input_shape=input_shape))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))
model.add(Flatten())
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

but still no success. So I reduced it further (and passed the activation functions as a parameter to add):

model = Sequential()
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
                        border_mode='valid', input_shape=input_shape,
                        activation='relu'))
model.add(MaxPooling2D(pool_size=pool_size))
model.add(Flatten())
model.add(Dense(nb_classes, activation='softmax'))

…and no success. I still got only around 10% accuracy on the test data. So I started to mess with different parameters like kernel size, pool size and number of filters (I still have no idea what those really mean), but nothing improved the accuracy. Until I changed the first activation function to sigmoid: then I had 97% accuracy. I have no idea why sigmoid works for this data and the rectifier does not, but that is something I need to figure out in the future. Now I finally had my layers all set up:

model = Sequential()
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
                        border_mode='valid', input_shape=input_shape,
                        activation='sigmoid'))
model.add(MaxPooling2D(pool_size=pool_size))
model.add(Flatten())
model.add(Dense(nb_classes, activation='softmax'))

Now I could train the model.
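To get a feel for what the Flatten layer actually hands to the final Dense layer, the intermediate sizes can be computed by hand. This is a sketch using example hyperparameter values the post doesn't state (32 filters, a 5x5 kernel and a 2x2 pooling window are assumptions of mine):

```python
def conv_output_side(input_side, kernel):
    """Spatial size after a square convolution with 'valid' border mode
    (no padding, stride 1): each side shrinks by kernel - 1."""
    return input_side - kernel + 1

# Assumed hyperparameters (not stated in the post).
nb_filters, kernel, pool = 32, 5, 2

side = conv_output_side(28, kernel)   # after Convolution2D with border_mode='valid'
side = side // pool                   # MaxPooling2D with a 2x2 pool halves each side
flattened = side * side * nb_filters  # the vector length Flatten() hands to Dense

print(side, flattened)                # 12 and 4608 with these assumed values
```

So under these assumptions the softmax Dense layer receives a 4608-element vector per image, which it maps down to the 10 digit classes.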
I had previously used only a subset of the training data and just a few epochs to train the model, because using all the training data simply took too long. Training the model with all 60000 training images in the MNIST data set for 20 epochs takes around 16 minutes on my early 2015 MacBook Pro using only the CPU. If someone knows how much a GPU would speed this up, please add a comment.

Importing the model to Android

Now that I had my model trained, it was time to load it into Android and draw some digits. I first implemented the drawing code, which was pretty straightforward given my experience and a couple of examples from Stack Overflow. To use Deeplearning4j in Android I followed this tutorial. After battling with configuring multidex support for the Android project (for some reason I needed to add

compile 'com.android.support:multidex:1.0.1'

to my dependencies, although the docs say it is not necessary if minSdkVersion is 21 or above), I finally got everything to build. But when I ran the project, the Deeplearning4j code threw this exception:

java.lang.UnsatisfiedLinkError: dalvik.system.PathClassLoader[DexPathList[[zip file "/data/app/com.kaamos.digitdetector-1/base.apk"],nativeLibraryDirectories=[/data/app/com.kaamos.digitdetector-1/lib/arm, /data/app/com.kaamos.digitdetector-1/base.apk!/lib/armeabi, /vendor/lib, /system/lib]]] couldn't find "libjnihdf5.so"

Hmm… what is that? I started Googling but couldn't find anything relevant. I tried the example project from the tutorial and it worked just fine. The only difference was that my project was using the Deeplearning4j Keras model import library, which wasn't in the example project I was using as a reference. I dug into the Keras and Deeplearning4j documentation and realised that Keras saves its models in HDF5 format, which seemed to be related to the exception I was seeing. Deeplearning4j must be using some library to read the HDF5 format, and that library was missing. Now I just needed to find that library and try to compile it for Android.
So I checked all the dependencies in my Android project by running

./gradlew app:dependencies

The resulting list was huge, but what caught my eye was this:

+--- org.bytedeco.javacpp-presets:hdf5-platform:1.10.0-patch1-1.3
     \--- org.bytedeco.javacpp-presets:hdf5:1.10.0-patch1-1.3
          \--- org.bytedeco:javacpp:1.3 -> 1.3.2

It looks like an artifact with the group id org.bytedeco.javacpp-presets handles the HDF5 files. Back to Google, where I found the GitHub repo of that project and very good instructions on how to build the HDF5 library for Android. So I cloned the repo, followed the instructions and got the following error:

Error: Platform "android-arm" is not supported

Huh? Again back to Google, but I couldn't find anything. Then I tried to find where that string is printed in org.bytedeco.javacpp, which led me to this comment in the build scripts:

# HDF5 does not currently support cross-compiling:
# https://support.hdfgroup.org/HDF5/faq/compile.html

Okay, so no luck getting this to work on Android. I decided to create a plain Java app with Swing instead.

Importing the model to Java

This was fairly straightforward. I just had to learn a bit about how to use Swing, which felt really weird after years of building UIs for Android and iOS, but I managed to get something decent together. The first algorithm did these steps:

1. draw a digit
2. find the edges of the digit
3. add 5 white pixels of padding to the larger dimension (width or height)
4. add white pixels to the shorter dimension so that the image is square
5. scale to a 28x28 image, because that is the input size the model expects

and the results were horrible. Number 1 was usually recognised, but everything else was just recognised wrong. I started to think about what was wrong:

- Maybe the input digit line thickness is not the same as in the training data.
- The training images are greyscale while the images from the app are black-and-white.
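As an illustration, the first (flawed) preprocessing pipeline might look roughly like this in numpy. This is my own reconstruction, not the code from the repo, and it assumes 0 is background and nonzero pixels are ink:

```python
import numpy as np

def naive_preprocess(img, pad=5, out_side=28):
    """Sketch of the first preprocessing attempt: crop to the digit,
    pad, make the image square, then scale to 28x28.

    Assumes img is a 2D array with 0 = background and nonzero = ink.
    """
    ys, xs = np.nonzero(img)                      # find the edges of the digit
    digit = img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    h, w = digit.shape
    side = max(h, w) + 2 * pad                    # pad the larger dimension
    canvas = np.zeros((side, side), dtype=img.dtype)
    top, left = (side - h) // 2, (side - w) // 2  # center in a square canvas
    canvas[top:top + h, left:left + w] = digit
    # Crude nearest-neighbour scale to the 28x28 input the model expects.
    idx = (np.arange(out_side) * side) // out_side
    return canvas[np.ix_(idx, idx)]
```

Note that nothing here matches the greyscale, center-of-mass-centered style of the actual MNIST images, which turned out to be the problem.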
I tried to fix those in different ways (experimenting with different line thicknesses, trying to draw the digit with different kinds of gradients, changing the training data to black-and-white, etc.) but nothing helped. So back to the programmer's favourite friends: Google and Stack Overflow. I found this article. It had a quote from the official MNIST documentation:

The original black and white (bilevel) images from NIST were size normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were centered in a 28x28 image by computing the center of mass of the pixels, and translating the image so as to position this point at the center of the 28x28 field.

After changing my algorithm to:

1. draw a digit
2. find the edges of the digit
3. scale to 20x20 while maintaining the aspect ratio
4. calculate the center of mass
5. place the 20x20 image on a 28x28 background so that the center of mass is in the middle

…voilà. I got correct digit recognition almost every time. Until I let my girlfriend test the app. Somehow the number 2 she draws is often recognised as a 3 or a 7. I have no idea why.

Next steps

I was fairly satisfied with my first attempt at deep learning. I learned quite a lot. Especially that the input data must be constructed exactly (and not almost) like the training data, and that deep learning algorithms are very sensitive to small disruptions in the data that humans can't recognise. This also seems to be an active research area.

Next I hope I will find time to try to understand what my training algorithm actually does. Almost every line is still somewhat of a black box to me, so I think it is time to get back to the mathematics textbooks I haven't opened since graduating from university. Also, all recommendations are appreciated…
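For completeness, the center-of-mass placement step from the improved algorithm can be sketched in numpy like this. Again, this is my own illustration under the assumption that 0 is background and larger values are ink, not the code from the repo:

```python
import numpy as np

def center_by_mass(img20, out_side=28):
    """Place a 20x20 greyscale digit into an out_side x out_side field so
    that its center of mass sits at the center, as in the MNIST description.

    Assumes 0 = background and larger pixel values = ink.
    """
    h, w = img20.shape
    ys, xs = np.indices(img20.shape)
    total = img20.sum()
    cy = (ys * img20).sum() / total        # center of mass, row coordinate
    cx = (xs * img20).sum() / total        # center of mass, column coordinate
    # Offsets that move the center of mass to the middle of the field,
    # clamped so the 20x20 patch stays inside the canvas.
    top = int(np.clip(round(out_side / 2 - cy), 0, out_side - h))
    left = int(np.clip(round(out_side / 2 - cx), 0, out_side - w))
    canvas = np.zeros((out_side, out_side), dtype=img20.dtype)
    canvas[top:top + h, left:left + w] = img20
    return canvas
```

The important point is that this reproduces the training data's conventions exactly, rather than almost, which is what made the recognition finally work.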