(Part 1) Migrating Deep Learning Training onto Mobile Devices!

Recently I’ve started a new project which aims to port the training phase of deep convolutional neural networks onto mobile phones. Training neural networks is a hard and time-consuming task, and it requires powerful machines to finish a reasonable training run in a timely manner. Successful models such as GoogLeNet, VGG, and Inception are built from tens of convolutional layers. These models are heavy enough that you definitely need a large amount of memory and a powerful GPU to train one in even a day. (And it may still take days to reach a reasonable accuracy.)

The nature of training neural networks almost prevents them from being deployed on embedded and mobile systems. These small systems are based on an SoC architecture with a small GPU on it. They also have a medium-sized DRAM; the combination is designed to serve mobile-scale applications. Granted, mobile devices today are much more powerful than PCs from ten years ago, and they could even be considered a replacement for laptops or desktops for everyday tasks. But they still cannot afford to perform heavy AI tasks.

Despite all the arguments above, having AI capabilities on your mobile phone is a necessity for future applications. We are almost at the end of the era of simple functional apps, and moving toward more intelligent and sophisticated user applications. These applications may need statistical machine learning techniques to provide unique functionality to users. Even right now you can see many AI-backed apps on your phone, such as Google Assistant and Apple’s Siri. The philosophy behind these applications is to offload a clean and useful interface onto the user’s device and run all the heavy AI tasks in a data center, such that every user input will (1) go to the data center, (2) be processed by the servers, and (3) come back to the user as output. That seems to be enough, right? Well, it may not be. Imagine you have purchased an iPhone and you are excited to use Siri for all your daily tasks. Suppose English is not your first language and you speak with an accent; this may make it hard for Siri to fully understand what you are saying, almost all the time. There is an obvious and easy solution to this problem: custom-training an AI model for every single person in the cloud. Well, this brings a whole lot of challenges for the provider. Here are some of the existing challenges:

  • Doing inference per user request is cheap, fast, and affordable. It doesn’t need a massive amount of computation on the servers, and it doesn’t generate too much heat, which is the #1 problem for big data centers. Training, on the other hand, is expensive and time consuming. It requires the provider to allocate a considerable amount of resources for each user, which turns out not to be cost effective. It also generates more heat, which makes cooling harder. As a result, running continuous training for each user is not an option for providers, at least with current technology.
  • Holding every user’s customized model may require a lot of disk storage. Providers would need to add more disks, preferably SSDs, to hold each user’s final model and the relevant snapshots. This increases the cost of every data center.
  • Security is another issue for cloud service providers. Imagine all the data and models for every user being stored in a centralized data center; this makes the security challenges harder. Besides, a user-specific AI model says a lot about that user’s private information, which makes protection and encryption even more sensitive.

As a result, I believe customizing user models in the cloud is entirely doable, but at a high cost. Having AI capabilities integrated inside the mobile application would reduce operating costs and also bring real-time responsiveness to mobile apps. Unfortunately, current mobile systems are not capable of training a network such as Inception locally, so it is not practical to port AI code to mobile as-is.

Recently one of my colleagues came up with an idea: training a new neural network from scratch while it receives mentorship from an already-trained network. With this technique, one can train a network from scratch much more easily than the non-mentored version. Now, what if the new network were smaller than the original network, yet still able to represent the same knowledge? That would be a great result, since it would make it possible to adopt the knowledge of a heavy neural network while making training easier and faster. This is the idea we are going to expand on, in order to bring training to mobile phones. You can find his paper draft here: https://arxiv.org/pdf/1604.08220.pdf

So far there has been a lot of related work, but it targets only already-trained networks: you take the model parameters and then apply specific techniques to reduce the size of the model. This may be (1) weight pruning and quantization, (2) converting 32- and 64-bit floating point values into 8-bit versions, and so on (a small quantization sketch follows below). Unfortunately, none of these techniques helps the training phase. Shrinking the model size during the training phase can make the loss diverge badly and prevent the model from reaching a reasonable accuracy step by step. Our proposed technique solves this issue. All of these techniques are predecessors of an idea called Dark Knowledge.
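For context, here is a minimal sketch of the kind of post-training compression mentioned in point (2): symmetric linear quantization of a weight tensor to 8-bit integers, written with plain NumPy. The function names and the scaling scheme are illustrative choices of mine, not from any specific paper:

import numpy as np

def quantize_weights(w, num_bits=8):
    """Linearly quantize a float32 weight tensor to signed num_bits integers.

    Returns the integer codes plus the scale needed to dequantize.
    """
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8 bits
    scale = np.max(np.abs(w)) / qmax        # symmetric range [-max|w|, +max|w|]
    codes = np.round(w / scale).astype(np.int8)
    return codes, scale

def dequantize_weights(codes, scale):
    """Recover an approximate float32 tensor from the integer codes."""
    return codes.astype(np.float32) * scale

# Example: storage drops from 4 bytes to 1 byte per weight,
# at the cost of a small reconstruction error.
w = np.random.randn(64, 64).astype(np.float32)
codes, scale = quantize_weights(w)
w_hat = dequantize_weights(codes, scale)
print("max reconstruction error:", np.max(np.abs(w - w_hat)))

The point is that this all happens after training is finished, which is exactly why it cannot help the training phase itself.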

So far I have talked about the problem and why it is important. Now let’s talk more about the technique described above.

Consider a large Mentor network with n layers, and a smaller Mentee network with m layers. We assume the large network is well trained and stable on a sufficiently general dataset, and we want the smaller network to classify a new dataset which may be less general than, or as general as, the Mentor’s. We map each layer (filter) of the Mentee network to a filter in the Mentor, and we calculate the error between them using RMSE (other metrics could be used too). While training, the Mentee network not only learns the difference between the real and predicted labels, but also tries to adopt almost the same representations as the intermediate Mentor layers. This keeps the Mentee from deviating from the Mentor’s knowledge representation and lets it emulate that knowledge at a smaller scale. Users can specify the contribution of the final softmax loss and of the intermediate losses, which controls the deviation factor from the Mentor.
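Since I’ll share the full TensorFlow code later, here is just a minimal sketch of the combined objective. The tiny model definitions, the 1x1 projection used to match feature widths, and the weights alpha and beta are my own illustrative assumptions, not the exact setup from the paper:

import tensorflow as tf

def make_net(width, num_classes=10, name="net"):
    """A tiny conv net that also exposes one intermediate feature map."""
    x = tf.keras.Input(shape=(28, 28, 1))
    h = tf.keras.layers.Conv2D(width, 3, padding="same", activation="relu")(x)
    feat = tf.keras.layers.Conv2D(width, 3, padding="same", activation="relu")(h)
    pooled = tf.keras.layers.GlobalAveragePooling2D()(feat)
    logits = tf.keras.layers.Dense(num_classes)(pooled)
    return tf.keras.Model(x, [feat, logits], name=name)

mentor = make_net(width=64, name="mentor")   # assume weights are pre-trained
mentor.trainable = False
mentee = make_net(width=16, name="mentee")   # smaller, trained from scratch

# 1x1 projection so the mentee's narrower feature map can be compared
# to the mentor's wider one (an illustrative choice, not from the paper).
project = tf.keras.layers.Conv2D(64, 1)
optimizer = tf.keras.optimizers.Adam(1e-3)
alpha, beta = 1.0, 0.1   # contributions of label loss vs. intermediate loss

@tf.function
def train_step(images, labels):
    mentor_feat, _ = mentor(images, training=False)
    with tf.GradientTape() as tape:
        mentee_feat, mentee_logits = mentee(images, training=True)
        label_loss = tf.reduce_mean(
            tf.keras.losses.sparse_categorical_crossentropy(
                labels, mentee_logits, from_logits=True))
        # RMSE between the paired intermediate layers.
        rmse = tf.sqrt(tf.reduce_mean(
            tf.square(mentor_feat - project(mentee_feat))))
        loss = alpha * label_loss + beta * rmse
    variables = mentee.trainable_variables + project.trainable_variables
    optimizer.apply_gradients(zip(tape.gradient(loss, variables), variables))
    return loss

Setting beta to zero recovers the independent Mentee; raising it pulls the Mentee’s intermediate representation toward the Mentor’s.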

I have so far tested the idea on MNIST with a VGG-16 model, and the accuracy numbers are interesting. A Mentee supervised by the Mentor network produces much higher accuracy than an independent Mentee. The size chosen for the Mentee definitely affects performance, but this can be tuned based on the computing limitations and the user’s tolerance for model accuracy.

Here is a schematic of the connection between the Mentor and Mentee networks.

The figure makes clear how the Mentee is supervised by the Mentor during a training session. Later on I will share my code, written in TensorFlow, which has more detail about the connection between these two graphs.

Now, how could this solve the mobile issue? Well, you can have a general brain in the cloud that is responsible for learning a really big model representing global knowledge. You use this service through your phone, and you want to inject some information about your usage habits and customize the model for your own needs. You can keep a small representation of the model on the phone and train it in the background while it receives supervision from the Mentor model. As a result, the cloud service provider can focus on the global knowledge, and your device takes care of your own input data.
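To make the split concrete, here is a rough sketch of how an on-device background pass could look. The endpoint URL, the JSON format, and the train_step variant that accepts pre-computed Mentor activations are all hypothetical assumptions, not an existing API:

import numpy as np
import requests

# Hypothetical cloud endpoint serving the Mentor's intermediate activations.
MENTOR_URL = "https://example.com/mentor/features"

def fetch_mentor_features(images):
    """Send a batch to the cloud Mentor and get its activations back."""
    resp = requests.post(MENTOR_URL, json={"images": images.tolist()})
    resp.raise_for_status()
    return np.asarray(resp.json()["features"], dtype=np.float32)

def background_training_pass(local_batches, train_step):
    """One on-device pass over the user's data.

    The heavy Mentor stays in the cloud; the phone only runs the small
    Mentee update (a variant of the earlier train_step that takes the
    Mentor activations as an extra argument).
    """
    for images, labels in local_batches:
        mentor_feat = fetch_mentor_features(images)
        train_step(images, labels, mentor_feat)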

I think we have had enough discussion about the background of the idea. It’s time to get our hands dirty and show how all of this is possible with the technology we have today. The next part of this article will discuss the implementation details of the Mentee-Mentor network.

Leaving to Join EITR Systems

I’m going to work for EITR Systems for the whole summer of ’16. I will be the first technical person working there, and I’m really excited to join a startup from scratch instead of working at a well-established big company. This is an interesting opportunity to learn all the different aspects of a young business, whether technical issues or even communication and expense issues. We are going to develop a unique and inexpensive product that can easily handle the data replication problem in current data centers. This problem is going to get bigger, especially as 3D XPoint devices are on their way to market. With these new storage devices in use, the data center community is concerned about how to achieve both a replication guarantee and high-throughput, low-latency data access. My goal is to come up with a final product that is a complete solution to the problems mentioned above.

I used to work for software development companies while doing my bachelor’s, and I know how it feels to do something simply straightforward for a long time, with no particular innovation or challenge. It may be interesting for some people, but not for me.

I started working on this startup about a year ago with Dr. Raju Rangaswami. It was simple: we came up with a fun idea in the Storage Systems class, and I followed up with him to see how we could extend the idea and get a paper out of it. After several weeks of discussion, we arrived at a modified version which was interestingly pure and unique. We found ourselves with a great solution to the data center replication issue. Since then, we have been researching the required hardware and software components, and after one year the money was successfully raised. It is sufficient for the whole summer, and the mission is to come up with a working prototype by the end of the summer, in order to move to the second stage of fundraising.

Now that I compare my current work with the work I was doing previously, I can easily say there is a huge difference in every aspect you can think of. Running a startup and developing something completely new is a big challenge. You need to learn a lot of things, and sometimes talk to knowledgeable people in different areas to get reliable information. It is hardly like working at a well-established company with strongly defined objectives. The process is hard enough that it makes a thinker and innovator out of you. You learn not to do the things everyone else is doing. You learn to put the work first, instead of money and profit. Of course, we do all this hoping for a huge return, but at the beginning the main assignment is to build something that earns people’s attention and appreciation.

Right now, I’m geared up to fully dedicate myself to this journey and see what happens next!