A short overview of the problems with AI model deployment and the pain points of moving deep learning models into a production environment.
Over the last two years I have had the chance to speak with almost 50 developers and executives working in the field of AI with one mission: to get their perspective on the problems with AI model deployment in production. It was a great opportunity to talk to such inspiring people, and I also learned a lot. I would therefore like to share some treasured insights in this article. I hope you enjoy it.
Two years ago I graduated from Imperial College London. At that time, I had the chance to be part of BICI Lab, working on advanced deep learning algorithms for healthcare. Having obtained plausible final results, I decided to share my work with business people in the form of a sample app where users could test the solution on their own data, rather than just read the paper or the code (which was hard to understand even for people in the field of AI). Unfortunately, it turned out not to be an easy task; even for my friends working at big companies it was a challenge to make it happen.
Since then I have pursued my passion for fast and easy AI model deployment. While investigating different solutions and approaches, I tried to reach people who deal with this topic daily.
Some of you might ask what exactly AI model deployment is. To clarify, “AI model deployment” is a process that starts when a model is trained and ends when that model is running in a mobile app or on a webpage. Behind this process there is a lot of iterative and very often arduous work.
Google engineers have pointed out in one of their papers that the ML code is only a small fraction of the whole ecosystem. The process of deployment plays a big role. It consists of many components, such as resource management, system monitoring, versioning, and serving infrastructure.
From the architecture point of view, one of the biggest problems that AI engineers face nowadays is the maintenance of models running in the production environment. Collecting events about a model’s behavior is one thing, but extracting valuable information from them is still the tricky part. Sometimes it is hard to keep models up to date with business assumptions.
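To make the monitoring point a bit more concrete, here is a minimal sketch of one way to turn logged events into a signal. It assumes the only thing we log is the model's prediction confidence, and it simply checks how far the recent average has moved from a baseline. The function name `drift_score`, the sample numbers, and the threshold of 3.0 are all illustrative assumptions, not a recommended production setup.

```python
from statistics import mean, stdev


def drift_score(baseline: list[float], recent: list[float]) -> float:
    """Return how many baseline standard deviations the recent mean has moved."""
    spread = stdev(baseline)
    if spread == 0:
        # A constant baseline: any change at all counts as unbounded drift.
        return 0.0 if mean(recent) == mean(baseline) else float("inf")
    return abs(mean(recent) - mean(baseline)) / spread


# Hypothetical confidence values logged at release time and during the last week.
baseline_conf = [0.91, 0.89, 0.93, 0.90, 0.92]
recent_conf = [0.71, 0.68, 0.74, 0.70, 0.69]

score = drift_score(baseline_conf, recent_conf)
needs_review = score > 3.0  # the threshold is an arbitrary assumption
print(f"drift score: {score:.1f}, needs review: {needs_review}")
```

A check like this says nothing about why the model changed its behavior, which is exactly the tricky part; it only tells us when it is worth looking.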
Since the Machine Learning workflow is a fast-paced environment, versioning models and datasets takes a lot of effort, especially when we need to restore a previous configuration. Keeping all versions up to date in a clear and understandable way is key to acting fast.
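One simple way to think about versioning is to tie every model artifact to the exact dataset and configuration that produced it. The sketch below does this with content hashes from the Python standard library. The names `fingerprint` and `make_version_record` and the stand-in byte strings are assumptions made for illustration; real projects usually rely on a dedicated versioning tool.

```python
import hashlib
import json


def fingerprint(data: bytes) -> str:
    """Return a short, deterministic identifier for a blob of bytes."""
    return hashlib.sha256(data).hexdigest()[:12]


def make_version_record(weights: bytes, dataset: bytes, config: dict) -> dict:
    """Tie one model artifact to the dataset and config that produced it."""
    # sort_keys makes the config hash independent of key order.
    config_bytes = json.dumps(config, sort_keys=True).encode("utf-8")
    return {
        "model": fingerprint(weights),
        "dataset": fingerprint(dataset),
        "config": fingerprint(config_bytes),
    }


record = make_version_record(
    weights=b"fake-weights-v1",           # stand-in for a serialized model file
    dataset=b"id,label\n1,cat\n2,dog\n",  # stand-in for a training CSV
    config={"lr": 0.001, "epochs": 10},
)
print(record)
```

Because the identifiers are derived from the content itself, restoring a previous configuration comes down to looking up the three hashes stored in an old record.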
When it comes to algorithms, the next hard part is model optimization. Many popular Machine Learning and Data Science frameworks, like scikit-learn, are simply not prepared for the production environment. Engineers need to spend a lot of time understanding the code created by scientists and optimizing each part according to requirements.
It is also important to mention that most of those frameworks are written in Python, where threads do not run Python code truly in parallel because of the Global Interpreter Lock (GIL) in the standard CPython interpreter (for heavy tech details please visit the following article). This is a problem of parallelism. Very often these models end up rewritten in C or some hardware-specific language, with only the weights left the same.
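When a model is rewritten in another language and only the weights stay the same, those weights have to travel in a format the new code can read. The sketch below assumes a made-up binary layout (a 4-byte count followed by little-endian float32 values) that a C program could load with a couple of `fread` calls. The layout and the names `export_weights` and `import_weights` are illustrative assumptions, not a standard format.

```python
import io
import struct


def export_weights(weights: list[float], out) -> None:
    """Write a uint32 count followed by that many little-endian float32 values."""
    out.write(struct.pack("<I", len(weights)))
    out.write(struct.pack(f"<{len(weights)}f", *weights))


def import_weights(src) -> list[float]:
    """Read back the layout written by export_weights."""
    (count,) = struct.unpack("<I", src.read(4))
    return list(struct.unpack(f"<{count}f", src.read(4 * count)))


# An in-memory buffer stands in for a file on disk.
buffer = io.BytesIO()
export_weights([0.5, -1.25, 3.0], buffer)

buffer.seek(0)
restored = import_weights(buffer)
print(restored)
```

Note that float32 cannot represent every Python float exactly, so in general the restored values are close to the originals rather than identical; the three values used here happen to survive the round trip unchanged.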
Going further into hardware, there is a problem with hardware compatibility when we want to convert our model from a specific framework to an edge or mobile device. A lack of hardware-specific libraries slows down product delivery, sometimes by as much as a couple of weeks.
Last but not least, from the business point of view there is a lack of Machine Learning Operations (MLOps) specialists: the people who stand between the Machine Learning team and the operations engineers, who are able to manage the production lifecycle and understand distributed systems. Because Machine Learning is emerging in practically all areas of our lives, there will be growing demand for engineers who specialize in taking Machine Learning to production.
The cost structure provided by many platforms is not clear or easy to understand. It is very hard to predict and plan expenditure. Sometimes even a simple estimate is hard to produce, which costs many platforms their customers.
The more options a platform has, the better documented it should be, to give a full overview of all its possibilities and advantages over the competition. However, here we also have a gap: weak code documentation. There is a lack of tutorials, examples, and potential use cases with a short explanation.
On the market, there is a lack of platforms able to deploy models for mobile applications, where the hardware specification plays an important role, since models run on the device and not in the cloud. Many platforms provide a lot of tools for general model deployment; however, only a couple of them are domain-specific, e.g. for the healthcare or energy sector.
Popular optimization-for-hardware libraries can handle only a small fraction of popular frameworks, which is inconvenient when we deal with big infrastructures and models.
Switching from one platform to another also plays a big role. We always have to keep in mind how much time it will take to find a new tool, how much time it will take to transfer the current solution to it, how much time we have left before the project deadline, and whether the new tool has all the features we need.
All those remarks play an important role. However, we have to remember that at the end of the day the most important thing for the customer is that it works. Speed and efficiency are no guarantee of success, so it’s important to deliver.
To sum up, we might ask: is the whole AI model deployment process a problem (a matter of doubt and uncertainty) or a challenge (an opportunity for success and growth)?
In my opinion, it is a challenge worth exploring in more detail. My final thoughts are as follows.
Currently, there is a small concern about who should be responsible for deployments and later maintenance. In most cases, ML Engineers / AI developers are not responsible for AI model deployment. Backend engineers are the ones who deal with it.
Future platforms have to be transparent about costs and able to version models. There is a need for software able to optimize models for specific hardware, e.g. mobile devices, edge devices, or websites.
Finally, inspired by Leonardo da Vinci’s quote “simplicity is the ultimate sophistication”, I have to admit that there is also a place for simple and easy-to-use tools centered around one particular field of industry or task, allowing fast prototyping and testing.
That is what inspired us to build Syndicai - a platform that solves problems with AI model deployment.