Containerizing a Machine Learning Model in Docker

Swati Dhoke
5 min read · Jun 1, 2021


In this article we are going to build a machine learning model that estimates a person's salary from their years of experience, and run it inside a Docker container.

Before building the model, we should be familiar with three terms: Machine Learning, Docker and Containerization.

What is Machine Learning?

Machine learning (ML) is a method of data analysis that automates analytical model building. Instead of being explicitly programmed for a task, the computer discovers how to perform it from examples. It is a branch of Artificial Intelligence based on the idea that systems can learn from historical data, identify patterns and make decisions with minimal human intervention.

What is Docker?

Docker is a containerization platform that packages an application and all its dependencies together in the form of containers, so that the application works the same way in any environment, whether development, test or production. It is designed to make it easier to create, deploy and run applications by using containers. By taking advantage of Docker’s methodologies for shipping, testing, and deploying code quickly, you can significantly reduce the delay between writing code and running it in production.

What is Containerization?

Containerization is OS-level virtualization that creates multiple isolated units in user space, known as containers. Containers run on top of the shared operating system kernel of the underlying host machine. With containers you don’t have to pre-allocate RAM: memory is taken from the host on demand as the container runs (and can optionally be capped). With VMs you first allocate memory and then create the virtual machine. As a result, containers use resources more efficiently than VMs and start much faster, because no separate guest operating system has to boot.

Containers are able to run virtually anywhere, which greatly eases development and deployment: on Linux, Windows, and Mac operating systems; on virtual machines or bare metal; on a developer’s machine or in on-premises data centers; and of course, in the public cloud.

Overview of the Task:

  1. Pull the CentOS container image from Docker Hub and create a new container.
  2. Install Python inside the Docker container.
  3. Inside the container, copy or create the machine learning model that we built in a Jupyter notebook.

Steps to Complete the Task:

Step 1 : Configure Docker

yum reads repository definitions only from /etc/yum.repos.d/, so we first create a file named docker.repo in that directory using vim (a text editor) to configure the Docker repository.

vim /etc/yum.repos.d/docker.repo

Once vim opens, press ‘i’ to switch to insert mode and type the repository configuration shown in the image below. Press the ‘Esc’ key to leave insert mode, then type ‘:wq’ and press Enter to save the file and exit.

Step 2 : Install and start the Docker service

yum install docker-ce --nobest
systemctl start docker
systemctl enable docker
systemctl status docker

Step 3 : Pull the CentOS image from Docker Hub

docker pull centos:latest

Step 4 : Launch a new container named task1 from the CentOS image

docker run -it --name task1 centos:latest

The ‘-it’ option combines two flags: ‘-i’ (interactive) keeps standard input open, and ‘-t’ allocates a pseudo-terminal. Together they launch the new container and drop us into a shell inside it.

Step 5 : Install Python inside the new container

yum install python3

Step 6 : Install the Python libraries required for training the ML model

pip3 install pandas
pip3 install scikit-learn

Two more libraries, NumPy and joblib, are required for training our machine learning model, but we don’t need to install them separately: pip installs NumPy as a dependency of pandas and joblib as a dependency of scikit-learn.
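One quick way to confirm that all four libraries are available is a short check run with python3 inside the container. This is only a sketch: the helper name `missing_modules` is made up for illustration. Note that scikit-learn is imported under the module name `sklearn`.

```python
import importlib.util

# Import names of the libraries the model needs.
# scikit-learn is installed as "scikit-learn" but imported as "sklearn".
REQUIRED = ["pandas", "sklearn", "numpy", "joblib"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
print("Missing modules:", ", ".join(missing) if missing else "none")
```

If the script reports anything other than "none", install the missing package with pip3 before continuing.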

Step 7 : Copy the dataset from Docker host to Docker container

A machine learning model needs historical data to learn from, so we copy the dataset from the base OS (Red Hat in this case) into the CentOS container.

Run the command below on the base OS, not inside the container.

Syntax: docker cp <src> <container_name>:<destination>

docker cp /root/SalaryData.csv task1:/root

After the copy, the ‘SalaryData.csv’ dataset file is present in /root inside the CentOS container.
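To double-check the copy from inside the container, a small script like the following can print the header and the first few rows of the file. It is a sketch using only the standard library; the function name `preview_csv` is hypothetical, and the path assumes the file was copied to /root as above.

```python
import csv
from pathlib import Path

# Location the dataset was copied to with "docker cp".
DATA_PATH = Path("/root/SalaryData.csv")


def preview_csv(path, limit=5):
    """Return the header row and up to `limit` data rows of a CSV file."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        rows = [row for _, row in zip(range(limit), reader)]
    return header, rows


if __name__ == "__main__":
    if DATA_PATH.exists():
        header, rows = preview_csv(DATA_PATH)
        print("Columns:", header)
        for row in rows:
            print(row)
    else:
        print(f"Dataset not found at {DATA_PATH}")
```

The column names it prints are the ones the training code has to refer to.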

Step 8 : Create the Python files containing the code for the ML model

vim model.py
vim salary_predictor.py
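The original code for these files is not reproduced here, so the following is only a minimal sketch of what a training script such as model.py might contain. It assumes a simple linear regression, a CSV with columns named `YearsExperience` and `Salary`, and an output file called `salary_model.pkl`; all three are assumptions, so adjust them to match your dataset.

```python
from pathlib import Path

import joblib
import pandas as pd
from sklearn.linear_model import LinearRegression

# Assumed locations and column names (not taken from the original post).
DATA_PATH = Path("/root/SalaryData.csv")
MODEL_PATH = Path("/root/salary_model.pkl")  # hypothetical file name
FEATURE_COLUMN = "YearsExperience"           # assumed CSV header
TARGET_COLUMN = "Salary"                     # assumed CSV header


def train(frame):
    """Fit a linear regression of salary on years of experience."""
    features = frame[[FEATURE_COLUMN]]  # 2-D table with one feature column
    target = frame[TARGET_COLUMN]       # 1-D series of salaries
    model = LinearRegression()
    model.fit(features, target)
    return model


def train_and_save(data_path, model_path):
    """Train on the CSV at data_path and store the fitted model with joblib."""
    model = train(pd.read_csv(data_path))
    joblib.dump(model, str(model_path))
    return model


if __name__ == "__main__":
    if DATA_PATH.exists():
        train_and_save(DATA_PATH, MODEL_PATH)
        print(f"Model saved to {MODEL_PATH}")
    else:
        print(f"Dataset not found at {DATA_PATH}")
```

Saving the fitted model with joblib means the prediction script can load it without retraining on every run.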

Step 9 : Run the code

Run the code with the command below. The program asks you to enter the years of experience and then prints the predicted salary.

python3 salary_predictor.py

Our machine learning model is trained on the dataset, and the program prompts the user for their years of experience and predicts the corresponding salary.
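A prediction script along these lines could look like the sketch below. It is not the original salary_predictor.py: it assumes a model previously saved with joblib under the hypothetical name `salary_model.pkl` and trained on a feature column called `YearsExperience`.

```python
from pathlib import Path

import joblib
import pandas as pd

# Assumed to match whatever the training script used (hypothetical names).
MODEL_PATH = Path("/root/salary_model.pkl")
FEATURE_COLUMN = "YearsExperience"


def parse_years(text):
    """Convert user input to a non-negative number of years."""
    years = float(text.strip())
    if years < 0:
        raise ValueError("years of experience cannot be negative")
    return years


def predict_salary(model, years):
    """Ask the fitted model for the salary at the given experience."""
    features = pd.DataFrame({FEATURE_COLUMN: [years]})
    return float(model.predict(features)[0])


if __name__ == "__main__":
    if not MODEL_PATH.exists():
        print(f"No saved model at {MODEL_PATH}; train and save one first.")
    else:
        model = joblib.load(str(MODEL_PATH))
        years = parse_years(input("Enter years of experience: "))
        print(f"Estimated salary: {predict_salary(model, years):.2f}")
```

Validating the input before calling the model keeps a typo from surfacing as a confusing error inside scikit-learn.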

Task Successfully Completed!!!

So, this is how we containerize a machine learning model in Docker.

Thanks for reading the article.
