Posts

Cross Validation and its Importance

In this post, we will take a look at cross validation and its importance in building machine learning models. Overview: There are multiple approaches to splitting a dataset for training a model and estimating its accuracy. One common approach is to split the dataset into three sets: a training set, a validation set and a test set. The training set is used for training the model, the validation set is used for choosing the best model from among many candidates, and the test set is finally used to estimate the model's performance on unseen data. Now, what if our split placed some important data points in the validation and test sets? The model would never be trained on those data points, which would result in a poorer model. To alleviate this, we use cross validation. There are multiple flavors of cross validation, viz., K-fold cross validation, stratified cross validation and leave-one-out cross validation (LOOCV). K-Fold cross validation:
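As a rough illustration of the K-fold idea, here is a minimal sketch in plain Python. It assumes the usual definition of K-fold (the data is cut into k roughly equal folds, and each fold takes one turn as the validation set while the remaining folds form the training set). The function name `k_fold_indices` is hypothetical and is not taken from the post, and no shuffling is done.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) once per fold.

    Hypothetical helper for illustration only; assumes the standard
    K-fold scheme with contiguous, unshuffled folds.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # The first (n_samples % k) folds get one extra sample,
    # so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(10, 5))
for i, (train, validation) in enumerate(folds):
    print(f"fold {i}: train={train} validation={validation}")
```

Every sample lands in a validation fold exactly once, so no data point is permanently held out of training across the k rounds.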

My External Publications

Here is a list of my publications on external websites: 1. An article on the Policy Iteration algorithm, explained using a simple game involving a pirate ship with rewards and benefits. This algorithm is part of Reinforcement Learning, a branch of Machine Learning. Title:  Policy Iteration in RL: A step by step Illustration Link:  https://towardsdatascience.com/policy-iteration-in-rl-an-illustration-6d58bdcb87a7

Difference between port and targetPort in Kubernetes

Overview: In the previous post, we deployed a simple microservice to Kubernetes. In this post, we will take a look at the various port mappings available for an application in Kubernetes and the differences between them. Port Mappings: Let us take a look at the service yaml file created in the previous post. You may be wondering why there are so many port mappings in the yaml file and what each of them means. There are two port mappings available, port and targetPort. There is also a third option: when we use the Service type NodePort, we can also provide a specific nodePort to be used. Explanation of Ports: NodePort: When we use the NodePort Service type, Kubernetes assigns a static port through which external clients can access the service. In our example, it is 31869. Port: Port (8086) is the port through which other services on the same node or on other nodes within the cluster can access the service. TargetPort:

Deploying a Spring Boot App on Kubernetes

Overview: In this post, I will explain how to deploy a simple Spring Boot application onto Kubernetes. I will be using MicroK8s for the Kubernetes environment. For a quick guide on how to set up MicroK8s, refer here. Spring Boot App Docker Image: Let us create a simple Spring Boot application and dockerize it. The source can be downloaded from the GitHub repo here. Build the jar file and then dockerize it. I will be using the docker image from local, but we can also push/pull this image from a Docker repository. The Docker command to dockerize the app is: docker build -t basic-microservice:local . Run the command 'docker images' and you should be able to see the docker image 'basic-microservice:local'. I have tagged the image as 'local' and not as 'latest'. This is because we are using local images for Kubernetes, and MicroK8s has a workaround for using local images. Follow the instructions here to upload the docker image to the MicroK8s cache.

Kubernetes Basics

Overview: Kubernetes has become the leader in container orchestration, and a working knowledge of Kubernetes is essential for every developer. In this post I am going to cover some very basic concepts of Kubernetes. Concepts: A Pod is a logical unit containing one or more containers (more than one if you use a sidecar). It is the smallest execution unit. A ReplicaSet represents the number of instances of a pod which are running. It is used for scaling pods. A StatefulSet represents unique pods which retain state while running. We use a StatefulSet when working with stateful workloads like a database cluster or a master-slave configuration. A Deployment is an abstraction used to represent and update Pods and ReplicaSets. We use Deployment as the "kind" in the yaml files while deploying an app in Kubernetes. A Service is used for defining how we expose/access an application within a Pod. Generally when we expose an application in Kubernetes using Service, the

Quick Introduction to Matplotlib

Overview: In data science and machine learning, plots and graphs play a very important role. To understand the data and its characteristics, it is very important to plot the data and make sense of it. Matplotlib is a powerful Python library which is very handy for building effective plots. We will take a brief look at Matplotlib. Scatter Plot: A scatter plot places two variables along the X and Y axes, one on each, and provides a view of the relationship or correlation between them. Let us create two variables, where one variable has an exponential relationship with the other (i.e., the inverse relationship is logarithmic). Let us hypothetically assume that age and salary are related exponentially, so as age increases, salary increases exponentially. Let us create two dataframes with this in mind and plot them. Note how we added labels for the X and Y axes and also a title for the plot. Line Plot: We can also plot two variables and draw a line to reflect the relat

Quick Introduction to Pandas Library

Overview: Pandas is a Python library which provides very powerful tools to perform complex data manipulation and analysis. In this article, a few of the commonly used operations are explained. A common data structure is the 'DataFrame'. A DataFrame is a matrix-like data structure with easy access to rows and columns. Creation of DataFrame: Accessing data from DataFrame: Data in a DataFrame can be accessed in various ways. Data can also be accessed by slicing the DataFrame instead of looping over it, which is generally the more efficient way of accessing data. Creating DataFrame with Index: We can create a DataFrame with an index. Let us create a DataFrame with date as the index. As we can see from above, the values are filled as NaN (Not a Number). We can replace NaN with zero using the fillna() function.
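The date-indexed DataFrame and the fillna() step described above could look roughly like the sketch below. The column name `sales`, the dates and the values are made up for illustration and are not the data used in the original post. Here the NaN values come from reindexing onto dates that have no data, which is one common way they arise.

```python
import pandas as pd

# Hypothetical data: three days of values, indexed by date.
dates = pd.date_range("2021-01-01", periods=5, freq="D")
df = pd.DataFrame({"sales": [10, 20, 30]}, index=dates[:3])

# Access by slicing instead of looping: label-based slice on the date index.
first_two = df.loc["2021-01-01":"2021-01-02"]

# Reindexing onto all five dates leaves the two new rows as NaN.
df = df.reindex(dates)
missing = int(df["sales"].isna().sum())

# Replace NaN with zero.
filled = df.fillna(0)

print(first_two)
print(f"missing values before fillna: {missing}")
print(filled)
```

Note that fillna() returns a new DataFrame by default, so the result has to be assigned to a variable, as done with `filled` here.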