Quick Introduction to Matplotlib
In data science and machine learning, plots and graphs play a very important role. To understand a dataset and its characteristics, it helps to plot the data and look for patterns. Matplotlib is a powerful Python library for building effective plots. This is a brief introduction to it.
A scatter plot places two variables along the X and Y axes, one on each, and gives a view of the relationship or correlation between them. Let us create two variables, where one grows exponentially with the other (equivalently, the second is a logarithmic function of the first).
Let us hypothetically assume that age and salary are related exponentially, so that as age increases, salary increases exponentially. Let us create two dataframes with this in mind and plot them. Note how we added labels for the X and Y axes and also a title for the plot.
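The scatter plot described above could look something like the sketch below. The numbers (starting salary, growth rate, age range) are made up for illustration, and the `Agg` backend is an assumption so that the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: salary grows exponentially with age (values are invented)
age = pd.DataFrame({"age": np.arange(22, 61)})
salary = pd.DataFrame({"salary": 20000 * np.exp(0.05 * (age["age"] - 22))})

plt.figure()                                   # start from a fresh figure
plt.scatter(age["age"], salary["salary"])      # one point per (age, salary) pair
plt.xlabel("Age (years)")                      # label for the X axis
plt.ylabel("Salary")                           # label for the Y axis
plt.title("Age vs. salary")                    # title for the plot
```

In an interactive session, calling plt.show() afterwards displays the figure.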
We can also plot two variables and draw a line to reflect the relationship between them. Matplotlib works with both NumPy arrays and pandas DataFrames. Let us create two variables, one representing weight in kilograms and the other the same weight in pounds. The relationship between these two is linear. Let us plot this. Note how we provided a legend for the line. If there are multiple lines, we can provide a legend entry for each of them.
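A minimal sketch of such a line plot, using NumPy arrays, might look like this. The weight range is arbitrary and the `Agg` backend is again an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

# Illustrative range of weights in kilograms
weight_kg = np.arange(40, 101, 5)
weight_lb = weight_kg * 2.20462  # 1 kg is approximately 2.20462 lb

plt.figure()
plt.plot(weight_kg, weight_lb, label="kg to lb")  # label is what the legend shows
plt.xlabel("Weight (kg)")
plt.ylabel("Weight (lb)")
plt.title("Kilograms vs. pounds")
plt.legend()                                      # draw the legend for labeled lines
```

The label passed to plt.plot() is what plt.legend() picks up, so each additional line only needs its own label.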
Multiple Line Plots in a Graph:
We can draw multiple lines in a graph. This helps us visualize how multiple variables move relative to each other. Let us take the same age and salary data we plotted earlier, but instead of plotting age against salary, let us plot an age line and a salary line independently. We will use a date as the index just to see how to plot indexed dataframes.
Note how we used a legend entry for each of the lines, and that the position of the legend can be customized. The line colors have been customized as well.
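One way to sketch this, assuming a made-up daily date index and salary expressed in thousands so that both lines fit on one axis:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical date index and invented values, for illustration only
age = np.arange(22, 61)
dates = pd.date_range("2020-01-01", periods=len(age), freq="D")
df = pd.DataFrame(
    {"age": age, "salary_thousands": 20 * np.exp(0.05 * (age - 22))},
    index=dates,
)

plt.figure()
# Each plt.plot() call adds one line to the same axes
plt.plot(df.index, df["age"], color="green", label="Age")
plt.plot(df.index, df["salary_thousands"], color="red", label="Salary (thousands)")
plt.xlabel("Date")
plt.title("Age and salary over time")
plt.legend(loc="upper left")  # loc controls where the legend is placed
```

Other values for loc, such as "lower right" or "best", move the legend elsewhere.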
A final note on using Matplotlib on Unix systems. Typically we might want to save images to the file system using plt.savefig(). However, when you invoke this multiple times for different plots, the plots tend to be drawn on top of each other, because plt keeps drawing on the current figure until it is cleared. To avoid this, clear the canvas after saving each graph by calling plt.clf().
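The save-then-clear pattern can be sketched as follows. The file names and the temporary output directory are hypothetical, chosen only so the example is self-contained.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt

out_dir = tempfile.mkdtemp()  # hypothetical output location

plt.figure()

# First plot: draw, save, then clear the current figure
plt.plot([1, 2, 3], [1, 4, 9])
first_path = os.path.join(out_dir, "first.png")
plt.savefig(first_path)
plt.clf()

# Second plot starts on an empty canvas because of the clf() above
plt.plot([1, 2, 3], [3, 2, 1])
lines_in_second = len(plt.gca().get_lines())  # only the second line is present
second_path = os.path.join(out_dir, "second.png")
plt.savefig(second_path)
plt.clf()
```

Without the first plt.clf(), the second image would contain both lines.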