It is all about the distance to the local min/max point.
Gradient descent is used mainly in AI applications. I will not bother you with all the details of it. Instead, I will answer a question that I asked myself before: how can I create a loss graph without using any framework like TensorFlow or PyTorch? I asked this to better understand what is going on in the AI world. So in this post you will get a Python script which produces a graph for the loss function and dynamically draws the related results on the mathematical equation.
So when you play with an AI framework like YOLOv4, it can produce a graph which shows the quality of your model, like:
I produced this graph; you can access the whole story from this GitHub link.
As you see in the graph, the x axis is the iteration and the y axis is the difference, i.e. the loss value. Understanding this is important for using the gradient descent method to produce a graph like that.
I chose a function: f(x) = x^4 - 3x^3.
We will find the local minimum point of this function using the gradient descent method. You cannot choose just any function, because for gradient descent the function must be differentiable and continuous (there are a few more limitations, but this is enough to know for this post).
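For reference, here are that function and its first derivative as plain Python; these are the same definitions the full script below uses:

# f(x) = x^4 - 3x^3
def f(x):
    return x**4 - 3*x**3

# first derivative of f
def f_prime(x):
    return 4*x**3 - 9*x**2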
Gradient descent, which is an iterative method, uses the first derivative of the function. You choose a point to start from, and according to the value that the gradient descent update gives, the point moves toward the local min/max point. This movement repeats as many times as you iterate. If you pick a starting point far away from the local min/max, the difference will be huge at first. Difference here means loss. The loss will be huge for the first iterations and later become so small that it is not easy to tell the points apart. If you iterate with a small step and draw the difference versus the iteration number, you finally get something like:
So what is this “difference” or “loss”?
According to the gradient descent method, the first derivative of the function is evaluated at the current point and multiplied by the step (you choose this). The step should be small, but the smaller it is, the longer the whole process takes and the more CPU time it consumes. Then this number, the step multiplied by the result of the first derivative (the right-hand side of the gradient descent update), is subtracted from the current value (for the first iteration, that is the starting point you chose). If you choose the starting point far away from the local min/max, this result will be high, and after some iterations it will be really small. At some point you have to decide whether the iteration is enough or not. This is exactly where the loss data becomes parallel to the x axis (saturation). And that is what it means for your AI model to be trained well.
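To make that concrete, here is a minimal, self-contained sketch of just the update rule, using the same derivative, starting point (6) and step (0.00009) as the full script below; the printed value is exactly the difference/loss described above:

def f_prime(x):                # first derivative of f(x) = x^4 - 3x^3
    return 4*x**3 - 9*x**2

x_new = 6                      # starting point, far from the local minimum
eps = 0.00009                  # the step
for i in range(5):             # just the first few iterations
    x_old = x_new
    x_new = x_old - eps * f_prime(x_old)   # the gradient descent update
    print(i, abs(x_new - x_old))           # the difference, i.e. the loss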
For instance:
In the above graph, the red line is the function itself. As you see, there is a local minimum point between 2 and 3. I chose the starting point for the iteration up around (6, 600) to see the movement better. For the first steps the distance between the black dots is big, as you see. Further on it becomes so small that you cannot see it here. That means the difference is really small, which is a hint that we are really close to the local min/max. Great! (The loss, or difference, is the distance between the black points.)
During this process, if you draw another graph showing the difference versus the iteration (the difference being the loss, i.e. the result from the gradient descent update), you will see a loss graph like the one YOLOv4 or another AI model may produce during training.
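Stripped down to its essentials (a sketch using the same parameters as the full script below), that second graph is nothing more than the collected differences plotted against the iteration counter:

import matplotlib.pyplot as plt

def f_prime(x):
    return 4*x**3 - 9*x**2

x_old, x_new = 0, 6
eps, precision = 0.00009, 0.00000001
diffs = []
while abs(x_new - x_old) > precision:
    x_old = x_new
    x_new = x_old - eps * f_prime(x_old)
    diffs.append(abs(x_new - x_old))

plt.plot(range(1, len(diffs) + 1), diffs, 'r-')
plt.xlabel("iteration")
plt.ylabel("difference - abs(xNew - xOld)")
plt.show()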
So how can we do that with Python? It is quite simple actually. I share the code here; just copy and run it. You only need to install matplotlib and numpy, nothing more! Then you will see the black dots and the loss progress drawn dynamically at the same time. If you change parameters like the step or the equation, it may not be easy to reproduce these exact shapes; you will have to spend some time on it. (I currently run this code with python3.)
import matplotlib.pyplot as plt
import numpy as np

xPrimeData = []
yPrimeData = []
xData = []
yData = []
xDataB = []
yDataB = []

fig, (ax1, ax2, ax3) = plt.subplots(1, 3)

x_old = 0
x_new = 6               # starting point
eps = 0.00009           # step size
precision = 0.00000001  # stop when the difference gets this small

ax1.set_xlim(-100, 6000)
ax1.set_ylim(-0.001, 0.050)
line, = ax1.plot(xPrimeData, yPrimeData, 'r-')

ax2.set_xlim(0, 6)
ax2.set_ylim(-10, 600)
line2, = ax2.plot(xData, yData, 'r-')

# maximize the window
manager = plt.get_current_fig_manager()
manager.full_screen_toggle()

# f(x) = x^4 - 3x^3
def f(x):
    return x**4 - 3*x**3

# draw the function itself on the third plot
ax3.set_xlim(0, 6)
ax3.set_ylim(-10, 600)
xDataB = np.linspace(0, 6, 1000, endpoint=False)
for a in xDataB:
    yDataB.append(f(a))
line3, = ax3.plot(xDataB, yDataB, linewidth=3, color='red')

# first derivative of f
def f_prime(x):
    return 4 * x**3 - 9 * x**2

it = 0
while abs(x_new - x_old) > precision:
    x_old = x_new
    x_new = x_old - eps * f_prime(x_old)   # the gradient descent update
    it = it + 1
    #print(x_old, " ", x_new, " ", f_prime(x_old), " ", eps*f_prime(x_old), " ", abs(x_new - x_old))
    #print(x_new, " ", f(x_new))
    xPrimeData.append(it)
    yPrimeData.append(abs(x_new - x_old))  # the difference, i.e. the loss
    line.set_xdata(xPrimeData)
    line.set_ydata(yPrimeData)
    xData.append(x_new)
    yData.append(f(x_new))
    line2.set_xdata([x_new])
    line2.set_ydata([f(x_new)])
    ax1.plot(xPrimeData, yPrimeData, linewidth=3, color='red')
    ax1.title.set_text('iteration vs abs(xNew - xOld) \n xNew = xOld - eps * fPrime(xOld) \n cost function')
    ax1.set_xlabel("iteration")
    ax1.set_ylabel("difference - abs(xNew - xOld)")
    ax2.plot(xData, yData, linewidth=3, color='red')
    ax2.title.set_text('Function iteration - x vs y')
    ax2.set_xlabel("x")
    ax2.set_ylabel("y")
    ax3.plot(xData, yData, marker=".", markersize=6, linestyle='', color='black')
    ax3.title.set_text('Function iteration on the \n drawn function with dots \n f(x) = x^4-3x^3')
    ax3.set_xlabel("x")
    ax3.set_ylabel("y")
    # comment these two lines out to skip the animation and get the result quickly
    plt.draw()
    plt.pause(0.000001)

print("local minimum occurs at", x_new, " ", it)
plt.show()
When you run the code you will see output like this. It will be really slow and CPU consuming, so do not worry. If you want to see the result quickly, just comment out the plt.draw() and plt.pause(0.000001) lines inside the loop.
Code is here.
Dynamically it looks like this: