In the last article, I talked about two new abstract types: Kriging and RadialBasis. With these new types, I managed to construct the respective surrogates, and they worked quite well! However, two problems still remained:
How to effectively sample the space?
How to choose the points at which to evaluate my expensive objective function, so that I both "fill" the space and search for the minimum of the surrogate?
I managed to solve both problems, as you can see in my repository: here and here.
I created many sampling algorithms, thanks to the suggestions of my mentor:
In this way I created a sample s whose first dimension is scaled between -1.0 and 4.5 and whose second dimension is scaled between -5.0 and 7.0.
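The original Julia snippet is omitted above, but the idea can be sketched in Python. Note that the function name `sample` and the plain uniform sampler below are my own stand-ins for illustration, not the package's actual API:

```python
import random

def sample(n, lb, ub, seed=0):
    """Draw n uniform random points, scaling each dimension d to [lb[d], ub[d]].
    (`sample` is a hypothetical stand-in, not the actual package function.)"""
    rng = random.Random(seed)
    return [tuple(lo + rng.random() * (hi - lo) for lo, hi in zip(lb, ub))
            for _ in range(n)]

# First dimension scaled between -1.0 and 4.5, second between -5.0 and 7.0:
s = sample(5, lb=[-1.0, -5.0], ub=[4.5, 7.0])
```

Each dimension is scaled independently from the unit interval to its own bounds, which is what makes mixing very different ranges in one call so convenient.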
I am quite happy with how this turned out because the syntax is indeed very straightforward.
Now that we have the samples, how do we proceed? That is, how do I pick which points to add to the surrogate? I certainly cannot add all the sampled points, because evaluating the objective function we want to approximate takes a long time.
Well, it turns out there are a lot of optimization methods to try, check them out in the section below!
For the development of this section, I took inspiration from the following sources:
"A stochastic radial basis function method for the global optimization of expensive functions" by Regis and Shoemaker.
Following those, I devised two optimization methods, both of which currently work in one and in multiple dimensions: SRBF and LCBS.
The main theme of the first method, SRBF, is to minimize the following convex combination:

$$w \, \frac{s(x) - s_{\min}}{s_{\max} - s_{\min}} + (1 - w) \, \frac{d_{\max} - d(x)}{d_{\max} - d_{\min}}$$

Where $w \in [0, 1]$, $s(x)$ is the surrogate value at $x$, and $d(x)$ is the distance between $x$ and the nearest sampled point.
$s_{\min}$ and $s_{\max}$ are the minimum and maximum value of the surrogate at the sampled points, while $d_{\min}$ and $d_{\max}$ are the minimum and maximum distance between the sampled points and the evaluated point.
It is clear that a big value of $w$ gives importance to the surrogate values, leading to discovery of the minimum. On the other hand, a small value of $w$ gives importance to points that are far away from the others, leading to exploration of the sampled space.
It is also important to avoid choosing points that are too close to each other, because this would lead to singularities in the matrix of coefficients; the paper suggests enforcing a minimum distance tolerance between selected points.
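A minimal 1-D Python sketch of this selection rule, under my own naming (the function and its signature are illustrative, not the package's implementation): candidates too close to an existing sample are skipped, and the convex combination above is minimized over the rest.

```python
import math

def pick_next_point(candidates, surrogate, samples, w, tol=1e-3):
    """Pick the candidate minimizing w * scaled_value + (1 - w) * scaled_distance,
    per the convex combination above. 1-D sketch; in N dimensions the absolute
    difference would become a Euclidean distance."""
    vals = [surrogate(x) for x in candidates]
    # distance from each candidate to its nearest already-sampled point
    dists = [min(abs(x - p) for p in samples) for x in candidates]
    s_min, s_max = min(vals), max(vals)
    d_min, d_max = min(dists), max(dists)
    best_x, best_merit = None, math.inf
    for x, v, d in zip(candidates, vals, dists):
        if d < tol:
            # too close to an existing sample -> ill-conditioned coefficient matrix
            continue
        v_term = 0.0 if s_max == s_min else (v - s_min) / (s_max - s_min)
        d_term = 0.0 if d_max == d_min else (d_max - d) / (d_max - d_min)
        merit = w * v_term + (1.0 - w) * d_term
        if merit < best_merit:
            best_x, best_merit = x, merit
    return best_x
```

With `w = 1.0` the rule greedily chases the surrogate minimum; with `w = 0.0` it picks the candidate farthest from the existing samples, which is exactly the exploitation/exploration trade-off described above.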
The main theme of LCBS is instead to minimize the following function:

$$LCB(x) = E[y(x)] - k \, \sigma(x)$$

Where $k$ is a fixed value, and $E[y(x)]$ and $\sigma(x)$ are respectively the expected value and the expected error of the surrogate evaluation at $x$. Note that this method can only work with Kriging at the moment, because it relies on that surrogate's statistical interpretation.
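The selection step is short enough to sketch directly. In this hypothetical Python version, `mean` and `std` stand in for the Kriging predictor and its error estimate, which the actual package would supply:

```python
def lcb_pick(candidates, mean, std, k=2.0):
    """Pick the candidate minimizing LCB(x) = E[y(x)] - k * sigma(x).
    `mean` and `std` are stand-in callables for the Kriging surrogate's
    prediction and error estimate; k is an assumed default, not the package's."""
    return min(candidates, key=lambda x: mean(x) - k * std(x))
```

A larger `k` pulls the choice toward points with high predicted error, i.e. toward exploration, while `k = 0` reduces the rule to plain surrogate minimization.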
The same behavior can also be observed in the N-dimensional case.
As a side note: the number of iterations and of samples used here is really too small to give a good picture of the methods' behavior. If you let them run longer, the algorithm will converge toward the minimum while continuing to sample the space.