Ludovico / Aug 12 2019

More optimization methods and future surrogates

In the last article I talked about two optimization methods: SRBF and LCBS.

In the meantime I have successfully coded three other optimization methods: DYCORS, EI and SOP. As I am writing, I have also added another surrogate, LinearSurrogate, which is under review at: https://github.com/JuliaDiffEq/Surrogates.jl/pull/42
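Just to give an idea of how it will be used, here is a quick sketch of building and querying a LinearSurrogate like any other surrogate type (the constructor arguments here are my assumption, the PR has the actual interface):

#Hypothetical usage sketch: check the PR for the exact constructor signature.
using Surrogates
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 8.1]
lb = 1.0
ub = 4.0
my_linear = LinearSurrogate(x, y, lb, ub)
my_linear(2.5) #approximate value at a new point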

It was very low-hanging fruit that I had not gotten around to, so I blitzed it out over the last two days.

Let's now take a look at the three new optimization methods:

DYCORS

The DYCORS optimization method is an extension of the SRBF method. The main change is that each coordinate of a new candidate point is only perturbed with a certain probability, which decreases as the optimization proceeds.

From the pySOT documentation: "The main idea is that many objective functions depend only on a few directions so it may be advantageous to perturb only a few directions. In particular, we use a perturbation probability to perturb a given coordinate and decrease this probability after each function evaluation so fewer coordinates are perturbed later in the optimization."

Below is the code:

function surrogate_optimize(obj,::DYCORS,lb,ub,surrn::AbstractSurrogate,sample_type;maxiters=100,num_new_samples=100)
    x_best = collect(surrn.x[argmin(surrn.y)])
    y_best = minimum(surrn.y)
    sigma_n = 0.2*norm(ub-lb)
    d = length(lb)
    sigma_min = 0.2*(0.5)^6*norm(ub-lb)
    t_success = 3
    t_fail = max(d,5)
    C_success = 0
    C_fail = 0
    for k = 1:maxiters
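        #probability of perturbing each coordinate; it shrinks as the iteration counter k grows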
        p_select = min(20/d,1)*(1-log(k))/log(maxiters-1)
        new_points = zeros(eltype(surrn.x[1]),num_new_samples,d)
        for j = 1:num_new_samples
            w = sample(d,0,1,sample_type)
            I_perturb = w .< p_select
            if ~(true in I_perturb)
                val = rand(1:d)
                I_perturb = vcat(zeros(Int,val-1),1,zeros(Int,d-val))
            end
            I_perturb = Int.(I_perturb)
            for i = 1:d
                if I_perturb[i] == 1
                    new_points[j,i] = x_best[i] + rand(Normal(0,sigma_n))
                else
                    new_points[j,i] = x_best[i]
                end
            end

        end

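        #move candidates that fall outside the box [lb,ub] back towards the sampled region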
        for i = 1:num_new_samples
            for j = 1:d
                while new_points[i,j] < lb[j] || new_points[i,j] > ub[j]
                    if new_points[i,j] > ub[j]
                        new_points[i,j] = maximum(surrn.x)[j] - norm(new_points[i,j] - maximum(surrn.x)[j])
                    end
                    if new_points[i,j] < lb[j]
                        new_points[i,j] = minimum(surrn.x)[j] + norm(new_points[i,j]-minimum(surrn.x)[j])
                    end
                end
            end
        end

        x_new = select_evaluation_point_ND(new_points,surrn,k,maxiters)
        f_new = obj(x_new)

        if f_new < y_best
            C_success = C_success + 1
            C_fail = 0
        else
            C_fail = C_fail + 1
            C_success = 0
        end

        sigma_n,C_success,C_fail = adjust_step_size(sigma_n,sigma_min,C_success,t_success,C_fail,t_fail)

        if f_new < y_best
            x_best = x_new
            y_best = f_new
            add_point!(surrn,Tuple(x_best),y_best)
        end
    end
end

There are two accessory functions:

  • select_evaluation_point_ND
  • adjust_step_size

The former selects the point that minimizes a convex combination of the (scaled) surrogate value and a (scaled) distance criterion, just like in SRBF.
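Conceptually, it does something along these lines (a schematic sketch, not the actual helper in the package, and without the scaling to [0,1] that the real merit uses):

#Schematic sketch of the convex-combination merit: trade off the surrogate prediction
#against the distance from points we have already evaluated.
using LinearAlgebra

function merit_sketch(candidate, surr, evaluated_points, w)
    s_val = surr(candidate)                                          #predicted value
    d_val = minimum(norm(candidate .- p) for p in evaluated_points)  #distance to evaluated points
    #small merit = promising: low predicted value and far from existing samples
    return w * s_val - (1 - w) * d_val
end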

The latter is self-explanatory: given the number of failures and successes, it adjusts the step size sigma.
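It looks roughly like this (a sketch of the usual DYCORS rule, not necessarily the exact code in the package):

#Sketch of the step-size rule: double sigma after t_success consecutive successes,
#halve it after t_fail consecutive failures, and never go below sigma_min.
function adjust_step_size_sketch(sigma_n, sigma_min, C_success, t_success, C_fail, t_fail)
    if C_success >= t_success
        sigma_n = 2 * sigma_n
        C_success = 0
    end
    if C_fail >= t_fail
        sigma_n = max(sigma_n / 2, sigma_min)
        C_fail = 0
    end
    return sigma_n, C_success, C_fail
end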

EI

The EI optimization method stands for: expected improvement.

The main idea is to maximize the expected value of improvement:
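In formulas, using the same quantities that appear in the code below (u(x) is the Kriging prediction at x, std(x) its standard error, f_max the best value observed so far, and eps a small positive constant):

EI(x) = E[ max(f(x) - f_max - eps, 0) ] = (u(x) - f_max - eps) * Phi(z) + std(x) * phi(z),   with z = (u(x) - f_max - eps) / std(x)

where Phi and phi are the CDF and PDF of the standard normal.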

As you can see, an expected value pops up: we can only use this optimization method under a Gaussian process prior, in our case Kriging.

function surrogate_optimize(obj,::EI,lb,ub,krig::Kriging,sample_type;maxiters=100,num_new_samples=100)
        dtol = 1e-3*norm(ub-lb)
        eps = 0.01
        for i = 1:maxiters
            new_sample = sample(num_new_samples,lb,ub,sample_type)
            f_max = maximum(krig.y)
            evaluations = zeros(eltype(krig.x[1]),num_new_samples)
            point_found = false
            new_x_max = zero(eltype(krig.x[1]))
            new_y_max = zero(eltype(krig.x[1]))
            while point_found == false
                for j = 1:length(new_sample)
                    std = std_error_at_point(krig,new_sample[j])
                    u = krig(new_sample[j])
                    if abs(std) > 1e-6
                        z = (u - f_max - eps)/std
                    else
                        z = 0
                    end
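                    #closed-form expected improvement at the candidate point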
                    evaluations[j] = (u-f_max-eps)*cdf(Normal(),z) + std*pdf(Normal(),z)
                end
                index_max = argmax(evaluations)
                x_new = new_sample[index_max]
                y_new = maximum(evaluations)
                diff_x = abs.(krig.x .- x_new)
                bit_x = diff_x .> dtol
                #new_min_x has to have some distance from krig.x
                if false in bit_x
                    #The new_point is not actually that new, discard it!
                    deleteat!(evaluations,index_max)
                    deleteat!(new_sample,index_max)
                    if length(new_sample) == 0
                        println("Out of sampling points")
                        return
                    end
                 else
                    point_found = true
                    new_x_max = x_new
                    new_y_max = y_new
                end
            end
            if new_y_max < 1e-6*norm(maximum(krig.y)-minimum(krig.y))
                return
            end
            add_point!(krig,new_x_max,obj(new_x_max))
        end
end

As you can see, we make use of probability theory by calling the PDF and CDF of the standard normal. The merit function is built directly from the surrogate's prediction and its standard error.
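To give an idea of how it all fits together, here is a hypothetical end-to-end usage example (the Kriging constructor arguments and the optimizer keyword names are assumptions on my part, not the guaranteed API):

#Hypothetical usage sketch for EI on a Kriging surrogate.
using Surrogates

f = x -> (x - 2)^2
lb = 0.0
ub = 5.0
x = sample(5, lb, ub, SobolSample())   #initial design
y = f.(x)
krig = Kriging(x, y, 1.0)              #assumed signature: Kriging(x, y, p)
surrogate_optimize(f, EI(), lb, ub, krig, SobolSample())
minimum(krig.y)                        #best objective value found so far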

SOP

The SOP optimization method is implemented following this paper: "SOP: parallel surrogate global optimization with Pareto center selection for computationally expensive single objective problems" by Shoemaker.

There are many new ideas:

  • Look for new points to add in parallel.
  • Two-tier ranking of previously evaluated points.
  • Points are perturbed with a truncated normal.
  • Keep a tabu list of points that did not produce any improvement, so they are avoided later on.
  • Measure the improvement using "Expected Hypervolume improvement" (EHI).

At the moment the code does not run in parallel, but I expect to add this feature in the near future. My mentor and I do not yet fully understand EHI from the paper's definition, so it is ignored at this stage. This means that the code converges to the minimum more slowly; nevertheless, it behaves quite well and searches the whole sample space uniformly.

Code below:

function surrogate_optimize(obj,sop1::SOP,lb,ub,surrSOP::AbstractSurrogate,sample_type;maxiters=100,num_new_samples=100)
     d = length(lb)
     N_fail = 3
     N_tenure = 5
     tau = 10^-5
     num_P = sop1.p
     centers_global = surrSOP.x
     r_centers_global = 0.2*norm(ub-lb)*ones(length(surrSOP.x))
     N_failures_global = zeros(length(surrSOP.x))
     tabu = []
     N_tenures_tabu = []
     for k = 1:maxiters
         N_tenures_tabu .+= 1
         #deleting points that have been in tabu for too long
         del = N_tenures_tabu .> N_tenure
         if any(del)
             #deleteat! accepts a boolean mask of the same length as the collection
             deleteat!(N_tenures_tabu,del)
             deleteat!(tabu,del)
         end

 
          ##### P CENTERS ######
         C = []

          #S(x) set of points already evaluated
         #Rank points in S with:
         #1) Non dominated sorting
         Fronts_I = I_tier_ranking_1D(centers_global,surrSOP)

          #2) Second tier ranking
         Fronts = II_tier_ranking_1D(Fronts_I,surrSOP)
         ranked_list = []
         for i = 1:length(Fronts)
             for j = 1:length(Fronts[i])
                 push!(ranked_list,Fronts[i][j])
             end
         end
         ranked_list = eltype(surrSOP.x[1]).(ranked_list)

          centers_full = 0
         i = 1
         while i <= length(ranked_list) && centers_full == 0
             flag = 0
             for j = 1:length(ranked_list)
                 for k = 1:length(tabu)
                     if abs(ranked_list[j]-tabu[k]) < tau
                         flag = 1
                     end
                 end
                 for l = 1:length(centers_global)
                     if abs(ranked_list[j]-centers_global[l]) < tau
                         flag = 1
                     end
                 end
             end
             if flag == 1
                 nothing #do not add this point: it is too close to a tabu point or an existing center
             else
                 push!(C,ranked_list[i])
                 if length(C) == num_P
                     centers_full = 1
                 end
             end
             i = i + 1
         end

          # I examined all the points in the ranked list but num_selected < num_p
         # I just iterate again using only radius rule
         if length(C) < num_P
             i = 1
             while i <= length(ranked_list) && centers_full == 0
                 flag = 0
                 for j = 1:length(ranked_list)
                     for k = 1:length(centers_global)
                         if abs(centers_global[k] - ranked_list[j]) < tau
                             flag = 1
                         end
                     end
                 end
                 if flag == 1
                     nothing #do not add this point: it is too close to an existing center
                 else
                     push!(C,ranked_list[i])
                     if length(C) == num_P
                         centers_full = 1
                     end
                 end
                 i = i + 1
             end
         end

          #If I still have num_selected < num_P, I double down on some centers iteratively
         if length(C) < num_P
             i = 1
             while length(C) < num_P
                 push!(C,ranked_list[i])
                 i = i + 1
                 if i > length(ranked_list)
                     i = 1 #wrap around: reuse the best ranked points as centers
                 end
             end
             centers_full = 1
         end

          #Here I have C = [] containing the centers
         r_centers = 0.2*norm(ub-lb)*ones(num_P)
         N_failures = zeros(num_P)
         #2.3 Candidate search
         new_points = zeros(eltype(surrSOP.x[1]),num_P,2)
         for i = 1:num_P
             N_candidates = zeros(eltype(surrSOP.x[1]),num_new_samples)
             #Using phi(n) just like DYCORS, merit function = surrogate
             #Like in DYCORS, I_perturb = 1 always
             evaluations = zeros(eltype(surrSOP.y[1]),num_new_samples)
             for j = 1:num_new_samples
                 a = lb - C[i]
                 b = ub - C[i]
                 N_candidates[j] = C[i] + rand(TruncatedNormal(0,r_centers[i],a,b))
                 evaluations[j] = surrSOP(N_candidates[j])
             end
             x_best = N_candidates[argmin(evaluations)]
             y_best = minimum(evaluations)
             new_points[i,1] = x_best
             new_points[i,2] = y_best
         end

          #new_points[i] now contains:
         #[x_1,y_1; x_2,y_2,...,x_{num_new_samples},y_{num_new_samples}]
         #2.4 Adaptive learning and tabu archive
         for i=1:num_P
             if new_points[i,1] in centers_global
                 r_centers[i] = r_centers_global[i]
                 N_failures[i] = N_failures_global[i]
             end
             #1D it is just the length of the interval
             #Need further explanation
             #println(Hypervolume_Pareto_improving_1D(new_points[i,1],Fronts[1]))
             #=
             if (Hypervolume_Pareto_improving_1D(new_points[i,1],Fronts[1])<tau)
                 #failure
                 r_centers[i] = r_centers[i]/2
                 N_failures[i] += 1
                 if N_failures[i] > N_fail
                     push!(tabu,C[i])
                     push!(N_tenures_tabu,0)
                 end
             else
             =#
                 #P_i is success
                 #Adaptive_learning
                 add_point!(surrSOP,new_points[i,1],new_points[i,2])
                 push!(r_centers_global,r_centers[i])
                 push!(N_failures_global,N_failures[i])
             #end
         end
     end
 end

As you can see, I commented out the EHI part; given that there are already many conditions to pass before even getting to that point, the algorithm is still performing OK without it.
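For completeness, this is roughly how I expect SOP to be called, in the same spirit as the other methods (the SOP constructor argument, the surrogate constructor and the sampling type used here are assumptions on my part):

#Hypothetical usage sketch for SOP.
using Surrogates

f = x -> sin(x) + x^2
lb = -2.0
ub = 2.0
x = sample(10, lb, ub, UniformSample())
y = f.(x)
krig = Kriging(x, y, 1.0)              #assumed signature, as above
#SOP(p): p is the number of centers selected at every iteration
surrogate_optimize(f, SOP(5), lb, ub, krig, UniformSample())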

Another Surrogate and plans for the future

At the moment I am reading these papers: "Multidimensional Lobachevsky Spline Integration on Scattered Data" and "Numerical integration on multivariate scattered data by Lobachevsky splines" by Giampiero Allasia. The main focus is to build a new surrogate type, the Lobachevsky spline, to fit data. I expect to open a PR with the 1D version of it very soon. This follows the issue: https://github.com/JuliaDiffEq/Surrogates.jl/issues/17

I love the fact that there are still many surrogates to build, so I can pick whichever suits my skills the most! (Knowing full well that I will implement them all sooner or later :-D)

Happy Coding,

Ludovico