# [SciPy-User] optimize a piece of code pruning arrays based on distance Classic List Threaded 2 messages Reply | Threaded
Open this post in threaded view
|

## [SciPy-User] optimize a piece of code pruning arrays based on distance

 Hello,[Sorry if this is a duplicate. I realized I sent my previous email from the wrong address]I am trying to find a way to use scipy/numpy to optimize a piece of code. From profiling I already know over 90% of the runtime of the function is in 4 lines of code. This code is from selSPEA2 in DEAP in case that matters.for i in range(N):    for j in range(1, size - 1):            if sorted_indices[i,j] == min_pos:                sorted_indices[i,j:size - 1] = sorted_indices[i,j + 1:size]                sorted_indices[i,j:size] = min_pos                breakThe inner loop is inherently parallel since it is each iteration does not depend on any other. With C++ I would use OpenMP or TBB to parallelize the outer loop. For a more complete picture I have also included the surrounding code. The basic problem is if the number of entries (size) is greater than the number allowed to be kept (k) for the next generation then the most similar items (based on distance) need to be removed until size == k. At each iteration a solution with the lowest distance to another solution is removed and then the distance from the remove solution to all other solutions is set as infinity. This prevents removing all points in a cluster.Thank youwhile size > k:    # Search for minimal distance    min_pos = 0    for i in range(1, N):        for j in range(1, size):            dist_i_sorted_j = distances[i,sorted_indices[i,j]]            dist_min_sorted_j = distances[min_pos,sorted_indices[min_pos,j]]            if dist_i_sorted_j < dist_min_sorted_j:                min_pos = i                break            elif dist_i_sorted_j > dist_min_sorted_j:                break        distances[:,min_pos] = float("inf")    distances[min_pos,:] = float("inf")    #This part is still expensive but I don't know a better way to do it yet.    #Essentially all remaining time in this function is in this section    #It may even make sense to do this in C++ later since it is trivially parallel    for i in range(N):        for j in range(1, size - 1):            if sorted_indices[i,j] == min_pos:                sorted_indices[i,j:size - 1] = sorted_indices[i,j + 1:size]                sorted_indices[i,j:size] = min_pos                break        # Remove corresponding individual from chosen_indices    to_remove.append(min_pos)    size -= 1 _______________________________________________ SciPy-User mailing list [hidden email] https://mail.python.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

## Re: optimize a piece of code pruning arrays based on distance

 You may be able to vectorise that piece of code with the np.where function.Otherway, your code may be optimized with the just-in-time compiler numba.Good luck !2018-01-24 11:44 GMT+01:00 William Heymann :Hello,[Sorry if this is a duplicate. I realized I sent my previous email from the wrong address]I am trying to find a way to use scipy/numpy to optimize a piece of code. From profiling I already know over 90% of the runtime of the function is in 4 lines of code. This code is from selSPEA2 in DEAP in case that matters.for i in range(N):    for j in range(1, size - 1):            if sorted_indices[i,j] == min_pos:                sorted_indices[i,j:size - 1] = sorted_indices[i,j + 1:size]                sorted_indices[i,j:size] = min_pos                breakThe inner loop is inherently parallel since it is each iteration does not depend on any other. With C++ I would use OpenMP or TBB to parallelize the outer loop. For a more complete picture I have also included the surrounding code. The basic problem is if the number of entries (size) is greater than the number allowed to be kept (k) for the next generation then the most similar items (based on distance) need to be removed until size == k. At each iteration a solution with the lowest distance to another solution is removed and then the distance from the remove solution to all other solutions is set as infinity. This prevents removing all points in a cluster.Thank youwhile size > k:    # Search for minimal distance    min_pos = 0    for i in range(1, N):        for j in range(1, size):            dist_i_sorted_j = distances[i,sorted_indices[i,j]]            dist_min_sorted_j = distances[min_pos,sorted_indices[min_pos,j]]            if dist_i_sorted_j < dist_min_sorted_j:                min_pos = i                break            elif dist_i_sorted_j > dist_min_sorted_j:                break        distances[:,min_pos] = float("inf")    distances[min_pos,:] = float("inf")    #This part is still expensive but I don't know a better way to do it yet.    #Essentially all remaining time in this function is in this section    #It may even make sense to do this in C++ later since it is trivially parallel    for i in range(N):        for j in range(1, size - 1):            if sorted_indices[i,j] == min_pos:                sorted_indices[i,j:size - 1] = sorted_indices[i,j + 1:size]                sorted_indices[i,j:size] = min_pos                break        # Remove corresponding individual from chosen_indices    to_remove.append(min_pos)    size -= 1 _______________________________________________ SciPy-User mailing list [hidden email] https://mail.python.org/mailman/listinfo/scipy-user _______________________________________________ SciPy-User mailing list [hidden email] https://mail.python.org/mailman/listinfo/scipy-user