[SciPy-User] Speeding things up - how to use more than one computer core


[SciPy-User] Speeding things up - how to use more than one computer core

Troels Emtekær Linnet
Dear SciPy users.

I am doing analysis of some NMR data, where I repeatedly do leastsq fitting.
But I am getting a little impatient with the time consumption. A run over my data takes
approx. 3-5 min, which is too slow in this testing phase.

A look in my task manager shows that I only use 25% = 1 core on my computer.
And I have access to a computer with 24 cores, so I would like to speed things up.
------------------------------------------------
I have been looking at the descriptions of multithreading/multiprocessing.


But I hope someone can guide me as to which of these two methods I should go for, and how to implement it.
I am a little unsure about the GIL, synchronisation and such things, which I know nothing about.

For the real data, I can see that I am always waiting for the leastsq fitting call.
How can I start a pool of cores when I go through the fitting?

I have this test script, which exemplifies my problem:
Fitting single peaks N=300 0:00:00.045000
Done with fitting single peaks 0:00:01.146000

Make a test on chisqr 0:00:01.147000
Done with test on chisqr 0:00:01.148000

Prepare for global fit 0:00:01.148000
Doing global fit 0:00:01.152000
Done with global fit 0:00:17.288000
global fit unpacked 0:00:17.301000 

Making figure 0:00:17.301000
--------------------------------------
import pylab as pl
#import matplotlib.pyplot as pl
import numpy as np
import scipy.optimize
import lmfit
from datetime import datetime
startTime = datetime.now()
#
############# Fitting functions ################
def sim(pars,x,data=None,eps=None):
    a = pars['a'].value
    b = pars['b'].value
    c = pars['c'].value
    model = a*np.exp(-b*x)+c
    if data is None:
        return model
    if eps is None:
        return (model - data)
    return (model-data)/eps
#
def err_global(pars,x_arr,y_arr,sel_p):
    toterr = np.array([])
    for i in range(len(sel_p)):
        p = sel_p[i]
        par = lmfit.Parameters()
        par.add('b', value=pars['b'].value, vary=True, min=0.0)
        par.add('a', value=pars['a%s'%p].value, vary=True)
        par.add('c', value=pars['c%s'%p].value, vary=True)
        x = x_arr[i]
        y = y_arr[i]
        Yfit = sim(par,x)
        erri = Yfit - y
        toterr = np.concatenate((toterr, erri))
    #print len(toterr), type(toterr)
    return toterr
#
def unpack_global(dic, p_list):
    for i in range(len(p_list)):
        p = p_list[i]
        par = lmfit.Parameters()
        b = dic['gfit']['par']['b']
        a = dic['gfit']['par']['a%s'%p]
        c = dic['gfit']['par']['c%s'%p]
        par['b'] = b; par['a'] = a; par['c'] = c
        dic[str(p)]['gfit']['par'] = par
        # Calc other parameters for the fit
        Yfit = sim(par, dic[str(p)]['X'])
        dic[str(p)]['gfit']['Yfit'] = Yfit
        residual = Yfit - dic[str(p)]['Yran']
        dic[str(p)]['gfit']['residual'] = residual
        chisq = sum(residual**2)
        dic[str(p)]['gfit']['chisq'] = chisq
        NDF = len(residual)-len(par)
        dic[str(p)]['gfit']['NDF'] = NDF
        dic[str(p)]['gfit']['what_is_this_called'] = np.sqrt(chisq/NDF)  # standard error of the fit (root of the reduced chi-square)
        dic[str(p)]['gfit']['redchisq'] = chisq/NDF
    return()
################ Random peak data generator ###########################
def gendat(nr):
    pd = {}
    for i in range(1,nr+1):
        b = 0.15
        a = np.random.random_integers(1, 80)/10.
        c = np.random.random_integers(1, 80)/100.
        par = lmfit.Parameters(); par.add('b', value=b, vary=True); par.add('a', value=a, vary=True); par.add('c', value=c, vary=True)
        pd[str(i)] = par
    return(pd)
#############################################################################
## Start
#############################################################################
limit = 0.6   # Limit set for chisq test, to select peaks
#############################################################################
# set up the data
data_x = np.linspace(0, 20, 50)
pd = {} # Parameter dictionary, the "true" values of the data sets
par = lmfit.Parameters(); par.add('b', value=0.15, vary=True); par.add('a', value=2.5, vary=True); par.add('c', value=0.5, vary=True)
pd['1'] = par # parameters for the first trajectory
par = lmfit.Parameters(); par.add('b', value=0.15, vary=True); par.add('a', value=4.2, vary=True); par.add('c', value=0.2, vary=True)
pd['2'] = par       # parameters for the second trajectory, same b
par = lmfit.Parameters(); par.add('b', value=0.15, vary=True); par.add('a', value=1.2, vary=True); par.add('c', value=0.3, vary=True)
pd['3'] = par       # parameters for the third trajectory, same b
pd = gendat(300)  # You can generate a large number of peaks to test
#
#Start making a dictionary, which holds all data
dic = {}; dic['peaks']=range(1,len(pd)+1)
for p in dic['peaks']:
    dic['%s'%p] = {}
    dic[str(p)]['X'] = data_x
    dic[str(p)]['Y'] = sim(pd[str(p)],data_x)
    dic[str(p)]['Yran'] = dic[str(p)]['Y'] + np.random.normal(size=len(dic[str(p)]['Y']), scale=0.12)
    dic[str(p)]['fit'] = {}  # Make space for future fit results
    dic[str(p)]['gfit'] = {}  # Make space for future global fit results
#print "keys for start dictionary:", dic.keys()
#
# independent fitting of the trajectories
print "Fitting single peaks N=%s %s"%(len(pd),(datetime.now()-startTime))
for p in dic['peaks']:
    par = lmfit.Parameters(); par.add('b', value=2.0, vary=True, min=0.0); par.add('a', value=2.0, vary=True); par.add('c', value=2.0, vary=True)
    lmf = lmfit.minimize(sim, par, args=(dic[str(p)]['X'], dic[str(p)]['Yran']),method='leastsq')
    dic[str(p)]['fit']['par']= par
    dic[str(p)]['fit']['lmf']= lmf
    Yfit = sim(par,dic[str(p)]['X'])
    #Yfit2 = dic[str(p)]['Yran']+lmf.residual
    #print sum(Yfit-Yfit2), "Test for difference in two ways to get the fitted Y-values "
    dic[str(p)]['fit']['Yfit'] = Yfit
    #print "Best fit parameter for peak %s. %3.2f %3.2f %3.2f."%(p,par['b'].value,par['a'].value,par['c'].value),
    #print "Compare to real paramaters. %3.2f %3.2f %3.2f."%(pd[str(p)]['b'].value,pd[str(p)]['a'].value,pd[str(p)]['c'].value)
print "Done with fitting single peaks %s\n"%(datetime.now()-startTime)
#
# Make a selection flag, based on some test. Here a chisq value, but it could be an F-test between a simple and an advanced model fit.
print "Make a test on chisqr %s"%(datetime.now()-startTime)
sel_p = []
for p in dic['peaks']:
    chisq = dic[str(p)]['fit']['lmf'].chisqr
    #chisq2 = sum((dic[str(p)]['fit']['Yfit']-dic[str(p)]['Yran'])**2)
    #print chisq - chisq2, "Test for difference in two ways to get chisqr"
    if chisq < limit:
        dic[str(p)]['Pval'] = 1.0
        #print "Peak %s passed test"%p
        sel_p.append(p)
    else:
        dic[str(p)]['Pval'] = False
print 'Done with test on chisqr %s\n'%(datetime.now()-startTime)
#print sel_p
#
# Global fitting
# Pick up x,y-values and parameters that passed the test
X_arr = []
Y_arr = []
P_arr = lmfit.Parameters(); P_arr.add('b', value=1.0, vary=True, min=0.0)
dic['gfit'] = {} # Make room for global fit result
print "Prepare for global fit %s"%(datetime.now()-startTime)
for p in sel_p:
    par = dic[str(p)]['fit']['par']
    X_arr.append(dic[str(p)]['X'])
    Y_arr.append(dic[str(p)]['Yran'])
    P_arr.add('a%s'%p, value=par['a'].value, vary=True)
    P_arr.add('c%s'%p, value=par['c'].value, vary=True)
print "Doing global fit %s"%(datetime.now()-startTime)
lmf = lmfit.minimize(err_global, P_arr, args=(X_arr, Y_arr, sel_p),method='leastsq')
print "Done with global fit %s"%(datetime.now()-startTime)
dic['gfit']['par']= P_arr
dic['gfit']['lmf']= lmf
unpack_global(dic, sel_p) # Unpack the parameters into the selected peaks
print "global fit unpacked %s \n"%(datetime.now()-startTime)
#
# Check result
#for p in sel_p:
#    ip= pd[str(p)]; sp = dic[str(p)]['fit']['par']; gp = dic[str(p)]['gfit']['par']
    #print p, "Single fit. %3.2f %3.2f %3.2f"%(sp['b'].value,sp['a'].value,sp['c'].value),
    #print "Global fit. %3.2f %3.2f %3.2f"%(gp['b'].value,gp['a'].value,gp['c'].value)
    #print p, "Single fit. %3.2f %3.2f %3.2f"%(sp['b'].value-ip['b'].value,sp['a'].value-ip['a'].value,sp['c'].value-ip['c'].value),
    #print "Global fit. %3.2f %3.2f %3.2f"%(gp['b'].value-ip['b'].value,gp['a'].value-ip['a'].value,gp['c'].value-ip['c'].value)##
#
# Start plotting
print "Making figure %s"%(datetime.now()-startTime)
fig = pl.figure()
sel_p = sel_p[:9]
for i in range(len(sel_p)):
    p = sel_p[i]
    # Create figure
    ax = fig.add_subplot(len(sel_p), 1, i+1)
    X = dic[str(p)]['X']
    Y = dic[str(p)]['Y']
    Ymeas = dic[str(p)]['Yran']
    Yfit = dic[str(p)]['fit']['Yfit']
    Yfit_global = dic[str(p)]['gfit']['Yfit']
    rpar = pd[str(p)]
    fpar = dic[str(p)]['fit']['par']
    gpar = dic[str(p)]['gfit']['par']
    fchisq = dic[str(p)]['fit']['lmf'].chisqr
    gchisq = dic[str(p)]['gfit']['chisq']
    # plot
    ax.plot(X,Y,".-",label='real. Peak: %s'%p)
    ax.plot(X,Ymeas,'o',label='Measured (real+noise)')
    ax.plot(X,Yfit,'.-',label='leastsq fit. chisq:%3.3f'%fchisq)
    ax.plot(X,Yfit_global,'.-',label='global fit. chisq:%3.3f'%gchisq)
    # annotate
    ax.annotate('p%s. real    par: %1.3f %1.3f %1.3f'%(p, rpar['b'].value,rpar['a'].value,rpar['c'].value), xy=(1,1), xycoords='data', xytext=(0.4, 0.8), textcoords='axes fraction')
    ax.annotate('p%s. single  par: %1.3f %1.3f %1.3f'%(p, fpar['b'].value,fpar['a'].value,fpar['c'].value), xy=(1,1), xycoords='data', xytext=(0.4, 0.6), textcoords='axes fraction')
    ax.annotate('p%s. global  par: %1.3f %1.3f %1.3f'%(p, gpar['b'].value,gpar['a'].value,gpar['c'].value), xy=(1,1), xycoords='data', xytext=(0.4, 0.4), textcoords='axes fraction')
    # set title and axis name
    #ax.set_title('Fitting for peak %s'%p)
    ax.set_ylabel('Decay')
    # Put legend to the right
    box = ax.get_position()
    ax.set_position([box.x0, box.y0, box.width * 0.8, box.height]) # Shrink current axis by 20%
    ax.legend(loc='center left', bbox_to_anchor=(1, 0.5),prop={'size':8}) # Put a legend to the right of the current axis
    ax.grid('on')
ax.set_xlabel('Time')
#
print "ready to show figure %s"%(datetime.now()-startTime)
pl.show()











Re: Speeding things up - how to use more than one computer core

Calvin Morrison

I typically use the pool method. If you are doing the same function with separate datasets many times, it is a great way to get speedups.

I used it here recently:

https://github.com/mutantturkey/python-quikr/blob/master/src/python/multifasta_to_otu

Hope that helps
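
In outline, the pool method amounts to mapping one fitting function over the list of datasets. A minimal, untested sketch, where fit_single_peak and datasets are placeholders for whatever performs one fit and the data it needs:
--------------------------------------
import multiprocessing

def fit_single_peak(dataset):
    # do one independent leastsq fit here and return its result
    x, y = dataset
    return sum(y)  # placeholder for the real fit

if __name__ == "__main__":
    datasets = [([1, 2, 3], [4, 5, 6])] * 100   # placeholder data
    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
    results = pool.map(fit_single_peak, datasets)
    pool.close()
    pool.join()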


Re: Speeding things up - how to use more than one computer core

ralfgommers
In reply to this post by Troels Emtekær Linnet



On Sat, Apr 6, 2013 at 5:40 PM, Troels Emtekær Linnet <[hidden email]> wrote:
Dear Scipy users.

I am doing analysis of some NMR data, where I repeatability are doing leastsq fitting.
But I get a little impatient for the time-consumption. For a run of my data, it takes 
approx 3-5 min, but it in this testing phase, it is to slow.

A look in my  task manager, show that I only consume 25%=1 core on my computer.
And I have access to a computer with 24 cores, so I would like to speed things up.
------------------------------------------------
I have been looking at the descriptions of multithreading/Multiprocess


But I hope someone can guide me, which of these two methods I should go for, and how to implement it?
I am little unsure about GIL, synchronisation?, and such things, which I know none about.

For the real data, I can see that I am always waiting for the call of the leastsq fitting.
How can start a pool of cores when I go through fitting?

Have a look at http://pythonhosted.org/joblib/parallel.html; that should allow you to use all cores without much effort. It uses multiprocessing under the hood. That's assuming you have multiple fits that can run in parallel, which I think is the case. I at least see some fits in a for-loop.
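
Applied to the per-peak loop in the posted script, that could look roughly like the sketch below. It is untested; it assumes the sim function and the dic dictionary from the posted script, fit_one_peak is a hypothetical wrapper around the existing lmfit call, and it returns plain numbers because whether lmfit's result objects pickle cleanly depends on the lmfit version:
--------------------------------------
from joblib import Parallel, delayed
import lmfit

def fit_one_peak(x, y):
    # same start values as in the original script; sim is the model/residual
    # function defined earlier in that script
    par = lmfit.Parameters()
    par.add('b', value=2.0, vary=True, min=0.0)
    par.add('a', value=2.0, vary=True)
    par.add('c', value=2.0, vary=True)
    lmf = lmfit.minimize(sim, par, args=(x, y), method='leastsq')
    # return picklable plain values rather than the lmfit objects
    return dict((name, par[name].value) for name in par), lmf.chisqr

# one job per peak; n_jobs=-1 uses every available core
results = Parallel(n_jobs=-1)(
    delayed(fit_one_peak)(dic[str(p)]['X'], dic[str(p)]['Yran'])
    for p in dic['peaks'])

for p, (values, chisq) in zip(dic['peaks'], results):
    dic[str(p)]['fit']['values'] = values
    dic[str(p)]['fit']['chisq'] = chisq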

Ralf



Re: Speeding things up - how to use more than one computer core

Troels Emtekær Linnet
And the winner was joblib :-)

Method was normal
Done :0:00:00.291000
[49990.0, 49991.0, 49992.0, 49993.0, 49994.0, 49995.0, 49996.0, 49997.0, 49998.0, 49999.0] <type 'numpy.float64'>

Method was Pool
Done :0:00:01.558000
[49990.0, 49991.0, 49992.0, 49993.0, 49994.0, 49995.0, 49996.0, 49997.0, 49998.0, 49999.0] <type 'numpy.float64'>

Method was joblib
Done :0:00:00.003000
[49990, 49991, 49992, 49993, 49994, 49995, 49996, 49997, 49998, 49999] <type 'int'>

Method was joblib delayed
Done :0:00:00
[49990, 49991, 49992, 49993, 49994, 49995, 49996, 49997, 49998, 49999] <type 'int'>

--------------------------------------
import multiprocessing
from multiprocessing import Pool
from numpy import sqrt
from datetime import datetime
from joblib import Parallel, delayed

def getsqrt(n):
    res = sqrt(n**2)
    return(res)

def main():
    jobs = multiprocessing.cpu_count()-1
    a = range(50000)
    for method in ['normal','Pool','joblib','joblib delayed']:
        startTime = datetime.now()
        sprint=True
        if method=='normal':
            res = []
            for i in a:
                b = getsqrt(i)
                res.append(b)
        elif method=='Pool':
            pool = Pool(processes=jobs)
            res = pool.map(getsqrt, a)
        elif method=='joblib':
            Parallel(n_jobs=jobs)
            func,res = (getsqrt, a)
        elif method=='joblib delayed':
            Parallel(n_jobs=-2) #Can also use '-1' for all cores, '-2' for all cores=-1
            func,res = delayed(getsqrt), a
        else:
            sprint=False
        if sprint:
            print "Method was %s"%method
            print "Done :%s"%(datetime.now()-startTime)
            print res[-10:], type(res[-1])
    return(res)

if __name__ == "__main__":
    res = main()

Troels Emtekær Linnet


Re: Speeding things up - how to use more than one computer core

Gael Varoquaux
On Sun, Apr 07, 2013 at 12:17:59AM +0200, Troels Emtekær Linnet wrote:
> Method was joblib delayed
> Done :0:00:00

Hum, this is fishy, isn't it?

>         elif method=='joblib delayed':
>             Parallel(n_jobs=-2) #Can also use '-1' for all cores, '-2' for all
> cores=-1
>             func,res = delayed(getsqrt), a

I have a hard time reading your code, but it seems to me that you haven't
computed anything here, just instantiated the Parallel object.

You need to do:

    res = Parallel(n_jobs=-2)(delayed(getsqrt)(i) for i in a)

I would expect joblib to be on the same order of magnitude speed-wise as
multiprocessing (hell, it's just a wrapper on multiprocessing). It's just
going to be more robust code than instantiating a Pool manually (it deals
better with errors, and can optionally dispatch computation on demand).

Gaël

Re: Speeding things up - how to use more than one computer core

Troels Emtekær Linnet

Thanks for pointing that out.
I did not understand how the Parallel object has to be called with the delayed function.

But now I get these results:
Why is joblib so slow?
And should I go for threading or processes?

-------------------------------
Method was normal
Done :0:00:00.040000
[9990.0, 9991.0, 9992.0, 9993.0, 9994.0, 9995.0, 9996.0, 9997.0, 9998.0, 9999.0] <type 'numpy.float64'>

Method was multi Pool
Done :0:00:00.422000
[9990.0, 9991.0, 9992.0, 9993.0, 9994.0, 9995.0, 9996.0, 9997.0, 9998.0, 9999.0] <type 'numpy.float64'>

Method was joblib delayed
Done :0:00:02.569000
[9990.0, 9991.0, 9992.0, 9993.0, 9994.0, 9995.0, 9996.0, 9997.0, 9998.0, 9999.0] <type 'numpy.float64'>

Method was handythread
Done :0:00:00.582000
[9990.0, 9991.0, 9992.0, 9993.0, 9994.0, 9995.0, 9996.0, 9997.0, 9998.0, 9999.0] <type 'numpy.float64'>

------------------------------------------------------------------

import numpy as np
import multiprocessing
from multiprocessing import Pool
from datetime import datetime
from joblib import Parallel, delayed
#http://www.scipy.org/Cookbook/Multithreading?action=AttachFile&do=view&target=test_handythread.py
from handythread import foreach

def getsqrt(n):
    res = np.sqrt(n**2)
    return(res)

def main():
    jobs = multiprocessing.cpu_count()-1
    a = range(10000)
    for method in ['normal','multi Pool','joblib delayed','handythread']:
        startTime = datetime.now()
        sprint=True
        if method=='normal':
            res = []
            for i in a:
                b = getsqrt(i)
                res.append(b)
        elif method=='multi Pool':
            pool = Pool(processes=jobs)
            res = pool.map(getsqrt, a)
        elif method=='joblib delayed':
            res = Parallel(n_jobs=jobs)(delayed(getsqrt)(i) for i in a)
        elif method=='handythread':
            res = foreach(getsqrt,a,threads=jobs,return_=True)
        else:
            sprint=False
        if sprint:
            print "Method was %s"%method
            print "Done :%s"%(datetime.now()-startTime)
            print res[-10:], type(res[-1])
    return(res)

if __name__ == "__main__":
    res = main()

Troels


Re: Speeding things up - how to use more than one computer core

Gael Varoquaux
On Sun, Apr 07, 2013 at 02:11:07PM +0200, Troels Emtekær Linnet wrote:
> Why is joblib so slow?

I am not sure why joblib is slower than multiprocessing: it uses the same
core mechanism.

Anyhow, I think that your benchmark is not very interesting for practical
use: it measures mostly the time it takes to create and spawn workers.
The problem in your benchmark is that the individual operations that you
are trying to run in parallel take virtually no time. You need to
dispatch long-running operations, otherwise the overhead of
parallelisation will kill any possible gain.
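
One way to see this with the toy example is to hand each worker a large block of the input instead of a single number, so every dispatched task carries a meaningful amount of work. A sketch, standard library plus NumPy only:
--------------------------------------
import numpy as np
from multiprocessing import Pool

def getsqrt(n):
    return np.sqrt(n**2)

def getsqrt_block(block):
    # one task now handles a whole block, so the per-task dispatch
    # overhead is amortised over many evaluations
    return [np.sqrt(n**2) for n in block]

if __name__ == "__main__":
    a = range(10000)
    pool = Pool()
    # let Pool.map batch the inputs itself ...
    res = pool.map(getsqrt, a, chunksize=1000)
    # ... or split the input into explicit blocks of work
    blocks = np.array_split(np.arange(10000), 8)
    res2 = np.concatenate(pool.map(getsqrt_block, blocks))
    pool.close()
    pool.join()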

G

Re: Speeding things up - how to use more than one computer core

Daπid
In reply to this post by Troels Emtekær Linnet
This benchmark is poor because you are not taking into account many things that will happen in your real case. A quick glance at your code tells me (correct me if I am wrong) that you are doing some partial fitting (I think this is your parallelization target), and then a global fit of some sort. I don't know about these particular functions you are using, but you must be aware that several NumPy functions have a lot of optimizations under the hood, like automatic parallelization, and so on. Also, a very important issue here, especially with so many cores, is feeding data to the CPU: probably, a fair share of your computing time is spent with the CPU waiting for data to come in.

The performance of a Python program is quite unpredictable, as there are so many things going on. I think the best thing you can do is to profile your code, see where the bottlenecks are, and try the different parallel methods *on that block* to see which one works best. Consider also how difficult it is to program and debug; I have had a hard time struggling with multiprocessing on a very simple program until I got it working.
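
For the profiling step, the standard library already gives a usable first picture. A sketch, where main is a placeholder for whatever top-level function drives the fits:
--------------------------------------
import cProfile
import pstats

def main():
    # placeholder for the function that runs all the fits
    return sum(i * i for i in range(100000))

cProfile.run('main()', 'fit_profile')            # run once, save the stats
stats = pstats.Stats('fit_profile')
stats.sort_stats('cumulative').print_stats(20)   # 20 most expensive calls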

Regarding the difference between processes and threads: they both execute in parallel, but a thread is bound by the Python GIL: only one line of Python is executed at a time, although this does not apply to C code in NumPy or to system calls (waiting for data to be written to a file). On the other hand, sharing data between threads is much cheaper than between processes. Multiprocessing, in contrast, will truly execute in parallel, using one core for each process, but creates a bigger overhead. I would say you want multiprocessing, but depending on how the time is spent in your code, and on whether NumPy releases the GIL, you may actually get a better result with multithreading. Again, if you want to be sure, test it; but if your first try is good enough for you, you may as well leave it as it is.
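
The two options can be compared with nearly identical code, since multiprocessing.dummy provides a thread pool with the same interface as the process Pool. A sketch (which one wins depends on how much of the work runs in GIL-releasing NumPy/C code):
--------------------------------------
import numpy as np
from datetime import datetime
from multiprocessing import Pool                       # processes
from multiprocessing.dummy import Pool as ThreadPool   # threads, same API

def work(x):
    # mostly NumPy, which releases the GIL during the heavy lifting
    return np.linalg.norm(np.outer(x, x))

if __name__ == "__main__":
    data = [np.random.rand(500) for _ in range(200)]
    for label, PoolClass in [('processes', Pool), ('threads', ThreadPool)]:
        start = datetime.now()
        pool = PoolClass(4)
        res = pool.map(work, data)
        pool.close()
        pool.join()
        print("%s: %s" % (label, datetime.now() - start))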

BTW, if you want to read more about memory and parallelization, take a look at Francesc Alted's fantastic talk on the Advanced Scientific Python Course: https://python.g-node.org/python-summerschool-2012/starving_cpu , and apply if you can.


David.



Re: Speeding things up - how to use more than one computer core

Troels Emtekær Linnet
Thank you for all your answers. :-)

I think I am now equipped to understand and try some things.

Best
Troels



Re: Speeding things up - how to use more than one computer core

Neal Becker
In reply to this post by Daπid
Daπid wrote:

...
> Regarding the difference between processes and threads:
...
> On the other hand, sharing data between threads is much cheaper than
> between processes.

I have to take issue with this statement.  Sharing data could suffer no overhead
at all, if you use shared memory for example.


Re: Speeding things up - how to use more than one computer core

Gael Varoquaux
On Sun, Apr 07, 2013 at 01:11:09PM -0400, Neal Becker wrote:
> > Regarding the difference between processes and threads:
> ...
> > On the other hand, sharing data between threads is much cheaper than
> > between processes.

> I have to take issue with this statement.  Sharing data could suffer no
> overhead at all, if you use shared memory for example.

How do you use shared memory between processes?

There are solutions, but hardly any are easy to use. I'd even say that
most are very challenging, and the easiest option is to rely on memmapped
arrays, but even that is a bit technical, and will clearly introduce
overhead.

Gaël

Re: Speeding things up - how to use more than one computer core

Neal Becker
Gael Varoquaux wrote:

> On Sun, Apr 07, 2013 at 01:11:09PM -0400, Neal Becker wrote:
>> > Regarding the difference between processes and threads:
>> ...
>> > On the other hand, sharing data between threads is much cheaper than
>> > between processes.
>
>> I have to take issue with this statement.  Sharing data could suffer no
>> overhead at all, if you use shared memory for example.
>
> How do you use shared memory between processes?
>
> There are solutions, but hardly any are easy to use. I'd even say that
> > most are very challenging, and the easiest option is to rely on memmapped
> arrays, but even that is a bit technical, and will clearly introduce
> overhead.
>
> Gaël

Why do you think memmapped arrays would introduce overhead?  The only overhead
should be if you have to add some sort of synchronization between writers and
readers (e.g., semaphores).  The actual data access is as fast as any other
memory access.


Re: Speeding things up - how to use more than one computer core

J. David Lee
In reply to this post by Gael Varoquaux
On 04/07/2013 12:25 PM, Gael Varoquaux wrote:
> On Sun, Apr 07, 2013 at 01:11:09PM -0400, Neal Becker wrote:
> > > Regarding the difference between processes and threads:
> > > ...
> > > On the other hand, sharing data between threads is much cheaper than
> > > between processes.
>
> > I have to take issue with this statement.  Sharing data could suffer no
> > overhead at all, if you use shared memory for example.
>
> How do you use shared memory between processes?
>
> There are solutions, but hardly any are easy to use. I'd even say that
> most are very challenging, and the easiest option is to rely on memmapped
> arrays, but even that is a bit technical, and will clearly introduce
> overhead.
I've used shared memory arrays in the past, and it's actually quite easy. They can be created using the multiprocessing module in a couple of lines,

mp_arr = multiprocessing.Array(ctypes.c_double, 100)
arr = np.frombuffer(mp_arr.get_obj())

I've wondered in the past why creating a shared memory array isn't a single line of code using numpy, as it can be so useful.
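
Fleshing those two lines out, the shared buffer can be handed to a Pool through an initializer, so the workers write straight into memory that the parent can read. A small, untested sketch:
--------------------------------------
import ctypes
import multiprocessing
import numpy as np

def init_worker(shared):
    # runs once in every worker: expose the shared buffer as a NumPy array
    global warr
    warr = np.frombuffer(shared.get_obj())

def fill(i):
    warr[i] = i * i          # lands directly in the shared memory
    return i

if __name__ == "__main__":
    mp_arr = multiprocessing.Array(ctypes.c_double, 100)
    arr = np.frombuffer(mp_arr.get_obj())
    pool = multiprocessing.Pool(initializer=init_worker, initargs=(mp_arr,))
    pool.map(fill, range(100))
    pool.close()
    pool.join()
    print(arr[:5])           # the parent sees what the workers wrote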

If you can, you might want to consider writing your code in a C module and using openMP if it works for you. I've had very good luck with that, and it's really easy to use.

David


Re: Speeding things up - how to use more than one computer core

Gael Varoquaux
On Mon, Apr 08, 2013 at 07:44:20AM -0500, J. David Lee wrote:
> I've used shared memory arrays in the past, and it's actually quite easy. They
> can be created using the multiprocessing module in a couple of lines,

> mp_arr = multiprocessing.Array(ctypes.c_double, 100)
> arr = np.frombuffer(mp_arr.get_obj())

I believe that this does synchronization by message passing. Look at the
corresponding multiprocessing code if you want to convince yourself. Thus
you are not in fact sharing the memory between processes.

> I've wondered in the past why creating a shared memory array isn't a single
> line of code using numpy, as it can be so useful.

Because there is no easy cross-platform way of doing it.

> If you can, you might want to consider writing your code in a C module and
> using openMP if it works for you. I've had very good luck with that, and it's
> really easy to use.

In certain cases, I would definitely agree with you here. Recent versions
of cython make that really easy.

G

Re: Speeding things up - how to use more than one computer core

Gael Varoquaux
In reply to this post by Neal Becker
On Mon, Apr 08, 2013 at 07:35:30AM -0400, Neal Becker wrote:
> > There are solutions, but hardly any are easy to use. I'd even say that
> > most are very challenging, and the easiest option is to rely on memapped
> > arrays, but even that is a bit technical, and will clearly introduce
> > overhead.

> Why do you think memmaped arrays would introduce overhead?

If you are able to instantiate the arrays that matter directly with a
memmap once and for all, I agree with you. Now, if you do something like
the example posted by the OP, in which the loop is very short-lived, then
chances are that the arrays will be allocated for the loop and
deallocated after. Then the creation of the memmap induces overhead.

> The only overhead should be if you have to add some sort of
> synchronization between writers and readers (e.g., semaphores).  The
> actual data access is as fast as any other memory access.

Granted, the data access is excellent. It's the creation/deletion that I
was talking about.
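
When the big arrays can be created and written to disk once up front, the memmap route looks roughly like this; only the filename crosses the process boundary and each worker pages the data in on demand. A sketch, with data.npy as a hypothetical scratch file:
--------------------------------------
import numpy as np
from multiprocessing import Pool

FNAME = 'data.npy'      # hypothetical scratch file holding the read-only data

def column_sum(icol):
    data = np.load(FNAME, mmap_mode='r')   # no copy, no pickling of the array
    return data[:, icol].sum()

if __name__ == "__main__":
    big = np.random.rand(2000, 50)
    np.save(FNAME, big)                    # created once, outside the loop
    pool = Pool()
    sums = pool.map(column_sum, range(big.shape[1]))
    pool.close()
    pool.join()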

G

Re: Speeding things up - how to use more than one computer core

Pauli Virtanen-3
In reply to this post by Gael Varoquaux
Hi,

08.04.2013 19:55, Gael Varoquaux wrote:

> On Mon, Apr 08, 2013 at 07:44:20AM -0500, J. David Lee wrote:
>> I've used shared memory arrays in the past, and it's actually quite easy. They
>> can be created using the multiprocessing module in a couple of lines,
>
>> mp_arr = multiprocessing.Array(ctypes.c_double, 100)
>> arr = np.frombuffer(mp_arr.get_obj())
>
> I believe that this does synchronization by message passing. Look at the
> corresponding multiprocessing code if you want to convince yourself. Thus
> you are not in fact sharing the memory between processes.

I think this uses memory obtained from a mmap, which is shared between
the parent and child processes.

--
Pauli Virtanen


Re: Speeding things up - how to use more than one computer core

J. David Lee
In reply to this post by Gael Varoquaux


On 04/08/2013 11:55 AM, Gael Varoquaux wrote:
> On Mon, Apr 08, 2013 at 07:44:20AM -0500, J. David Lee wrote:
>> I've used shared memory arrays in the past, and it's actually quite easy. They
>> can be created using the multiprocessing module in a couple of lines,
>> mp_arr = multiprocessing.Array(ctypes.c_double, 100)
>> arr = np.frombuffer(mp_arr.get_obj())
> I believe that this does synchronization by message passing. Look at the
> corresponding multiprocessing code if you want to convince yourself. Thus
> you are not in fact sharing the memory between processes.
I think you are right, but it looks like you can fix that trivially by
replacing Array with RawArray.

David
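
For reference, the lock-free variant only changes the creation call and drops get_obj(); synchronisation is then entirely up to the caller. A sketch:
--------------------------------------
import ctypes
import numpy as np
from multiprocessing.sharedctypes import RawArray

raw = RawArray(ctypes.c_double, 100)   # plain shared memory, no lock
arr = np.frombuffer(raw)               # RawArray needs no get_obj()
arr[:] = np.arange(100)                # visible to every process attached to it
# like Array, it has to be created before the workers are started
# (e.g. passed via a Pool initializer as in the earlier sketch)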