# [SciPy-User] Molecular Index Calculation

6 messages
Open this post in threaded view
|

## [SciPy-User] Molecular Index Calculation

 I am a newcomer to Python and am attempting to translate a FORTRAN program that I wrote about 20 years ago into Python, specifically v-3.6. Let me say that I am not asking someone to write the program for me, but only to point me is the right direction. So far, I have bumbled my way to the point that I can get all of the input data resulting from a quantum mechanical calculation of a very simple organic molecule in to a Python program, but am encountering problems with processing the data. The equation that I a want to evaluate (the one that I programmed in FORTRAN) is equation (7) in the attached scan of one of the pages of the following document. Stephen P. Molnar and James W. King, Theory and Applications of the Integrated Molecular Transform and the Normalized Molecular Moment Structure Descriptors: QSAR and QSPR Paradigms, Int. J Quantum Chem., 85, 662 (2001). the equation is I(s) = sum from i=2toN sum from j=1toi-1 Z_sub_i*Z_sub_j *     sin  s*r_sub_ij    -----------------  Equation (7)         s*r_sub_ij What I have managed to do so far is for a simple organic molecule containing 5 atoms (CHFClBr): import numpy as np import numpy, pandas, scipy.spatial.distance as dist r=[]        #distances between pairs of atoms z = []      #atomic numbers of atoms Z_dist = [] #products of successive atomic numbers I = []      #vector calculated in equation  (7) start=1 finish=31 points=300 s = np.linspace(start,finish,points) name = input("Enter Molecule ID: ") name = str(name) name_in = name+'.dat' df = pandas.read_table(name_in, skiprows=2, sep=" ", skipinitialspace=True) z = numpy.array(df["ZA"]) N =  numpy.ma.size(z)  ## of atoms in molecule a = numpy.array(df[["X", "Y", "Z"]]) dist.squareform(dist.pdist(a, "euclidean")) anrows, ancols = np.shape(a) a_new = a.reshape(anrows, 1, ancols) diff = a_new - a r = (diff ** 2).sum(2) r = np.sqrt(r) for j in range(0,N-1):       for k in range(j+1,N):           Z_diff = (z[j]*z[k]) For the test molecule I get the following: MASS =  [ 12.011   1.008  79.9    35.453  18.998]  (Atomic weights of atoms) r:  [ 0.          2.059801    3.60937686  3.32591826  2.81569212]       [ 2.059801    0.          4.71452879  4.45776445  4.00467382]       [ 3.60937686  4.71452879  0.          5.66500917  5.26602175]       [ 3.32591826  4.45776445  5.66500917  0.          5.02324896]       [ 2.81569212  4.00467382  5.26602175  5.02324896  0.        ] z: 6.0 210.0 102.0 54.0 35.0 17.0 9.0 595.0 315.0 153.0 I have checked these calculations with a spreadsheet calculation, consequently I'm correct as far as I have gotten. However, my problem is calculating the value of I for the 300 x-coordinate values required for the molecular index calculation. The problem that I am currently having is calculating the product of the pre-sine function and the sine term for the 300 values of s. Unfortunately, the wealth of  function in Python makes it difficult for me to ascertain just how to proceed.  It seems that I have the problem of seeing the trees because of the size of the forest.  At this point Goggling only makes my confusion deeper.  Any pointers in the correct direction will be greatly appreciated. Thanks in advance. -- Stephen P. Molnar, Ph.D. Life is a fuzzy set www.molecular-modeling.net Stochastic and multivariate (614)312-7528 (c) Skype: smolnar1 _______________________________________________ SciPy-User mailing list [hidden email] https://mail.python.org/mailman/listinfo/scipy-user I_Formula.pdf (125K) Download Attachment
Open this post in threaded view
|

## Re: Molecular Index Calculation

 Hi Stephen,I see that N = 10 (ie. the length of Z), and that r is indexed by i and j, which run up to N. However, r is only a 5x5 array. Is this correct?Otherwise, something like this might work:import numpy as npstart=1finish=31points=300s = np.linspace(start, finish, points)MASS =  np.array([12.011, 1.008, 79.9, 35.453, 18.998])r = np.array([[0., 2.059801, 3.60937686, 3.32591826, 2.81569212],              [2.059801, 0., 4.71452879, 4.45776445, 4.00467382],              [3.60937686, 4.71452879, 0., 5.66500917, 5.26602175],              [3.32591826, 4.45776445, 5.66500917, 0., 5.02324896],              [2.81569212, 4.00467382, 5.26602175, 5.02324896, 0.]])r[r == 0.] = 0.0001Z = np.array([6.0, 210.0, 102.0, 54.0, 35.0, 17.0, 9.0, 595.0, 315.0, 153.0])N = Z.sizeI = np.zeros(s.shape)for i in range(1, N):    for j in range(j):        I += Z[i] * Z[j] * np.sin(s * r[i, j])/(s * r[i, j])Do you have an example of the correct output?Best,HjalmarOn Sat, Apr 15, 2017 at 6:29 AM, Stephen P. Molnar wrote:I am a newcomer to Python and am attempting to translate a FORTRAN program that I wrote about 20 years ago into Python, specifically v-3.6. Let me say that I am not asking someone to write the program for me, but only to point me is the right direction. So far, I have bumbled my way to the point that I can get all of the input data resulting from a quantum mechanical calculation of a very simple organic molecule in to a Python program, but am encountering problems with processing the data. The equation that I a want to evaluate (the one that I programmed in FORTRAN) is equation (7) in the attached scan of one of the pages of the following document. Stephen P. Molnar and James W. King, Theory and Applications of the Integrated Molecular Transform and the Normalized Molecular Moment Structure Descriptors: QSAR and QSPR Paradigms, Int. J Quantum Chem., 85, 662 (2001). the equation is I(s) = sum from i=2toN sum from j=1toi-1 Z_sub_i*Z_sub_j *    sin  s*r_sub_ij   -----------------  Equation (7)        s*r_sub_ij What I have managed to do so far is for a simple organic molecule containing 5 atoms (CHFClBr): import numpy as np import numpy, pandas, scipy.spatial.distance as dist r=[]        #distances between pairs of atoms z = []      #atomic numbers of atoms Z_dist = [] #products of successive atomic numbers I = []      #vector calculated in equation  (7) start=1 finish=31 points=300 s = np.linspace(start,finish,points) name = input("Enter Molecule ID: ") name = str(name) name_in = name+'.dat' df = pandas.read_table(name_in, skiprows=2, sep=" ", skipinitialspace=True) z = numpy.array(df["ZA"]) N =  numpy.ma.size(z)  ## of atoms in molecule a = numpy.array(df[["X", "Y", "Z"]]) dist.squareform(dist.pdist(a, "euclidean")) anrows, ancols = np.shape(a) a_new = a.reshape(anrows, 1, ancols) diff = a_new - a r = (diff ** 2).sum(2) r = np.sqrt(r) for j in range(0,N-1):      for k in range(j+1,N):          Z_diff = (z[j]*z[k]) For the test molecule I get the following: MASS =  [ 12.011   1.008  79.9    35.453  18.998]  (Atomic weights of atoms) r:  [ 0.          2.059801    3.60937686  3.32591826  2.81569212]      [ 2.059801    0.          4.71452879  4.45776445  4.00467382]      [ 3.60937686  4.71452879  0.          5.66500917  5.26602175]      [ 3.32591826  4.45776445  5.66500917  0.          5.02324896]      [ 2.81569212  4.00467382  5.26602175  5.02324896 0.        ] z: 6.0 210.0 102.0 54.0 35.0 17.0 9.0 595.0 315.0 153.0 I have checked these calculations with a spreadsheet calculation, consequently I'm correct as far as I have gotten. However, my problem is calculating the value of I for the 300 x-coordinate values required for the molecular index calculation. The problem that I am currently having is calculating the product of the pre-sine function and the sine term for the 300 values of s. Unfortunately, the wealth of  function in Python makes it difficult for me to ascertain just how to proceed.  It seems that I have the problem of seeing the trees because of the size of the forest.  At this point Goggling only makes my confusion deeper.  Any pointers in the correct direction will be greatly appreciated. Thanks in advance. -- Stephen P. Molnar, Ph.D.                Life is a fuzzy set www.molecular-modeling.net              Stochastic and multivariate (614)312-7528 (c) Skype: smolnar1 _______________________________________________ SciPy-User mailing list [hidden email] https://mail.python.org/mailman/listinfo/scipy-user _______________________________________________ SciPy-User mailing list [hidden email] https://mail.python.org/mailman/listinfo/scipy-user
Open this post in threaded view
|

## Re: Molecular Index Calculation

 On 04/15/2017 04:07 PM, Hjalmar Turesson wrote: > Hi Stephen, > > > I see that N = 10 (ie. the length of Z), and that r is indexed by i and > j, which run up to N. However, r is only a 5x5 array. Is this correct? > > Otherwise, something like this might work: > > import numpy as np > > start=1 > finish=31 > points=300 > s = np.linspace(start, finish, points) > > MASS =  np.array([12.011, 1.008, 79.9, 35.453, 18.998]) > > r = np.array([[0., 2.059801, 3.60937686, 3.32591826, 2.81569212], >                [2.059801, 0., 4.71452879, 4.45776445, 4.00467382], >                [3.60937686, 4.71452879, 0., 5.66500917, 5.26602175], >                [3.32591826, 4.45776445, 5.66500917, 0., 5.02324896], >                [2.81569212, 4.00467382, 5.26602175, 5.02324896, 0.]]) > > r[r == 0.] = 0.0001 > > Z = np.array([6.0, 210.0, 102.0, 54.0, 35.0, 17.0, 9.0, 595.0, 315.0, > 153.0]) > > N = Z.size > > I = np.zeros(s.shape) > > for i in range(1, N): >      for j in range(j): >          I += Z[i] * Z[j] * np.sin(s * r[i, j])/(s * r[i, j]) > > > Do you have an example of the correct output? > > Best, > Hjalmar > > On Sat, Apr 15, 2017 at 6:29 AM, Stephen P. Molnar > <[hidden email] > wrote: > >     I am a newcomer to Python and am attempting to translate a FORTRAN >     program that I wrote about 20 years ago into Python, specifically v-3.6. > >     Let me say that I am not asking someone to write the program for me, >     but only to point me is the right direction. > >     So far, I have bumbled my way to the point that I can get all of the >     input data resulting from a quantum mechanical calculation of a very >     simple organic molecule in to a Python program, but am encountering >     problems with processing the data. > >     The equation that I a want to evaluate (the one that I programmed in >     FORTRAN) is equation (7) in the attached scan of one of the pages of >     the following document. > >     Stephen P. Molnar and James W. King, Theory and Applications of the >     Integrated Molecular Transform and the Normalized Molecular Moment >     Structure Descriptors: QSAR and QSPR Paradigms, Int. J Quantum >     Chem., 85, 662 (2001). > >     the equation is > >     I(s) = sum from i=2toN sum from j=1toi-1 Z_sub_i*Z_sub_j * > >         sin  s*r_sub_ij >        -----------------  Equation (7) >             s*r_sub_ij > >     What I have managed to do so far is for a simple organic molecule >     containing 5 atoms (CHFClBr): > >     import numpy as np >     import numpy, pandas, scipy.spatial.distance as dist > >     r=[]        #distances between pairs of atoms >     z = []      #atomic numbers of atoms >     Z_dist = [] #products of successive atomic numbers >     I = []      #vector calculated in equation  (7) > >     start=1 >     finish=31 >     points=300 >     s = np.linspace(start,finish,points) > > >     name = input("Enter Molecule ID: ") >     name = str(name) >     name_in = name+'.dat' > >     df = pandas.read_table(name_in, skiprows=2, sep=" ", >     skipinitialspace=True) >     z = numpy.array(df["ZA"]) >     N =  numpy.ma.size(z)  ## of atoms in molecule >     a = numpy.array(df[["X", "Y", "Z"]]) >     dist.squareform(dist.pdist(a, "euclidean")) >     anrows, ancols = np.shape(a) >     a_new = a.reshape(anrows, 1, ancols) >     diff = a_new - a > >     r = (diff ** 2).sum(2) >     r = np.sqrt(r) > >     for j in range(0,N-1): >           for k in range(j+1,N): >               Z_diff = (z[j]*z[k]) > > >     For the test molecule I get the following: > >     MASS =  [ 12.011   1.008  79.9    35.453  18.998]  (Atomic weights >     of atoms) > >     r:  [ 0.          2.059801    3.60937686  3.32591826  2.81569212] >           [ 2.059801    0.          4.71452879  4.45776445  4.00467382] >           [ 3.60937686  4.71452879  0.          5.66500917  5.26602175] >           [ 3.32591826  4.45776445  5.66500917  0.          5.02324896] >           [ 2.81569212  4.00467382  5.26602175  5.02324896 0 >     .        ] > >     z: > >     6.0 >     210.0 >     102.0 >     54.0 >     35.0 >     17.0 >     9.0 >     595.0 >     315.0 >     153.0 > >     I have checked these calculations with a spreadsheet calculation, >     consequently I'm correct as far as I have gotten. > >     However, my problem is calculating the value of I for the 300 >     x-coordinate values required for the molecular index calculation. > >     The problem that I am currently having is calculating the product of >     the pre-sine function and the sine term for the 300 values of s. > >     Unfortunately, the wealth of  function in Python makes it difficult >     for me to ascertain just how to proceed.  It seems that I have the >     problem of seeing the trees because of the size of the forest.  At >     this point Goggling only makes my confusion deeper.  Any pointers in >     the correct direction will be greatly appreciated. > >     Thanks in advance. > >     -- >     Stephen P. Molnar, Ph.D.                Life is a fuzzy set >     www.molecular-modeling.net >              Stochastic and multivariate >     (614)312-7528 (c) >     Skype: smolnar1 > > >     _______________________________________________ >     SciPy-User mailing list >     [hidden email] >     https://mail.python.org/mailman/listinfo/scipy-user>     > > > > > _______________________________________________ > SciPy-User mailing list > [hidden email] > https://mail.python.org/mailman/listinfo/scipy-user> Hjalmar Thanks for your response. I'm sorry that my original request wasn't a bit clearer.  I presume that the paper I attached to the request help was deleted before the email was posted to the list. Here is how I incorporated your suggestion in the code. import numpy as np import numpy, pandas, scipy.spatial.distance as dist r = [] s=[] z = [] Z = [] I = [] start=1 finish=31 points=300 s = np.linspace(start,finish,points) np.savetxt('Number of Points',s,delimiter=' ') name = input("Enter Molecule ID: ") name = str(name) name_in = name+'.dat' df = pandas.read_table(name_in, skiprows=2, sep=" ", skipinitialspace=True) Z = numpy.array(df["ZA"]) print('z = ',z) N =  numpy.ma.size(z) a = numpy.array(df[["X", "Y", "Z"]]) dist.squareform(dist.pdist(a, "euclidean")) anrows, ancols = np.shape(a) a_new = a.reshape(anrows, 1, ancols) diff = a_new - a r = (diff ** 2).sum(2) r = np.sqrt(z) r[r == 0.] = 0.0001 N = Z.size I = np.zeros(s.shape) for i in range(1, N):      for j in range(j):          I += Z[i] * Z[j] * np.sin(s * r[i, j])/(s * r[i, j])          print('I: ',I) np.savetxt('I',I,delimiter=' ') The for j in range(j): has a marginal note (I'm using the Spyder IDE) "undefined name 'y' " and I is a vector with three hundred 0.0 terms. also, please note that I removed 'Z = np.array([6.0, 210.0, 102.0, 54.0, 35.0, 17.0, 9.0, 595.0, 315.0, 153.0])' from your suggestion as that is the result of the per-sine term 'Z[i] * Z[j]' Regards,         Steve -- Stephen P. Molnar, Ph.D. Life is a fuzzy set www.molecular-modeling.net Stochastic and multivariate (614)312-7528 (c) Skype: smolnar1 _______________________________________________ SciPy-User mailing list [hidden email] https://mail.python.org/mailman/listinfo/scipy-user
Open this post in threaded view
|

## Re: Molecular Index Calculation

 I escaped a cluttered Fortran/C/Pascal/Basic/Perl jumble by "living" in stackoverflow. Mostly by reading questions and answers, and learning how not to get my questions closed in milliseconds. Break your program into small pieces, and look for how each piece might be done differently. If you try to find ways to do things without ever manually checking the sizes of arrays, you'll naturally be guided towards "pythonic" methods and techniques. There is almost always a better way.Computers are so fast these days, don't worry about ultimate speed until you really hit a brick wall. Letting python, numpy, or scipy methods handle housekeeping for you means you are usually defaulting to very well written compiled routines.So for example, let python do the thinking...r = np.sqrt((diff ** 2).sum(2))r[r < 0.0001] = 0.0001  # or justr = np.maximum(np.sqrt((diff ** 2).sum(2)), 0.0001)# andZij = Z * Z[:, None]triangle = np.triu(np.ones_like(Zij), k=1)I = Zij*trangle.sum()# or justI = np.triu(Z * Z[:, None], k=1).sum()On Sun, Apr 16, 2017 at 8:25 PM, Stephen P. Molnar wrote:On 04/15/2017 04:07 PM, Hjalmar Turesson wrote: Hi Stephen, I see that N = 10 (ie. the length of Z), and that r is indexed by i and j, which run up to N. However, r is only a 5x5 array. Is this correct? Otherwise, something like this might work: import numpy as np start=1 finish=31 points=300 s = np.linspace(start, finish, points) MASS =  np.array([12.011, 1.008, 79.9, 35.453, 18.998]) r = np.array([[0., 2.059801, 3.60937686, 3.32591826, 2.81569212],                [2.059801, 0., 4.71452879, 4.45776445, 4.00467382],                [3.60937686, 4.71452879, 0., 5.66500917, 5.26602175],                [3.32591826, 4.45776445, 5.66500917, 0., 5.02324896],                [2.81569212, 4.00467382, 5.26602175, 5.02324896, 0.]]) r[r == 0.] = 0.0001 Z = np.array([6.0, 210.0, 102.0, 54.0, 35.0, 17.0, 9.0, 595.0, 315.0, 153.0]) N = Z.size I = np.zeros(s.shape) for i in range(1, N):      for j in range(j):          I += Z[i] * Z[j] * np.sin(s * r[i, j])/(s * r[i, j]) Do you have an example of the correct output? Best, Hjalmar On Sat, Apr 15, 2017 at 6:29 AM, Stephen P. Molnar <[hidden email] > wrote:     I am a newcomer to Python and am attempting to translate a FORTRAN     program that I wrote about 20 years ago into Python, specifically v-3.6.     Let me say that I am not asking someone to write the program for me,     but only to point me is the right direction.     So far, I have bumbled my way to the point that I can get all of the     input data resulting from a quantum mechanical calculation of a very     simple organic molecule in to a Python program, but am encountering     problems with processing the data.     The equation that I a want to evaluate (the one that I programmed in     FORTRAN) is equation (7) in the attached scan of one of the pages of     the following document.     Stephen P. Molnar and James W. King, Theory and Applications of the     Integrated Molecular Transform and the Normalized Molecular Moment     Structure Descriptors: QSAR and QSPR Paradigms, Int. J Quantum     Chem., 85, 662 (2001).     the equation is     I(s) = sum from i=2toN sum from j=1toi-1 Z_sub_i*Z_sub_j *         sin  s*r_sub_ij        -----------------  Equation (7)             s*r_sub_ij     What I have managed to do so far is for a simple organic molecule     containing 5 atoms (CHFClBr):     import numpy as np     import numpy, pandas, scipy.spatial.distance as dist     r=[]        #distances between pairs of atoms     z = []      #atomic numbers of atoms     Z_dist = [] #products of successive atomic numbers     I = []      #vector calculated in equation  (7)     start=1     finish=31     points=300     s = np.linspace(start,finish,points)     name = input("Enter Molecule ID: ")     name = str(name)     name_in = name+'.dat'     df = pandas.read_table(name_in, skiprows=2, sep=" ",     skipinitialspace=True)     z = numpy.array(df["ZA"])     N =  numpy.ma.size(z)  ## of atoms in molecule     a = numpy.array(df[["X", "Y", "Z"]])     dist.squareform(dist.pdist(a, "euclidean"))     anrows, ancols = np.shape(a)     a_new = a.reshape(anrows, 1, ancols)     diff = a_new - a     r = (diff ** 2).sum(2)     r = np.sqrt(r)     for j in range(0,N-1):           for k in range(j+1,N):               Z_diff = (z[j]*z[k])     For the test molecule I get the following:     MASS =  [ 12.011   1.008  79.9    35.453  18.998]  (Atomic weights     of atoms)     r:  [ 0.          2.059801    3.60937686  3.32591826  2.81569212]           [ 2.059801    0.          4.71452879  4.45776445  4.00467382]           [ 3.60937686  4.71452879  0.          5.66500917  5.26602175]           [ 3.32591826  4.45776445  5.66500917  0.          5.02324896]           [ 2.81569212  4.00467382  5.26602175  5.02324896 0     .        ]     z:     6.0     210.0     102.0     54.0     35.0     17.0     9.0     595.0     315.0     153.0     I have checked these calculations with a spreadsheet calculation,     consequently I'm correct as far as I have gotten.     However, my problem is calculating the value of I for the 300     x-coordinate values required for the molecular index calculation.     The problem that I am currently having is calculating the product of     the pre-sine function and the sine term for the 300 values of s.     Unfortunately, the wealth of  function in Python makes it difficult     for me to ascertain just how to proceed.  It seems that I have the     problem of seeing the trees because of the size of the forest.  At     this point Goggling only makes my confusion deeper.  Any pointers in     the correct direction will be greatly appreciated.     Thanks in advance.     --     Stephen P. Molnar, Ph.D.                Life is a fuzzy set     www.molecular-modeling.net              Stochastic and multivariate     (614)312-7528 (c)     Skype: smolnar1     _______________________________________________     SciPy-User mailing list     [hidden email]     https://mail.python.org/mailman/listinfo/scipy-user     _______________________________________________ SciPy-User mailing list [hidden email] https://mail.python.org/mailman/listinfo/scipy-user Hjalmar Thanks for your response. I'm sorry that my original request wasn't a bit clearer.  I presume that the paper I attached to the request help was deleted before the email was posted to the list. Here is how I incorporated your suggestion in the code. import numpy as np import numpy, pandas, scipy.spatial.distance as dist r = [] s=[] z = [] Z = [] I = [] start=1 finish=31 points=300 s = np.linspace(start,finish,points) np.savetxt('Number of Points',s,delimiter=' ') name = input("Enter Molecule ID: ") name = str(name) name_in = name+'.dat' df = pandas.read_table(name_in, skiprows=2, sep=" ", skipinitialspace=True) Z = numpy.array(df["ZA"]) print('z = ',z) N =  numpy.ma.size(z) a = numpy.array(df[["X", "Y", "Z"]]) dist.squareform(dist.pdist(a, "euclidean")) anrows, ancols = np.shape(a) a_new = a.reshape(anrows, 1, ancols) diff = a_new - a r = (diff ** 2).sum(2) r = np.sqrt(z) r[r == 0.] = 0.0001 N = Z.size I = np.zeros(s.shape) for i in range(1, N):     for j in range(j):         I += Z[i] * Z[j] * np.sin(s * r[i, j])/(s * r[i, j])         print('I: ',I) np.savetxt('I',I,delimiter=' ') The for j in range(j): has a marginal note (I'm using the Spyder IDE) "undefined name 'y' " and I is a vector with three hundred 0.0 terms. also, please note that I removed 'Z = np.array([6.0, 210.0, 102.0, 54.0, 35.0, 17.0, 9.0, 595.0, 315.0, 153.0])' from your suggestion as that is the result of the per-sine term 'Z[i] * Z[j]' Regards,         Steve -- Stephen P. Molnar, Ph.D.                Life is a fuzzy set www.molecular-modeling.net              Stochastic and multivariate (614)312-7528 (c) Skype: smolnar1 _______________________________________________ SciPy-User mailing list [hidden email] https://mail.python.org/mailman/listinfo/scipy-user _______________________________________________ SciPy-User mailing list [hidden email] https://mail.python.org/mailman/listinfo/scipy-user