# Re: [Numpy-discussion] Why slicing Pandas column and then subtract gives NaN?

## Re: [Numpy-discussion] Why slicing Pandas column and then subtract gives NaN?

This is more a question for the pandas list, but since I'm here I'll take a crack.

numpy aligns arrays by position. pandas aligns by label.

So what you did in pandas is roughly equivalent to the following:

```
import pandas

a = pandas.Series([85, 86, 87, 86], name='a').iloc[1:4].to_frame()
b = pandas.Series([15, 72, 2, 3], name='b').iloc[0:3].to_frame()
result = a.join(b, how='outer').assign(diff=lambda df: df['a'] - df['b'])
print(result)
```

```
      a     b  diff
0   NaN  15.0   NaN
1  86.0  72.0  14.0
2  87.0   2.0  85.0
3  86.0   NaN   NaN
```

So what I think you want would be the following:

```
a = pandas.Series([85, 86, 87, 86], name='a')
b = pandas.Series([15, 72, 2, 3], name='b')
result = a.subtract(b.shift()).dropna()
print(result)
```

```
1    71.0
2    15.0
3    84.0
dtype: float64
```

On Wed, Feb 13, 2019 at 2:51 PM C W <[hidden email]> wrote:

> Dear list,
>
> I have the following two pandas Series: a, b. I want to slice and then subtract, like this: a[1:4] - b[0:3]. Why does it give me NaN? It works in NumPy.
>
> Example 1: did not work
>
> ```
> >>> a = pd.Series([85, 86, 87, 86])
> >>> b = pd.Series([15, 72, 2, 3])
> >>> a[1:4] - b[0:3]
> 0     NaN
> 1    14.0
> 2    85.0
> 3     NaN
> >>> type(a[1:4])
> ```
>
> Example 2: worked
>
> If I use the `.values` attribute, it's converted to a NumPy object. And it works!
>
> ```
> >>> a.values[1:4] - b.values[0:3]
> array([71, 15, 84])
> >>> type(a.values[1:4])
> ```
>
> What's the reason that pandas in example 1 did not work? Isn't NumPy built on top of pandas? So why is everything OK in NumPy, but not in pandas?
>
> Thanks in advance!

_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

_______________________________________________
SciPy-User mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/scipy-user
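The position-versus-label difference described in this reply can be checked directly. What follows is only a minimal sketch, assuming pandas 0.24 or later for `Series.to_numpy()` (the `.values` attribute used in the original question gives the same array here). The names `left`, `right`, `by_label`, and `by_position` are illustrative and do not come from the thread.

```python
import numpy as np
import pandas as pd

a = pd.Series([85, 86, 87, 86])
b = pd.Series([15, 72, 2, 3])

left = a.iloc[1:4]    # keeps the original labels 1, 2, 3
right = b.iloc[0:3]   # keeps the original labels 0, 1, 2

# Label alignment: the result covers the union of the labels (0..3).
# Labels present on only one side (0 and 3) become NaN.
by_label = left - right

# Positional subtraction: drop the labels first, then subtract element by element.
by_position = left.to_numpy() - right.to_numpy()

print(by_label)
print(by_position)
```

Slicing does not renumber a Series, so the two slices share only labels 1 and 2. That is why the first and last rows of the pandas result are NaN.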

## Re: [Numpy-discussion] Why slicing Pandas column and then subtract gives NaN?

Maybe it's useful to look a bit more at what pandas is doing and why. The 'index' on a Series or DataFrame labels each row. For example, if your series is measuring total sales for each day, its index would be the dates. When you combine (e.g. subtract) two series, pandas automatically lines up the indices. So it will join up the numbers for February 14th, even if they're not in the same position in the data.

In your example, you haven't specified an index, so pandas generates an integer index which doesn't really mean anything, and aligning on it doesn't do what you want.

What are you trying to do? If NumPy does exactly what you want, then the answer might be to use NumPy.

> Isn't Numpy built on top of Pandas?

It's the other way round: pandas is built on NumPy. Pandas indices are an extra layer of functionality on top of what NumPy does.

On Thu, 14 Feb 2019 at 20:22, C W <[hidden email]> wrote:

> Hi Paul,
>
> Thanks for your response! I did not find a pandas list for users, only for developers. I'd love to be on there.
>
> `result = a.subtract(b.shift()).dropna()`
>
> This seems verbose: several layers of parentheses followed by a dot method. I'm new to Python, and I thought Python code would be pithy and short. Is this what everyone will write?
>
> Thank you!
>
> On Wed, Feb 13, 2019 at 6:50 PM Paul Hobson <[hidden email]> wrote:
>
> > This is more a question for the pandas list, but since I'm here I'll take a crack. numpy aligns arrays by position. pandas aligns by label. [...]
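The daily-sales illustration in this reply can be made concrete. This is a sketch with made-up figures: the `sales` and `returns` Series, their values, and the name `net` are hypothetical and do not appear in the thread.

```python
import pandas as pd

# Hypothetical daily totals, indexed by date.
sales = pd.Series(
    [100, 120, 90],
    index=pd.to_datetime(["2019-02-13", "2019-02-14", "2019-02-15"]),
)

# A second series listed in a different order and missing February 13th.
returns = pd.Series(
    [5, 7],
    index=pd.to_datetime(["2019-02-15", "2019-02-14"]),
)

# pandas pairs values by date label, not by position in the data.
net = sales - returns

print(net)
```

February 14th pairs 120 with 7, although the two values sit at different positions. February 13th has no partner in `returns`, so its result is NaN.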
## Re: [Numpy-discussion] Why slicing Pandas column and then subtract gives NaN?

Thanks a lot, Thomas. I don't have an index when I read in the data. I just want to slice two series to the same length and subtract. That's it!

I also don't want NumPy methods wrapped within methods. They work, but they are hard to understand.

How would you do it? In Matlab or R, it's very simple: one line.

From: SciPy-User on behalf of Thomas Kluyver <[hidden email]>
Sent: Thursday, February 14, 2019 4:54 PM
To: SciPy Users List
Cc: Discussion of Numerical Python
Subject: Re: [SciPy-User] [Numpy-discussion] Why slicing Pandas column and then subtract gives NaN?

> Maybe it's useful to look a bit more at what pandas is doing and why. [...]
## Re: [Numpy-discussion] Why slicing Pandas column and then subtract gives NaN?

> I don't have index when I read in the data. I just want to slice two series to the same length, and subtract. That's it!

That sounds like you want NumPy. Pandas objects always have an index, even if it's the default integer index. You've already found how to extract a NumPy array from a pandas Series.

On Fri, 15 Feb 2019 at 06:02, Mike C <[hidden email]> wrote:

> Thanks a lot, Thomas. I don't have index when I read in the data. I just want to slice two series to the same length, and subtract. That's it! [...]
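The point that a Series always carries an index, even a default one, can be seen in a short sketch. The `reset_index(drop=True)` step is not suggested anywhere in the thread. It is shown here only as one possible way to stay in pandas, on the assumption that the original labels carry no meaning. The names `default_index` and `diff` are illustrative.

```python
import pandas as pd

a = pd.Series([85, 86, 87, 86])
b = pd.Series([15, 72, 2, 3])

# Even without an explicit index, a Series gets a default integer one.
default_index = a.index
print(default_index)

# Renumbering both slices from 0 makes their labels coincide with their positions,
# so label alignment then gives the same answer as positional subtraction.
diff = a.iloc[1:4].reset_index(drop=True) - b.iloc[0:3].reset_index(drop=True)
print(diff)
```

If the labels are never needed, subtracting the extracted NumPy arrays, as in the original question's second example, is the shorter route.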