BUG: Segfault on `np.maximum(series, ...)` #60611

ssche · 2024-12-27T20:48:32Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import numpy as np
import pandas as pd
a = [-3.22, 4]
x = pd.Series(a)
np.maximum(x, 0, where=x > 2)

Issue Description

Executing the code above ends in a segmentation fault (core dumped).

np.maximum(...) enters an infinite call cycle that eventually exceeds the maximum stack size.

Call stack (bottom up):

...
array_ufunc, arraylike.py:399
__array_ufunc__, generic.py:2171
array_ufunc, arraylike.py:399
__array_ufunc__, generic.py:2171
array_ufunc, arraylike.py:399
__array_ufunc__, generic.py:2171

`__array_ufunc__, generic.py:2171` (`core/generic.py`):

class NDFrame:
    ...
    @final
    def __array_ufunc__(
        self, ufunc: np.ufunc, method: str, *inputs: Any, **kwargs: Any
    ):
        return arraylike.array_ufunc(self, ufunc, method, *inputs, **kwargs)  # <-- frame in the call stack

`array_ufunc, arraylike.py:399` (`core/arraylike.py`):


    elif self.ndim == 1:
        # ufunc(series, ...)
        inputs = tuple(extract_array(x, extract_numpy=True) for x in inputs)
        result = getattr(ufunc, method)(*inputs, **kwargs)  # <-- frame in the call stack
    else:
        # ufunc(dataframe)
        if method == "__call__" and not kwargs:
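
One possible reading of the excerpt above, which is an assumption rather than something confirmed here: the positional inputs are unwrapped via extract_array, but kwargs are forwarded unchanged, so a Series passed as `where` would reach the ufunc still wrapped and could trigger the override again. The sketch below models that cycle in plain Python. `ToySeries`, `unwrap`, and `toy_ufunc` are hypothetical names, not pandas or numpy code.

```python
class ToySeries:
    """Hypothetical stand-in for a Series; not pandas code."""

    def __init__(self, values, unwrap_where):
        self.values = list(values)
        self.unwrap_where = unwrap_where

    def __array_ufunc__(self, func, *inputs, **kwargs):
        # Positional inputs are always unwrapped, as in the excerpt.
        inputs = tuple(unwrap(v) for v in inputs)
        # Assumed fix: unwrap the `where` keyword as well.
        if self.unwrap_where and "where" in kwargs:
            kwargs["where"] = unwrap(kwargs["where"])
        return toy_ufunc(func, *inputs, **kwargs)


def unwrap(value):
    """Return the raw list held by a ToySeries, or the value itself."""
    return value.values if isinstance(value, ToySeries) else value


def toy_ufunc(func, *inputs, where=None):
    """Toy dispatcher: defer to any argument defining an override, `where` included."""
    for arg in (*inputs, where):
        if hasattr(arg, "__array_ufunc__"):
            return arg.__array_ufunc__(func, *inputs, where=where)
    first, second = inputs
    # Masked-out positions are left as None in this toy model.
    return [func(a, second) if keep else None for a, keep in zip(first, where)]
```

With `unwrap_where=False` the call `toy_ufunc(max, x, 0, where=mask)` keeps re-entering `__array_ufunc__` through the still-wrapped mask until Python raises RecursionError; with `unwrap_where=True` it returns normally.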

Expected Behavior

The code runs to completion without recursion. This used to work in pandas==2.1.1 (and perhaps in later versions).

Installed Versions

INSTALLED VERSIONS

commit : 0691c5c
python : 3.13.1
python-bits : 64
OS : Linux
OS-release : 6.12.5-200.fc41.x86_64
Version : #1 SMP PREEMPT_DYNAMIC Sun Dec 15 16:48:23 UTC 2024
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_AU.UTF-8
LOCALE : en_AU.UTF-8

pandas : 2.2.3
numpy : 2.2.1
pytz : 2020.4
dateutil : 2.9.0.post0
pip : 24.3.1
Cython : 3.0.11
sphinx : None
IPython : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
blosc : None
bottleneck : 1.4.2
dataframe-api-compat : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : None
lxml.etree : None
matplotlib : None
numba : None
numexpr : 2.10.2
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
psycopg2 : 2.9.10
pymysql : None
pyarrow : 18.1.0
pyreadstat : None
pytest : 8.3.4
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.14.1
sqlalchemy : None
tables : 3.10.1
tabulate : None
xarray : None
xlrd : 2.0.1
xlsxwriter : None
zstandard : None
tzdata : 2024.2
qtpy : None
pyqt5 : None


rhshadrach · 2024-12-28T14:46:03Z

Thanks for the report. I am not able to get the example working on pandas 2.1.1. Can you post the environment details where it works for you?

Versions

INSTALLED VERSIONS
------------------
commit              : e86ed377639948c64c429059127bcf5b359ab6be
python              : 3.11.11.final.0
python-bits         : 64
OS                  : Linux
OS-release          : 6.8.0-49-generic
Version             : #49~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Nov  6 17:42:15 UTC 2
machine             : x86_64
processor           : x86_64
byteorder           : little
LC_ALL              : None
LANG                : en_US.UTF-8
LOCALE              : en_US.UTF-8

pandas              : 2.1.1
numpy               : 1.26.4
pytz                : 2024.2
dateutil            : 2.9.0.post0
setuptools          : 59.6.0
pip                 : 24.2
Cython              : 3.0.11
pytest              : 8.3.3
hypothesis          : 6.112.1
sphinx              : 8.0.2
blosc               : 1.11.2
feather             : None
xlsxwriter          : 3.2.0
lxml.etree          : 5.3.0
html5lib            : 1.1
pymysql             : 1.4.6
psycopg2            : 2.9.9
jinja2              : 3.1.4
IPython             : 8.27.0
pandas_datareader   : None
bs4                 : 4.12.3
bottleneck          : 1.4.0
dataframe-api-compat: None
fastparquet         : 2024.5.0
fsspec              : 2024.9.0
gcsfs               : 2024.9.0post1
matplotlib          : 3.9.2
numba               : 0.60.0
numexpr             : 2.10.1
odfpy               : None
openpyxl            : 3.1.5
pandas_gbq          : None
pyarrow             : 17.0.0
pyreadstat          : 1.2.7
pyxlsb              : 1.0.10
s3fs                : 2024.9.0
scipy               : 1.14.1
sqlalchemy          : 2.0.35
tables              : 3.10.1
tabulate            : 0.9.0
xarray              : 2024.9.0
xlrd                : 2.0.1
zstandard           : 0.23.0
tzdata              : 2024.1
qtpy                : None
pyqt5               : None

ssche · 2024-12-29T00:14:46Z

Interesting. It works for me, right off the bat. See this:

>>> import numpy as np
>>> import pandas as pd
>>> a = [-3.22, 4]
>>> x = pd.Series(a)
>>> np.maximum(x, 0, where=x > 2)
0    6.900705e-310
1     4.000000e+00
dtype: float64
>>> 
>>> pd.show_versions()
virtualenv/lib/python3.11/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS
------------------
commit              : e86ed377639948c64c429059127bcf5b359ab6be
python              : 3.11.11.final.0
python-bits         : 64
OS                  : Linux
OS-release          : 6.12.5-200.fc41.x86_64
Version             : #1 SMP PREEMPT_DYNAMIC Sun Dec 15 16:48:23 UTC 2024
machine             : x86_64
processor           : 
byteorder           : little
LC_ALL              : None
LANG                : en_AU.UTF-8
LOCALE              : en_AU.UTF-8

pandas              : 2.1.1
numpy               : 1.24.3
pytz                : 2020.4
dateutil            : 2.8.2
setuptools          : 67.7.2
pip                 : 24.0
Cython              : 0.29.34
pytest              : 7.3.1
hypothesis          : None
sphinx              : None
blosc               : None
feather             : None
xlsxwriter          : 0.9.6
lxml.etree          : None
html5lib            : None
pymysql             : None
psycopg2            : 2.9.6
jinja2              : 2.11.2
IPython             : None
pandas_datareader   : None
bs4                 : None
bottleneck          : 1.3.5
dataframe-api-compat: None
fastparquet         : None
fsspec              : None
gcsfs               : None
matplotlib          : 3.9.2
numba               : None
numexpr             : 2.8.4
odfpy               : None
openpyxl            : 3.1.2
pandas_gbq          : None
pyarrow             : 11.0.0
pyreadstat          : None
pyxlsb              : None
s3fs                : None
scipy               : 1.10.1
sqlalchemy          : 1.3.23
tables              : 3.8.0
tabulate            : None
xarray              : None
xlrd                : 2.0.1
zstandard           : None
tzdata              : 2023.4
qtpy                : None
pyqt5               : None

I'm using numpy 1.24.3, while you tried with numpy 1.26.4. With numpy 1.26.4, I'm running into the same issue that I described (and which you are probably also experiencing with your venv).

ssche · 2024-12-29T00:21:36Z

I ran some tests with pandas 2.1.1: the issue first occurs with numpy 1.25.0, so numpy 1.24.4 is the last version that works with pandas 2.1.1.

There have been some changes around __array_ufunc__ in numpy 1.25.0 that may have contributed to the regression. One that may be relevant is https://numpy.org/doc/stable/release/1.25.0-notes.html#array-likes-that-define-array-ufunc-can-now-override-ufuncs-if-used-as-where

If the where keyword argument of a numpy.ufunc is a subclass of numpy.ndarray or is a duck type that defines numpy.class.__array_ufunc__ it can override the behavior of the ufunc using the same mechanism as the input and output arguments. Note that for this to work properly, the where.__array_ufunc__ implementation will have to unwrap the where argument to pass it into the default implementation of the ufunc or, for numpy.ndarray subclasses before using super().__array_ufunc__.
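
To make the quoted note concrete, here is a minimal sketch of a duck type whose `__array_ufunc__` unwraps `where` (and `out`) before calling the default ufunc implementation. `Wrapped` and `_unwrap` are hypothetical names used for illustration only; this is not how pandas implements it.

```python
import numpy as np


class Wrapped:
    """Hypothetical duck type that simply holds an ndarray."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Unwrap positional inputs.
        inputs = tuple(_unwrap(v) for v in inputs)
        # Unwrap `where` too; otherwise the ufunc would see a Wrapped
        # object again and dispatch back to this method.
        if "where" in kwargs:
            kwargs["where"] = _unwrap(kwargs["where"])
        # numpy passes `out` to __array_ufunc__ as a tuple when given.
        if "out" in kwargs:
            kwargs["out"] = tuple(_unwrap(o) for o in kwargs["out"])
        result = getattr(ufunc, method)(*inputs, **kwargs)
        return Wrapped(result)


def _unwrap(value):
    """Return the ndarray inside a Wrapped object, or the value itself."""
    return value.data if isinstance(value, Wrapped) else value
```

For example, `np.maximum(Wrapped([-3.22, 4.0]), 0, where=Wrapped([False, True]), out=np.zeros(2))` returns a `Wrapped` object without recursing.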

Indeed, when I use plain numpy arrays instead of Series for the where mask and the first argument, the problem goes away.

>>> import numpy as np
>>> import pandas as pd
>>> a = [-3.22, 4]
>>> x = pd.Series(a)
>>> np.maximum(x.values, 0, where=(x > 2).values)
array([0., 4.])
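
A side note on this workaround: when `where` is given without `out`, numpy leaves the masked-out positions uninitialized, which is why the first element in the outputs above is arbitrary (6.900705e-310 earlier, 0. here). A minimal sketch of a helper that passes plain arrays and an explicit `out` follows. `masked_maximum` and its `fill` parameter are hypothetical names, not a pandas or numpy API.

```python
import numpy as np


def masked_maximum(values, floor, mask, fill=np.nan):
    """Elementwise maximum(values, floor) where mask is True, fill elsewhere."""
    # Convert to plain ndarrays so no __array_ufunc__ override is involved.
    values = np.asarray(values, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    # Pre-fill the output so masked-out positions hold a defined value.
    out = np.full(values.shape, fill, dtype=float)
    np.maximum(values, floor, out=out, where=mask)
    return out
```

With the Series from the example, `masked_maximum(x, 0, x > 2)` should give a plain array that can be wrapped again with `pd.Series(result, index=x.index)`.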

ssche added the Bug, Needs Triage, Regression, and ufuncs labels Dec 27, 2024

rhshadrach added the Needs Info label Dec 28, 2024

ssche removed the Needs Info label Dec 29, 2024
