UMAP n_neighbors must be greater than 1 #30

jeffreyzhanghc · 2024-04-06T23:13:57Z

Hi team, I am currently building open-domain QA with raptor as follows.
We have data stored as question-answer pairs. When a user submits an input query, I match the query against the top-k most related questions in my data, concatenate their answers, and then use raptor to try to answer the input query. However, when the docs passed to RA.add_documents(docs) get longer, I get an "n_neighbors must be greater than 1" error from UMAP at fit_transform in this code chunk:
def global_cluster_embeddings(
    embeddings: np.ndarray,
    dim: int,
    n_neighbors: Optional[int] = None,
    metric: str = "cosine",
) -> np.ndarray:
    if n_neighbors is None:
        n_neighbors = int((len(embeddings) - 1) ** 0.5)
    reduced_embeddings = umap.UMAP(
        n_neighbors=n_neighbors, n_components=dim, metric=metric
    ).fit_transform(embeddings)
    return reduced_embeddings
Is there any way to resolve this UMAP issue?
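
To see why the error appears, the default expression from the function above can be evaluated for small inputs. The following is a minimal sketch in plain Python; `default_n_neighbors` is a hypothetical helper name that only mirrors the `int((len(embeddings) - 1) ** 0.5)` line and is not part of raptor or umap.

```python
def default_n_neighbors(n_embeddings: int) -> int:
    """Mirror the default heuristic: floor of sqrt(n - 1)."""
    return int((n_embeddings - 1) ** 0.5)


# Print the heuristic for a few small array lengths.
for n in range(1, 7):
    print(n, default_n_neighbors(n))
```

For arrays of 1 to 4 embeddings the heuristic yields 0 or 1, which the "n_neighbors must be greater than 1" check rejects. From 5 embeddings upward it yields at least 2.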

cuichenxu · 2024-04-10T03:07:45Z

I ran into the same issue. Have you solved it?

fatlism · 2024-04-10T03:46:32Z

I also encountered the same problem. Is there any solution?

isConic · 2024-04-10T22:42:53Z

@jeffreyzhanghc Can you pinpoint where in the repo this line of code is?

jeffreyzhanghc · 2024-04-10T22:52:15Z

@cuichenxu @fatlism Hi, I do not fully understand the cause yet. My initial guess is that it comes from the embedding step: I was running the original raptor models on Chinese content, and with longer contexts this bug appeared very often. After I switched to embedding/summarization models customized for Chinese, it has not shown up for a while. My suggestion: if you are processing longer text in a different language, consider trying an embedding method built specifically for that language, though I am not sure that will solve the issue.

jeffreyzhanghc · 2024-04-10T22:54:00Z

> @jeffreyzhanghc Can you pinpoint where in the repo this line of code is?

It is in raptor/cluster_utils.py, line 33.

jeffreyzhanghc · 2024-04-10T22:55:53Z

> @jeffreyzhanghc Can you pinpoint where in the repo this line of code is?

In the umap package, the call goes through umap_.py line 2379 in .fit, and the error is raised at line 1777 in _validate_parameters().

cuichenxu · 2024-04-11T00:49:06Z

> @cuichenxu @fatlism Hi, I do not fully understand the cause yet. My initial guess is that it comes from the embedding step: I was running the original raptor models on Chinese content, and with longer contexts this bug appeared very often. After I switched to embedding/summarization models customized for Chinese, it has not shown up for a while. My suggestion: if you are processing longer text in a different language, consider trying an embedding method built specifically for that language, though I am not sure that will solve the issue.

Hi, thanks for your insights!
My texts are English only, and the embedding model is SBertEmbeddingModel from raptor/EmbeddingModels.py, but I still hit this error. I really do not understand why.

By the way, were you able to get this running for your use case? Could you please share your custom embedding model code? I tried to implement one, but an error occurred.

fatlism · 2024-04-11T03:45:45Z

if n_neighbors is None:
    # n_neighbors = int((len(embeddings) - 1) ** 0.5)
    n_neighbors = max(2, int((len(embeddings) - 1) ** 0.5))

I found that the aggregated embedding array had length 2. The error only occurs when dimensionality reduction is called without n_neighbors set explicitly, so the default formula is used. I worked around it temporarily with the code above.
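
The workaround above enforces only a lower bound. A minimal sketch of what that means for very small arrays follows; `clamped_n_neighbors` and `fits_dataset` are hypothetical names, and the comparison against `n - 1` rests on the assumption that a point cannot have more neighbors than there are other points in the array.

```python
def clamped_n_neighbors(n_embeddings: int) -> int:
    """Lower-bounded heuristic from the workaround: at least 2."""
    return max(2, int((n_embeddings - 1) ** 0.5))


def fits_dataset(n_embeddings: int) -> bool:
    """True when the clamped value does not exceed the number of other points."""
    return clamped_n_neighbors(n_embeddings) <= n_embeddings - 1


# Compare the clamped value with the array size for a few lengths.
for n in (2, 3, 5, 26):
    print(n, clamped_n_neighbors(n), fits_dataset(n))
```

With the 2-element array described above, the clamp returns 2 while only 1 other point exists. The clamp therefore gets past the n_neighbors check, but the input may still be too small for the reduction step itself.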

cuichenxu · 2024-04-11T03:51:29Z

> if n_neighbors is None:
>     # n_neighbors = int((len(embeddings) - 1) ** 0.5)
>     n_neighbors = max(2, int((len(embeddings) - 1) ** 0.5))
>
> I found that the aggregated embedding array had length 2. The error only occurs when dimensionality reduction is called without n_neighbors set explicitly, so the default formula is used. I worked around it temporarily with the code above.

How long does it take when the context is long?

fatlism · 2024-04-12T02:14:42Z

> > if n_neighbors is None:
> >     # n_neighbors = int((len(embeddings) - 1) ** 0.5)
> >     n_neighbors = max(2, int((len(embeddings) - 1) ** 0.5))
> >
> > I found that the aggregated embedding array had length 2. The error only occurs when dimensionality reduction is called without n_neighbors set explicitly, so the default formula is used. I worked around it temporarily with the code above.
>
> How long does it take when the context is long?

A single-threaded run might take several hours.

lixinze777 · 2024-05-28T06:12:51Z

> if n_neighbors is None:
>     # n_neighbors = int((len(embeddings) - 1) ** 0.5)
>     n_neighbors = max(2, int((len(embeddings) - 1) ** 0.5))
>
> I found that the aggregated embedding array had length 2. The error only occurs when dimensionality reduction is called without n_neighbors set explicitly, so the default formula is used. I worked around it temporarily with the code above.

I tried this solution and this is what I got:

  File "/home/miniconda3/envs/lib/python3.8/site-packages/scipy/sparse/linalg/_eigen/arpack/arpack.py", line 1605, in eigsh
    raise TypeError("Cannot use scipy.linalg.eigh for sparse A with "
TypeError: Cannot use scipy.linalg.eigh for sparse A with k >= N. Use scipy.linalg.eigh(A.toarray()) or reduce k.

Wu-tn · 2024-07-22T05:24:05Z

> > if n_neighbors is None:
> >     # n_neighbors = int((len(embeddings) - 1) ** 0.5)
> >     n_neighbors = max(2, int((len(embeddings) - 1) ** 0.5))
> >
> > I found that the aggregated embedding array had length 2. The error only occurs when dimensionality reduction is called without n_neighbors set explicitly, so the default formula is used. I worked around it temporarily with the code above.
>
> I tried this solution and this is what I got:
>
>   File "/home/miniconda3/envs/lib/python3.8/site-packages/scipy/sparse/linalg/_eigen/arpack/arpack.py", line 1605, in eigsh
>     raise TypeError("Cannot use scipy.linalg.eigh for sparse A with "
> TypeError: Cannot use scipy.linalg.eigh for sparse A with k >= N. Use scipy.linalg.eigh(A.toarray()) or reduce k.

I hit the same error. How did you handle it?

jsvan · 2024-07-22T15:23:54Z

I don't know about this bug specifically, but I found that updating the requirements list to install the current versions of all requirements, instead of the legacy ones, solved most of my problems.

Wu-tn · 2024-07-29T06:30:49Z

> I don't know about this bug specifically, but I found that updating the requirements list to install the current versions of all requirements, instead of the legacy ones, solved most of my problems.

Can you list your requirement versions and your Python version?

Wu-tn · 2024-07-29T07:17:21Z

> if n_neighbors is None:
>     # n_neighbors = int((len(embeddings) - 1) ** 0.5)
>     n_neighbors = max(2, int((len(embeddings) - 1) ** 0.5))
>
> I found that the aggregated embedding array had length 2. The error only occurs when dimensionality reduction is called without n_neighbors set explicitly, so the default formula is used. I worked around it temporarily with the code above.

It seems that using the code above triggers another error:
ValueError: n_components must be greater than 0
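
Both follow-up errors in this thread (k >= N from scipy, and n_components must be greater than 0) point toward the input array being too small for the requested reduction, although the thread does not confirm the cause. Below is a minimal sketch of a guard that decides whether to reduce at all. `plan_reduction` is a hypothetical helper, not part of raptor or umap, and the `n_samples <= dim + 1` threshold is an assumption rather than a documented UMAP limit.

```python
from typing import Optional, Tuple


def plan_reduction(n_samples: int, dim: int) -> Optional[Tuple[int, int]]:
    """Return (n_neighbors, n_components), or None to skip reduction.

    Assumption: reduction is only attempted when there are more than
    dim + 1 samples; smaller inputs are left to the caller to handle.
    """
    if dim < 1 or n_samples <= dim + 1:
        return None
    # Same square-root heuristic as the original code, bounded below by 2
    # and above by the number of other points in the array.
    n_neighbors = min(n_samples - 1, max(2, int((n_samples - 1) ** 0.5)))
    return n_neighbors, dim


# A 2-element array with a target dimension of 10 is skipped.
print(plan_reduction(2, 10))
# A 12-element array is large enough under the assumed threshold.
print(plan_reduction(12, 10))
```

When the helper returns None, a caller could skip the UMAP call and treat all embeddings as a single cluster. Otherwise it could pass the returned values as n_neighbors and n_components.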

AnhLD2610 · 2024-08-22T10:52:03Z

I have the same problem. Can anyone fix it?
