APP generating unexpected number of testing points #26

lucasnil · 2023-12-18T14:51:46Z

lucasnil
Dec 18, 2023

Hello. I'm having some trouble figuring out why I'm getting an unexpected number of samples from APP.
I have n_classes=4, n_prevpoints=21, and n_repeats=10, which should result in 17710 testing points according to:

import quapy.functional as F
n_prevpoints = 21
n_classes = 4

F.num_prevalence_combinations(n_prevpoints, n_classes, n_repeats=10)
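For reference, the expected figure can also be checked by hand. The following is a minimal sketch, assuming the count is the stars-and-bars number of ways to distribute the n_prevpoints - 1 grid steps among the classes; the helper name expected_app_points is hypothetical and not part of QuaPy:

```python
from math import comb

def expected_app_points(n_prevpoints: int, n_classes: int, n_repeats: int = 1) -> int:
    # A prevalence vector on the grid assigns (n_prevpoints - 1) equal steps
    # to the classes; stars and bars counts those assignments.
    steps = n_prevpoints - 1
    return comb(steps + n_classes - 1, n_classes - 1) * n_repeats

print(expected_app_points(21, 4, n_repeats=10))  # 17710
```

With n_repeats=1 the same formula gives 1771 grid points for 4 classes and 21 prevalence values.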

So here is a piece of my code:

import quapy as qp
from quapy.method.aggregative import CC
from quapy.protocol import APP

collection = dataset_prep  # collection prepared earlier (not shown)

model = CC(newLR())  # newLR() is defined elsewhere (not shown)

for run, data in enumerate(qp.data.Dataset.kFCV(collection, nfolds=3, nrepeats=1, random_state=0)):
    # model selection (hyperparameter optimization for a quantification-oriented loss)
    # feature extraction
    data = qp.data.preprocessing.text2tfidf(data)
    train, test = data.train_test
    _, val = train.split_stratified(random_state=0, train_prop=0.7)
    model.fit(train)

    # model evaluation
    true_prevalences, estim_prevalences = qp.evaluation.prediction(
        model,
        protocol=APP(test, n_prevalences=21, repeats=10),
        aggr_speedup=False
    )

So, this code is a slight adaptation of the code found in https://github.com/HLT-ISTI/QuaPy/blob/master/examples/uci_experiments.py

My problem is that my code returns 17570 testing points, as verified by len(true_prevalences), rather than the expected 17710.
I tried a bunch of things but could not fix it or understand why this is happening.

Even if I set n_repeats=1, it still returns 1757 testing points with 4 classes and n_prevalences=21.

I'll be very grateful if someone could help me.

Answered by AlexMoreo


AlexMoreo · 2023-12-18T16:24:22Z

AlexMoreo
Dec 18, 2023
Maintainer

You are right! The problem was that quapy checks for combinations of values that generate plausible prevalence vectors, i.e., prevalence vectors summing up to 1. Unfortunately, when you work with floats, the rounding error sometimes accumulates and produces sums slightly greater than 1 (e.g., 1.000000001), which caused some valid combinations to be discarded. I have fixed it now in the master branch (thanks for noticing it!).
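The effect can be reproduced outside the library. This is only an illustrative sketch, not QuaPy's actual implementation or the fix applied in master: it enumerates the first n_classes - 1 prevalences on a float grid and compares a strict `<= 1` check against one with a small tolerance (the 1e-9 tolerance and the helper count_valid are assumptions made for the example):

```python
from itertools import product

n_prevpoints, n_classes = 21, 4
grid = [i / (n_prevpoints - 1) for i in range(n_prevpoints)]

def count_valid(tolerance: float) -> int:
    # Enumerate the first n_classes - 1 prevalences; the last one is implied
    # as 1 - sum(p), so a combination is valid when sum(p) does not exceed 1.
    return sum(
        1
        for p in product(grid, repeat=n_classes - 1)
        if sum(p) <= 1.0 + tolerance
    )

strict = count_valid(0.0)     # can miss combinations whose float sum lands just above 1
tolerant = count_valid(1e-9)  # absorbs the accumulated rounding error
print(strict, tolerant)
```

The tolerant count matches the expected 1771 grid points; the strict count can fall short of it, depending on how the grid values are computed and summed.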
In any case, the APP protocol is falling into disuse in favor of modern protocols like the UPP. This protocol implements the Kraemer sampling algorithm for yielding samples with prevalence values uniformly distributed. In UPP you can specify how many samples you want to generate, so you are not constrained to generate all possible combinations (which, in your case, are simply too many).
Hope this helps!
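To give an idea of what uniform prevalence sampling looks like, here is a rough sketch of the sort-and-difference scheme usually associated with the Kraemer algorithm. It is not QuaPy's code, and the function name uniform_prevalence is hypothetical; it only shows how a fixed number of prevalence vectors can be drawn instead of enumerating a grid:

```python
import random

def uniform_prevalence(n_classes: int, rng: random.Random) -> list:
    # Draw n_classes - 1 cut points in [0, 1], sort them, and use the gaps
    # between consecutive cut points (with 0 and 1 as endpoints) as prevalences.
    cuts = sorted(rng.random() for _ in range(n_classes - 1))
    edges = [0.0] + cuts + [1.0]
    return [hi - lo for lo, hi in zip(edges, edges[1:])]

rng = random.Random(0)
# The number of samples is chosen freely, independently of any grid size.
samples = [uniform_prevalence(4, rng) for _ in range(1000)]
print(len(samples), samples[0])
```

Each vector is non-negative and sums to 1 up to rounding, so no combination has to be filtered out afterwards.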

1 reply

lucasnil Dec 18, 2023
Author

Nice! Thanks for the advice!
