Closes #427 #428
biodatasets/mediqa_ans/mediqa_ans.py
def _source_to_t2t(self, example):
    example_ = {}
    example_["document_id"] = ""
    example_["text_1_name"] = ""
    example_["text_2_name"] = ""

    text1 = ""
    text1 += "Question ID: " + example["question_id"] + "\n"
    text1 += "Question: " + example["question"] + "\n"
    for article in example["articles"]:
        text1 += "Answer ID: " + article["answer_id"] + "\n"
        text1 += "Answer: " + article["text"] + "\n"
        text1 += "Rating: " + article["rating"] + "\n"
    example_["text_1"] = text1

    example_["text_2"] = example["summary"]

    return example_
This is the transformation of the source data to fit the t2t schema.
Basically, the summarization works like question + answer -> summarized_answer, so for the t2t schema I concatenated all relevant values with "\n" to form the value of text_1.
An example from page2answer_single_abstractive:
"1_Answer4": {
"summary": "Abetalipoproteimemia, also known as Bassen-Kornzweig syndrome, ... ",
"articles": " Bassen-Kornzweig syndrome Abetalipoproteinemia Acanthocytosis Apolipoprotein B deficiency...",
"question": "abetalipoproteimemia hi, I would like to know if there is any support for those suffering with abetalipoproteinemia? ...",
"question_id": "1",
"rating": "3-Incomplete"
}
where "1_Answer4" is the answer_id above and "articles" corresponds to article["text"].
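To make the mapping concrete, here is a minimal standalone sketch of the same concatenation. It assumes the loader has already reshaped the raw JSON into the nested form the method iterates over (a list of articles, each with answer_id, text and rating). The function name source_to_t2t and the shortened record are hypothetical and used only for illustration.

```python
def source_to_t2t(example):
    """Flatten one source record into the text_1/text_2 fields of the t2t schema."""
    lines = [
        "Question ID: " + example["question_id"],
        "Question: " + example["question"],
    ]
    for article in example["articles"]:
        lines.append("Answer ID: " + article["answer_id"])
        lines.append("Answer: " + article["text"])
        lines.append("Rating: " + article["rating"])
    return {
        "document_id": "",
        "text_1_name": "",
        "text_2_name": "",
        # Every field ends with "\n", matching the method under review.
        "text_1": "\n".join(lines) + "\n",
        "text_2": example["summary"],
    }


# Hypothetical, shortened record in the assumed nested shape.
example = {
    "question_id": "1",
    "question": "Is there any support for abetalipoproteinemia?",
    "summary": "Abetalipoproteinemia is also known as Bassen-Kornzweig syndrome.",
    "articles": [
        {
            "answer_id": "1_Answer4",
            "text": "Bassen-Kornzweig syndrome ...",
            "rating": "3-Incomplete",
        },
    ],
}

record = source_to_t2t(example)
print(record["text_1"])
```

The question and all answers end up in text_1 as the model input, while the reference summary goes to text_2 as the target.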
@nomisto In the description, can you add information about the subset_id values (and note that mediqa_ans_all implements only the source schema)? I confirmed that the other 8 subset ids pass unit tests.
Hi @sunnnymskang, sure, I've added a description to the value of _DESCRIPTION and to the docstring.
@nomisto Can you remind me why this fits the t2t schema better than question answering? We want to merge this PR asap; it looks mostly ok.
Hi @hakunanatasha, the name of this dataset is a little misleading: it is a summarization task, more specifically an answer summarization task. So the input is question + answer and the task is to generate a summary of that answer.
@nomisto Got it; I'll merge this later today. Sorry for the holdup. I assume that since it's a summarization task, text_1_name and text_2_name are also blank, as there is nothing to update here.
Closes #427
The dataset contains 8 different subset_ids (different dataset settings), each with a bigbio and a source schema.
Furthermore, there is a subset called mediqa_ans_all which includes all data (articles, sections, URLs of documents, all four different kinds of summaries, ...). I did not implement a bigbio schema for the all view as I think this does not make sense here. Since the bigbio schema is missing for all, tests fail for subset mediqa_ans_all.
Tests: