feat(example): add example csqa #189

Merged: 11 commits, Dec 31, 2024
29 changes: 20 additions & 9 deletions kag/common/benchmarks/evaluate.py
```diff
@@ -1,4 +1,6 @@
 from typing import List
+from tqdm import tqdm
+from concurrent.futures import ThreadPoolExecutor, as_completed
 
 from .evaUtils import get_em_f1
 from .evaUtils import compare_summarization_answers
@@ -70,7 +72,7 @@ def getBenchMark(self, predictionlist: List[str], goldlist: List[str]):
 
     def getSummarizationMetrics(self, queries: List[str], answers1: List[str], answers2: List[str], *,
                                 api_key="EMPTY", base_url="http://127.0.0.1:38080/v1", model="gpt-4o-mini",
-                                language="English", retries=3):
+                                language="English", retries=3, max_workers=50):
         """
         Calculates and returns QFS (query-focused summarization) evaluation metrics
         for the given queries, answers1 and answers2.
@@ -87,17 +89,18 @@ def getSummarizationMetrics(self, queries: List[str], answers1: List[str], answe
             model (str): model name to use when invoke the evaluating LLM.
             language (str): language of the explanation
             retries (int): number of retries
+            max_workers (int): number of workers
 
         Returns:
             dict: Dictionary containing the average metrics and the responses
                 generated by the evaluating LLM.
         """
-        responses = []
+        responses = [None] * len(queries)
         all_keys = "Comprehensiveness", "Diversity", "Empowerment", "Overall"
         all_items = "Score 1", "Score 2"
         average_metrics = {key: {item: 0.0 for item in all_items} for key in all_keys}
         success_count = 0
-        for index, (query, answer1, answer2) in enumerate(zip(queries, answers1, answers2)):
+        def process_sample(index, query, answer1, answer2):
             metrics = compare_summarization_answers(query, answer1, answer2,
                                                     api_key=api_key, base_url=base_url, model=model,
                                                     language=language, retries=retries)
@@ -106,12 +109,20 @@ def getSummarizationMetrics(self, queries: List[str], answers1: List[str], answe
                   f" query: {query}\n"
                   f" answer1: {answer1}\n"
                   f" answer2: {answer2}\n")
-            responses.append(metrics)
-            if metrics is not None:
-                for key in all_keys:
-                    for item in all_items:
-                        average_metrics[key][item] += metrics[key][item]
-                success_count += 1
+            else:
+                responses[index] = metrics
+            return metrics
+        with ThreadPoolExecutor(max_workers=max_workers) as executor:
+            futures = [executor.submit(process_sample, index, query, answer1, answer2)
+                       for index, (query, answer1, answer2)
+                       in enumerate(zip(queries, answers1, answers2))]
+            for future in tqdm(as_completed(futures), total=len(futures), desc="Evaluating: "):
+                metrics = future.result()
+                if metrics is not None:
+                    for key in all_keys:
+                        for item in all_items:
+                            average_metrics[key][item] += metrics[key][item]
+                    success_count += 1
         if success_count > 0:
             for key in all_keys:
                 for item in all_items:
```
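The concurrency pattern this diff introduces (submit one task per sample, preallocate the results list, and fill each slot by its submitted index as futures complete) can be sketched in isolation. The `score_all` helper and its stand-in scorer below are illustrative, not part of the repository:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def score_all(samples, max_workers=4):
    # Pre-size the results list so every worker writes into its own slot.
    # as_completed() yields futures in completion order, but indexing by
    # the submitted position keeps results aligned with the inputs.
    results = [None] * len(samples)

    def process(index, sample):
        # Stand-in for the real compare_summarization_answers() call:
        # here we just "score" a sample by its length.
        return index, len(sample)

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(process, i, s) for i, s in enumerate(samples)]
        for future in as_completed(futures):
            index, score = future.result()
            results[index] = score
    return results
```

Returning the index alongside the score (or writing to a shared, pre-sized list, as the diff does) is what makes out-of-order completion safe.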
3 changes: 3 additions & 0 deletions kag/examples/csqa/.gitignore
```diff
@@ -0,0 +1,3 @@
+ckpt/
+/cs.jsonl
+/solver/data/csqa_kag_answers.json
```
48 changes: 48 additions & 0 deletions kag/examples/csqa/README.md
# KAG Example: CSQA

The [UltraDomain](https://huggingface.co/datasets/TommyChien/UltraDomain/tree/main)
``cs.jsonl`` dataset contains 10 Computer Science documents and
100 questions about those documents, together with reference answers.

Here we demonstrate how to build a knowledge graph for those documents,
generate answers to the questions with KAG, and compare the KAG-generated
answers with those from other RAG systems.

## Steps to reproduce

1. Follow the Quick Start guide of KAG to install OpenSPG server and KAG.

2. (Optional) Download [UltraDomain](https://huggingface.co/datasets/TommyChien/UltraDomain/tree/main)
   ``cs.jsonl`` and execute ``generate_data.py`` to generate data files in
   ``./builder/data`` and ``./solver/data``. Since the generated files
   were committed, this step is optional.
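
The actual split is done by ``generate_data.py`` in the repository. As a rough sketch of what such a split involves, the snippet below reads a UltraDomain-style JSONL file and writes one corpus file per unique document plus a question list. The field names ``context``, ``input``, and ``answers``, and the output layout, are assumptions for illustration, not the script's actual behavior:

```python
import json
import os

def split_dataset(src_path, builder_dir, solver_dir,
                  context_key="context", question_key="input", answer_key="answers"):
    # Split a UltraDomain-style JSONL file into corpus documents for the
    # builder and question/answer pairs for the solver.  The key names
    # above are assumed; check the dataset before relying on them.
    os.makedirs(builder_dir, exist_ok=True)
    os.makedirs(solver_dir, exist_ok=True)
    seen, questions = {}, []
    with open(src_path, encoding="utf-8") as fin:
        for line in fin:
            row = json.loads(line)
            doc = row[context_key]
            if doc not in seen:
                # Each distinct context becomes one corpus document.
                seen[doc] = len(seen)
                name = os.path.join(builder_dir, "doc_%d.txt" % seen[doc])
                with open(name, "w", encoding="utf-8") as fout:
                    fout.write(doc)
            questions.append({"question": row[question_key],
                              "answer": row[answer_key]})
    with open(os.path.join(solver_dir, "questions.json"), "w", encoding="utf-8") as fout:
        json.dump(questions, fout, ensure_ascii=False, indent=2)
    return len(seen), len(questions)
```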

3. Update the ``llm`` and ``vectorizer_model`` configurations in ``kag_config.yaml``.
   ``splitter`` and ``num_threads_per_chain`` may also be updated
   to match other systems.

4. Restore the KAG project.

```bash
knext project restore --host_addr http://127.0.0.1:8887 --proj_path .
```

5. Commit the schema.

```bash
knext schema commit
```

6. Execute ``indexer.py`` in the ``builder`` directory to build the knowledge graph.

7. Execute ``eval.py`` in the ``solver`` directory to generate the answers.

The results are saved to ``./solver/data/csqa_kag_answers.json``.

8. (Optional) Follow the LightRAG [Reproduce](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#reproduce)
steps to generate answers to the questions and save the results to
``./solver/data/csqa_lightrag_answers.json``. Since a copy was committed,
this step is optional.

9. Update the LLM configurations in ``summarization_metrics.py`` and ``factual_correctness.py``,
   then execute them to compute the metrics.
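
Both metric scripts ultimately reduce per-question dicts keyed by ``"Score 1"``/``"Score 2"`` (the structure built by ``getSummarizationMetrics`` above) to aggregate numbers. One minimal, illustrative reduction is a per-dimension win count; the ``summarize_wins`` helper below is hypothetical and not part of either script:

```python
ALL_KEYS = ("Comprehensiveness", "Diversity", "Empowerment", "Overall")

def summarize_wins(metrics_list, keys=ALL_KEYS):
    # Each entry is either None (a failed evaluation) or a dict shaped like
    # {key: {"Score 1": float, "Score 2": float}}, matching the output of
    # getSummarizationMetrics.
    wins = {key: {"Score 1": 0, "Score 2": 0, "tie": 0} for key in keys}
    for metrics in metrics_list:
        if metrics is None:
            continue  # skip questions the evaluating LLM failed on
        for key in keys:
            score1 = metrics[key]["Score 1"]
            score2 = metrics[key]["Score 2"]
            if score1 > score2:
                wins[key]["Score 1"] += 1
            elif score2 > score1:
                wins[key]["Score 2"] += 1
            else:
                wins[key]["tie"] += 1
    return wins
```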
14 changes: 14 additions & 0 deletions kag/examples/csqa/builder/__init__.py
```diff
@@ -0,0 +1,14 @@
+# Copyright 2023 OpenSPG Authors
+#
+# Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
+# in compliance with the License. You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software distributed under the License
+# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+# or implied.
+
+"""
+Builder Dir.
+"""
```
14 changes: 14 additions & 0 deletions kag/examples/csqa/builder/data/__init__.py
```diff
@@ -0,0 +1,14 @@
+# Copyright 2023 OpenSPG Authors
+#
+# Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
+# in compliance with the License. You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software distributed under the License
+# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+# or implied.
+
+"""
+Place the files to be used for building the index in this directory.
+"""
```