
Does prompt make sense? #243

Open · wants to merge 2 commits into main
Conversation

@vwxyzjn (Collaborator) commented Aug 12, 2024

This PR includes a simple script to judge the quality of the SFT prompts in the dataset.
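
For context, a minimal sketch of what such a judging pass might look like, assuming the openai>=1.0 Python client; the judge instruction, the 1-to-5 scale, and the judge_prompt helper are illustrative assumptions, not the PR's actual code:

    # Sketch of an LLM-as-a-judge pass over SFT prompts.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    JUDGE_TEMPLATE = (
        "Does the following prompt make sense as an instruction to a language "
        "model? Answer with a score from 1 (nonsense) to 5 (clear and "
        "well-formed), followed by a one-sentence reason.\n\nPrompt:\n{prompt}"
    )

    def judge_prompt(prompt: str, model: str = "gpt-3.5-turbo-0125") -> str:
        """Ask the judge model to rate a single SFT prompt."""
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": JUDGE_TEMPLATE.format(prompt=prompt)}],
        )
        return response.choices[0].message.content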


@hamishivi (Collaborator):

As a very basic first pass this makes sense, but I wonder if:
(a) we can be more specific / ask for fine-grained scores (e.g., coherence, length, etc.). I feel like recent llm-as-a-judge work is trending toward more fine-grained scores (e.g., UltraFeedback's fine-grained aspect ratings vs. a single overall score); a sketch of what that could look like follows below.
(b) we can make this more quantitative somehow. Can we have some analysis of false positives or similar? Or maybe a small validation set that we can reason about and quality-check against?
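
A rough sketch of point (a), asking the judge for per-axis scores returned as JSON rather than one overall rating; the axis names and schema here are assumptions for illustration:

    # Sketch of fine-grained judging: per-axis 1-5 scores parsed from a JSON
    # reply. JSON mode is supported by gpt-3.5-turbo-0125 and newer models.
    import json

    from openai import OpenAI

    client = OpenAI()

    FINE_GRAINED_TEMPLATE = (
        "Rate the following prompt from 1 to 5 on each axis and reply with a "
        'JSON object with the keys "coherence", "specificity", and '
        '"answerability":\n\nPrompt:\n{prompt}'
    )

    def judge_fine_grained(prompt: str, model: str = "gpt-3.5-turbo-0125") -> dict:
        """Ask the judge for per-axis scores and parse the JSON reply."""
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": FINE_GRAINED_TEMPLATE.format(prompt=prompt)}],
            response_format={"type": "json_object"},
        )
        return json.loads(response.choices[0].message.content)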

@hamishivi (Collaborator) left a comment


Maybe we could make some sort of experimental folder for a rough script like this? Unsure; this sort of thing feels much rougher than, e.g., the submit_eval/finetune scripts.

from dataclasses import dataclass

@dataclass
class LLMJudgeConfig:
    n: int = 64  # how many prompts to sample for judging
    model: str = "gpt-3.5-turbo-0125"  # OpenAI model used as the judge
Comment from a collaborator:

should we use 4o?
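
If so, switching the judge would just mean overriding the dataclass default (assuming the account has gpt-4o access):

    config = LLMJudgeConfig(model="gpt-4o")  # n keeps its default of 64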
