-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FEATURE: Tool name validation #842
base: main
Are you sure you want to change the base?
Conversation
32252ee
to
0d8c3a4
Compare
|
||
private | ||
|
||
def check_name_availability |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also add a unique index to ensure uniqueness ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'll double-check it later, but we did have that unique option on that column in the first place https://github.com/discourse/discourse-ai/blob/main/db/migrate/20240618080148_create_ai_tools.rb#L6, it's highly likely that we used the wrong option syntax for it, it should be
t.string :name, index: { unique: true }
I think we may need a bit more of a refactor here, technically you may want 2 tools called "search" but then at the persona level pick the particular search you are after. So maybe we want an extra field here (tool_name tool name presented to LLM):
Then the "persona" save should ensure that the subset of tools you picked have distinct names. This also fixes "built-in" and "byo" clashes. I am happy though with being more strict about what letters we allow for "tool_name" cause it will generally be safer to keep to a stricter amount of rules here. |
config/locales/client.en.yml
Outdated
name: "Name" | ||
name_help: "The unique name of the tool as used by the language model" | ||
tool_name: "Tool Name" | ||
tool_name_help: "Tool name, only include numbers, letters, and underscores" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if you think there is a better copy for this tool_name_help and also the name_help
copy as its value isn't unique actually.
app/models/ai_tool.rb
Outdated
@@ -2,6 +2,7 @@ | |||
|
|||
class AiTool < ActiveRecord::Base | |||
validates :name, presence: true, length: { maximum: 100 } | |||
validates :tool_name, presence: true, length: { maximum: 100 }, uniqueness: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tool_name should not be unique... it is a constraint on "persona" not on tool, the same tool name can exist multiple times with no problem
app/models/ai_tool.rb
Outdated
@@ -202,7 +222,8 @@ def self.presets | |||
}, | |||
{ | |||
preset_id: "stock_quote", | |||
name: "stock_quote", | |||
name: "Stock Quote", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stock Quote (AlphaVantage) - feels like a better name.
its getting there, but we need to be clearer on what "name" means and what "tool_name" means. Name will show up in the Discourse UI and is the short identifier you will use to find the tool in various settings, it should be distinct (it is required) Tool Name is presented to the large language model. It is not distinct, but it is distinct per persona. (persona validates on save) I would say we don't allow nulls in either columns and during the migration just copy over old Name into Tool Name. We probably want a distinct constraint on Name which means we need to append (1,2,3 etc) onto name during the migration if there are dupes. |
I am not sure we need any of the "check_name" stuff though? we will just pick up issues on save. '' the big thing missing though is the validation on save for ai_persona, that is where we want to make sure we did not add any duplicate tool names. |
I’m on board with you regarding the The purpose of this PR is to ensure we don’t have any invalid function names when invoking them via the LLM provider. As for the uniqueness tool validation on save for ai_persona, that feels like a separate matter to me. But challenge accepted! I’ll take care of it—it’s actually quite fun. |
@@ -10,7 +10,6 @@ def tools | |||
Tools::Google, | |||
Tools::Image, | |||
Tools::Read, | |||
Tools::Image, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gotcha moment.
Hello @SamSaffron, would it be possible for you to review my latest changes in this PR once again? This would ensure that we're on the same page about what we've been discussing above. |
app/models/ai_tool.rb
Outdated
@@ -251,7 +272,8 @@ def self.presets | |||
}, | |||
{ | |||
preset_id: "image_generation", | |||
name: "image_generation", | |||
name: "Image Generation", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Image Generation (Flux)
Its looking great to me... it is getting a bit late on a Friday, will merge it first thing next week. |
- Add unique index to the name column of the ai_tools table - correct our tests for AiToolController - tool_name field which will be used to represent to LLM - Add tool_name to Tools's presets - Add duplicate tools validation for AiPersona - Add unique constraint to the name column of the ai_tools table
2a8b067
to
5f797ed
Compare
was about to merge this but the validation is not looking right to me. The "name" stored in Persona should be:
The duplication check is against tool_name. So in the case of something like The UI should show Name everywhere. We should have a test to cover this so it is clear how the validation works. |
Could we adjust |
we can migrate the information on tools so it is clearer, but its a very painful refactor: Eg:
Storing name though is a no-go cause we would too easily drift (when you rename in the tools UI we would be stuck amending internal cached refs in persona.) I think the only easy way of validating easily is loading up the information from custom tool in a batch during validate. |
Even if we refactor the custom tool data structure saved in the Persona, as in your example, we would still need to load the custom tool data to retrieve their You're right that adjusting the inner_name to include tool_name is a no-go. I’ll load the custom tool data in batches and handle the validation that way. Does that work for you? However, I’m also open to a more robust solution in this PR since there’s no rush to merge it. |
Yes sounds good! |
Given that most LLM providers have their own function name formats (except for Ollama, as far as I have tested), they typically accept alphanumeric values (only including characters, numbers, and underscores). If we create a custom tool with the wrong name format, we won’t be able to invoke these tools. Other built-in tools we have won’t encounter this problem since their names are valid.
Ref:
https://ai.google.dev/gemini-api/docs/function-calling#key-parameters-best-practices
https://meta.discourse.org/t/validate-the-name-field-for-ai-tools/330830
In this PR, we introduce the tool name validation in: