Releases: ShishirPatil/gorilla

Berkeley Function Calling Leaderboard Updates (v1.2)

05 Jan 04:39
79b1c60

Highlights

🏆 Berkeley Function Calling Leaderboard V3 with Multi-step and Multi-turn function call evaluation

Berkeley Function Calling Leaderboard Updates (v1.1)

27 Aug 06:13
3850c2b

Highlights

🏆 Berkeley Function Calling Leaderboard V2 along with Live data

Full Changelog: v1.0...v1.1

Berkeley Function Calling Leaderboard Updates (v1.0)

15 Aug 04:35
9df5c34

Highlights

🏆 We are thrilled to announce the stable v1.0 release of the Berkeley Function Calling Leaderboard dataset and evaluation pipeline! A heartfelt thank you to all our contributors and users for your enthusiastic engagement and support throughout v1. We are just getting started! Buckle up for v2 🚀 🚀 🚀

What's Changed

  • better handle float value comparison by @vandyxiaowei in #407
  • Bump pymysql from 1.1.0 to 1.1.1 in /goex by @dependabot in #453
  • Fixes For NexusHandler by @VenkatKS in #437
  • [BFCL] PR#407 Evaluation Pipeline Robustness Patch by @HuanzhiMao in #462
  • Add firefunction-v2 to the leaderboard by @pgarbacki in #470
  • [BFCL] Add Claude 3.5 Sonnet Function Calling Inference by @Fanjia-Yan in #480
  • [BFCL] Standardize Model Name Among handler_map and eval_runner_helper by @HuanzhiMao in #439
  • Remove redundant tokens from GPT-handler by @hellovai in #490
  • [GoEx] Undo Minor Bug Fix + README Minor Improvement by @royh02 in #468
  • [BFCL] Add ability to evaluate Nemotron-4-340B-Instruct by @Fanjia-Yan in #489
  • fix some data issues in parallel/parallel multiple answers by @vandyxiaowei in #423
  • [BFCL] Add Support for GLM-4-9B function calling inference by @Fanjia-Yan in #474
  • [BFCL] Sanity check is now optional by @ShishirPatil in #496
  • [BFCL] Improved tree-sitter java, javascript installation by @CharlieJCJ in #505
  • [BFCL] Fix Possible Answer for AST Parallel and Parallel_Multiple Category by @HuanzhiMao in #503
  • [BFCL] Add Test Dataset to Repository by @HuanzhiMao in #504
  • [BFCL] Support Category-Specific Generation for OSS Model, Remove eval_data_compilation Step by @HuanzhiMao in #512
  • [BFCL] Fix Double-Casting Issue in model_handler for Java and JS category. by @HuanzhiMao in #516
  • [BFCL] Fix Dataset Issue for executable_parallel_multiple Category by @HuanzhiMao in #522
  • [BFCL] add ibm-granite-20b-functioncalling model by @MayankAgarwal in #525
  • [BFCL] Overhaul apply_function_credential_config.py for Enhanced Usability by @HuanzhiMao in #508
  • Fixed the warning message "Setting pad_token_id to eos_token_id:1… by @dineshkumarsarangapani in #110
  • [BFCL] Specify package version in requirements.txt by @HuanzhiMao in #515
  • [BFCL] Standardize TEST_CATEGORY Among eval_runner.py and openfunctions_evaluation.py by @HuanzhiMao in #506
  • fix line return by @fantasist in #531
  • [BFCL] Apply Fix to Newly Introduced Model Handler Missed in Previous PR Merge by @HuanzhiMao in #536
  • [RAFT] Fix Datapoint Field in Formatter for Data Generation by @HuanzhiMao in #535
  • [BFCL] Fix language_specific_pre_processing for Java and JavaScript Test Category by @HuanzhiMao in #538
  • [BFCL] Patch Generation Script for Locally Hosted OSS model by @HuanzhiMao in #537
  • [BFCL] Support Multi-Model Multi-Category Generation; Add Index to Dataset; Handle vLLM Benign Error by @HuanzhiMao in #540
  • Add NousResearch/{Hermes-2-Pro-Llama-3-8B,Hermes-2-Theta-Llama-3-8B} models by @alonsosilvaallende in #542
  • [BFCL] Fix Dataset Pre-Processing for Java and JavaScript Test Category, Part 2 by @HuanzhiMao in #545
  • Add Salesforce xLAM handler and fix minor issues by @zuxin666 in #532
  • Add NousResearch/Hermes-2-{Pro-Llama-3-70B,Theta-Llama-3-70B} by @alonsosilvaallende in #556
  • Add Yi Handler by @fantasist in #543
  • Add more descriptive error message in eval_runner.py by @alonsosilvaallende in #552
  • [BFCL] Fix JS type converter to handle dictionaries with array values by @CharlieJCJ in #549
  • [BFCL] Handling rate limits by @ShishirPatil in #559
  • [BFCL] Fix Dataset and Possible Answer Issue by @HuanzhiMao in #557
  • [BFCL] Dataset Question Fix for Executable Parallel Category by @HuanzhiMao in #568
  • [BFCL] Add New Model gpt-4o-2024-08-06, gpt-4o-mini-2024-07-18 by @HuanzhiMao in #569
  • [BFCL] Add New Model open-mistral-nemo-2407, open-mixtral-8x22b, open-mixtral-8x7b by @HuanzhiMao in #570
  • [BFCL] Improve Warning Message when Aggregating Results by @HuanzhiMao in #517
  • [BFCL] Add New Model functionary-small-v3.1, functionary-small-v3.2, functionary-medium-v3.1; Update Token Price by @HuanzhiMao in #573
  • [BFCL] Set Model Temperature to 0.001 for All Models by @HuanzhiMao in #574
  • [BFCL] Support Parallel Inference for Hosted Models by @HuanzhiMao in #571
  • [BFCL Chore] Fix Functionary Medium 3.1 model name & add readme parallel inference by @CharlieJCJ in #577

Full Changelog: v0.3...v1.0

GoEx and Berkeley Function Calling Leaderboard Updates

05 Jun 05:43
33cabef

😍 v0.3 release 🚀

Highlights

⚡️ Released GoEx: A runtime that presents abstractions for safe execution of LLM-generated code, APIs, actions, etc.

🏆 Updates to the Berkeley Function Calling Leaderboard (aka Berkeley Tool Calling Leaderboard): newer models including GPT-4o, gemini-flash and 1.5-pro, Hermes-2-Pro, etc. All are measured on P95 and P99 latency and cost, in addition to accuracy.

Full Changelog: v0.2...v0.3

RAFT and Berkeley Function Calling Leaderboard Updates

11 Apr 03:38
e23476b

😍 v0.2 release 🚀

Highlights

🎯 Berkeley Function Calling Leaderboard (BFCL): How do models stack up for function calling?

  • Now includes latency and cost
  • More open-source and closed-source models
  • Bug fixes in the dataset

RAFT: Fine-tuning technique to improve LLMs for in-domain RAG!

Full Changelog: v0.1...v0.2

Gorilla v0.1: OpenFunctions-v2, Berkeley Function Calling Leaderboard, and more.

12 Mar 07:56
5cb5213

😍 v0.1 release 🚀

Highlights

  • 🎯 Berkeley Function Calling Leaderboard (BFCL): How do models stack up for function calling? Evaluation code for the Berkeley Function Calling Leaderboard.
  • 🏆 Gorilla OpenFunctions v2: Inference examples for OpenFunctions-v2, a SoTA open-source LLM for function calling. On par with GPT-4 🙌 Supports more languages 👌.
  • API Zoo Index: An accessible collection of API documentation for humans to search through, and for LLMs to use as tools 👀

We are excited about our long-overdue v0.1 release!

Full Changelog: v0.0.1...v0.1

Gorilla release v0.0.1

18 Jul 08:14
29f5ffb

🦍 Gorilla: An API store for LLMs 🚀

🚀 After 50,000 user requests through our hosted APIs, we are happy to share the first release of Gorilla 💪

🤩 In this release:

💻 gorilla-cli, LLMs for your CLI!
🟢 Commercially usable, Apache 2.0 licensed Gorilla models
🚀 CLI interface to chat with Gorilla!
🚀 Torch Hub and TensorFlow Hub Models!
🚀 The first Gorilla model! Colab or 🤗!
🔥 APIZoo contribution guide for community API contributions!
🔥 APIBench dataset and the evaluation code of Gorilla!