Adding Benchmarks back into agbench and updates to agbench #3711

Merged: 30 commits, Oct 11, 2024


husseinmozannar
Contributor

Why are these changes needed?

Added the benchmarks back into autogen, notably GAIA, AssistantBench, HumanEval, and WebArena. Renamed everything to be consistent with the new package, and enabled creating an endpoint from an environment file.

Modified the magentic-one README with a few changes to the installation instructions.

Fixed renaming issues in agbench so that it is consistent with the new package structure.
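For reviewers unfamiliar with the "endpoint from environment file" idea, here is a minimal sketch of what such a step could look like. It is not the code in this PR: the file format (plain `KEY=VALUE` lines), the variable names `MODEL_ENDPOINT`, `MODEL_NAME`, and `MODEL_API_KEY`, and both function names are assumptions made up for illustration.

```python
import tempfile
from pathlib import Path


def load_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def endpoint_config_from_env(path):
    """Build a hypothetical endpoint config dict from an environment file.

    The required key names are illustrative assumptions, not agbench's own.
    """
    env = load_env_file(path)
    missing = [k for k in ("MODEL_ENDPOINT", "MODEL_NAME") if k not in env]
    if missing:
        raise KeyError("missing keys in environment file: " + ", ".join(missing))
    return {
        "endpoint": env["MODEL_ENDPOINT"],
        "model": env["MODEL_NAME"],
        "api_key": env.get("MODEL_API_KEY"),  # optional; None when absent
    }


def demo():
    """Write a throwaway environment file and load a config from it."""
    with tempfile.TemporaryDirectory() as tmp:
        env_path = Path(tmp) / "sample.env"
        env_path.write_text(
            "# sample values only\n"
            "MODEL_ENDPOINT=https://example.invalid/v1\n"
            'MODEL_NAME="example-model"\n'
        )
        return endpoint_config_from_env(env_path)
```

Calling `demo()` returns a dict with the endpoint and model name from the file and `api_key` set to `None`, since the optional key is not present. Raising early on missing required keys keeps a misconfigured benchmark run from failing later with a less obvious error.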

@husseinmozannar
Contributor Author

@gagb and @afourney please review changes to agbench.

python/packages/agbench/dashboard_tabulate.py: three review threads, marked Fixed and resolved
@rysweet rysweet requested review from afourney and gagb October 10, 2024 21:14
@husseinmozannar
Contributor Author

Should be good to merge!

@ekzhu ekzhu merged commit 373adc9 into microsoft:main Oct 11, 2024
30 of 31 checks passed
rysweet added a commit that referenced this pull request Oct 14, 2024
* CodeQL advanced config (#3736)

* CodeQL advanced config with dotnet build

* Update codeql.yml

---------

Co-authored-by: Ryan Sweet <[email protected]>

* The /python/benchmarks folder simply contained a note indicating that the benchmarks were moved. This commit deletes this note and directory. (#3735)

* Update quick start examples to illustrate how to set up model client completely (#3739)

* Update FunctionCallGenerator.cs to address race condition (#3758)

Update FunctionCallGenerator.cs to address file name race condition with simultaneous builds

* Skip Bing tests when no API key is present. (#3757)

* Skip Bing pytests when no API key is present.

* Fixed formatting.

---------

Co-authored-by: Ryan Sweet <[email protected]>

* Move docker code exec to autogen-ext (#3733)

* move docker code exec to autogen-ext

* fix test

* rename docker subpackage

* add missing renamed package

---------

Co-authored-by: Leonardo Pinheiro <[email protected]>

* Add initial extensions doc page (#3762)

* Update README.md (#3768)

* Adding Benchmarks back into agbench and updates to agbench (#3711)

* Correcting Typo in README.md (#3770)

* Make sure exceptions in process publish is logged. (#3774)

* Support structured output (#3732)

* Support structured output

* use ruff format

* add type checking for cookbook

* add the notebook to index.md

* fix the type error

* pass response_format explicitly

* remove casting

* ensure type are correct

* separate response_format arg

* fix type and resolve pyright errors

---------

Co-authored-by: Eric Zhu <[email protected]>

* Update README.md (#3777)

* Update indexes for better navigation (#3779)

* Update indexes for better navigation

* Fix link

* Fix link

* Create a notebook to demonstrate handoff pattern (#3778)

* Add work in progress message to agentchat on home (#3784)

* Fill spelling mistake (#3786)

* Update README.md to clarify v2 vs v4 (#3785)

Co-authored-by: gagb <[email protected]>

---------

Co-authored-by: Jack Gerrits <[email protected]>
Co-authored-by: afourney <[email protected]>
Co-authored-by: Eric Zhu <[email protected]>
Co-authored-by: Max Golovanov <[email protected]>
Co-authored-by: Leonardo Pinheiro <[email protected]>
Co-authored-by: Leonardo Pinheiro <[email protected]>
Co-authored-by: Hussein Mozannar <[email protected]>
Co-authored-by: vikas434 <[email protected]>
Co-authored-by: Sunil Sattiraju <[email protected]>
Co-authored-by: gagb <[email protected]>

5 participants