
Adding Benchmarks to agbench #3803

Merged
17 commits merged into microsoft:main on
Oct 18, 2024

Conversation

husseinmozannar
Contributor

Why are these changes needed?

This PR adds benchmarks back into AutoGen, namely GAIA, AssistantBench, WebArena, and HumanEval.

This PR is a draft while we sort out the licenses.

Related issue number

Related PR: #3711. These changes were originally part of that PR but were moved into a separate PR so they would not block agbench.

Checks

@husseinmozannar husseinmozannar marked this pull request as ready for review October 17, 2024 17:13
@husseinmozannar
Contributor Author

@afourney I've added license files to the benchmarks.

@rysweet rysweet requested a review from afourney October 17, 2024 17:34
@afourney
Member

Can we move the licenses into the folders where we are actually using that code, i.e., eval_utils and common? We should also get sign-off from people who know better how this is usually handled. To be clear, these licenses don't apply to most of the code, only to the small evaluation scripts.

@husseinmozannar
Contributor Author

Done!

Member

@afourney afourney left a comment


Thanks! Looks good.

@ekzhu ekzhu merged commit e11d84b into microsoft:main Oct 18, 2024
30 checks passed

4 participants