
Add evaluation root project #501

Merged · 3 commits · May 18, 2024

Conversation

@kbeaugrand (Contributor) commented May 18, 2024

Implementing Evaluation Based on the RAGAS Framework

Description

This pull request marks the beginning of our implementation of evaluation metrics for our Retrieval Augmented Generation (RAG) pipelines using the RAGAS framework.

Background

RAGAS (RAG Assessment) is a comprehensive framework designed to evaluate RAG pipelines. RAG pipelines utilize external data to enhance the context provided to Large Language Models (LLMs). While building these pipelines is facilitated by existing tools, evaluating their performance quantitatively remains a challenge. RAGAS addresses this gap by offering tools based on cutting-edge research to evaluate LLM-generated text and provide valuable insights into the effectiveness of RAG pipelines.

Features to be Implemented

The implementation will leverage Kernel Memory to deliver the following evaluation features:

  • Faithfulness: Ensuring the generated text accurately represents the source information.
  • Answer Relevancy: Assessing the pertinence of the answer in relation to the query.
  • Context Recall: Measuring the proportion of relevant context retrieved.
  • Context Precision: Evaluating the accuracy of the retrieved context.
  • Context Relevancy: Determining the relevance of the provided context to the query.
  • Context Entity Recall: Checking the retrieval of key entities within the context.
  • Answer Semantic Similarity: Comparing the semantic similarity between the generated answer and the expected answer.
  • Answer Correctness: Verifying the factual correctness of the generated answers.
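As a rough illustration of the Answer Semantic Similarity metric above, the score can be derived from the cosine similarity between an embedding of the generated answer and an embedding of the expected (ground-truth) answer. The sketch below is in Python with plain lists standing in for real embedding vectors; the function names and the rescaling to a [0, 1] score are illustrative assumptions, not the RAGAS or Kernel Memory implementation.

```python
from math import sqrt


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def answer_semantic_similarity(answer_emb: list[float],
                               reference_emb: list[float]) -> float:
    # Map cosine similarity from [-1, 1] to a [0, 1] score
    # (a hypothetical normalization for reporting purposes).
    return (cosine_similarity(answer_emb, reference_emb) + 1) / 2
```

In practice the vectors would come from the same embedding model used by the RAG pipeline, so that answer and reference live in the same vector space.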

Integration

RAGAS will be integrated into our CI/CD pipeline to enable continuous performance monitoring and evaluation of our RAG pipelines. This integration will ensure that our RAG systems consistently meet the desired performance benchmarks.
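One common way to wire such an evaluation into CI/CD is a threshold gate: the pipeline runs the evaluation, compares each metric against a minimum, and exits non-zero (failing the job) if any metric falls short. This is a minimal sketch; the metric names, threshold values, and hard-coded scores are placeholder assumptions, since the evaluation itself is not yet implemented.

```python
import sys

# Hypothetical minimum scores a run must meet to pass the CI check.
THRESHOLDS = {"faithfulness": 0.80, "answer_relevancy": 0.75}


def check_scores(scores: dict[str, float]) -> bool:
    """Return True only if every thresholded metric meets its minimum."""
    return all(scores.get(name, 0.0) >= minimum
               for name, minimum in THRESHOLDS.items())


if __name__ == "__main__":
    # In a real pipeline these scores would come from the evaluation run.
    scores = {"faithfulness": 0.91, "answer_relevancy": 0.84}
    if not check_scores(scores):
        sys.exit(1)  # non-zero exit status marks the CI job as failed
```

The non-zero exit code is the only contract the CI system needs, so the same gate works unchanged across CI providers.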

Next Steps

  • Implement evaluation metrics: Develop the specified evaluation features using Kernel Memory.
  • Unit tests: Add tests covering the evaluation framework.
  • Integrate with CI/CD: Configure the evaluation checks to run automatically in our CI/CD pipeline.

@kbeaugrand kbeaugrand requested a review from dluc as a code owner May 18, 2024 07:34
dluc previously approved these changes May 18, 2024

@dluc (Collaborator) commented May 18, 2024

Looks like the Release build is broken, maybe something's been removed from the solution?

@kbeaugrand (Contributor, Author) commented May 18, 2024

> Looks like the Release build is broken, maybe something's been removed from the solution?

I'll take a look asap.

@dluc (Collaborator) commented May 18, 2024

> Looks like the Release build is broken, maybe something's been removed from the solution?
>
> I'll take a look asap.

No worries, I just pushed a fix.

dluc merged commit d34b750 into microsoft:main on May 18, 2024
5 of 6 checks passed