vector database #219

Open
nyck33 opened this issue Apr 13, 2024 · 1 comment

nyck33 commented Apr 13, 2024

I briefly read the paper and came across:
"Theoretically, penetration testing tool outputs could be archived in the vector database. In practice, though, we observe that many results closely resemble and vary in only nuanced ways. This similarity often leads to confused information retrieval. Solely relying on a vector database fails to overcome context loss in penetration testing tasks. Integrating the vector database into the design of PENTEST GPT is an avenue for future research."

The paper also describes a PTT (Pen Testing Task Tree). Would it help to label the testing tool outputs with metadata such as:

  1. At which branch of the PTT the output comes from.
  2. What actions came before it.
  3. Etc.

Similar entries in the vector database would then be distinguishable by their metadata, and queries could filter or sort on metadata column values. Is this something worth pursuing?
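To make the idea concrete, here is a minimal sketch of what I mean. It is not PentestGPT code: `ToolOutput`, `in_subtree`, and `search` are hypothetical names, the store is a plain in-memory list, and the embeddings are hand-written toy vectors rather than real model output. It only shows how two near-identical tool outputs can be separated by a PTT branch label before similarity ranking.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ToolOutput:
    """One archived tool output plus the metadata proposed above (hypothetical schema)."""
    text: str
    embedding: list[float]
    ptt_branch: str                       # dotted path in the task tree, e.g. "2.1"
    prior_actions: list[str] = field(default_factory=list)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def in_subtree(branch: str, prefix: str) -> bool:
    """True if `branch` is `prefix` itself or lies beneath it ("2" matches "2.1", not "20")."""
    return branch == prefix or branch.startswith(prefix + ".")


def search(store, query_embedding, branch_prefix=None, top_k=3):
    """Filter by PTT branch first, then rank the survivors by similarity."""
    candidates = [
        e for e in store
        if branch_prefix is None or in_subtree(e.ptt_branch, branch_prefix)
    ]
    candidates.sort(key=lambda e: cosine(e.embedding, query_embedding), reverse=True)
    return candidates[:top_k]


# Two almost identical scan results from different branches of the tree.
store = [
    ToolOutput("22/tcp open ssh", [0.9, 0.10, 0.0], "1.1", ["nmap -sV 10.0.0.5"]),
    ToolOutput("22/tcp open ssh", [0.9, 0.11, 0.0], "2.1", ["nmap -sV 10.0.0.9"]),
]

hits = search(store, [0.9, 0.10, 0.0], branch_prefix="2")
print([h.ptt_branch for h in hits])
```

Without the `branch_prefix` argument both entries come back with almost the same score, which is the confusion the paper describes. With it, only the entry under the requested branch is considered.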

I built a Custom GPT using the ChatGPT Retrieval Plugin and Supabase, so I am somewhat familiar with the process. In this case, though, this repo's app would need to make direct API calls out to whichever server acts as the client for the vector database.

GreyDGL (Owner) commented Apr 13, 2024

Yes, that is possible. All you need to do is link the generation results from the reasoning_module to the sub-task that the tool is trying to resolve.
However, I don't see the necessity of using a vector DB for this particular case. The design rationale of the reasoning module is to include all the context in one conversation, without retrieving any external resources or memories. Do you have any particular thoughts or design ideas on this?
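
For reference, the linking mentioned at the start of this comment could look roughly like the sketch below. It is an illustration only: `SubTask` and `link_result` are hypothetical names, not the repo's actual classes, and the result string stands in for whatever the reasoning_module generates.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SubTask:
    """A node in a task tree (hypothetical structure, not the repo's own)."""
    task_id: str
    description: str
    results: list[str] = field(default_factory=list)
    children: list["SubTask"] = field(default_factory=list)

    def find(self, task_id: str) -> Optional["SubTask"]:
        """Depth-first lookup of a sub-task by its id."""
        if self.task_id == task_id:
            return self
        for child in self.children:
            found = child.find(task_id)
            if found is not None:
                return found
        return None


def link_result(root: SubTask, task_id: str, generation_result: str) -> SubTask:
    """Attach a generated result to the sub-task it was meant to resolve."""
    node = root.find(task_id)
    if node is None:
        raise KeyError(f"unknown sub-task: {task_id}")
    node.results.append(generation_result)
    return node


root = SubTask("1", "Reconnaissance", children=[
    SubTask("1.1", "Port scan"),
    SubTask("1.2", "Service enumeration"),
])

node = link_result(root, "1.1", "22/tcp open ssh")
print(node.description, node.results)
```

Each result then carries its place in the tree with it, whether it stays in the conversation context or is later written to an external store.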
