vector database #219

nyck33 · 2024-04-13T03:42:54Z

I briefly read the paper and came across:
"Theoretically, penetration testing tool outputs could be archived
in the vector database. In practice, though, we observe that
many results closely resemble and vary in only nuanced
ways. This similarity often leads to confused information
retrieval. Solely relying on a vector database fails to overcome
context loss in penetration testing tasks. Integrating
the vector database into the design of PENTEST GPT is an
avenue for future research."

There is a PTT (Pen Testing Task Tree), so would it help to label the testing tool outputs with metadata such as:

At which branch of the PTT the output comes from.
What actions came before it.
Etc.

Similar entries in the vector database would then be distinguishable by their metadata, and queries could filter or sort on the metadata column values. Is this something worth pursuing?
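To make the idea concrete, here is a minimal sketch of metadata-filtered retrieval. It uses an in-memory list instead of a real vector database, and the record fields (`ptt_branch`, `prior_actions`), the branch names, and the two-dimensional embeddings are all hypothetical, chosen only for illustration; nothing here reflects this repo's code.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ToolOutputRecord:
    """One archived tool output plus the metadata proposed above (hypothetical schema)."""
    text: str
    embedding: list[float]
    ptt_branch: str  # assumed label for the PTT branch that produced the output
    prior_actions: list[str] = field(default_factory=list)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(records, query_embedding, ptt_branch=None, top_k=3):
    """Filter by metadata first, then rank the survivors by similarity."""
    candidates = [
        r for r in records if ptt_branch is None or r.ptt_branch == ptt_branch
    ]
    candidates.sort(key=lambda r: cosine(r.embedding, query_embedding), reverse=True)
    return candidates[:top_k]


# Two near-identical outputs that differ mainly in where they sit in the tree.
records = [
    ToolOutputRecord("22/tcp open ssh", [0.9, 0.1], "recon/port-scan", ["ping sweep"]),
    ToolOutputRecord("22/tcp open ssh", [0.9, 0.11], "post-exploit/rescan", ["ssh login"]),
]

unfiltered = search(records, [0.9, 0.1])
filtered = search(records, [0.9, 0.1], ptt_branch="post-exploit/rescan")
print([r.ptt_branch for r in unfiltered])
print([r.ptt_branch for r in filtered])
```

Without the filter, both near-duplicate entries come back and similarity alone barely separates them; with the branch filter, only the entry from the requested part of the tree remains.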

I built a Custom GPT using the ChatGPT Retrieval Plugin and Supabase, so I am somewhat familiar with the process. In this case, though, this repo's app would need to make direct API calls to whichever server acts as the client for the vector database.

GreyDGL · 2024-04-13T14:58:21Z

Yes, that is possible. All you need to do is link the generation results from the reasoning_module to the sub-task that the tool is trying to resolve.
However, I don't see the need for a vector DB in this particular case. The design rationale of the reasoning module is to include all the context in one conversation, without retrieving any external resources or memories. Do you have any particular thoughts or design ideas on this?
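As a rough illustration of linking results to sub-tasks without a vector DB, the sketch below attaches each result directly to a task-tree node and renders the whole tree as one block of context. The `TaskNode` class, its methods, and the task names are hypothetical and are not taken from the actual reasoning_module.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TaskNode:
    """A sub-task in a task tree, with the results produced while resolving it."""
    name: str
    children: list["TaskNode"] = field(default_factory=list)
    results: list[str] = field(default_factory=list)

    def add_child(self, name: str) -> "TaskNode":
        child = TaskNode(name)
        self.children.append(child)
        return child

    def find(self, name: str) -> Optional["TaskNode"]:
        """Depth-first lookup of a sub-task by name."""
        if self.name == name:
            return self
        for child in self.children:
            found = child.find(name)
            if found is not None:
                return found
        return None


def render(node: TaskNode, depth: int = 0) -> list[str]:
    """Flatten the tree and its linked results into lines for a single prompt."""
    lines = ["  " * depth + f"- {node.name}"]
    for result in node.results:
        lines.append("  " * (depth + 1) + f"result: {result}")
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


root = TaskNode("pentest target")
recon = root.add_child("reconnaissance")
recon.add_child("port scan")

# Link a tool result to the sub-task it was meant to resolve.
root.find("port scan").results.append("22/tcp open ssh")
context = "\n".join(render(root))
print(context)
```

Because every result hangs off its sub-task, the full context can be regenerated from the tree at any point, which is the property a metadata-labelled vector store would otherwise have to approximate.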
