Fixes #4090: The apoc.vectordb.*.get/query procedures should search for nodes/relationships with mapping config (#4096)
vga91 authored May 29, 2024
1 parent d767f7e commit c49d4c6
Showing 19 changed files with 456 additions and 229 deletions.
@@ -119,39 +119,107 @@ CALL apoc.vectordb.chroma.queryAndUpdate($host,
| ...
|===

[NOTE]
====
We can use mapping with `apoc.vectordb.chroma.getAndUpdate` procedure as well
====

[NOTE]
====
To optimize performance, we can choose what to `YIELD` with the `apoc.vectordb.chroma.query` and the `apoc.vectordb.chroma.get` procedures.
For example, by executing a `CALL apoc.vectordb.chroma.query(...) YIELD metadata, score, id`, the REST API request will have an `{"include": ["metadatas", "documents", "distances"]}`,
so that we do not return the other values that we do not need.
====

We can define a mapping, to fetch the associated nodes and relationships and optionally create them, by leveraging the vector metadata.

In the same way as other procedures, we can define a mapping, to fetch the associated nodes and relationships and optionally create them,
by leveraging the vector metadata. For example:
For example, if we have created 2 vectors with the above upsert procedures,
we can populate some existing nodes (i.e. `(:Test {myId: 'one'})` and `(:Test {myId: 'two'})`):

.Query vectors
[source,cypher]
----
CALL apoc.vectordb.chroma.query($host, '<collection_id>',
CALL apoc.vectordb.chroma.queryAndUpdate($host, '<collection_id>',
[0.2, 0.1, 0.9, 0.7],
{},
5,
{ mapping: {
embeddingKey: "vect",
nodeLabel: "Test",
entityKey: "myId",
metadataKey: "foo"
}
})
----

which populates the two nodes as: `(:Test {myId: 'one', city: 'Berlin', vect: [vector1]})` and `(:Test {myId: 'two', city: 'London', vect: [vector2]})`,
which will be returned in the `entity` column result.



We can also set the mapping configuration `mode` to `CREATE_IF_MISSING` (which creates nodes if they do not exist), `READ_ONLY` (to search for nodes/rels, without making updates) or `UPDATE_EXISTING` (default behavior):

[source,cypher]
----
CALL apoc.vectordb.chroma.queryAndUpdate($host, '<collection_id>',
[0.2, 0.1, 0.9, 0.7],
{},
5,
{ mapping: {
mode: "CREATE_IF_MISSING",
embeddingKey: "vect",
nodeLabel: "Test",
entityKey: "myId",
metadataKey: "foo"
}
})
----

which creates 2 new nodes as above.

Or, we can populate existing relationships (i.e. `(:Start)-[:TEST {myId: 'one'}]->(:End)` and `(:Start)-[:TEST {myId: 'two'}]->(:End)`):


[source,cypher]
----
CALL apoc.vectordb.chroma.queryAndUpdate($host, '<collection_id>',
[0.2, 0.1, 0.9, 0.7],
{},
5,
{ mapping: {
embeddingKey: "vect",
relType: "TEST",
entityKey: "myId",
metadataKey: "foo"
}
})
----

which populates the two relationships as: `()-[:TEST {myId: 'one', city: 'Berlin', vect: [vector1]}]-()`
and `()-[:TEST {myId: 'two', city: 'London', vect: [vector2]}]-()`,
which will be returned in the `entity` column result.


We can also use mapping with the `apoc.vectordb.chroma.query` procedure to search for nodes/rels matching the label/type and `metadataKey`, without making updates
(i.e. equivalent to the `*.queryAndUpdate` procedure with a mapping config having `mode: "READ_ONLY"`).

For example, with the previous relationships, we can execute the following procedure, which just returns the relationships in the `rel` column:

[source,cypher]
----
CALL apoc.vectordb.chroma.query($host, '<collection_id>',
[0.2, 0.1, 0.9, 0.7],
{},
5,
{ mapping: {
relType: "TEST",
entityKey: "myId",
metadataKey: "foo"
}
})
----
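
As a sketch of the equivalence described above, the same read-only lookup could be expressed with `queryAndUpdate` and `mode: "READ_ONLY"`. The `$host` parameter, the collection id and the mapping keys are the placeholder values used in the previous examples, and the call has not been verified against a running Chroma instance:

[source,cypher]
----
// Assumption: same placeholders as the examples above ($host, '<collection_id>')
CALL apoc.vectordb.chroma.queryAndUpdate($host, '<collection_id>',
[0.2, 0.1, 0.9, 0.7],
{},
5,
{ mapping: {
// READ_ONLY: search for matching relationships without updating them
mode: "READ_ONLY",
relType: "TEST",
entityKey: "myId",
metadataKey: "foo"
}
})
----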

[NOTE]
====
We can use mapping with `apoc.vectordb.chroma.get*` procedures as well
====
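
For instance, a minimal sketch of a `get` call with a mapping might look like the following. The ids `'1'` and `'2'` are hypothetical placeholders for the ids of previously upserted vectors, and the argument order (host, collection, ids, config) is an assumption here:

[source,cypher]
----
// Assumption: '1' and '2' are the ids of vectors stored in the collection
CALL apoc.vectordb.chroma.get($host, '<collection_id>',
['1', '2'],
{ mapping: {
nodeLabel: "Test",
entityKey: "myId",
metadataKey: "foo"
}
})
----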

[NOTE]
====
To optimize performance, we can choose what to `YIELD` with the `apoc.vectordb.chroma.query` and the `apoc.vectordb.chroma.get` procedures.
For example, by executing a `CALL apoc.vectordb.chroma.query(...) YIELD metadata, score, id`, the REST API request will have an `{"include": ["metadatas", "documents", "distances"]}`,
so that we do not return the other values that we do not need.
====
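
As an illustration of the note above, a query that yields only the columns it needs could be sketched as follows. The vector, limit and placeholders are the ones used in the earlier examples, and the empty maps stand for "no filter" and "no extra configuration":

[source,cypher]
----
// Only metadata, score and id are yielded, so the other columns are not requested
CALL apoc.vectordb.chroma.query($host, '<collection_id>',
[0.2, 0.1, 0.9, 0.7],
{},
5,
{})
YIELD metadata, score, id
RETURN id, score, metadata
----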

.Delete vectors (it leverages https://docs.trychroma.com/usage-guide#deleting-data-from-a-collection[this API])
[source,cypher]
@@ -147,7 +147,7 @@ which populates the two nodes as: `(:Test {myId: 'one', city: 'Berlin', vect: [v
which will be returned in the `entity` column result.


Or else, we can create a node if not exists, via `create: true`:
We can also set the mapping configuration `mode` to `CREATE_IF_MISSING` (which creates nodes if they do not exist), `READ_ONLY` (to search for nodes/rels, without making updates) or `UPDATE_EXISTING` (default behavior):

[source,cypher]
----
@@ -156,7 +156,7 @@ CALL apoc.vectordb.milvus.queryAndUpdate('http://localhost:19531', 'test_collect
{},
5,
{ mapping: {
create: true,
mode: "CREATE_IF_MISSING",
embeddingKey: "vect",
nodeLabel: "Test",
entityKey: "myId",
@@ -189,9 +189,30 @@ which populates the two relationships as: `()-[:TEST {myId: 'one', city: 'Berlin
and `()-[:TEST {myId: 'two', city: 'London', vect: [vector2]}]-()`,
which will be returned in the `entity` column result.


We can also use mapping with the `apoc.vectordb.milvus.query` procedure to search for nodes/rels matching the label/type and `metadataKey`, without making updates
(i.e. equivalent to the `*.queryAndUpdate` procedure with a mapping config having `mode: "READ_ONLY"`).

For example, with the previous relationships, we can execute the following procedure, which just returns the relationships in the `rel` column:

[source,cypher]
----
CALL apoc.vectordb.milvus.query('http://localhost:19531', 'test_collection',
[0.2, 0.1, 0.9, 0.7],
{},
5,
{ mapping: {
embeddingKey: "vect",
relType: "TEST",
entityKey: "myId",
metadataKey: "foo"
}
})
----

[NOTE]
====
We can use mapping with `apoc.vectordb.milvus.getAndUpdate` procedure as well
We can use mapping with `apoc.vectordb.milvus.get*` procedures as well
====

[NOTE]
@@ -161,7 +161,7 @@ which populates the two nodes as: `(:Test {myId: 'one', city: 'Berlin', vect: [v
which will be returned in the `entity` column result.


Or else, we can create a node if not exists, via `create: true`:
We can also set the mapping configuration `mode` to `CREATE_IF_MISSING` (which creates nodes if they do not exist), `READ_ONLY` (to search for nodes/rels, without making updates) or `UPDATE_EXISTING` (default behavior):

[source,cypher]
----
@@ -170,7 +170,7 @@ CALL apoc.vectordb.pinecone.queryAndUpdate($host, 'test-index',
{},
5,
{ mapping: {
create: true,
mode: "CREATE_IF_MISSING",
embeddingKey: "vect",
nodeLabel: "Test",
entityKey: "myId",
@@ -203,9 +203,30 @@ which populates the two relationships as: `()-[:TEST {myId: 'one', city: 'Berlin
and `()-[:TEST {myId: 'two', city: 'London', vect: [vector2]}]-()`,
which will be returned in the `entity` column result.


We can also use mapping with the `apoc.vectordb.pinecone.query` procedure to search for nodes/rels matching the label/type and `metadataKey`, without making updates
(i.e. equivalent to the `*.queryAndUpdate` procedure with a mapping config having `mode: "READ_ONLY"`).

For example, with the previous relationships, we can execute the following procedure, which just returns the relationships in the `rel` column:

[source,cypher]
----
CALL apoc.vectordb.pinecone.query($host, 'test-index',
[0.2, 0.1, 0.9, 0.7],
{},
5,
{ mapping: {
embeddingKey: "vect",
relType: "TEST",
entityKey: "myId",
metadataKey: "foo"
}
})
----

[NOTE]
====
We can use mapping with `apoc.vectordb.pinecone.getAndUpdate` procedure as well
We can use mapping with `apoc.vectordb.pinecone.get*` procedures as well
====

[NOTE]
@@ -149,7 +149,7 @@ which populates the two nodes as: `(:Test {myId: 'one', city: 'Berlin', vect: [v
which will be returned in the `entity` column result.


Or else, we can create a node if not exists, via `create: true`:
We can also set the mapping configuration `mode` to `CREATE_IF_MISSING` (which creates nodes if they do not exist), `READ_ONLY` (to search for nodes/rels, without making updates) or `UPDATE_EXISTING` (default behavior):

[source,cypher]
----
@@ -158,7 +158,7 @@ CALL apoc.vectordb.qdrant.queryAndUpdate($hostOrKey, 'test_collection',
{},
5,
{ mapping: {
create: true,
mode: "CREATE_IF_MISSING",
embeddingKey: "vect",
nodeLabel: "Test",
entityKey: "myId",
@@ -191,9 +191,29 @@ which populates the two relationships as: `()-[:TEST {myId: 'one', city: 'Berlin
and `()-[:TEST {myId: 'two', city: 'London', vect: [vector2]}]-()`,
which will be returned in the `entity` column result.


We can also use mapping with the `apoc.vectordb.qdrant.query` procedure to search for nodes/rels matching the label/type and `metadataKey`, without making updates
(i.e. equivalent to the `*.queryAndUpdate` procedure with a mapping config having `mode: "READ_ONLY"`).

For example, with the previous relationships, we can execute the following procedure, which just returns the relationships in the `rel` column:

[source,cypher]
----
CALL apoc.vectordb.qdrant.query($hostOrKey, 'test_collection',
[0.2, 0.1, 0.9, 0.7],
{},
5,
{ mapping: {
relType: "TEST",
entityKey: "myId",
metadataKey: "foo"
}
})
----

[NOTE]
====
We can use mapping with `apoc.vectordb.qdrant.getAndUpdate` procedure as well
We can use mapping with `apoc.vectordb.qdrant.get*` procedures as well
====

[NOTE]
@@ -160,7 +160,7 @@ and `(:Test {myId: 'two', city: 'London', vect: [vector2]})`,
which will be returned in the `entity` column result.


Or else, we can create a node if not exists, via `create: true`:
We can also set the mapping configuration `mode` to `CREATE_IF_MISSING` (which creates nodes if they do not exist), `READ_ONLY` (to search for nodes/rels, without making updates) or `UPDATE_EXISTING` (default behavior):

[source,cypher]
----
@@ -170,7 +170,7 @@ CALL apoc.vectordb.weaviate.queryAndUpdate($host, 'test_collection',
5,
{ fields: ["city", "foo"],
mapping: {
create: true,
mode: "CREATE_IF_MISSING",
embeddingKey: "vect",
nodeLabel: "Test",
entityKey: "myId",
@@ -205,9 +205,31 @@ and `()-[:TEST {myId: 'two', city: 'London', vect: [vector2]}]-()`,
which will be returned in the `entity` column result.


We can also use mapping with the `apoc.vectordb.weaviate.query` procedure to search for nodes/rels matching the label/type and `metadataKey`, without making updates
(i.e. equivalent to the `*.queryAndUpdate` procedure with a mapping config having `mode: "READ_ONLY"`).

For example, with the previous relationships, we can execute the following procedure, which just returns the relationships in the `rel` column:

[source,cypher]
----
CALL apoc.vectordb.weaviate.query($host, 'test_collection',
[0.2, 0.1, 0.9, 0.7],
{},
5,
{ fields: ["city", "foo"],
mapping: {
relType: "TEST",
entityKey: "myId",
metadataKey: "foo"
}
})
----



[NOTE]
====
We can use mapping with `apoc.vectordb.weaviate.getAndUpdate` procedure as well
We can use mapping with `apoc.vectordb.weaviate.get*` procedures as well
====

[NOTE]