Fixes #4090: The apoc.vectordb.* procedures should search for nodes/relationships with mapping config
vga91 committed May 29, 2024
1 parent bc76a3d commit 18ce21a
Showing 21 changed files with 385 additions and 183 deletions.
2 changes: 1 addition & 1 deletion build.gradle
@@ -131,7 +131,7 @@ subprojects {

ext {
// NB: due to version.json generation by parsing this file, the next line must not have any if/then/else logic
neo4jVersion = "5.21.0"
neo4jVersion = "5.19.0"
// instead we apply the override logic here
neo4jVersionEffective = project.hasProperty("neo4jVersionOverride") ? project.getProperty("neo4jVersionOverride") : neo4jVersion
testContainersVersion = '1.18.3'
@@ -119,17 +119,6 @@ CALL apoc.vectordb.chroma.queryAndUpdate($host,
| ...
|===

[NOTE]
====
We can use mapping with `apoc.vectordb.chroma.getAndUpdate` procedure as well
====

[NOTE]
====
To optimize performance, we can choose what to `YIELD` with the `apoc.vectordb.chroma.query` and the `apoc.vectordb.chroma.get` procedures.
For example, by executing a `CALL apoc.vectordb.chroma.query(...) YIELD metadata, score, id`, the REST API request will contain `{"include": ["metadatas", "documents", "distances"]}`,
so that we do not return the other values that we do not need.
====


In the same way as other procedures, we can define a mapping, to fetch the associated nodes and relationships and optionally create them,
@@ -151,7 +140,35 @@ CALL apoc.vectordb.chroma.query($host, '<collection_id>',
})
----

We can also use mapping with the `apoc.vectordb.chroma.query` procedure to search for nodes/rels matching the label/type and metadataKey, without making updates.
For example, with the previous relationships, we can execute the following procedure, which just returns the relationships in the `rel` column:

[source,cypher]
----
CALL apoc.vectordb.chroma.query($host, '<collection_id>',
[0.2, 0.1, 0.9, 0.7],
{},
5,
{ mapping: {
embeddingKey: "vect",
nodeLabel: "Test",
entityKey: "myId",
metadataKey: "foo"
}
})
----
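
As a minimal sketch of the same read-only lookup applied to relationships, we can combine the mapping with an explicit `YIELD`. It assumes that relationships of type `TEST` with a `myId` property already exist in the graph (names reused from the examples above), that they are matched by comparing `myId` with the `foo` metadata value, and that the matched relationship is exposed in the `rel` column:

[source,cypher]
----
// Read-only lookup: existing relationships are searched, nothing is created or updated.
// Assumption: [:TEST] relationships with a `myId` property are already present in the graph.
CALL apoc.vectordb.chroma.query($host, '<collection_id>',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    { mapping: {
        relType: "TEST",
        entityKey: "myId",
        metadataKey: "foo"
      }
    })
YIELD id, score, rel
RETURN id, score, rel
ORDER BY id
----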

[NOTE]
====
We can use mapping with `apoc.vectordb.chroma.get*` procedures as well
====
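
For instance, a read-only `apoc.vectordb.chroma.get` call against nodes could look like the following sketch. The `Test` label, the `myId` property, the ids `'1'` and `'2'` and the `allResults: true` setting are illustrative assumptions; the matched node is expected in the `node` column:

[source,cypher]
----
// Read-only lookup by id: existing (:Test) nodes are searched, nothing is created or updated.
// Assumption: `allResults: true` is set so that `vector` and `metadata` are populated.
CALL apoc.vectordb.chroma.get($host, '<collection_id>',
    ['1', '2'],
    { allResults: true,
      mapping: {
        nodeLabel: "Test",
        entityKey: "myId",
        metadataKey: "foo"
      }
    })
YIELD vector, id, metadata, node
RETURN * ORDER BY id
----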

[NOTE]
====
To optimize performance, we can choose what to `YIELD` with the `apoc.vectordb.chroma.query` and the `apoc.vectordb.chroma.get` procedures.
For example, by executing a `CALL apoc.vectordb.chroma.query(...) YIELD metadata, score, id`, the REST API request will contain `{"include": ["metadatas", "documents", "distances"]}`,
so that we do not return the other values that we do not need.
====

.Delete vectors (it leverages https://docs.trychroma.com/usage-guide#deleting-data-from-a-collection[this API])
[source,cypher]
@@ -189,9 +189,28 @@ which populates the two relationships as: `()-[:TEST {myId: 'one', city: 'Berlin
and `()-[:TEST {myId: 'two', city: 'London', vect: [vector2]}]-()`,
which will be returned in the `entity` column result.


We can also use mapping with the `apoc.vectordb.milvus.query` procedure to search for nodes/rels matching the label/type and metadataKey, without making updates.
For example, with the previous relationships, we can execute the following procedure, which just returns the relationships in the `rel` column:

[source,cypher]
----
CALL apoc.vectordb.milvus.query('http://localhost:19531', 'test_collection',
[0.2, 0.1, 0.9, 0.7],
{},
5,
{ mapping: {
embeddingKey: "vect",
relType: "TEST",
entityKey: "myId",
metadataKey: "foo"
}
})
----

[NOTE]
====
We can use mapping with `apoc.vectordb.milvus.getAndUpdate` procedure as well
We can use mapping with `apoc.vectordb.milvus.get*` procedures as well
====

[NOTE]
@@ -203,9 +203,28 @@ which populates the two relationships as: `()-[:TEST {myId: 'one', city: 'Berlin
and `()-[:TEST {myId: 'two', city: 'London', vect: [vector2]}]-()`,
which will be returned in the `entity` column result.


We can also use mapping with the `apoc.vectordb.pinecone.query` procedure to search for nodes/rels matching the label/type and metadataKey, without making updates.
For example, with the previous relationships, we can execute the following procedure, which just returns the relationships in the `rel` column:

[source,cypher]
----
CALL apoc.vectordb.pinecone.query($host, 'test-index',
[0.2, 0.1, 0.9, 0.7],
{},
5,
{ mapping: {
embeddingKey: "vect",
relType: "TEST",
entityKey: "myId",
metadataKey: "foo"
}
})
----

[NOTE]
====
We can use mapping with `apoc.vectordb.pinecone.getAndUpdate` procedure as well
We can use mapping with `apoc.vectordb.pinecone.get*` procedures as well
====

[NOTE]
@@ -191,9 +191,27 @@ which populates the two relationships as: `()-[:TEST {myId: 'one', city: 'Berlin
and `()-[:TEST {myId: 'two', city: 'London', vect: [vector2]}]-()`,
which will be returned in the `entity` column result.


We can also use mapping with the `apoc.vectordb.qdrant.query` procedure to search for nodes/rels matching the label/type and metadataKey, without making updates.
For example, with the previous relationships, we can execute the following procedure, which just returns the relationships in the `rel` column:

[source,cypher]
----
CALL apoc.vectordb.qdrant.query($hostOrKey, 'test_collection',
[0.2, 0.1, 0.9, 0.7],
{},
5,
{ mapping: {
relType: "TEST",
entityKey: "myId",
metadataKey: "foo"
}
})
----

[NOTE]
====
We can use mapping with `apoc.vectordb.qdrant.getAndUpdate` procedure as well
We can use mapping with `apoc.vectordb.qdrant.get*` procedures as well
====

[NOTE]
@@ -205,9 +205,29 @@ and `()-[:TEST {myId: 'two', city: 'London', vect: [vector2]}]-()`,
which will be returned in the `entity` column result.


We can also use mapping with the `apoc.vectordb.weaviate.query` procedure to search for nodes/rels matching the label/type and metadataKey, without making updates.
For example, with the previous relationships, we can execute the following procedure, which just returns the relationships in the `rel` column:

[source,cypher]
----
CALL apoc.vectordb.weaviate.query($host, 'test_collection',
[0.2, 0.1, 0.9, 0.7],
{},
5,
{ fields: ["city", "foo"],
mapping: {
relType: "TEST",
entityKey: "myId",
metadataKey: "foo"
}
})
----



[NOTE]
====
We can use mapping with `apoc.vectordb.weaviate.getAndUpdate` procedure as well
We can use mapping with `apoc.vectordb.weaviate.get*` procedures as well
====

[NOTE]
47 changes: 27 additions & 20 deletions extended-it/src/test/java/apoc/vectordb/ChromaDbTest.java
@@ -18,13 +18,15 @@
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

import static apoc.ml.RestAPIConfig.HEADERS_KEY;
import static apoc.util.MapUtil.map;
import static apoc.util.TestUtil.testCall;
import static apoc.util.TestUtil.testResult;
import static apoc.vectordb.VectorDbHandler.Type.CHROMA;
import static apoc.vectordb.VectorDbTestUtil.assertBerlinResult;
import static apoc.vectordb.VectorDbTestUtil.assertLondonResult;
import static apoc.vectordb.VectorDbTestUtil.assertNodesCreated;
import static apoc.vectordb.VectorDbTestUtil.assertReadOnlyProcWithMappingResults;
import static apoc.vectordb.VectorDbTestUtil.assertRelsCreated;
import static apoc.vectordb.VectorDbTestUtil.dropAndDeleteAll;
import static apoc.vectordb.VectorDbTestUtil.EntityType.*;
@@ -294,19 +296,22 @@ MAPPING_KEY, map(EMBEDDING_KEY, "vect",
assertNodesCreated(db);
}


@Test
public void getReadOnlyVectorsWithMapping() {
db.executeTransactionally("CREATE (:Test {readID: 'one'}), (:Test {readID: 'two'})");

Map<String, Object> conf = map(ALL_RESULTS_KEY, true,
MAPPING_KEY, map(EMBEDDING_KEY, "vect"));

try {
testCall(db, "CALL apoc.vectordb.chroma.get($host, $collection, [1, 2], $conf)",
map("host", HOST, "collection", COLL_ID.get(), "conf", conf),
r -> fail()
);
} catch (RuntimeException e) {
Assertions.assertThat(e.getMessage()).contains(ERROR_READONLY_MAPPING);
}
MAPPING_KEY, map(NODE_LABEL, "Test",
ENTITY_KEY, "readID",
METADATA_KEY, "foo")
);

testResult(db, "CALL apoc.vectordb.chroma.get($host, $collection, ['1', '2'], $conf) " +
"YIELD vector, id, metadata, node RETURN * ORDER BY id",
map("host", HOST, "collection", COLL_ID.get(), "conf", conf),
r -> assertReadOnlyProcWithMappingResults(r, "node")
);
}

@Test
@@ -338,17 +343,19 @@ MAPPING_KEY, map(EMBEDDING_KEY, "vect",

@Test
public void queryReadOnlyVectorsWithMapping() {
db.executeTransactionally("CREATE (:Start)-[:TEST {readID: 'one'}]->(:End), (:Start)-[:TEST {readID: 'two'}]->(:End)");

Map<String, Object> conf = map(ALL_RESULTS_KEY, true,
MAPPING_KEY, map(EMBEDDING_KEY, "vect"));

try {
testCall(db, "CALL apoc.vectordb.chroma.query($host, $collection, [0.2, 0.1, 0.9, 0.7], {}, 5, $conf)",
map("host", HOST, "collection", COLL_ID.get(), "conf", conf),
r -> fail()
);
} catch (RuntimeException e) {
Assertions.assertThat(e.getMessage()).contains(ERROR_READONLY_MAPPING);
}
MAPPING_KEY, map(
REL_TYPE, "TEST",
ENTITY_KEY, "readID",
METADATA_KEY, "foo")
);

testResult(db, "CALL apoc.vectordb.chroma.query($host, $collection, [0.2, 0.1, 0.9, 0.7], {}, 5, $conf)",
map("host", HOST, "collection", COLL_ID.get(), "conf", conf),
r -> assertReadOnlyProcWithMappingResults(r, "rel")
);
}

@Test
47 changes: 34 additions & 13 deletions extended-it/src/test/java/apoc/vectordb/MilvusTest.java
@@ -2,7 +2,6 @@

import apoc.util.TestUtil;
import apoc.util.Util;
import org.assertj.core.api.Assertions;
import org.junit.AfterClass;
import org.junit.Before;
import org.junit.BeforeClass;
@@ -27,9 +26,9 @@
import static apoc.vectordb.VectorDbTestUtil.assertBerlinResult;
import static apoc.vectordb.VectorDbTestUtil.assertLondonResult;
import static apoc.vectordb.VectorDbTestUtil.assertNodesCreated;
import static apoc.vectordb.VectorDbTestUtil.assertReadOnlyProcWithMappingResults;
import static apoc.vectordb.VectorDbTestUtil.assertRelsCreated;
import static apoc.vectordb.VectorDbTestUtil.dropAndDeleteAll;
import static apoc.vectordb.VectorDbUtil.ERROR_READONLY_MAPPING;
import static apoc.vectordb.VectorEmbeddingConfig.ALL_RESULTS_KEY;
import static apoc.vectordb.VectorEmbeddingConfig.FIELDS_KEY;
import static apoc.vectordb.VectorEmbeddingConfig.MAPPING_KEY;
@@ -297,6 +296,24 @@ MAPPING_KEY, map(EMBEDDING_KEY, "vect",
assertNodesCreated(db);
}

@Test
public void getReadOnlyVectorsWithMapping() {
db.executeTransactionally("CREATE (:Test {readID: 'one'}), (:Test {readID: 'two'})");

Map<String, Object> conf = map(ALL_RESULTS_KEY, true,
FIELDS_KEY, FIELDS,
MAPPING_KEY, map(EMBEDDING_KEY, "vect",
NODE_LABEL, "Test",
ENTITY_KEY, "readID",
METADATA_KEY, "foo"));

testResult(db, "CALL apoc.vectordb.milvus.get($host, 'test_collection', [1, 2], $conf) " +
"YIELD vector, id, metadata, node RETURN * ORDER BY id",
map("host", HOST, "conf", conf),
r -> assertReadOnlyProcWithMappingResults(r, "node")
);
}

@Test
public void queryVectorsWithCreateNodeUsingExistingNode() {

@@ -336,7 +353,8 @@ public void queryVectorsWithCreateRel() {
MAPPING_KEY, map(EMBEDDING_KEY, "vect",
REL_TYPE, "TEST",
ENTITY_KEY, "myId",
METADATA_KEY, "foo"));
METADATA_KEY, "foo")
);
testResult(db, "CALL apoc.vectordb.milvus.queryAndUpdate($host, 'test_collection', [0.2, 0.1, 0.9, 0.7], null, 5, $conf)",
map("host", HOST, "conf", conf),
r -> {
@@ -356,17 +374,20 @@ MAPPING_KEY, map(EMBEDDING_KEY, "vect",

@Test
public void queryReadOnlyVectorsWithMapping() {
db.executeTransactionally("CREATE (:Start)-[:TEST {readID: 'one'}]->(:End), (:Start)-[:TEST {readID: 'two'}]->(:End)");

Map<String, Object> conf = map(ALL_RESULTS_KEY, true,
MAPPING_KEY, map(EMBEDDING_KEY, "vect"));

try {
testCall(db, "CALL apoc.vectordb.milvus.query($host, 'test_collection', [0.2, 0.1, 0.9, 0.7], {}, 5, $conf)",
map("host", HOST, "conf", conf),
r -> fail()
);
} catch (RuntimeException e) {
Assertions.assertThat(e.getMessage()).contains(ERROR_READONLY_MAPPING);
}
FIELDS_KEY, FIELDS,
MAPPING_KEY, map(
REL_TYPE, "TEST",
ENTITY_KEY, "readID",
METADATA_KEY, "foo")
);

testResult(db, "CALL apoc.vectordb.milvus.query($host, 'test_collection', [0.2, 0.1, 0.9, 0.7], null, 5, $conf)",
map("host", HOST, "conf", conf),
r -> assertReadOnlyProcWithMappingResults(r, "rel")
);
}

@Test