-
Notifications
You must be signed in to change notification settings - Fork 2
Speeding Up Your AI‐Powered Search with JAI Async
Businesses today must deal with the massive amount of data constantly being generated. The data collected from various sources enables organizations to understand their customer's needs and preferences better. With so much data available, the search process can become time-consuming and slow down business operations. A more efficient search system is essential to support business success. Our previous developer notebook discussed how combining ChatGPT with retrieval and re-ranking methods can improve search accuracy. You can obtain a fast and efficient search function place by retrieving the most related content through cosine similarity to a hypothetical answer. By using async calls, you can even decrease the search time further, resulting in a 50% to 200% increase in speed! Steps 1 and 2 can be done in parallel with steps 3 and 4 using the async interface for Open AI API provided by JAI. In this blog, we will dive into how you can speed up your search with JAI Async methods.
In the previous developer notebook, we discussed how to improve search accuracy by combining ChatGPT with retrieval and re-ranking methods. This technique can be implemented on top of any existing search systems, including Elasticsearch, Solr, or any custom search engine application.
To implement this approach, we did these steps:
- Ask the user a question and generate a list of potential queries based on the question.
- Execute the search queries and retrieve relevant articles.
- Create an ideal answer
- Create an ideal answer embedding
- Score each article based on the embeddings for each article compared to the hypothetical ideal answer using dot to calculate cosine similarities.
- Sort and filter the articles based on the similarity obtained from the embeddings of the articles vs. the embedding of the ideal answer.
- Generate an answer to the user's question, including references and links.
The most related content, as measured by cosine similarity to the hypothetical answer (HyDE), is retrieved using this approach. It is fast and can be added to a search function you already have without managing a vector database.
We can speed up this process by using async calls. The speed up is about 30% to 200% faster.
Steps 1 and 2 can be done in parallel with steps 3 and 4. Since JAI has an async interface for accessing Open AI API in Java, we can easily do this.
Let’s shows an updated version of the main
method within the WhoWonUFC290Async
class. This code introduces asynchronous operations and uses CompletableFuture
to handle asynchronous tasks. Here's a breakdown of the code:
public static void main(String... args) throws Exception {
try {
long startTime = System.currentTimeMillis();
final CountDownLatch countDownLatch = new CountDownLatch(2);
// Generating a hypothetical answer and its embedding
final var hypotheticalAnswerEmbeddingFuture = hypotheticalAnswer()
.thenCompose(WhoWonUFC290Async::embeddingsAsync).thenApply(floats -> {
countDownLatch.countDown();
return floats;
});
// Generate a list of queries and use them to look up articles.
final var queriesFuture = jsonGPT(QUERIES_INPUT.replace("{USER_QUESTION}", USER_QUESTION))
.thenApply(queriesJson ->
JsonParserBuilder.builder().build().parse(queriesJson)
.getObjectNode().getArrayNode("queries")
.filter(node -> node instanceof StringNode)
.stream().map(Object::toString).collect(Collectors.toList())
).thenCompose(WhoWonUFC290Async::getArticles
).thenApply(objectNodes -> {
countDownLatch.countDown();
return objectNodes;
});
if (!countDownLatch.await(30, TimeUnit.SECONDS))
throw new TimeoutException("Timed out waiting for hypotheticalAnswerEmbedding and " +
" articles ");
final var articles = queriesFuture.get();
final var hypotheticalAnswerEmbedding = hypotheticalAnswerEmbeddingFuture.get();
// Extracting article content and generating embeddings for each article
final var articleContent = articles.stream().map(article ->
String.format("%s %s %s", article.getString("title"),
article.getString("description"), article.getString("content").substring(0, 100)))
.collect(Collectors.toList());
final var articleEmbeddings = embeddingsAsync(articleContent).get();
// Calculating cosine similarities between the hypothetical answer embedding and article embeddings
final var cosineSimilarities = articleEmbeddings.stream()
.map(articleEmbedding -> dot(hypotheticalAnswerEmbedding, articleEmbedding))
.collect(Collectors.toList());
// Creating a set of scored articles based on cosine similarities
final var articleSet = IntStream.range(0,
Math.min(cosineSimilarities.size(), articleContent.size()))
.mapToObj(i -> new ScoredArticle(articles.get(i), cosineSimilarities.get(i)))
.collect(Collectors.toSet());
final var sortedArticles = new ArrayList<>(articleSet);
sortedArticles.sort((o1, o2) -> Float.compare(o2.getScore(), o1.getScore()));
// Printing the top 5 scored articles
sortedArticles.subList(0, 5).forEach(System.out::println);
// Formatting the top results as JSON strings
final var formattedTopResults = String.join(",\n", sortedArticles.stream()
.map(ScoredArticle::getContent)
.map(article -> String.format(Json.niceJson("{'title':'%s', 'url':'%s'," +
" 'description':'%s', 'content':'%s'}\n"),
article.getString("title"), article.getString("url"),
article.getString("description"),
getArticleContent(article))).collect(Collectors.toList()).subList(0, 10));
System.out.println(formattedTopResults);
// Generating the final answer with the formatted top results
final var finalAnswer = jsonGPT(ANSWER_INPUT.replace("{USER_QUESTION}", USER_QUESTION)
.replace("{formatted_top_results}", formattedTopResults), "Output format is markdown").get();
System.out.println(finalAnswer);
long endTime = System.currentTimeMillis();
System.out.println(endTime - startTime);
} catch (Exception ex) {
ex.printStackTrace();
}
}
- Asynchronous Execution:
- The code initializes a
CountDownLatch
with a count of 2. This will be used to synchronize the completion of two asynchronous tasks: generating a hypothetical answer embedding and obtaining the list of queries. - The
hypotheticalAnswerEmbeddingFuture
is created using thehypotheticalAnswer
method, which returns aCompletableFuture
. TheembeddingsAsync
method is then applied asynchronously to generate the hypothetical answer embedding. ThethenApply
method is used to handle the result and decrement the latch count. - The
queriesFuture
is created by first calling thejsonGPT
method to generate the queries JSON, and then applying a series of asynchronous operations (getArticles
andthenApply
) to obtain the list of articles. ThethenApply
method is used to handle the result and decrement the latch count.
- The code initializes a
- Synchronization:
- The code uses
countDownLatch.await
to wait for both asynchronous tasks to complete. If the tasks don't complete within 30 seconds, aTimeoutException
is thrown.
- The code uses
- Obtaining Results:
- The code retrieves the results of the asynchronous tasks by calling
queriesFuture.get()
andhypotheticalAnswerEmbeddingFuture.get()
.
- The code retrieves the results of the asynchronous tasks by calling
- Extracting Article Content and Embeddings:
- The code continues by extracting article content and generating embeddings for each article, similar to the previous example.
- Calculating Cosine Similarities:
- The code calculates cosine similarities between the hypothetical answer embedding and article embeddings, similar to the previous example.
- Creating Scored Articles and Sorting:
- The code creates a set of scored articles based on cosine similarities and sorts them in descending order based on the score, similar to the previous example.
- Printing Top Results:
- The code prints the top 5 scored articles by iterating over a sublist of sorted articles and calling
System.out.println
for each article.
- The code prints the top 5 scored articles by iterating over a sublist of sorted articles and calling
- Formatting Top Results as JSON Strings:
- The code formats the top results as JSON strings using the
stream
function,map
operation, andString.format
, similar to the previous example.
- The code formats the top results as JSON strings using the
- Generating the Final Answer:
- The code generates the final answer by replacing the placeholders in the
ANSWER_INPUT
string with the actual values, using thejsonGPT
method asynchronously. Theget
method is called on the returnedCompletableFuture
to obtain the final answer.
- Printing Timing Information:
- The code calculates the elapsed time by subtracting the start time from the end time and prints it using
System.out.println
.
Let’s cover the rest.
Recall that we are doing this steps
- Ask the user a question and generate a list of potential queries based on the question.
- Execute the search queries and retrieve relevant articles.
- Create an ideal answer
- Create an ideal answer embedding
Steps 1 and 2 can be done in parallel with steps 1 and 2. Since JAI has an async interface for accessing Open AI API in Java, we can easily do this. Let’s s how to run step 3 and 4 at the same time as 1 and 2. First we will run 3 and 4 by using async interface of JAI .
public static CompletableFuture<String> jsonGPT(String input) {
return jsonGPT(input, "All output shall be JSON");
}
public static CompletableFuture<String> jsonGPT(String input, String system) {
final var client = OpenAIClient.builder()
.setApiKey(System.getenv("OPENAI_API_KEY")).build();
final var chatRequest = ChatRequest.builder()
.addMessage(Message.builder().role(Role.SYSTEM).content(system).build())
.addMessage(Message.builder().role(Role.USER).content(input).build())
.build();
return client.chatAsync(chatRequest).thenApply(chat -> {
if (chat.getResponse().isPresent()) {
return chat.getResponse().get().getChoices().get(0).getMessage().getContent();
} else {
System.out.println(chat.getStatusCode().orElse(666) + " " + chat.getStatusMessage().orElse(""));
throw new IllegalStateException();
}
});
}
public static CompletableFuture<String> hypotheticalAnswer() {
final var input = HA_INPUT.replace("{USER_QUESTION}", USER_QUESTION);
return jsonGPT(input).thenApply(response -> JsonParserBuilder.builder().build().parse(response).getObjectNode().getString("hypotheticalAnswer"));
}
public static CompletableFuture<float[]> embeddingsAsync(String input) {
System.out.println("INPUT " + input);
return embeddingsAsync(List.of(input)).thenApply(embeddings -> embeddings.get(0));
}
public static CompletableFuture<List<float[]>> embeddingsAsync(List<String> input) {
System.out.println("INPUT " + input);
if (input == null || input.size() == 0) {
return CompletableFuture.completedFuture(Collections.singletonList(new float[0]));
}
final var client = OpenAIClient.builder().setApiKey(System.getenv("OPENAI_API_KEY")).build();
return client.embeddingAsync(EmbeddingRequest.builder().model("text-embedding-ada-002").input(input).build()).thenApply(embedding -> {
if (embedding.getResponse().isPresent()) {
return embedding.getResponse().get().getData().stream().map(Embedding::getEmbedding).collect(Collectors.toList());
} else {
System.out.println(embedding.getStatusCode().orElse(666) + " " + embedding.getStatusMessage().orElse(""));
throw new IllegalStateException(embedding.getStatusCode().orElse(666) + " " + embedding.getStatusMessage().orElse(""));
}
});
}
public static void main(String... args) {
try {
long startTime = System.currentTimeMillis();
final CountDownLatch countDownLatch = new CountDownLatch(2);
// Generating a hypothetical answer and its embedding
final var hypotheticalAnswerEmbeddingFuture = hypotheticalAnswer()
.thenCompose(WhoWonUFC290Async::embeddingsAsync).thenApply(floats -> {
countDownLatch.countDown();
return floats;
});
...
The example adds some additional methods to support asynchronous operations and CompletableFuture usage. Here's a breakdown of the code:
-
jsonGPT
Method:- This method is overloaded with two versions. The first version takes only the
input
parameter and sets the system message to "All output shall be JSON". The second version takes both theinput
andsystem
parameters to customize the system message. - The method creates a
ChatRequest
with a system message and user message using the provided input. - The
client.chatAsync(chatRequest)
method is invoked to perform the chat interaction asynchronously. - The returned
CompletableFuture
is then processed usingthenApply
to extract the desired response from the chat and return it.
- This method is overloaded with two versions. The first version takes only the
-
hypotheticalAnswer
Method:- This method generates a hypothetical answer by replacing the
{USER_QUESTION}
placeholder in theHA_INPUT
string with the actual user question. - It calls the
jsonGPT
method with the generated input and then processes the resultingCompletableFuture
usingthenApply
. - In the
thenApply
function, the response is parsed usingJsonParserBuilder
to extract thehypotheticalAnswer
field from the JSON object.
- This method generates a hypothetical answer by replacing the
-
embeddingsAsync
Methods:- The
embeddingsAsync
methods handle the generation of embeddings asynchronously. - The single-input version of the method takes an
input
string, which represents the content for which embeddings need to be generated. - The multi-input version of the method takes a list of input strings.
- Both methods first check if the input is empty or null. If so, a completed
CompletableFuture
is returned with an empty list of embeddings. - The methods then create an
OpenAIClient
and invoke the appropriate embedding operation (embeddingAsync
) using theinput
parameter. - The resulting
CompletableFuture
is processed usingthenApply
, where the embeddings are extracted from the response and returned as a list.
- The
Next, let’s run step 1 and 2, getting the list of queries and then loading the articles for query, at the same time.
public static void main(String... args) {
try {
final CountDownLatch countDownLatch = new CountDownLatch(2);
// Generating a hypothetical answer and its embedding
...
// Generate a list of queries and use them to look up articles.
final var queriesFuture = jsonGPT(QUERIES_INPUT.replace("{USER_QUESTION}", USER_QUESTION))
.thenApply(queriesJson ->
JsonParserBuilder.builder().build().parse(queriesJson)
.getObjectNode().getArrayNode("queries")
.filter(node -> node instanceof StringNode)
.stream().map(Object::toString).collect(Collectors.toList())
).thenCompose(WhoWonUFC290Async::getArticles
).thenApply(objectNodes -> {
countDownLatch.countDown();
return objectNodes;
});
if (!countDownLatch.await(30, TimeUnit.SECONDS))
throw new TimeoutException("Timed out waiting for hypotheticalAnswerEmbedding and " +
" articles ");
final var articles = queriesFuture.get();
final var hypotheticalAnswerEmbedding =
hypotheticalAnswerEmbeddingFuture.get();
...
- The above example initializes a
CountDownLatch
with a count of 2 to synchronize the completion of two asynchronous tasks: generating a hypothetical answer embedding and obtaining the list of queries. - The
hypotheticalAnswerEmbeddingFuture
is created by calling thehypotheticalAnswer
method, followed by theembeddingsAsync
method to generate the embeddings asynchronously. ThethenApply
method is used to handle the result and decrement the latch count. - The
queriesFuture
is obtained using similar asynchronous operations as in the previous examples. - The code checks if both asynchronous tasks complete within 30 seconds using
countDownLatch.await
. If not, aTimeoutException
is thrown. - The results of the asynchronous tasks are retrieved using the
get
method on the respectiveCompletableFuture
objects. - Creation of
queriesFuture
:- The code uses the
jsonGPT
method to generate a list of queries based on the user question. - The
QUERIES_INPUT
string is modified by replacing the{USER_QUESTION}
placeholder with the actual user question. - The resulting
CompletableFuture
of the JSON queries is processed usingthenApply
to parse the JSON and extract the array of queries. - The obtained
objectNodes
are then processed usingthenCompose
to call thegetArticles
method asynchronously. Notice this because we will cover this: WhoWonUFC290Async::getArticles. - After obtaining the articles, the
countDownLatch
is decremented by callingcountDownLatch.countDown()
.
- The code uses the
- Asynchronous Waiting:
- The code uses
countDownLatch.await
to wait for the completion of both the hypothetical answer embedding and the retrieval of articles. - If the waiting time exceeds 30 seconds, a
TimeoutException
is thrown.
- The code uses
- Retrieval of Articles and Hypothetical Answer Embedding:
- The articles are retrieved from the
queriesFuture
usingqueriesFuture.get()
. - The hypothetical answer embedding is obtained from the
hypotheticalAnswerEmbeddingFuture
usinghypotheticalAnswerEmbeddingFuture.get()
.
- The articles are retrieved from the
Let’s take a look at the getArticles method.
WhoWonUFC290Async::getArticles.
private static CompletableFuture<List<ObjectNode>> getArticles(List<String> queries) {
final CompletableFuture<List<ObjectNode>> completableFuture = new CompletableFuture<>();
final CountDownLatch countDownLatch = new CountDownLatch(queries.size());
final LinkedTransferQueue<ObjectNode> results = new LinkedTransferQueue<>();
final List<CompletableFuture<ArrayNode>> queryFutures = queries.stream()
.map(WhoWonUFC290Async::searchNews).collect(Collectors.toList());
final ExecutorService executorService = Executors.newCachedThreadPool();
executorService.submit(() -> {
for (CompletableFuture<ArrayNode> future : queryFutures) {
try {
ArrayNode arrayNode = future.get();
arrayNode.forEach(node -> results.add((ObjectNode) node));
countDownLatch.countDown();
} catch (Exception e) {
e.printStackTrace();
countDownLatch.countDown();
}
}
});
executorService.submit(() -> {
try {
if (!countDownLatch.await(30, TimeUnit.SECONDS)) {
throw new TimeoutException("Timed out waiting for articles");
}
final var list = new ArrayList<ObjectNode>();
if (list.addAll(results)) {
completableFuture.complete(list);
} else {
completableFuture.complete(list);
}
} catch (Exception e) {
completableFuture.completeExceptionally(e);
} finally {
executorService.shutdown();
}
});
return completableFuture;
}
The getArticles
method is a utility method that retrieves articles for a given list of queries asynchronously using CompletableFuture
. Let's break down the code:
- Initialization:
- A
CompletableFuture<List<ObjectNode>>
namedcompletableFuture
is created to hold the final result. - A
CountDownLatch
namedcountDownLatch
is initialized with the size of thequeries
list to keep track of query completions. - A
LinkedTransferQueue<ObjectNode>
namedresults
is created to collect the intermediate results from each query.
- A
- Query Execution:
- The
queries
list is streamed, and for each query, thesearchNews
method is called asynchronously usingCompletableFuture
. - The resulting
CompletableFuture<ArrayNode>
instances are collected in thequeryFutures
list.
- The
- ExecutorService and Results Collection:
- An
ExecutorService
namedexecutorService
is created with a cached thread pool. - Two tasks are submitted to the
executorService
for concurrent execution.- The first task iterates through the
queryFutures
list and retrieves theArrayNode
results from each future. - Each
ObjectNode
from theArrayNode
is added to theresults
queue, and thecountDownLatch
is decremented. - Any exceptions during the retrieval process are printed, and the
countDownLatch
is still decremented even in case of errors. - The second task waits for the completion of all queries by calling
countDownLatch.await
. - If the waiting time exceeds 30 seconds, a
TimeoutException
is thrown. - After the waiting period, the
results
queue is converted to anArrayList<ObjectNode>
namedlist
. - If the
list
is successfully populated, thecompletableFuture
is completed with thelist
, otherwise it's completed with an emptylist
. - Any exceptions that occur during the process are caught, and the
completableFuture
is completed exceptionally. - Finally, the
executorService
is shut down.
- The first task iterates through the
- An
- Return:
- The
completableFuture
is returned, which will hold the final list ofObjectNode
results when completed.
- The
This method allows the asynchronous execution of multiple queries to retrieve articles, and the results are collected in a list of ObjectNode
using CompletableFuture
for concurrent processing and efficient handling of asynchronous tasks.
The example continues with the remaining steps of extracting article content, calculating cosine similarities, creating scored articles, sorting them, and using the selected articles as part of the final answer. The big difference is that looking up articles based on generated queries happens simultaneously when we generate an ideal answer and get the ideal answers embeddings.
The Java Open AI API Client Async methods can be used to speed up the process of generating a list of potential queries, executing search queries, creating an ideal answer, and creating an ideal answer embedding to measure and prioritize articles. The example code demonstrates the use of CompletableFuture to handle asynchronous operations with JAI, including generating a hypothetical answer and its embedding, obtaining a list of queries, and retrieving articles for each query. The examples, as before, also shows how to calculate cosine similarities, create scored articles, and sort them to select the best articles for the final answer.
- Java Open AI Client
- Using ChatGpt embeddings and hyde to improve search results
- Anthropics Claude Chatbot Gets Upgrade
- Elon Musks XAi new frontier for artificial intelligence
- Using Mockito to test JAI Java Open AI Client
- Fine tuning journey with Open AI API
- Using Open AI to create callback functions, the basis for plugins
- Using Java Open AI Client Async
- Fastest Java JSON Parser
- Java Open AI API Client on Github
- Medium: Introducing Java Open AI Client
- Medium: Using ChatGPT, Embeddings, and HyDE to Improve Search Results