feat(idp extraction connector): implement new idp extraction outbound connector #3482

Open: wants to merge 9 commits into base: main

Conversation

sahilbhatoacamunda

Description

This PR adds an outbound IDP extraction connector. The connector extracts text from a document with AWS Textract or Apache PDFBox, then analyzes that text with AWS Bedrock based on the input taxonomy.

Related issues

closes https:/camunda/product-hub/issues/2527

Checklist

  • PR has a milestone or the no milestone label.

@sahilbhatoacamunda sahilbhatoacamunda requested a review from a team as a code owner October 15, 2024 14:44
@CLAassistant

CLAassistant commented Oct 15, 2024

CLA assistant check
All committers have signed the CLA.

Comment on lines +72 to +76
Message message =
    Message.builder()
        .content(ContentBlock.fromText(prompt))
        .role(ConversationRole.USER)
        .build();

Check notice

Code scanning / CodeQL

Unread local variable Note

Variable 'Message message' is never read.
Comment on lines +79 to +90
ConverseResponse response =
    bedrockRuntimeClient.converse(
        request ->
            request
                .modelId(converseData.modelId())
                .messages(message)
                .inferenceConfig(
                    config ->
                        config
                            .maxTokens(converseData.maxTokens())
                            .temperature(converseData.temperature())
                            .topP(converseData.topP())));

Check notice

Code scanning / CodeQL

Unread local variable Note

Variable 'ConverseResponse response' is never read.
private static final Logger LOGGER = LoggerFactory.getLogger(AwsS3Util.class);

private static String uploadNewFileFromUrl(
    final String documentUrl, final String bucketName, final S3AsyncClient s3AsyncClient)

Check notice

Code scanning / CodeQL

Useless parameter Note

The parameter 's3AsyncClient' is never used.
Comment on lines +47 to +48
PutObjectRequest putObjectRequest =
    PutObjectRequest.builder().bucket(bucketName).key(documentKey).build();

Check notice

Code scanning / CodeQL

Unread local variable Note

Variable 'PutObjectRequest putObjectRequest' is never read.
Comment on lines +51 to +57
AsyncRequestBody asyncRequestBody =
    AsyncRequestBody.fromInputStream(
        body ->
            body.executor(executorService)
                .contentLength(contentLength)
                .inputStream(inputStream)
                .build());

Check notice

Code scanning / CodeQL

Unread local variable Note

Variable 'AsyncRequestBody asyncRequestBody' is never read.
import org.apache.commons.lang3.builder.EqualsBuilder;
import org.apache.commons.lang3.builder.HashCodeBuilder;

public class ExtractionRequest extends AwsBaseRequest {
Contributor

To stay flexible across cloud providers, I would recommend creating another class in your model, such as AwsRequest, and referencing it from ExtractionRequest so you don't have to inherit from AwsBaseRequest here.

Contributor

It would also allow you to change this one to a record.

Author

Created a new class, BaseRequest, that extends AwsBaseRequest.
Converted ExtractionRequest to a record.
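
A minimal sketch of the composition idea discussed in this thread, not the code from the PR. ProviderConfig, AwsConfig, ExtractionInput, and all field names are hypothetical; only the name ExtractionRequest comes from the diff. It shows how holding the provider settings as a component, rather than inheriting them, lets the request be a record.

```java
// Sketch only: every type and field except the name ExtractionRequest is an assumption.

/** Provider-neutral marker for whatever settings a cloud provider needs. */
interface ProviderConfig {}

/** AWS-specific settings kept in their own type instead of a base class (hypothetical). */
record AwsConfig(String region) implements ProviderConfig {}

/** Document-specific input; the fields are illustrative assumptions. */
record ExtractionInput(String documentUrl, String modelId) {}

/** The request references the provider settings by composition, so it can be a record. */
record ExtractionRequest(ProviderConfig provider, ExtractionInput input) {}
```

Because records generate equals and hashCode from their components, a request modeled this way would not need the EqualsBuilder and HashCodeBuilder imports shown in the diff above.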

Contributor

we are working on it :)

@sbuettner
Contributor

@sahilbhatoacamunda Congrats on your first Connector 🥳 and great work. I added just a few smaller comments regarding the model.

From my perspective, it's fine for now that you implemented your own polling. It somewhat duplicates what the Textract connector does, but we can generalize that later rather than now.
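
For illustration, a generalized polling helper could look roughly like the sketch below. It is an assumption about one possible shape, not code from this PR or from the Textract connector; Poller, pollUntil, and the status strings in the usage note are hypothetical names.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical helper: repeatedly fetches a value until it is "done" or attempts run out. */
final class Poller {
  private Poller() {}

  /**
   * Calls fetch up to maxAttempts times, sleeping for delay between attempts.
   * Returns the first non-null result accepted by isDone, or empty if none was accepted.
   */
  static <T> Optional<T> pollUntil(
      Supplier<T> fetch, Predicate<T> isDone, int maxAttempts, Duration delay) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      T result = fetch.get();
      if (result != null && isDone.test(result)) {
        return Optional.of(result);
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(delay.toMillis());
        } catch (InterruptedException e) {
          // Restore the interrupt flag so callers can still observe the interruption.
          Thread.currentThread().interrupt();
          throw new IllegalStateException("Polling was interrupted", e);
        }
      }
    }
    return Optional.empty();
  }
}
```

A caller might use it as `Poller.pollUntil(() -> fetchJobStatus(jobId), "SUCCEEDED"::equals, 10, Duration.ofSeconds(2))`, where `fetchJobStatus` stands in for whatever status call a connector makes.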

.role(ConversationRole.USER)
.build();

ConverseData converseData = extractionRequest.input().converseData();

Check notice

Code scanning / CodeQL

Unread local variable Note

Variable 'ConverseData converseData' is never read.
*/
package io.camunda.connector.idp.extraction.model;

public record ExtractionResult(String response) {}
Contributor

Using an Object would allow returning a JSON object.

Author

Done.
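
A small sketch of what the suggested change makes possible, assuming the record simply switches its component type from String to Object. ExtractionResultDemo and the field names in the map are hypothetical and only serve as illustration.

```java
import java.util.Map;

/** Same record name as in the diff, with the component widened to Object (assumed shape). */
record ExtractionResult(Object response) {}

/** Hypothetical demo class showing the two kinds of payload the record can now carry. */
final class ExtractionResultDemo {
  private ExtractionResultDemo() {}

  /** Structured payload: a map that a JSON serializer would typically write as an object. */
  static ExtractionResult structured() {
    Map<String, Object> fields = Map.of("invoiceNumber", "INV-001", "total", 42.5);
    return new ExtractionResult(fields);
  }

  /** Plain-text payload still fits, since a String is also an Object. */
  static ExtractionResult plain() {
    return new ExtractionResult("raw model output");
  }
}
```

With a String component, structured output would reach the caller as an escaped string that has to be parsed again; with Object, a map or list can be passed through as-is.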

4 participants