[Bug] SQL failing to save when too many tags with the same key are present (list/content too big for the key) #527

Closed
bossbast1 opened this issue May 29, 2024 · 2 comments
Labels: bug, triage

bossbast1 commented May 29, 2024

Context / Scenario

We ingest documents into KM (Kernel Memory) and store them in SQL Server; the SQL content is then displayed to users in our UI.
When a document has too many keywords (too many tags with the same key), the save_records step fails in SQL with a confusing error: JSON text is not properly formatted. Unexpected character '"' is found at position 245. (The reported position is always around the 250 mark.)

If the keywords are batched by 10 per key, e.g. with keys Keyword1, Keyword2, ..., it works, but splitting them up is not really a valid option for us.

Example code that results in a stuck document:

using System.Net.Http;
using System.Text;

// Minimal repro: upload one small file with 100 tags that all share the key "Keyword".
var httpClient = new HttpClient();

var contentTest = new MultipartFormDataContent();

var fileContentTest = new ByteArrayContent(Encoding.UTF8.GetBytes("Test text"));
contentTest.Add(fileContentTest, "file", "test.txt");
contentTest.Add(new StringContent("Test001"), "documentId");
contentTest.Add(new StringContent("index002"), "index");

// Each tag is sent as "key:value"; every value here uses the same key.
for (int i = 0; i < 100; i++)
{
    contentTest.Add(new StringContent($"Keyword:test{i}"), "tags");
}

var responseTest = await httpClient.PostAsync("https://<KM server API>/upload", contentTest);
var resTest = await responseTest.Content.ReadAsStringAsync();
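
For anyone who needs a stopgap, the batching workaround mentioned above (at most 10 values per key, with keys Keyword1, Keyword2, ...) could be sketched roughly as follows. This is only an illustration: `TagBatcher` and `ToBatchedTags` are hypothetical names, not part of Kernel Memory, and the batch size of 10 is simply the value that happened to work in our tests, not a documented limit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Kernel Memory.
public static class TagBatcher
{
    // Spreads the values of one tag key over numbered keys
    // (Keyword1, Keyword2, ...), with at most batchSize values per key.
    // Returns strings in the "key:value" form used by the upload request.
    public static List<string> ToBatchedTags(
        string key, IEnumerable<string> values, int batchSize = 10)
    {
        if (batchSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(batchSize));
        }

        var tags = new List<string>();
        var batchNumber = 1;

        // Enumerable.Chunk requires .NET 6 or later.
        foreach (string[] batch in values.Chunk(batchSize))
        {
            foreach (var value in batch)
            {
                tags.Add($"{key}{batchNumber}:{value}");
            }
            batchNumber++;
        }

        return tags;
    }
}
```

In the repro above, the loop would then add each string returned by `TagBatcher.ToBatchedTags("Keyword", values)` as a "tags" part instead of adding `Keyword:test{i}` directly. The cost is that a filter on the single key Keyword no longer matches, which is why this is not a real option for us.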

In KM, this flow is used:

ConfigureIngestionMemoryDb ->

case string x when x.Equals("SqlServer", StringComparison.OrdinalIgnoreCase):
{
    var instance = this.GetServiceInstance<IMemoryDb>(builder,
        s => s.AddSqlServerAsMemoryDb(this.GetServiceConfig<SqlServerConfig>("SqlServer"))
    );
    builder.AddIngestionMemoryDb(instance);
    break;
}

What happened?

Saving the record to SQL fails with the exception shown in the log output below. SQL should process it properly without crashing. (When SQL is omitted, KM is able to store the tags in the search service without crashing.)

Importance

a fix would make my life easier

Platform, Language, Versions

C#, Azure SQL Instance, version: Microsoft.KernelMemory.MemoryDb.SQLServer and Core - 0.61.240524.1

Relevant log output

trce: Microsoft.KernelMemory.Handlers.SaveRecordsHandler[0]
      Saving record d=KB30488_Zurich-Switzerland//p=1c8f1382edc343258a85a12e124e9b54 in index 'index002'
warn: Microsoft.KernelMemory.Orchestration.AzureQueues.AzureQueuesPipeline[0]
      Message '01ef2d66-4b15-43c8-83ef-fabc0e1a618c' processing failed with exception, putting message back in the queue with a delay of 11000 msecs
      Microsoft.Data.SqlClient.SqlException (0x80131904): JSON text is not properly formatted. Unexpected character '"' is found at position 245.
         at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
         at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, SqlCommand command, Boolean callerHasConnectionLock, Boolean asyncClose)
         at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
         at Microsoft.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption, Boolean shouldCacheForAlwaysEncrypted)
         at Microsoft.Data.SqlClient.SqlCommand.CompleteAsyncExecuteReader(Boolean isInternal, Boolean forDescribeParameterEncryption)
         at Microsoft.Data.SqlClient.SqlCommand.InternalEndExecuteNonQuery(IAsyncResult asyncResult, Boolean isInternal, String endMethod)
         at Microsoft.Data.SqlClient.SqlCommand.EndExecuteNonQueryInternal(IAsyncResult asyncResult)
         at Microsoft.Data.SqlClient.SqlCommand.EndExecuteNonQueryAsync(IAsyncResult asyncResult)
         at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
      --- End of stack trace from previous location ---
         at Microsoft.KernelMemory.MemoryDb.SQLServer.SqlServerMemory.BatchUpsertAsync(String index, IEnumerable`1 records, CancellationToken cancellationToken)+MoveNext()
         at Microsoft.KernelMemory.MemoryDb.SQLServer.SqlServerMemory.BatchUpsertAsync(String index, IEnumerable`1 records, CancellationToken cancellationToken)+MoveNext()
         at Microsoft.KernelMemory.MemoryDb.SQLServer.SqlServerMemory.BatchUpsertAsync(String index, IEnumerable`1 records, CancellationToken cancellationToken)+MoveNext()
         at Microsoft.KernelMemory.MemoryDb.SQLServer.SqlServerMemory.UpsertAsync(String index, MemoryRecord record, CancellationToken cancellationToken)
         at Microsoft.KernelMemory.MemoryDb.SQLServer.SqlServerMemory.UpsertAsync(String index, MemoryRecord record, CancellationToken cancellationToken)
         at Microsoft.KernelMemory.Handlers.SaveRecordsHandler.SaveRecordAsync(DataPipeline pipeline, IMemoryDb db, MemoryRecord record, HashSet`1 createdIndexes, CancellationToken cancellationToken)
         at Microsoft.KernelMemory.Handlers.SaveRecordsHandler.InvokeAsync(DataPipeline pipeline, CancellationToken cancellationToken)
         at Microsoft.KernelMemory.Pipeline.DistributedPipelineOrchestrator.RunPipelineStepAsync(DataPipeline pipeline, IPipelineStepHandler handler, CancellationToken cancellationToken)
         at Microsoft.KernelMemory.Pipeline.DistributedPipelineOrchestrator.<>c__DisplayClass5_0.<<AddHandlerAsync>b__0>d.MoveNext()
      --- End of stack trace from previous location ---
         at Microsoft.KernelMemory.Orchestration.AzureQueues.AzureQueuesPipeline.<>c__DisplayClass20_0.<<OnDequeue>b__0>d.MoveNext()
      ClientConnectionId:04a49bb0-393e-40f1-9fce-1fe2b9b225a1
      Error Number:13609,State:4,Class:16

bossbast1 added the bug and triage labels May 29, 2024

dluc (Collaborator) commented May 29, 2024

FYI @kbeaugrand if you have a chance to look into it - thanks

kbeaugrand (Contributor) commented

Will take a look soon.

kbeaugrand added a commit to kbeaugrand/kernel-memory that referenced this issue Jun 1, 2024
dluc closed this as completed in 8c0ad8c Jun 1, 2024