
Generic Prompt Formatter #787

Merged
4 commits merged into SciSharp:master on Jun 10, 2024

Conversation

@patrick-hovsepian (Contributor) commented Jun 8, 2024

- more generic default template history transformer
- EOT and EOS token to string
- minor refactor of ModelToken to a readonly struct
- template convenience methods
- exposed metadata get-by-key on the native handle (see the sketch below)
- update the example to use the updated default history transform and the convenience antiprompt value from the model
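
For context on the metadata item above, here is a minimal sketch of the kind of thing get-by-key enables: reading the model's built-in chat template out of the GGUF metadata. This assumes LLamaWeights exposes the metadata as a key/value view; "tokenizer.chat_template" is the standard GGUF key, and the file path is a placeholder.

using System;
using LLama;
using LLama.Common;

// Load the weights, then look up the chat template in the GGUF metadata.
// The Metadata dictionary shape is an assumption for this sketch.
using var weights = LLamaWeights.LoadFromFile(new ModelParams("model.gguf"));
if (weights.Metadata.TryGetValue("tokenizer.chat_template", out var chatTemplate))
    Console.WriteLine(chatTemplate);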
In LLama/LLamaTemplate.cs:
_eos = LLamaTokenToString(EOS, true);
}

private string? LLamaTokenToString(LLamaToken? token, bool isSpecialToken)
Member:

Handling strings in this class feels like a bit of a mismatch to me (it's meant to just be a simple list of tokens). As far as I can see these new string properties are only used in the Antiprompts (which may not be necessary).

If they're not critical to anything else, can we remove this change from this PR for now, and discuss whether we want something like this, and where best to put it, in another PR?

Contributor (author):

> Handling strings in this class feels like a bit of a mismatch to me (it's meant to just be a simple list of tokens). As far as I can see these new string properties are only used in the Antiprompts (which may not be necessary).

This was my primary use-case, yes. Per your comment above, I need to explore IsEndOfGeneration. Is the InteractiveExecutor sticking around? My motivation for adding this was the chat examples that used the interactive executor. I (naively) thought I could just plug in any model and it would work. I tried with Llama3 and Phi3 and got some odd behavior until I learned about chat templates. That made me think we could synthesize the history transformer to make a best guess and use the llama.cpp template formatter. To your other comments, though: it's not great to have to constantly allocate memory to do the P/Invokes, and I did consider whether it's worth porting the code to be dotnet-native, but it would stink to have to keep it in lockstep with that repo, so I finally landed on the implementation in this PR. (This would have been helpful in the initial PR description 😅)

To me, the ModelToken class seemed like the best fit, and it could cache the results to prevent further calls to the native library, but I could remove it and just expose this function as a utility if that seems better.
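
For illustration, a minimal sketch of that caching idea, assuming LLamaToken as the token type. The detokenize delegate is a hypothetical stand-in for whatever native call does the conversion (e.g. something like llama_token_to_piece; the real P/Invoke signature varies between llama.cpp versions):

using System;
using System.Collections.Generic;
using LLama.Native;

// Cache token -> string conversions so repeated lookups avoid further
// native calls. The delegate stands in for the real native conversion.
public sealed class TokenStringCache
{
    private readonly Dictionary<LLamaToken, string> _cache = new();
    private readonly Func<LLamaToken, bool, string> _detokenize;

    public TokenStringCache(Func<LLamaToken, bool, string> detokenize)
        => _detokenize = detokenize;

    public string? ToString(LLamaToken? token, bool isSpecialToken)
    {
        if (token is not LLamaToken t)
            return null;

        // Only hit the native library the first time we see this token.
        if (!_cache.TryGetValue(t, out var text))
        {
            text = _detokenize(t, isSpecialToken);
            _cache[t] = text;
        }
        return text;
    }
}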

Member:

> Is the InteractiveExecutor sticking around?

Probably (hopefully) not. My long term project for the past 6 months or so has been to build a new foundation for better future executors.

> I (naively) thought I could just plug in any model and it would work

In the long term, I hope we can make that work! Many of the bits are there now - Automatic antiprompting (with end-of-generation tokens), automatic templating (with LlamaTemplate), maybe even automatic context shifting to handle out-of-context errors!

> To me, the ModelToken class seemed like the best fit, and it could cache the results to prevent further calls to the native library, but I could remove it and just expose this function as a utility if that seems better.

I'm ok with either way to be honest.

If we're going to keep it with the strings in ModelTokens, could you switch ModelToken back to a sealed class instead of a struct? It's a bit too big to be a struct (and will probably grow if we add equivalent string properties for all the special tokens).

Contributor (author):

> Probably (hopefully) not. My long term project for the past 6 months or so has been to build a new foundation for better future executors.

Is that going to be here or another project entirely? I'd be interested in approaching this from the ground up if you're open to it 😸

I'll revert the ModelTokens back to a sealed class and leave it as-is for now if that's cool

Member:

Here. Over the past six months I've been laying the groundwork with BatchedExecutor (a low level executor that can act as a foundation for all executors).

I wrote up a lot of my thoughts recently in this comment. I ended up giving an overview of how I see the project being split up into layers. In particular, the Mid Level and Puzzle Pieces sections show how I've put together a lot of the parts for new high level executors; now we just need to figure out how to put them together! I'd be very happy to have help there!

Contributor (author):

Oh cool, I'll give that thread a read to get caught up

@martindevans (Member):

Please make sure that this compiles under netstandard2.0. There are a few changes which I suspect may break it (e.g. the removal of some using LLamaSharp.Extensions directives, which appear unused under net8.0 but often pull in polyfills for netstandard2.0).
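
To illustrate the concern, a hypothetical example of the polyfill pattern: string.Contains(char) does not exist on netstandard2.0, so an extension method in LLamaSharp.Extensions can fill the gap. Under net8.0 the built-in BCL overload wins, the using directive looks unused, and removing it silently breaks the netstandard2.0 build. The specific polyfill here is illustrative, not necessarily one from the repo:

#if NETSTANDARD2_0
namespace LLamaSharp.Extensions
{
    // string.Contains(char) is missing on netstandard2.0, so this extension
    // fills the gap. On net8.0 the built-in overload is picked instead and
    // the `using LLamaSharp.Extensions;` at the call site appears unused.
    internal static class StringPolyfills
    {
        public static bool Contains(this string str, char value)
        {
            return str.IndexOf(value) >= 0;
        }
    }
}
#endif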

@patrick-hovsepian (Contributor, author):

dotnet build does compile against all targets with the seemingly unused using statements removed. Would it be better to add a workflow that explicitly targets each framework, to make this check automatic?

@martindevans (Member):

> a workflow that explicitly targets each framework to make it more automatic?

It would, and I thought we had one to be honest. I double checked before making that comment and was surprised to see we didn't!

/// <summary>
/// Get a span to the underlying bytes as specified by Encoding
/// </summary>
public ReadOnlySpan<byte> TemplateDataBuffer => _result;
Member:

The reason I created Apply with the slightly weird use pattern (call it to do work and discover length, call it again to get results) was to ensure the internal _result array was never exposed.

This is to avoid a case like this where the buffer might get modified underneath you without you realising:

t.Apply();
var template1 = t.TemplateDataBuffer;
t.RemoveAt(1);
t.Apply();
var template2 = t.TemplateDataBuffer;
// template1 was modified by the second call to Apply!

If we don't want to accept that risk, then this property needs to be removed.

On the other hand, if we do want to accept that risk then we could change Apply() to return the span immediately, which leads to a much nicer interface:

ReadOnlySpan<byte> Apply()

It's up to you which you go with. Having seen it in use I think I'd be slightly in favour of option 2 (as long as there is a warning in the method doc comments, of course).


Sidenote: there's also a bug here, although from the above it probably doesn't matter!

_result is a big array that contains the template output. However, it's not all valid data. _resultLength indicates how much of the array currently contains useful data. So this should be _result.AsSpan(0, _resultLength);.
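
Putting those two points together, a rough sketch of what option 2 could look like. The native templating call is abstracted behind a delegate here purely for illustration; in the real class it would be the existing llama.cpp call:

using System;

// Sketch of option 2: Apply() does the work and returns only the valid
// slice of the internal buffer, which also fixes the _resultLength bug.
public sealed class TemplateSketch
{
    private readonly byte[] _result = new byte[4096];
    private readonly Func<byte[], int> _templateNative; // stand-in for the native call
    private int _resultLength;

    public TemplateSketch(Func<byte[], int> templateNative)
        => _templateNative = templateNative;

    /// <summary>
    /// Apply the template and return the templated bytes.
    /// WARNING: the span aliases an internal buffer which is overwritten
    /// by the next call to Apply - copy it if you need to keep it.
    /// </summary>
    public ReadOnlySpan<byte> Apply()
    {
        _resultLength = _templateNative(_result);
        return _result.AsSpan(0, _resultLength);
    }
}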

@patrick-hovsepian (Contributor, author) commented Jun 9, 2024:

I'm in favor of option 2 as well. I'd argue the internal buffer shouldn't be a class member at all: every call to Apply would be idempotent and just return a new buffer, and it would be up to callers how to share or pass that data around. I'd also be in favour of just returning the string here as-is, so callers aren't forced to do an encoding translation themselves, or at the least an overload that does so.

These changes were necessary to efficiently move that PromptToString method out of here.

Member:

The problem with returning a string or a buffer is the huge allocation for every call: with LLM contexts growing to 128k tokens, this could mean a megabyte allocation for every new message added to the template! I think all of the low/mid level components should expose interfaces that allow you to call them in an efficient low/no allocation way.

If it works for the PromptToString method implementation, shall we go with option 2 for now, but keep this in mind as something that might need revisiting in the future?
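
As a usage sketch under option 2 (assuming the Add(role, content) API from this PR and a constructor that takes the loaded model), callers only pay for an allocation when they actually need a string or a persistent copy:

using System;
using System.Text;
using LLama;

// Apply() returns a span over the template's reused internal buffer, so
// the caller decides when (or whether) to pay for an allocation.
static string RenderPrompt(LLamaWeights model)
{
    var template = new LLamaTemplate(model);     // assumed ctor shape for this sketch
    template.Add("user", "Hello!");

    ReadOnlySpan<byte> bytes = template.Apply(); // no allocation; valid until the next Apply()
    return Encoding.UTF8.GetString(bytes.ToArray()); // allocate only when a string is needed
}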

Contributor (author):

Sounds good, made the changes to support option 2.

I guess my gap is that in the use cases I've seen, the template is always applied to the last message added (aside from the system prompt and user message seeding), likely because of the InteractiveExecutor. We'll likely be able to make additional improvements here: since the history is additive, re-templating the entire history on every call seems like overkill. Something that can be explored later.

Member:

At the moment the templater relies entirely on llama.cpp, which supplies just one method that takes a list of messages and applies the template to the whole thing, so any kind of incremental updating is difficult with that particular API. I've been wondering about upstreaming some improvements to llama.cpp to make it incremental, but that might be a bit beyond my C++ abilities.
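
One possible workaround on top of the all-at-once API (llama.cpp's own chat examples use a similar diff-based trick): re-template the whole history every time, but remember how many bytes the previous call produced and feed only the new suffix to the model. A sketch, assuming the option-2 Apply() shape and a strictly append-only template output (true for common chat templates, but not guaranteed in general):

using System;
using LLama;

// Incremental templating: template everything, hand only the new bytes on.
public sealed class IncrementalTemplater
{
    private int _lastLength;

    public ReadOnlySpan<byte> NewSuffix(LLamaTemplate template)
    {
        var full = template.Apply();          // full templated history
        var delta = full.Slice(_lastLength);  // bytes appended since the previous call
        _lastLength = full.Length;
        return delta;
    }
}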

@martindevans (Member):

Looks good to me, thanks for all the hard work @patrick-hovsepian! I'll merge it once the CI finishes :)

@martindevans merged commit 2990b47 into SciSharp:master on Jun 10, 2024
6 checks passed