Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature request: look-up table support in user shaders #4586

Closed
bjin opened this issue Jul 9, 2017 · 15 comments
Closed

feature request: look-up table support in user shaders #4586

bjin opened this issue Jul 9, 2017 · 15 comments

Comments

@bjin
Copy link
Contributor

bjin commented Jul 9, 2017

Look-up table is an essential technique used to speed up some core part of mpv's opengl backend, including most scaler based on convolution kernel (polar or not), dithering and 3dlut based color management.

But currently, there is no direct support of look-up table in user shaders (either via texture or uniform buffer). This makes a large part of LUT-based algorithm/filter infeasible to be implemented as user shader. This includes any reasonable main scaler that I can think of (all are convolution based), some prescaler with large amount of weights, and customized dithering algorithm.

It would be a great help if it's possible to define and upload static texture in user shader:

  • User can define a single texture just once, providing a texture name, and it will be available for all hooks to BIND.
  • Texture could be uploaded during user shader initialization, just like how LUT used by scaler is handled now.
  • Supports one and four components per texel, I think there is actual use case for both format.
  • Supports both bilinear and nearest sampling. Again, there is actual use case for both.
  • (optional) Support 1D and 3D texture.

My proposal

We can extend the definition of single block in user shaders to define a constant texture.

//!TEXTURE LUT_weights
//!COMPONENTS 4
//!HEIGHT h
//!WIDTH w
//!SAMPLING bilinear
t00.r,t00.g,t00.b,t00.a;t10.r,t10.g,t10.b,t10.a;t20.r, ...
t01.r,t01.g,t01.b,t01.a;t11.r,..
t02.r,...
...

Texture will be named LUT_weights, and hooks are able to use BIND LUT_weights to access it.

Values like t00.r are floating numbers in decimal representation (like "0.01" or "1e-3"). There are h lines in the body, each with w texel separated by semicolon. Within each texel, components are separated by comma.

It's tempting to allow binary representation of single-precision float, or even exact integer. But --opengl-fbo-format is not controlled by user shader, and a safe fall back is generic float number, the value that will later be retrieved.


This feature is kind of critical to the performance of a new prescaler I'm currently trying to implement (theoretically it should be comparable to built-in EWA scalers but currently it's far slower than nnedi3). But besides it, I also feel it's necessary since otherwise I can't even port built-in scalers into user shader and make some customization.

I post the proposal here since I feel it's not necessary the only approach. I don't have input from other user shader authors, and also haven't been following the opengl backend for a while.

@haasn
Copy link
Member

haasn commented Jul 9, 2017

I have an unmerged branch sitting on my computer that allows user shaders to load custom textures, but it's annoying for a couple of reasons so I haven't merged it. Maybe I could get your comments about the API?

For reference, the patch looks like this: haasn@484a011

The API design I was considering looks something like this:

//!LOAD NEAREST 9 9 1 4 ~~/shaders/fsrcnn/FSRCNN_4_2_1.tex

In order, that's <sampling_mode> <width> <height> <depth> <components> <path to file containing raw data>, but it's annoying because I have to use absolute paths. (Don't know how to make it so that it's relative to the shader file). It's also hard-coded to use 32-bit float format, because I didn't want to bloat it up with even more options. But I guess your style of API would be more flexible in this regard?

This would be much easier on the mpv side of things than actually trying to parse the texture/float data literally. Would you consider this API acceptable? Maybe I could introduce a hybrid between your syntax and my syntax, where the //!TEXTURE primitive doesn't directly include the raw data but instead defines a file path?

Also, I'm concerned about user shaders trying to replace the shader kernel because that doesn't “work” properly - the main shader still runs, and the user shader can't take into account options like sigmoid upscaling or cropping/offset issues.

What I might do instead is introduce a new shader primitive next to //!HOOK, basically //!REPLACE - the only current user would be //!REPLACE KERNEL which makes it a 1:1 drop-in replacement for the shader kernel code in mpv, effectively replacing it.

@bjin
Copy link
Contributor Author

bjin commented Jul 9, 2017

Really glad to know that there is a working solution now. A few comments:

  • It probably will be rather annoying to require absolute path. I won't expect user of my shader to edit shader file after downloading it.
  • It probably make some sense to use raw floats here, since FBO format is fixed to be GL_RGBA32F anyway. Even if we want to add support for new format, we could also ask user shader author to provide raw dump in specific format, not much change required in mpv's end.
  • Correct me if I'm wrong. I did a quick look at the code, it seems to me that texture data is cached in memory, but not in GPU. They are uploaded to GPU for every rendering pass, and for different hooks. Maybe we could build a global cache for this?
  • Separating texture data and shaders into different file seems to be more elegant than my proposal.
  • //!TEXTURE lut defines texture and //!BIND lut(in other hook) loads it looks more natural to me. We can extends //!TEXTURE to includes all metadata in single line just like //!LOAD

@haasn
Copy link
Member

haasn commented Jul 9, 2017

Correct me if I'm wrong. I did a quick look at the code, it seems to me that texture data is cached in memory, but not in GPU. They are uploaded to GPU for every rendering pass, and for different hooks. Maybe we could build a global cache for this?

pass_hook_user_shaders is only run from gl_video_setup_hooks (in gl_video_resize), so it only gets re-uploaded when resizing the window. (Incidentally, that could probably also be optimized away to do it only once per option reinitialization - low hanging fruit if somebody wants to improve it)

The only code that gets run per-frame for user shaders is in user_hook, which just does gl_sc_uniform_sampler.

We can extends //!TEXTURE to includes all metadata in single line just like //!LOAD

I'd rather do the other way. I actually like the //!TEXTURE name approach because it doesn't have to cram every piece of metadata into a single line, so I can easily extend it with more metadata options without worrying about API overload.

Separating texture data and shaders into different file seems to be more elegant than my proposal.

I actually think it owuld be more elegant to have them as part of the same file, to make it more self-contained. Unfortunately, the inability to easily load these weights is a big selling point that gets lost. Hmm.. what if we use something like base64 to store the raw weights file as plaintext inside the shader body? That way you can include newlines and such, and mpv is freed from the burden of having to parse a few thousand floats. (We get base64 for “free” from libavutil)

@bjin
Copy link
Contributor Author

bjin commented Jul 9, 2017

pass_hook_user_shaders is only run from gl_video_setup_hooks (in gl_video_resize), so it only gets re-uploaded when resizing the window.

I see.

A few additional configuration settings for texture that could matter potentially

  • texel format: support integer format for example.
  • wrap settings: current dithering code are using "repeat" instead of "clamp to edge". However there is workaround if performance is not a issue.

@bjin
Copy link
Contributor Author

bjin commented Jul 9, 2017

what if we use something like base64 to store the raw weights file as plaintext inside the shader body?

This approach also looks good to me, especially considering that decimal representation of float-point number won't save much space either.

@haasn
Copy link
Member

haasn commented Jul 9, 2017

To be honest even the thought of extending the user shader API is annoying. Maybe we should revolutionize the API and switch to loading something like a lua file that defines all of the passes directly? Then you could load a file or use base64 or whatever inside lua internally..

That's also ugly though, especially the “bridge” that would have to be built between vo_opengl and lua. (Where would this even go? a VOCTRL?)

@bjin
Copy link
Contributor Author

bjin commented Jul 9, 2017

To be honest even the thought of extending the user shader API is annoying. Maybe we should revolutionize the API and switch to loading something like a lua file that defines all of the passes directly?

Yes, it do feels to be annoying. But if I understand correctly, after adding LUT support, within the rendering workflow current opengl vo defined, every single step of built-in filters could potentially be replaced by user shader. It's more or less future-proof, especially considering there aren't much development activity based on user shader.

That's also ugly though, especially the “bridge” that would have to be built between vo_opengl and lua. (Where would this even go? a VOCTRL?)

There is just the technical side. I also feel that migrating all existing user shader (even though not much) to lua will take a lot of effects.

Anyway, I do hope there is an elegant solution to this situation. Embedding weights might be weird, but could resolve some real problem, and have some performance benefits (smaller GLSL shader size, LUT resists on GPU).

@haasn
Copy link
Member

haasn commented Jul 9, 2017

That's also ugly though, especially the “bridge” that would have to be built between vo_opengl and lua. (Where would this even go? a VOCTRL?)

Hmm no, even this would be insufficient. It seems like ideally, what would have to happen is that vo_opengl itself learns the ability to execute lua files, and then maybe we could also get rid of the awkward RPN szexp and figure out how to use lua somehow. But this is tricky for many reasons, the start of which is that I don't know how to just execute arbitrary lua files from within vo_opengl.

@haasn
Copy link
Member

haasn commented Jul 9, 2017

@bjin @wm4 How about this design?

  1. Delete --opengl-shaders
  2. Add a new lua command mp.register_shader(func) to work similarly to register_event/register_idle. The function func would return an object describing the shader.
  3. Whenever mpv wants to recompute the list of user shaders (when is that the case?), it would call into lua to evaluate all of these registered functions, and the resulting object transformed back into a C struct (how?) and sent to the vo using a single VOCTRL_UPDATE_SHADERS.

This way we would unify user scripts with user shaders, both just being lua files you plop into ~/.mpv/scripts - and user shaders could also do whatever they want w.r.t loading and unloading user shaders. (By just dynamically registering/unregistering the shader based on a keybinding or something)

As an annoying side detail, the “size expressions” would probably still have to be in some form in which vo_opengl can easily execute them, because they may reference the sizes of textures that only exist at runtime like LUMA.w. So we would either have to keep around some sort of literal stack-based RPN expression, or figure out a way for vo_opengl to “cheaply” call lua expressions at runtime, per frame (how hard/expensive is that?).

@haasn
Copy link
Member

haasn commented Jul 9, 2017

As an aside, this registered func could do arbitrary computation, including generating the right texture and GLSL based on dynamic configuration (script-opts) or video properties. So you could rewrite your “shader generator” in lua instead of python and use them as-is.

@bjin
Copy link
Contributor Author

bjin commented Jul 9, 2017

So you could rewrite your “shader generator” in lua instead of python and use them as-is.

This is kind of thing I want to avoid. I could image porting python code into lua gonna be a nightmare for me. If it have to be done in this way, I will definitely choose to generate lua code (which "returns" shader file) from python directly.

Add a new lua command mp.register_shader(func) to work similarly to register_event/register_idle. The function func would return an object describing the shader.

I'm actually not sure what's the gain for this move. If it's just about extending customized DSL, we could probably use another structure format like JSON. User shaders are, after all, just a bunch of metadata and strings. If it's about dynamically generating shader with lua, and providing flexibility like reading and uploading LUT texture, it seems to be a little bit overkill for me.

@haasn
Copy link
Member

haasn commented Jul 9, 2017

Hmm good point re: just loading JSON, I guess mpv already has reusable functions for this somewhere. Unfortunately that's somewhat annoying since it means having to escape the shader body. YAML would probably be better, but wm4 would kill me for suggesting it.

I just thought that making them more dynamic would possibly solve a number of different requirements, like dynamically loading-unloading individual shaders via keybind or property (currently not really possible unless you plan on parsing the opengl-shaders property yourself...), or making the shaders more directly user-configurable. (e.g. adjusting the strength using a keybinding)

But oh well, if you think it's overkill then maybe it is. I guess I'll probably stick with the status quo of having this horrible user_shaders.c parser...

(What do you think, @wm4?)

bjin added a commit to bjin/mpv-prescalers that referenced this issue Jul 10, 2017
This change requires commits from an unmerged branch of mpv[1].

The API is most likely going to change[2]. So some temporary hack is
used, `/tmp/ravu.bin` will be written for example.

`--target=native` is still insanely slow for some reason I don't
understand, probably due to my shader compiler failed to perform
certain optimization (float vs. half-float?). So default target is
changed to `luma` for now.

[1]: https:/haasn/mpv/tree/gl_user_tex
[2]: mpv-player/mpv#4586
@bjin
Copy link
Contributor Author

bjin commented Jul 10, 2017

@haasn I adopted the API provided by your gl_user_tex branch, and it works as expected. But as we discussed, a cleaner and intuitive API is probably still preferred. I'm leaning to //!TEXTURE block with base64 encoded body approach. Once there is a conclusion, I would be happy to help get it merged.

@haasn
Copy link
Member

haasn commented Jul 10, 2017

I'm leaning to //!TEXTURE block with base64 encoded body approach. Once there is a conclusion, I would be happy to help get it merged.

In the absence of any sort of radical lua-based reformation, I would be leaning towards that as well. In addition to allowing TEXTURE blocks, I would also like to allow REPLACE blocks, to replace the scaler kernel by your own. (And possibly also the dithering algorithm, although I have extreme doubt that there's a good reason to do so)

How much does the texture upload help performance? What I found for FSRCNN is that hard-coding the weights in the shader was actually faster for fixed convolutions, even though the weight required a position-based array lookup, so I expect this to be the case for e.g. NNEDI3 as well.

@bjin
Copy link
Contributor Author

bjin commented Jul 10, 2017

How much does the texture upload help performance? What I found for FSRCNN is that hard-coding the weights in the shader was actually faster for fixed convolutions

Yes, this is the case for nnedi3 as well. We used to have two approach to upload nnedi3 weights, via uniform buffer, or hard-coded in code. The hard coding approach is much faster as I tested. I guess this is always true if all weights are going to be used for rendering single pixel, and in specified order.

RAISR, however, is not neural network based. Only a small portion of the weights are going to be used. So using a texture to look up weights is much faster than hard-coding all weights, about 8 times faster as I tested.

haasn added a commit to haasn/mp that referenced this issue Jul 11, 2017
Parsing the texture data as raw strings makes the textures the most
portable and self-contained.

I decided to reuse the internal saved_tex mechanism to avoid having to
bloat up `pass_hook` more, and make the texture binding mechanism
consistent for users, even though the result of doing this is slightly
quirky. because now we will also generate the _size etc. uniforms even
though they don't really make sense. This also requires me to add a new
PLANE_ type so I just have _something_ to set for the type, because none
of RGB etc. really make sense. May revisit this in the future.

As an aside, by request of @wm4, I've added hex decoding code to bstr.c
and made stream_memory.c use it. As a result of this, stream_memory.c's
hex:// protocol now silently ignores invalid characters. (But this
change is backwards compatible)

Closes mpv-player#4586.
haasn added a commit to haasn/mp that referenced this issue Jul 11, 2017
Parsing the texture data as raw strings makes the textures the most
portable and self-contained.

As an aside, by request of @wm4, I've added hex decoding code to bstr.c
and made stream_memory.c use it. As a result of this, stream_memory.c's
hex:// protocol now silently ignores invalid characters. (But this
change is backwards compatible)

Closes mpv-player#4586.
haasn added a commit to haasn/mp that referenced this issue Jul 11, 2017
Parsing the texture data as raw strings makes the textures the most
portable and self-contained.

As an aside, by request of @wm4, I've added hex decoding code to bstr.c
and made stream_memory.c use it. As a result of this, stream_memory.c's
hex:// protocol now silently ignores invalid characters. (But this
change is backwards compatible)

Closes mpv-player#4586.
haasn added a commit to haasn/mp that referenced this issue Jul 11, 2017
Parsing the texture data as raw strings makes the textures the most
portable and self-contained.

As an aside, by request of @wm4, I've added hex decoding code to bstr.c
and made stream_memory.c use it. As a result of this, stream_memory.c's
hex:// protocol now silently ignores invalid characters. (But this
change is backwards compatible)

Closes mpv-player#4586.
haasn added a commit to haasn/mp that referenced this issue Jul 12, 2017
Parsing the texture data as raw strings makes the textures the most
portable and self-contained.

As an aside, by request of @wm4, I've added hex decoding code to bstr.c
and made stream_memory.c use it. As a result of this, stream_memory.c's
hex:// protocol now silently ignores invalid characters. (But this
change is backwards compatible)

Closes mpv-player#4586.
haasn added a commit to haasn/mp that referenced this issue Jul 12, 2017
Parsing the texture data as raw strings makes the textures the most
portable and self-contained.

As an aside, by request of @wm4, I've added hex decoding code to bstr.c
and made stream_memory.c use it. As a result of this, stream_memory.c's
hex:// protocol now silently ignores invalid characters. (But this
change is backwards compatible)

Closes mpv-player#4586.
haasn added a commit to haasn/mp that referenced this issue Jul 12, 2017
Parsing the texture data as raw strings makes the textures the most
portable and self-contained. In order to facilitate different types of
shaders, the parse_user_shader interaction has been changed to instead
have it loop through blocks and call the passed functions for each valid
block parsed. This is more modular and also cleaner, with better code
separation.

Closes mpv-player#4586.
haasn added a commit to haasn/mp that referenced this issue Jul 13, 2017
Parsing the texture data as raw strings makes the textures the most
portable and self-contained. In order to facilitate different types of
shaders, the parse_user_shader interaction has been changed to instead
have it loop through blocks and call the passed functions for each valid
block parsed. This is more modular and also cleaner, with better code
separation.

Closes mpv-player#4586.
@haasn haasn closed this as completed in 345bb19 Jul 27, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants