Intermittent test failure on Nvidia + Vulkan #690

Open
b0nes164 opened this issue Sep 17, 2024 · 2 comments

@b0nes164

b0nes164 commented Sep 17, 2024

  • Running cargo nextest run results in intermittent failures in tests from the compare_gpu_cpu or snapshot suites when using Vulkan as the backend.
  • Changing the backend to DX12 seemed to fix the issue, but test time increased from an average of ~30s to ~60s.
  • All tests from WGPU passed.
  • Failures were observed on both Win10 (exit code: 0xc0000005, STATUS_ACCESS_VIOLATION) and Ubuntu 22.04 (signal: 11, SIGSEGV: invalid memory reference).
  • AdapterInfo on Win10:
AdapterInfo { name: "NVIDIA GeForce RTX 2080 SUPER", vendor: 4318, device: 7809, device_type: DiscreteGpu, driver: "NVIDIA", driver_info: "560.94", backend: Vulkan }
AdapterInfo { name: "NVIDIA GeForce RTX 2080 SUPER", vendor: 4318, device: 7809, device_type: DiscreteGpu, driver: "32.0.15.6094", driver_info: "", backend: Dx12 }
  • AdapterInfo on Ubuntu 22.04:
AdapterInfo { name: "NVIDIA GeForce RTX 2080 SUPER", vendor: 4318, device: 7809, device_type: DiscreteGpu, driver: "NVIDIA", driver_info: "555.42.02", backend: Vulkan }
  • Failures were not observed on Win10 with an Intel HD 620:
AdapterInfo { name: "Intel(R) HD Graphics 620", vendor: 32902, device: 22806, device_type: IntegratedGpu, driver: "Intel Corporation", driver_info: "Intel driver", backend: Vulkan }
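
For reference, the numeric vendor and device fields in the AdapterInfo dumps above are generally PCI IDs printed in decimal. A minimal sketch of decoding them, using only the standard library; vendor_name and describe are hypothetical helpers written for illustration and are not part of wgpu or Vello:

```rust
/// Map a PCI vendor ID to a vendor name.
/// Only a few well-known IDs are listed; anything else is "unknown".
fn vendor_name(vendor: u32) -> &'static str {
    match vendor {
        0x10DE => "NVIDIA",
        0x8086 => "Intel",
        0x1002 => "AMD",
        _ => "unknown",
    }
}

/// Format the decimal vendor/device pair from an AdapterInfo dump as hex.
fn describe(vendor: u32, device: u32) -> String {
    format!(
        "{} (vendor 0x{:04X}, device 0x{:04X})",
        vendor_name(vendor),
        vendor,
        device
    )
}

fn main() {
    // Values copied from the AdapterInfo lines above.
    println!("{}", describe(4318, 7809)); // NVIDIA (vendor 0x10DE, device 0x1E81)
    println!("{}", describe(32902, 22806)); // Intel (vendor 0x8086, device 0x5916)
}
```

This only confirms that the failing adapter is the NVIDIA one and the passing adapter is the Intel one; it says nothing about the cause of the crash.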

Edit:
Most likely it has something to do with these warnings:
WARNING: [Loader Message] Code 0 : loader_add_layer_properties: 'layers' tag not supported until file version 1.0.1, but /usr/share/vulkan/implicit_layer.d/nvidia_layers.json is reporting version 1
WARNING: [Loader Message] Code 0 : terminator_CreateInstance: Received return code -3 from call to vkCreateInstance in ICD /usr/lib/x86_64-linux-gnu/libvulkan_virtio.so. Skipping this driver.
which have been linked to issues in a variety of other projects.

See wgpu#5270

waywardmonkeys added this to the Vello 0.3 release milestone Sep 18, 2024
@DJMcNab
Member

DJMcNab commented Sep 18, 2024

Is there a corresponding failure outside of running the tests?

The fact that this applies on both Windows and Linux is concerning, because it suggests something is more fundamentally broken. Are you able to get a backtrace for either of the segfaults?

@b0nes164
Author

> Is there a corresponding failure outside of running the tests?

I haven't encountered one so far, but I've only run the Ghostscript tiger splash with cargo run -p with_winit. I suspect this is not enough to reproduce the bug encountered in the tests.

Here is a stack trace from Win10:

compare_gpu_cpu-001e8d9aac02291a.exe!ash::extensions_generated::ext::debug_utils::Device::set_debug_utils_object_name(ash::vk::definitions::DebugUtilsObjectNameInfoEXT * self) Line 15 (c:\Users\Tom\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ash-0.38.0+1.3.281\src\extensions\ext\debug_utils.rs:15)
compare_gpu_cpu-001e8d9aac02291a.exe!wgpu_hal::vulkan::DeviceShared::set_object_name<ash::vk::definitions::Buffer>(ash::vk::definitions::Buffer self, ref$<str$> object) Line 50 (c:\Users\Tom\.cargo\registry\src\index.crates.io-6f17d22bba15001f\wgpu-hal-22.0.0\src\vulkan\device.rs:50)
compare_gpu_cpu-001e8d9aac02291a.exe!wgpu_hal::vulkan::device::impl$4::create_buffer(wgpu_hal::vulkan::Device * self, wgpu_hal::BufferDescriptor * desc) Line 915 (c:\Users\Tom\.cargo\registry\src\index.crates.io-6f17d22bba15001f\wgpu-hal-22.0.0\src\vulkan\device.rs:915)
compare_gpu_cpu-001e8d9aac02291a.exe!wgpu_core::resource::StagingBuffer<wgpu_hal::vulkan::Api>::new<wgpu_hal::vulkan::Api>(core::num::nonzero::NonZero<u64> device) Line 864 (c:\Users\Tom\.cargo\registry\src\index.crates.io-6f17d22bba15001f\wgpu-core-22.1.0\src\resource.rs:864)
compare_gpu_cpu-001e8d9aac02291a.exe!wgpu_core::global::Global::queue_write_buffer<wgpu_hal::vulkan::Api>(wgpu_core::id::Id<enum2$<wgpu_core::id::markers::Queue>> self, wgpu_core::id::Id<enum2$<wgpu_core::id::markers::Buffer>> queue_id, unsigned __int64 buffer_id, ref$<slice2$<u8>> buffer_offset) Line 408 (c:\Users\Tom\.cargo\registry\src\index.crates.io-6f17d22bba15001f\wgpu-core-22.1.0\src\device\queue.rs:408)
compare_gpu_cpu-001e8d9aac02291a.exe!wgpu::backend::wgpu_core::impl$7::queue_write_buffer(wgpu::backend::wgpu_core::ContextWgpuCore * self, wgpu_core::id::Id<enum2$<wgpu_core::id::markers::Queue>> * queue, wgpu::backend::wgpu_core::Queue * queue_data, wgpu_core::id::Id<enum2$<wgpu_core::id::markers::Buffer>> * buffer, wgpu::backend::wgpu_core::Buffer * _buffer_data, unsigned __int64 offset, ref$<slice2$<u8>>) Line 2178 (c:\Users\Tom\.cargo\registry\src\index.crates.io-6f17d22bba15001f\wgpu-22.1.0\src\backend\wgpu_core.rs:2178)
compare_gpu_cpu-001e8d9aac02291a.exe!wgpu::context::impl$5::queue_write_buffer<wgpu::backend::wgpu_core::ContextWgpuCore>(wgpu::backend::wgpu_core::ContextWgpuCore * self, wgpu::context::ObjectId * queue, ref$<dyn$<core::any::Any,core::marker::Send,core::marker::Sync>> buffer, wgpu::context::ObjectId * offset, ref$<dyn$<core::any::Any,core::marker::Send,core::marker::Sync>>) Line 2959 (c:\Users\Tom\.cargo\registry\src\index.crates.io-6f17d22bba15001f\wgpu-22.1.0\src\context.rs:2959)
compare_gpu_cpu-001e8d9aac02291a.exe!wgpu::Queue::write_buffer(wgpu::Buffer * self, unsigned __int64 buffer, ref$<slice2$<u8>> offset) Line 5407 (c:\Users\Tom\.cargo\registry\src\index.crates.io-6f17d22bba15001f\wgpu-22.1.0\src\lib.rs:5407)
compare_gpu_cpu-001e8d9aac02291a.exe!vello::wgpu_engine::BindMapBuffer::upload_if_needed(vello::recording::BufferProxy * self, wgpu::Device * proxy, wgpu::Queue * device, vello::wgpu_engine::ResourcePool * queue) Line 996 (c:\Users\Tom\Documents\GitHub\vello\vello\src\wgpu_engine.rs:996)
compare_gpu_cpu-001e8d9aac02291a.exe!vello::wgpu_engine::TransientBindMap::create_bind_group(vello::wgpu_engine::BindMap * self, vello::wgpu_engine::ResourcePool * bind_map, wgpu::Device * pool, wgpu::Queue * device, wgpu::CommandEncoder * queue, wgpu::BindGroupLayout * encoder, ref$<slice2$<enum2$<vello::recording::ResourceProxy>>> layout) Line 1083 (c:\Users\Tom\Documents\GitHub\vello\vello\src\wgpu_engine.rs:1083)
compare_gpu_cpu-001e8d9aac02291a.exe!vello::wgpu_engine::WgpuEngine::run_recording(wgpu::Device * self, wgpu::Queue * device, vello::recording::Recording * queue, ref$<slice2$<enum2$<vello::wgpu_engine::ExternalResource>>> recording, ref$<str$>) Line 542 (c:\Users\Tom\Documents\GitHub\vello\vello\src\wgpu_engine.rs:542)
compare_gpu_cpu-001e8d9aac02291a.exe!vello::Renderer::render_to_texture(wgpu::Device * self, wgpu::Queue * device, vello::scene::Scene * queue, wgpu::TextureView * scene, vello::RenderParams * texture) Line 396 (c:\Users\Tom\Documents\GitHub\vello\vello\src\lib.rs:396)
compare_gpu_cpu-001e8d9aac02291a.exe!vello_tests::get_scene_image::async_fn$0(core::pin::Pin<ref_mut$<enum2$<vello_tests::get_scene_image::async_fn_env$0>>>, core::task::wake::Context *) Line 119 (c:\Users\Tom\Documents\GitHub\vello\vello_tests\src\lib.rs:119)
compare_gpu_cpu-001e8d9aac02291a.exe!vello_tests::render_then_debug::async_fn$0(core::pin::Pin<ref_mut$<enum2$<vello_tests::render_then_debug::async_fn_env$0>>>, core::task::wake::Context *) Line 54 (c:\Users\Tom\Documents\GitHub\vello\vello_tests\src\lib.rs:54)
compare_gpu_cpu-001e8d9aac02291a.exe!vello_tests::compare::compare_gpu_cpu::async_fn$0(core::pin::Pin<ref_mut$<enum2$<vello_tests::compare::compare_gpu_cpu::async_fn_env$0>>>, core::task::wake::Context *) Line 99 (c:\Users\Tom\Documents\GitHub\vello\vello_tests\src\compare.rs:99)
compare_gpu_cpu-001e8d9aac02291a.exe!pollster::block_on<enum2$<vello_tests::compare::compare_gpu_cpu::async_fn_env$0>>(enum2$<vello_tests::compare::compare_gpu_cpu::async_fn_env$0> fut) Line 128 (c:\Users\Tom\.cargo\registry\src\index.crates.io-6f17d22bba15001f\pollster-0.3.0\src\lib.rs:128)
compare_gpu_cpu-001e8d9aac02291a.exe!vello_tests::compare::compare_gpu_cpu_sync(vello::scene::Scene scene, vello_tests::TestParams params) Line 90 (c:\Users\Tom\Documents\GitHub\vello\vello_tests\src\compare.rs:90)
compare_gpu_cpu-001e8d9aac02291a.exe!compare_gpu_cpu::compare_test_scene(scenes::ExampleScene test_scene, vello_tests::TestParams params) Line 19 (c:\Users\Tom\Documents\GitHub\vello\vello_tests\tests\compare_gpu_cpu.rs:19)
compare_gpu_cpu-001e8d9aac02291a.exe!compare_gpu_cpu::compare_stroke_styles_skew() Line 65 (c:\Users\Tom\Documents\GitHub\vello\vello_tests\tests\compare_gpu_cpu.rs:65)
compare_gpu_cpu-001e8d9aac02291a.exe!compare_gpu_cpu::compare_stroke_styles_skew::closure$0(compare_gpu_cpu::compare_stroke_styles_skew::closure_env$0 *) Line 62 (c:\Users\Tom\Documents\GitHub\vello\vello_tests\tests\compare_gpu_cpu.rs:62)

And here is a stack trace from Ubuntu 22.04:

ash::extensions::ext::debug_utils::<impl ash::extensions_generated::ext::debug_utils::Device>::set_debug_utils_object_name (/home/tom/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ash-0.38.0+1.3.281/src/extensions/ext/debug_utils.rs:15)
wgpu_hal::vulkan::device::<impl wgpu_hal::vulkan::DeviceShared>::set_object_name (/home/tom/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-hal-22.0.0/src/vulkan/device.rs:50)
wgpu_hal::vulkan::command::<impl wgpu_hal::CommandEncoder for wgpu_hal::vulkan::CommandEncoder>::begin_encoding (/home/tom/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-hal-22.0.0/src/vulkan/command.rs:72)
wgpu_core::command::CommandEncoder<A>::open (/home/tom/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-22.1.0/src/command/mod.rs:226)
wgpu_core::command::compute::<impl wgpu_core::global::Global>::compute_pass_end_impl (/home/tom/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-22.1.0/src/command/compute.rs:656)
wgpu_core::command::compute::<impl wgpu_core::global::Global>::compute_pass_end (/home/tom/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-22.1.0/src/command/compute.rs:368)
<wgpu_core::command::compute::ComputePass<A> as wgpu_core::command::dyn_compute_pass::DynComputePass>::end (/home/tom/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-22.1.0/src/command/dyn_compute_pass.rs:172)
<wgpu::backend::wgpu_core::ContextWgpuCore as wgpu::context::Context>::compute_pass_end (/home/tom/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-22.1.0/src/backend/wgpu_core.rs:2585)
<T as wgpu::context::DynContext>::compute_pass_end (/home/tom/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-22.1.0/src/context.rs:3284)
<wgpu::ComputePassInner as core::ops::drop::Drop>::drop (/home/tom/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-22.1.0/src/lib.rs:5094)
core::ptr::drop_in_place<wgpu::ComputePassInner> (@core::ptr::drop_in_place<wgpu::ComputePassInner>:8)
core::ptr::drop_in_place<wgpu::ComputePass> (@core::ptr::drop_in_place<wgpu::ComputePass>:6)
vello::wgpu_engine::WgpuEngine::run_recording (/home/tom/Desktop/vello/vello/src/wgpu_engine.rs:564)
vello::Renderer::render_to_texture (/home/tom/Desktop/vello/vello/src/lib.rs:396)
vello_tests::get_scene_image::{{closure}} (/home/tom/Desktop/vello/vello_tests/src/lib.rs:119)
vello_tests::render_then_debug::{{closure}} (/home/tom/Desktop/vello/vello_tests/src/lib.rs:54)
vello_tests::compare::compare_gpu_cpu::{{closure}} (/home/tom/Desktop/vello/vello_tests/src/compare.rs:97)
pollster::block_on (/home/tom/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pollster-0.3.0/src/lib.rs:128)
vello_tests::compare::compare_gpu_cpu_sync (/home/tom/Desktop/vello/vello_tests/src/compare.rs:90)
compare_gpu_cpu::compare_test_scene (/home/tom/Desktop/vello/vello_tests/tests/compare_gpu_cpu.rs:19)
compare_gpu_cpu::compare_blurred_rounded_rect (/home/tom/Desktop/vello/vello_tests/tests/compare_gpu_cpu.rs:99)
compare_gpu_cpu::compare_blurred_rounded_rect::{{closure}} (/home/tom/Desktop/vello/vello_tests/tests/compare_gpu_cpu.rs:95)

DJMcNab removed this from the Vello 0.3 release milestone Sep 20, 2024