path: root/src/video_core/renderer_vulkan/vk_compute_pass.cpp
Commit message | Author | Age | Files | Lines
* astc_decoder: Reduce workgroup size | ameerj | 2021-08-01 | 1 | -2/+2
  This reduces the amount of over-dispatching when there are odd dimensions (e.g. ASTC 8x5), which rarely divide evenly into 32x32.
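  The over-dispatch arithmetic behind this change can be sketched as follows. This is only an illustration: it assumes one shader invocation per ASTC block and square workgroups, and the helper names and the smaller 8x8 group size are hypothetical, since the commit message does not state the new size.

```cpp
#include <cstdint>

// Ceiling division: number of workgroups needed to cover `count` items.
constexpr std::uint32_t DivCeil(std::uint32_t count, std::uint32_t group) {
    return (count + group - 1) / group;
}

// Invocations launched beyond the number of ASTC blocks actually present,
// assuming (hypothetically) one invocation per block and square workgroups.
constexpr std::uint32_t WastedInvocations(std::uint32_t blocks_x, std::uint32_t blocks_y,
                                          std::uint32_t group_size) {
    const std::uint32_t launched_x = DivCeil(blocks_x, group_size) * group_size;
    const std::uint32_t launched_y = DivCeil(blocks_y, group_size) * group_size;
    return launched_x * launched_y - blocks_x * blocks_y;
}
```

  For a 100x100 texture in ASTC 8x5 there are 13x20 blocks. Under these assumptions a 32x32 workgroup launches 1024 invocations for 260 blocks, while an 8x8 workgroup launches 384.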
* astc_decoder: Compute offset swizzles in-shader | ameerj | 2021-08-01 | 1 | -62/+5
  Alleviates the dependency on the swizzle table and a uniform which is constant for all ASTC texture sizes.
* astc_decoder: Optimize the use EncodingData | ameerj | 2021-08-01 | 1 | -33/+9
  This buffer was a list of EncodingData structures sorted by their bit length, with some duplication from the CPU decoder implementation. We can take advantage of its sorted property to optimize its usage in the shader. Thanks to wwylele for the optimization idea.
* vk_compute_pass: Remove unused captures | Lioncash | 2021-07-27 | 1 | -3/+2
  Resolves two compiler warnings.
* vk_compute_pass: Fix pipeline barrier for indexed quads | ReinUsesLisp | 2021-07-26 | 1 | -1/+1
  Use an index buffer barrier instead of a vertex input read barrier.
* vk_compute_pass: Fix -Wshadow warning | ReinUsesLisp | 2021-07-23 | 1 | -3/+3
* vulkan: Defer descriptor set work to the Vulkan thread | ReinUsesLisp | 2021-07-23 | 1 | -24/+21
  Move descriptor lookup and update code to a separate thread. Delaying this removes work from the main GPU thread and allows creating descriptor layouts on another thread. This slightly reduces the workload of the main thread when new pipelines are encountered.
* vulkan: Rework descriptor allocation algorithm | ReinUsesLisp | 2021-07-23 | 1 | -96/+104
  Create multiple descriptor pools on demand. There is some freedom in what is considered a compatible pool, to avoid wasting large pools on small descriptors.
* nsight_aftermath_tracker: Report used shaders to Nsight Aftermath | ReinUsesLisp | 2021-07-23 | 1 | -0/+1
* vk_compute_pass: Fix compute passes | ReinUsesLisp | 2021-07-23 | 1 | -22/+17
* vulkan: Create pipeline layouts in separate threads | ReinUsesLisp | 2021-07-23 | 1 | -1/+1
* spirv: Add lower fp16 to fp32 pass | ReinUsesLisp | 2021-07-23 | 1 | -0/+3
* vk_compute_pass: Fix pipeline barriers on non-initialized ASTC images | ReinUsesLisp | 2021-07-18 | 1 | -2/+3
* vk_compute_pass: Fix ASTC buffer setup synchronization | ReinUsesLisp | 2021-07-18 | 1 | -14/+14
* astc_decoder.comp: Remove unnecessary LUT SSBOs | ameerj | 2021-06-19 | 1 | -64/+10
  We can instead make them compile-time constants within the shader.
* astc: Various robustness enhancements for the gpu decoder | ameerj | 2021-06-19 | 1 | -32/+7
  These changes should help reduce crashes and driver panics that may occur due to synchronization issues between the shader's completion and later accesses to the decoded texture.
* astc_decoder: Refactor for style and more efficient memory use | ameerj | 2021-03-25 | 1 | -79/+96
* astc_decoder: Reimplement Layers | Rodrigo Locatti | 2021-03-13 | 1 | -70/+88
  Reimplements the approach to decoding layers in the compute shader. Fixes multilayer ASTC decoding when using Vulkan.
* renderer_vulkan: Accelerate ASTC decoding | ameerj | 2021-03-13 | 1 | -0/+298
  Co-Authored-By: Rodrigo Locatti <reinuseslisp@airmail.cc>
* vk_staging_buffer_pool: Add stream buffer for small uploads | ReinUsesLisp | 2021-02-13 | 1 | -35/+26
  This uses a ring buffer similar to OpenGL's stream buffer for small uploads. This stops us from allocating several small buffers, reducing memory fragmentation and improving cache locality. It uses dedicated allocations when possible.
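  A ring-style stream allocator of the kind described above can be sketched as follows. This is a minimal illustration, not the code from the commit: the class and method names are hypothetical, and it leaves out the GPU synchronization a real stream buffer needs before reusing a region.

```cpp
#include <cstddef>
#include <optional>

// Hypothetical sketch of a ring allocator handing out regions of one shared buffer.
class StreamRing {
public:
    explicit StreamRing(std::size_t capacity) : capacity_{capacity} {}

    // Returns the offset of a region of `size` bytes, wrapping back to the start
    // when the remaining tail is too small. Returns std::nullopt when the request
    // can never fit (a caller would then fall back to a dedicated buffer).
    std::optional<std::size_t> Allocate(std::size_t size) {
        if (size > capacity_) {
            return std::nullopt;
        }
        if (cursor_ + size > capacity_) {
            cursor_ = 0;  // wrap around; a real implementation must wait for the GPU here
        }
        const std::size_t offset = cursor_;
        cursor_ += size;
        return offset;
    }

private:
    std::size_t capacity_;
    std::size_t cursor_ = 0;
};
```

  Because every small upload lands in the same allocation, consecutive uploads stay adjacent in memory instead of being scattered across many small buffers.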
* video_core: Reimplement the buffer cache | ReinUsesLisp | 2021-02-13 | 1 | -85/+12
  Reimplement the buffer cache using cached bindings and page level granularity for modification tracking. This also drops the usage of shared pointers and virtual functions from the cache.

  - Bindings are cached, allowing work to be skipped when the game changes few bits between draws.
  - OpenGL Assembly shaders no longer copy when a region has been modified from the GPU to emulate constant buffers; instead GL_EXT_memory_object is used to alias sub-buffers within the same allocation.
  - OpenGL Assembly shaders stream constant buffer data using glProgramBufferParametersIuivNV, from NV_parameter_buffer_object. In theory this should save one hash table resolve inside the driver compared to glBufferSubData.
  - A new OpenGL stream buffer is implemented based on fences for drivers that are not Nvidia's proprietary, due to their low performance on partial glBufferSubData calls synchronized with 3D rendering (that some games use a lot).
  - Most optimizations are shared between APIs now, allowing Vulkan to cache more bindings than before, skipping unnecessary work.

  This commit adds the necessary infrastructure to use Vulkan objects from OpenGL. Overall, it improves performance and fixes some bugs present in the old cache. There are still some edge cases hit by some games that harm performance on some vendors; these are planned to be fixed in later commits.
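  Page-level modification tracking, as mentioned in this commit message, can be sketched as follows. The 4 KiB page size, the class name and its methods are assumptions made for illustration; the commit message does not describe the actual data structure.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: one dirty flag per page of a buffer.
class PageTracker {
public:
    static constexpr std::size_t kPageBits = 12;  // assumed 4 KiB pages
    static constexpr std::size_t kPageSize = std::size_t{1} << kPageBits;

    explicit PageTracker(std::size_t size_bytes)
        : dirty_((size_bytes + kPageSize - 1) >> kPageBits, false) {}

    // Marks every page overlapping [offset, offset + size) as modified.
    // The range is assumed to lie within the tracked buffer.
    void MarkModified(std::size_t offset, std::size_t size) {
        if (size == 0) {
            return;
        }
        for (std::size_t page = offset >> kPageBits; page < EndPage(offset, size); ++page) {
            dirty_[page] = true;
        }
    }

    // Returns true when any page overlapping the range has been modified.
    bool IsModified(std::size_t offset, std::size_t size) const {
        if (size == 0) {
            return false;
        }
        for (std::size_t page = offset >> kPageBits; page < EndPage(offset, size); ++page) {
            if (dirty_[page]) {
                return true;
            }
        }
        return false;
    }

private:
    // One past the last page touched by the range.
    static std::size_t EndPage(std::size_t offset, std::size_t size) {
        return (offset + size + kPageSize - 1) >> kPageBits;
    }

    std::vector<bool> dirty_;
};
```

  Tracking at page granularity means a small write only invalidates the pages it touches rather than the whole buffer.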
* vulkan_memory_allocator: Add "download" memory usage hint | ReinUsesLisp | 2021-01-15 | 1 | -3/+3
  Allow users of the allocator to hint memory usage for downloads. This removes the non-descriptive boolean passed for "host visible" or not host visible memory commits, and uses an enum to hint device local, upload and download usages.
* vk_memory_manager: Improve memory manager and its API | ReinUsesLisp | 2021-01-15 | 1 | -16/+16
  Fix a bug where the memory allocator could leave gaps between commits. To fix this, the allocation algorithm was reworked, although it is still short in lines of code.

  Rework the allocation API to use self-contained movable objects instead of naively using a unique_ptr to do the job for us. Remove the VK prefix.
* renderer_vulkan: Move device abstraction to vulkan_common | ReinUsesLisp | 2021-01-04 | 1 | -1/+1
* renderer_vulkan: Rename VKDevice to Device | ReinUsesLisp | 2021-01-03 | 1 | -4/+4
  The "VK" prefix predates the "Vulkan" namespace. It was carried around the codebase for consistency. "VKDevice" is easily confused with "VkDevice" (they differ only in the case of one character). Rename all instances of it.
* vulkan_common: Rename renderer_vulkan/wrapper.h to vulkan_common/vulkan_wrapper.h | ReinUsesLisp | 2020-12-31 | 1 | -1/+1
  Allows sharing Vulkan wrapper code between different rendering backends.
* video_core: Rewrite the texture cache | ReinUsesLisp | 2020-12-30 | 1 | -314/+13
  The current texture cache has several points that hurt maintainability and performance. It's easy to break unrelated parts of the cache when doing minor changes. The cache can easily forget valuable information about the cached textures by CPU writes or simply by its normal usage.

  This commit aims to address those issues.
* video_core: Resolve more variable shadowing scenarios pt.2 | Lioncash | 2020-12-05 | 1 | -20/+20
  Migrates the video core code closer to enabling variable shadowing warnings as errors. This primarily sorts out shadowing occurrences within the Vulkan code.
* renderer_vulkan: Make unconditional use of VK_KHR_timeline_semaphore | ReinUsesLisp | 2020-09-19 | 1 | -10/+13
  This reworks how host<->device synchronization works on the Vulkan backend. Instead of "protecting" resources with a fence and signalling these as free when the fence is known to be signalled by the host GPU, use timeline semaphores.

  Vulkan timeline semaphores allow us to work with a subset of D3D12 fences. As far as we are concerned, timeline semaphores are a value set by the host or the device that can be waited on by either of them.

  Taking advantage of this, we can have a monotonically increasing atomic value for each submission to the graphics queue. Instead of protecting resources with a fence, we simply store the current logical tick (the atomic value stored in CPU memory). When we want to know if a resource is free, it can be compared to the current GPU tick.

  This greatly simplifies resource management code and the free status of resources should have fewer false negatives.

  To work around bugs in validation layers, when these are attached there's a thread waiting for timeline semaphores.
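  The tick comparison described above can be sketched as follows. This is a simplified illustration under stated assumptions: the class and method names are hypothetical, and the GPU tick is fed in by the caller, where a real backend would read it from the timeline semaphore (for example with vkGetSemaphoreCounterValue).

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical sketch: resources remember the logical tick they were last used in
// and are free once the GPU has reached that tick.
class TickTracker {
public:
    // Tick that work recorded right now will be submitted with.
    std::uint64_t CurrentTick() const {
        return current_tick_.load(std::memory_order_relaxed);
    }

    // Called once per queue submission; returns the tick that submission signals.
    std::uint64_t NextTick() {
        return current_tick_.fetch_add(1, std::memory_order_relaxed);
    }

    // Called with the latest value observed on the timeline semaphore.
    void OnGpuProgress(std::uint64_t gpu_tick) {
        gpu_tick_.store(gpu_tick, std::memory_order_release);
    }

    // A resource is free when the GPU has finished the tick it was used in.
    bool IsFree(std::uint64_t resource_tick) const {
        return gpu_tick_.load(std::memory_order_acquire) >= resource_tick;
    }

private:
    std::atomic<std::uint64_t> current_tick_{1};
    std::atomic<std::uint64_t> gpu_tick_{0};
};
```

  A resource stores CurrentTick() when it is used, and IsFree() on that stored value replaces the per-resource fence check.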
* vk_compute_pass: Make use of designated initializers where applicable | Lioncash | 2020-07-16 | 1 | -95/+99
  Note: Some barriers can't be converted over yet, as they ICE MSVC.
* vulkan: Remove unnecessary includes | Lioncash | 2020-04-29 | 1 | -1/+1
  Reduces some header churn and reduces rebuilds when some header internals change. While we're at it we can also resolve a missing include in buffer_cache.
* vk_compute_pass: Implement indexed quads | ReinUsesLisp | 2020-04-17 | 1 | -8/+197
  Implement indexed quads (GL_QUADS used with glDrawElements*) with a compute pass conversion. The compute shader converts from uint8/uint16/uint32 indices to uint32. The format is passed through push constants to avoid having different variants of the same shader.

  - Used by Fast RMX
  - Used by Xenoblade Chronicles 2 (it still has graphical issues due to synchronization problems on Vulkan)
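  A CPU-side reference for this kind of conversion can be sketched as follows. It is an illustration only: the function name is hypothetical, and the 0-1-2 / 0-2-3 triangle split is an assumed convention rather than the ordering taken from the actual shader. A template parameter stands in for the index format that the shader receives through push constants.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: expand an index buffer of quads (4 indices each) into a
// uint32 triangle list (6 indices each), widening uint8/uint16 indices on the way.
template <typename Index>
std::vector<std::uint32_t> QuadsToTriangles(const std::vector<Index>& quad_indices) {
    // Assumed split of quad corners (0, 1, 2, 3) into triangles (0, 1, 2) and (0, 2, 3).
    static constexpr std::array<std::size_t, 6> corner_lut{0, 1, 2, 0, 2, 3};

    const std::size_t num_quads = quad_indices.size() / 4;  // trailing partial quad is dropped
    std::vector<std::uint32_t> triangles;
    triangles.reserve(num_quads * corner_lut.size());
    for (std::size_t quad = 0; quad < num_quads; ++quad) {
        for (const std::size_t corner : corner_lut) {
            triangles.push_back(static_cast<std::uint32_t>(quad_indices[quad * 4 + corner]));
        }
    }
    return triangles;
}
```

  Always emitting uint32 keeps the output format fixed regardless of the input index width, which is what lets one shader serve all three formats.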
* buffer_cache: Return handles instead of pointer to handles | ReinUsesLisp | 2020-04-16 | 1 | -8/+8
  The original idea behind returning pointers was that handles can be moved. The problem is that the implementation didn't take that into account and made everything harder to work with. This commit drops pointers to handles and returns the handles themselves. While it is still true that handles can be invalidated, this way we get an old handle instead of a dangling pointer.

  This problem can be solved in the future with sparse buffers.
* renderer_vulkan: Drop Vulkan-Hpp | ReinUsesLisp | 2020-04-11 | 1 | -76/+165
* vk_compute_pass: Address feedback | Rodrigo Locatti | 2020-01-11 | 1 | -0/+2
  Comment hardcoded SPIR-V modules.
* vk_compute_pass: Add compute passes to emulate missing Vulkan features | ReinUsesLisp | 2020-01-08 | 1 | -0/+337
  This currently only supports quad arrays and u8 indices. In the future we can replace the quad array pass with a table written from the CPU, but it was used to bootstrap the helpers for the other passes and was left in the code.

  The blob code is generated from the "shaders/" directory. Read the instructions there to learn how to generate the SPIR-V.