| Commit message (Collapse) | Author | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Please enter the commit message for your changes. Lines starting
|
|
|
|
Co-Authored-By: Mat M. <mathew1800@gmail.com>
|
|
|
|
|
|
|
|
|
|
Layered framebuffer attachments is a feature that allows applications to
write attach layered textures to a single attachment. What layer the
fragments are written to is decided from the shader using gl_Layer.
|
|
SPIR-V's Layer is GLSL's gl_Layer. It lets the application choose from a
shader stage (vertex, tessellation or geometry) which framebuffer layer
write the output fragments to.
|
|
|
|
Pass instanced state of a draw invocation as an argument instead of
having two separate virtual methods.
|
|
|
|
Vulkan's VertexIndex and InstanceIndex don't match with hardware. This
is because Nvidia implements gl_VertexID and gl_InstanceID. The math
that relates these is:
gl_VertexIndex = gl_BaseVertex + gl_VertexID
gl_InstanceIndex = gl_InstanceIndex + gl_InstanceID
To emulate it using what Vulkan's SPIR-V offers (the *Index variants)
this commit substracts gl_Base* from gl_*Index to obtain the OpenGL and
hardware's equivalent.
|
|
Removes bounds checking from "texceptions" instances.
|
|
Adds a Qt and SDL2 frontend for Vulkan. It also finishes the missing
bits on Vulkan initialization.
|
|
|
|
ATOM operates atomically on global memory. For now only add ATOM.ADD
since that's what was found in commercial games.
This asserts for ATOM.ADD.S32 (handling the others as unimplemented),
although ATOM.ADD.U32 shouldn't be any different.
This change forces us to change the default type on SPIR-V storage
buffers from float to uint. We could also alias the buffers, but it's
simpler for now to just use uint. While we are at it, abstract the code
to avoid repetition.
|
|
|
|
|
|
|
|
Some games like The Legend of Zelda: Breath of the Wild assign
render targets without writing them from the fragment shader. This
generates Vulkan validation errors, so silence these I previously
introduced a commit to set "vec4(0, 0, 0, 1)" for these attachments. The
problem is that this is not what games expect. This commit reverts that
change.
|
|
|
|
This abstraction takes care of presenting accelerated and
non-accelerated or "framebuffer" images to the Vulkan swapchain.
|
|
Also updates sirit to include atomic instructions.
|
|
Front face was being forced to a certain value when cull face is
disabled. Set a default value on initialization and drop the forcefully
set front facing value with culling disabled.
|
|
|
|
This abstraction is Vulkan's equivalent to OpenGL's rasterizer. It takes
care of joining all parts of the backend and rendering accordingly on
demand.
|
|
|
|
|
|
|
|
Co-Authored-By: MysticExile <30736337+MysticExile@users.noreply.github.com>
|
|
It currently ignores PBO linearizations since these should be dropped as
soon as possible on OpenGL.
|
|
Comment hardcoded SPIR-V modules.
|
|
Nvidia's driver defaults invalid enumerations to GL_CLAMP. Vulkan
doesn't expose GL_CLAMP through its API, but we can hack it on Nvidia's
driver using the internal driver defaults.
|
|
This currently only supports quad arrays and u8 indices.
In the future we can remove quad arrays with a table written from the
CPU, but this was used to bootstrap the other passes helpers and it
was left in the code.
The blob code is generated from the "shaders/" directory. Read the
instructions there to know how to generate the SPIR-V.
|
|
|
|
Given a pipeline key, this cache returns a pipeline abstraction (for
graphics or compute).
|
|
This abstractio represents the state of the 3D engine at a given draw.
Instead of changing individual bits of the pipeline how it's done in
APIs like D3D11, OpenGL and NVN; on Vulkan we are forced to put
everything together into a single, immutable object.
It takes advantage of the few dynamic states Vulkan offers.
|
|
This abstraction represents a Vulkan compute pipeline.
|
|
This function allows us to share code between compute and graphics
pipelines compilation.
|
|
|
|
|
|
The renderpass cache is used to avoid creating renderpasses on each
draw. The hashed structure is not currently optimized.
|
|
The update descriptor is used to store in flat memory a large chunk of
staging data used to update descriptor sets through templates. It
provides a push interface to easily insert descriptors following the
current pipeline. The order used in the descriptor update template has
to be implicitly followed. We can catch bugs here using validation
layers.
|
|
The stream buffer before this commit once it was full (no more bytes to
write before looping) waiting for all previous operations to finish.
This was a temporary solution and had a noticeable performance penalty
in performance (from what a profiler showed).
To avoid this mark with fences usages of the stream buffer and once it
loops wait for them to be signaled. On average this will never wait.
Each fence knows where its usage finishes, resulting in a non-paged
stream buffer.
On the other side, the buffer cache is reimplemented using the generic
buffer cache. It makes use of the staging buffer pool and the new
stream buffer.
|
|
* Allocate memory in discrete exponentially increasing chunks until the
128 MiB threshold. Allocations larger thant that increase linearly by
256 MiB (depending on the required size). This allows to use small
allocations for small resources.
* Move memory maps to a RAII abstraction. To optimize for debugging
tools (like RenderDoc) users will map/unmap on usage. If this ever
becomes a noticeable overhead (from my profiling it doesn't) we can
transparently move to persistent memory maps without harming the API,
getting optimal performance for both gameplay and debugging.
* Improve messages on exceptional situations.
* Fix typos "requeriments" -> "requirements".
* Small style changes.
|
|
This is intended for a follow up commit to avoid circular dependencies.
|
|
|
|
Co-Authored-By: Mat M. <mathew1800@gmail.com>
|
|
Create a large descriptor pool where we allocate all our descriptors
from. It has to be wide enough to support any pipeline, hence its large
numbers.
If the descritor pool is filled, we allocate more memory at that moment.
This way we can take advantage of permissive drivers like Nvidia's that
allocate more descriptors than what the spec requires.
|
|
This commit introduces a mechanism by which shader IR code can be
amended and extended. This useful for track algorithms where certain
information can derived from before the track such as indexes to array
samplers.
|
|
|
|
|
|
The job of this abstraction is to provide staging buffers for temporary
operations. Think of image uploads or buffer uploads to device memory.
It automatically deletes unused buffers.
|
|
This object's job is to contain an image and manage its transitions.
Since Nvidia hardware doesn't know what a transition is but Vulkan
requires them anyway, we have to state track image subresources
individually.
To avoid the overhead of tracking each subresource in images with many
subresources (think of cubemap arrays with several mipmaps), this commit
tracks when subresources have diverged. As long as this doesn't happen
we can check the state of the first subresource (that will be shared
with all subresources) and update accordingly.
Image transitions are deferred to the scheduler command buffer.
|
|
Marks as noexcept Hash, operator== and operator!= for consistency.
|
|
The intention behind this hasheable structure is to describe the state
of fixed function pipeline state that gets compiled to a single graphics
pipeline state object. This is all dynamic state in OpenGL but Vulkan
wants it in an immutable state, even if hardware can edit it freely.
In this commit the structure is defined in an optimized state (it uses
booleans, has paddings and many data entries that can be packed to
single integers). This is intentional as an initial implementation that
is easier to debug, implement and review. It will be optimized in later
stages, or it might change if Vulkan gets more dynamic states.
|
|
ExprCondCode visit implements the generic Visit. Use this instead of
that one.
As an intended side effect this fixes unwritten memory usages in cases
when a negation of a condition code is used.
|
|
|
|
Notify the programmer when a request to release a fence is invalid
because the fence is already free.
|
|
This allows us to put VKFenceWatch inside a std::vector without storing
it in heap. On move we have to signal the fences where the new protected
resource is, adding some overhead.
|
|
VK_NV_device_diagnostic_checkpoints allows us to push data to a Vulkan
queue and then query it even after a device loss. This allows us to push
the current pipeline object and see what was the call that killed the
device.
|
|
When full decompilation was enabled, labels were not being inserted and
instructions were misused. Fix these bugs.
|
|
Avoid changing gl_Position when the NDC used by the game is [0, 1]
(Vulkan's native).
|
|
Some games write from fragment shaders to an unexistant framebuffer
attachment or they don't write to one when it exists in the framebuffer.
Fix this by skipping writes or adding zeroes.
|
|
|
|
|
|
These shaders are used to specify code that is not dynamically generated
in the Vulkan backend. Instead of packing it inside the build system,
it's manually built and copied to the C++ file to avoid adding
unnecessary build time dependencies.
quad_array should be dropped in the future since it can be emulated with
a memory pool generated from the CPU.
|
|
A1B5G5R5 uses A1R5G5B5. This is flipped with image view swizzles;
flushing is still not properly implemented on Vulkan for this particular
format.
|
|
|
|
Add an extra argument to query device capabilities in the future. The
intention behind this is to use native quads, quad strips, line loops
and polygons if these are released for Vulkan.
|
|
The OpenGL spec defines GL_CLAMP's formula similarly to CLAMP_TO_EDGE
and CLAMP_TO_BORDER depending on the filter mode used. It doesn't
exactly behave like this, but it's the closest we can get with what
Vulkan offers without emulating it by injecting shader code.
|
|
|
|
Introduce a worker thread approach for delegating Vulkan work derived
from dxvk's approach. https://github.com/doitsujin/dxvk
Now that the scheduler is what handles all Vulkan work related to
command streaming, store state tracking in itself. This way we can know
when to reupload Vulkan dynamic state to the queue (since this one is
invalidated between command buffers unlike NVN). We can also store the
renderpass state and graphics pipeline bound to avoid redundant binds
and renderpass begins/ends.
|
|
Implement using memoryBarrier in GLSL and OpMemoryBarrier on SPIR-V.
|
|
|
|
|
|
|
|
Update Sirit and its usage in vk_shader_decompiler. Highlights:
- Implement tessellation shaders
- Implement geometry shaders
- Implement some missing features
- Use native half float instructions when available.
|
|
- Setup more features and requirements.
- Improve logging for missing features.
- Collect telemetry parameters.
- Add queries for more image formats.
- Query push constants limits.
- Optionally enable some extensions.
|
|
|
|
We don't know until the game is running if it's using an sRGB color
space or not. Add support for hot-swapping swapchain surface formats.
|
|
With all of the interfaces ready for migration, it's trivial to migrate
over GetPointer().
|
|
Amends a few interfaces to be able to handle the migration over to the
new Memory class by passing the class by reference as a function
parameter where necessary.
Notably, within the filesystem services, this eliminates two ReadBlock()
calls by using the helper functions of HLERequestContext to do that for
us.
|
|
|
|
Abstracted ComponentType was not being used in a meaningful way.
This commit drops its usage.
There is one place where it was being used to test compatibility between
two cached surfaces, but this one is implied in the pixel format.
Removing the component type test doesn't change the behaviour.
|
|
|
|
|
|
|
|
|
|
|
|
These parameters aren't actually modified in any way, so they can be
made const references.
|
|
This would previously result in NeverExecute and UnusedIndex being
treated as regular predicates.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
In the process remove implementation of SUATOM.MIN and SUATOM.MAX as
these require a distinction between U32 and S32. These have to be
implemented with imageCompSwap loop.
|
|
* Implement SULD as float.
* Remove conditional declaration of GL_ARB_shader_viewport_layer_array.
|
|
|
|
* Increase minimum Vulkan requirements
* Require VK_EXT_vertex_attribute_divisor
* Require depthClamp, samplerAnisotropy and largePoints features
* Search and expose VK_KHR_uniform_buffer_standard_layout
* Search and expose VK_EXT_index_type_uint8
* Search and expose native float16 arithmetics
* Track current driver with VK_KHR_driver_properties
* Query and expose SSBO alignment
* Query more image formats
* Improve logging overall
* Minor style changes
* Minor rephrasing of commentaries
|
|
|
|
Implement VOTE using Nvidia's intrinsics. Documentation about these can
be found here
https://developer.nvidia.com/reading-between-threads-shader-intrinsics
Instead of using portable ARB instructions I opted to use Nvidia
intrinsics because these are the closest we have to how Tegra X1
hardware renders.
To stub VOTE on non-Nvidia drivers (including nouveau) this commit
simulates a GPU with a warp size of one, returning what is meaningful
for the instruction being emulated:
* anyThreadNV(value) -> value
* allThreadsNV(value) -> value
* allThreadsEqualNV(value) -> true
ballotARB, also known as "uint64_t(activeThreadsNV())", emits
VOTE.ANY Rd, PT, PT;
on nouveau's compiler. This doesn't match exactly to Nvidia's code
VOTE.ALL Rd, PT, PT;
Which is emulated with activeThreadsNV() by this commit. In theory this
shouldn't really matter since .ANY, .ALL and .EQ affect the predicates
(set to PT on those cases) and not the registers.
|
|
This commit takes care of implementing the F16 Variants of the
conversion instructions and makes sure conversions are done.
|
|
|
|
|
|
This commit implements gl_ViewportIndex and gl_Layer in vertex and
geometry shaders. In the case it's used in a vertex shader, it requires
ARB_shader_viewport_layer_array. This extension is available on AMD and
Nvidia devices (mesa and proprietary drivers), but not available on
Intel on any platform. At the moment of writing this description I don't
know if this is a hardware limitation or a driver limitation.
In the case that ARB_shader_viewport_layer_array is not available,
writes to these registers on a vertex shader are ignored, with the
appropriate logging.
|
|
These are no longer used within this header, so they can be removed.
|
|
|
|
Instead of passing by copy an execution context through out the whole
Vulkan call hierarchy, use a command buffer view and fence view
approach.
This internally dereferences the command buffer or fence forcing the
user to be unable to use an outdated version of it on normal usage.
It is still possible to keep store an outdated if it is casted to
VKFence& or vk::CommandBuffer.
While changing this file, add an extra parameter for Flush and Finish to
allow releasing the fence from this calls.
|
|
|
|
Hardware testing revealed that SSY and PBK push to a different stack,
allowing code like this:
SSY label1;
PBK label2;
SYNC;
label1: PBK;
label2: EXIT;
|
|
Instead of having a vector of unique_ptr stored in a vector and
returning star pointers to this, use shared_ptr. While changing
initialization code, move it to a separate file when possible.
This is a first step to allow code analysis and node generation beyond
the ShaderIR class.
|
|
|
|
|
|
Fix missing OpSelectionMerge instruction. This caused devices loses on
most hardware, Intel didn't care.
Fix [-1;1] -> [0;1] depth conversions.
Conditionally use VK_EXT_scalar_block_layout. This allows us to use
non-std140 layouts on UBOs.
Update external Vulkan headers.
|
|
Keeps track of native ASTC support, VK_EXT_scalar_block_layout
availability and SSBO range.
Check for independentBlend and vertexPipelineStorageAndAtomics as a
required feature. Always enable it.
Use vk::to_string format to log Vulkan enums.
Style changes.
|
|
|
|
|
|
|
|
This PR should heavily reduce memory usage since temporal buffers are no
longer stored per Surface but instead managed by the Rasterizer Cache.
|
|
flushing is now responsability of children caches instead of the cache
object. This change will allow the specific cache to pass extra
parameters on flushing and will allow more flexibility.
|
|
|
|
|
|
Operations done before the main half float operation (like HAdd) were
managing a packed value instead of the unpacked one. Adding an unpacked
operation allows us to drop the per-operand MetaHalfArithmetic entry,
simplifying the code overall.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Replaces header inclusions with forward declarations where applicable
and also removes unused headers within the cpp file. This reduces a few
more dependencies on core/memory.h
|
|
|
|
|
|
Specifies the members in the same order that initialization would take
place in.
This also silences -Wreorder warnings.
|
|
Ensures that the signatures will always match with the base class.
Also silences a few compilation warnings.
|
|
|
|
|
|
Co-Authored-By: ReinUsesLisp <reinuseslisp@airmail.cc>
|
|
|
|
Removes a few unnecessary dependencies on core-related machinery, such
as the core.h and memory.h, which reduces the amount of rebuilding
necessary if those files change.
This also uncovered some indirect dependencies within other source
files. This also fixes those.
|
|
|
|
|
|
This buffer cache is just like OpenGL's buffer cache with some minor
style changes. It uses VKStreamBuffer.
|
|
Reorders members in the order that they would actually be initialized
in. Silences a -Wreorder warning.
|
|
|
|
This manages two kinds of streaming buffers: one for unified memory
models and one for dedicated GPUs. The first one skips the copy from the
staging buffer to the real buffer, since it creates an unified buffer.
This implementation waits for all fences to finish their operation
before "invalidating". This is suboptimal since it should allocate
another buffer or start searching from the beginning. There is room for
improvement here.
This could also handle AMD's "pinned" memory (a heap with 256 MiB) that
seems to be designed for buffer streaming.
|
|
|
|
VKMemoryCommitImpl was using as the end of its interval "begin + end".
That ended up wasting memory.
|
|
The scheduler abstracts command buffer and fence management with an
interface that's able to do OpenGL-like operations on Vulkan command
buffers.
It returns by value a command buffer and fence that have to be used for
subsequent operations until Flush or Finish is executed, after that the
current execution context (the pair of command buffers and fences) gets
invalidated a new one must be fetched. Thankfully validation layers will
quickly detect if this is skipped throwing an error due to modifications
to a sent command buffer.
|
|
A memory manager object handles the memory allocations for a device. It
allocates chunks of Vulkan memory objects and then suballocates.
|
|
|
|
Handles a pool of resources protected by fences. Manages resource
overflow allocating more resources.
This class is intended to be used through inheritance.
|
|
CommitFence iterates a pool of fences until one is found. If all fences
are being used at the same time, allocate more.
|
|
A fence watch is used to keep track of the usage of a fence and protect
a resource or set of resources without having to inherit from their
handlers.
|
|
Fences take ownership of objects, protecting them from GPU-side or
driver-side concurrent access. They must be commited from the resource
manager. Their usage flow is: commit the fence from the resource
manager, protect resources with it and use them, send the fence to an
execution queue and Wait for it if needed and then call Release. Used
resources will automatically be signaled when they are free to be
reused.
|
|
VKResource is an interface that gets signaled by a fence when it is free
to be reused.
|
|
VKDevice contains all the data required to manage and initialize a
physical device. Its intention is to be passed across Vulkan objects to
query device-specific data (for example the logical device and the
dispatch loader).
|
|
This file is intended to be included instead of vulkan/vulkan.hpp. It
includes declarations of unique handlers using a dynamic dispatcher
instead of a static one (which would require linking to a Vulkan
library).
|