| Commit message (Collapse) | Author | Files | Lines |
|
|
|
|
|
* Increase minimum Vulkan requirements
* Require VK_EXT_vertex_attribute_divisor
* Require depthClamp, samplerAnisotropy and largePoints features
* Search and expose VK_KHR_uniform_buffer_standard_layout
* Search and expose VK_EXT_index_type_uint8
* Search and expose native float16 arithmetics
* Track current driver with VK_KHR_driver_properties
* Query and expose SSBO alignment
* Query more image formats
* Improve logging overall
* Minor style changes
* Minor rephrasing of commentaries
|
|
|
|
Implement VOTE using Nvidia's intrinsics. Documentation about these can
be found here
https://developer.nvidia.com/reading-between-threads-shader-intrinsics
Instead of using portable ARB instructions I opted to use Nvidia
intrinsics because these are the closest we have to how Tegra X1
hardware renders.
To stub VOTE on non-Nvidia drivers (including nouveau) this commit
simulates a GPU with a warp size of one, returning what is meaningful
for the instruction being emulated:
* anyThreadNV(value) -> value
* allThreadsNV(value) -> value
* allThreadsEqualNV(value) -> true
ballotARB, also known as "uint64_t(activeThreadsNV())", emits
VOTE.ANY Rd, PT, PT;
on nouveau's compiler. This doesn't match exactly to Nvidia's code
VOTE.ALL Rd, PT, PT;
Which is emulated with activeThreadsNV() by this commit. In theory this
shouldn't really matter since .ANY, .ALL and .EQ affect the predicates
(set to PT on those cases) and not the registers.
|
|
This commit takes care of implementing the F16 Variants of the
conversion instructions and makes sure conversions are done.
|
|
|
|
|
|
This commit implements gl_ViewportIndex and gl_Layer in vertex and
geometry shaders. In the case it's used in a vertex shader, it requires
ARB_shader_viewport_layer_array. This extension is available on AMD and
Nvidia devices (mesa and proprietary drivers), but not available on
Intel on any platform. At the moment of writing this description I don't
know if this is a hardware limitation or a driver limitation.
In the case that ARB_shader_viewport_layer_array is not available,
writes to these registers on a vertex shader are ignored, with the
appropriate logging.
|
|
These are no longer used within this header, so they can be removed.
|
|
|
|
Instead of passing by copy an execution context through out the whole
Vulkan call hierarchy, use a command buffer view and fence view
approach.
This internally dereferences the command buffer or fence forcing the
user to be unable to use an outdated version of it on normal usage.
It is still possible to keep store an outdated if it is casted to
VKFence& or vk::CommandBuffer.
While changing this file, add an extra parameter for Flush and Finish to
allow releasing the fence from this calls.
|
|
|
|
Hardware testing revealed that SSY and PBK push to a different stack,
allowing code like this:
SSY label1;
PBK label2;
SYNC;
label1: PBK;
label2: EXIT;
|
|
Instead of having a vector of unique_ptr stored in a vector and
returning star pointers to this, use shared_ptr. While changing
initialization code, move it to a separate file when possible.
This is a first step to allow code analysis and node generation beyond
the ShaderIR class.
|
|
|
|
|
|
Fix missing OpSelectionMerge instruction. This caused devices loses on
most hardware, Intel didn't care.
Fix [-1;1] -> [0;1] depth conversions.
Conditionally use VK_EXT_scalar_block_layout. This allows us to use
non-std140 layouts on UBOs.
Update external Vulkan headers.
|
|
Keeps track of native ASTC support, VK_EXT_scalar_block_layout
availability and SSBO range.
Check for independentBlend and vertexPipelineStorageAndAtomics as a
required feature. Always enable it.
Use vk::to_string format to log Vulkan enums.
Style changes.
|
|
|
|
|
|
|
|
This PR should heavily reduce memory usage since temporal buffers are no
longer stored per Surface but instead managed by the Rasterizer Cache.
|
|
flushing is now responsability of children caches instead of the cache
object. This change will allow the specific cache to pass extra
parameters on flushing and will allow more flexibility.
|
|
|
|
|
|
Operations done before the main half float operation (like HAdd) were
managing a packed value instead of the unpacked one. Adding an unpacked
operation allows us to drop the per-operand MetaHalfArithmetic entry,
simplifying the code overall.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Replaces header inclusions with forward declarations where applicable
and also removes unused headers within the cpp file. This reduces a few
more dependencies on core/memory.h
|
|
|
|
|
|
Specifies the members in the same order that initialization would take
place in.
This also silences -Wreorder warnings.
|
|
Ensures that the signatures will always match with the base class.
Also silences a few compilation warnings.
|
|
|
|
|
|
Co-Authored-By: ReinUsesLisp <reinuseslisp@airmail.cc>
|
|
|
|
Removes a few unnecessary dependencies on core-related machinery, such
as the core.h and memory.h, which reduces the amount of rebuilding
necessary if those files change.
This also uncovered some indirect dependencies within other source
files. This also fixes those.
|
|
|
|
|
|
This buffer cache is just like OpenGL's buffer cache with some minor
style changes. It uses VKStreamBuffer.
|
|
Reorders members in the order that they would actually be initialized
in. Silences a -Wreorder warning.
|
|
|
|
This manages two kinds of streaming buffers: one for unified memory
models and one for dedicated GPUs. The first one skips the copy from the
staging buffer to the real buffer, since it creates an unified buffer.
This implementation waits for all fences to finish their operation
before "invalidating". This is suboptimal since it should allocate
another buffer or start searching from the beginning. There is room for
improvement here.
This could also handle AMD's "pinned" memory (a heap with 256 MiB) that
seems to be designed for buffer streaming.
|
|
|
|
VKMemoryCommitImpl was using as the end of its interval "begin + end".
That ended up wasting memory.
|
|
The scheduler abstracts command buffer and fence management with an
interface that's able to do OpenGL-like operations on Vulkan command
buffers.
It returns by value a command buffer and fence that have to be used for
subsequent operations until Flush or Finish is executed, after that the
current execution context (the pair of command buffers and fences) gets
invalidated a new one must be fetched. Thankfully validation layers will
quickly detect if this is skipped throwing an error due to modifications
to a sent command buffer.
|
|
A memory manager object handles the memory allocations for a device. It
allocates chunks of Vulkan memory objects and then suballocates.
|
|
|
|
Handles a pool of resources protected by fences. Manages resource
overflow allocating more resources.
This class is intended to be used through inheritance.
|
|
CommitFence iterates a pool of fences until one is found. If all fences
are being used at the same time, allocate more.
|
|
A fence watch is used to keep track of the usage of a fence and protect
a resource or set of resources without having to inherit from their
handlers.
|
|
Fences take ownership of objects, protecting them from GPU-side or
driver-side concurrent access. They must be commited from the resource
manager. Their usage flow is: commit the fence from the resource
manager, protect resources with it and use them, send the fence to an
execution queue and Wait for it if needed and then call Release. Used
resources will automatically be signaled when they are free to be
reused.
|
|
VKResource is an interface that gets signaled by a fence when it is free
to be reused.
|
|
VKDevice contains all the data required to manage and initialize a
physical device. Its intention is to be passed across Vulkan objects to
query device-specific data (for example the logical device and the
dispatch loader).
|
|
This file is intended to be included instead of vulkan/vulkan.hpp. It
includes declarations of unique handlers using a dynamic dispatcher
instead of a static one (which would require linking to a Vulkan
library).
|