| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
Fixes Ori and the Blind Forest's menu on GLASM. For some reason
(probably high level optimizations) it is not sanitized on SPIR-V for
OpenGL. Vulkan is unaffected by this change.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
Move code to separate files to be able to reuse it from OpenGL. This
greatly simplifies the pipeline cache logic on Vulkan.
Transform feedback state is not yet abstracted and it's still
intrusively stored inside vk_pipeline_cache. It will be moved when
needed on OpenGL.
|
| |
|
|
|
|
|
|
|
| |
Use its std::stop_token to abort shader cache loading.
Using std::stop_token instead of std::atomic_bool allows the usage of
other utilities like std::stop_callback.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Setting __GL_SHADER_DISK_CACHE_PATH we can force the cache directory to
be in yuzu's user directory to stop commonly distributed malware from
deleting our driver shader cache. And by setting
__GL_SHADER_DISK_CACHE_SKIP_CLEANUP we can have an unbounded shader
cache size.
This has only been implemented on Windows, mostly because previous tests
didn't seem to work on Linux.
Disable the precompiled cache on Nvidia's driver. There's no need to
hide information the driver already has in its own cache.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current texture cache has several points that hurt maintainability
and performance. It's easy to break unrelated parts of the cache
when doing minor changes. The cache can easily forget valuable
information about the cached textures by CPU writes or simply by its
normal usage.The current texture cache has several points that hurt
maintainability and performance. It's easy to break unrelated parts
of the cache when doing minor changes. The cache can easily forget
valuable information about the cached textures by CPU writes or simply
by its normal usage.
This commit aims to address those issues.
|
|
|
|
|
|
| |
With C++20, we can use the more concise contains() member function
instead of comparing the result of the find() call with the end
iterator.
|
|
|
|
|
| |
Cleans out the rest of the occurrences of variable shadowing and makes
any further occurrences of shadowing compiler errors.
|
|
|
|
|
|
| |
Resolves variable shadowing scenarios up to the end of the OpenGL code
to make it nicer to review. The rest will be resolved in a following
commit.
|
|
|
|
|
|
|
|
|
| |
Now that the GPU is initialized when video backends are initialized,
it's no longer needed to query components once the game is running: it
can be done when yuzu is booting.
This allows us to pass components between constructors and in the
process remove all Core::System references in the video backend.
|
|
|
|
|
| |
This allows us passing any type of string and hinting the length of the
string to the OpenGL driver.
|
|
|
|
| |
Does not allocate more threads than available in the host system for boot-time shader compilation and always allocates at least 1 thread if hardware_concurrency() returns 0.
|
|\
| |
| | |
video_core: Allow copy elision to take place where applicable
|
| |
| |
| |
| |
| | |
Removes const from some variables that are returned from functions, as
this allows the move assignment/constructors to execute for them.
|
|/
|
|
| |
Silences several compiler warnings about unused variables.
|
| |
|
|
|
|
|
|
| |
All programs had a size of zero due to this bug, skipping invalidations.
While we are at it, remove some unused forward declarations.
|
|\
| |
| | |
gl_arb_decompiler: Implement an assembly shader decompiler
|
| |
| |
| |
| |
| |
| | |
Emit code compatible with NV_gpu_program5.
This should emit code compatible with Fermi, but it wasn't tested on
that architecture. Pascal has some issues not present on Turing GPUs.
|
| |
| |
| |
| | |
Trivial port the generic shader cache to Vulkan.
|
|/
|
|
| |
Trivially port the generic shader cache to OpenGL.
|
|
|
|
|
| |
Avoids compilation errors at the cost of shader build times and runtime
performance when a game hits the limit of uniform buffers we can use.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add code required to use OpenGL assembly programs based on
NV_gpu_program5. Decompilation for ARB programs is intended to be added
in a follow up commit. This does **not** include ARB decompilation and
it's not in an usable state.
The intention behind assembly programs is to reduce shader stutter
significantly on drivers supporting NV_gpu_program5 (and other required
extensions). Currently only Nvidia's proprietary driver supports these
extensions.
Add a UI option hidden for now to avoid people enabling this option
accidentally.
This code path has some limitations that OpenGL compatibility doesn't
have:
- NV_shader_storage_buffer_object is limited to 16 entries for a single
OpenGL context state (I don't know if this is an intended limitation, an
specification issue or I am missing something). Currently causes issues
on The Legend of Zelda: Link's Awakening.
- NV_parameter_buffer_object can't bind buffers using an offset
different to zero. The used workaround is to copy to a temporary buffer
(this doesn't happen often so it's not an issue).
On the other hand, it has the following advantages:
- Shaders build a lot faster.
- We have control over how floating point rounding is done over
individual instructions (SPIR-V on Vulkan can't do this).
- Operations on shared memory can be unsigned and signed.
- Transform feedbacks are dynamic state (not yet implemented).
- Parameter buffers (uniform buffers) are per stage, matching NVN and
hardware's behavior.
- The API to bind and create assembly programs makes sense, unlike
ARB_separate_shader_objects.
|
|
|
|
|
|
|
|
| |
Deduplicate code shared between vk_pipeline_cache and gl_shader_cache as
well as shader decoder code.
While we are at it, fix a bug in gl_shader_cache where compute shaders
had an start offset of a stage shader.
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
From my testing on a Splatoon 2 shader that takes 3800ms on average to
compile changing to FullDecompile reduces it to 900ms on average.
The shader decoder will automatically fallback to a more naive method if
it can't use full decompile.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Changes the GraphicsContext to be managed by the GPU core. This
eliminates the need for the frontends to fool around with tricky
MakeCurrent/DoneCurrent calls that are dependent on the settings (such
as async gpu option).
This also refactors out the need to use QWidget::fromWindowContainer as
that caused issues with focus and input handling. Now we use a regular
QWidget and just access the native windowHandle() directly.
Another change is removing the debug tool setting in FrameMailbox.
Instead of trying to block the frontend until a new frame is ready, the
core will now take over presentation and draw directly to the window if
the renderer detects that its hooked by NSight or RenderDoc
Lastly, since it was in the way, I removed ScopeAcquireWindowContext and
replaced it with a simple subclass in GraphicsContext that achieves the
same result
|
| |
|
|
|
|
|
| |
Registry consistency is something that practically can't happen and it
has a measurable runtime cost. Reduce it to a DEBUG_ASSERT.
|
|
|
|
|
| |
Store information GLSL forces us to provide but it's dynamic state in
hardware (workgroup sizes, primitive topology, shared memory size).
|
| |
|
|
|
|
|
| |
Instead of pre-specializing shaders and then post-specializing them,
drop the later and only "specialize" the shader while decoding it.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
Given this isn't used, this can be removed entirely.
|
|
|
|
| |
Avoids several reallocations of std::vector instances where applicable.
|
|
|
|
| |
Eliminates a few unnecessary constructions of std::vectors.
|
|
|
|
|
|
|
|
| |
Remove false commentary. Not dividing by 4 the size of shared memory is
not a hack; it describes the number of integers, not bytes.
While we are at it sort the generated code to put preprocessor lines on
the top.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add missing new-line. This caused shaders using local memory and shared
memory to inject a preprocessor GLSL line after an expression (resulting
in invalid code).
It looked like this:
shared uint smem[8];#define LOCAL_MEMORY_SIZE 16
It should look like this (addressed by this commit):
shared uint smem[8];
\#define LOCAL_MEMORY_SIZE 16
|
|
|
|
|
|
|
|
| |
The current shared memory size seems to be smaller than what the game
actually uses. This makes Nvidia's driver consistently blow up; in the
case of FE3H it made it explode on Qt's SwapBuffers while SDL2 worked
just fine. For now keep this hack since it's still progress over the
previous hardcoded shared memory size.
|
| |
|
| |
|
|
|
|
|
| |
Images were not being bound to draw invocations because these would
require a cache invalidation.
|
|
|
|
|
| |
Local memory size in compute shaders was stubbed with an arbitary size.
This commit specializes local memory size from guest GPU parameters.
|
|
|
|
|
| |
Shared memory was being declared with an undefined size. Specialize from
guest GPU parameters the compute shader's shared memory size.
|
|
|
|
|
|
| |
Drop the usage of ARB_compute_variable_group_size and specialize compute
shaders instead. This permits compute to run on AMD and Intel
proprietary drivers.
|
|
|
|
|
| |
Instead of specializing shaders to separate texture buffers from 1D
textures, use the locker to deduce them while they are being decoded.
|
|\
| |
| | |
shader: Implement FSWZADD and reimplement SHFL
|
| |
| |
| |
| | |
Silence GLSL compilation warnings.
|
| | |
|
|/
|
|
| |
Properly pass engine when a shader is being constructed from memory.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
| |
* Implement SULD as float.
* Remove conditional declaration of GL_ARB_shader_viewport_layer_array.
|
| |
|
|
|
|
|
| |
Now that ProgramVariants holds the primitive topology we no longer need
to keep track of individual geometry shaders topologies.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* texture_cache/surface_params: Remove unused local variable
* rasterizer_interface: Add missing documentation commentary
* maxwell_dma: Remove unused rasterizer reference
* video_core/gpu: Sort member declaration order to silent -Wreorder warning
* fermi_2d: Remove unused MemoryManager reference
* video_core: Silent unused variable warnings
* buffer_cache: Silent -Wreorder warnings
* kepler_memory: Remove unused MemoryManager reference
* gl_texture_cache: Add missing override
* buffer_cache: Add missing include
* shader/decode: Remove unused variables
|
|\
| |
| | |
gl_texture_cache: Miscellaneous texture buffer fixes
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Implement VOTE using Nvidia's intrinsics. Documentation about these can
be found here
https://developer.nvidia.com/reading-between-threads-shader-intrinsics
Instead of using portable ARB instructions I opted to use Nvidia
intrinsics because these are the closest we have to how Tegra X1
hardware renders.
To stub VOTE on non-Nvidia drivers (including nouveau) this commit
simulates a GPU with a warp size of one, returning what is meaningful
for the instruction being emulated:
* anyThreadNV(value) -> value
* allThreadsNV(value) -> value
* allThreadsEqualNV(value) -> true
ballotARB, also known as "uint64_t(activeThreadsNV())", emits
VOTE.ANY Rd, PT, PT;
on nouveau's compiler. This doesn't match exactly to Nvidia's code
VOTE.ALL Rd, PT, PT;
Which is emulated with activeThreadsNV() by this commit. In theory this
shouldn't really matter since .ANY, .ALL and .EQ affect the predicates
(set to PT on those cases) and not the registers.
|
|\ \
| | |
| | | |
gl_rasterizer: Implement compute shaders
|
| | | |
|
| | | |
|
| | | |
|
| |/ |
|
|/ |
|
|\
| |
| | |
gl_shader_decompiler: Implement gl_ViewportIndex and gl_Layer in vertex shaders
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This commit implements gl_ViewportIndex and gl_Layer in vertex and
geometry shaders. In the case it's used in a vertex shader, it requires
ARB_shader_viewport_layer_array. This extension is available on AMD and
Nvidia devices (mesa and proprietary drivers), but not available on
Intel on any platform. At the moment of writing this description I don't
know if this is a hardware limitation or a driver limitation.
In the case that ARB_shader_viewport_layer_array is not available,
writes to these registers on a vertex shader are ignored, with the
appropriate logging.
|
|/ |
|
|\
| |
| | |
Implement a new Texture Cache
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| |
| |
| |
| | |
Fixes missing review comments introduced.
|
|/ |
|
| |
|
| |
|
| |
|
|
|
|
|
| |
This addresses a bug on geometry shaders where code was being written
before all #extension declarations were done. Ref to #2523
|
| |
|
| |
|
|
|
|
| |
Silences a -Wreorder warning.
|
| |
|
| |
|
|\
| |
| | |
gl_shader_decompiler: Disable variable AOFFI on unsupported devices
|
| | |
|
| | |
|
|/ |
|
|\
| |
| | |
video_core/texures/texture: Remove unnecessary includes
|
| |
| |
| |
| |
| |
| | |
Nothing in this header relies on common_funcs or the memory manager.
This gets rid of reliance on indirect inclusions in the OpenGL caches.
|
|\ \
| |/
|/| |
shader_cache: Permit a Null Shader in case of a bad host_ptr.
|
| | |
|
|\ \
| | |
| | | |
gl_shader_manager: Remove reliance on a global accessor within MaxwellUniformData::SetFromRegs()
|
| |/
| |
| |
| |
| |
| |
| |
| |
| | |
This isn't used at all in the OpenGL shader cache, so we can remove it's
include here, meaning one less file needs to be recompiled if any
changes ever occur within that header.
core/memory.h is also not used within this file at all, so we can remove
it as well.
|
|/
|
|
|
|
|
| |
Specifies the members in the same order that initialization would take
place in.
This also silences -Wreorder warnings.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
# Conflicts:
# src/video_core/engines/kepler_memory.cpp
# src/video_core/engines/maxwell_3d.cpp
# src/video_core/morton.cpp
# src/video_core/morton.h
# src/video_core/renderer_opengl/gl_global_cache.cpp
# src/video_core/renderer_opengl/gl_global_cache.h
# src/video_core/renderer_opengl/gl_rasterizer_cache.cpp
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
i965 (and probably all mesa drivers) require GL_PROGRAM_SEPARABLE when using
glProgramBinary. This is probably required by the standard but it's ignored by
permisive proprietary drivers.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
This constant is related to the size of the instruction.
|
|
|
|
|
| |
The previous code would cause a warning, as it was truncating size_t
(64-bit) to a u32 (32-bit) implicitly.
|
| |
|
| |
|
|\
| |
| | |
gl_shader_decompiler: Guard out of bound geometry shader input reads
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Geometry shaders follow a pattern that results in out of bound reads.
This pattern is:
- VSETP to predicate
- Use that predicate to conditionally set a register a big number
- Use the register to access geometry shaders
At the time of writing this commit I don't know what's the intent of
this number. Some drivers argue about these out of bound reads. To avoid
this issue, input reads are guarded limiting reads to the highest
posible vertex input of the current topology (e.g. points to 1 and
triangles to 3).
|
|/
|
|
|
|
| |
Rather than have a transparent dependency, we can make it explicit in
the interface. This also gets rid of the need to put the core include in
a header.
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
* Added glObjectLabels for renderdoc for textures and shader programs
* Changed hardcoded "Texture" name to reflect the texture type instead
* Removed string initialize
|
| |
|
|
|
|
|
|
|
|
| |
- Fixed all warnings, for renderer_opengl items, which were indicating a
possible incorrect behavior from integral promotion rules and types
larger than those in which arithmetic is typically performed.
- Added const for variables where possible and meaningful.
- Added constexpr where possible.
|
|
|
|
|
|
|
| |
The std::string generation with its malloc and free requirement
was a noticeable overhead. Also switch to an ordered_map to
avoid the std::hash call. As those maps usually have a size of
two elements, the lookup time shall not matter.
|
| |
|
|
|
|
|
|
|
|
| |
Given std::vector is a type with a non-trivial destructor, this
variable cannot be optimized away by the compiler, even if unused.
Because of that, something that was intended to be fairly lightweight,
was actually allocating 32KB and deallocating it at the end of the
function.
|
|
|