path: root/src/video_core/shader
Columns: date, commit message, author, files changed, lines (-removed/+added)
2021-02-15Review 1Kelebek11-1/+1
2021-02-15Implement texture offset support for TexelFetch and TextureGather and add offsets for TldsKelebek12-2/+10
Formatting
2021-02-14yuzu: Various frontend improvements to avoid crashes and improve experience on Linux.bunnei1-0/+1
2021-02-13video_core: Reimplement the buffer cacheReinUsesLisp3-9/+6
Reimplement the buffer cache using cached bindings and page level granularity for modification tracking. This also drops the usage of shared pointers and virtual functions from the cache.
- Bindings are cached, allowing work to be skipped when the game changes few bits between draws.
- OpenGL Assembly shaders no longer copy when a region has been modified from the GPU to emulate constant buffers, instead GL_EXT_memory_object is used to alias sub-buffers within the same allocation.
- OpenGL Assembly shaders stream constant buffer data using glProgramBufferParametersIuivNV, from NV_parameter_buffer_object. In theory this should save one hash table resolve inside the driver compared to glBufferSubData.
- A new OpenGL stream buffer is implemented based on fences for drivers that are not Nvidia's proprietary, due to their low performance on partial glBufferSubData calls synchronized with 3D rendering (that some games use a lot).
- Most optimizations are shared between APIs now, allowing Vulkan to cache more bindings than before, skipping unnecessary work.
This commit adds the necessary infrastructure to use Vulkan objects from OpenGL. Overall, it improves performance and fixes some bugs present in the old cache. There are still some edge cases hit by some games that harm performance on some vendors; these are planned to be fixed in later commits.
2021-01-25Revert "Start of Integer flags implementation"ReinUsesLisp3-59/+3
This reverts #4713. The implementation in that PR is not accurate. It does not reflect the behavior seen in hardware.
2021-01-24video_core: Silence -Wmissing-field-initializers warningsReinUsesLisp1-0/+18
2021-01-23shader_ir: Fix comment typoLevi Behunin1-1/+1
2021-01-04renderer_vulkan: Move device abstraction to vulkan_commonReinUsesLisp1-1/+1
2021-01-03renderer_vulkan: Rename VKDevice to DeviceReinUsesLisp2-3/+3
The "VK" prefix predates the "Vulkan" namespace. It was carried around the codebase for consistency. "VKDevice" is currently a bad alias of "VkDevice" (only one uppercase character of difference) that can cause confusion. Rename all instances of it.
2020-12-30half_set: Resolve -Wmaybe-uninitialized warningsLioncash1-7/+7
2020-12-30video_core: Rewrite the texture cacheReinUsesLisp7-69/+70
The current texture cache has several points that hurt maintainability and performance. It's easy to break unrelated parts of the cache when doing minor changes. The cache can easily forget valuable information about the cached textures by CPU writes or simply by its normal usage. This commit aims to address those issues.
2020-12-07video_core: Make use of ordered container contains() where applicableLioncash2-7/+7
With C++20, we can use the more concise contains() member function instead of comparing the result of the find() call with the end iterator.
2020-12-07ast: Improve string concat readability in operator()Lioncash1-5/+4
Provides an in-place format string to make it more pleasant to read.
2020-12-07shader_ir: std::move node within DeclareAmend()Lioncash1-2/+2
Same behavior, but elides an unnecessary atomic reference count increment and decrement.
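Several commits in this log make the same change, so a minimal sketch of the idea may help. `NodeData`, `Node` and `AmendList` are hypothetical stand-ins rather than the real shader IR types; the point is only that moving a `std::shared_ptr` transfers ownership without an extra atomic increment and decrement.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the IR node types.
struct NodeData {
    int value;
};
using Node = std::shared_ptr<NodeData>;

// Hypothetical container standing in for the IR's list of amended nodes.
struct AmendList {
    std::vector<Node> nodes;

    // The node is taken by value and moved into storage, so ownership is
    // transferred instead of being shared and released again.
    std::size_t Declare(Node node) {
        nodes.push_back(std::move(node));
        return nodes.size() - 1;
    }
};
```

A caller that no longer needs its own handle can pass `std::move(node)`, which leaves the caller's pointer empty and the stored node with a use count of one.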
2020-12-07video_core: Remove unnecessary enum class casting in logging messagesLioncash11-54/+44
fmt now automatically prints the numeric value of an enum class member by default, so we don't need to use casts any more. Reduces the line noise a bit.
2020-12-05video_core: Resolve more variable shadowing scenarios pt.3Lioncash12-48/+55
Cleans out the rest of the occurrences of variable shadowing and makes any further occurrences of shadowing compiler errors.
2020-12-05video_core: Resolve more variable shadowing scenarios pt.2Lioncash3-13/+13
Migrates the video core code closer to enabling variable shadowing warnings as errors. This primarily sorts out shadowing occurrences within the Vulkan code.
2020-12-03node: Mark member functions as [[nodiscard]] where applicableLioncash1-29/+29
Prevents logic bugs from accidentally ignoring the return value.
2020-12-03node: Eliminate variable shadowingLioncash1-47/+49
2020-11-20async_shaders: emplace threads into the worker thread vectorLioncash1-2/+2
Same behavior, but constructs the threads in place instead of moving them.
2020-11-20async_shaders: Simplify implementation of GetCompletedWork()Lioncash1-2/+1
This is equivalent to moving all the contents and then clearing the vector. This avoids a redundant allocation.
2020-11-20async_shaders: Simplify moving data into the pending queueLioncash1-13/+8
2020-11-20async_shaders: std::move data within QueueVulkanShader()Lioncash1-2/+2
Same behavior, but avoids redundant copies. While we're at it, we can simplify the pushing of the parameters into the pending queue.
2020-10-29async_shaders: Increase Async worker thread count for 8+ thread cpusameerj1-8/+9
Adds 1 async worker thread for every 2 available threads above 8
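As a rough sketch of that rule: the function name and the base count used for CPUs with 8 or fewer threads (a quarter of the threads, at least one, loosely following the "1/4 count" wording of an earlier commit in this log) are assumptions, not the actual yuzu code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Sketch only. Assumed base: a quarter of the first 8 hardware threads, with a
// minimum of one worker. On top of that, one extra worker is added for every
// two hardware threads above 8, as the commit message describes.
inline std::uint32_t WorkerCount(std::uint32_t hardware_threads) {
    const std::uint32_t capped = std::min<std::uint32_t>(hardware_threads, 8);
    const std::uint32_t base = std::max<std::uint32_t>(1, capped / 4);
    const std::uint32_t extra = hardware_threads > 8 ? (hardware_threads - 8) / 2 : 0;
    return base + extra;
}
```

Under these assumptions a 16-thread CPU would get 2 base workers plus 4 extra ones.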
2020-10-28shader: Partially implement texture cube array shadowReinUsesLisp1-1/+0
This implements texture cube arrays with shadow comparisons but doesn't fix the asserts related to it. Fixes out of bounds reads on swizzle constructors and makes them use bounds checked ::at instead of the unsafe operator[].
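To illustrate the difference between the two accessors, here is a small sketch with a hypothetical swizzle table (not the one in the decompiler). `operator[]` with an out-of-range index is undefined behavior, while `at()` throws `std::out_of_range`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>

// Hypothetical lookup: returns the swizzle character for an index, or
// std::nullopt when the index is outside the table.
inline std::optional<char> SwizzleAt(std::size_t index) {
    static constexpr std::array<char, 4> swizzle{'x', 'y', 'z', 'w'};
    try {
        return swizzle.at(index); // bounds checked, throws on a bad index
    } catch (const std::out_of_range&) {
        return std::nullopt;
    }
}
```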
2020-10-28shader/arithmetic: Implement FCMP immediate + register variantReinUsesLisp1-1/+2
Trivially add the encoding for this.
2020-10-08shader/texture: Implement CUBE texture type for TMML and fix arraysReinUsesLisp1-19/+22
TMML takes an array argument that has no known meaning; it appears as the first component in gpr8, followed by s, t and r. Skip this component when arrays are being used. Also implement CUBE texture types. - Used by Pikmin 3: Deluxe Demo.
2020-09-25More forgetting... duhLevi Behunin1-2/+2
2020-09-25Forgot to apply suggestion here as wellLevi Behunin1-1/+1
2020-09-25Address CommentsLevi Behunin3-25/+34
2020-09-25Start of Integer flags implementationLevi Behunin3-3/+50
2020-09-24arithmetic_integer_immediate: Make use of std::move where applicableLioncash1-16/+19
Same behavior, minus any redundant atomic reference count increments and decrements.
2020-09-23shader/registry: Silence a -Wshadow warningLioncash2-5/+5
2020-09-23shader/registry: Remove unnecessary namespace qualifiersLioncash1-5/+3
Using statements already make these unnecessary.
2020-09-23shader/registry: Make use of designated initializers where applicableLioncash1-17/+19
Same behavior, less repetition.
2020-09-23control_flow: emplace elements in place within TryQuery()Lioncash1-6/+6
Places data structures where they'll eventually be moved to to avoid needing to even move them in the first place.
2020-09-23control_flow: Make use of std::move in InsertBranch()Lioncash1-7/+8
Avoids unnecessary atomic increments and decrements.
2020-09-22General: Make use of std::nullopt where applicableLioncash2-18/+11
Allows some implementations to avoid completely zeroing out the internal buffer of the optional, and instead only set the validity byte within the structure. This also makes it consistent how we return empty optionals.
2020-09-19renderer_vulkan: Make unconditional use of VK_KHR_timeline_semaphoreReinUsesLisp1-0/+11
This reworks how host<->device synchronization works on the Vulkan backend. Instead of "protecting" resources with a fence and signalling these as free when the fence is known to be signalled by the host GPU, use timeline semaphores. Vulkan timeline semaphores allow us to work on a subset of D3D12 fences. As far as we are concerned, timeline semaphores are a value set by the host or the device that can be waited on by either of them. Taking advantage of this, we can have a monotonically increasing atomic value for each submission to the graphics queue. Instead of protecting resources with a fence, we simply store the current logical tick (the atomic value stored in CPU memory). When we want to know if a resource is free, it can be compared to the current GPU tick. This greatly simplifies resource management code and the free status of resources should have fewer false negatives. To work around bugs in validation layers, when these are attached there's a thread waiting for timeline semaphores.
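The tick bookkeeping described in this message can be sketched on the CPU side as follows. `TickTracker` and its member names are hypothetical, and `SignalFromGpu` merely stands in for reading the semaphore's counter value; no Vulkan calls are modelled.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical model of the logical tick scheme.
class TickTracker {
public:
    // Tick that a resource used right now would be tagged with.
    std::uint64_t CurrentTick() const {
        return cpu_tick.load(std::memory_order_relaxed);
    }

    // Simulates a queue submission: returns the tick this submission will
    // signal and advances the logical tick for later work.
    std::uint64_t Submit() {
        return cpu_tick.fetch_add(1, std::memory_order_relaxed);
    }

    // Stand-in for observing the timeline semaphore value on the device.
    void SignalFromGpu(std::uint64_t value) {
        gpu_tick.store(value, std::memory_order_release);
    }

    // A resource is free once the GPU has reached the tick it was last used at.
    bool IsFree(std::uint64_t resource_tick) const {
        return gpu_tick.load(std::memory_order_acquire) >= resource_tick;
    }

private:
    std::atomic<std::uint64_t> cpu_tick{1};
    std::atomic<std::uint64_t> gpu_tick{0};
};
```

In this sketch a resource only stores one integer, and checking whether it is free is a single comparison rather than a fence query.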
2020-09-17decode/image: Eliminate switch fallthrough in DecodeImage()Lioncash1-0/+1
Fortunately this didn't result in any issues, given the block that code was falling through to would immediately break.
2020-09-17decoder/texture: Eliminate narrowing conversion in GetTldCode()Lioncash1-1/+1
The assignment was previously truncating a u64 value to a bool.
2020-09-16video_core: Enforce -Werror=switchReinUsesLisp2-4/+13
This forces us to fix all -Wswitch warnings in video_core.
2020-09-06video_core: Remove all Core::System references in rendererReinUsesLisp2-9/+4
Now that the GPU is initialized when video backends are initialized, it's no longer needed to query components once the game is running: it can be done when yuzu is booting. This allows us to pass components between constructors and in the process remove all Core::System references in the video backend.
2020-08-24async_shaders: Mark getters as const member functionsLioncash2-17/+15
While we're at it, we can also mark them as nodiscard.
2020-08-16Remove unneeded newlines, optional Registry in shader paramsameerj2-8/+4
Addressing feedback from Rodrigo
2020-08-16Morph: Update worker allocation commentAmeer J1-1/+1
Co-authored-by: Morph <39850852+Morph1984@users.noreply.github.com>
2020-08-16move thread 1/4 count computation into allocate workers methodameerj2-3/+12
2020-08-16Address feedback, add shader compile notifier, update setting textameerj2-68/+65
2020-08-16Vk Async Worker directly emplace in cacheameerj1-53/+25
2020-08-16Address feedback. Bruteforce delete duplicatesameerj2-61/+78
2020-08-16Vk Async pipeline compilationameerj2-6/+84
2020-08-14shader/memory: Amend UNIMPLEMENTED_IF_MSG without a messageLioncash1-1/+2
We need to provide a message for this variant of the macro, so we can simply log out the type being used.
2020-08-14async_shaders: Resolve -Wpessimizing-move warningLioncash1-2/+2
Prevents pessimization of the move constructor (which thankfully didn't actually happen in practice here, given std::thread isn't copyable).
2020-08-13General: Tidy up clang-format warnings part 2Lioncash2-17/+19
2020-07-21video_core: Allow copy elision to take place where applicableLioncash4-22/+22
Removes const from some variables that are returned from functions, as this allows the move assignment/constructors to execute for them.
2020-07-18Fix style issuesDavid Marcec1-4/+10
2020-07-17Remove duplicate configDavid Marcec1-0/+1
2020-07-17Use conditional varDavid Marcec2-9/+15
2020-07-17async shadersDavid Marcec2-0/+277
2020-07-16decode/other: Implement S2R.LaneIdReinUsesLisp1-2/+1
This maps to host's thread id. - Fixes graphical issues on Paper Mario.
2020-07-13video_core: Rearrange pixel format namesReinUsesLisp1-27/+27
Normalizes pixel format names to match Vulkan names. Previous to this commit pixel formats had no convention, leading to confusion and potential bugs.
2020-06-23shader/half_set: Implement HSET2_IMMReinUsesLisp1-21/+67
Add HSET2_IMM. Due to the complexity of the encoding avoid using BitField unions and read the relevant bits from the code itself. This is less error prone.
2020-06-20decode/image: Implement B10G11R11FMorph1-9/+17
- Used by Kirby Star Allies
2020-06-18memory_util: boost hashes are size_tMerryMage1-2/+2
* boost::hash_value returns a size_t
* boost::hash_combine takes a size_t& argument
2020-06-05shader/texture: Join separate image and sampler pairs offlineReinUsesLisp7-69/+146
Games using D3D idioms can join images and samplers when a shader executes, instead of baking them into a combined sampler image. This is also possible on Vulkan. One approach to this solution would be to use separate samplers on Vulkan and leave this unimplemented on OpenGL, but we can't do this because there's no consistent way of determining which constant buffer holds a sampler and which one an image. We could in theory find the first bit and if it's in the TIC area, it's an image; but this falls apart when an image or sampler handle uses an index of zero. The approach used is to track a LOP.OR operation (this is done at an IR level, not at an ISA level), track again the constant buffers used as source and store this pair. Then, outside of shader execution, join the sampler and image pair with a bitwise or operation. This approach won't work on games that truly use separate samplers in a meaningful way. For example, pooling textures in a 2D array and determining at runtime what sampler to use. This invalidates OpenGL's disk shader cache :) - Used mostly by D3D ports to Switch
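The offline join step amounts to a bitwise OR of the two tracked values. In the sketch below the function name and the example bit layout (image index in the low bits, sampler index in the high bits) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Joins a separate image handle and sampler handle into one combined handle.
// Assumes the two handles occupy disjoint bit ranges of the 32-bit word.
constexpr std::uint32_t JoinHandles(std::uint32_t image_handle, std::uint32_t sampler_handle) {
    return image_handle | sampler_handle;
}
```

An index of zero contributes no set bits, which is why inspecting the bits alone cannot tell whether a constant buffer entry holds an image or a sampler.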
2020-06-05shader/track: Move bindless tracking to a separate functionReinUsesLisp2-25/+39
2020-05-30shader/other: Fix hardcoded value in S2R INVOCATION_INFOReinUsesLisp1-1/+1
Geometry shaders built from Nvidia's compiler check for bits[16:23] to be less than or equal to 0 with VSETP to default to a "safe" value of 0x8000'0000 (safe from hardware's perspective). To avoid hitting this path in the shader, return 0x00ff'0000 from S2R INVOCATION_INFO. This seems to be the maximum number of vertices a geometry shader can emit in a primitive.
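A small sketch of the bit check described above. `ExtractBits` and `TakesSafePath` are hypothetical helpers that model the shader-side test; only the constant 0x00ff'0000 comes from the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Value the commit reports returning from S2R INVOCATION_INFO.
constexpr std::uint32_t INVOCATION_INFO = 0x00ff'0000;

// Extracts bits [first, first + count) of value; count must be below 32.
constexpr std::uint32_t ExtractBits(std::uint32_t value, unsigned first, unsigned count) {
    return (value >> first) & ((1u << count) - 1u);
}

// Hypothetical model of the shader-side check: the "safe" path is taken when
// the field in bits 16 to 23 is zero.
constexpr bool TakesSafePath(std::uint32_t invocation_info) {
    return ExtractBits(invocation_info, 16, 8) == 0;
}
```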
2020-05-27shader/other: Implement MEMBAR.CTSReinUsesLisp2-4/+15
This silences an assertion we were hitting and uses workgroup memory barriers when the game requests it.
2020-05-22shader/other: Implement BAR.SYNC 0x0ReinUsesLisp2-0/+6
Trivially implement this particular case of BAR. Unless games use OpenCL or CUDA barriers, we shouldn't hit any other case here.
2020-05-22shader/memory: Implement non-addition operations in REDReinUsesLisp1-2/+1
Trivially implement these instructions. They are used in Astral Chain.
2020-05-22shader/other: Implement thread comparisons (NV_shader_thread_group)ReinUsesLisp2-0/+26
Hardware S2R special registers match gl_Thread*MaskNV. We can trivially implement these using Nvidia's extension on OpenGL or naively stubbing them with the ARB instructions to match. This might cause issues if the host device warp size doesn't match Nvidia's. That said, this is unlikely on proper shaders. Refer to the attached url for more documentation about these flags. https://www.khronos.org/registry/OpenGL/extensions/NV/NV_shader_thread_group.txt
2020-05-09shader_ir: Separate float-point comparisons in ordered and unorderedReinUsesLisp4-78/+66
This allows us to use native SPIR-V instructions without having to manually check for NaN.
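For readers unfamiliar with the distinction: an ordered comparison is false when either operand is NaN, while an unordered comparison is true in that case. The helper names below are illustrative and do not come from the shader IR.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Ordered less-than: false if either operand is NaN.
inline bool LessThanOrdered(float a, float b) {
    return !std::isnan(a) && !std::isnan(b) && a < b;
}

// Unordered less-than: true if either operand is NaN.
inline bool LessThanUnordered(float a, float b) {
    return std::isnan(a) || std::isnan(b) || a < b;
}
```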
2020-04-28shader/arithmetic_integer: Fix tracking issue in temporaryReinUsesLisp1-4/+0
This temporary is not needed as we mark Rd.CC + IADD.X as unimplemented. It caused issues when tracking global buffers.
2020-04-26shader/memory_util: Deduplicate codeReinUsesLisp5-24/+127
Deduplicate code shared between vk_pipeline_cache and gl_shader_cache as well as shader decoder code. While we are at it, fix a bug in gl_shader_cache where compute shaders had a start offset of a stage shader.
2020-04-26shader/arithmetic_integer: Fix edge case and mark IADD.X Rd.CC as unimplementedReinUsesLisp1-1/+6
IADD.X Rd.CC requires some extra logic that is not currently implemented. Abort when this is hit.
2020-04-26shader/arithmetic_integer: Change IAdd to UAdd to avoid signed overflowReinUsesLisp1-2/+2
Signed integer addition overflow might be undefined behavior. It's free to change operations to UAdd and use unsigned integers to avoid potential bugs.
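In C++ terms the reasoning looks like this: signed overflow is undefined, while unsigned arithmetic wraps modulo 2^32 and yields the same bit pattern. `UAdd` here is a plain function written for illustration, not the IR operation itself.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned 32-bit addition with well-defined wrap-around.
constexpr std::uint32_t UAdd(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint32_t>(a + b);
}
```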
2020-04-26shader/arithmetic_integer: Implement IADD.XReinUsesLisp1-0/+6
IADD.X takes the carry flag and adds it to the result. This is generally used to emulate 64-bit operations with 32-bit registers.
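The 64-bit emulation mentioned here can be sketched as two 32-bit additions chained through a carry. `AddResult`, `AddWithCarry` and `Add64` are illustrative names; the first call plays the role of IADD with CC and the second the role of IADD.X.

```cpp
#include <cassert>
#include <cstdint>

struct AddResult {
    std::uint32_t value;
    bool carry;
};

// 32-bit add that consumes a carry-in and reports the carry-out.
constexpr AddResult AddWithCarry(std::uint32_t a, std::uint32_t b, bool carry_in = false) {
    const std::uint64_t wide = std::uint64_t{a} + b + (carry_in ? 1u : 0u);
    return {static_cast<std::uint32_t>(wide), (wide >> 32) != 0};
}

// 64-bit add built from two 32-bit halves.
constexpr std::uint64_t Add64(std::uint64_t x, std::uint64_t y) {
    // Low halves: plain add that records the carry.
    const AddResult lo = AddWithCarry(static_cast<std::uint32_t>(x), static_cast<std::uint32_t>(y));
    // High halves: add plus the carry from the low halves.
    const AddResult hi = AddWithCarry(static_cast<std::uint32_t>(x >> 32),
                                      static_cast<std::uint32_t>(y >> 32), lo.carry);
    return (std::uint64_t{hi.value} << 32) | lo.value;
}
```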
2020-04-26shader/arithmetic_integer: Implement CC for IADDReinUsesLisp2-3/+21
2020-04-26decode/register_set_predicate: Implement CCReinUsesLisp1-9/+14
P2R CC takes the state of condition codes and puts them into a register. We already have this implemented for PR (predicates). This commit implements CC over that.
2020-04-26decode/register_set_predicate: Use move for shared pointersReinUsesLisp1-16/+17
Avoid atomic counters used by shared pointers.
2020-04-24Revert: shader_decode: Fix LD, LDG when track constant buffer.Fernando Sahmkow1-14/+6
2020-04-23decode/arithmetic_half: Fix HADD2 and HMUL2 absolute and negation bitsReinUsesLisp1-14/+37
The encoding for negation and absolute value was wrong. Extracting is now done manually. Similar instructions having different encodings is the rule, not the exception. To keep sanity and readability I preferred to extract the desired bit manually. This is implemented against nxas: https://github.com/ReinUsesLisp/nxas/blob/8dbc38995711cc12206aa370145a3a02665fd989/table.h#L68 That is itself tested against nvdisasm (Nvidia's official disassembler).
2020-04-23shader/texture: Support multiple unknown sampler propertiesReinUsesLisp2-62/+87
This allows deducing some properties from the texture instruction before asking the runtime. By doing this we can handle type mismatches in some instructions from the renderer instead of the shader decoder. Fixes texelFetch issues with games using 2D texture instructions on a 1D sampler.
2020-04-23shader_ir: Turn classes into data structuresReinUsesLisp5-182/+102
2020-04-21shader/arithmetic_integer: Fix LEA_IMM encodingReinUsesLisp1-2/+2
The operand order in LEA_IMM was flipped compared to nvdisasm. Fix that using nxas as reference: https://github.com/ReinUsesLisp/nxas/blob/8dbc38995711cc12206aa370145a3a02665fd989/table.h#L122
2020-04-17General: Resolve warnings related to missing declarationsLioncash1-2/+2
2020-04-17decode/memory: Resolve unused variable warningLioncash1-1/+1
Only the first element of the returned pair is ever used.
2020-04-17decode/texture: Resolve unused variable warnings.Lioncash1-5/+7
Some variables aren't used, so we can remove these. Unfortunately, diagnostics are still reported on structured bindings even when annotated with [[maybe_unused]], so we need to unpack the elements that we want to use manually.
2020-04-17decode/texture: Collapse loop down into std::generateLioncash1-3/+1
Same behavior, less code.
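As a generic illustration of that kind of rewrite (the array contents and function name are made up, not the decoder's), a counting loop that fills a container can be collapsed into `std::generate` with a stateful lambda:

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Fills four slots with consecutive values starting at `start`.
inline std::array<int, 4> MakeSequence(int start) {
    std::array<int, 4> values{};
    std::generate(values.begin(), values.end(), [n = start]() mutable { return n++; });
    return values;
}
```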
2020-04-17decode/texture: Eliminate trivial missing field initializer warningsLioncash1-3/+4
We can just specify the initializers.
2020-04-16decode/shift: Remove unused variable within Shift()Lioncash1-1/+0
Removes a redundant variable that is already satisfied by the IsFull() utility function.
2020-04-16control_flow: Make use of std::move in TryInspectAddress()Lioncash1-3/+3
Eliminates redundant atomic reference count increments and decrements.
2020-04-16decode/image: Fix typo in assert in GetComponentSize()Lioncash1-3/+3
2020-04-16decoder/image: Fix incorrect G24R8 component sizes in GetComponentSize()Lioncash1-2/+2
The components' sizes were mismatched. This corrects that.
2020-04-16track: Eliminate redundant copiesLioncash1-5/+6
Two variables can be references, while two others can be std::moved. Makes for 4 less atomic reference count increments and decrements.
2020-04-16CMakeLists: Specify -Wextra on linux buildsLioncash3-10/+15
Allows reporting more cases where logic errors may exist, such as implicit fallthrough cases, etc. We currently ignore unused parameters, since we currently have many cases where this is intentional (virtual interfaces). While we're at it, we can also tidy up any existing code that causes warnings. This also uncovered a few bugs as well.
2020-04-15shader/arithmetic: Add FCMP_CR variantReinUsesLisp1-1/+2
Adds another variant of FCMP.
2020-04-12shader/video: Partially implement VMNMXReinUsesLisp2-0/+61
Implements the common usages for VMNMX. Inputs with a different size than 32 bits are not supported and sign mismatches aren't supported either. VMNMX works as follows: It grabs Ra and Rb and applies a maximum/minimum on them (this is defined by .MX), taking the input sign into account. This result can then be saturated. After the intermediate result is calculated, it applies another operation on it using Rc. These operations are merges, accumulations or another min/max pass. This instruction offers a more flexible way to implement, for instance, GCN's min3 and max3 instructions.
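A heavily simplified sketch of the two-stage behavior described above, restricted to signed 32-bit inputs and to a second stage that is another min/max pass. Saturation, merges and accumulations are left out, and the enum and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical subset of the second-stage operations.
enum class SecondOp { Min, Max };

// First stage: min or max of ra and rb (selected by mx).
// Second stage: min or max of that result with rc.
inline std::int32_t Vmnmx(std::int32_t ra, std::int32_t rb, std::int32_t rc, bool mx,
                          SecondOp op) {
    const std::int32_t first = mx ? std::max(ra, rb) : std::min(ra, rb);
    return op == SecondOp::Max ? std::max(first, rc) : std::min(first, rc);
}
```

With both stages set to min, or both set to max, this behaves like a three-operand min3 or max3.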
2020-04-10shader/texture: Remove type mismatches management from shader decoderReinUsesLisp1-14/+0
Since commit e22816a5bb we handle type mismatches from the CPU. We don't need to hack our shader decoder due to game bugs anymore. Removed in this commit.
2020-04-07address nit.Nguyen Dac Nam1-1/+1
2020-04-07shader/conversion: Implement I2I sign extension, saturation and selectionReinUsesLisp1-13/+100
Reimplements I2I adding sign extension, saturation (clamp source value to the destination), selection and destination sizes that are not 32 bits wide. It doesn't implement CC yet.
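Two of the building blocks named here, sign extension and saturation, can be sketched as plain integer helpers. These are illustrative functions with assumed names, not the IR code the commit adds.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extends the low `bits` bits of value to 32 bits (1 <= bits <= 32).
constexpr std::int32_t SignExtend(std::uint32_t value, unsigned bits) {
    const std::uint32_t sign_bit = 1u << (bits - 1);
    const std::uint32_t mask = bits == 32 ? 0xFFFFFFFFu : ((1u << bits) - 1u);
    const std::uint32_t low = value & mask;
    // Computed in 64 bits so the final narrowing is always in range.
    return static_cast<std::int32_t>(static_cast<std::int64_t>(low ^ sign_bit) -
                                     static_cast<std::int64_t>(sign_bit));
}

// Clamps a signed source value to the range of a signed destination that is
// `bits` wide (1 <= bits <= 32).
constexpr std::int32_t SaturateSigned(std::int32_t value, unsigned bits) {
    const std::int64_t hi = (std::int64_t{1} << (bits - 1)) - 1;
    const std::int64_t lo = -hi - 1;
    const std::int64_t v = value;
    return static_cast<std::int32_t>(v < lo ? lo : (v > hi ? hi : v));
}
```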
2020-04-07Apply suggestions from code reviewNguyen Dac Nam1-9/+9
Co-Authored-By: Rodrigo Locatti <reinuseslisp@airmail.cc>
2020-04-06shader_decode: SULD.D using std::pair instead of out parameternamkazy2-19/+15
2020-04-06shader_decode: SULD.D avoid duplicate code block.namkazy1-39/+2
2020-04-06shader_decode: SULD.D fix conversion error.namkazy1-3/+3
2020-04-06shader_decode: SULD.D implement bits64 and reverse shader ir init method to removed shader stage.namkazy3-42/+101
2020-04-06shader/memory: Implement RED.E.ADDReinUsesLisp2-1/+29
Implements a reduction operation. It's an atomic operation that doesn't return a value. This commit introduces another primitive because some shading languages might have a primitive for reduction operations.
2020-04-06shader/memory: Add "using std::move"ReinUsesLisp1-11/+13
2020-04-06shader/memory: Minor fixes in ATOMReinUsesLisp1-32/+30
2020-04-05silent warning (conversion error)namkazy1-3/+2
2020-04-05shader_decode: SULD.D -> SINT actually same as UNORM.namkazy1-5/+4
2020-04-05shader_decode: SULD.D fix decode SNORM componentnamkazy1-10/+9
2020-04-05clang-formatnamkazy1-2/+2
2020-04-05shader_decode: get sampler descriptor from registry.namkazy1-77/+93
2020-04-05tweaking.namkazy1-3/+3
2020-04-05cleanup unuse paramsnamkazy1-8/+6
2020-04-05cleanup debug code.namkazy1-14/+3
2020-04-05reimplement get component type, uncomment mistaken codenamkazy1-18/+93
2020-04-05remove disable optimizenamkazy1-2/+0
2020-04-05[wip] reimplement SULD.Dnamkazy1-22/+229
2020-04-05add shader stage when init shader irnamkazy2-5/+7
2020-04-05clang-fixNguyen Dac Nam1-1/+1
2020-04-05shader: image - import PredConditionNguyen Dac Nam1-0/+1
2020-04-05shader: SULD.D bits32 implement more complexer method.Nguyen Dac Nam1-4/+28
2020-04-05shader: SULD.D import StoreTypeNguyen Dac Nam1-0/+1
2020-04-05shader: implement SULD.D bits32Nguyen Dac Nam1-11/+27
2020-04-04shader/other: Add error message for some S2R registersReinUsesLisp1-0/+6
2020-04-04shader_bytecode: Rename MOV_SYS to S2RReinUsesLisp1-3/+3
2020-04-04shader_ir: Add error message for EXIT.FCSM_TRReinUsesLisp1-0/+3
2020-04-02shader/memory: Silence no return value warningReinUsesLisp1-0/+3
Silences a warning about control paths not all returning a value.
2020-04-02shader_decompiler: Remove FragCoord.w hack and change IPA implementationReinUsesLisp1-15/+21
Credits go to gdkchan and Ryujinx. The pull request used for this can be found here: https://github.com/Ryujinx/Ryujinx/pull/1082 yuzu was already using the header for interpolation, but it was missing the FragCoord.w multiplication described in the linked pull request. This commit finally removes the FragCoord.w == 1.0f hack from the shader decompiler. While we are at it, this commit renames some enumerations to match Nvidia's documentation (linked below) and fixes component declaration order in the shader program header (z and w were swapped). https://github.com/NVIDIA/open-gpu-doc/blob/master/Shader-Program-Header/Shader-Program-Header.html
2020-03-31clang-formatNguyen Dac Nam1-2/+1
2020-03-31shader_decode: fix by suggestionNguyen Dac Nam1-27/+22
2020-03-30clang-formatnamkazy1-3/+3
2020-03-30shader_decode: ATOM/ATOMS: add function to avoid code repetitionnamkazy2-70/+53
2020-03-30shader_decode: implement ATOM operation for S32 and U32Nguyen Dac Nam1-6/+39
2020-03-30clang-formatnamkazy1-3/+3
2020-03-30shader_decode: implement ATOMS instr partial.Nguyen Dac Nam1-10/+42
2020-03-30shader: node - update correct commentNguyen Dac Nam1-15/+15
2020-03-30shader_decode: add Atomic op for common usageNguyen Dac Nam1-1/+15
2020-03-28shader/lea: Simplify generated LEA codeReinUsesLisp1-3/+2
2020-03-27shader/lea: Fix op_a and op_b usagesReinUsesLisp1-2/+2
They were swapped.
2020-03-27shader/lea: Remove const and use move when possibleReinUsesLisp1-11/+5
2020-03-26shader/conversion: Fix F2F rounding operations with different sizesReinUsesLisp1-5/+10
Rounding operations only matter when the conversion size of source and destination is the same, i.e. .F16.F16, .F32.F32 and .F64.F64. When there is a mismatch (.F16.F32), these bits are used for IEEE rounding; we don't emulate this because GLSL and SPIR-V don't support configuring it per operation.
2020-03-23xmad: fix clang build errormakigumo1-4/+5
2020-03-16shader/shader_ir: Track usage in input attribute and of legacy varyingsReinUsesLisp2-34/+58
2020-03-16shader/shader_ir: Fix clip distance usage storesReinUsesLisp1-2/+1
2020-03-16shader/shader_ir: Change declare output attribute to a switchReinUsesLisp1-9/+9
2020-03-14clang-formatNguyen Dac Nam1-2/+1
2020-03-14nitNguyen Dac Nam1-1/+1
2020-03-13shader/transform_feedback: Expose buffer strideReinUsesLisp2-0/+2
2020-03-13shader/transform_feedback: Add host API friendly TFB builderReinUsesLisp2-0/+136
2020-03-13nit & remove some optional paramNguyen Dac Nam1-10/+11
2020-03-13shader_decode: implement XMAD mode CSfuNguyen Dac Nam1-9/+41
2020-03-13clang-formatNguyen Dac Nam1-4/+8
2020-03-13Apply suggestions from code reviewNguyen Dac Nam1-5/+5
Co-Authored-By: Mat M. <mathew1800@gmail.com>
2020-03-13shader_decode: BFE add ref of reverse parallel method.Nguyen Dac Nam1-0/+3
2020-03-13shader_decode: implement BREV on BFENguyen Dac Nam1-6/+25
Implements the parallel bit reversal described at: https://graphics.stanford.edu/~seander/bithacks.html#ReverseParallel
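For reference, the parallel bit reversal from that page looks like this when written as an ordinary C++ function. The commit builds the equivalent out of shader IR nodes; this standalone version only shows the arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Reverses the bit order of a 32-bit word in five swap steps.
constexpr std::uint32_t ReverseBits(std::uint32_t v) {
    v = ((v >> 1) & 0x55555555u) | ((v & 0x55555555u) << 1);  // swap adjacent bits
    v = ((v >> 2) & 0x33333333u) | ((v & 0x33333333u) << 2);  // swap bit pairs
    v = ((v >> 4) & 0x0F0F0F0Fu) | ((v & 0x0F0F0F0Fu) << 4);  // swap nibbles
    v = ((v >> 8) & 0x00FF00FFu) | ((v & 0x00FF00FFu) << 8);  // swap bytes
    v = (v >> 16) | (v << 16);                                // swap halves
    return v;
}
```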
2020-03-13node_helper: add IBitfieldExtract caseNguyen Dac Nam1-0/+2
2020-03-13shader_decode: Reimplement BFE instructionsNguyen Dac Nam1-25/+27
2020-03-09engines/maxwell_3d: Add TFB registers and store them in shader registryReinUsesLisp2-3/+12
2020-03-09shader/registry: Address feedbackReinUsesLisp2-12/+17
2020-03-09shader/registry: Cache tessellation stateReinUsesLisp2-2/+9
2020-03-09shader/registry: Store graphics and compute metadataReinUsesLisp3-36/+81
Store information that GLSL forces us to provide but that is dynamic state in hardware (workgroup sizes, primitive topology, shared memory size).
2020-03-09video_core: Rename "const buffer locker" to "registry"ReinUsesLisp9-50/+54
2020-03-09gl_shader_cache: Rework shader cache and remove post-specializationsReinUsesLisp4-28/+17
Instead of pre-specializing shaders and then post-specializing them, drop the latter and only "specialize" the shader while decoding it.
2020-02-29nit: move comment to right place.Nguyen Dac Nam1-2/+2
2020-02-28shader_decode: Fix LD, LDG when track constant bufferNguyen Dac Nam1-4/+12
2020-02-28shader_decode: keep it search on all codeNguyen Dac Nam1-4/+12
Fixes LD and LDG on Pokemon Sword, where the constant buffer could not be found. Not sure whether it helps visually.
2020-02-27shader: FMUL switch to using LUT (#3441)Nguyen Dac Nam1-19/+14
* shader: add FmulPostFactor LUT table
* shader: FMUL apply LUT
* Update src/video_core/engines/shader_bytecode.h
Co-Authored-By: Mat M. <mathew1800@gmail.com>
* nit: mistype
* clang-format & add missing import
* shader: remove post factor LUT.
* shader: move post factor LUT to function and fix incorrect order.
* clang-format
* shader: FMUL: add static to post factor LUT
* nit: typo
Co-authored-by: Mat M. <mathew1800@gmail.com>
2020-02-24shader: Simplify indexed sampler usagesReinUsesLisp1-1/+1
2020-02-21shader/texture: Fix illegal 3D texture assertReinUsesLisp1-1/+1
Fix typo in the illegal 3D texture assert logic. We care about catching arrayed 3D textures or 3D shadow textures, not regular 3D textures.
2020-02-21nit: add const to where it need.Nguyen Dac Nam1-14/+14
2020-02-21shader: implement LOP3 fast replace for old functionNguyen Dac Nam1-36/+58
ref: https://devtalk.nvidia.com/default/topic/1070081/cuda-programming-and-performance/reverse-lut-for-lop3-lut/
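As background, LOP3 computes an arbitrary three-input bitwise function selected by an 8-bit lookup table. The sketch below is a naive per-bit reference of that semantics, not the faster expression-based replacement this commit introduces; the function name is illustrative.

```cpp
#include <cassert>
#include <cstdint>

// For every bit position, the bits of a, b and c form a 3-bit index
// (a is the most significant) that selects one bit of the lookup table.
constexpr std::uint32_t Lop3(std::uint32_t a, std::uint32_t b, std::uint32_t c,
                             std::uint8_t lut) {
    std::uint32_t result = 0;
    for (unsigned bit = 0; bit < 32; ++bit) {
        const unsigned index = (((a >> bit) & 1u) << 2) | (((b >> bit) & 1u) << 1) |
                               ((c >> bit) & 1u);
        result |= ((lut >> index) & 1u) << bit;
    }
    return result;
}
```

With this indexing, the table for a given expression can be derived by evaluating it on the constants 0xF0, 0xCC and 0xAA; for example `0xF0 & 0xCC & 0xAA` gives 0x80 for a three-way AND.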
2020-02-19shader_conversion: I2F : add Assert for case src_size is ShortNguyen Dac Nam1-0/+3
2020-02-19fix warningNguyen Dac Nam1-1/+1
2020-02-19clang-format fixNguyen Dac Nam1-1/+1
2020-02-19shader_conversion: add conversion I2F for ShortNguyen Dac Nam1-9/+6
2020-02-15shader/texture: Allow 2D shadow arrays and simplify codeReinUsesLisp1-43/+28
Shadow sampler 2D arrays are supported on OpenGL, so there's no reason to forbid these. Enable textureLod usage on these. Minor style changes.
2020-02-05shader/decode: Fix constant buffer offsetsReinUsesLisp2-3/+3
Some instances were using cbuf34.offset instead of cbuf34.GetOffset(). This returned an invalid offset. Address those instances and rename offset to "shifted_offset" to avoid future bugs.
2020-02-02shader: Remove curly braces initializers on shared pointersReinUsesLisp5-12/+12
2020-02-02shader/shift: Implement SHIFT_RIGHT_{IMM,R}ReinUsesLisp1-26/+58
Shifts a pair of registers to the right and returns the low register.
2020-02-02shader/shift: Implement SHF_LEFT_{IMM,R}ReinUsesLisp1-10/+69
Shifts a pair of registers to the left and returns the high register.
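The two funnel shifts (this commit and the right-shift one listed just before it) can be sketched by treating the register pair as one 64-bit value. The function names are illustrative, and shift amounts are assumed to be below 64; clamping or wrapping of larger shift values is not modelled.

```cpp
#include <cassert>
#include <cstdint>

// Shifts the pair high:low right and returns the low 32 bits.
constexpr std::uint32_t FunnelShiftRight(std::uint32_t low, std::uint32_t high, unsigned shift) {
    const std::uint64_t pair = (std::uint64_t{high} << 32) | low;
    return static_cast<std::uint32_t>(pair >> shift);
}

// Shifts the pair high:low left and returns the high 32 bits.
constexpr std::uint32_t FunnelShiftLeft(std::uint32_t low, std::uint32_t high, unsigned shift) {
    const std::uint64_t pair = (std::uint64_t{high} << 32) | low;
    return static_cast<std::uint32_t>((pair << shift) >> 32);
}
```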
2020-01-29shader/other: Fix skips for SYNC and BRKReinUsesLisp1-2/+2
2020-01-29shader/other: Stub S2R LaneIdReinUsesLisp1-1/+4
2020-01-27shader/bfi: Implement register-constant buffer variantReinUsesLisp1-2/+5
It's the same as the variant that was implemented, but it takes the operands from another source.
2020-01-27shader/arithmetic: Implement FCMPReinUsesLisp1-1/+10
Compares the third operand with zero, then selects between the first and second.
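A minimal sketch of that select. The `Condition` enum covers only a hypothetical subset of comparison modes, and returning the first operand when the comparison passes is an assumption made for illustration.

```cpp
#include <cassert>

// Hypothetical subset of comparison conditions.
enum class Condition { LessThan, Equal, GreaterThan };

// Compares c against zero and selects a (comparison passes) or b (it fails).
inline float Fcmp(float a, float b, float c, Condition cond) {
    bool pass = false;
    switch (cond) {
    case Condition::LessThan:
        pass = c < 0.0f;
        break;
    case Condition::Equal:
        pass = c == 0.0f;
        break;
    case Condition::GreaterThan:
        pass = c > 0.0f;
        break;
    }
    return pass ? a : b;
}
```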
2020-01-26shader/memory: Implement ATOM.ADDReinUsesLisp2-2/+22
ATOM operates atomically on global memory. For now only add ATOM.ADD since that's what was found in commercial games. This asserts for ATOM.ADD.S32 (handling the others as unimplemented), although ATOM.ADD.U32 shouldn't be any different. This change forces us to change the default type on SPIR-V storage buffers from float to uint. We could also alias the buffers, but it's simpler for now to just use uint. While we are at it, abstract the code to avoid repetition.
2020-01-25Shader_IR: Address feedback.Fernando Sahmkow7-31/+33
2020-01-25shader/memory: Implement STL.S16 and STS.S16ReinUsesLisp1-3/+10
2020-01-25shader/memory: Implement unaligned LDL.S16 and LDS.S16ReinUsesLisp1-5/+3
2020-01-25shader/memory: Move unaligned load/store to functionsReinUsesLisp1-18/+27
2020-01-25shader/memory: Implement LDL.S16 and LDS.S16ReinUsesLisp1-12/+23
2020-01-24Shader_IR: Change name of TrackSampler function so it does not confuse with the type.Fernando Sahmkow3-7/+10
2020-01-24Shader_IR: Corrections, styling and extras.Fernando Sahmkow1-2/+4
2020-01-24Shader_IR: Propagate bindless index into the GL compiler.Fernando Sahmkow4-23/+53
2020-01-24Shader_IR: Implement Injectable Custom Variables to the IR.Fernando Sahmkow3-1/+34
2020-01-24Shader_IR: deduce size of indexed samplersFernando Sahmkow4-8/+60
2020-01-24Shader_IR: Setup Indexed Samplers on the IRFernando Sahmkow1-20/+46
2020-01-24Shader_IR: Implement initial code for tracking indexed samplers.Fernando Sahmkow4-0/+139
2020-01-24Shader_IR: Address FeedbackFernando Sahmkow2-25/+25
2020-01-24Shader_IR: Allow constant access of guest driver.Fernando Sahmkow1-1/+1
2020-01-24Shader_IR: Address FeedbackFernando Sahmkow2-17/+24
2020-01-24Shader_IR: Store Bound buffer on Shader UsageFernando Sahmkow2-0/+29
2020-01-24GPU: Implement guest driver profile and deduce texture handler sizes.Fernando Sahmkow4-0/+31
2020-01-16shader/memory: Implement ATOMS.ADD.U32ReinUsesLisp2-0/+21
2020-01-14control_flow: Silence -Wreorder warning for CFGRebuildStateLioncash1-1/+1
Organizes the initializer list in the same order that the variables would actually be initialized in.
2020-01-09shader_ir/texture: Simplify AOFFI codeReinUsesLisp1-10/+6
2020-01-09shader_ir/memory: Implement u16 and u8 for STG and LDGReinUsesLisp2-34/+52
Using the same technique we used for u8 on LDG, implement u16. In the case of STG, load memory and insert the value we want to set into it with bitfieldInsert. Then set that value.
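The read-modify-write step described above can be sketched on the CPU as follows. BitfieldInsert mirrors the semantics of GLSL's bitfieldInsert for widths below 32; StoreU16 is a hypothetical helper, and it assumes the address is aligned to 2 bytes.

```cpp
#include <cstdint>

// CPU model of GLSL bitfieldInsert(base, insert, offset, bits).
// Assumption: bits < 32 and offset + bits <= 32.
constexpr std::uint32_t BitfieldInsert(std::uint32_t base, std::uint32_t insert,
                                       std::uint32_t offset, std::uint32_t bits) {
    const std::uint32_t mask = ((1u << bits) - 1u) << offset;
    return (base & ~mask) | ((insert << offset) & mask);
}

// Hypothetical helper: returns the 32-bit word that should be written back
// after storing a 16-bit value at the given (2-byte aligned) address.
constexpr std::uint32_t StoreU16(std::uint32_t loaded_word, std::uint32_t address,
                                 std::uint16_t value) {
    const std::uint32_t bit_offset = (address & 3u) * 8u;
    return BitfieldInsert(loaded_word, value, bit_offset, 16);
}
```

Only the selected half of the word changes; the other half keeps the value that was loaded from memory.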
2020-01-04Shader_IR: Address FeedbackFernando Sahmkow3-11/+11
2020-01-04Shader_IR: Implement TXD Array.Fernando Sahmkow1-5/+12
This commit extends the compilation of TXD to support array samplers on TXD.
2019-12-30Shader_IR: add the ability to amend code in the shader ir.Fernando Sahmkow3-3/+39
This commit introduces a mechanism by which shader IR code can be amended and extended. This is useful for tracking algorithms where certain information can be derived from before the track, such as indexes to array samplers.
2019-12-20shader/p2r: Implement P2R PrReinUsesLisp1-1/+15
P2R dumps predicate or condition codes state to a register. This is useful for unit testing.
2019-12-20shader/r2p: Refactor P2R to support P2RReinUsesLisp1-16/+30
2019-12-18shader/memory: Implement LDG.U8 and unaligned U8 loadsReinUsesLisp1-6/+32
LDG can load single bytes instead of full integers or packs of integers. These have the advantage of loading bytes that are not aligned to 4 bytes. To emulate these this commit gets the byte being referenced (by doing "address & 3") and then uses that to extract the byte from the loaded integer: result = bitfieldExtract(loaded_integer, (address % 4) * 8, 8)
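The formula above can be checked with a small CPU-side model. BitfieldExtract below follows the unsigned form of GLSL's bitfieldExtract for widths below 32; LoadU8 is a hypothetical name used only for this sketch.

```cpp
#include <cstdint>

// CPU model of the unsigned GLSL bitfieldExtract(value, offset, bits).
// Assumption: bits < 32 and offset + bits <= 32.
constexpr std::uint32_t BitfieldExtract(std::uint32_t value, std::uint32_t offset,
                                        std::uint32_t bits) {
    return (value >> offset) & ((1u << bits) - 1u);
}

// Hypothetical helper: picks the addressed byte out of the aligned 32-bit
// word that was actually loaded.
constexpr std::uint32_t LoadU8(std::uint32_t loaded_integer, std::uint32_t address) {
    return BitfieldExtract(loaded_integer, (address % 4) * 8, 8);
}
```

For a loaded word of 0xAABBCCDD, an address ending in 0 selects 0xDD and an address ending in 3 selects 0xAA.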
2019-12-18shader/conversion: Implement byte selector in I2FReinUsesLisp1-2/+13
I2F's byte selector is used to choose what bytes to convert to float. e.g. if the input is 0xaabbccdd and the selector is ".B3" it will convert 0xaa. The default (when it's not shown in nvdisasm) is ".B0", in that example the default would convert 0xdd to float.
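The example in the description can be reproduced with a short sketch. The function name I2FByte is hypothetical, and the selected byte is treated as unsigned here, which is an assumption of this example.

```cpp
#include <cstdint>

// Hypothetical model of the I2F byte selector: selector 0 corresponds to
// ".B0" (lowest byte) and selector 3 to ".B3" (highest byte).
// Assumption: the selected byte is converted as an unsigned value.
constexpr float I2FByte(std::uint32_t input, unsigned selector) {
    return static_cast<float>((input >> (selector * 8)) & 0xFFu);
}
```

With the input 0xaabbccdd, selector 3 converts 0xaa (170.0f) and the default selector 0 converts 0xdd (221.0f).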
2019-12-18shader/texture: Properly shrink unused entries in size mismatchesReinUsesLisp1-4/+9
When an image format mismatches we were inserting zeroes to the texture itself. This was not handling cases where the mismatch uses fewer coordinates than the guest shader code. Address that by resizing the vector.
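A minimal sketch of the resizing idea, assuming coordinates are held in a std::vector: resize() both pads with zeroes when the bound texture needs more coordinates and drops the extra entries when it needs fewer. FitCoordinates and its parameters are hypothetical names, not the emulator's.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: makes the coordinate list match the number of
// dimensions of the texture that is actually bound.
std::vector<int> FitCoordinates(std::vector<int> coords, std::size_t bound_dimensions) {
    // Grows with zero-filled entries or shrinks by discarding the tail.
    coords.resize(bound_dimensions, 0);
    return coords;
}
```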
2019-12-16shader/texture: Implement TLD4.PTPReinUsesLisp3-19/+61
2019-12-16shader/texture: Enable arrayed TLD4ReinUsesLisp1-1/+0
2019-12-16shader/texture: Implement AOFFI for TLD4SReinUsesLisp1-13/+18
2019-12-16shader/texture: Remove unnecessary parenthesesReinUsesLisp1-2/+2
2019-12-12Shader_IR: Correct TLD4S Depth Compare.Fernando Sahmkow1-5/+12
2019-12-12Shader_Ir: Correct TLD4S encoding and implement f16 flag.Fernando Sahmkow2-10/+13
2019-12-12Shader_Ir: default failed tracks on bindless samplers to null values.Fernando Sahmkow2-24/+77
2019-12-10shader: Implement MEMBAR.GLReinUsesLisp2-0/+8
Implement using memoryBarrier in GLSL and OpMemoryBarrier on SPIR-V.
2019-12-10shader_ir/other: Implement S2R InvocationIdReinUsesLisp2-0/+3
2019-12-10shader: Keep track of shaders using warp instructionsReinUsesLisp2-0/+8
2019-12-10shader_ir/memory: Implement patch storesReinUsesLisp3-19/+36
2019-11-27video_core/const_buffer_locker: Make use of std::tie in HasEqualKeys()Lioncash1-2/+3
Tidies it up a little bit visually.
2019-11-27video_core/const_buffer_locker: Remove unused includesLioncash2-2/+2
2019-11-27video_core/const_buffer_locker: Remove #pragma once from cpp fileLioncash1-2/+0
Silences a compiler warning.
2019-11-23video_core: Unify ProgramType and ShaderStage into ShaderTypeReinUsesLisp2-1/+3
2019-11-23shader/texture: Handle TLDS texture type mismatchesReinUsesLisp1-1/+10
Some games like "Fire Emblem: Three Houses" bind 2D textures to offsets used by instructions of 1D textures. To handle the discrepancy this commit uses the texture type from the binding and modifies the emitted code IR to build a valid backend expression. E.g.: Bound texture is 2D and instruction is 1D, the emitted IR samples a 2D texture in the coordinate ivec2(X, 0).
2019-11-23shader/texture: Deduce texture buffers from lockerReinUsesLisp3-69/+60
Instead of specializing shaders to separate texture buffers from 1D textures, use the locker to deduce them while they are being decoded.
2019-11-20shader/other: Reduce DEPBAR log severityReinUsesLisp1-1/+1
While DEPBAR is stubbed it doesn't change anything from our end. Shading languages handle what this instruction does implicitly. We are not getting anything out of this log except noise.
2019-11-18Shader_IR: Address FeedbackFernando Sahmkow2-10/+8
2019-11-14Shader_IR: Implement TXD instruction.Fernando Sahmkow2-7/+51
2019-11-14Shader_IR: Implement FLO instruction.Fernando Sahmkow2-0/+20
2019-11-08video_core: Silence implicit conversion warningsReinUsesLisp2-5/+5
2019-11-08shader_ir/warp: Implement FSWZADDReinUsesLisp2-0/+10
2019-11-08gl_shader_decompiler: Reimplement shuffles with platform agnostic intrinsicsReinUsesLisp2-42/+37
2019-11-07shader/control_flow: Specify constness on caller lambdasRodrigo Locatti1-11/+12
Update src/video_core/shader/control_flow.cpp Co-Authored-By: Mat M. <mathew1800@gmail.com>
2019-11-07shader/control_flow: Use callable template instead of std::functionReinUsesLisp1-6/+5
2019-11-07shader/control_flow: Abstract repeated code chunks in BRX trackingReinUsesLisp1-93/+101
Move copied-and-pasted for loops into a common templated function.
2019-11-07shader/control_flow: Silence Intellisense cast warningsReinUsesLisp1-1/+1
2019-11-07shader/control_flow: Remove brace initializer in std containersReinUsesLisp1-9/+9
These containers have a default constructor.
2019-11-07shader/decode: Reduce severity of arithmetic rounding warningsReinUsesLisp6-15/+17
2019-11-07shader/arithmetic: Reduce RRO stub severityReinUsesLisp1-1/+2
2019-11-07shader/texture: Remove NODEP warningsReinUsesLisp1-35/+0
These warnings don't offer meaningful information while decoding shaders. Remove them.
2019-10-31Shader_IR: Fix regression on TLD4Fernando Sahmkow2-5/+4
Originally on the last commit I thought TLD4 acted the same as TLD4S and didn't have a mask. It actually does have a component mask. This commit corrects that.
2019-10-30Shader_IR: Fix TLD4 and add Bindless Variant.Fernando Sahmkow2-10/+26
This commit fixes an issue where not all 4 results of tld4 were being written, the color component was defaulted to red, among other things. It also implements the bindless variant.
2019-10-30shader/node: Unpack bindless texture encodingReinUsesLisp4-122/+100
Bindless textures were using u64 to pack the buffer and offset from where they come from. Drop this in favor of separated entries in the struct. Remove the usage of std::set in favor of std::list (it's not std::vector to avoid reference invalidations) for samplers and images.
2019-10-26Shader_IR: Address Feedback.Fernando Sahmkow7-52/+59
2019-10-25gl_shader_cache: Implement locker variants invalidationReinUsesLisp2-12/+19
2019-10-25gl_shader_disk_cache: Store and load fast BRXReinUsesLisp1-2/+2
2019-10-25const_buffer_locker: Minor style changesReinUsesLisp2-152/+76
2019-10-25gl_shader_decompiler: Move entries to a separate functionReinUsesLisp7-32/+29
2019-10-25Shader_IR: Implement Fast BRX and allow multi-branches in the CFG.Fernando Sahmkow1-1/+1
2019-10-25Shader_IR: Correct typo in Consistent method.Fernando Sahmkow2-2/+2
2019-10-25Shader_IR: allow lookup of texture samplers within the shader_ir for instructions that don't provide itFernando Sahmkow4-42/+212
2019-10-25Shader_IR: Implement Fast BRX and allow multi-branches in the CFG.Fernando Sahmkow5-130/+246
2019-10-25Shader_Cache: setup connection of ConstBufferLockerFernando Sahmkow5-12/+22
2019-10-25VideoCore: Unify const buffer accessing along engines and provide ConstBufferLocker class to shaders.Fernando Sahmkow3-0/+123
2019-10-25Shader_IR: Implement BRX tracking.Fernando Sahmkow1-0/+113
2019-10-24shader_ir: Use std::array with pair instead of unordered_mapLioncash1-53/+67
Given the overall size of the maps are very small, we can use arrays of pairs here instead of always heap allocating a new map every time the functions are called. Given the small size of the maps, the difference in container lookups are negligible, especially given the entries are already sorted.
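The container swap described above might look roughly like the following. The table contents, kComponentNames and Lookup are all hypothetical; the point is only that a small constexpr std::array of pairs needs no heap allocation and a linear scan is cheap at this size.

```cpp
#include <algorithm>
#include <array>
#include <optional>
#include <string_view>
#include <utility>

// Hypothetical lookup table; built at compile time, no heap allocation.
constexpr std::array<std::pair<int, std::string_view>, 3> kComponentNames{{
    {0, "x"},
    {1, "y"},
    {2, "z"},
}};

// Linear search over the small table; returns nullopt for unknown keys.
std::optional<std::string_view> Lookup(int key) {
    const auto it = std::find_if(kComponentNames.begin(), kComponentNames.end(),
                                 [key](const auto& entry) { return entry.first == key; });
    if (it == kComponentNames.end()) {
        return std::nullopt;
    }
    return it->second;
}
```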
2019-10-24video_core/shader: Resolve instances of variable shadowingLioncash6-11/+12
Silences a few -Wshadow warnings.
2019-10-22Shader_Ir: Fix TLD4S from using a component mask.Fernando Sahmkow2-5/+5
TLD4S always outputs 4 values, the previous code checked a component mask and omitted those values that weren't part of it. This commit corrects that and makes sure all 4 values are set.
2019-10-22shader_ir/memory: Ignore global memory when tracking failsReinUsesLisp2-18/+26
Instead of invoking undefined behaviour when constant buffer tracking fails and we are blasting through asserts, ignore the global memory operation. In the case of LDG this means filling the destination registers with zeroes; for STG this means ignoring the instruction as a whole. The default behaviour is still to abort execution on failure.
2019-10-18video_core/shader/ast: Make ShowCurrentState() and SanityCheck() const member functionsLioncash2-5/+5
These can also trivially be made const member functions, with the addition of a few consts.
2019-10-18video_core/shader/ast: Make ASTManager::Print a const member functionLioncash2-3/+3
Given all visiting functions never modify the nodes, we can trivially make this a const member function.
2019-10-18video_core/shader/ast: Make ExprPrinter members privateLioncash1-1/+2
This member already has an accessor, so there's no need for it to be public.
2019-10-18video_core/shader/ast: Make Indent() return a string_viewLioncash1-14/+24
The returned string is simply a substring of our constexpr tabs string_view, so we can just use a string_view here as well, since the original string_view is guaranteed to always exist. Now the function is fully non-allocating.
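A sketch of the non-allocating approach described above. The name kTabs, the width of two spaces per level and the clamping to the available length are assumptions of this example, not details taken from the commit.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>

// Backing storage with static lifetime, so views into it never dangle.
constexpr std::string_view kTabs = "                                ";

// Returns a view of the leading part of kTabs; no std::string is created.
// Assumption: two spaces per scope level, clamped to the size of kTabs.
constexpr std::string_view Indent(std::size_t scope) {
    return kTabs.substr(0, std::min(scope * 2, kTabs.size()));
}
```

Because the result is only a pointer and a length into a constant, callers can format it directly without any allocation.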
2019-10-18video_core/shader/ast: Make Indent() privateLioncash1-9/+9
It's never used outside of this class, so we can narrow its scope down.
2019-10-18video_core/shader/ast: Rename Ident() to Indent()Lioncash1-13/+13
This can be confusing, given "ident" is generally used as a shorthand for "identifier".
2019-10-18video_core/shader/ast: Make use of fmt where applicableLioncash1-14/+14
Makes a few strings nicer to read and also eliminates a bit of string churn with operator+.
2019-10-16control_flow: Silence truncation warningsLioncash2-4/+4
This can be trivially fixed by making the input size a size_t. CFGRebuildState's constructor parameter is already a std::size_t, so this just makes the size type fully conform with it.
2019-10-16shader/node: std::move Meta instance within OperationNode constructorLioncash1-1/+1
Allows usages of the constructor to avoid an unnecessary copy.
2019-10-07shader/half_set_predicate: Fix HSETP2 for constant buffersReinUsesLisp1-0/+2
HSETP2 when used with a constant buffer parses the second operand type as F32. This is not configurable.
2019-10-07shader/half_set_predicate: Reduce DEBUG_ASSERT to LOG_DEBUGReinUsesLisp1-1/+2
2019-10-05video_core/control_flow: Eliminate variable shadowing warningsLioncash1-6/+6
2019-10-05video_core/control_flow: Eliminate pessimizing movesLioncash1-5/+8
These can inhibit the ability of a compiler to perform RVO.
2019-10-05video_core/ast: Unindent most of IsFullyDecompiled() by one levelLioncash1-12/+12
2019-10-05video_core/ast: Make ShowCurrentState() take a string_view instead of std::stringLioncash2-2/+2
Allows the function to be non-allocating in terms of the output string.
2019-10-05video_core/ast: Eliminate variable shadowing warningsLioncash1-3/+3
2019-10-05video_core/ast: Replace std::string with a constexpr std::string_viewLioncash1-3/+1
Same behavior, but without the need to heap allocate
2019-10-05video_core/ast: Default the move constructor and assignment operatorLioncash2-26/+2
This is behaviorally equivalent and also fixes a bug where some members weren't being moved over.
2019-10-05video_core/{ast, expr}: Organize forward declarationLioncash2-10/+10
Keeps them alphabetically sorted for readability.
2019-10-05video_core/expr: Supply operator!= along with operator==Lioncash2-1/+32
Provides logical symmetry to the interface.
2019-10-05video_core/{ast, expr}: Use std::move where applicableLioncash4-45/+47
Avoids unnecessary atomic reference count increments and decrements.
2019-10-05video_core/ast: Supply const accessors for data where applicableLioncash2-37/+41
Provides const equivalents of data accessors for use within const contexts.
2019-10-05Shader_ir: Address feedbackFernando Sahmkow4-50/+14
2019-10-05Shader_Ir: Address Feedback and clang format.Fernando Sahmkow3-43/+50
2019-10-05Shader_IR: clean up AST handling and add documentation.Fernando Sahmkow1-2/+6
2019-10-05Shader_IR: Correct OutwardMoves for IfsFernando Sahmkow1-22/+11
2019-10-05Shader_IR: corrections and clang-formatFernando Sahmkow2-70/+64
2019-10-05Shader_IR: allow else derivation to be optional.Fernando Sahmkow6-8/+14
2019-10-05vk_shader_compiler: Implement the decompiler in SPIR-VFernando Sahmkow2-1/+25
2019-10-05Shader_IR: mark labels as unused for partial decompile.Fernando Sahmkow2-3/+9
2019-10-05Shader_Ir: Refactor Decompilation process and allow multiple decompilation modes.Fernando Sahmkow10-74/+307
2019-10-05gl_shader_decompiler: Implement AST decompilingFernando Sahmkow10-34/+116
2019-10-05shader_ir: Declare Manager and pass it to appropriate programs.Fernando Sahmkow7-104/+214
2019-10-05shader_ir: Corrections to outward movements and misc stuffsFernando Sahmkow5-58/+305
2019-10-05shader_ir: Add basic goto eliminationFernando Sahmkow2-38/+484
2019-10-05shader_ir: Initial Decompile SetupFernando Sahmkow5-5/+507
2019-09-21gl_shader_decompiler: Use uint for images and fix SUATOMReinUsesLisp3-69/+52
In the process remove implementation of SUATOM.MIN and SUATOM.MAX as these require a distinction between U32 and S32. These have to be implemented with an imageAtomicCompSwap loop.
2019-09-21shader/image: Implement SULD and remove irrelevant codeReinUsesLisp2-25/+52
* Implement SULD as float. * Remove conditional declaration of GL_ARB_shader_viewport_layer_array.
2019-09-21Shader_IR: ICMP corrections and fixesFernando Sahmkow1-6/+9
2019-09-20Shader_IR: Implement ICMP.Fernando Sahmkow1-0/+26
2019-09-19VideoCore: Corrections to the MME Inliner and removal of hacky instance management.Fernando Sahmkow2-0/+22
2019-09-17shader_ir/warp: Implement SHFLReinUsesLisp2-0/+57
2019-09-11shader/image: Implement SUATOM and fix SUSTReinUsesLisp3-37/+122
2019-09-06gl_shader_decompiler: Keep track of written images and mark them as modifiedReinUsesLisp3-42/+54
2019-09-06kepler_compute: Implement texture queriesReinUsesLisp1-0/+4
2019-09-05shader_ir: Implement LD_SReinUsesLisp1-10/+13
Loads from shared memory.
2019-09-05shader_ir: Implement ST_SReinUsesLisp4-11/+45
This instruction writes to a memory buffer shared with threads within the same work group. It is known as "shared" memory in GLSL.
2019-09-04shader/shift: Implement SHR wrapped and clamped variantsReinUsesLisp1-6/+13
Nvidia defaults to wrapped shifts, but this is undefined behaviour on OpenGL's spec. Explicitly mask/clamp according to what the guest shader requires.
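The difference between the two variants can be sketched for unsigned values as follows. Both function names are hypothetical; masking to five bits for the wrapped form and returning zero for shifts of 32 or more in the clamped form are assumptions of this example.

```cpp
#include <cstdint>

// Wrapped: only the low five bits of the shift amount are used.
constexpr std::uint32_t ShrWrapped(std::uint32_t value, std::uint32_t shift) {
    return value >> (shift & 31u);
}

// Clamped (assumed semantics): shifts of 32 or more shift every bit out.
constexpr std::uint32_t ShrClamped(std::uint32_t value, std::uint32_t shift) {
    return shift >= 32u ? 0u : value >> shift;
}
```

Either way the host shift amount stays inside 0..31, which avoids relying on behaviour the OpenGL specification leaves undefined.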
2019-09-04half_set_predicate: Fix predicate assignmentsReinUsesLisp1-10/+9
2019-08-30video_core: Silent miscellaneous warnings (#2820)Rodrigo Locatti5-5/+0
* texture_cache/surface_params: Remove unused local variable * rasterizer_interface: Add missing documentation commentary * maxwell_dma: Remove unused rasterizer reference * video_core/gpu: Sort member declaration order to silent -Wreorder warning * fermi_2d: Remove unused MemoryManager reference * video_core: Silent unused variable warnings * buffer_cache: Silent -Wreorder warnings * kepler_memory: Remove unused MemoryManager reference * gl_texture_cache: Add missing override * buffer_cache: Add missing include * shader/decode: Remove unused variables
2019-08-28shader_ir/conversion: Split int and float selector and implement F2F H1ReinUsesLisp1-18/+16
2019-08-28shader_ir/conversion: Implement F2I F16 Ra.H1ReinUsesLisp1-4/+16
2019-08-28float_set_predicate: Add missing negation bit for the second operandReinUsesLisp1-4/+5
2019-08-21shader_ir: Implement VOTEReinUsesLisp4-0/+62
Implement VOTE using Nvidia's intrinsics (documented at https://developer.nvidia.com/reading-between-threads-shader-intrinsics). Instead of using portable ARB instructions I opted to use Nvidia intrinsics because these are the closest we have to how Tegra X1 hardware renders. To stub VOTE on non-Nvidia drivers (including nouveau) this commit simulates a GPU with a warp size of one, returning what is meaningful for the instruction being emulated: anyThreadNV(value) -> value; allThreadsNV(value) -> value; allThreadsEqualNV(value) -> true. ballotARB, also known as "uint64_t(activeThreadsNV())", emits "VOTE.ANY Rd, PT, PT;" on nouveau's compiler. This doesn't exactly match Nvidia's code, "VOTE.ALL Rd, PT, PT;", which is emulated with activeThreadsNV() by this commit. In theory this shouldn't really matter since .ANY, .ALL and .EQ affect the predicates (set to PT in those cases) and not the registers.
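The warp-size-one stub described above reduces to three trivial functions. The C++ names below are hypothetical stand-ins for the shader intrinsics and only model the return values listed in the description.

```cpp
// With a single thread per warp, "any thread" and "all threads" both reduce
// to the calling thread's own value.
constexpr bool AnyThread(bool value) {
    return value;
}

constexpr bool AllThreads(bool value) {
    return value;
}

// A single thread always agrees with itself.
constexpr bool AllThreadsEqual(bool /*value*/) {
    return true;
}
```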
2019-08-04shader_ir: Implement NOPReinUsesLisp1-0/+6
2019-08-04half_set_predicate: Fix HSETP2_C constant buffer offsetReinUsesLisp1-1/+1
2019-07-26decode/half_set_predicate: Fix predicatesReinUsesLisp1-3/+3
2019-07-22shader/decode: Implement S2R TicReinUsesLisp3-0/+15
2019-07-20Shader_Ir: Implement F16 Variants of F2F, F2I, I2F.Fernando Sahmkow2-16/+39
This commit takes care of implementing the F16 Variants of the conversion instructions and makes sure conversions are done.
2019-07-20Shader_Ir: Change Debug Asserts for Log WarningsFernando Sahmkow3-10/+17
2019-07-20shader/half_set_predicate: Fix HSETP2 implementationReinUsesLisp2-19/+15
2019-07-20shader/half_set_predicate: Implement missing HSETP2 variantsReinUsesLisp1-13/+29
2019-07-19video_core/control_flow: Provide operator!= for types with operator==Lioncash1-4/+21
Provides operational symmetry for the respective structures.
2019-07-19video_core/control_flow: Prevent sign conversion in TryGetBlock()Lioncash1-1/+1
The return value is a u32, not an s32, so this would result in an implicit signedness conversion.
2019-07-19video_core/control_flow: Remove unnecessary BlockStack copy constructorLioncash1-2/+1
This is the default behavior of the copy constructor, so it doesn't need to be specified. While we're at it we can make the other non-default constructor explicit.
2019-07-19video_core/control_flow: Use std::move where applicableLioncash1-10/+15
Results in less work being done where avoidable.
2019-07-19video_core/control_flow: Use the prefix variant of operator++ for iteratorsLioncash1-2/+2
Same thing, but potentially allows a standard library implementation to pick a more efficient codepath.
2019-07-19video_core/control_flow: Use empty() member function for checking emptinessLioncash1-2/+2
It's what it's there for.
2019-07-19video_core: Resolve -Wreorder warningsLioncash1-1/+1
Ensures that the constructor members are always initialized in the order that they're declared in.
2019-07-19video_core/control_flow: Make program_size for ScanFlow() a std::size_tLioncash2-5/+4
Prevents a truncation warning from occurring with MSVC. Also the internal data structures already treat it as a size_t, so this is just a discrepancy in the interface.
2019-07-19video_core/control_flow: Place all internally linked types/functions within an anonymous namespaceLioncash1-1/+2
Previously, quite a few functions were being linked with external linkage.
2019-07-19video_core/shader/decode: Prevent sign-conversion warningsLioncash1-2/+2
Makes it explicit that the conversions here are intentional.
2019-07-18Shader_Ir: correct clang formatFernando Sahmkow1-2/+2
2019-07-18Shader_Ir: Downgrade precision and rounding asserts to debug asserts.Fernando Sahmkow5-10/+10
This commit reduces the severity of asserts for FP precision and rounding, as these are well known and have little to no consequence for the GPU's accuracy.
2019-07-17shader_ir: std::move Node instance where applicableLioncash4-60/+67
These are std::shared_ptr instances underneath the hood, which means copying them isn't as cheap as a regular pointer. Particularly so on weakly-ordered systems. This avoids atomic reference count increments and decrements where they aren't necessary for the core set of operations.
2019-07-17shader_ir: Rename Get/SetTemporal to Get/SetTemporaryLioncash5-36/+36
This is more accurate in terms of describing what the functions are actually doing. Temporal relates to time, not the setting of a temporary itself.
2019-07-17shader_ir: Remove unused includesLioncash1-3/+0
Removes unnecessary header dependencies.
2019-07-16Shader_Ir: Correct tracking to track from right to leftFernando Sahmkow1-2/+2
2019-07-16shader/decode/other: Correct branch indirect argument within BRA handlingLioncash1-1/+1
This appears to have been a copy/paste error introduced within 8a6fc529a968e007f01464abadd32f9b5eb0a26c
2019-07-15shader: Allow tracking of indirect buffers without variable offsetReinUsesLisp6-35/+36
While changing this code, simplify tracking code to allow returning the base address node, this way callers don't have to manually rebuild it on each invocation.
2019-07-09shader_ir: Add comments on missing instruction.Fernando Sahmkow2-2/+9
Also shows Nvidia's address space on comments.
2019-07-09shader_ir: limit exploration to best known program size.Fernando Sahmkow1-1/+1
2019-07-09control_flow: Correct block breaking algorithm.Fernando Sahmkow1-17/+17
2019-07-09control_flow: Assert shaders bigger than limit.Fernando Sahmkow1-0/+2
2019-07-09control_flow: Address feedback.Fernando Sahmkow1-89/+37
2019-07-09shader_ir: Correct parsing of scheduling instructions and correct sizingFernando Sahmkow2-13/+30
2019-07-09shader_ir: Correct max sizingFernando Sahmkow2-2/+2
2019-07-09shader_ir: Remove unnecessary constructors and use optional for ScanFlow resultFernando Sahmkow3-28/+17
2019-07-09shader_ir: Corrections, documenting and asserting control_flowFernando Sahmkow3-52/+54
2019-07-09shader_ir: Unify blocks in decompiled shaders.Fernando Sahmkow6-54/+79
2019-07-09shader_ir: Decompile Flow StackFernando Sahmkow4-11/+206
2019-07-09shader_ir: propagate shader size to the IRFernando Sahmkow3-6/+7
2019-07-09shader_ir: Implement BRX & BRA.CCFernando Sahmkow3-4/+42
2019-07-09shader_ir: Remove the old scanner.Fernando Sahmkow2-77/+0
2019-07-09shader_ir: Implement a new shader scannerFernando Sahmkow3-16/+471
2019-07-08gl_shader_decompiler: Implement gl_ViewportIndex and gl_Layer in vertex shadersReinUsesLisp2-0/+31
This commit implements gl_ViewportIndex and gl_Layer in vertex and geometry shaders. In the case it's used in a vertex shader, it requires ARB_shader_viewport_layer_array. This extension is available on AMD and Nvidia devices (mesa and proprietary drivers), but not available on Intel on any platform. At the moment of writing this description I don't know if this is a hardware limitation or a driver limitation. In the case that ARB_shader_viewport_layer_array is not available, writes to these registers on a vertex shader are ignored, with the appropriate logging.
2019-07-07Delete decode_integer_set.cppTobias1-0/+0
2019-07-07shader/texture: Add F16 support for TLDSReinUsesLisp1-1/+7
2019-06-24decode/texture: Address feedbackReinUsesLisp1-0/+1
2019-06-21texture_cache: Style and CorrectionsFernando Sahmkow1-1/+2
2019-06-21shader_ir: Fix image copy rebase issuesFernando Sahmkow1-2/+7
2019-06-21shader: Implement bindless imagesReinUsesLisp3-2/+40
2019-06-21shader: Decode SUST and implement backing image functionalityReinUsesLisp4-1/+140
2019-06-21shader: Implement texture buffersReinUsesLisp2-0/+46
2019-06-07shader: Split SSY and PBK stackReinUsesLisp2-11/+14
Hardware testing revealed that SSY and PBK push to a different stack, allowing code like this: SSY label1; PBK label2; SYNC; label1: PBK; label2: EXIT;
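A small model of the finding above, assuming SYNC pops the SSY stack and BRK pops the PBK stack. FlowStacks and its member names are hypothetical and exist only for this sketch.

```cpp
#include <cstdint>
#include <stack>

// Hypothetical model: SSY and PBK targets live on two independent stacks.
struct FlowStacks {
    std::stack<std::uint32_t> ssy_stack;
    std::stack<std::uint32_t> pbk_stack;

    void Ssy(std::uint32_t target) { ssy_stack.push(target); }
    void Pbk(std::uint32_t target) { pbk_stack.push(target); }

    // Assumption: SYNC jumps to the most recent SSY target.
    std::uint32_t Sync() {
        const std::uint32_t target = ssy_stack.top();
        ssy_stack.pop();
        return target;
    }

    // Assumption: BRK jumps to the most recent PBK target.
    std::uint32_t Brk() {
        const std::uint32_t target = pbk_stack.top();
        pbk_stack.pop();
        return target;
    }
};
```

With a single shared stack, a SYNC issued after "SSY label1; PBK label2" would pop label2; with split stacks it reaches label1 and the PBK target stays available for a later BRK.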
2019-06-07shader/node: Minor changesReinUsesLisp1-50/+54
Reflect std::shared_ptr nature of Node on initializers and remove constant members in nodes. Add some commentaries.
2019-06-07shader: Move Node declarations out of the shader IR headerReinUsesLisp3-493/+517
Analysis passes do not have a good reason to depend on shader_ir.h to work on top of nodes. This splits node-related declarations to their own file and leaves the IR in shader_ir.h
2019-06-06shader: Use shared_ptr to store nodes and move initialization to fileReinUsesLisp32-192/+238
Instead of having a vector of unique_ptr and returning raw pointers to its elements, use shared_ptr. While changing initialization code, move it to a separate file when possible. This is a first step to allow code analysis and node generation beyond the ShaderIR class.
2019-05-23shader/shader_ir: Make Comment() take a std::string by valueLioncash2-3/+3
This allows for forming comment nodes without making unnecessary copies of the std::string instance. e.g. previously: Comment(fmt::format("Base address is c[0x{:x}][0x{:x}]", cbuf->GetIndex(), cbuf_offset)); Would result in a copy of the string being created, as CommentNode() takes a std::string by value (a const ref passed to a value parameter results in a copy). Now, only one instance of the string is ever moved around: fmt::format returns a std::string, and since it's returned from a function by value, this is a prvalue (which can be treated like an rvalue), so it's moved into Comment's string parameter; we then move it into the CommentNode constructor, which then moves the string into its member variable.
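The by-value "sink parameter" pattern described above can be sketched like this. The names Comment and CommentNode come from the commit message, but the class body, the GetText accessor and the free-function form are assumptions made for illustration.

```cpp
#include <string>
#include <utility>

// Simplified stand-in for the real node type.
class CommentNode {
public:
    // Takes the string by value and moves it into the member.
    explicit CommentNode(std::string text_) : text{std::move(text_)} {}

    const std::string& GetText() const {
        return text;
    }

private:
    std::string text;
};

// Takes the string by value so a temporary is moved in rather than copied,
// then forwards it with std::move instead of copying it again.
CommentNode Comment(std::string text) {
    return CommentNode(std::move(text));
}
```

Passing a temporary, such as the result of a formatting call, now involves only moves; passing an lvalue still costs exactly one copy at the call site.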
2019-05-23shader/decode/*: Add missing newline to files lacking themLioncash18-18/+18
Keeps the shader code file endings consistent.
2019-05-23shader/decode/*: Eliminate indirect inclusionsLioncash6-1/+5
Amends cases where we were using things that were indirectly being satisfied through other headers. This way, if those headers change and eliminate dependencies on other headers in the future, we don't have cascading compilation errors.
2019-05-22shader/decode/memory: Remove left in debug pragmaLioncash1-2/+0
2019-05-21shader/memory: Implement ST (generic memory)ReinUsesLisp1-21/+35
2019-05-21shader/memory: Implement LD (generic memory)ReinUsesLisp2-11/+23
2019-05-20shader: Implement S2R Tid{XYZ} and CtaId{XYZ}ReinUsesLisp2-15/+35
2019-05-19shader/shader_ir: Remove unnecessary inline specifiersLioncash1-2/+2
constexpr internally links by default, so the inline specifier is unnecessary.
2019-05-19shader/shader_ir: Simplify constructors for OperationNodeLioncash1-15/+6
Many of these constructors don't even need to be templated. The only ones that need to be templated are the ones that actually make use of the parameter pack. Even then, since std::vector accepts an initializer list, we can supply the parameter pack directly to it instead of creating our own copy of the list, then copying it again into the std::vector.
2019-05-19shader/shader_ir: Remove unnecessary template parameter packs from Operation() overloads where applicableLioncash1-2/+0
These overloads don't actually make use of the parameter pack, so they can be turned into regular non-template function overloads.
2019-05-19shader/shader_ir: Mark tracking functions as const member functionsLioncash2-8/+11
These don't actually modify instance state, so they can be marked as const member functions
2019-05-19shader/shader_ir: Place implementations of constructor and destructor in cpp fileLioncash2-5/+9
Given the class contains quite a lot of non-trivial types, place the constructor and destructor within the cpp file to avoid inlining construction and destruction code everywhere the class is used.
2019-05-10video_core/shader/decode/texture: Remove unused variable from GetTld4Code()Lioncash1-1/+0
2019-05-04shader/decode/texture: Remove unused variableLioncash1-1/+0
This isn't used anywhere, so we can get rid of it.
2019-05-03shader_ir/other: Implement IPA.IDXReinUsesLisp1-5/+8
2019-05-03shader_ir/memory: Assert on non-32 bits ALD.PHYSReinUsesLisp1-0/+3
2019-05-03shader: Add physical attributes commentariesReinUsesLisp3-4/+6
2019-05-03gl_shader_decompiler: Implement GLSL physical attributesReinUsesLisp1-1/+1
2019-05-03shader_ir/memory: Implement physical input attributesReinUsesLisp3-6/+28
2019-05-03shader: Remove unused AbufNode Ipa modeReinUsesLisp4-29/+10
2019-05-03shader_ir/memory: Emit AL2P IRReinUsesLisp2-0/+22
2019-04-26shader_ir: Move Sampler index entry in operand< to sort declarationsReinUsesLisp1-2/+2
2019-04-26shader_ir: Add missing entry to Sampler operand< comparisonReinUsesLisp1-2/+3
2019-04-26shader_ir/texture: Fix sampler const buffer key shiftReinUsesLisp1-1/+1
2019-04-21Corrections Half Float operations on const buffers and implement saturation.Fernando Sahmkow2-15/+16
2019-04-18video_core: Silent -Wswitch warningsReinUsesLisp4-9/+16
2019-04-16shader_ir/decode: Fix half float pre-operations and remove MetaHalfArithmeticReinUsesLisp7-52/+42
Operations done before the main half float operation (like HAdd) were managing a packed value instead of the unpacked one. Adding an unpacked operation allows us to drop the per-operand MetaHalfArithmetic entry, simplifying the code overall.
2019-04-16shader_ir/decode: Implement half float saturationReinUsesLisp3-4/+14
2019-04-16shader_ir/decode: Reduce severity of unimplemented half-float FTZReinUsesLisp3-3/+9
2019-04-16renderer_opengl: Implement half float NaN comparisonsReinUsesLisp2-18/+17
2019-04-16shader_ir: Avoid using static on heap-allocated objectsReinUsesLisp1-5/+4
Using static here might be faster at runtime, but it adds a heap allocation called before main.
2019-04-16Do some corrections in conversion shader instructions.Fernando Sahmkow1-16/+53
Corrects encodings for I2F, F2F, I2I and F2I. Implements Immediate variants of all four conversion types. Adds assertions for unimplemented features.
2019-04-14shader_ir: Implement STG, keep track of global memory usage and flushReinUsesLisp2-38/+87
2019-04-08Correct XMAD mode, psl and high_b on different encodings.Fernando Sahmkow1-9/+30
2019-04-08Adapt Bindless to work with AOFFIFernando Sahmkow1-7/+18
2019-04-08Move ConstBufferAccessor to Maxwell3d, correct mistakes and clang format.Fernando Sahmkow2-3/+4
2019-04-08Fix TMMLFernando Sahmkow1-5/+7
2019-04-08Refactor GetTextureCode and GetTexCode to use an optional instead of optional parametersFernando Sahmkow2-34/+33
2019-04-08Implement TXQ_BFernando Sahmkow1-2/+8
2019-04-08Implement TMML_BFernando Sahmkow1-5/+10
2019-04-08Corrections to TEX_BFernando Sahmkow1-4/+5
2019-04-08Implement Bindless Handling on SetupTextureFernando Sahmkow1-4/+3
2019-04-08Unify both sampler types.Fernando Sahmkow2-18/+40
2019-04-08Implement Bindless Samplers and TEX_B in the IR.Fernando Sahmkow2-15/+74
2019-04-03shader_ir/memory: Reduce severity of LD_L cache management and log itReinUsesLisp1-2/+2
2019-04-03shader_ir/memory: Reduce severity of ST_L cache management and log itReinUsesLisp1-2/+3
2019-03-31shader_ir/decode: Silent implicit sign conversion warningMat M1-2/+2
Co-Authored-By: ReinUsesLisp <reinuseslisp@airmail.cc>
2019-03-30shader_ir/decode: Implement AOFFI for TEX and TLD4ReinUsesLisp2-27/+94
2019-03-30shader_ir: Implement immediate register trackingReinUsesLisp2-1/+19
2019-02-26shader/decode: Remove extras from MetaTextureReinUsesLisp2-15/+26
2019-02-26shader/decode: Split memory and texture instructions decodingReinUsesLisp4-493/+527
2019-02-25shader/track: Resolve variable shadowing warningsLioncash1-5/+5
2019-02-14shader_decompiler: Improve Accuracy of Attribute Interpolation.Fernando Sahmkow2-3/+14
2019-02-12gl_shader_decompiler: Re-implement TLDS lodReinUsesLisp1-1/+1
2019-02-11Corrected F2I None mode to RoundEven.Fernando Sahmkow1-3/+3
2019-02-11Fix incorrect value for CC bit in IADDFernando Sahmkow1-2/+2
2019-02-07shader_ir: Remove F4 prefix to texture operationsReinUsesLisp2-14/+13
This was originally included because texture operations returned a vec4. These operations now return a single float and the F4 prefix doesn't mean anything.
2019-02-07shader_ir: Clean texture management codeReinUsesLisp2-101/+63
Previous code relied on GLSL parameter order (something that's always ill-formed on an IR design). This approach passes spatial coordinates through operation nodes and array and depth compare values in the texture metadata. It still contains an "extra" vector containing generic nodes for bias and component index (for example) which is still a bit ill-formed but it should be better than the previous approach.
2019-02-07gl_shader_disk_cache: Save GLSL and entries into the precompiled fileReinUsesLisp1-0/+9
2019-02-03Fix TXQ not using the component mask.Fernando Sahmkow1-6/+9
2019-02-03shader_ir/memory: Add ST_L 64 and 128 bits storesReinUsesLisp1-3/+11
2019-02-03shader/track: Search inside of conditional nodesReinUsesLisp1-0/+11
Some games conditionally use global memory instructions. This allows the heuristic to search inside conditional nodes for the source constant buffer.
2019-02-03shader_ir: Rename BasicBlock to NodeBlockReinUsesLisp29-119/+117
It's not always used as a basic block. Rename it for consistency.
2019-02-03shader_ir: Pass decoded nodes as a whole instead of per basic blocksReinUsesLisp27-57/+62
Some games call LDG at the top of a basic block, making the tracking heuristic fail. This commit lets the heuristic search the decoded nodes as a whole instead of per basic block. This may lead to some false positives, but it allows the heuristic to track cases it previously couldn't.
2019-02-03shader_ir/memory: Add LD_L 128 bits loadsReinUsesLisp1-7/+19
2019-02-03shader_bytecode: Rename BytesN enums to BitsNReinUsesLisp1-4/+4
2019-02-03shader_ir/memory: Add LD_L 64 bits loadsReinUsesLisp1-6/+17
2019-01-30shader_ir: Unify constant buffer offset valuesReinUsesLisp14-22/+24
Constant buffer values in the shader IR used different offsets depending on whether the access was direct or indirect. cbuf34 has a non-multiplied offset while cbuf36 has a multiplied one. On shader decoding, this commit multiplies the offset by four on cbuf34 queries.
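A minimal sketch of the unification described above, assuming the cbuf34 encoding stores its offset in 32-bit words while the cbuf36 encoding already stores bytes. The function names are hypothetical and are not taken from the yuzu sources.

```cpp
#include <cstdint>

// Hypothetical helpers: both return a byte offset, so later IR code does not
// need to know which encoding the offset came from.
constexpr std::uint32_t Cbuf34ByteOffset(std::uint32_t raw_offset) {
    return raw_offset * 4; // non-multiplied encoding: scale words to bytes
}

constexpr std::uint32_t Cbuf36ByteOffset(std::uint32_t raw_offset) {
    return raw_offset; // assumed to be stored pre-multiplied
}
```

With both helpers returning bytes, a raw cbuf34 offset of 3 and a raw cbuf36 offset of 12 refer to the same location.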
2019-01-30shader_decode: Implement LDG and basic cbuf trackingReinUsesLisp3-4/+159
2019-01-28shader/shader_ir: Amend three comment typosLioncash1-3/+3
Given we're in the area, these are three trivial typos that can be corrected.
2019-01-28shader/shader_ir: Amend constructor initializer ordering for AbufNodeLioncash1-2/+2
Orders the class members in the same order that they would actually be initialized in. Gets rid of two compiler warnings.
2019-01-28shader/decode: Avoid a pessimizing std::move within DecodeRange()Lioncash1-1/+1
Calling std::move on a local variable in a return statement can prevent copy elision from occurring, so this can just be converted into a regular return.
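The effect can be illustrated with a small move-counting type. This is a sketch only: NodeList and both Decode functions are hypothetical stand-ins, not the real DecodeRange() code. Compilers may warn about the second function (for example with -Wpessimizing-move), which is the point of the example.

```cpp
#include <utility>
#include <vector>

static int g_move_count = 0; // counts move constructions of NodeList

struct NodeList {
    std::vector<int> nodes;
    NodeList() = default;
    NodeList(const NodeList&) = default;
    NodeList(NodeList&& other) noexcept : nodes(std::move(other.nodes)) {
        ++g_move_count;
    }
};

// Plain return: the compiler is allowed to elide the move entirely (NRVO).
NodeList DecodePlain() {
    NodeList result;
    result.nodes.push_back(1);
    return result;
}

// std::move in the return statement: elision is no longer permitted, so at
// least one move construction always happens.
NodeList DecodeWithMove() {
    NodeList result;
    result.nodes.push_back(1);
    return std::move(result);
}

// Runs one of the functions above and reports how many moves it caused.
int CountMoves(NodeList (*decode)()) {
    g_move_count = 0;
    NodeList list = decode();
    (void)list;
    return g_move_count;
}
```

The exact count for the plain version depends on the compiler, but it is never higher than the count for the std::move version.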
2019-01-16shader_ir: Fixup clang buildReinUsesLisp1-4/+6
2019-01-15shader_decode: Fixup XMADReinUsesLisp1-1/+1
2019-01-15shader_ir: Pass to decoder functions basic block's codeReinUsesLisp27-82/+83
2019-01-15shader_decode: Improve zero flag implementationReinUsesLisp15-75/+79
2019-01-15shader_ir: Remove composite primitives and use temporals insteadReinUsesLisp3-175/+187
2019-01-15shader_decode: Use proper primitive namesReinUsesLisp3-15/+13
2019-01-15shader_decode: Use BitfieldExtract instead of shift + andReinUsesLisp7-48/+30
2019-01-15shader_ir: Remove Ipa primitiveReinUsesLisp2-5/+2
2019-01-15video_core: Rename glsl_decompiler to gl_shader_decompilerReinUsesLisp2-1631/+0
2019-01-15shader_ir: Remove RZ and use Register::ZeroIndex insteadReinUsesLisp3-12/+16
2019-01-15shader_decode: Implement TEXS.F16ReinUsesLisp3-15/+57
2019-01-15shader_decode: Fixup R2PReinUsesLisp1-2/+3
2019-01-15glsl_decompiler: Fixup TLDSReinUsesLisp1-1/+0
2019-01-15glsl_decompiler: Fixup geometry shadersReinUsesLisp1-10/+16
2019-01-15shader_decode: Fixup WriteLogicOperation zero comparisonReinUsesLisp1-1/+1
2019-01-15glsl_decompiler: Fixup permissive member function declarationsReinUsesLisp1-133/+133
2019-01-15shader_decode: Fixup PSETReinUsesLisp1-2/+3
2019-01-15shader_decode: Fixup clang-formatReinUsesLisp2-2/+4
2019-01-15video_core: Implement IR based geometry shadersReinUsesLisp3-2/+96
2019-01-15shader_decode: Implement VMAD and VSETPReinUsesLisp3-0/+125
2019-01-15shader_decode: Implement HSET2ReinUsesLisp3-1/+50
2019-01-15shader_decode: Rework HSETP2ReinUsesLisp4-47/+57
2019-01-15shader_decode: Implement R2PReinUsesLisp1-1/+28
2019-01-15shader_decode: Implement CSETPReinUsesLisp1-14/+37
2019-01-15shader_decode: Implement PSETReinUsesLisp1-1/+16
2019-01-15shader_decode: Implement HFMA2ReinUsesLisp3-5/+59
2019-01-15glsl_decompiler: Remove HNegate inliningReinUsesLisp1-10/+0
2019-01-15shader_decode: Implement POPCReinUsesLisp4-1/+22
2019-01-15shader_decode: Implement TLDS (untested)ReinUsesLisp3-10/+92
2019-01-15shader_decode: Update TLD4 reflecting #1862 changesReinUsesLisp2-52/+52
2019-01-15shader_ir: Fixup TEX and TEXS and partially fix TLD4 decompilingReinUsesLisp3-60/+72
2019-01-15shader_decode: Fixup FSETReinUsesLisp1-2/+2
2019-01-15shader_decode: Implement IADD32IReinUsesLisp1-0/+11
2019-01-15video_core: Return safe values after an assert hitsReinUsesLisp8-8/+19
2019-01-15shader_decode: Implement FFMAReinUsesLisp1-1/+36
2019-01-15video_core: Address feedbackReinUsesLisp4-13/+16
2019-01-15shader_ir: Fixup file inclusions and clang-formatReinUsesLisp3-2/+2
2019-01-15shader_ir: Move comment node stringMat M1-2/+2
Co-Authored-By: ReinUsesLisp <reinuseslisp@airmail.cc>
2019-01-15shader_ir: Address feedback to avoid UB in bit castingReinUsesLisp1-2/+4
2019-01-15shader_decode: Fixup clang-formatReinUsesLisp2-3/+2
2019-01-15shader_decode: Implement LEAReinUsesLisp1-0/+55
2019-01-15shader_decode: Implement IADD3ReinUsesLisp1-0/+61
2019-01-15shader_decode: Implement LOP3ReinUsesLisp2-0/+62
2019-01-15shader_decode: Implement ST_LReinUsesLisp1-0/+17
2019-01-15shader_decode: Implement LD_LReinUsesLisp1-0/+18
2019-01-15shader_decode: Implement HSETP2ReinUsesLisp1-1/+37
2019-01-15shader_decode: Implement HADD2 and HMUL2ReinUsesLisp1-1/+48
2019-01-15shader_decode: Implement HADD2_IMM and HMUL2_IMMReinUsesLisp1-1/+28
2019-01-15shader_decode: Implement MOV_SYSReinUsesLisp1-0/+27
2019-01-15shader_decode: Implement IMNMXReinUsesLisp1-0/+16
2019-01-15shader_decode: Implement F2F_CReinUsesLisp1-2/+10
2019-01-15shader_decode: Implement I2IReinUsesLisp1-0/+26
2019-01-15shader_decode: Implement BRA internal flagReinUsesLisp1-4/+8
2019-01-15shader_decode: Implement ISCADDReinUsesLisp1-0/+15
2019-01-15shader_decode: Implement XMADReinUsesLisp1-1/+85
2019-01-15shader_decode: Implement PBK and BRKReinUsesLisp1-1/+22
2019-01-15shader_decode: Implement LOPReinUsesLisp1-0/+15
2019-01-15shader_decode: Implement SELReinUsesLisp1-0/+8
2019-01-15shader_decode: Implement IADDReinUsesLisp1-1/+28
2019-01-15shader_decode: Implement ISETPReinUsesLisp1-1/+30
2019-01-15shader_decode: Implement BFIReinUsesLisp1-1/+22
2019-01-15shader_decode: Implement ISETReinUsesLisp1-1/+27
2019-01-15shader_decode: Implement LD_CReinUsesLisp1-0/+31
2019-01-15shader_decode: Implement SHLReinUsesLisp1-0/+8
2019-01-15shader_decode: Implement SHRReinUsesLisp1-1/+26
2019-01-15shader_decode: Implement LOP32IReinUsesLisp2-1/+72
2019-01-15shader_decode: Implement BFEReinUsesLisp1-1/+25
2019-01-15shader_decode: Implement FSETReinUsesLisp1-1/+36
2019-01-15shader_decode: Implement F2IReinUsesLisp1-0/+37
2019-01-15shader_decode: Implement I2FReinUsesLisp1-0/+23
2019-01-15shader_decode: Implement F2FReinUsesLisp1-1/+37
2019-01-15shader_decode: Stub DEPBARReinUsesLisp1-0/+4
2019-01-15shader_decode: Implement SSY and SYNCReinUsesLisp1-0/+19
2019-01-15shader_decode: Implement PSETPReinUsesLisp1-1/+21
2019-01-15shader_decode: Implement TMMLReinUsesLisp1-3/+45
2019-01-15shader_decode: Implement TEX and TXQReinUsesLisp2-0/+223
2019-01-15shader_decode: Implement TEXS (F32)ReinUsesLisp2-0/+217
2019-01-15shader_decode: Implement FSETPReinUsesLisp1-1/+33
2019-01-15shader_decode: Partially implement BRAReinUsesLisp1-0/+12
2019-01-15shader_decode: Implement IPAReinUsesLisp1-0/+12
2019-01-15shader_decode: Implement EXITReinUsesLisp1-1/+32
2019-01-15shader_decode: Implement ST_AReinUsesLisp1-0/+30
2019-01-15shader_decode: Implement LD_AReinUsesLisp1-1/+39
2019-01-15shader_decode: Implement FADD32IReinUsesLisp1-0/+12
2019-01-15shader_decode: Implement FMUL32_IMMReinUsesLisp1-0/+10
2019-01-15shader_decode: Implement MOV32_IMMReinUsesLisp1-1/+9
2019-01-15shader_decode: Stub RRO_C, RRO_R and RRO_IMMReinUsesLisp1-0/+9
2019-01-15shader_decode: Implement FMNMX_C, FMNMX_R and FMNMX_IMMReinUsesLisp1-0/+18
2019-01-15shader_decode: Implement MUFUReinUsesLisp1-0/+29
2019-01-15shader_decode: Implement FADD_C, FADD_R and FADD_IMMReinUsesLisp1-0/+15
2019-01-15shader_decode: Implement FMUL_C, FMUL_R and FMUL_IMMReinUsesLisp1-0/+42
2019-01-15shader_decode: Implement MOV_C and MOV_RReinUsesLisp1-1/+23
2019-01-15glsl_decompiler: ImplementationReinUsesLisp2-0/+1481
2019-01-15shader_ir: Add condition code helperReinUsesLisp2-0/+13
2019-01-15shader_ir: Add predicate combiner helperReinUsesLisp2-0/+15
2019-01-15shader_ir: Add comparison helpersReinUsesLisp2-0/+106
2019-01-15shader_ir: Add half float helpersReinUsesLisp2-0/+44
2019-01-15shader_ir: Add integer helpersReinUsesLisp2-0/+40
2019-01-15shader_ir: Add float helpersReinUsesLisp2-0/+24
2019-01-15shader_ir: Add settersReinUsesLisp2-0/+24
2019-01-15shader_ir: Add local memory gettersReinUsesLisp2-0/+7
2019-01-15shader_ir: Add internal flag gettersReinUsesLisp2-0/+10
2019-01-15shader_ir: Add attribute gettersReinUsesLisp2-0/+26
2019-01-15shader_ir: Add constant buffer gettersReinUsesLisp2-0/+25
2019-01-15shader_ir: Add register getterReinUsesLisp2-0/+9
2019-01-15shader_ir: Add immediate node constructorsReinUsesLisp2-1/+34
2019-01-15shader_ir: Initial implementationReinUsesLisp28-0/+1542
2018-01-13Remove references to PICA and rasterizers in video_coreJames Rowe9-2453/+0
2017-09-17Improved performance of FromAttributeBufferHuw Pascoe1-1/+2
Ternary operator is optimized by the compiler whereas std::min() is meant to return a value. I've noticed a 5%-10% emulation speed increase.
2017-08-19pica/shader/jit: implement SETEMIT and EMITwwylele2-2/+49
2017-08-19correct constnesswwylele2-2/+4
2017-08-19pica/shader/interpreter: implement SETEMIT and EMITwwylele1-0/+16
2017-08-19pica/shader: extend UnitState for GSwwylele2-0/+84
Among the four shader units in PICA, a special unit can be configured to run both VS and GS programs. GSUnitState represents this unit, which extends UnitState (which represents the other three normal units) with extra state for primitive emitting. It uses lots of raw pointers to represent internal structure in order to keep it a standard-layout type for the JIT to access. This unit doesn't handle triangle winding (inverting) itself; instead, it calls a WindingSetter handler. This will be explained in the following commits.
2017-07-27pica/shader_interpreter: fix off-by-one in LOOPwwylele1-1/+1
2017-06-17Stop using reserved operator names (and/or/xor) with XbyakYuri Kunde Schlesner1-13/+13
Also has the Dynarmic upgrade with the same change
2017-05-11Pica: Set program code / swizzle data limit to 4096Jannik Vogel5-13/+16
One of the later commits will enable writing to GS regs. It turns out that on startup, most games will write 4096 GS program words. The current limit of 1024 would hence result in 3072 (4096 - 1024) error messages: ``` HW.GPU <Error> video_core/shader/shader.cpp:WriteProgramCode:229: Invalid GS program offset 1024 ``` New constants have been introduced to represent these limits. The swizzle data size has also been raised. This matches the given field sizes of [GPUREG_SH_OPDESCS_INDEX](https://3dbrew.org/wiki/GPU/Internal_Registers#GPUREG_SH_OPDESCS_INDEX) and [GPUREG_SH_CODETRANSFER_INDEX](https://www.3dbrew.org/wiki/GPU/Internal_Registers#GPUREG_SH_CODETRANSFER_INDEX) (12 bit = [0; 4095]).
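The relationship between the 12-bit index fields and the new limit can be sketched as follows. The constant and function names here are hypothetical; the commit only states that new constants were introduced, not what they are called.

```cpp
#include <cstddef>

// A 12-bit index field can address offsets in [0; 4095].
constexpr std::size_t kIndexFieldBits = 12;

// Hypothetical names for the raised limits.
constexpr std::size_t kMaxProgramCodeLength = std::size_t{1} << kIndexFieldBits;
constexpr std::size_t kMaxSwizzleDataLength = std::size_t{1} << kIndexFieldBits;

static_assert(kMaxProgramCodeLength == 4096, "12 bits give 4096 entries");

// Bounds check of the kind a program-code write would need.
constexpr bool IsValidProgramOffset(std::size_t offset) {
    return offset < kMaxProgramCodeLength;
}
```

Under the old limit of 1024, offset 1024 (the one in the quoted error message) would be rejected; with the limit derived from the field width it is accepted.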
2017-02-27Doxygen: Amend minor issues (#2593)Mat M2-2/+4
Corrects a few issues with regards to Doxygen documentation, for example: - Incorrect parameter referencing. - Missing @param tags. - Typos in @param tags. and a few minor other issues.
2017-02-12video_core/shader: Document sanitized MUL operationYuri Kunde Schlesner1-0/+8
2017-02-11video_core: Fix benign out-of-bounds indexing of array (#2553)Yuri Kunde Schlesner1-2/+1
The resulting pointer wasn't written to unless the index was verified as valid, but that's still UB and triggered debug checks in MSVC. Reported by garrettboast on IRC
2017-02-09VideoCore: Split regs.h inclusionsYuri Kunde Schlesner2-2/+4
2017-02-04VideoCore: Move Regs to its own fileYuri Kunde Schlesner2-2/+2
2017-02-04VideoCore: Split shader regs from Regs structYuri Kunde Schlesner4-6/+6
2017-02-04VideoCore: Split rasterizer regs from Regs structYuri Kunde Schlesner2-13/+13
2017-02-03ShaderJIT: add 16 dummy bytes at the bottom of the stackwwylele1-2/+5
2017-01-31Common/x64: remove legacy emitter and abi (#2504)Weiyi Wang1-1/+0
These are not used any more since we moved shader JIT to xbyak.
2017-01-31shader_jit_x64_compiler: esi and edi should be persistent (#2500)Merry1-0/+2
2017-01-30VideoCore: Extract swrast-specific data from OutputVertexYuri Kunde Schlesner2-37/+14
2017-01-30VideoCore/Shader: Clean up OutputVertex::FromAttributeBufferYuri Kunde Schlesner1-9/+14
This also fixes a long-standing but nevertheless harmless memory corruption bug, where the padding of the OutputVertex struct would get corrupted by unused attributes.
2017-01-30VideoCore: Split shader output writing from semantic loadingYuri Kunde Schlesner2-18/+16
2017-01-30VideoCore: Consistently use shader configuration to load attributesYuri Kunde Schlesner4-12/+12
2017-01-30VideoCore: Rename some types to more accurate namesYuri Kunde Schlesner4-6/+6
2017-01-26VideoCore/Shader: Move entry_point to SetupBatchYuri Kunde Schlesner5-22/+23
2017-01-26VideoCore/Shader: Move per-batch ShaderEngine state into ShaderSetupYuri Kunde Schlesner5-40/+36
2017-01-26Shader: Remove OutputRegisters structYuri Kunde Schlesner3-19/+13
2017-01-26Shader: Initialize conditional_code in interpreterYuri Kunde Schlesner2-3/+3
This doesn't belong in LoadInputVertex because it also happens for non-VS invocations. Since it's not used by the JIT, it seems adequate to initialize it in the interpreter, which is the only thing that cares about it.
2017-01-26Shader: Don't read ShaderSetup from global stateYuri Kunde Schlesner1-3/+3
2017-01-26shader_jit_x64: Don't read program from global stateYuri Kunde Schlesner3-22/+22
2017-01-26VideoCore/Shader: Move ProduceDebugInfo to InterpreterEngineYuri Kunde Schlesner4-19/+10
2017-01-26VideoCore/Shader: Split interpreter and JIT into separate ShaderEnginesYuri Kunde Schlesner6-96/+150
2017-01-26VideoCore/Shader: Rename shader_jit_x64{ => _compiler}.{cpp,h}Yuri Kunde Schlesner3-2/+2
2017-01-26VideoCore/Shader: Split shader uniform state and shader engineYuri Kunde Schlesner3-16/+46
Currently there's only a single dummy implementation, which will be split in a following commit.
2017-01-26VideoCore/Shader: Add constness to methodsYuri Kunde Schlesner2-4/+4
2017-01-26VideoCore/Shader: Use only entry_point as ShaderSetup paramYuri Kunde Schlesner2-9/+11
This removes all implicit dependency of ShaderState on global PICA state.
2017-01-26VideoCore/Shader: Use self instead of g_state.vs in ShaderSetupYuri Kunde Schlesner2-11/+8
2017-01-26VideoCore/Shader: Extract input vertex loading code into functionYuri Kunde Schlesner2-20/+22
2017-01-23video_core: fix shader.cpp signed / unsigned warningKloen1-2/+2
2017-01-04Fix some warnings (#2399)Jonathan Hao1-2/+0
2016-12-16VideoCore/Shader: Extract DebugData out from UnitStateYuri Kunde Schlesner7-101/+97
2016-12-16Remove unnecessary castYuri Kunde Schlesner1-3/+1
2016-12-16VideoCore/Shader: Extract evaluate_condition lambda to function scopeYuri Kunde Schlesner1-26/+24
2016-12-16VideoCore/Shader: Extract call lambda up a scope and remove unused paramYuri Kunde Schlesner1-21/+17
2016-12-16VideoCore/Shader: Remove dynamic control flow in (Get)UniformOffsetYuri Kunde Schlesner2-18/+11
2016-12-16VideoCore/Shader: Move DebugData to a separate fileYuri Kunde Schlesner3-172/+188
2016-12-15shader_jit_x64: Use LOOPCOUNT_REG as a 64-bit reg when indexingYuri Kunde Schlesner1-1/+1
2016-12-15VideoCore: Eliminate an unnecessary copy in the drawcall loopYuri Kunde Schlesner2-2/+2
2016-12-15shader_jit_x64: Use Reg32 for LOOP* registers, eliminating castsYuri Kunde Schlesner1-16/+16
2016-12-15VideoCore: Convert x64 shader JIT to use Xbyak for assemblyYuri Kunde Schlesner2-223/+225
2016-12-04shader_jit: Fix non-SSE4.1 path where FLR would not truncateJannik Vogel1-1/+1
2016-12-02shader_jit: Load LOOPCOUNT_REG and LOOPINC 4 bit left-shiftedJannik Vogel1-6/+9
2016-09-30VideoCore: Shader interpreter cleanupsYuri Kunde Schlesner1-32/+42
2016-09-30VideoCore: Fix out-of-bounds read in ShaderSetup::ProduceDebugInfoYuri Kunde Schlesner1-3/+1
As far as I can tell, memset was replaced by a fill without correcting the parameter type, causing an out-of-bounds array read in the Vec4 constructor.
2016-09-21Remove special rules for Windows.h and library includesYuri Kunde Schlesner1-1/+1
2016-09-21Use negative priorities to avoid special-casing the self-includeYuri Kunde Schlesner3-3/+3
2016-09-21Remove empty newlines in #include blocks.Emmanuel Gil Peyrot5-22/+3
This makes clang-format useful on those. Also add a bunch of forgotten transitive includes, which otherwise prevented compilation.
2016-09-19Manually tweak source formatting and then re-run clang-formatYuri Kunde Schlesner4-9/+6
2016-09-18Sources: Run clang-format on everything.Emmanuel Gil Peyrot6-311/+335
2016-09-16VideoCore: Fix dangling lambda context in shader interpreterYuri Kunde Schlesner1-1/+1
The static meant that after the first execution, these lambda contexts would be pointing to a random location on the stack. Fixes a random crash when using the interpreter.
2016-05-16Retrieve shader result from new OutputRegisters-typeJannik Vogel3-56/+68
2016-05-13Use new shader-jit signature for interpreterJannik Vogel3-8/+8
2016-05-13Refactor access to state in shader-jitJannik Vogel4-24/+42
2016-05-12Move program_counter and call_stack from UnitState to interpreterJannik Vogel3-45/+42
2016-05-12Move default_attributes into Pica stateJannik Vogel1-2/+0
2016-05-11Turn ShaderSetup into structJannik Vogel2-52/+53
2016-05-11Pica: Add tc0.w to OutputVertexJannik Vogel1-1/+2
2016-05-03Pica: Replace logic in shader.cpp with loopJannik Vogel1-34/+4
2016-04-30VideoCore: Run include-what-you-use and fix most includes.Emmanuel Gil Peyrot6-14/+43
2016-04-29Common: Remove section measurement from profiler (#1731)Yuri Kunde Schlesner1-3/+0
This has been entirely superseded by MicroProfile. The rest of the code can go when a simpler frametime/FPS meter is added to the GUI.
2016-04-28Refactor: Extract VertexLoader from command_processor.cpp.Henrik Rydgard1-1/+1
Preparation for a similar concept to Dolphin or PPSSPP. These can be JIT-ed and cached.
2016-04-24shader: Shader size is long uint, not uint.Sam Spilsbury1-1/+1
2016-04-24shader: Handle non-CALL opcodes with a breakSam Spilsbury1-0/+2
2016-04-24shader: Format string must be provided inline and not as a variableSam Spilsbury1-1/+1
2016-04-14shader_jit_x64: Rename RuntimeAssert to Compile_Assert.bunnei2-5/+5
2016-04-14shader_jit_x64.cpp: Rename JitCompiler to JitShader.bunnei3-92/+92
2016-04-14shader_jit_x64: Free memory that's no longer needed after compilation.bunnei1-0/+6
2016-04-14shader_jit_x64: Use a sorted vector instead of a set for keeping track of return addresses.bunnei2-5/+8
2016-04-14shader_jit_x64: Use CALL/RET instead of JMP for subroutines.bunnei1-17/+7
2016-04-14shader_jit_x64: Separate initialization and code generation for readability.bunnei1-9/+8
2016-04-14shader_jit_x64: Get rid of unnecessary last_program_counter variable.bunnei2-6/+2
2016-04-14shader_jit_x64: Execute certain asserts at runtime.bunnei2-5/+19
- This is because we compile the full shader code space, and therefore it's common to compile malformed instructions.
2016-04-14shader: Remove unused 'state' argument from 'Setup' function.bunnei2-3/+2
2016-04-14shader_jit_x64: Specify shader main offset at runtime.bunnei3-10/+6
2016-04-14shader_jit_x64: Allocate each program independently and persist for emu session.bunnei3-38/+28
2016-04-14shader_jit_x64: Rewrite flow control to support arbitrary CALL and JMP instructions.bunnei2-35/+119
2016-04-14shader_jit_x64: Fix strict memory aliasing issues.bunnei1-1/+3
2016-04-05Common: Remove Common::make_unique, use std::make_uniqueMerryMage1-1/+0
2016-03-17video_core: Don't cast away constLioncash1-1/+1
2016-03-17shader_interpreter: use std::inner_product for the dot productLioncash1-5/+3
Same thing, less code.
2016-03-15PICA: Fix MAD/MADI encodingJannik Vogel2-29/+33
2016-03-14Respect vs output mapJannik Vogel1-4/+14
2016-03-12shader_jit_x64: Clear cache after code space fills up.bunnei3-2/+19
2016-03-12shader_jit_x64: Make assert outputs more useful & cleanup formatting.bunnei1-4/+7
2016-03-12shader: Update log message to use proper log class.bunnei1-1/+1
2016-03-09Common: Get rid of alignment macrosLioncash1-4/+4
The gl rasterizer already uses alignas, so we may as well move everything over.
2016-03-03Add immediate mode vertex submissionDwayne Slater4-2/+22
2016-02-05pica: Implement decoding of basic fragment lighting components.bunnei2-5/+9
- Diffuse - Distance attenuation - float16/float20 types - Vertex Shader 'view' output
2016-01-25Shader: Implement "invert condition" feature of IFU instructionYuri Kunde Schlesner2-2/+5
If bit 0 of the JMPU instruction is set, then the jump condition is inverted. That is, the jump happens when the boolean is false instead of when it is true.
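The rule above can be sketched as a single predicate. The function name and the idea of passing the raw flag word are assumptions made for illustration; the commit message does not say which instruction field carries bit 0.

```cpp
#include <cstdint>

// Returns true when the jump should be taken.
// `uniform_value` is the boolean uniform tested by the instruction and
// `flags` is a hypothetical raw field whose bit 0 is the invert flag.
bool ShouldJump(bool uniform_value, std::uint32_t flags) {
    const bool invert = (flags & 1u) != 0;
    return uniform_value != invert;
}
```

With the invert bit clear the jump follows the boolean; with it set the jump is taken only when the boolean is false.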
2016-01-24Shader JIT: Fix off-by-one error when compiling JMPsYuri Kunde Schlesner2-6/+6
There was a mistake in the JMP code which meant that one instruction at the destination would be skipped when the jump was taken. This commit also changes the meaning of the culprit parameter to make it less confusing and avoid similar mistakes in the future.
2015-09-11video_core: Reorganize headersLioncash3-6/+4
2015-09-11video_core: Remove unnecessary includes from headersLioncash1-2/+0
2015-09-10video_core: Remove unused variablesLioncash2-2/+0
2015-09-07Shader JIT: Use SCALE constant from emitteraroulin1-4/+4
2015-09-07Shader: Fix size_t to int casts of register offsetsaroulin2-15/+21
2015-09-02video_core: Fix format specifiers warningsaroulin1-1/+2
2015-09-01x64: Proper stack alignment in shader JIT function callsaroulin2-28/+18
Import Dolphin stack handling and register saving routines. Also removes the x86 parts from abi files.
2015-08-31Shader JIT: Fix SGE/SGEI NaN behavioraroulin1-3/+3
SGE was incorrectly emulated w.r.t. NaN behavior as the CMPSS SSE instruction was used with NLT
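The difference between the two predicates only shows up with NaN operands. The scalar sketch below assumes, as the commit message implies, that SGE should yield 0.0 when either operand is NaN. The function names are hypothetical, and plain C++ comparisons stand in for the SSE instruction.

```cpp
#include <limits>

const float kNaN = std::numeric_limits<float>::quiet_NaN();

// "Greater or equal": false for unordered operands, so NaN gives 0.0.
float SetGreaterEqual(float a, float b) {
    return (a >= b) ? 1.0f : 0.0f;
}

// "Not less than": true for unordered operands, so NaN gives 1.0.
// This mirrors what an NLT comparison predicate computes.
float SetNotLessThan(float a, float b) {
    return !(a < b) ? 1.0f : 0.0f;
}
```

For ordinary numbers both functions agree, which is why the bug only surfaces when a shader feeds NaN into the comparison.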
2015-08-27Shader JIT: Fix float to integer rounding in MOVAaroulin1-2/+2
MOVA converts new address register values from floats to integers using truncation
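A small sketch contrasting truncation with round-to-nearest for the float to integer conversion. Both function names are hypothetical; only the truncating variant reflects the behavior the commit describes.

```cpp
#include <cmath>

// Truncation: the fractional part is dropped (rounds toward zero).
int MovaTruncate(float value) {
    return static_cast<int>(value);
}

// Round-to-nearest, shown only for comparison.
int MovaRoundNearest(float value) {
    return static_cast<int>(std::lround(value));
}
```

The two conversions differ for values such as 1.9, where truncation yields 1 and rounding yields 2, so an address register loaded with the wrong mode would index the neighboring element.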
2015-08-27Shader JIT: ifdef out reference to ifdef'd out shader_maparchshift1-0/+2
shader_map was only defined on x86 architectures, but was cleared on shutdown with no ifdef protection. Ifdef this out so non-x86 architectures can be built.
2015-08-25Integrate the MicroProfile profiling libraryYuri Kunde Schlesner1-0/+3
This brings goodies such as a configurable user interface and multi-threaded timeline view.
2015-08-24Shader JIT: Tiny micro-optimization in DPHYuri Kunde Schlesner1-4/+4
2015-08-24Shaders: Fix multiplications between 0.0 and infYuri Kunde Schlesner2-39/+45
The PICA200 semantics for multiplication are such that when multiplying inf by exactly 0.0, the result is 0.0 instead of NaN, as IEEE would define it. This is relied upon by games. Fixes #1024 (missing OoT interface items).
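One way to express this rule in scalar code is to detect a NaN product whose inputs were not NaN, which under IEEE arithmetic can only come from multiplying infinity by zero. This is a sketch of the semantics, not the commit's actual interpreter or JIT code, and SanitizedMul is a hypothetical name.

```cpp
#include <cmath>

// Multiplication with PICA200-style handling of inf * 0.
float SanitizedMul(float a, float b) {
    const float product = a * b;
    // A NaN result from two non-NaN inputs means inf * 0 (or 0 * inf).
    if (std::isnan(product) && !std::isnan(a) && !std::isnan(b)) {
        return 0.0f;
    }
    return product;
}
```

A NaN input still propagates as NaN; only the inf times zero case is redirected to 0.0.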
2015-08-24Shaders: Explicitly conform to PICA semantics in MAX/MINYuri Kunde Schlesner2-2/+10
2015-08-24Shader JIT: Add name to second scratch register (XMM4)Yuri Kunde Schlesner1-3/+5
2015-08-24shader_jit: Replace two MDisp usages with MatRLioncash1-2/+2
2015-08-24Shader JIT: Fix CMP NaN behavior to match hardwareYuri Kunde Schlesner1-8/+23
2015-08-23Shader: Use std::sqrt for float instead of sqrtaroulin1-1/+1
2015-08-23Shader: RCP and RSQ computes only the 1st componentaroulin2-10/+10
2015-08-22Shader: implement DPH/DPHI in JITaroulin2-2/+36
2015-08-22Shader: implement DPH/DPHI in interpreteraroulin1-1/+8
Tests revealed that the component with w=1 is SRC1 and not SRC2, it is now fixed on 3dbrew.
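A minimal sketch of the homogeneous dot product under that finding: the w component of SRC1 is treated as 1, so its stored value is ignored. Vec4 and Dph are hypothetical names used for illustration.

```cpp
#include <array>

using Vec4 = std::array<float, 4>; // x, y, z, w

// DPH: dot product where src1 is treated as (x, y, z, 1).
float Dph(const Vec4& src1, const Vec4& src2) {
    return src1[0] * src2[0] + src1[1] * src2[1] + src1[2] * src2[2] +
           1.0f * src2[3];
}
```

Because SRC1's w is replaced by 1, the fourth term of the sum is simply SRC2's w.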
2015-08-19Shader: implement SGE, SGEI and SLT in JITaroulin2-15/+36
2015-08-19Shader: implement SGE, SGEI in interpreteraroulin1-0/+14
2015-08-19Shader: Save caller-saved registers in JIT before a CALLaroulin2-0/+33
2015-08-17Shader: implement EX2 and LG2 in JITaroulin2-2/+22
2015-08-16Shader: implement EX2 and LG2 in interpreteraroulin1-0/+36
2015-08-16Build fix for Debug configurations.Tony Wasserka1-1/+1
2015-08-16Introduce a shader tracer to allow inspection of input/output values for each processed instruction.Tony Wasserka5-37/+322
2015-08-16citra-qt: Improve shader debugger.Tony Wasserka1-6/+0
Now supports dumping the current shader and recognizes a larger number of output semantics.
2015-08-16Shader: Use a POD struct for registers.bunnei5-40/+43
2015-08-16Rename ARCHITECTURE_X64 definition to ARCHITECTURE_x86_64.bunnei1-6/+5
2015-08-16Common: Cleanup CPU capability detection code.bunnei1-5/+5
2015-08-16Common: Move cpu_detect to x64 directory.bunnei1-2/+1
2015-08-16x64: Refactor to remove fake interfaces and general cleanups.bunnei5-144/+22
2015-08-16JIT: Support negative address offsets.bunnei1-26/+25
2015-08-16Shader: Initial implementation of x86_x64 JIT compiler for Pica vertex shaders.bunnei6-2/+924
- Config: Add an option for selecting to use shader JIT or interpreter. - Qt: Add a menu option for enabling/disabling the shader JIT.
2015-08-15Common: Added MurmurHash3 hash function for general-purpose use.bunnei1-1/+1
2015-08-15Shader: Define a common interface for running vertex shader programs.bunnei4-184/+278
2015-08-15Shader: Move shader code to its own subdirectory, "shader".bunnei2-0/+701