2. We will record where the match occurred and the position at which it was found. This would allow another, more sophisticated device function to perform these translations. On a CPU, this function would run in at most O(n^2) time.
3. Instead, introduce a structure to manage our complex data (see below).
[[File:Struct.PNG]]
Structures can be passed into the kernel, but initializing and accessing them is exceedingly difficult and slow.
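The structure itself is only shown in the image above. As a rough illustration, a match-record struct for this problem might look like the following; the field names here are assumptions, not the actual names used in Struct.PNG:

```cuda
// Hypothetical sketch of a match-record structure for the kernel.
// Field names are illustrative; the struct in Struct.PNG may differ.
struct MatchResult {
    int  patternIndex;  // which pattern produced the match
    int  position;      // offset in the text where the match begins
    bool found;         // whether this slot holds a valid match
};
```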
'''New CPU Code to Parallelize'''
[[File:Matching_CPU.PNG]]
The new CPU code has an approximate runtime growth rate of O(n^2).
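The CPU code is shown only as an image above. A minimal sketch of a naive matching loop with the same quadratic growth rate might look like this (function and parameter names are assumptions, not the actual code in Matching_CPU.PNG):

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch: mark result[i] = true whenever pattern
 * occurs at position i of text. The outer loop is O(n) and the
 * inner loop is O(m), giving O(n*m) ~ O(n^2) overall. */
void match_cpu(const char *text, const char *pattern, bool *result)
{
    size_t n = strlen(text);
    size_t m = strlen(pattern);
    for (size_t i = 0; i + m <= n; ++i) {
        bool match = true;
        for (size_t j = 0; j < m; ++j) {
            if (text[i + j] != pattern[j]) {
                match = false;
                break;
            }
        }
        result[i] = match;
    }
}
```

This is the loop nest the GPU kernel parallelizes: each iteration of the outer loop is independent, so each can become one thread.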
'''GPU Kernel'''
[[File:Matching_GPU.PNG]]
The kernel is internally optimized to use shared memory for the result array; it freely toggles these values between true and false depending on where matches are found.
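The kernel is likewise shown only as an image. A rough sketch of the shared-memory approach described above might look like the following; names, the block size, and the exact write-back scheme are assumptions, not the actual kernel in Matching_GPU.PNG:

```cuda
// Hypothetical sketch: each thread tests one starting position and
// records its flag in a shared-memory result array before the block
// writes back to global memory.
__global__ void match_gpu(const char *text, int n,
                          const char *pattern, int m,
                          bool *result)
{
    __shared__ bool sResult[256];   // one slot per thread in the block
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    bool match = false;
    if (i + m <= n) {
        match = true;
        for (int j = 0; j < m; ++j) {
            if (text[i + j] != pattern[j]) {
                match = false;
                break;
            }
        }
    }
    sResult[threadIdx.x] = match;   // toggled freely in fast shared memory
    __syncthreads();

    if (i < n)
        result[i] = sResult[threadIdx.x];  // single write-back to global memory
}
```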
Notes:
Instead of using a result array, <code>__ballot(predicate)</code> could be considered (more research on this remains to be done).
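For reference, the warp vote intrinsic packs each thread's predicate into one bit of a 32-bit mask, so a single unsigned int could replace 32 entries of the boolean result array. A sketch under that idea, using the CUDA 9+ form <code>__ballot_sync</code> (older CUDA used <code>__ballot</code>); names and launch layout are assumptions:

```cuda
// Hypothetical sketch: collapse 32 per-thread match flags into one
// warp-wide bitmask with __ballot_sync, instead of a bool result array.
__global__ void match_ballot(const char *text, int n,
                             const char *pattern, int m,
                             unsigned int *warpMasks)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    bool match = false;
    if (i + m <= n) {
        match = true;
        for (int j = 0; j < m; ++j) {
            if (text[i + j] != pattern[j]) {
                match = false;
                break;
            }
        }
    }

    // Bit k of the mask is set iff lane k of this warp found a match.
    unsigned int mask = __ballot_sync(0xFFFFFFFFu, match);
    if ((threadIdx.x & 31) == 0)     // lane 0 writes one word per warp
        warpMasks[i / 32] = mask;
}
```

This cuts the result storage and global-memory traffic by a factor of 32, at the cost of unpacking bits on the host side.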