Changes

Happy Valley

227 bytes added, 13:00, 9 April 2018

→‎Switching to CudaMallocHost

The low memcpy/compute overlap is related to concurrent kernel execution. In theory, chunks of the input array can be copied asynchronously and passed to separate kernel launches, so that the transfer of one chunk overlaps with computation on another. However, it seems to be hard to partition the input data in any meaningful way.
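The chunked approach described above could look roughly like the following sketch. It is not the project's code: <code>scaleKernel</code>, <code>processInChunks</code>, the block size of 256 and the even split by element count are all assumptions made for illustration, and it only applies when each element can be processed independently.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical element-wise kernel standing in for the real computation.
__global__ void scaleKernel(float* data, size_t n, float factor) {
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Splits h_data into one chunk per stream. Each stream copies its chunk to
// the device, runs the kernel on it, and copies the result back.
// The copies only overlap with compute if h_data is pinned host memory;
// with pageable memory the code is still correct but largely serialized.
cudaError_t processInChunks(float* h_data, size_t n, float factor, int numStreams) {
    if (numStreams <= 0) return cudaErrorInvalidValue;

    float* d_data = nullptr;
    cudaError_t err = cudaMalloc(&d_data, n * sizeof(float));
    if (err != cudaSuccess) return err;

    std::vector<cudaStream_t> streams(numStreams);
    for (auto& s : streams) cudaStreamCreate(&s);

    const size_t chunk = (n + numStreams - 1) / numStreams;  // ceiling division
    for (int s = 0; s < numStreams; ++s) {
        const size_t offset = static_cast<size_t>(s) * chunk;
        if (offset >= n) break;
        const size_t count = std::min(chunk, n - offset);
        const size_t bytes = count * sizeof(float);

        cudaMemcpyAsync(d_data + offset, h_data + offset, bytes,
                        cudaMemcpyHostToDevice, streams[s]);
        const unsigned blocks = static_cast<unsigned>((count + 255) / 256);
        scaleKernel<<<blocks, 256, 0, streams[s]>>>(d_data + offset, count, factor);
        cudaMemcpyAsync(h_data + offset, d_data + offset, bytes,
                        cudaMemcpyDeviceToHost, streams[s]);
    }

    // Wait for all streams to finish before releasing resources.
    err = cudaDeviceSynchronize();
    for (auto& s : streams) cudaStreamDestroy(s);
    cudaFree(d_data);
    return err;
}
```

Calling <code>processInChunks(buffer, n, 2.0f, 4)</code> would double every element of <code>buffer</code> using four streams. When neighbouring elements depend on each other, an even split like this no longer works, which is the partitioning difficulty noted above.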

==== Switching to CudaMallocHost ====

There is a slight performance increase when switching to cudaMallocHost, which allocates pinned (page-locked) host memory.

''' The data table '''

[[File:HVMallocHosttable.png|800px]]

''' The diagram '''

[[File:HVMallocHost.png|800px]]
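For readers who want to try the switch themselves, here is a minimal sketch that times one host-to-device copy from a pageable buffer (<code>malloc</code>) and from a pinned buffer (<code>cudaMallocHost</code>). The helper names <code>timeHostToDeviceMs</code> and <code>comparePageableAndPinned</code> are hypothetical and are not taken from the project; the measurements in the table above were not produced with this code.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Times a single host-to-device copy with CUDA events.
// Returns the elapsed time in milliseconds, or -1.0f on failure.
float timeHostToDeviceMs(const void* h_src, size_t bytes) {
    void* d_dst = nullptr;
    if (cudaMalloc(&d_dst, bytes) != cudaSuccess) return -1.0f;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    cudaError_t err = cudaMemcpy(d_dst, h_src, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_dst);
    return err == cudaSuccess ? ms : -1.0f;
}

// Allocates one pageable and one pinned buffer of the same size and
// times a copy from each. Returns false if any allocation or copy fails.
bool comparePageableAndPinned(size_t bytes, float* pageableMs, float* pinnedMs) {
    void* pageable = std::malloc(bytes);
    if (pageable == nullptr) return false;

    void* pinned = nullptr;
    if (cudaMallocHost(&pinned, bytes) != cudaSuccess) {
        std::free(pageable);
        return false;
    }

    // Touch both buffers so the pages are actually backed by memory.
    std::memset(pageable, 0, bytes);
    std::memset(pinned, 0, bytes);

    *pageableMs = timeHostToDeviceMs(pageable, bytes);
    *pinnedMs = timeHostToDeviceMs(pinned, bytes);

    cudaFreeHost(pinned);  // pinned memory must be released with cudaFreeHost
    std::free(pageable);
    return *pageableMs >= 0.0f && *pinnedMs >= 0.0f;
}
```

The size of any difference depends on the transfer size and the hardware, so a single run like this is only a rough indication. Pinned memory is also required for <code>cudaMemcpyAsync</code> to overlap with kernel execution.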

==== Switching to x86 from x64 ====

Yalong

