{{GPU610/DPS915 Index | 2012320171}}
= Student Resources =
The purpose of this page is to share useful information that can help groups with their CUDA projects.
= CUDA Enabled Cards =
[http://en.wikipedia.org/wiki/CUDA#Supported_GPUs List @ CUDA Wiki]
= Workshop Notes =
==BLAS Documentation==
See the [[GPU610/DPS915_BLAS_Documentation | BLAS Documentation Page]]
====Troubleshooting====
Problem with CUDA driver version 5.0.24 on MacBook Pro 2012 [http://blogs.adobe.com/premiereprotraining/2012/08/known-issues-with-cuda-5-0-17-driver-including-crashes-and-kernel-panics.html Fix]
After following the instructions provided in today's lecture for setting up the library and include files in the project properties to run CUDA on VS 2012 Express at home, I still encountered a linker error: "unable to find cuda_runtime.h". Googling around, there are two ways around this. By default, Visual Studio uses the 32-bit debugger, which you can change in the project properties. You will have to use the Win32 version of the library directories (i.e. in my case "C:\Program Files\NVIDIA Corporation\NvToolsExt\lib\Win32") with the default debugger. If you use the x64 library files, change the debugger to 64-bit (which I neglected to do, and lost a good portion of time). Cheers.
Find nvcc.profile (usually located in "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\bin") and replace everything inside it with this (if you have not changed it before):
<pre>
TOP = $(_HERE_)/..
</pre>
== Dynamically Allocated Shared Memory ==
Here is a roundabout way of working around the shared memory limitations of your graphics card. The idea is to send the data in chunks that your kernel can handle, and to keep sending chunks until none are left. The address passed to the kernel is shifted by the chunk size on each iteration.
<pre>
CHUNKSIZE = 512;
shared_ = CHUNKSIZE * sizeof(SimBody);  // bytes of dynamic shared memory per block

// set up the chunk counter and starting offset before entering the loop
chunks = arr.size / CHUNKSIZE + 1;
index = 0;

while (chunks > 0) {
    // describe the current chunk: start address and chunk size
    BodyArray ba = { &arr.array[index], CHUNKSIZE };

    SimCalc <<< numBlocks_, numThreads_, shared_ >>>(ba);
    cudaThreadSynchronize();
    SimTick <<< numBlocks_, numThreads_, shared_ >>>(ba, timeStep);
    cudaThreadSynchronize();

    index += CHUNKSIZE;  // shift the address to the next chunk
    --chunks;
}
</pre>
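One thing to watch in the chunk loop above: <code>arr.size / CHUNKSIZE + 1</code> produces an extra, empty chunk whenever the array size is an exact multiple of <code>CHUNKSIZE</code>, and every launch is told the chunk holds <code>CHUNKSIZE</code> elements even when fewer remain. The following is only a host-side sketch of one way to compute the chunk boundaries so that the last chunk carries its real length. The names <code>Chunk</code> and <code>makeChunks</code> are hypothetical and are not part of the original project.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: one contiguous slice of the body array.
struct Chunk {
    std::size_t offset;  // index of the first element in the slice
    std::size_t count;   // number of elements actually in the slice
};

// Split `total` elements into slices of at most `chunkSize` elements.
// The final slice is shortened so it never runs past the end of the array.
std::vector<Chunk> makeChunks(std::size_t total, std::size_t chunkSize) {
    assert(chunkSize > 0);  // a zero chunk size would never advance
    std::vector<Chunk> chunks;
    for (std::size_t offset = 0; offset < total; offset += chunkSize) {
        std::size_t remaining = total - offset;
        std::size_t count = remaining < chunkSize ? remaining : chunkSize;
        chunks.push_back({offset, count});
    }
    return chunks;
}
```

Assuming the second field of <code>BodyArray</code> is an element count, each entry <code>c</code> could then be used as <code>BodyArray ba = { &arr.array[c.offset], c.count };</code> before the two kernel launches.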