{{GPU610/DPS915 Index | 20171}}
=Student Resources=
The purpose of this page is to share useful information that can help groups with their CUDA projects.
=CUDA Enabled Cards=
[http://en.wikipedia.org/wiki/CUDA#Supported_GPUs List @ CUDA Wiki]

=Getting Started on Mac=
http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/CUDA_Getting_Started_Mac.pdf

=Visual Studio 2012 Express: cuda_runtime.h Not Found=
After following the instructions provided in today's lecture for setting up the library and include directories in the project properties to run CUDA on VS 2012 Express at home, I still encountered the error "unable to find cuda_runtime.h". Googling around, there are two ways around this. By default, Visual Studio builds for the 32-bit (Win32) platform, which you can change in the project properties. You will have to use the Win32 version of the library directories (i.e. in my case "C:\Program Files\NVIDIA Corporation\NvToolsExt\lib\Win32") with the default platform. If you use the x64 library files, change the platform to 64-bit (which I neglected to do, and lost a good portion of time). Cheers.

=SVGALIBS - Graphics Library=
This library is a Linux graphics library and thus will not work on Windows. I tried briefly to find a workaround but could not, because Windows does not have X11/Xorg or Linux tty devices. The program needs to be run on a Linux machine because it uses svgalib, which is an archaic way to display output on the Linux console (from a quick Google search on the SVGA library).

=BLAS Documentation --- Please do not remove the reference tags=
==sgemv==
void '''cblas_sgemv''' (''const enum CBLAS_ORDER '''order''', const enum CBLAS_TRANSPOSE '''TransA''', const int '''M''', const int '''N''', const float '''alpha''', const float * '''A''', const int '''lda''', const float * '''x''', const int '''incx''', const float '''beta''', float * '''y''', const int '''incy''''')<ref>http://www.gnu.org/software/gsl/manual/html_node/Level-2-CBLAS-Functions.html</ref><ref>http://www.prism.gatech.edu/~ndantam3/cblas-doc/doc/html/cblas_8h.html#23ac27150577c29a7ad4ddb427f255f7</ref><ref>http://publib.boulder.ibm.com/infocenter/comphelp/v8v101/index.jsp?topic=%2Fcom.ibm.xlcpp8a.doc%2Fproguide%2Fref%2Fblaslib.htm</ref>

cblas_sgemv computes '''y''' := '''alpha''' * op('''A''') * '''x''' + '''beta''' * '''y''', where op('''A''') is '''A''' or its transpose depending on '''TransA'''. '''beta''' is the scaling constant for vector '''y'''.

==sgemm==
void '''cblas_sgemm''' (''const enum CBLAS_ORDER '''Order''', const enum CBLAS_TRANSPOSE '''TransA''', const enum CBLAS_TRANSPOSE '''TransB''', const int '''M''', const int '''N''', const int '''K''', const float '''alpha''', const float * '''A''', const int '''lda''', const float * '''B''', const int '''ldb''', const float '''beta''', float * '''C''', const int '''ldc''''')<ref>http://www.gnu.org/software/gsl/manual/html_node/Level-3-CBLAS-Functions.html</ref><ref>http://www.prism.gatech.edu/~ndantam3/cblas-doc/doc/html/cblas_8h.html#7d42dfcb6073c56391fee28494809cc5</ref><ref>http://publib.boulder.ibm.com/infocenter/comphelp/v8v101/index.jsp?topic=%2Fcom.ibm.xlcpp8a.doc%2Fproguide%2Fref%2Fblaslib.htm</ref>

cblas_sgemm computes '''C''' := '''alpha''' * op('''A''') * op('''B''') + '''beta''' * '''C''', where op('''A''') is M x K and op('''B''') is K x N. '''C''' is the output matrix of float (for sgemm) or double (for dgemm) values.
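To make the parameters of the two signatures above concrete, here is a minimal, unoptimized C sketch of what they compute in the row-major, no-transpose case. `ref_sgemv` and `ref_sgemm` are hypothetical helper names, not part of CBLAS; they assume `CblasRowMajor`, `CblasNoTrans` and positive `incx`/`incy`. In row-major storage the leading dimension (`lda`, `ldb`, `ldc`) is the distance between the starts of consecutive rows, so it must be at least the number of columns.

```c
#include <stddef.h>

/* Hypothetical reference helpers, NOT part of CBLAS.
   Only the CblasRowMajor + CblasNoTrans case is covered; incx, incy > 0. */

/* y := alpha * A * x + beta * y, where A is M x N and row i starts at A + i*lda. */
void ref_sgemv(int M, int N, float alpha, const float *A, int lda,
               const float *x, int incx, float beta, float *y, int incy)
{
    for (int i = 0; i < M; ++i) {
        float sum = 0.0f;
        for (int j = 0; j < N; ++j)
            sum += A[(size_t)i * lda + j] * x[(size_t)j * incx];
        float *yi = &y[(size_t)i * incy];
        /* as in reference BLAS, y is not read when beta == 0 */
        *yi = alpha * sum + (beta == 0.0f ? 0.0f : beta * *yi);
    }
}

/* C := alpha * A * B + beta * C, where A is M x K, B is K x N, C is M x N. */
void ref_sgemm(int M, int N, int K, float alpha, const float *A, int lda,
               const float *B, int ldb, float beta, float *C, int ldc)
{
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            float sum = 0.0f;
            for (int p = 0; p < K; ++p)
                sum += A[(size_t)i * lda + p] * B[(size_t)p * ldb + j];
            float *cij = &C[(size_t)i * ldc + j];
            /* as in reference BLAS, C is not read when beta == 0 */
            *cij = alpha * sum + (beta == 0.0f ? 0.0f : beta * *cij);
        }
    }
}
```

With a CBLAS implementation linked in, the same products would come from `cblas_sgemv(CblasRowMajor, CblasNoTrans, M, N, alpha, A, lda, x, incx, beta, y, incy)` and `cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, M, N, K, alpha, A, lda, B, ldb, beta, C, ldc)`; a naive version like this is handy for cross-checking results on small inputs.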
= Dynamically Allocated Shared Memory =
Here is a roundabout way of working around the shared memory limitations of your graphics card. The idea is to send in chunks that your kernel can handle, then keep sending chunks until there are none left. The address passed to the kernel is shifted by the chunk size each time. The third argument of the <code><nowiki><<< >>></nowiki></code> launch configuration (<code>shared_</code> below) sets how many bytes of dynamically allocated shared memory each block receives; inside the kernel, that memory is accessed through an <code>extern __shared__</code> array.
<pre>
CHUNKSIZE = 512;
shared_ = CHUNKSIZE * sizeof(SimBody);            // bytes of dynamic shared memory per block
chunks = (arr.size + CHUNKSIZE - 1) / CHUNKSIZE;  // round up so a partial last chunk is included
index = 0;
while (chunks > 0) {
    // the last chunk may hold fewer than CHUNKSIZE bodies
    int count = (arr.size - index < CHUNKSIZE) ? (arr.size - index) : CHUNKSIZE;
    BodyArray ba = { &arr.array[index], count };
    SimCalc <<< numBlocks_, numThreads_, shared_ >>>(ba);
    cudaDeviceSynchronize();
    SimTick <<< numBlocks_, numThreads_, shared_ >>>(ba, timeStep);
    cudaDeviceSynchronize();
    index += CHUNKSIZE;                           // shift the start address by one chunk
    --chunks;
}
</pre>
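The host-side arithmetic behind the chunking loop above can be checked in isolation. The following is a small C sketch under stated assumptions: `chunk_count`, `chunk_length`, `chunk_shared_bytes` and `items_covered` are hypothetical helpers, not part of the CUDA runtime, and the element size stands in for `sizeof(SimBody)`. It shows why the chunk count uses a rounded-up division and why only the last chunk can be shorter than the chunk size.

```c
#include <stddef.h>

/* Hypothetical helpers mirroring the host-side chunking loop. */

/* Number of chunks needed to cover n items: ceiling of n / chunk_size. */
int chunk_count(int n, int chunk_size)
{
    return (n + chunk_size - 1) / chunk_size;
}

/* Items in the chunk starting at offset start; only the last chunk can be short. */
int chunk_length(int n, int chunk_size, int start)
{
    int remaining = n - start;
    return remaining < chunk_size ? remaining : chunk_size;
}

/* Bytes of dynamic shared memory for one chunk of elements of size elem_size
   (the value passed as the third <<< >>> launch argument). */
size_t chunk_shared_bytes(int chunk_size, size_t elem_size)
{
    return (size_t)chunk_size * elem_size;
}

/* Walks the chunks the same way the host loop does and returns how many
   items were visited in total; it should always equal n. */
int items_covered(int n, int chunk_size)
{
    int total = 0;
    for (int start = 0, left = chunk_count(n, chunk_size); left > 0; --left) {
        total += chunk_length(n, chunk_size, start);
        start += chunk_size;
    }
    return total;
}
```

For example, 1000 bodies with a chunk size of 512 give two launches of 512 and 488 bodies; when the size divides evenly (1024 bodies), there are exactly two full chunks and no empty trailing launch.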
<pre>
#include <math.h>

void gputrack_self_ ( \
int *direction, \
int *nrep, \
float *yp, \
float *xp, \
int *turnnow, \
int *sizeofarrays, \
float *dphi, \
float *denergy, \
float *c1, \
float *c2, \
float *dEbin, \
float *dtbin, \
float *h, \
float *hratio, \
float *omegarev0, \
float *phi0, \
float *phi12, \
float *q, \
float *tatturn, \
float *VRF1, \
float *VRF1dot, \
float *VRF2, \
float *VRF2dot, \
float *xorigin, \
float *yat0, \
int *p, \
int *dturns, \
float *phiwrap, \
float *selfvolt, \
int *vselfDimRow, \
int *vselfDimCol, \
float *vself \
)
{
/* Local Variables */
int l,i,mm,t;
l = *sizeofarrays;
t = *turnnow;
// longtrack_self specific local variables
int cp;
cp = *p;
/* dphi=(xp+xorigin)*h*omegarev0(turnnow)*dtbin-phi0(turnnow) */
for(mm = 0; mm < l; mm++) {
dphi[mm] = (xp[mm] + *xorigin) * *h * omegarev0[t] * *dtbin - phi0[t];
}
/* denergy=(yp-yat0)*dEbin */
for(mm = 0; mm < l; mm++) {
denergy[mm] = (yp[mm] - *yat0) * *dEbin;
}
/* IF (direction.GT.0) THEN */
if (*direction > 0) {
/* p=turnnow/dturns+1 */
cp = t / *dturns + 1;
/* DO i=1,nrep */
for(i = 1; i <= *nrep; i++) {
/* forall(mm=1:size(xp)) dphi(mm)=dphi(mm)-c1(turnnow)*denergy(mm) */
for(mm=0;mm<l;mm++) {
dphi[mm] = dphi[mm] - c1[t] *denergy[mm];
}
/* turnnow=turnnow+1 */
t=t+1;
/* forall(mm=1:size(xp)) xp(mm)=dphi(mm)+phi0(turnnow)-&
xorigin*h*omegarev0(turnnow)*dtbin */
for(mm=0;mm<l;mm++) {
xp[mm] = dphi[mm] + phi0[t] - \
*xorigin * *h * omegarev0[t] * *dtbin;
}
/* forall(mm=1:size(xp)) xp(mm)=(xp(mm)-&
phiwrap*FLOOR(xp(mm)/phiwrap))/(h*omegarev0(turnnow)*dtbin) */
for(mm = 0; mm < l; mm++) {
xp[mm] = (xp[mm] - \
*phiwrap * floor(xp[mm] / *phiwrap)) / (*h * omegarev0[t] * *dtbin);
}
/* forall(mm=1:size(xp)) selfvolt(mm)=vself(p,FLOOR(xp(mm))+1) */
for(mm = 0; mm < l; mm++) {
int itemp = (int)floor(xp[mm]);
selfvolt[mm] = vself[(*vselfDimRow * (itemp)) + (cp-1)];
}
/* forall(mm=1:size(xp)) denergy(mm)=denergy(mm)+q*((&
(VRF1+VRF1dot*tatturn(turnnow))*SIN(dphi(mm)+phi0(turnnow))+&
(VRF2+VRF2dot*tatturn(turnnow))*&
SIN(hratio*(dphi(mm)+phi0(turnnow)-phi12)))+selfvolt(mm))-c2(turnnow) */
for(mm = 0; mm < l; mm++) {
denergy[mm] = denergy[mm] + *q *(( \
(*VRF1 + *VRF1dot * tatturn[t]) * sin(dphi[mm] + phi0[t]) + \
(*VRF2 + *VRF2dot * tatturn[t]) * \
sin(*hratio * (dphi[mm] + phi0[t] - *phi12))) + selfvolt[mm]) -c2[t];
}
/* END DO */
}
}
else {
// p=turnnow/dturns
cp = t / *dturns;
// DO i=1,nrep
for (i=1;i<=*nrep;i++) {
// forall(mm=1:size(xp)) selfvolt(mm)=vself(p,FLOOR(xp(mm))+1)
for(mm = 0; mm < l; mm++) {
int itemp = (int)floor(xp[mm]);
selfvolt[mm] = vself[(*vselfDimRow*(itemp)) + (cp-1)];
}
/* forall(mm=1:size(xp)) denergy(mm)=denergy(mm)-q*((&
(VRF1+VRF1dot*tatturn(turnnow))*SIN(dphi(mm)+phi0(turnnow))+&
(VRF2+VRF2dot*tatturn(turnnow))*&
SIN(hratio*(dphi(mm)+phi0(turnnow)-phi12)))+selfvolt(mm))+c2(turnnow) */
for(mm = 0; mm < l; mm++) {
denergy[mm]=denergy[mm] - *q *(( \
(*VRF1 + *VRF1dot * tatturn[t]) *sin(dphi[mm] + phi0[t]) + \
(*VRF2 + *VRF2dot * tatturn[t]) * \
sin(*hratio * (dphi[mm] + phi0[t] - *phi12))) + selfvolt[mm]) + c2[t];
}
// turnnow=turnnow-1
t--;
/* forall(mm=1:size(xp)) dphi(mm)=dphi(mm)+c1(turnnow)*denergy(mm) */
for(mm = 0; mm < l; mm++) {
dphi[mm]=dphi[mm] + c1[t] * denergy[mm];
}
/* forall(mm=1:size(xp)) xp(mm)=dphi(mm)+phi0(turnnow)-&
xorigin*h*omegarev0(turnnow)*dtbin */
for(mm = 0; mm < l; mm++) {
xp[mm] = dphi[mm] + phi0[t] - \
*xorigin * *h * omegarev0[t] * *dtbin;
}
/* forall(mm=1:size(xp)) xp(mm)=(xp(mm)-&
phiwrap*FLOOR(xp(mm)/phiwrap))/(h*omegarev0(turnnow)*dtbin) */
for(mm = 0; mm < l; mm++) {
xp[mm] = (xp[mm] - \
*phiwrap * floor(xp[mm] / *phiwrap)) / (*h * omegarev0[t] * *dtbin);
}
}
}
// yp=denergy/dEbin+yat0
for(mm=0; mm<l; mm++) {
yp[mm] = denergy[mm] / *dEbin + *yat0;
}
*turnnow = t;
return;
}
</pre>
= Visual Studio 2017 and CUDA 9.1 Problem =
I ran into this problem when trying to build '''thrust_sort.cu''' in the Thrust lecture. The only way I was able to build and run successfully was to create a '''CUDA 9.1 project'''. However, in the current version of Visual Studio 2017, you cannot build and run CUDA 9.1 projects unless you set the '''Platform Toolset''' to '''Visual Studio 2015 (v140)'''. This can be done by opening the project properties, going to the General section, and changing the '''Platform Toolset'''. However, this is where I ran into a problem: Visual Studio displayed an error and would not let me change the platform toolset. So I came up with the following workaround, which works:
*If you haven't already done so, install the optional '''Visual Studio 2015 (v140)''' component, which is available from the Visual Studio 2017 installer.
*From Visual Studio, create a CUDA 9.1 project, then close the solution.
*Using a text editor, open <project name>.vcxproj.
*Add the following as the first element in the XML under the '''Project''' tag: <code><nowiki><PropertyGroup> <CUDAPropsPath Condition="'$(CUDAPropsPath)'==''">$(VCTargetsPath)\BuildCustomizations</CUDAPropsPath> </PropertyGroup></nowiki></code>
*Replace all occurrences (there are 2 of them) of v141 with v140.
*Search for "CUDA 9.1" (you will find 2 occurrences). Then replace the first entire line with <code><nowiki><Import Project="$(CUDAPropsPath)\CUDA 9.1.props" /></nowiki></code> and the second entire line with <code><nowiki><Import Project="$(CUDAPropsPath)\CUDA 9.1.targets" /></nowiki></code>.
*Close the file in the text editor, then re-open the solution in Visual Studio. You should now be able to add your .cu files, build and run.

=References=
<references/>