Revision as of 15:10, 3 December 2014

Selected Program

Ray Tracer

Team Members

Progress

Assignment 1

Bruno

For assignment 1 I looked into finding a simple Ray Tracer that could be easily understood by someone with no image processing background and benefit from parallelization.

I found ray tracer that matched the criteria I was looking for at http://scratchapixel.com/assets/Uploads/Lesson001/Source%20Code/raytracer.cpp

Looking into Big O notation I believe this program falls under [f(n) = n ^ 2]

I profiled the ray tracer by modifying trace depths, as seen below:

Depth of 70

Each sample counts as 0.01 seconds.

 %   cumulative   self              self     total
time   seconds   seconds    calls  ms/call  ms/call  name
99.99    517.97   517.97   307200     1.69     1.69  Vec3<float> trace<float>(Vec3<float> const&, Vec3<float> const&, std::vector<Sphere<float>*, std::allocator<Sphere<float>*> > const&, int const&)
 0.01    518.00     0.03                             void render<float>(std::vector<Sphere<float>*, std::allocator<Sphere<float>*> > const&)
 0.00    518.00     0.00        1     0.00     0.00  _GLOBAL__sub_I_main

The program spends nearly 100% of all processing with in its Vec3 trace method, this is a recursive method.



Vec3<T> trace(const Vec3<T> &rayorig, const Vec3<T> &raydir,const std::vector<Sphere<T> *> &spheres, const int &depth)

.......
.......

	if ((sphere->transparency > 0 || sphere->reflection > 0) && depth < MAX_RAY_DEPTH) {

		T facingratio = -raydir.dot(nhit);

		// change the mix value to tweak the effect

		T fresneleffect = mix<T>(pow(1 - facingratio, 3), 1, 0.1);

		Vec3<T> refldir = raydir - nhit * 2 * raydir.dot(nhit);

		refldir.normalize();

		Vec3<T> reflection = trace(phit + nhit * bias, refldir, spheres, depth + 1);

		Vec3<T> refraction = 0;

		if (sphere->transparency) {

			T ior = 1.1, eta = (inside) ? ior : 1 / ior; 

			T cosi = -nhit.dot(raydir);

			T k = 1 - eta * eta * (1 - cosi * cosi);

			Vec3<T> refrdir = raydir * eta + nhit * (eta *  cosi - sqrt(k));

			refrdir.normalize();

			refraction = trace(phit - nhit * bias, refrdir, spheres, depth + 1);

		}

		surfaceColor = (reflection * fresneleffect + refraction * (1 - fresneleffect) * sphere->transparency) * sphere->surfaceColor;

	}
...
...

void render(const std::vector<Sphere<T> *> &spheres)

{
	unsigned width = 640, height = 480;

	Vec3<T> *image = new Vec3<T>[width * height], *pixel = image;

	T invWidth = 1 / T(width), invHeight = 1 / T(height);

	T fov = 30, aspectratio = width / T(height);

	T angle = tan(M_PI * 0.5 * fov / T(180));

	// Trace rays

	for (unsigned y = 0; y < height; ++y) {

		for (unsigned x = 0; x < width; ++x, ++pixel) {

			T xx = (2 * ((x + 0.5) * invWidth) - 1) * angle * aspectratio;

			T yy = (1 - 2 * ((y + 0.5) * invHeight)) * angle;

			Vec3<T> raydir(xx, yy, -1);

			raydir.normalize();

			*pixel = trace(Vec3<T>(0), raydir, spheres, 0);

		}
	}

Michael

I have picked the Monte Carlo simulation for this assignment. Source code was from here and was modified to take and argument as the number of iterations and factored out the function which is going to be used for parallelzation.

Runtime for this program O(N) and the results are as follows:

N Iterations	Time (seconds)
1,000,000	0.02
5,000,000	0.24
10,000,000	0.42
50,000,000	2.22
100,000,000	4.75
500,000,000	19.97
1,000,000,000	44.5
5,000,000,000	94.08

Due to the simplicity of the program, all of the time spent was on the calc function below, as iterations increased, the time it takes increases at a linear rate.

Function to parallelize:

void calc(int iterations, int* count){
    double x, y, z;
    for (int i=0;i<iterations;i++){
        x = (double)rand()/RAND_MAX;
        y = (double)rand()/RAND_MAX;
        z = x*x+y*y;
        if (z<=1){
            (*count)++;
        }
    }
}

Assignment 2

Assignment 3

What is a Ray Tracer?

A program that performs ray tracing, which is a technique for creating images by tracing light paths (rays) through all the pixels in the image and simulating the effects light would have based on the objects the light (rays) encounters.

We decided this would be a problem that could benefit greatly from parallel processing, and it has, by over 350 times speed up.

Optimizations Used

__device__ __host__ : we used this so that parts of classes Vec3 and Sphere could be accessed by both the host and device were necessary.

__device__ __forceinline__ : because the program uses various loops and recursion we force the compiler to use inline functions to speed up the trace and mix functions as well as some methods in the Vec3 and Sphere class.

sqrtf, tanf, fmaxf : where std:: was being used we replaced it with CUDA's math library equivalents although gains were marginal from this.

shared memory : we implemented shared memory but quickly realized that it was actually slower then sticking to global memory, we believe this has to do with the number of times the array has to be copied into shared memory.

- We also needed to rework a few parts of code in order to be parallelized

@@ Line 159: / Line 159: @@
 === Assignment 2 ===
 === Assignment 3 ===
+====What is a Ray Tracer?====
+A program that performs ray tracing, which is a technique for creating images by tracing light paths (rays) through all the pixels in the image and simulating the effects light would have based on the objects the light (rays) encounters.
+We decided this would be a problem that could benefit greatly from parallel processing, and it has, by over 350 times speed up.
+[[File:A3.jpg]]
+====Optimizations Used====
+'''__device__ __host__''' : we used this so that parts of classes Vec3 and Sphere could be accessed by both the host and device were necessary.
+'''__device__ __forceinline__''' : because the program uses various loops and recursion we force the compiler to use inline functions to speed up the trace and mix functions as well as some methods in the Vec3 and Sphere class.
+'''sqrtf, tanf, fmaxf''' : where std:: was being used we replaced it with CUDA's math library equivalents although gains were marginal from this.
+'''shared memory''' : we implemented shared memory but quickly realized that it was actually slower then sticking to global memory, we believe this has to do with the number of times the array has to be copied into shared memory.
+**We also needed to rework a few parts of code in order to be parallelized

Difference between revisions of "Skynet/GPU610"

Revision as of 15:10, 3 December 2014

Contents

Selected Program

Team Members

Progress

Assignment 1

Bruno

Michael

Assignment 2

Assignment 3

What is a Ray Tracer?

Optimizations Used

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

get involved with CDOT

courses

course projects

links

Tools