GPU621/Threadless Horsemen

<div style="font-size: 1.300em; width: 85%">
== Introduction: The Julia Programming Language ==
 
[[File:Julia_lang_logo.png|300px]]
=== Bit of History ===
More Use Cases:
https://juliacomputing.com/case-studies/
 
* Julia Computing was co-founded by the co-creators of Julia to provide support, consulting, and other services to organizations using Julia
* The company raised $4.6M in seed funding in June 2017 (http://www.finsmes.com/2017/06/julia-computing-raises-4-6m-in-seed-funding.html)
== Julia's Forms of Parallelism ==
<source>
# Example
using Distributed
addprocs(4)                      # start 4 worker processes
@everywhere using Distributed    # make myid() available on every process

# @everywhere lets all processes be able to call the function
@everywhere function whoami()
    println("hello from process ", myid())
end

remotecall_fetch(whoami, 2)
remotecall_fetch(whoami, 4)
# remotecall_fetch is the same as fetch(remotecall(...))
</source>
Source: https://www.dursi.ca/post/julia-vs-chapel.html#parallel-primitives
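The comment above notes that remotecall_fetch is just fetch(remotecall(...)) in one step. Here is a minimal sketch spelling that out; the squaring closure, argument, and worker id are illustrative choices of ours, not from the original notes.

<source>
# Sketch: two-step remotecall + fetch vs. the fused remotecall_fetch
using Distributed
addprocs(2)                          # start 2 worker processes

fut = remotecall(x -> x^2, 2, 10)    # launch x^2 of 10 on worker 2; returns a Future
println(fetch(fut))                  # block until the result is ready: 100

# ...equivalent to the fused call:
println(remotecall_fetch(x -> x^2, 2, 10))
</source>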
* The reason loop interchange works for OpenMP is that our array was originally stored in memory in "Row-Major" order, which lets the processor move across cached data along a row faster than down a column...
* As you might have seen, Julia's loop interchange made performance worse; the cause is the opposite of the reason loop interchange helps OpenMP.
* [https://docs.julialang.org/en/v1/manual/performance-tips/index.html Julia favours "Column-Major" layouts in cache memory.] A short loop-order sketch follows the @simd notes below.

[[File:255px-Row_and_column_major_order.svg.png]]

* Julia has several levels of runtime optimization (0-3)
* julia -O2 scriptName.jl or julia --optimize=2
* Set the optimization level (the default is 2 if unspecified, or 3 if -O is used without a level)

== Vectorization ==

* We want to briefly touch on vectorization

{| class="wikitable"
|-
! Using Vectorization
! Expanded axpy function
|-
|<source>
function axpy(a, x, y)
    @simd for i = 1:length(x)
        @inbounds y[i] += a*x[i]
    end
end

n = 1003
x = rand(Float32, n)
y = rand(Float32, n)
axpy(1.414f0, x, y)
</source>
|<source>
function axpy(a::Float32, x::Array{Float32,1}, y::Array{Float32,1})
    n = length(x)
    i = 1
    @inbounds while i <= n
        t1 = x[i]
        t2 = y[i]
        t3 = a*t1
        t4 = t2 + t3
        y[i] = t4
        i += 1
    end
end
</source>
|}
* The @simd macro gives the compiler license to vectorize without checking whether it will change the program's visible behavior.
* The vectorized code will behave as if the code were written to operate on chunks of the arrays.
* @inbounds turns off subscript checking that might throw an exception.
* Make sure your subscripts are in bounds before using it or you might corrupt your Julia session.
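To connect the column-major point above to actual loop order, here is a minimal sketch; the function names and matrix size are illustrative choices of ours. Because Julia stores arrays column by column, keeping the row index in the inner loop walks memory contiguously, which is why the "OpenMP-style" interchange made things worse.

<source>
# Loop-order sketch (illustrative names and sizes, not from the course notes)
function sum_col_major(A)
    s = zero(eltype(A))
    for j in 1:size(A, 2)        # outer loop over columns
        for i in 1:size(A, 1)    # inner loop over rows: contiguous memory accesses
            s += A[i, j]
        end
    end
    return s
end

function sum_row_major(A)        # interchanged loops: strided accesses, slower in Julia
    s = zero(eltype(A))
    for i in 1:size(A, 1)
        for j in 1:size(A, 2)
            s += A[i, j]
        end
    end
    return s
end

A = rand(Float32, 2000, 2000)
@time sum_col_major(A)
@time sum_row_major(A)
</source>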
* [https://software.intel.com/en-us/articles/vectorization-in-julia More info on vectorization in Julia]

== Conclusion ==

* We recapped the loop interchange benefits for OpenMP (locality of reference)
* Julia stores arrays as column-major, which is why loop interchange was worse for Julia
* Julia offers different levels of runtime optimization