In this article by Malcolm Sherrington, author of the book Mastering Julia, we will see why write a book on Julia when the language is not yet reached the version v1.0 stage? It was the first question which needed to be addressed when deciding on the contents and philosophy behind the book.

(For more resources related to this topic, see here.)

Julia at the time as v0.2, it is now soon to achieve a stable v0.4 but already the blueprint for v0.5 is being touted. There were some common misconceptions which I wished to address:

It is a language designed for Geeks
It's main attribute, possibly only, was its speed
It was a scientific language primarily a MATLAB clone
It is not as easy to use compared with the alternatives such Python and R
There are not enough library support to tackle Enterprise Solutions

In fact none of these apply to Julia.

True it is a relatively young programming language. The initial design work on Julia project began at MIT in August 2009, by February 2012 it became open source. It is largely the work of three developers Stefan Karpinski, Jeff Bezanson, and Viral Shah.

Initially Julia was envisaged by the designers as a scientific language sufficiently rapid to make the necessity of modeling in an interactive language and subsequently having to redevelop in a compiled language, such as C or Fortran. To achieve this, Julia code would need to be transformed to the underlying machine code of the computer but using the low level virtual machine (LLVM) compilation system, at the time itself a new project.

This was a masterstroke. LLVM is now the basis of a variety of systems, the Apple C compiler (clang) uses it, Google V8 JavaScript engine and Mozilla's Rust language use it and Python is attempting to achieve significant increases in speed with its numba module. In Julia LLVM always works, there are no exceptions because it has to.

When launched possibly the community itself saw Julia as a replacement for MATLAB but that proved not to be just case. The syntax of Julia is similar to MATLAB, so much so that anyone competent in the latter can easily learn Julia but, it is a much richer language with many significant differences. The task of the book was to focus on these. In particular my target audience was the data scientist and programmer analyst but have sufficient for the "jobbing" C++ and Java programmer.

Julia's features

The Julia programming language is free and open source (MIT licensed) and the source is available on GitHub.

To the veteran programmer it has a look and feel similar to MATLAB. Blocks created by for, while, and if statements are all terminated by end rather than by endfor, endwhile, and endif or using the familiar {} style syntax. However it is not a MATLAB clone and sources written for MATLAB will not run on Julia.

The following are some of the Julia's features:

Designed for parallelism and distributed computation (multicore and cluster)
C functions called directly (no wrappers or special APIs needed)
Powerful shell-like capabilities for managing other processes
Lisp-like macros and other metaprogramming facilities
User-defined types are as fast and compact as built-ins
LLVM-based, just-in-time (JIT) compiler that allows Julia to approach and often match the performance of C/C++
An extensive mathematical function library (written in Julia)
Integrated mature, best-of-breed C and Fortran libraries for linear algebra, random number generation, FFTs, and string processing

Julia's core is implemented in C and C++, its parser in Scheme, and the LLVM compiler framework is used for JIT generation of machine code.

The standard library is written in Julia itself using the Node.js's libuv library for efficient, cross-platform I/O.

Julia has a rich language of types for constructing and describing objects that can also optionally be used to make type declarations. The ability to define function behavior across many combinations of argument types via multiple dispatch which is a key cornerstone of the language design.

Julia can utilize code in other programming languages by a directly calling routines written in C or Fortran and stored in shared libraries or DLLs. This is a feature of the language syntax.

In addition it is possible to interact with Python via the PyCall and this is used in the implementation of the IJulia programming environment.

A quick look at some Julia

To get feel for programming in Julia let's look at an example which uses random numbers to price an Asian derivative on the options market.

A share option is the right to purchase a specific stock at a nominated price sometime in the future. The person granting the option is called the grantor and the person who has the benefit of the option is the beneficiary. At the time the option matures the beneficiary may choose to exercise the option if it is in his/her interest the grantor is then obliged to complete the contract.

The following snippet is part of the calculation and computes a single trial and uses the Winston package to display the trajectory:

using Winston;
S0 = 100;     # Spot price
K   = 102;     # Strike price
r   = 0.05;     # Risk free rate
q   = 0.0;      # Dividend yield
v   = 0.2;     # Volatility
tma = 0.25;     # Time to maturity
T = 100;       # Number of time steps
dt = tma/T;     # Time increment
S = zeros(Float64,T)
S[1] = S0;
dW = randn(T)*sqrt(dt);
[ S[t] = S[t-1] * (1 + (r - q - 0.5*v*v)*dt + v*dW[t] + 0.5*v*v*dW[t]*dW[t]) for t=2:T ]
 
x = linspace(1, T, length(T));
p = FramedPlot(title = "Random Walk, drift 5%, volatility 2%")
add(p, Curve(x,S,color="red"))
display(p)

Plot one track so only compute a vector S of T elements.
The stochastic variance dW is computed in a single vectorized statement.
The track S is computed using a "list comprehension".
The array x is created using linspace to define a linear absicca for the plot.
Using the Winston package to produce the display, which only requires 3 statements: to define the plot space, add a curve to it and display the plot as shown in the following figure:

Generating Julia sets

Both the Mandelbrot set and Julia set (for a given constant z0) are the sets of all z (complex number) for which the iteration z → z*z + z0 does not diverge to infinity. The Mandelbrot set is those z0 for which the Julia set is connected.

We create a file jset.jl and its contents defines the function to generate a Julia set.

functionjuliaset(z, z0, nmax::Int64)
for n = 1:nmax
if abs(z) > 2 (return n-1) end
z = z^2 + z0
end
returnnmax
end

Here z and z0 are complex values and nmax is the number of trials to make before returning. If the modulus of the complex number z gets above 2 then it can be shown that it will increase without limit.

The function returns the number of iterations until the modulus test succeeds or else nmax.

Also we will write a second file pgmfile.jl to handling displaying the Julia set.

functioncreate_pgmfile(img, outf::String)
s = open(outf, "w")
write(s, "P5n")
n, m = size(img)
write(s, "$m $n 255n")
for i=1:n, j=1:m
   p = img[i,j]
   write(s, uint8(p))
end
close(s)
end

It is quite easy to create a simple disk file using the portable bitmap (netpbm) format. This consists of a "magic" number P1 - P6, followed on the next line the image height, width and a maximum color value which must be greater than 0 and less than 65536; all of these are ASCII values not binary.

Then follows the image values (height x width) which make be ASCII for P1, P2, P3 or binary for P4, P5, P6. There are three different types of portable bitmap; B/W (P1/P4), Grayscale (P2/P5), and Colour (P3/P6).

The function create_pgmfile() creates a binary grayscale file (magic number = P5) from an image matrix where the values are written as Uint8. Notice that the for loop defines the indices i, j in a single statement with correspondingly only one end statement. The image matrix is output in column order which matches the way it is stored in Julia.

So the main program looks like:

include("jset.jl")
include("pgmfile.jl")
h = 400; w = 800; M = Array(Int64, h, w);
c0 = -0.8+0.16im;
pgm_name = "juliaset.pgm";
 
t0 = time();
for y=1:h, x=1:w
c = complex((x-w/2)/(w/2), (y-h/2)/(w/2))
M[y,x] = juliaset(c, c0, 256)
end
t1 = time();
create_pgmfile(M, pgm_name);
print("Written $pgm_namenFinished in $(t1-t0) seconds.n");

This is how the previous code works:

We define an array N of type Int64 to hold the return values from the juliaset function.
The constant c0 is arbitrary, different values of c0 will produce different Julia sets.
The starting complex number is constructed from the (x,y) coordinates and scaled to the half width.
We 'cheat' a little by defining the maximum number of iterations as 256. Because we are writing byte values (Uint8) and value which remains bounded will be 256 and since overflow values wrap around will be output as 0 (black).

The script defines a main program in a function jmain():

julia>jmain
Written juliaset.pgm
Finished in 0.458 seconds # => (on my laptop)

more-about-julia-img-1

Julia type system and multiple dispatch

Julia is not an object-oriented language so when we speak of object they are a different sort of data structure to those in traditional O-O languages.

Julia does not allow types to have methods or so it is not possible to create subtypes which inherit methods. While this might seem restrictive it does permit methods to use a multiple dispatch call structure rather than the single dispatch system employed in object orientated ones.

Coupled with Julia's system of types, multiple dispatch is extremely powerful. Moreover it is a more logical approach for data scientists and scientific programmers and if for no other reason exposing this to you the analyst/programmer is a reason to use Julia.

A function is an object that maps a tuple of arguments to a return value. In the case where the arguments are not valid the function should handle the situation cleanly by catching the error and handling it or throw an exception.

When a function is applied to its argument tuple it selects the appropriate method and this process is called dispatch. In traditional object-oriented languages the method chosen is based only on the object type and this paradigm is termed single-dispatch. With Julia the combination of all a functions argument determine which method is chosen, this is the basis of multiple dispatch.

To the scientific programmer this all seems very natural. It makes little sense in most circumstances for one argument to be more important than the others. In Julia all functions and operators, which are also functions, use multiple dispatch. The methods chosen for any combination of operators.

For example looking at the methods of the power operator (^):

julia> methods(^)
# 43 methods for generic function "^":
^(x::Bool,y::Bool) at bool.jl:41
^(x::BigInt,y::Bool) at gmp.jl:314
^(x::Integer,y::Bool) at bool.jl:42
 
 
^(A::Array{T,2},p::Number) at linalg/dense.jl:180
^(::MathConst{:e},x::AbstractArray{T,2}) at constants.jl:87

We can see that there are 43 methods for ^ and the file and line where the methods is defined are given too. Because any untyped argument is designed as type Any, it is possible to define a set of function methods such that there is no unique most specific method applicable to some combinations of arguments.

julia> pow(a,b::Int64) = a^b;
julia> pow(a::Float64,b) = a^b;
Warning: New definition
pow(Float64,Any) at /Applications/JuliaStudio.app/Contents/Resources/Console/Console.jl:1
is ambiguous with:
pow(Any,Int64) at /Applications/JuliaStudio.app/Contents/Resources/Console/Console.jl:1.
To fix, define pow(Float64,Int64) before the new definition.

A call of pow(3.5, 2) can be handled by either function. In this case they will give the same result by only because of the function bodies and Julia can't know that.

Working with Python

The ability for Julia with call code written in other languages is one of its main strengths. From its inception Julia had to play "catchup" and a key feature was it makes calling code written in C, and by implication Fortran, very easy. The code to be called must be available as a shared library rather than just a standalone object file. There is a zero-overhead in the call, meaning that it is reduced to a single machine instruction in the LLVM compilation.

In addition Python models can be accessed via the PyCall package which provides a @pyimport macro that mimics a Python import statement. This imports a Python module and provides Julia wrappers for all of the functions and constants including automatic conversion of types between Julia and Python. This work has led to the creation of an IJulia kernel to the IPython IDE which now is a principle component of the Jupyter project.

In Pycall, type conversions are automatically performed for numeric, boolean, string, IO streams plus all tuples, arrays and dictionaries of these types.

julia> using PyCall
julia> @pyimport scipy.optimize as so
julia> @pyimport scipy.integrate as si
julia> so.ridder(x -> x*cos(x), 1, pi); # => 1.570796326795
julia> si.quad(x -> x*sin(x), 1, pi)[1]; # => 2.840423974650

In the preceding commands, the Python optimize and integrate modules are imported and functions in those modules called from the Julia REPL.

One difference imposed on the package is that calls using the Python object notation are not possible from Julia, so these are referenced using an array-style notation po[:atr] rather po.atr, where po is a PyObject and atr is an attribute.

It is also easy to use the Python matplotlib module to display simple (and complex) graphs.

@pyimport matplotlib.pyplot as plt
x = linspace(0,2*pi,1000); y = sin(3*x + 4*cos(2*x));
plt.plot(x, y, color="red", linewidth=2.0, linestyle="--")
1-element Array{Any,1}:
PyObject<matplotlib.lines.Line2D object at 0x0000000027652358>
plt.show()

Notice that keywords can also be passed such as the color, line width and the preceding style.

more-about-julia-img-2

Simple statistics using dataframes

Julia implements various approaches for handling data held on disk. These may be 'normal' files such as text files, CSV and other delimited files, or in SQL and NoSQL databases. Also there is an implementation of dataframe support similar to that provided in R and via the pandas module in Python.

The following looks at the Apple share prices from 2000 to 200, using a CSV file with provides opening, closing, high and low prices together with trading volumes over the day.

using DataFrames, StatsBase
aapl = readtable("AAPL-Short.csv");
 
naapl = size(aapl)[1]
m1 = int(mean((aapl[:Volume]))); # => 6306547

The data is read into a DataFrame and we can estimate the mean (m1). For the volume, it is possible to cast it as an integer as this makes more sense. We can do this by creating a weighting vector.

using Match
wts = zeros(naapl);
for i in 1:naapl
   dt = aapl[:Date][i]
   wts[i] = @match dt begin
         r"^2000" => 1.0
         r"^2001" => 2.0
         r"^2002" => 4.0
         _       => 0.0
   end
end;
 
wv = WeightVec(wts);
m2 = int(mean(aapl[:Volume], wv)); # => 6012863

Computing weighted statistical metrics it is possible to 'trim' off the outliers from each end of the data.

Returning to the closing prices:

mean(aapl[:Close]);         # => 37.1255
mean(aapl[:Close], wv);     # => 26.9944
trimmean(aapl[:Close], 0.1); # => 34.3951

trimean() is the trimmed mean where 5 percent is taken from each end.

std(aapl[:Close]);           # => 34.1186
skewness(aapl[:Close])       # => 1.6911
kurtosis(aapl[:Close])       # => 1.3820

As well as second moments such as standard deviation, StatsBase provides a generic moments() function and specific instances based on these such as for skewness (third) and kurtosis (fourth).

It is also possible to provide some summary statistics:

summarystats(aapl[:Close])
 
Summary Stats:
Mean:         37.125505
Minimum:     13.590000
1st Quartile: 17.735000
Median:       21.490000
3rd Quartile: 31.615000
Maximum:     144.190000

The first and third quartiles related to the 25 percent and 75 percent percentiles for a finer granularity we can use the percentile() function.

percentile(aapl[:Close],5); # => 14.4855
percentile(aapl[:Close],95); # => 118.934

MySQL access using PyCall

We have seen previously that Python can be used for plotting via the PyPlot package which interfaces with matplotlib. In fact the ability to easily call Python modules is a very powerful feature in Julia and we can use this as an alternative method for connecting to databases.

Any database which can be manipulated by Python is also available to Julia. In particular since the DBD driver for MySQL is not fully DBT compliant, let's look this approach to running some queries.

Our current MySQL setup already has the Chinook dataset loaded some we will execute a query to list the Genre table.

In Python we will first need to download the MySQL Connector module.

For Anaconda this needs to be using the source (independent) distribution, rather than a binary package and the installation performed using the setup.py file.

The query (in Python) to list the Genre table would be:

import mysql.connector as mc
cnx = mc.connect(user="malcolm", password="mypasswd")
csr = cnx.cursor()
qry = "SELECT * FROM Chinook.Genre"
csr.execute(qry)
for vals in csr:
   print(vals)
 
(1, u'Rock')
(2, u'Jazz')
(3, u'Metal')
(4, u'Alternative & Punk')
(5, u'Rock And Roll')
...
...
csr.close()
cnx.close()

We can execute the same in Julia by using the PyCall to the mysql.connector module and the form of the coding is remarkably similar:

using PyCall
@pyimport mysql.connector as mc
 
cnx = mc.connect (user="malcolm", password="mypasswd");
csr = cnx[:cursor]()
query = "SELECT * FROM Chinook.Genre"
csr[:execute](query)
 
for vals in csr
   id   = vals[1]
   genre = vals[2]
   @printf “ID: %2d, %sn” id genre
end
ID: 1, Rock
ID: 2, Jazz
ID: 3, Metal
ID: 4, Alternative & Punk
ID: 5, Rock And Roll
...
...
csr[:close]()
cnx[:close]()

Note that the form of the call is a little different from the corresponding Python method, since Julia is not object-oriented the methods for a Python object are constructed as an array of symbols.

For example the Python csr.execute(qry) routine is called in Julia as csr[:execute](qry).

Also be aware that although Python arrays are zero-based this is translated to one-based by PyCall, so the first values is referenced as vals[1].

Scientific programming with Julia

Julia was originally developed as a replacement for MATLAB with a focus on scientific programming. There are modules which are concerned with linear algebra, signal processing, mathematical calculus, optimization problems, and stochastic simulation.

The following is a subject dear to my heart: the solution of differential equations.

Differential equations are those which involve terms which involve the rates of change of variates as well as the variates themselves. They arise naturally in a number of fields, notably dynamics and when the changes are with respect to one dependent variable, often time, the systems are called ordinary differential equations. If more than a single dependent variable is involved, then they are termed partial differential equations.

Julia supports the solution of ordinary differential equations thorough a couple of packages ODE and Sundials. The former (ODE) consists of routines written solely in Julia whereas Sundials is a wrapper package around a shared library.

ODE exports a set of adaptive solvers; adaptive meaning that the 'step' size of the algorithm changes algorithmically to reduce the error estimate to be below a certain threshold.

The calls take the form odeXY, where X is the order of the solver and Y the error control.

ode23: Third order adaptive solver with third order error control
ode45: Fourth order adaptive solver with fifth order error control
ode78: Seventh order adaptive solver with eighth order error control

To solve the explicit ODE defined as a vectorize set of equations dy/dt = F(t,y), all routines of which have the same basic form: (tout, yout) = odeXY(F, y0, tspan).

As an example, I will look at it as a linear three-species food chain model where the lowest-level prey x is preyed upon by a mid-level species y, which, in turn, is preyed upon by a top level predator z. This is an extension of the Lotka-Volterra system from to three species.

Examples might be three-species ecosystems such as mouse-snake-owl, vegetation-rabbits-foxes, and worm-sparrow-falcon.

x' = a*x − b*x*y
y' = −c*y + d*x*y − e*y*z
z' = −f*z + g*y*z #for a,b,c,d,e,f g > 0

Where a, b, c, d are in the two-species Lotka-Volterra equations:

e represents the effect of predation on species y by species z
f represents the natural death rate of the predator z in the absence of prey
g represents the efficiency and propagation rate of the predator z in the presence of prey

This translates to the following set of linear equations:

x[1] = p[1]*x[1] - p[2]*x[1]*x[2]
x[2] = -p[3]*x[2] + p[4]*x[1]*x[2] - p[5]*x[2]*x[3]
x[3] = -p[6]*x[3] + p[7]*x[2]*x[3]

It is slightly over specified since one of the parameters can be removed by rescaling the timescale. We define the function F as follows:

function F(t,x,p)
d1 = p[1]*x[1] - p[2]*x[1]*x[2]
d2 = -p[3]*x[2] + p[4]*x[1]*x[2] - p[5]*x[2]*x[3]
d3 = -p[6]*x[3] + p[7]*x[2]*x[3]
[d1, d2, d3]
end

This takes the time range, vectors of the independent variables and of the coefficients and returns a vector of the derivative estimates:

p = ones(7); # Choose all parameters as 1.0
x0 = [0.5, 1.0, 2.0]; # Setup the initial conditions
tspan = [0.0:0.1:10.0]; # and the time range

Solve the equations by calling the ode23 routine. This returns a matrix of the solutions in a columnar order which we extract and display using Winston:

(t,x) = ODE.ode23((t,x) -> F(t,x,pp), x0, tspan);
 
n = length(t);
y1 = zeros(n); [y1[i] = x[i][1] for i = 1:n];
y2 = zeros(n); [y2[i] = x[i][2] for i = 1:n];
y3 = zeros(n); [y3[i] = x[i][3] for i = 1:n];
 
using Winston
plot(t,y1,"b.",t,y2,"g-.",t,y3,"r--")

This is shown in the following figure:

more-about-julia-img-3

Graphics with Gadlfy

Julia now provides a vast array of graphics packages. The "popular" ones may be thought of as Winston, PyPlot and Gadfly and there is also an interface to the increasingly more popular online system Plot.ly.

Gadfly is a large and complex package and provides great flexibility in the range and breadth of the visualizations possible in Julia. It is equivalent to the R module ggplot2 and similarly is based on the seminal work The Grammar of Graphics by Leland Wilkinson.

The package was written by Daniel Jones and the source on GitHub contains numerous examples of visual displays together with the accompanying code.

An entire text could be devoted just to Gadfly so I can only point out some of the main features here and encourage the reader interested in print standard graphics in Julia to refer to the online website at http://gadflyjl.org.

The standard call is to the plot() function with creates a graph on the display device via a browser either directly or under the control of IJulia if that is being used as an IDE.

It is possible to assign the result of plot() to a variable and invoke this using display(), In this way output can be written to files including: SVG, SVGJS/D3 PDF, and PNG.

dd = plot(x = rand(10), y = rand(10));
draw(SVG(“random-pts.svg”, 15cm, 12cm) , dd);

Notice that if writing to a backend, the display size is provided, this can be specified in units of cm and inch.

Gadfly works well with C libraries of cairo, pango and, fontconfig installed.

It will produce SVG and SVGJS graphics but for PNG, PostScript (PS) and PDF cairo is required. Also complex text layouts are more accurate when pango and fontconfig are available.

The plot() call can operate on three different data sources:

Dataframes
Functions and expressions
Arrays and collections

Unless otherwise specified the type of graph produced is a scatter diagram.

The ability to work directly with data frames is especially useful.

To illustrate this let's look at the GCSE result set.

Recall that this is available as part of the RDatasets suite of source data.

using Gadfly, RDatasets, DataFrames;
set_default_plot_size(20cm, 12cm);
mlmf = dataset("mlmRev","Gcsemv")
df = mlmf[complete_cases(mlmf), :]

After extracting the data we need to operate with values with do not have any NA values, so we use the complete_cases() function to create a subset of the original data.

names(df)
5-element Array{Symbol,1}: ;
# => [ :School, :Student, :Gender, :Written, :Course ]

If we wish to view the data values for the exam and course work results and at the same time differentiate between boys and girls, this can be displayed by:

plot(df, x="Course", y="Written", color="Gender")

more-about-julia-img-4

The JuliaGPU community group

Many Julia modules build on the work of other authors working within the same field of study and these have classified as community groups (http://julialang.org/community).

Probably the most prolific is the statistics group: JuliaStats (http://juliastats.github.io).

One of the main themes in my professional career has been working with hardware to speed up the computing process. In my work on satellite data I worked with the STAR-100 array processor, and once back in the UK, used Silicon Graphics for 3D rendering of medical data . Currently I am interested in using NVIDIA GPUs in financial scenarios and risk calculations.

Much of this work has been coded in C, with domain specific languages to program the ancillary hardware. It is now possible to do much of this in Julia with packages contained in the JuliaGPU group.

This has routines for both CUDA and OpenCL, at present covering:

Basic runtime: CUDA.jl, CUDArt.jl, OpenCL.jl
BLAS integration: CUBLAS.jl, CLBLAS
FFT operations: CUFFT.jl, CLFFT.jl

The CU*-style routines only applies to NVIDIA cards and requires the CUDA SDK to be installed, whereas CL*-functions can be used with variety of GPU s. The CLFFT and CLBLAS do require some additional libraries to be present but we can use OpenCL as is.

The following is output from a Lenovo Z50 laptop with an i7 processor and both Intel and NVIDIA graphics chips.

julia> using OpenCL
julia> OpenCL.devices()
OpenCL.Platform(Intel(R) HDGraphics 4400)
OpenCL.Platform(Intel(R) Core(TM) i7-4510U CPU)
OpenCL.Platform(GeForce 840M on NVIDIA CUDA)

To do some calculations we need to define a kernel to be loaded on the GPU. The following multiplies two 1024x1024 matrices of Gaussian random numbers:

import OpenCL
const cl = OpenCL
const kernel_source = """
__kernel void mmul(
const int Mdim, const int Ndim, const int Pdim,
__global float* A, __global float* B, __global float* C) {
   int k;
   int i = get_global_id(0);
   int j = get_global_id(1);
   float tmp;
   if ((i < Ndim) && (j < Mdim)) {
     tmp = 0.0f;
     for (k = 0; k < Pdim; k++)
       tmp += A[i*Ndim + k] * B[k*Pdim + j];
       C[i*Ndim+j] = tmp;
   }
}
"""

The kernel is expressed as a string and the OpenCL DSL has a C-like syntax:

const ORDER = 1024; # Order of the square matrices A, B and C
const TOL   = 0.001; # Tolerance used in floating point comps
const COUNT = 3;     # Number of runs
 
sizeN = ORDER * ORDER;
h_A = float32(randn(ORDER)); # Fill array with random numbers
h_B = float32(randn(ORDER)); # --- ditto --
h_C = Array(Float32, ORDER); # Array to hold the results
 
ctx   = cl.Context(cl.devices()[3]);
queue = cl.CmdQueue(ctx, :profile);
 
d_a = cl.Buffer(Float32, ctx, (:r,:copy), hostbuf = h_A);
d_b = cl.Buffer(Float32, ctx, (:r,:copy), hostbuf = h_B);
d_c = cl.Buffer(Float32, ctx, :w, length(h_C));

We now create the Open CL context and some data space on the GPU for the three arrays d_A, d_B, and D_C.

Then we copy the data in the host arrays h_A and h_B to the device and then load the kernel onto the GPU.

prg = cl.Program(ctx, source=kernel_source) |> cl.build!
mmul = cl.Kernel(prg, "mmul");

The following loop runs the kernel COUNT times to give an accurate estimate of the elapsed time for the operation.

This includes the cl-copy!() operation which copies the results back from the device to the host (Julia) program.

for i in 1:COUNT
fill!(h_C, 0.0);
global_range = (ORDER. ORDER);
mmul_ocl = mmul[queue, global_range];
evt = mmul_ocl(int32(ORDER), int32(ORDER),
int32(ORDER), d_a, d_b, d_c);
run_time = evt[:profile_duration] / 1e9;
cl.copy!(queue, h_C, d_c);
mflops = 2.0 * Ndims^3 / (1000000.0 * run_time);
@printf “%10.8f seconds at %9.5f MFLOPSn” run_time mflops
end
0.59426405 seconds at 3613.686 MFLOPS
0.59078856 seconds at 3634.957 MFLOPS
0.57401651 seconds at 3741.153 MFLOPS

This compares with the figures for running this natively, without the GPU processor:

7.060888678 seconds at 304.133 MFLOPS

That is using the GPU gives a 12-fold increase in the performance of matrix calculation.

Summary

This article has introduced some of the main features which sets Julia apart from other similar programming languages. I began with a quick look some Julia code by developing a trajectory used in estimating the price of a financial option which was displayed graphically. Continuing with the graphics theme we presented some code to generating a Julia set and to write this to disk as a PGM formatted file.

The type system and use of multiple dispatch is discussed next. This a major difference for the user between Julia and object-orientated languages such as R and Python and is central to what gives Julia the power to generate fast machine-level code via LLVM compilation.

We then turned to a series of topics from the Julia armory:

Working with Python: The ability to call C and Fortran, seamlessly, has been a central feature of Julia since its initial development by the addition of interoperability with Python has opened up a new series of possibilities, leading to the development of the IJulia interface and its integration in the Jupyter project.
Simple statistics using DataFrames :As an example of working with data highlighted the Julia implementation of data frames by looking at Apple share prices and applying some simple statistics.
MySQL Access using PyCall: Returns to another usage of Python interoperability to illustrate an unconventional method to interface to a MySQL database.
Scientific programming with Julia: The case of solution of the ordinary differential equations is presented here by looking at the Lotka-Volterras equation but unusually we develop a solution for the three species model.
Graphics with Gadfly: Julia has a wide range of options when developing data visualizations. Gadfly is one of the ‘heavyweights’ and a dataset is extracted from the RDataset.jl package, containing UK GCSE results and the comparison between written and course work results is displayed using Gadfly, categorized by gender.

Finally we showcased the work of Julia community groups by looking at an example from the JuliaGPU group by utilizing the OpenCL package to check on the set of supported devices. We then selected an NVIDIA GeForce chip, in order to run execute a simple kernel which multiplied a pair of matrices via the GPU. This was timed against conventional evaluation against native Julia coding in order to illustrate the speedup involved in this approach from parallelizing matrix operations.