Elixir Basics – Foundational Steps toward Functional Programming

In this article by Kenneth Ballou, author of the book Learning Elixir, we are going to go into depth on the syntax and basic built-in types of Elixir (and implicitly, Erlang).

A short introduction to types

Like most programming languages, Elixir has its fair share of numerical, Boolean, character, and collection types. It also has some extra types, namely atoms and binaries. In this article, we will see how all of these types work, starting with the numerical types.

Numerical types

Numerical types include the obvious integers. For example, in the interactive prompt (iex), we can enter a few basic numbers:


iex(1)> 42
42

We can also do some basic arithmetic with numbers, of course:

iex(2)> 42 + 5
47
iex(3)> 6 * 7
42
iex(4)> 42 - 10
32
iex(5)> 42 / 6
7.0

So, addition, subtraction, and multiplication work as we expect. Division, however, did what is typically called implicit type widening or implicit type casting: we divided two integers and got a float back. In fact, the / operator always returns a float. If you want integer results instead, use div/2 for integer division and rem/2 for the remainder:

iex(6)> div(10, 3)
3
iex(7)> rem(10, 3)
1
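To see how div/2 and rem/2 fit together, here is a small sketch. IntDivDemo is a hypothetical module written for illustration; Integer.floor_div/2 and Integer.mod/2 are standard library functions (Elixir 1.4 or later is assumed). It checks the identity dividend == div(dividend, divisor) * divisor + rem(dividend, divisor) and shows how negative operands behave:

```elixir
defmodule IntDivDemo do
  # Returns {quotient, remainder} using truncating integer division.
  def div_rem(dividend, divisor), do: {div(dividend, divisor), rem(dividend, divisor)}

  # Checks the identity that ties div/2 and rem/2 together.
  def identity_holds?(dividend, divisor) do
    {quotient, remainder} = div_rem(dividend, divisor)
    quotient * divisor + remainder == dividend
  end
end

IO.inspect(IntDivDemo.div_rem(10, 3))
# {3, 1}
IO.inspect(IntDivDemo.div_rem(-10, 3))
# {-3, -1}: div/2 truncates toward zero
IO.inspect({Integer.floor_div(-10, 3), Integer.mod(-10, 3)})
# {-4, 2}: these round toward negative infinity instead
IO.inspect(IntDivDemo.identity_holds?(-10, 3))
# true
```

The choice only matters for negative operands: div/2 and rem/2 truncate toward zero, while Integer.floor_div/2 and Integer.mod/2 floor, so the result of Integer.mod/2 takes the sign of the divisor.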

Memory usage

To understand the assumptions of our system and make sure we are using the right types for the job, it’s often important to know how big each of our types is. Numerical types are architecture dependent: small integers on 32-bit machines have a smaller range than small integers on 64-bit machines. On either architecture, a small integer takes 1 word, where a word is 4 bytes on 32-bit architectures and 8 bytes on 64-bit architectures. Small integers lie strictly between -134217729 and 134217728 on 32-bit architectures, and strictly between -576460752303423489 and 576460752303423488 on 64-bit architectures. Integers outside that range are stored as big integers, which start at 3 words and grow to as many words as needed to fit the value.

Floats are always double precision, which means they use 4 words on 32-bit machines and 3 words on 64-bit machines. They also follow the IEEE 754 specification for their memory layout.
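The switch from a one-word small integer to a multi-word big integer happens automatically, so Elixir integers never overflow. As a minimal sketch (BigIntDemo is a hypothetical module), the following computes powers of two by repeated multiplication and checks that the results stay exact past the 64-bit small-integer bound quoted above:

```elixir
defmodule BigIntDemo do
  # Computes 2^n by repeated multiplication. No overflow occurs: once a value
  # no longer fits in one word, the runtime stores it as a big integer.
  def pow2(0), do: 1
  def pow2(n) when is_integer(n) and n > 0, do: 2 * pow2(n - 1)
end

# 2^59 is the exclusive upper bound for small integers on 64-bit machines.
IO.inspect(BigIntDemo.pow2(59) == 576_460_752_303_423_488)
# true
IO.inspect(BigIntDemo.pow2(64))
# 18446744073709551616
IO.inspect(is_integer(BigIntDemo.pow2(200)))
# true
```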

Binary, hexadecimal, and octal numbers

Elixir also has some neat shortcuts for writing numbers in different bases, which makes working across bases nearly trivial. To enter a binary number, we can use the 0b prefix; iex prints the value in decimal:

iex(1)> 0b1010
10

Similarly, to enter an octal number, we can use the 0o prefix:

iex(2)> 0o755
493

Finally, as the pattern continues, hexadecimal numbers use the 0x prefix:

iex(3)> 0xFF
255
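These prefixes only change how a literal is written; the resulting values are ordinary integers. To go the other way, the standard library functions Integer.to_string/2, Integer.parse/2, and String.to_integer/2 take a base argument, as this short script sketches:

```elixir
# Render integers as text in a given base.
IO.inspect(Integer.to_string(0b1010, 2))   # "1010"
IO.inspect(Integer.to_string(493, 8))      # "755"
IO.inspect(Integer.to_string(0xFF, 16))    # "FF"

# Parse text written in a given base back into an integer.
IO.inspect(Integer.parse("ff", 16))        # {255, ""}
IO.inspect(String.to_integer("755", 8))    # 493
```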

Since these are all still integers, we already know their size in memory.

Exponent floats

Elixir also supports entering floats in exponent form. Among other benefits, this allows us to define constants in a more readable form. For example, we can input, say, Newton’s gravitational constant in a friendly form:

iex(4)> 6.674e-11
6.674e-11

Note that iex does not expand the exponent form into its full decimal form.
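If you do need the full decimal form as text, one option (assuming a fixed number of decimal places is acceptable) is Erlang’s :erlang.float_to_binary/2 with the decimals option:

```elixir
g = 6.674e-11

# Default inspection keeps the exponent form.
IO.inspect(g)

# Fixed-point rendering with 15 digits after the decimal point.
IO.puts(:erlang.float_to_binary(g, decimals: 15))
# 0.000000000066740
```

Adding the :compact option alongside decimals trims the trailing zeros.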

Atoms

Atoms, sometimes referred to as symbols in other languages, are literal, constant terms whose name is their value. In Elixir, they are usually written with a leading colon, :, whereas in Erlang they are written without it. Here are some examples of atoms we have seen already:

  • :ok
  • :error

And some we haven’t seen yet, but will later:

  • :reply
  • :noreply
  • :stop

But really, almost any name made of letters, digits, and underscores (_) can be used as an atom. Some examples may include:

  • :first_name
  • :last_name
  • :address1
  • :address2

Atoms are really useful for signaling success or failure results, or as keys in a dictionary type. You may also think of atoms as enumerated types from other languages.

In the interactive interpreter, atoms simply evaluate to themselves, much like numbers. For example:

iex(1)> :ok
:ok
iex(2)> :error
:error

There are also a few built-in functions for converting atoms into strings and strings into atoms. For example, here we convert the :ok atom to the string "ok":

iex(3)> Atom.to_string(:ok)
"ok"

Or, to convert the other way, we use this:

iex(4)> String.to_atom("ok")
:ok

We will go into some more detail about the String module shortly.

Atom memory usage

Atoms use 1 word of memory (8 bytes on a 64-bit machine) and are unique. That is, once you use an atom, every other occurrence of that atom points to the same entry in memory. Furthermore, atoms are neither garbage-collected nor mutable: the memory they use is not freed until the Erlang VM terminates.

From the Erlang Efficiency Guide:

1 word. Note: an atom refers into an atom table which also consumes memory. The atom text is stored once for each unique atom in this table. The atom table is not garbage-collected.
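Because the atom table is never garbage-collected (and has a fixed maximum size), it is risky to call String.to_atom/1 on untrusted input. A common safeguard is String.to_existing_atom/1, which raises ArgumentError instead of creating a new atom. The sketch below wraps it in a hypothetical AtomSafety.safe_to_atom/1 helper:

```elixir
defmodule AtomSafety do
  # Hypothetical helper: converts text to an atom only if that atom already
  # exists, so untrusted input cannot keep adding entries to the atom table.
  def safe_to_atom(text) when is_binary(text) do
    {:ok, String.to_existing_atom(text)}
  rescue
    ArgumentError -> {:error, :unknown_atom}
  end
end

IO.inspect(AtomSafety.safe_to_atom("ok"))
# {:ok, :ok}
IO.inspect(AtomSafety.safe_to_atom("no_such_atom_" <> "x9q7"))
# {:error, :unknown_atom}

# Number of atoms currently in the table (OTP 20 or later).
IO.inspect(:erlang.system_info(:atom_count))
```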

Booleans

Booleans, as you expect, are the simple values—either true or false. Furthermore, Boolean expressions have all the expected operators:

iex(1)> true
true
iex(2)> true == false
false
iex(3)> 2 < 42
true
iex(4)> 5 > 10
false

And, as in most languages, we have the usual negation (not), conjunction (and), and disjunction (or) operators:

iex(5)> not false
true
iex(6)> true and false
false
iex(7)> 1 and true
** (BadBooleanError) expected a boolean on left-side of "and", got: 1
iex(8)> false or true
true

The and and or operators are strict about their first argument, which must be a Boolean; the second argument can be anything. (not likewise requires a Boolean.) This explains what happened at prompt 7. Also, or and and are short-circuit operators, so the second argument is evaluated only when needed:

iex(9)> true or raise("I will not raise")
true
iex(10)> false and raise("I will not raise either")
false

In addition to not, and, and or, Elixir provides relaxed counterparts, !, &&, and ||, respectively, which accept arguments of any type:

iex(11)> !42
false
iex(12)> 1 || false
1
iex(13)> true && 42
42

Let’s take a quick moment to discuss what is happening in the last two examples. When we try true && 42, we may expect to get either a type error or simply true. Why didn’t we?

In Elixir, Boolean expressions return the last evaluated value, and this holds when they short-circuit, too. These operators also treat every value except false and nil as truthy. When we ask Elixir to evaluate 1 || false, we get back 1: 1 is truthy, so ||/2 short-circuits and never evaluates false. Similarly, true && 42 returns 42 because &&/2 succeeds only if both operands are truthy, so it must evaluate both. true passes, and 42 is returned because it is the last value evaluated.
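To make the truthiness rule concrete, the following sketch (Truthiness is a hypothetical module) maps a few values through !!, which turns any value into the Boolean its truthiness implies, and shows the common use of || for defaults:

```elixir
defmodule Truthiness do
  # Only false and nil are falsy; !! converts any value to true or false.
  def truthy?(value), do: !!value
end

IO.inspect(Enum.map([true, false, nil, 0, "", [], :ok], &Truthiness.truthy?/1))
# [true, false, false, true, true, true, true]

# || returns the first truthy operand, which makes it handy for defaults.
IO.inspect(nil || "default")
# "default"

# && stops at the first falsy operand and returns it.
IO.inspect(1 && nil && 2)
# nil
```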

On top of the typical Boolean operators, Elixir provides functions that let us check the type of a value:

  • is_boolean/1
  • is_atom/1
  • is_integer/1
  • is_float/1
  • is_number/1

Remember, Elixir functions are described by {name}/{arity}.

For example:

iex(14)> is_boolean(true)
true
iex(15)> is_boolean(1)
false
iex(16)> is_atom(false)
true

Okay, I lied. Elixir doesn’t actually have a separate Boolean type. true and false are just the atoms :true and :false:

iex(17)> is_boolean(:false)
true
iex(18)> is_atom(true)
true
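Since true and false are atoms, clause order matters when these checks are combined. As a hypothetical example, TypeOf.describe/1 below uses the is_* functions as guards and must test is_boolean/1 before is_atom/1, or true and false would be classified as plain atoms:

```elixir
defmodule TypeOf do
  # Hypothetical helper built from the is_* checks. is_boolean/1 comes first
  # because true and false would also satisfy is_atom/1.
  def describe(value) when is_boolean(value), do: :boolean
  def describe(value) when is_atom(value), do: :atom
  def describe(value) when is_integer(value), do: :integer
  def describe(value) when is_float(value), do: :float
  def describe(_value), do: :other
end

IO.inspect(Enum.map([true, :ok, 42, 3.14, "hi"], &TypeOf.describe/1))
# [:boolean, :atom, :integer, :float, :other]
```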

To fully enumerate our different comparison operators, we have the following:

  • ==
  • !=
  • === and !==
  • < and <=
  • > and >=

You are probably already familiar with most of these. The == operator tests for equality, != tests for inequality, < and <= test for less than and less than or equal, respectively, and > and >= test for greater than and greater than or equal, again respectively.

But what about ===? This operator tests for strict equality: the values must be equal in value and of the same type. Its negation is !==. This may be easier to see in an example:

iex(1)> 1 == 1.0
true
iex(2)> 1 === 1.0
false

We can see that 1 represents the same numeric value as 1.0. However, integers are not the same type as floating point numbers, so the two are not strictly equal.

Furthermore, as may be hinted by 1 == 1.0, the comparison operators can be used across types. We can do something like this, without issue:

iex(1)> 1 < :atom
true

Because any two values can be compared, even across types, we can do cool things such as sorting a collection without caring about the types of its elements; the </2 operator is sufficient for every case.

The overall ordering of types is number < atom < reference < function < port < pid < tuple < map < list < bitstring.

It’s not necessary to memorize this ordering, but you should know it exists.
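To see the ordering in action, here is a short sketch that sorts a mixed list with Enum.sort/1, which relies on this built-in term ordering, so no custom comparison function is needed:

```elixir
mixed = ["text", [1, 2], {:a, :b}, :atom, 3, 1.5]

# number < atom < tuple < list < bitstring, and numbers compare by value.
IO.inspect(Enum.sort(mixed))
# [1.5, 3, :atom, {:a, :b}, [1, 2], "text"]
```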

Strings

Strings in Elixir, as you would expect, are UTF-8 encoded text surrounded by double quotes. But they have a few extra qualities that most other languages either don’t have or treat differently. For example, strings may contain line breaks, either written as escape sequences or typed directly into the string.

We’ve seen this earlier:

iex(1)> "Hello, World!"
"Hello, World!"
iex(2)> "Hello,\nWorld!"
"Hello,\nWorld!"

But we can also do this:

iex(3)> "Hello,
...(3)> World!"
"Hello,\nWorld!"

Note that the ...(#)> prompt is used by iex to denote the continuation of input: it’s expecting you to finish your expression.

We can also use the IO.puts/1 function again to print the newline character:

iex(4)> IO.puts("Hello,\nWorld!")
Hello,
World!
:ok
iex(5)> IO.puts("Hello,
...(5)> World!")
Hello,
World!
:ok

Because strings are UTF-8 encoded, we can use any Unicode character in them. So, we can go ahead and write this into our interactive session as well:

iex(6)> "こんにちは、せかい!"
"こんにちは、せかい!"

On Windows, you may have to adjust your terminal’s output encoding or use a terminal that supports UTF-8. You can change the encoding for the current console session by running chcp 65001 before launching iex.

Elixir strings also have great support for string interpolation, using the #{} syntax. For example, we can try executing the following command:

iex(7)> "Hello, #{:world}!"
"Hello, world!"

Furthermore, the String module has plenty of functions that we can use to manipulate strings. For example, we can easily reverse strings, determine the length of a string, and pull a single character out of a string:

iex(8)> String.reverse("Hello, World!")
"!dlroW ,olleH"
iex(9)> String.length("Hello, World!")
13
iex(10)> String.at("Hello, World!", 6)
" "

This, of course, only scratches the surface of the String module and I strongly encourage you to read more about it (and other modules) in the documentation.
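One detail worth knowing early: String functions count graphemes (user-perceived characters), while the Kernel function byte_size/1 counts raw UTF-8 bytes. A quick sketch with a Japanese greeting, where each character takes 3 bytes:

```elixir
greeting = "こんにちは"

# String functions work on graphemes...
IO.inspect(String.length(greeting))   # 5
IO.inspect(String.at(greeting, 0))    # "こ"

# ...while byte_size/1 counts the underlying UTF-8 bytes.
IO.inspect(byte_size(greeting))       # 15
```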

(Linked) Lists

No language is complete without its own implementation of (linked) lists, and Elixir has its own primitive (linked) list type. Elixir lists are heterogeneous, meaning a single list can hold values of different types. For example, we can have a list of numbers with an atom somewhere in the middle:

iex(1)> [1, 2, 4, :ok, 6, true]
[1, 2, 4, :ok, 6, true]

We can concatenate two lists together using the ++/2 operator:

iex(2)> [1, 2, 3] ++ [4, 5, 6]
[1, 2, 3, 4, 5, 6]

Similarly, we can subtract one list from another using the --/2 operator:

iex(3)> [1, 2, true, false, true] -- [true, false]
[1, 2, true]

Note that it removes elements one for one: for each element in the right-hand list, -- removes only the first matching element from the left-hand list, so repeated elements are not all removed.
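The one-for-one behavior is easiest to see with repeated elements, as in this short sketch:

```elixir
# Each element on the right removes at most one matching element on the left.
IO.inspect([1, 2, 1, 3, 1] -- [1])      # [2, 1, 3, 1]
IO.inspect([1, 2, 1, 3, 1] -- [1, 1])   # [2, 3, 1]

# Elements that do not appear on the left are simply ignored.
IO.inspect([1, 2, 3] -- [4])            # [1, 2, 3]
```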

Many functional algorithms process a list by handling its head and then recursively processing its tail. So, it would be odd for a functional language not to provide functions for doing just this. To grab the head of the list, we use hd/1:

iex(4)> hd([1, 2, 3, 4, 5])
1

And to grab the tail of the list, we use tl/1:

iex(5)> tl([1, 2, 3, 4, 5])
[2, 3, 4, 5]

Wait! Why did it return [2, 3, 4, 5]? The terminology is overloaded here. In many other languages, when programmers refer to the tail of a list, they mean its last element. In functional languages, however, the tail of a list usually means the rest of the list. Thus, tl/1 returns every element after the head.

This is, in essence, the true concept of a linked list: a head element that points to the next element. Viewed recursively, a list is an element, the head, that points to another list, the tail.
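The head-and-tail idea maps directly onto pattern matching with [head | tail], which splits a non-empty list the same way hd/1 and tl/1 do. As a minimal sketch (MyList is a hypothetical module), here is a recursive sum, plus an accumulator version whose recursive call is a tail call:

```elixir
defmodule MyList do
  # Process the head, then recurse on the tail; the [] clause ends recursion.
  def sum([]), do: 0
  def sum([head | tail]), do: head + sum(tail)

  # Same shape with an accumulator, so the recursive call is the last step.
  def sum_acc(list), do: sum_acc(list, 0)
  defp sum_acc([], acc), do: acc
  defp sum_acc([head | tail], acc), do: sum_acc(tail, acc + head)
end

IO.inspect(MyList.sum([1, 2, 3, 4, 5]))       # 15
IO.inspect(MyList.sum_acc([1, 2, 3, 4, 5]))   # 15
```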

Furthermore, you may notice that calling hd/1 or tl/1 on an empty list is an error:

iex(6)> hd([])
** (ArgumentError) argument error
iex(7)> tl([])
** (ArgumentError) argument error

A little more about Strings

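In C, we know strings are really just character arrays, and char is just a small integer type. Well, it turns out that we haven’t quite escaped this, even in Elixir.

Single-quoted text in Elixir is a list of integers (character code points), and if a list contains only integers that correspond to printable ASCII characters, the interactive prompt displays it as characters. (Elixir 1.15 and later display such lists with the ~c sigil, for example ~c"hello", rather than in single quotes.)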

iex(8)> 'hełło'
[104, 101, 322, 322, 111]
iex(9)> is_list('hełło')
true
iex(10)> [104, 101, 108, 108, 111]
'hello'

Why have two different representations for strings? This is a historical holdover from the early Erlang days. Erlang was built for telephone switches, where bits and binaries mattered far more than string handling. So many older Erlang libraries handle strings in what was the natural way at the time: as lists of characters (or, really, numbers).

Because of this, you may find yourself using an older Erlang library that expects a list of characters while your data is an actual string, or vice versa. In that case, to_string/1 and to_charlist/1 help you convert back and forth (Elixir versions before 1.3 spell the latter to_char_list/1):

iex(11)> to_string('hallo')
"hallo"
iex(12)> to_charlist("hallo")
'hallo'

to_string/1 is more versatile than just converting character lists to strings: you can pass numbers and any other type that implements the String.Chars protocol and get a string representation back.
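A short sketch of the conversions in both directions, assuming Elixir 1.3 or later for to_charlist/1:

```elixir
# to_string/1 accepts anything that implements the String.Chars protocol.
IO.inspect(to_string(42))          # "42"
IO.inspect(to_string(:ok))         # "ok"
IO.inspect(to_string(~c"hallo"))   # "hallo"

# to_charlist/1 goes the other way, producing a list of code points.
IO.inspect(to_charlist("hallo") == [104, 97, 108, 108, 111])  # true
```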

Ranges

Similar to lists, we can create number ranges simply with the .. operator. For example, to represent the numbers from 1 to 100, we would simply write 1..100. However, typing this into iex is less than interesting:

iex(1)> 1..100
1..100

Actually, this is interesting. Why didn’t Elixir expand the range? Because ranges are lazy: a range only records its bounds, and its elements are not enumerated until they absolutely have to be. Lazy evaluation lets us work with potentially infinite data, as long as we only ever consume a finite part of it. That said, most things in Erlang, and thus Elixir, are eager.
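A range only stores its bounds, and Enum functions walk it when asked. For data that really is unbounded, the Stream module provides lazy enumeration. A short sketch:

```elixir
range = 1..100

# Enum functions enumerate the range only when called.
IO.inspect(Enum.sum(range))            # 5050
IO.inspect(Enum.take(range, 3))        # [1, 2, 3]
IO.inspect(Enum.member?(range, 42))    # true

# An infinite, lazy sequence of powers of two; nothing is computed
# until Enum.take/2 asks for the first five elements.
IO.inspect(Stream.iterate(1, &(&1 * 2)) |> Enum.take(5))  # [1, 2, 4, 8, 16]
```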

Tuples

Tuples, similar to lists, allow us to collect elements together into a single structure. Syntactically, tuples differ from lists in that they are surrounded by curly braces, { and }. Tuples can also hold values of any type:

iex(1)> tuple = {1, 2, :ok, "hello"}
{1, 2, :ok, "hello"}

The key difference between tuples and lists, however, is that tuples store their elements contiguously in memory. Lists are inherently linked, so accessing a list element by index is a slow, O(n) operation. Tuples, on the other hand, allow fast, constant-time element access.

Using the tuple_size/1 function, we can get the length or size of any given tuple:

iex(2)> tuple_size(tuple)
4

Using the elem/2 function, we can access any element of a tuple with an index (indexes start at 0):

iex(3)> elem(tuple, 1)
2

We can also use the put_elem/3 function to replace the element at a given index in a tuple:

iex(4)> put_elem(tuple, 3, "world")
{1, 2, :ok, "world"}
iex(5)> tuple
{1, 2, :ok, "hello"}

Remember, Elixir data is immutable. The put_elem/3 function does not mutate the existing tuple; it returns a new one.

Tuples or lists

Why have both tuples and lists? Why use one over the other? What’s the difference?

It depends. As mentioned earlier, tuples are stored in contiguous blocks of memory, whereas lists are stored as linked lists. The access characteristics of each follow from these two memory layouts. For tuples, accessing an arbitrary element is cheap, but growing a tuple or inserting elements is not, because the whole tuple has to be copied. For lists, accessing an arbitrary element requires traversing the list, but prepending an element is cheap (constant time). Appending, on the other hand, requires traversing the entire list.
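In practice, this is why lists are usually built by prepending and then reversing once, while tuples suit fixed-size data that is read by position. A minimal sketch (BuildDemo is a hypothetical module):

```elixir
defmodule BuildDemo do
  # Builds [1, 4, 9, ..., n * n]. Prepending with [item | acc] is constant
  # time, so we prepend in a loop and reverse once at the end.
  def squares(n) when is_integer(n) and n >= 1 do
    1..n
    |> Enum.reduce([], fn i, acc -> [i * i | acc] end)
    |> Enum.reverse()
  end
end

list = BuildDemo.squares(5)
IO.inspect(list)                 # [1, 4, 9, 16, 25]

# Index access walks the list, but is constant time on a tuple.
tuple = List.to_tuple(list)
IO.inspect(Enum.at(list, 4))     # 25
IO.inspect(elem(tuple, 4))       # 25
```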

Binaries

Elixir brings over the binary type from Erlang, and for good reason. The binary type gives us a lot of power over bits and bytes: it not only lets us use them effectively but also makes binary data munging about as painless as it can be.

Binaries (more generally, bitstrings) are enclosed between << and >> in Elixir and may look a little strange at first:

iex(1)> <<1, 2, 4>>
<<1, 2, 4>>
iex(2)> <<255, 255, 256>>
<<255, 255, 0>>

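What happened at command 2? It turns out that an unqualified binary element, one written without a size specifier, defaults to 8 bits, so its maximum value is 255; larger values are truncated to their lowest 8 bits, which is why 256 became 0. We must tell the runtime to use more bits to represent our data: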

iex(3)> <<256 :: size(16)>>
<<1, 0>>

Each element of a binary is displayed as a byte, and the maximum value a single unsigned byte can represent is 255. Because we told the runtime to use two bytes (16 bits) for 256, it stored the value across two bytes: 256 is 1 * 256 + 0, so the high byte is 1 and the low byte is 0. To store larger values, we must use more bits. With 16 bits, we can only represent values up to 65535 (unsigned). So, if we try the same trick with 65536, we should see 0 again:

iex(4)> <<65536 :: size(16)>>
<<0, 0>>

We see two elements because we are using two bytes, and when printing binaries, Elixir shows them one byte at a time. We see two zeros because 65536 needs 17 bits, so only its lowest 16 bits, all zeros, are kept. If we wanted this to fit, we could increase the size:

iex(5)> <<65536 :: size(17)>>
<<128, 0, 0::size(1)>>

Here, we see something a little different. By increasing the size by only one bit, we are now able to fit the value we want, but the representation of those bits is different: 65536 in 17 bits is a 1 followed by 16 zeros, which iex shows as the bytes 128 and 0 plus a final 1-bit segment, 0::size(1).

When a bitstring’s length is not a multiple of 8, it is not byte-aligned and therefore not a binary; iex prints as many whole bytes as it can and shows the leftover bits as a final segment with an explicit size.
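The same size specifiers also work in patterns, which is how binaries are usually taken apart. A short sketch, including a check that the 17-bit value above is a bitstring but not a binary:

```elixir
# Read a 16-bit unsigned integer back out of two bytes.
<<value::size(16)>> = <<1, 0>>
IO.inspect(value)              # 256

# Split a 3-byte binary into a 16-bit integer and the remaining byte.
<<header::size(16), rest::binary>> = <<255, 255, 7>>
IO.inspect({header, rest})     # {65535, <<7>>}

# 17 bits is not a multiple of 8, so this is a bitstring but not a binary.
bits = <<65536::size(17)>>
IO.inspect({bit_size(bits), is_bitstring(bits), is_binary(bits)})
# {17, true, false}
```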

Even more about Strings

Let’s take yet another moment to talk about Strings. This time, we will focus on double-quoted strings that are real strings.

Erlang, in its early days, didn’t need strings. Erlang code was hardly, if ever, used for user-facing code, so string handling, whether manipulation or otherwise, was regarded as secondary. Bit handling, on the other hand, was paramount: this is a language built to handle telephony data, which is all bits and bytes.

When it came to adding strings and string handling to Erlang, instead of introducing a change that would likely break a lot of systems, the designers decided to use the existing types. After all, a character is just one or a few bytes, depending on the encoding. That is, strings are either lists of numbers, as we mentioned before, or binaries:

iex(1)> is_binary("Hello")
true
iex(2)> is_binary(<<"Hello">>)
true
iex(3)> <<"Hello">> === "Hello"
true

Notice that these all say that strings are binaries and vice versa. In fact, <<"Hello">> and "Hello" are equivalent.

We can take this String concept further:

iex(4)> <<"Hello, せかい" :: utf8>>
"Hello, せかい"
iex(5)> <<"Hello, せかい">>
"Hello, せかい"

The :: utf8 type specifier isn’t necessary when constructing strings.

Strings in Elixir are, in fact, UTF-8 binaries, and the String module is designed to work with them. This marks a clear distinction between Elixir and Erlang. A string in Erlang traditionally refers to a list of characters, and Erlang’s classic string module was not UTF-8 aware, so it did not handle strings like the preceding ones correctly (OTP 20 reworked that module with Unicode support).
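Because strings are UTF-8 binaries, the binary syntax can decode them too: the ::utf8 type in a pattern reads one code point, whatever its byte length. A short sketch:

```elixir
text = "せかい"

# Decode the first code point (here 3 bytes) and keep the rest as a binary.
<<first::utf8, rest::binary>> = text
IO.inspect(first)                    # 12379, the code point U+305B for "せ"
IO.inspect(rest)                     # "かい"

# String functions are UTF-8 aware; byte_size/1 counts raw bytes.
IO.inspect(String.codepoints(text))  # ["せ", "か", "い"]
IO.inspect(byte_size(text))          # 9
```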

Some more built-in types

There are a few more types we should at least be aware of before continuing. We aren’t going to dive deep into them just yet, but we are going to introduce them. These are functions and process IDs (PIDs).

Functions

Functions, as mentioned previously, are first-class citizens of Elixir, Erlang, and pretty much any other functional language. This means that we can reference them and pass them around as if they were any other type. We can pass them as parameters to other functions, injecting some sort of functionality into that function, or we can use this function passing as another form of composing programs.

For a quick example, here is a function that doubles its input and returns the result:

iex(1)> double = fn x -> x * 2 end
#Function<6.90072148/1 in :erl_eval.expr/5>

The output tells us how the function was interpreted and stored in memory. Don’t mind the details for now; the takeaway is that we can now call it as a function. Or, we could pass it to another function to compute some result:

iex(2)> double.(2)
4
iex(3)> Enum.map(1..10, double)
[2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

At prompt 2, we called double directly and doubled the number 2. Note the dot in double.(2): it is required when calling an anonymous function bound to a variable.

At prompt 3, we passed our function, double, to Enum.map/2. The Enum.map/2 function takes a collection and a function and maps the function over the collection. That is, it passes each element of the collection to our function and returns a list of the results.
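Here is a slightly larger sketch of passing functions around. apply_twice is a hypothetical helper; &abs/1 uses the capture operator to pass a named Kernel function, and &(&1 * 2) is shorthand for fn x -> x * 2 end:

```elixir
double = fn x -> x * 2 end

# A higher-order function: it receives a function and calls it twice.
apply_twice = fn f, x -> f.(f.(x)) end
IO.inspect(apply_twice.(double, 3))        # 12

# Named functions can be passed with the capture operator &name/arity.
IO.inspect(Enum.map([1, -2, 3], &abs/1))   # [1, 2, 3]

# The capture shorthand builds an anonymous function inline.
IO.inspect(Enum.map(1..3, &(&1 * 2)))      # [2, 4, 6]
```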

Process IDs

Another built-in type that comes from Erlang is the process ID (PID). These are not to be confused with operating system PIDs: they identify Erlang processes, which are lightweight processes managed by the Erlang VM. In the context of Erlang, "process" means an Erlang process unless otherwise stated.

PIDs are used to reference, signal, and pass messages between Erlang processes. A PID is the address of a process and its mailbox; without it, we have no way to send that process a message.

We won’t create any processes yet, but we can see all the processes running in the current Erlang VM with Process.list/0:

iex(1)> Process.list()
[#PID<0.0.0>, #PID<0.3.0>, #PID<0.6.0>, #PID<0.7.0>, #PID<0.9.0>, #PID<0.10.0>,
 #PID<0.11.0>, #PID<0.12.0>, #PID<0.13.0>, #PID<0.14.0>, #PID<0.15.0>,
 #PID<0.16.0>, #PID<0.18.0>, #PID<0.19.0>, #PID<0.20.0>, #PID<0.21.0>,
 #PID<0.22.0>, #PID<0.23.0>, #PID<0.24.0>, #PID<0.25.0>, #PID<0.26.0>,
 #PID<0.27.0>, #PID<0.28.0>, #PID<0.29.0>, #PID<0.37.0>, #PID<0.38.0>,
 #PID<0.39.0>, #PID<0.40.0>, #PID<0.41.0>, #PID<0.42.0>, #PID<0.44.0>,
 #PID<0.45.0>, #PID<0.46.0>, #PID<0.47.0>, #PID<0.50.0>, #PID<0.51.0>,
 #PID<0.52.0>, #PID<0.53.0>, #PID<0.54.0>, #PID<0.55.0>, #PID<0.56.0>,
 #PID<0.57.0>, #PID<0.59.0>]

Note that your output may differ.

This lists all the processes currently running in the Erlang VM that hosts the iex session.
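To see PIDs as mailbox addresses, the following sketch (PingPong is a hypothetical module) spawns a process with spawn/1, sends it the caller’s PID from self/0, and waits for a reply using receive with a timeout:

```elixir
defmodule PingPong do
  # Spawns a process, sends it our PID, and waits for a reply addressed to us.
  def round_trip do
    parent = self()

    child =
      spawn(fn ->
        receive do
          # Inside the spawned function, self() is the child's own PID.
          {:ping, from} -> send(from, {:pong, self()})
        end
      end)

    send(child, {:ping, parent})

    receive do
      {:pong, ^child} -> :ok
    after
      1_000 -> :timeout
    end
  end
end

IO.inspect(is_pid(self()))          # true
IO.inspect(PingPong.round_trip())   # :ok
```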

Summary

We covered a lot in this article, so let’s take a quick moment to recap what we went over.

Everything in Elixir is an expression, and this has a useful consequence for composability in the functional world: expressions are easier to compose than statements.

We discussed the basic types of Elixir: numbers (integers, floats), atoms, Booleans, Strings, lists, tuples, binaries, functions, and PIDs.
