
This article by William Smith, author of the book Everyday Data Structures, reviews the most common and most important fundamental data types from the 10,000-foot view. Calling data types foundational structures may seem like a bit of a misnomer, but not when you consider that developers use data types to build their classes and collections. So, before we dive into examining proper data structures, it’s a good idea to quickly review data types, as these are the foundation of what comes next.

In this article, we will briefly explain the following topics:

  • Numeric data types
  • Casting, narrowing, and widening
  • 32-bit and 64-bit architecture concerns
  • Boolean data types
  • Logic operations
  • Order of operations
  • Nesting operations
  • Short-circuiting
  • String data types
  • Mutability of strings


Numeric data types

A detailed description of all the numeric data types in each of these four languages, namely C#, Java, Objective-C, and Swift, could easily fill a book of its own. The simplest way to evaluate these types is by the underlying size of the data, using examples from each language as a framework for the discussion.

When you are developing applications for multiple mobile platforms, you should be aware that the languages you use could share a data type identifier or keyword, yet under the hood those identifiers may not represent equivalent types. Likewise, the same data type in one language may have a different identifier in another.

For example, examine the case of the 16-bit unsigned integer, sometimes referred to as an unsigned short. Well, it’s called an unsigned short in Objective-C. In C#, we are talking about a ushort, while Swift calls it a UInt16. Java, on the other hand, uses a char for this data type.

Each of these data types represents a 16-bit unsigned integer; they just use different names. This may seem like a small point, but if you are developing apps for multiple devices using each platform’s native language, you will need to be aware of these differences for the sake of consistency. Otherwise, you risk introducing platform-specific bugs that are extremely difficult to detect and diagnose.
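
For illustration, here is a minimal C# sketch of this type (the variable name is hypothetical); the comments note the equivalent identifiers in the other three languages:

//C#
// A 16-bit unsigned integer: ushort in C#, unsigned short in Objective-C,
// UInt16 in Swift, and char in Java.
ushort samplePort = 8080;
Console.WriteLine(samplePort);        // 8080
Console.WriteLine(ushort.MaxValue);   // 65535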

Integer types

The integer data types are defined as representing whole numbers and can be either signed (negative, zero, or positive values) or unsigned (zero or positive values). Each language uses its own identifiers and keywords for the integer types, so it is easiest to think in terms of memory length. For our purposes, we will only discuss the integer types representing 8-, 16-, 32-, and 64-bit memory objects.

8-bit data types, or bytes as they are more commonly referred to, are the smallest data types that we will examine. If you have brushed up on your binary math, you will know that an 8-bit memory block can represent 2⁸, or 256, values. Signed bytes can range in values from -128 to 127, or -(2⁷) to (2⁷) - 1. Unsigned bytes can range in values from 0 to 255, or 0 to (2⁸) - 1.

A 16-bit data type is often referred to as a short, although that is not always the case. These types can represent 2¹⁶, or 65,536, values. Signed shorts can range in values from -32,768 to 32,767, or -(2¹⁵) to (2¹⁵) - 1. Unsigned shorts can range in values from 0 to 65,535, or 0 to (2¹⁶) - 1.

A 32-bit data type is most commonly identified as an int, although it is sometimes identified as a long. Integer types can represent 2³², or 4,294,967,296, values. Signed ints can range in values from -2,147,483,648 to 2,147,483,647, or -(2³¹) to (2³¹) - 1. Unsigned ints can range in values from 0 to 4,294,967,295, or 0 to (2³²) - 1.

Finally, a 64-bit data type is most commonly identified as a long, although Objective-C identifies it as a long long. Long types can represent 2⁶⁴, or 18,446,744,073,709,551,616, values. Signed longs can range in values from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807, or -(2⁶³) to (2⁶³) - 1. Unsigned longs can range in values from 0 to 18,446,744,073,709,551,615, or 0 to (2⁶⁴) - 1.

Note that these values happen to be consistent across the four languages we will work with, but some languages will introduce slight variations. It is always a good idea to become familiar with the details of a language’s numeric identifiers. This is especially true if you expect to be working with cases that involve the identifier’s extreme values.
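
In C#, for example, every integer type exposes MinValue and MaxValue constants, so you can verify these extremes directly rather than memorizing them:

//C#
Console.WriteLine("sbyte: {0} to {1}", sbyte.MinValue, sbyte.MaxValue);   // -128 to 127
Console.WriteLine("short: {0} to {1}", short.MinValue, short.MaxValue);   // -32768 to 32767
Console.WriteLine("int:   {0} to {1}", int.MinValue, int.MaxValue);       // -2147483648 to 2147483647
Console.WriteLine("long:  {0} to {1}", long.MinValue, long.MaxValue);     // -9223372036854775808 to 9223372036854775807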

Single precision float

Single precision floating point numbers, or floats as they are more commonly referred to, are 32-bit floating point containers that can store fractional values, typically with 6 or 7 significant decimal digits of precision. Many languages use the float keyword or identifier for single precision float values, and that is the case for each of the four languages we are discussing.

You should be aware that floating point values are subject to rounding errors because they cannot represent most base-10 fractions exactly. Floating point arithmetic is a fairly complex topic, the details of which will not be pertinent to the majority of developers on any given day. However, it is still good practice to familiarize yourself with the particulars of the underlying science, as well as the implementation in each language.
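
As a small illustration of such rounding errors in C#: the value 0.1 has no exact binary representation, so repeatedly adding it drifts away from the exact base-10 result.

//C#
float tenth = 0.1f;   // actually the nearest float to 0.1, not 0.1 exactly
float sum = 0f;
for (int i = 0; i < 10; i++)
{
    sum += tenth;     // each addition compounds the representation error
}
Console.WriteLine(sum == 1.0f);         // likely False
Console.WriteLine(sum.ToString("R"));   // shows the drift, for example 1.0000001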

Double precision float

Double precision floating point numbers, or doubles as they are more commonly referred to, are 64-bit floating point containers that can store values with much greater precision than floats, typically 15 or 16 significant decimal digits. Many languages use the double identifier for double precision float values, and that is also the case for each of the four languages: C#, Objective-C, Java, and Swift.

In most circumstances, it will not matter whether you choose float or double, unless memory is a concern, in which case you should choose float whenever possible. Many argue that float is more performant than double, and generally speaking this is the case. However, there are also conditions under which double will outperform float. In reality, the efficiency of each type varies from case to case, based on criteria too numerous to detail here. Therefore, if your application truly requires peak efficiency, you should research the requirements and environmental factors carefully and decide what is best for your situation. Otherwise, just use whichever container will get the job done and move on.

Currency

Due to the inherent inaccuracy of floating point arithmetic, rooted in the fact that it is binary arithmetic, floats and doubles cannot accurately represent the base-10 multiples we use for currency. Representing currency as a float or double may seem like a good idea at first, as the software will round off the tiny errors in your arithmetic. However, as you perform more and more complex arithmetic operations on these inexact results, the precision errors add up, producing serious inaccuracies and bugs that can be very difficult to track down. This makes the float and double data types unsuitable for working with currency, where perfect accuracy in multiples of 10 is essential.
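
Most of these platforms provide a base-10 numeric type for exactly this reason: decimal in C#, BigDecimal in Java, and NSDecimalNumber in Objective-C. Here is a minimal C# sketch contrasting the two approaches:

//C#
double doubleTotal = 0.0;
decimal decimalTotal = 0.0m;
for (int i = 0; i < 1000; i++)
{
    doubleTotal += 0.10;     // ten cents in binary floating point
    decimalTotal += 0.10m;   // ten cents in base-10 decimal
}
Console.WriteLine(doubleTotal);    // slightly off, for example 99.9999999999986
Console.WriteLine(decimalTotal);   // exactly 100.00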

Typecasting

In the realm of computer science, type conversion or typecasting means to convert an instance of one object or data type into another. This can be done through either implicit conversion, sometimes called coercion, or explicit conversion, otherwise known as casting. To fully appreciate casting, we also need to understand the difference between static and dynamic languages.

Statically versus dynamically typed languages

A statically typed language performs its type checking at compile time. This means that, when you try to build your solution, the compiler will verify and enforce each of the constraints that apply to the types in your application. If any of them are violated, you will receive an error and the application will not build. C#, Java, and Swift are all statically typed languages.

Dynamically typed languages, on the other hand, perform most or all of their type checking at run time. This means that the application could build just fine, but experience a problem while it is actually running if the developer wasn’t careful in how the code was written. Objective-C is a hybrid in this regard, as it uses a mixture of statically typed objects and dynamically typed objects. For example, the Objective-C classes NSNumber and NSDecimalNumber are both dynamically typed objects. Consider the following code example in Objective-C:

double myDouble = @"chicken";
NSNumber *myNumber = @"salad";

The compiler will throw an error on the first line, stating Initializing ‘double’ with an expression of incompatible type ‘NSString *’. That’s because double is a plain C type, and it is statically typed. The compiler checks this statically typed object at compile time, so your build will fail.

However, the compiler will only throw a warning on the second line, stating Incompatible pointer types initializing ‘NSNumber *’ with an expression of type ‘NSString *’. That’s because NSNumber is an Objective-C class, and it is dynamically typed. The compiler is smart enough to catch your mistake, but it will allow the build to succeed (unless you have instructed the compiler to treat warnings as errors in your build settings).

Although the forthcoming crash at runtime is obvious in the previous example, there are cases where your app will function perfectly fine despite the warnings. However, no matter what type of language you are working with, it is always a good idea to consistently clean up your code warnings before moving on to new code. This helps keep your code clean and avoids any bugs that can be difficult to diagnose.

On those rare occasions where it is not prudent to address a warning immediately, you should clearly document your code and explain the source of the warning so that other developers will understand your reasoning. As a last resort, you can take advantage of macros or pre-processor (pre-compiler) directives that suppress warnings on a line-by-line basis.

Implicit and explicit casting

Implicit casting does not require any special syntax in your source code, which makes it somewhat convenient. However, because an implicit cast does not name the target type explicitly, the compiler cannot always determine which constraints apply to the conversion, and therefore cannot check those constraints until runtime. This also makes implicit casting somewhat dangerous.

Consider the following code example in C#:

double x = "54";

This is an attempted implicit conversion: you have not told the compiler how to treat the string value. In this case, the conversion will fail when you try to build the application, and the compiler will throw an error for this line, stating Cannot implicitly convert type ‘string’ to ‘double’. Now, consider the explicitly cast version of this example:

double x = double.Parse("42");
Console.WriteLine("40 + 2 = {0}", x);

/*
  Output
  40 + 2 = 42
*/

This conversion is explicit and therefore type safe, assuming that the string value is parsable.
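
When you cannot guarantee that the input is parsable, C# also provides double.TryParse, which reports failure through its return value rather than throwing an exception (the input string here is just a placeholder):

//C#
string input = "not a number";
double value;
if (double.TryParse(input, out value))
{
    Console.WriteLine("Parsed: {0}", value);
}
else
{
    Console.WriteLine("'{0}' could not be parsed as a double", input);
}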

Widening and narrowing

When casting between two types, an important consideration is whether the result of the change is within the range of the target data type. If your source data type occupies more bytes than your target data type, the cast is considered to be a narrowing conversion.

Narrowing conversions are either casts that cannot be proven to always succeed or casts that are known to possibly lose information. For example, casting from a float to an integer will result in loss of information (precision in this case), as the fractional portion of the value is simply discarded; the cast truncates toward zero, so 3.9 becomes 3. In most statically typed languages, narrowing casts cannot be performed implicitly. Here is an example in C# (assume piDouble is an existing double and piFloat an existing float):

//C#
piFloat = piDouble;

In this example, the compiler will throw an error, stating Cannot implicitly convert type ‘double’ to ‘float’. An explicit conversion exists (are you missing a cast?). The compiler sees this as a narrowing conversion and treats the loss of precision as an error. The error message itself is helpful, suggesting an explicit cast as a potential solution to our problem:

//C#
piFloat = (float)piDouble;

We have now explicitly cast the double value piDouble to a float, and the compiler no longer concerns itself with loss of precision.
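
To see what that loss of precision looks like, here is a small self-contained sketch, assuming both variables are meant to hold the value of pi:

//C#
double piDouble = 3.14159265358979;
float piFloat = (float)piDouble;   // explicit narrowing cast
Console.WriteLine(piDouble);       // 3.14159265358979
Console.WriteLine(piFloat);        // approximately 3.1415927; the extra digits are lost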

If your source data type occupies fewer bytes than your target data type, the cast is considered to be a widening conversion. Widening conversions preserve the source object’s value, but may change its representation in some way. Most statically typed languages will permit implicit widening casts. Let’s borrow again from our previous C# example:

//C#
piDouble = piFloat;

In this example, the compiler is completely satisfied with the implicit conversion and the app will build. Let’s expand the example further:

//C#
piDouble = (double)piFloat;

This explicit cast improves readability but does not change the nature of the statement in any way. The compiler also finds this format completely acceptable, even if it is somewhat more verbose. Beyond improved readability, explicit casting when widening adds nothing to your application; therefore, whether you use it is a matter of personal preference.

Boolean data type

Boolean data types are intended to symbolize binary values, usually denoted by 1 and 0, true and false, or even YES and NO. Boolean types are used to represent truth logic, which is based on Boolean algebra. This is just a way of saying that Boolean values are used in conditional statements, such as if or while, to evaluate logic or repeat an execution conditionally.

Equality operations include any operations that compare the value of any two entities. The equality operators are:

  • == implies equal to
  • != implies not equal to

Relational operations include any operations that test a relation between two entities. The relational operators are:

  • > implies greater than
  • >= implies greater than or equal to
  • < implies less than
  • <= implies less than or equal to

Logic operations include any operations in your program that evaluate and manipulate Boolean values. There are three primary logic operators, namely AND, OR, and NOT. Another, slightly less commonly used operator is the exclusive or, or XOR, operator. All Boolean functions and statements can be built with these four basic operators.

The AND operator is the most exclusive comparator. Given two Boolean variables A and B, AND will return true if and only if both A and B are true. Boolean variables are often visualized using tools called truth tables. Consider the following truth table for the AND operator:

A | B | A ^ B
0 | 0 |   0
0 | 1 |   0
1 | 0 |   0
1 | 1 |   1

This table demonstrates the AND operator. When evaluating a conditional statement, 0 is considered false, while any other value is considered true. Only when both A and B are true is the result of A ^ B also true.

The OR operator is the inclusive operator. Given two Boolean variables A and B, OR will return true if either A or B is true, including the case when both A and B are true. Consider the following truth table for the OR operator:

A | B | A v B
0 | 0 |   0
0 | 1 |   1
1 | 0 |   1
1 | 1 |   1

Next, the NOT operator: !A is true when A is false, and false when A is true. Consider the following truth table for the NOT operator:

A | !A
0 |  1
1 |  0

Finally, the XOR operator is true when either A or B is true, but not both. Another way to say this is that XOR is true when A and B differ. There are many occasions where it is useful to evaluate an expression in this manner, so most computer architectures include it. Consider the following truth table for XOR:

A | B | A xor B
0 | 0 |    0
0 | 1 |    1
1 | 0 |    1
1 | 1 |    0
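
In the four languages we are discussing, these operations map onto familiar operators. In C#, for instance, AND, OR, NOT, and XOR on bool operands are written as &&, ||, !, and ^, respectively. Here is a quick sketch that mirrors the truth tables above:

//C#
bool a = true;
bool b = false;
Console.WriteLine(a && b);   // False (AND)
Console.WriteLine(a || b);   // True  (OR)
Console.WriteLine(!a);       // False (NOT)
Console.WriteLine(a ^ b);    // True  (XOR)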

Operator precedence

Just as with arithmetic, comparison and Boolean operations are subject to operator precedence, meaning the language gives some operators a higher precedence than others. In the C-family languages we are discussing, the order of operations, from highest to lowest precedence, is roughly as follows:

  • Parentheses
  • NOT (a unary operator)
  • Relational operators
  • Equality operators
  • Bitwise operators (not discussed), including bitwise AND, XOR, and OR
  • AND
  • OR
  • Ternary operator
  • Assignment operators

It is extremely important to understand operator precedence when working with Boolean values, because misjudging how a complex logical expression will be evaluated can introduce bugs that are very hard to sort out. When in doubt, remember that, as in arithmetic, parentheses take the highest precedence, and anything defined within them will be evaluated first.
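
For example, in C#, && binds more tightly than ||, so parentheses can change the result of an otherwise identical expression:

//C#
bool a = false, b = true, c = true;
Console.WriteLine(a && b || c);     // True: evaluated as (a && b) || c
Console.WriteLine(a && (b || c));   // False: the parenthesized expression is evaluated first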

Short-circuiting

As you will recall, AND only returns true when both operands are true, and OR returns true as soon as one operand is true. These characteristics sometimes make it possible to determine the outcome of an expression by evaluating only one of the operands. When your application stops evaluating an expression as soon as its overall outcome is determined, this is called short-circuiting. There are three main reasons why you would want to use short-circuiting in your code.

First, short-circuiting can improve your application’s performance by limiting the number of operations your code must perform. Second, when later operands could potentially generate errors based on the value of a previous operand, short-circuiting can halt execution before the higher-risk operand is reached. Finally, short-circuiting can improve the readability of your code and reduce its complexity by eliminating the need for nested logical statements.
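
The null-check guard is probably the most common example of the second case. In C#, for instance, && evaluates its right operand only when its left operand is true, so the risky member access below is never reached when the reference is null:

//C#
string name = null;
// Without short-circuiting, name.Length would throw a NullReferenceException.
// Because name != null evaluates to false, && never evaluates the second operand.
if (name != null && name.Length > 0)
{
    Console.WriteLine(name);
}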

Strings

String data types are simply objects whose value is text. Under the hood, strings contain a sequential collection of read-only char objects. This read-only nature makes strings immutable, which means the objects cannot be changed once they have been created in memory.

It is important to understand that changing any immutable object, not just a string, means your program is actually creating a new object in memory and discarding the old one, which requires more processing than simply changing the value at an existing memory address. Merging two strings together is called concatenation, and this is an even more costly procedure, as you are disposing of two objects before creating a new one. If you find that you are editing or concatenating your strings frequently, be aware that your program is not as efficient as it could be.
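
For this reason, most of these platforms provide a mutable companion type, such as StringBuilder in C# and Java or NSMutableString in Objective-C, that edits an internal buffer rather than allocating a new string on every change. Here is a minimal C# sketch of the difference:

//C#
using System.Text;

// Each += discards the old string and allocates a new one, once per pass.
string slow = "";
for (int i = 0; i < 1000; i++)
{
    slow += i;
}

// StringBuilder appends into a single internal buffer instead.
var builder = new StringBuilder();
for (int i = 0; i < 1000; i++)
{
    builder.Append(i);
}
string fast = builder.ToString();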

Strings are strictly immutable in C#, Java, and Objective-C. It is interesting to note that the Swift documentation refers to strings as mutable. However, the behavior is similar to Java’s: when a string is modified, it is copied on assignment to another object. Therefore, although the documentation says otherwise, strings are effectively immutable in Swift as well.

Summary

In this article, you learned about the basic data types available to a programmer in each of the four most common mobile development languages. Numeric and floating point data type characteristics and operations are as much dependent on the underlying architecture as on the specifications of the language. You also learned about casting objects from one type to another, and how a cast is defined as either widening or narrowing depending on the sizes of the source and target data types in the conversion. Next, we discussed Boolean types and how they are used in comparisons to affect program flow and execution, including operator precedence and nested operations. You also learned how to use short-circuiting to improve your code’s performance. Finally, we examined the String data type and what it means to work with immutable objects.
