(For more resources on Oracle, see here.)
In order to join two tables, we will utilize the basic structure of a SELECT statement; but, we must add a few qualifiers.
Understanding the structure and syntax of ANSI join statements
When we join two tables, we add a WHERE clause that qualifies that the common columns between the tables are equivalent. The following is the syntax tree for an ANSI-compliant join:
SELECT column1, column2, ...
FROM table1, table2
WHERE table1.common_column = table2.common_column;
While the join syntax is similar in many ways to a typical select statement, we notice some differences.
- As you might expect, since we are selecting data from two tables, both table names are specified in the FROM clause, separated by commas
- The syntax of the WHERE clause is different from what we’ve previously seen
- In our WHERE clause, we specify the condition that the common column from the first table must be equal to the common column in the second
In constructing this part of the statement, it is important to have first identified the common column between the tables that forms the inter-table relationship. This equivalence forms the bond between the two tables.
Examining ambiguous Cartesian joins
Before we look at some examples of typical join statements, it is important to discuss one type of join that is considered undesirable in most circumstances. A Cartesian join is a join between two tables that omits the WHERE clause. The result is known as a Cartesian product. A Cartesian product is formed when every row of one table is joined to every row of another table. An example of this is shown as follows:
The resulting number of rows shown has been truncated for the sake of brevity. It is fairly easy to see what is happening without displaying all 40 rows that are returned. In a Cartesian join, the query returns the first row selected from the project table, in this case Desktop Rollout, and joins it to every row selected from the award table; first, Salesperson of the year, then Technological paper winner, and so on. It then moves to the second row returned from the project table, Security awareness training, and again joins each row from the award table. It continues in this manner until every row in the project table has been joined to every row in the award table.
The Cartesian product returned from a Cartesian join is generally considered undesirable because it has little meaning. Since no relationship has been established between the two tables based on a common column, the data from the project table does not relate in any logical way to the award table. Such joins are said to be ambiguous, since no row has any particular relationship to any other row.
It is important to remember that even if the two tables share a common column, that relationship must be specified in the WHERE clause. Failure to do so will result in a Cartesian product.
A Cartesian join is generally said to produce an a times b product, where a and b are the number of rows in the two tables. In the next screenshot, the project table has five rows, while the award table has eight rows. We can therefore say that the number of rows returned by the Cartesian product of the two tables will be 5 x 8, or 40, rows. In this way, we can predict the number of rows returned by any Cartesian join as being the number of rows in the first table times the number of rows in the second.
SQL in the real world
Another practical reason that Cartesian joins are considered undesirable is the immense strain they can put on a system from a performance perspective. While the 40-row Cartesian product from our example may not seem significant, consider two tables with one million rows each. The resulting number of rows from such a Cartesian product would be 1,000,000 x 1,000,000, or 1×1012, rows—one trillion rows. Such a mistake can cause excessive resource usage on your database system to the point of affecting other users. In fact, when tuning SQL statements, one of the most common examples of improper code to watch out for is Cartesian joins.
Using equi joins—joins based on equivalence
The core of the RDBMS is the relationships that are formed between tables. The most common relationships are based on equivalence. In this section, we examine the concept of an equi join.
Implementing two table joins with a table-dot notation
To see an example of the kind of join that would be advantageous to an SQL programmer, let’s return to the earlier request to display name, date of birth, and address information for all of the Companylink employees. We use two different queries to find the required information and note the inefficiency of the process. To get this information using a join, we issue the query shown in the following screenshot:
Let’s deconstruct the statement one line at a time. The first line is simply a list of all the columns that we want to display. The first four are columns from the employee table and the last four come from the address table. The second line specifies the tables that are involved in the query. We are requesting columns from the employee and address tables, so those tables are listed. The third line contains the clause that actually performs the join. We have stated that a join requires the linking of a common column between the tables. The only column that is common between the employee and address tables is the employee_id column; this column forms the relationship between the two tables. It is this column that we use to execute the join. The join clause performs this joining by creating a condition that sets the values for the employee_id column in the employee table equal to the employee_id column in the address table. Thus, each row in the address table is joined to each row in the employee table, but only where the values in the common columns are equivalent. The clause, however, requires that we specify which columns are being referred to, since they have the same name. To clearly delineate them, we prefix each of the columns in the WHERE clause with the name of the originating table, followed by a dot (.). We refer to this as the table-dot notation. As a result, the previous statement could be read as: Display name and address information from the employee and address tables, where the employee_id column in the employee table is equivalent to the employee_id column in the address table.
To see the relationship more clearly, we could rewrite the preceding statement to include the columns that form the relationship. In the following example, we have reduced the number of columns returned (for clarity) and included both columns that form the relationship. The result shows how the employee_id values for each table match.
As you can see, the rows displayed essentially show two tables joined together. The first_name, last_name, and employee.employee_id columns all belong to the employee table. The address.employee_id, street_address, and city columns belong to the address table. Yet information from both tables can be displayed together, provided that we join them with a common column. These joins are categorized as equi joins; joins based on the equivalence of values between common columns.
It is crucial that we explicitly define the two columns that form the join in the WHERE clause. A failure to do so will generate an ORA-00918 error or a column ambiguously defined error.
The following example shows the previous statements rewritten to exclude the tabledot notation with the originating tables, and the resulting error. Again, this results because the columns have the same name in each table, yet we have not defined the originating tables.
Using two table joins with alias notation
Thus far, our join examples have used two tables. However, a join can be done with any number of tables, provided they have common columns between them. We will discuss multi-table joins later in the article, but using the table-dot notation with multi-table joins is considered by some to be cumbersome, since each table and several columns must be prefixed with the associated table name.
We use double quotation marks to present column headings that are different than the actual column name. It allows us not only to display a different column name, but also to utilize case sensitivity and whitespace. This type of alias is called a column alias. We now look at a table alias, a type of alias that allows us to reference a table using a different name. In the following screenshot, we have rewritten the join shown previously, this time using table aliases, or alias notation:
This example is very similar to the original statement, but there is one significant difference. In the FROM clause, we have added aliases for our two tables. These aliases are designated by the letters that follow each of the table names. For the employee table, the alias is e, and for the address table, it is a. Thus, in the WHERE clause, instead of prefixing the table names employee and address to our columns, we simply use e and a in place of them. These same aliases are also used in the SELECT clause, where employee.employee_id and address.employee_id are simply written as e.employee_id and a.employee_id, respectively. Note that there is nothing particularly significant about using the letters a and e as aliases. We could just as easily have used emp and addr as our aliases. The purpose is simply to reduce the amount of coding that has to be written. Many SQL developers feel that using aliases is an efficient and readable way to write joins, especially those that involve numerous tables.
SQL in the real world
Although both table-dot notation and alias notation are supported in ANSI-compliant joins, the organization you work for may decide that one is preferable to the other. As with many of the choices offered to a SQL programmer, an organization’s coding standards may determine how code will be written. This isn’t to say that one way is necessarily better than the other, but rather to support the idea of having standards for the code written in an organization. Code standards provide rules for writing and reading code that generally lead to better interoperability between programmers.