Tcl: Handling Email

July 12, 2010 - 12:00 am

Before working with e-mail, we need to understand a bit about how e-mail works, as well as what it provides and what our application needs to perform on its own. In general, e-mails are easy to understand—someone sends a message, the e-mailing system takes care of delivering it to the correct target machine(s), and the recipients are then able to retrieve that message. From the e-mail system’s perspective, it does not care about what the e-mail contains, as long as it knows who it is from and who it should be delivered to.

From the user’s perspective, he/she does not need to know how it is delivered—their mail application delivers the message to the server handling their e-mail, and all messages can be retrieved from that same server. When we interact with e-mails, it works the same way for us. In the majority of cases, our application only needs to interact with our e-mail server.

All e-mail messages are built using a common structure: each message consists of headers that describe the message, followed by the body. Headers describe who the message is from, its recipients, and the subject of the message. They also provide the content type, which tells e-mail applications what type of data the message contains. Message headers can also contain a history of the servers the message passed through, additional information such as the e-mail application used to generate the message, and any other information that the e-mail application has added. The message body is the actual text and/or data that was sent. What the message body contains is described in the headers; for example, we can send plain text, an HTML message, or simply an image.
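The header/body split can be sketched in a few lines of plain Tcl. This is only an illustration under simplifying assumptions: real messages use CRLF line endings and may fold long headers across lines, both of which are ignored here, and the addresses and variable names are made up for the example.

```tcl
# A simplified raw message: header lines, one blank line, then the body.
# (Placeholder addresses; real messages use CRLF and may fold headers.)
set raw [join {
    "From: alice@example.com"
    "To: bob@example.com"
    "Subject: Greetings"
    "Content-Type: text/plain"
    ""
    "Hello Bob!"
} "\n"]

# The first blank line separates the headers from the body
set boundary [string first "\n\n" $raw]
set headerBlock [string range $raw 0 [expr {$boundary - 1}]]
set body [string range $raw [expr {$boundary + 2}] end]

# Collect "Name: value" lines into a dictionary
set headers [dict create]
foreach line [split $headerBlock "\n"] {
    if {[regexp {^([^:]+):\s*(.*)$} $line -> name value]} {
        dict set headers $name $value
    }
}

puts "Subject: [dict get $headers Subject]"
puts "Type:    [dict get $headers Content-Type]"
puts "Body:    $body"
```

In practice the mime package described below does this parsing for us, including the cases this sketch skips.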

Learning MIME

Multipurpose Internet Mail Extensions (MIME) is a standard that extends the e-mail format. It defines how messages can include character sets other than 7-bit ASCII in the message headers and body, and introduces the concept of multiple parts of an e-mail along with attachments. Over time, MIME became such an integral part of e-mail handling that all e-mails are now sent in accordance with MIME standards.

Content type

MIME introduced the concept of content type, which was originally meant for defining types of files in an e-mail. This was introduced so that e-mail applications could present the content of a message differently, depending on the actual file type. The concept spread to other protocols and is now referred to as the Internet media type standard. A content type consists of two parts, a type and a subtype, separated by a slash. The type describes the general kind of media, for example, image. The subtype defines the file format, for example, jpeg. In this example, the full content type is image/jpeg.

A full list of standardized format types can be found on the following page: http://www.iana.org/assignments/media-types/

Whenever an application needs to use its own content type, it is recommended that the subtype starts with an x- prefix; for example, application/x-tcl-dict could be used to transmit a dictionary’s contents.
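As a small illustration of the type/subtype convention, a content type can be taken apart in plain Tcl. The variable names are chosen for this sketch only, and it assumes the value carries no parameters such as charset:

```tcl
set contentType "application/x-tcl-dict"

# Split "type/subtype" at the slash
lassign [split $contentType "/"] type subtype

# Subtypes starting with "x-" mark application-specific formats
set isPrivate [string match "x-*" $subtype]

puts "type=$type subtype=$subtype private=$isPrivate"
```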

The MIME standard defines several possibilities for embedding data that is outside a 7-bit ASCII character set, that is, data such as binary files, messages using different character sets, and so on. The Base64 standard is commonly used for encoding binary files within an e-mail—this standard uses 64 characters only, and requires 4 bytes to encode 3 bytes of actual data. This means that a 1M file will use up over 1.3M when sent via e-mail. Base64 is described in more detail at: http://en.wikipedia.org/wiki/Base64
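The 4-for-3 overhead is easy to check from Tcl. The sketch below assumes Tcl 8.6 or newer, where the binary encode base64 command is built in; the 3000-byte payload is arbitrary.

```tcl
# 3000 bytes of sample data
set data [string repeat "x" 3000]

# Every 3 input bytes become 4 output characters
set encoded [binary encode base64 $data]

puts "raw bytes:     [string length $data]"
puts "encoded chars: [string length $encoded]"
```

This prints 3000 and 4000. Messages sent by e-mail also wrap the encoded text into short lines, which adds a little more on top of that.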

The standard also defines the quoted-printable standard that is used for sending 8-bit data. Characters outside of 7-bit character set are encoded as multiple characters; this idea is described in more detail at: http://en.wikipedia.org/wiki/Quoted-printable

For the purpose of this article, we do not need to go into details of how both Base64 and quoted-printable encodings work.

Multipart messages

MIME also introduces the concept of multipart content. On its own, an e-mail message can only consist of a single item. However, the MIME standard provides ways to send multipart content by enclosing multiple items in a single message. This can also be used recursively: any of the parts can itself contain additional parts. We’ll see this in the example that we build later in this article.

There are multiple types of multipart contents:

multipart/related is used to send messages that should be treated as a whole. The first part is the content that the e-mail application should use and other parts are related to it, for example, images that are used in a HTML message. However, adding a part that should be inline requires that this element also has specific headers, which is discussed in more detail later in this article.
multipart/mixed is used for sending mixed content types. It is up to the e-mail application to decide how to handle this, but parts that it can show inline will be shown within the e-mail application, and parts that it cannot show directly will be shown only as attachments. A typical example is attaching images and documents: e-mail applications will show images inline, but require documents to be opened in an external application.
multipart/alternative is used to define multiple parts, where each part is an alternate version of the same content. A typical example is sending plain text and HTML messages. E-mail applications choose the best format that they can handle. Representations should be ordered by preference, with the most preferred representation as the last part.

Multipart content types allow each part to have its own individual headers—this is required in order to define which content type each part is, along with how it should be treated. Also, as each part can have its own type, each part can also be a multipart element on its own.

The following diagram illustrates how multipart/mixed, multipart/alternative, and multipart/related can be used together to send an e-mail that contains a plain text message and an HTML message, inlined images, as well as attachments. This is how the majority of e-mail applications will build such a message. The structure of the entire message would look as follows:

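[Diagram: structure of the example message, with the element numbers used in the code below]
1.2 multipart/mixed
    1.2.1 multipart/alternative
        1.2.1.1 text/plain (plain text message)
        1.2.1.2 multipart/related
            1.2.1.2.1 text/html (HTML message)
            1.2.1.2.2 image/gif (inlined logo)
    1.2.2 image/jpeg (attachment)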

Now that we know how our e-mail might appear, let’s proceed to building such a structure from within Tcl.

MIME in Tcl

In order to send such an e-mail from within Tcl, we will need to use Tcl’s mime package. It is a part of the tcllib package and is available in ActiveTcl distributions of Tcl.

This package allows the building and parsing of messages and handles all aspects of a message—headers, content, and support for multipart messages. It also handles conversion between various content encodings such as base64 and quoted-printable. Thanks to this we’ll only need to build the message parts and combine them into a final message.

Creating messages

The command mime::initialize is used to set up a part or the content of the entire message. This command accepts one or more options and returns a new token that identifies the new MIME part. Based on the options specified, there are two modes in which it can be used: the first is to parse content (such as parsing a received e-mail message), and the second is to create content. We will focus on the second case and leave parsing for sections that talk about receiving e-mail.

Whenever we want to create a MIME part, we need to specify the -canonical option and provide the content type for this part. The type is the MIME content type described earlier. There are three ways of creating MIME objects: from a file, from a string, or as multipart content built from other parts.

To create it from a file or a string, we need to specify the -file or -string option and provide either the path to the file or the content of this part as string or binary data. We should also specify the -encoding option that states how content should be handled so that it can be passed over a 7-bit protocol such as SMTP. For binary files, we should usually use base64 encoding and for text files, it is best to use quoted-printable.

When creating a MIME part, we can also specify one or more headers that it should have by adding the -header option. This option can be specified multiple times and each parameter to this option should be a list containing a header name and corresponding value. These headers are then added to the actual MIME body. Their names and corresponding values are part of MIME’s specifications. We’ll cover a small subset that we need to know in order to send an e-mail with both inlined elements and attachments.

For example, in order to create a simple plaintext element, we can run the following command:

set token [mime::initialize -canonical "text/plain" \
    -string "Hello world!"]

If we want to send it, all we would need to do is use the smtp package:

package require smtp
smtp::sendmessage $token -recipients "recipient@example.com"

Sending e-mails is described in more detail later in this section—the preceding code simply shows that both packages can be combined very easily.

Multipart elements

In order to create multipart content, we should provide the -parts option to the mime::initialize command. The value for this option should contain a list of all parts that should be included in this multipart content. Parts are included in the same order as provided in the list.

Let’s walk through an exercise of building up an e-mail that we described earlier.

This code uses several files, namely message.html and message.txt for the text of the e-mail, companylogo.gif for the logo that is used in message.html, and attachment.jpg as an attachment.

First we have to load the mime package and create an HTML part:

package require mime
# create actual HTML part
# (1.2.1.2.1 from diagram)
set part_html [mime::initialize -canonical "text/html" \
    -encoding quoted-printable -file message.html]
# create logo as inlined image
# (1.2.1.2.2 from diagram)
set part_logo [mime::initialize -canonical "image/gif" \
    -encoding base64 -file companylogo.gif \
    -header [list Content-Disposition "inline"] \
    -header [list Content-ID "companylogo.gif"] \
    ]

This code builds up two elements: a part containing the HTML version of the message and an image that we include inline in the message. Following that, we use these to build up the multipart/related part (element 1.2.1.2 from the preceding diagram), which contains the two elements created using the preceding code:

set part_htmlrelated [mime::initialize \
    -canonical multipart/related \
    -parts [list $part_html $part_logo]]

Next it’s time to create a plain text version of the e-mail (element 1.2.1.1 from the diagram) and build the multipart/alternative element that binds the HTML message and the plain text message into one piece, which is element 1.2.1.

set part_txt [mime::initialize \
    -canonical "text/plain" \
    -encoding quoted-printable -file message.txt]
set part_alternative [mime::initialize \
    -canonical multipart/alternative \
    -parts [list $part_txt $part_htmlrelated]]

Finally, we create a part for the attachment (element 1.2.2 from the diagram) and create an element that combines the previously created container for the plain text and HTML message with the attachment, which is element 1.2 from the diagram.

set part_attachment [mime::initialize \
    -canonical "image/jpeg" \
    -header [list Content-Disposition \
        "attachment; filename=attachment.jpg"] \
    -header [list Content-ID "attachment.jpg"] \
    -encoding base64 -file attachment.jpg]

set all [mime::initialize -canonical multipart/mixed \
    -parts [list $part_alternative $part_attachment]]

This makes our code complete and a full version of the message is now ready.
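To check what a part will look like on the wire before sending it, the mime package can report a part's properties and serialize it to a string. The following is a minimal sketch that uses a single hypothetical text part rather than the full structure above, so that it runs without any of the example files; it assumes tcllib's mime package is installed.

```tcl
package require mime

# Hypothetical single-part message used only for illustration
set token [mime::initialize -canonical "text/plain" \
    -string "Hello world!"]

# The content type recorded for this part
set type [mime::getproperty $token content]

# The serialized message: MIME headers, a blank line, then the encoded body
set wire [mime::buildmessage $token]

puts "Content type: $type"
puts $wire

# Release the token once we are done with it
mime::finalize $token
```

Running the same two commands against a multipart token should show each part introduced by a boundary line and its own headers.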

There are three types of elements that we are building:

HTML and plain text messages: Their context is defined by multipart elements they are included in, therefore, we only need to define content type.
JPEG image: It is an attachment, therefore, we need to provide more information in the part headers—filename, Content-ID, and disposition.
Multipart elements: These are used to combine other types of elements into a structure that we’ve described earlier.

Sending text messages also relates to character sets, encodings, and issues with internationalization. When sending messages that contain characters outside of 7-bit ASCII, we need to be aware of two things.

First of all, Tcl sends both strings and file contents in binary form. If we want to send text from a file, then that file needs to be encoded properly, using encodings such as UTF-8. If we want to send text from Tcl, we need to convert that text to proper encoding. Secondly, we need to specify the encoding of a part when specifying the canonical type—usually this means appending a semi-colon and charset=<charsetName>. For example:

set part_html [mime::initialize -canonical \
    "text/html; charset=UTF-8" \
    -encoding quoted-printable -string [encoding \
        convertto utf-8 "\u25ba Omega symbol: \u2126 \u25c4"]]
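The difference between a Tcl string and its encoded form can be seen with a single character. This short sketch uses only core Tcl commands; the variable names are chosen for illustration.

```tcl
# One character: U+2126 (ohm sign)
set text "\u2126"

# The same character as UTF-8 encoded bytes
set bytes [encoding convertto utf-8 $text]

# Show the encoded bytes as hexadecimal digits
binary scan $bytes H* hex

puts "characters: [string length $text]"
puts "bytes:      [string length $bytes]"
puts "hex:        $hex"
```

The single character becomes three bytes (e2 84 a6), which is the data that the quoted-printable encoding then makes safe for 7-bit transport.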

Next we have an inlined image—in this case, we need to define additional headers. The first header is Content-Disposition, which specifies how this part should be handled. Specifying inline means that this is an element that will be referenced from the main document and should not be shown as an attachment. The second header is Content-ID, which identifies and names an element. This is how an element can then be referenced from other parts. Any references should be made in the format of cid:<Content-ID>, so in our case, it would be cid:companylogo.gif. For example, our message.html file can contain the following HTML tag:

<img src="cid:companylogo.gif" width="400" height="40" />

Elements that are regular attachments should have Content-Disposition set to attachment. Also, it is recommended to add filename=<name> to this parameter, separated from the disposition type by a semi-colon. Content-ID in this case specifies an attachment name and should be the same as the filename specified in the Content-Disposition header. This is how the attachment.jpg file is sent.

There is also a difference between naming parts within an e-mail and actual filenames. This example names parts from MIME’s perspective in the same way as the files are named on disk, but this is not required. It is common to add prefixes and/or suffixes to avoid naming collisions, especially when a message contains parts from different sources. For example, we can create an inlined image in the following way:

set part_logo [mime::initialize -canonical "image/gif" \
    -encoding base64 -file "/path/to/template/logo.gif" \
    -header [list Content-Disposition "inline"] \
    -header [list Content-ID "template.logo.gif@$messageId"] \
    ]

We can then build the HTML to include such an image from Tcl by doing something like:

set html "<img src=\"cid:template.logo.gif@$messageId\" />"

It is a good idea to generate unique identifiers for each message and append them to inlined parts’ identifiers. This prevents poorly written e-mail applications from having issues with forwarding or replying to e-mails with such images. It can be done using the uuid package and the uuid::uuid generate command, but any mechanism for generating a unique ID, such as from a related database entry, will work.
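A minimal sketch of generating such an identifier with the uuid package from tcllib could look like the following. The contentId variable is introduced here for illustration only; the template.logo.gif prefix follows the earlier example.

```tcl
package require uuid

# Generate a unique identifier for this message
set messageId [uuid::uuid generate]

# Use it as a suffix for the inlined part's Content-ID ...
set contentId "template.logo.gif@$messageId"

# ... and reference the same identifier from the HTML
set html "<img src=\"cid:$contentId\" />"

puts $html
```

Keeping the identifier in one variable ensures that the Content-ID header and the cid: reference in the HTML cannot drift apart.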

Cleaning up a MIME item requires running the mime::finalize command and passing the token of a MIME part to it. In order to delete all elements that are used in that element recursively, we can add the -subordinates option with the value all. For example:

mime::finalize $all -subordinates all

The preceding code will delete the token created for the entire message along with all other elements we’ve created.
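Because tokens are not released automatically, it can help to pair creation and cleanup in one place. The sketch below assumes Tcl 8.6 or newer (for try/finally) and tcllib's mime package; buildTextMessage is a hypothetical helper name, not part of the mime package.

```tcl
package require mime

# Hypothetical helper: build a plain text message and always release
# its token, even if serializing the message raises an error.
proc buildTextMessage {text} {
    set token [mime::initialize -canonical "text/plain" -string $text]
    try {
        return [mime::buildmessage $token]
    } finally {
        mime::finalize $token -subordinates all
    }
}

set wire [buildTextMessage "Hello world!"]
puts $wire
```

The finally clause runs whether the body returns normally or fails, so the token does not leak in either case.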

Information about all commands from the mime package can be found in its documentation, available at:

http://tcllib.sourceforge.net/doc/mime.html