Schemas and Models

August 27, 2013 - 12:00 am


So what is a schema? At its simplest, a schema is a way to describe the structure of data. Typically this involves giving each piece of data a label, and stating what type of data it is, for example, a number, date, string, and so on.

In the following example, we are creating a new Mongoose schema called userSchema. We are stating that a database document using this schema will have three pieces of data, which are as follows:

name: This data will contain a string
email: This will also contain a string value
createdOn: This data will contain a date

The following is the schema definition:

var userSchema = new mongoose.Schema({
name: String,
email: String,
createdOn: Date
});
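Note that String and Date in this definition are JavaScript’s own built-in constructors, not quoted type names. As a minimal sketch, the following uses a plain object (userShape, a hypothetical stand-in that involves no Mongoose at all) to list each label alongside the type it was given:

```javascript
// Hypothetical stand-in for a schema definition: a plain object whose
// values are JavaScript's built-in constructors. No Mongoose involved.
var userShape = {
  name: String,
  email: String,
  createdOn: Date
};

// Pair each label with the name of the constructor assigned to it.
function describeShape(shape) {
  return Object.keys(shape).map(function (label) {
    return label + ': ' + shape[label].name;
  });
}

var description = describeShape(userShape);
console.log(description.join(', '));
```

This only illustrates the idea of a label plus a type; a real Mongoose schema does far more with the same information.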

Field sizes

Note that, unlike some other systems, there is no need to set the field size. This can be useful if you need to change the amount of data stored in a particular object. For example, your system might impose a 16-character limit on usernames, so you set the size of the field to 16 characters. Later, you realize that you want to encrypt the usernames, but this will double the length of the data stored. If your database schema uses fixed field sizes, you will need to refactor it, which can take a long time on a large database. With Mongoose, you can just start encrypting that data object without worrying about it.

If you’re storing large documents, you should bear in mind that MongoDB imposes a maximum document size of 16 MB. There are ways around even this limit, using the MongoDB GridFS API.

Data types allowed in schemas

There are eight types of data that can—by default—be set in a Mongoose schema. These are also referred to as SchemaTypes; they are:

String
Number
Date
Boolean
Buffer
ObjectId
Mixed
Array

The first four SchemaTypes are self-explanatory, but let’s take a quick look at them all.

String

This SchemaType stores a string value, UTF-8 encoded.

Number

This SchemaType stores a number value, with restrictions. For example, Mongoose does not natively support the long and double datatypes, although MongoDB does. However, Mongoose can be extended using plugins to support these other types.

Date

This SchemaType holds a date and time object, typically returned from MongoDB as an ISODate object, for example, ISODate("2013-04-03T12:56:26.009Z").
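On the JavaScript side, a value like this is an ordinary Date object. The following sketch uses only the built-in Date API (no Mongoose or MongoDB) to show how the ISO string in the example relates to a JavaScript date; the variable names are illustrative:

```javascript
// Parse the ISO 8601 string from the example into a JavaScript Date.
var modifiedOn = new Date('2013-04-03T12:56:26.009Z');

// toISOString() always reports UTC, so it round-trips the original text.
var isoText = modifiedOn.toISOString();

// Individual parts are available through the UTC getters.
var year = modifiedOn.getUTCFullYear();
var month = modifiedOn.getUTCMonth() + 1; // months are zero-based, so add 1

console.log(isoText, year, month);
```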

Boolean

This SchemaType has only two values: true or false.

Buffer

This SchemaType is primarily used for storing binary information, for example, images stored in MongoDB.
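In Node.js, binary data of this kind is represented by the built-in Buffer class. As a small sketch, assuming a Node.js environment, this is the sort of value a Buffer field would hold; here it is just the eight-byte signature that starts every PNG file, not a full image:

```javascript
// The eight-byte signature found at the start of every PNG file.
var pngSignature = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

var byteCount = pngSignature.length;          // number of bytes held
var asHex = pngSignature.toString('hex');     // hexadecimal view of the bytes
var isBuffer = Buffer.isBuffer(pngSignature); // confirms the value's type

console.log(byteCount, asHex, isBuffer);
```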

ObjectId

This SchemaType is used to assign a unique identifier to a key other than _id. Rather than just specifying the type as ObjectId, you need to specify the fully qualified version, Schema.Types.ObjectId. For example:

projectSchema.add({
owner: mongoose.Schema.Types.ObjectId
});

Mixed

A mixed data object can contain any type of data. It can be declared either by setting an empty object, or by using the fully qualified Schema.Types.Mixed. The following two declarations do the same thing:

var djSchema = new mongoose.Schema({
mixedUp: {}
});
var djSchema = new mongoose.Schema({
mixedUp: mongoose.Schema.Types.Mixed
});

While this sounds like it might be great, there is a big caveat. Changes to data of mixed type cannot be automatically detected by Mongoose, so it doesn’t know that it needs to save them.

Tracking changes to Mixed type

As Mongoose can’t automatically see changes made to data of mixed type, you have to manually declare when the data has changed. Fortunately, Mongoose exposes a method called markModified to do just this; you pass it the path of the data object that has changed.

dj.mixedUp = { valueone: "a new value" };
dj.markModified('mixedUp');
dj.save();
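To see why a change made inside a Mixed value can go unnoticed, consider this simplified model of setter-based change tracking. It is only a sketch: makeTrackedDoc and modifiedPaths are hypothetical names, and Mongoose’s real tracking is considerably more involved.

```javascript
// Simplified stand-in for a tracked document; NOT Mongoose's implementation.
function makeTrackedDoc(initial) {
  var value = initial;
  var doc = {
    modifiedPaths: [],
    // Record a path as changed (at most once).
    markModified: function (path) {
      if (doc.modifiedPaths.indexOf(path) === -1) {
        doc.modifiedPaths.push(path);
      }
    }
  };
  // Only assignment to doc.mixedUp runs the setter.
  Object.defineProperty(doc, 'mixedUp', {
    get: function () { return value; },
    set: function (next) { value = next; doc.markModified('mixedUp'); }
  });
  return doc;
}

var dj = makeTrackedDoc({ valueone: 'original' });

// Changing a property in place runs the getter, never the setter.
dj.mixedUp.valueone = 'changed in place';
var seenAfterMutation = dj.modifiedPaths.length;

// Declaring the change by hand makes it visible.
dj.markModified('mixedUp');
var seenAfterMark = dj.modifiedPaths.length;

console.log(seenAfterMutation, seenAfterMark);
```

The in-place edit leaves nothing recorded, which is the situation markModified exists to handle.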

Array

The array datatype can be used in two ways. First, a simple array of values of the same data type, as shown in the following code snippet:

var userSchema = new mongoose.Schema({
name: String,
emailAddresses: [String]
});

Second, the array datatype can be used to store a collection of subdocuments using nested schemas. Here’s an example of how this can work:

var emailSchema = new mongoose.Schema({
email: String,
verified: Boolean
});
var userSchema = new mongoose.Schema({
name: String,
emailAddresses: [emailSchema]
});

Warning – array defined as mixed type

A word of caution: if you declare an empty array, it will be treated as mixed type, meaning that Mongoose will not be able to automatically detect any changes made to the data. So avoid either of the following two array declarations, unless you intentionally want a mixed type.

var emailSchema = new mongoose.Schema({
addresses: []
});
var emailSchema = new mongoose.Schema({
addresses: Array
});

Custom SchemaTypes

If your data requires a different datatype which is not covered earlier in this article, Mongoose offers the option of extending it with custom SchemaTypes. The extension method is managed using Mongoose plugins. Some examples of SchemaType extensions that have already been created are: long, double, RegEx, and even email.

Where to write the schemas

As your schemas sit on top of Mongoose, the only absolute is that they need to be defined after Mongoose is required. You don’t need an active or open connection to define your schemas.

That being said, it is advisable to make your connection early on, so that it is available as soon as possible, bearing in mind that remote databases or replica sets may take longer to connect than your localhost development server.

While no action can be taken on the database through the schemas and models until the connection is open, Mongoose can buffer requests made from when the connection is defined. Mongoose models also rely on the connection being defined, so there’s another reason to get the connection set up early in the code and then define the schemas and models.

Writing a schema

Let’s write the schema for a User in our MongoosePM application.

The first thing we have to do is declare a variable to hold the schema. I recommend taking the object name (for example, user or project) and adding Schema to the end of it. This makes following the code later on super easy.

The second thing we need to do is create a new Mongoose schema object to assign to this variable. The skeleton of this is as follows:

var userSchema = new mongoose.Schema({ });

We can add in the basic values of name, email, and createdOn that we looked at earlier, giving us our first user schema definition.

var userSchema = new mongoose.Schema({
name: String,
email: String,
createdOn: Date
});

Modifying an existing schema

Suppose we run the application with this for a while, and then decide that we want to record the last time each user logged on, and the last time their record was modified. No problem!

We don’t have to refactor the database or take it offline while we upgrade the schema; we simply add a couple of entries to the Mongoose schema. If a key requested in the schema doesn’t exist in a stored document, neither Mongoose nor MongoDB will throw an error; Mongoose will just return an empty (undefined) value for it. When saving the MongoDB documents, the new keys and values will be added and stored as required. If no value has been set for a key, then the key is not added.

So let’s add modifiedOn and lastLogin to our userSchema:

var userSchema = new mongoose.Schema({
name: String,
email: String,
createdOn: Date,
modifiedOn: Date,
lastLogin: Date
});

Setting a default value

Mongoose allows us to set a default value for a data key when the document is first created. Looking at our schema created earlier, a possible candidate for this is createdOn. When a user first signs up, we want the date and time to be set.

We could do this by adding a timestamp in the controller function when we create a user, or we can modify the schema to set a default value.

To do this, we need to change the information we are sending about the createdOn data object.

What we have currently is:

createdOn: Date

This is short for:

createdOn: { type: Date }

We can add another entry to this object to set a default value here, using the JavaScript Date object:

createdOn: { type: Date, default: Date.now }

Now, every time a new user is created, its createdOn value will be set to the current date and time.
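It matters that the default is written as Date.now (the function itself) and not Date.now() (the result of calling it once). The sketch below shows the difference using a hypothetical resolveDefault helper, which mimics in a simplified way how a function default can be called each time a value is needed:

```javascript
// Hypothetical helper: call the default when it is a function,
// otherwise use it as a fixed value.
function resolveDefault(def) {
  return typeof def === 'function' ? def() : def;
}

var fixedAtDefinition = Date.now(); // a number, captured once
var evaluatedLater = Date.now;      // a function, called when needed

// A fixed value is identical for every document created afterwards.
var first = resolveDefault(fixedAtDefinition);
var second = resolveDefault(fixedAtDefinition);

// A function default yields a fresh timestamp at each call.
var fresh = resolveDefault(evaluatedLater);

console.log(first === second, typeof fresh);
```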

Note that in JavaScript default is a reserved word. While the language allows reserved words to be used as keys, some IDEs and linters regard it as an error. If this causes issues for you or your environment, you can wrap it in quotes, like in the following code snippet:

createdOn: { type: Date, 'default': Date.now }
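The quoted and unquoted forms produce exactly the same object, so the choice is purely about keeping your tools happy. A quick check in plain JavaScript, with illustrative variable names:

```javascript
// The same definition written with and without quotes around the key.
var unquoted = { type: Date, default: Date.now };
var quoted = { type: Date, 'default': Date.now };

// Both objects expose the same keys and the same default function.
var sameKeys = Object.keys(unquoted).join() === Object.keys(quoted).join();
var sameDefault = unquoted.default === quoted['default'];

console.log(sameKeys, sameDefault);
```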

Only allowing unique entries

If we want to ensure that there is only ever one user per e-mail address, we can specify that the email field should be unique.

email: {type: String, unique:true}

With this in place, when saving to the database, MongoDB will check to see if the e-mail value already exists in another document. If it finds it, MongoDB (not Mongoose) will return an E11000 error. Note that this approach also defines a MongoDB index on the email field.
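Because the error comes from MongoDB, your own code has to recognize it. The following is a minimal sketch, assuming the error object passed to your save callback carries a numeric code property of 11000 for duplicate keys; isDuplicateKeyError and describeSaveError are hypothetical helpers, and sampleError is a hand-built stand-in for a real error.

```javascript
// MongoDB reports a duplicate key with error code 11000 (the "E11000").
var DUPLICATE_KEY_CODE = 11000;

// Hypothetical helper: true when an error looks like a duplicate key error.
function isDuplicateKeyError(err) {
  return Boolean(err) && err.code === DUPLICATE_KEY_CODE;
}

// Hypothetical helper: turn a save error into a message for the user.
function describeSaveError(err) {
  if (!err) { return 'Saved'; }
  if (isDuplicateKeyError(err)) {
    return 'That e-mail address is already registered';
  }
  return 'Unexpected error: ' + err.message;
}

// Hand-built stand-in for a real error; the message text is illustrative.
var sampleError = { code: 11000, message: 'E11000 duplicate key error' };
console.log(describeSaveError(sampleError));
```

A helper like this could be called from the callback passed to save, so that a duplicate e-mail address produces a friendly message instead of a generic failure.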

Our final User schema

Your userSchema should now look like the following:

var userSchema = new mongoose.Schema({
name: String,
email: {type: String, unique:true},
createdOn: { type: Date, default: Date.now },
modifiedOn: Date,
lastLogin: Date
});

A corresponding document from the database would look like the following (line breaks are added for readability):

{ "__v" : 0,
"_id" : ObjectId("5126b7a1f8a44d1e32000001"),
"createdOn" : ISODate("2013-02-22T00:11:13.436Z"),
"email" : "[email protected]",
"lastLogin" : ISODate("2013-04-03T12:54:42.734Z"),
"modifiedOn" : ISODate("2013-04-03T12:56:26.009Z"),
"name" : "Simon Holmes" }

What’s that “__v” thing?

You may have noticed a data entity in the document that we didn’t set: __v. This is an internal versioning number automatically set by Mongoose when a document is created. It doesn’t increment when a document is changed, but instead is automatically incremented whenever an array within the document is updated in such a way that might cause the indexed position of some of the entries to have changed.

Why is this needed?

When working with an array you will typically access the individual elements through their positional index, for example, myArray[3]. But what happens if somebody else deletes the element in myArray[2] while you are editing the data in myArray[3]? Your original data is now contained in myArray[2] but you don’t know this, so you quite happily overwrite whatever data is now stored in myArray[3]. The __v gives you a method to be able to sanity check this, and prevent this scenario from happening.
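The scenario can be simulated in plain JavaScript. This is a sketch of the idea only: removeAt and setAt are hypothetical helpers, and the check is far simpler than what Mongoose actually does with __v.

```javascript
// A stored document with a version number and an array of tasks.
var stored = { __v: 0, tasks: ['design', 'build', 'test', 'ship'] };

// Hypothetical helper: remove an element and bump the version,
// because the positions of later elements have shifted.
function removeAt(doc, index) {
  doc.tasks.splice(index, 1);
  doc.__v += 1;
}

// Hypothetical helper: write by position only if the caller's
// version still matches the stored one.
function setAt(doc, index, value, expectedVersion) {
  if (doc.__v !== expectedVersion) { return false; }
  doc.tasks[index] = value;
  return true;
}

// An editor reads the document and plans to change index 3 ('ship').
var versionSeenByEditor = stored.__v;

// Meanwhile, somebody else removes index 2; 'ship' moves to index 2.
removeAt(stored, 2);

// The stale positional write is rejected instead of landing in the wrong slot.
var applied = setAt(stored, 3, 'ship v2', versionSeenByEditor);
console.log(applied, stored.tasks.join(', '));
```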

Defining the Project schema

As part of our MongoosePM application we also need to think about Projects. After all, PM here does stand for Project Manager.

Let’s take what we’ve learned and create the Project schema. We are going to want a few types of data to start with:

projectName: A string containing the name of the project.
createdOn: The date when the document was first created and saved. This option is set to automatically save the current date and time.
modifiedOn: The date and time when the document was last changed.
createdBy: A string that will for now contain the unique ID of the user who created the project.
tasks: A string to hold task information.

Transforming these requirements into a Mongoose schema definition gives us the following:

var projectSchema = new mongoose.Schema({
projectName: String,
createdOn: { type: Date, default: Date.now },
modifiedOn: Date,
createdBy: String,
tasks: String
});

This is our starting point, and we will build upon it. For now we have these basic data objects as mentioned previously in this article.

Here’s an example of a corresponding document from the database (line breaks added for readability):

{ "projectName" : "Another test",
"createdBy" : "5126b7a1f8a44d1e32000001",
"createdOn" : ISODate("2013-04-03T17:47:51.031Z"),
"tasks" : "Just a simple task",
"_id" : ObjectId("515c6b47596acf8e35000001"),
"modifiedOn" : ISODate("2013-04-03T17:47:51.032Z"),
"__v" : 0 }

Improving the Project schema

Throughout the rest of the article we will be improving this schema, but the beauty of using Mongoose is that we can do this relatively easily. Putting together a basic schema like this to build upon is a great approach for prototyping—you have the data you need there, and can add complexity where you need, when you need it.

Building models

A single instance of a model maps directly to a single document in the database. With this 1:1 relationship, it is the model that handles all document interaction—creating, reading, saving, and deleting.

This makes the model a very powerful tool.

Building the model is pretty straightforward. When using the default Mongoose connection we can call the mongoose.model command, passing it two arguments:

The name of the model
The schema to compile

So if we were to build a model from our user schema we would use this line:

mongoose.model( 'User', userSchema );

If you’re using a named Mongoose connection, the approach is very similar.

adminConnection.model( 'User', userSchema );

Instances

It is useful to have a good understanding of how a model works.

After building the User model with the previous line, we can retrieve it by name and create two instances.

var User = mongoose.model('User');
var userOne = new User({ name: 'Simon' });
var userTwo = new User({ name: 'Sally' });

Summary

In this article, we have looked at how schemas and models relate to your data. You should now understand the roles of both schemas and models.

We have looked at how to create simple schemas and the types of data they can contain. We have also seen that it is possible to extend this if the native types are not enough.

In the MongoosePM project, you should now have added a User schema and a Project schema, and built models of both of these.
