
In this article by Mat Ryer, the author of the book Go Programming Blueprints Second Edition, we will see how to create a Go application and deploy it to Google App Engine, making use of Google’s cloud data storage facility for App Engine developers.


Google App Engine gives developers a NoOps (short for No Operations, indicating that developers and engineers have no work to do in order to have their code running and available) way of deploying their applications, and Go has been officially supported as a language option for some years now. Google’s architecture runs some of the biggest applications in the world, such as Google Search, Google Maps, and Gmail, among others, so it is a pretty safe bet when it comes to deploying our own code.

Google App Engine allows you to write a Go application, add a few special configuration files, and deploy it to Google’s servers, where it will be hosted in a highly available, scalable, and elastic environment. Instances automatically spin up to meet demand and tear down gracefully when they are no longer needed, all within a healthy free quota and preapproved budgets.

Along with running application instances, Google App Engine makes available a myriad of useful services, such as fast and high-scale data stores, search, memcache, and task queues. Transparent load balancing means you don’t need to build and maintain additional software or hardware to ensure servers don’t get overloaded and that requests are fulfilled quickly.

In this article, we will build the API backend for a question and answer service similar to Stack Overflow or Quora and deploy it to Google App Engine. In the process, we’ll explore techniques, patterns, and practices that can be applied to all such applications as well as dive deep into some of the more useful services available to our application.

Specifically, in this article, you will learn:

  • How to use the Google App Engine SDK for Go to build and test applications locally before deploying to the cloud
  • How to use app.yaml to configure your application
  • How Modules in Google App Engine let you independently manage the different components that make up your application
  • How the Google Cloud Datastore lets you persist and query data at scale
  • A sensible pattern for the modeling of data and working with keys in Google Cloud Datastore
  • How to use the Google App Engine Users API to authenticate people with Google accounts
  • A pattern to embed denormalized data into entities

The Google App Engine SDK for Go

In order to run and deploy Google App Engine applications, we must download and configure the Go SDK. Head over to https://cloud.google.com/appengine/downloads and download the latest Google App Engine SDK for Go for your computer. The ZIP file contains a folder called go_appengine, which you should place in an appropriate folder outside of your GOPATH, for example, in /Users/yourname/work/go_appengine.

It is possible that the names of these SDKs will change in the future—if that happens, ensure that you consult the project home page for notes pointing you in the right direction at https://github.com/matryer/goblueprints.

Next, you will need to add the go_appengine folder to your $PATH environment variable, much like what you did with the go folder when you first configured Go.

To test your installation, open a terminal and type this:

goapp version

You should see something like the following:

go version go1.6.1 (appengine-1.9.37) darwin/amd64

The actual version of Go is likely to differ and is often a few months behind actual Go releases. This is because the Cloud Platform team at Google needs to do work on its end to support new releases of Go.

The goapp command is a drop-in replacement for the go command with a few additional subcommands; so you can do things like goapp test and goapp vet, for example.

Creating your application

In order to deploy an application to Google’s servers, we must use the Google Cloud Platform Console to set it up. In a browser, go to https://console.cloud.google.com and sign in with your Google account. Look for the Create Project menu item, which often gets moved around as the console changes from time to time. If you already have some projects, click on a project name to open a submenu, and you’ll find it in there.

If you can’t find what you’re looking for, just search Creating App Engine project and you’ll find it.

When the New Project dialog box opens, you will be asked for a name for your application. You are free to call it whatever you like (for example, Answers), but note the Project ID that is generated for you; you will need to refer to this when you configure your app later. You can also click on Edit and specify your own ID, but know that the value must be globally unique, so you’ll have to get creative when thinking one up. Here we will use answersapp as the application ID, but you won’t be able to use that one since it has already been taken.

You may need to wait a minute or two for your project to get created; there’s no need to watch the page—you can continue and check back later.

App Engine applications are Go packages

Now that the Google App Engine SDK for Go is configured and our application has been created, we can start building it.

In Google App Engine, an application is just a normal Go package with an init function that registers handlers via the http.Handle or http.HandleFunc functions. Unlike a normal command-line tool, it does not need to be the main package.

Create a new folder (somewhere inside your GOPATH folder) called answersapp/api and add the following main.go file:

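package api

import (
  "io"
  "net/http"
)

// init registers the handlers; there is no main function and no
// ListenAndServe call in an App Engine package.
func init() {
  http.HandleFunc("/", handleHello)
}

// handleHello writes a welcoming string in response to every request.
func handleHello(w http.ResponseWriter, r *http.Request) {
  io.WriteString(w, "Hello from App Engine")
}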

You will be familiar with most of this by now, but note that there is no ListenAndServe call, and the handlers are set inside the init function rather than main. We are going to handle every request with our simple handleHello function, which will just write a welcoming string.
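Because the handler is an ordinary function, it can be exercised without any server at all. The following is a minimal sketch using the standard net/http/httptest package; it uses package main so that it runs on its own, and helloBody is a hypothetical helper that is not part of the article's code.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// handleHello is the same handler as in main.go above.
func handleHello(w http.ResponseWriter, r *http.Request) {
	io.WriteString(w, "Hello from App Engine")
}

// helloBody (hypothetical helper) sends an in-memory GET / request to the
// handler and returns the status code and body it produced.
func helloBody() (int, string) {
	req := httptest.NewRequest("GET", "/", nil)
	rec := httptest.NewRecorder()
	handleHello(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := helloBody()
	fmt.Println(code, body) // prints "200 Hello from App Engine"
}
```

The same idea works inside a _test.go file in the api package, which is the kind of test that goapp test would pick up.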

The app.yaml file

In order to turn our simple Go package into a Google App Engine application, we must add a special configuration file called app.yaml. The file will go at the root of the application or module, so create it inside the answersapp/api folder with the following contents:

application: YOUR_APPLICATION_ID_HERE
version: 1
runtime: go
api_version: go1
handlers:
- url: /.*
  script: _go_app

The file is a simple human- and machine-readable configuration file in YAML format (YAML originally stood for Yet Another Markup Language and now stands for YAML Ain’t Markup Language; refer to yaml.org for more details). The following table describes each property:

Property | Description
application | The application ID (copied and pasted from when you created your project).
version | Your application version number; you can deploy multiple versions and even split traffic between them to test new features, among other things. We’ll just stick with version 1 for now.
runtime | The name of the runtime that will execute your application. Since we’re building a Go application, we’ll use go.
api_version | The go1 API version is the runtime version supported by Google; you can imagine that this could be go2 in the future.
handlers | A selection of configured URL mappings. In our case, everything will be mapped to the special _go_app script, but you can also specify static files and folders here.

Running simple applications locally

Before we deploy our application, it makes sense to test it locally. We can do this using the App Engine SDK we downloaded earlier.

Navigate to your answersapp/api folder and run the following command in a terminal:

goapp serve

You should see the following output:

This indicates that an API server is running locally on port :56443, an admin server is running on :8000, and our application (the module default) is now serving at localhost:8080, so let’s hit that one in a browser.

As you can see by the Hello from App Engine response, our application is running locally. Navigate to the admin server by changing the port from :8080 to :8000.

The preceding screenshot shows the web portal that we can use to interrogate the internals of our application, including viewing running instances, inspecting the data store, managing task queues, and more.

Deploying simple applications to Google App Engine

To truly understand the power of Google App Engine’s NoOps promise, we are going to deploy this simple application to the cloud. Back in the terminal, stop the server by hitting Ctrl+C and run the following command:

goapp deploy

Your application will be packaged and uploaded to Google’s servers. Once it’s finished, you should see something like the following:

Completed update of app: theanswersapp, version: 1

It really is as simple as that.

You can prove this by navigating to the endpoint you get for free with every Google App Engine application, remembering to replace the application ID with your own:

https://YOUR_APPLICATION_ID_HERE.appspot.com/

You will see the same output as earlier (the font may render differently since Google’s servers will make assumptions about the content type that the local dev server doesn’t).

The application is being served over HTTP/2 and is already capable of pretty massive scale, and all we did was write a config file and a few lines of code.

Modules in Google App Engine

A module is a Go package that can be versioned, updated, and managed independently. An app might have a single module, or it can be made up of many modules: each distinct but part of the same application with access to the same data and services. An application must have a default module—even if it doesn’t do much.

Our application will be made up of the following modules:

Description | The module name
The obligatory default module | default
An API package delivering RESTful JSON | api
A static website serving HTML, CSS, and JavaScript that makes AJAX calls to the API module | web

Each module will be a Go package and will, therefore, live inside its own folder.

Let’s reorganize our project into modules by creating a new folder alongside the api folder called default.

We are not going to make our default module do anything other than use it for configuration, as we want our other modules to do all the meaningful work. But if we leave this folder empty, the Google App Engine SDK will complain that it has nothing to build.

Inside the default folder, add the following placeholder main.go file:

package defaultmodule
func init() {}

This file does nothing except allow our default module to exist.

It would have been nice for our package names to match the folders, but default is a reserved keyword in Go, so we have a good reason to break that rule.

The other module in our application will be called web, so create another folder alongside the api and default folders called web. Here we are only going to build the API for our application and cheat by downloading the web module.

Head over to the project home page at https://github.com/matryer/goblueprints, access the content for Second Edition, and look for the download link for the web components for this article in the Downloads section of the README file. The ZIP file contains the source files for the web component, which should be unzipped and placed inside the web folder.

Now, our application structure should look like this:

/answersapp/api
/answersapp/default
/answersapp/web

Specifying modules

To specify which module our api package will become, we must add a property to the app.yaml inside our api folder. Update it to include the module property:

application: YOUR_APPLICATION_ID_HERE
version: 1
runtime: go
module: api
api_version: go1
handlers:
- url: /.*
  script: _go_app

Since our default module will need to be deployed as well, we also need to add an app.yaml configuration file to it. Duplicate the api/app.yaml file inside default/app.yaml, changing the module to default:

application: YOUR_APPLICATION_ID_HERE
version: 1
runtime: go
module: default
api_version: go1
handlers:
- url: /.*
  script: _go_app

Routing to modules with dispatch.yaml

In order to route traffic appropriately to our modules, we will create another configuration file called dispatch.yaml, which will let us map URL patterns to the modules.

We want all traffic beginning with the /api/ path to be routed to the api module and everything else to the web module. As mentioned earlier, we won’t expect our default module to handle any traffic, but it will have more utility later.

In the answersapp folder (alongside our module folders—not inside any of the module folders), create a new file called dispatch.yaml with the following contents:

application: YOUR_APPLICATION_ID_HERE
dispatch:
  - url: "*/api/*"
    module: api
  - url: "*/*"
    module: web

The same application property tells the Google App Engine SDK for Go which application we are referring to, and the dispatch section routes URLs to modules.
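To make the routing rules concrete, here is a rough sketch of how an ordered list of patterns like these could be resolved to a module. It is a simplified illustration and not App Engine's actual implementation: the rule type and moduleFor function are hypothetical, the leading * (any host) is simply ignored, and a trailing * is treated as "any suffix". It assumes that rules are tried in order and that unmatched requests fall through to the default module.

```go
package main

import (
	"fmt"
	"strings"
)

// rule pairs a dispatch.yaml URL pattern with the module that serves it.
type rule struct {
	pattern string
	module  string
}

// rules repeats the two entries from dispatch.yaml, in the same order.
var rules = []rule{
	{"*/api/*", "api"},
	{"*/*", "web"},
}

// moduleFor returns the module of the first rule whose pattern matches path.
// Simplification: the leading "*" stands for any host and is dropped; a
// trailing "*" means the rest of the path may be anything.
func moduleFor(rules []rule, path string) string {
	for _, r := range rules {
		p := strings.TrimPrefix(r.pattern, "*")
		if strings.HasSuffix(p, "*") {
			if strings.HasPrefix(path, strings.TrimSuffix(p, "*")) {
				return r.module
			}
		} else if path == p {
			return r.module
		}
	}
	// Assumption: nothing matched, so the default module handles it.
	return "default"
}

func main() {
	for _, p := range []string{"/api/questions", "/index.html", "/"} {
		fmt.Printf("%-16s -> %s\n", p, moduleFor(rules, p))
	}
}
```

Order matters in this sketch: if the */* rule came first, it would swallow the /api/ traffic as well, which is why the more specific pattern is listed before the catch-all.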

Google Cloud Datastore

One of the services available to App Engine developers is Google Cloud Datastore, a NoSQL document database built for automatic scaling and high performance. Its limited feature-set guarantees very high scale, but understanding the caveats and best practices is vital to a successful project.

Denormalizing data

Developers with experience of relational databases (RDBMS) will often aim to reduce data redundancy (trying to have each piece of data appear only once in their database) by normalizing data, spreading it across many tables, and adding references (foreign keys) before joining it back via a query to build a complete picture. In schemaless and NoSQL databases, we tend to do the opposite. We denormalize data so that each document contains the complete picture it needs, making read times extremely fast—since it only needs to go and get a single thing.

For example, consider how we might model tweets in a relational database such as MySQL or Postgres:

A tweet itself contains only its unique ID, a foreign key reference to the Users table representing the author of the tweet, and perhaps many URLs that were mentioned in TweetBody.

One nice feature of this design is that a user can change their Name or AvatarURL and it will be reflected in all of their tweets, past and future: something you wouldn’t get for free in a denormalized world.

However, in order to present a tweet to the user, we must load the tweet itself, look up (via a join) the user to get their name and avatar URL, and then load the associated data from the URLs table in order to show a preview of any links. At scale, this becomes difficult because all three tables of data might well be physically separated from each other, which means lots of things need to happen in order to build up this complete picture.

Consider what a denormalized design would look like instead:

We still have the same three buckets of data, except that now our tweet contains everything it needs in order to render to the user without having to look up data from anywhere else. The hardcore relational database designers out there are realizing what this means by now, and it is no doubt making them feel uneasy.

Following this approach means that:

  • Data is repeated—AvatarURL in User is repeated as UserAvatarURL in the tweet (waste of space, right?)
  • If the user changes their AvatarURL, UserAvatarURL in the tweet will be out of date
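Both consequences can be seen in a few lines of Go. This is only a sketch: the User and Tweet types and the NewTweet function are illustrative, and the UserName field is an assumption alongside the AvatarURL, UserAvatarURL, and TweetBody names used above.

```go
package main

import "fmt"

// User is the master record for a person.
type User struct {
	Name      string
	AvatarURL string
}

// Tweet is denormalized: it carries its own copy of the author's details.
type Tweet struct {
	TweetBody     string
	UserName      string // copied from User.Name (assumed field name)
	UserAvatarURL string // copied from User.AvatarURL
}

// NewTweet copies the author data into the tweet at write time.
func NewTweet(author User, body string) Tweet {
	return Tweet{
		TweetBody:     body,
		UserName:      author.Name,
		UserAvatarURL: author.AvatarURL,
	}
}

func main() {
	author := User{Name: "Mat", AvatarURL: "https://example.com/old.png"}
	tweet := NewTweet(author, "Do you like my profile picture?")

	// The user later changes their avatar; the tweet keeps its snapshot.
	author.AvatarURL = "https://example.com/new.png"

	fmt.Println("user: ", author.AvatarURL)
	fmt.Println("tweet:", tweet.UserAvatarURL)
}
```

Rendering the tweet needs nothing but the tweet itself, at the price of a second copy of the avatar URL that no longer tracks the master record.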

Database design, at the end of the day, comes down to physics. We are deciding that our tweet is going to be read far more times than it is going to be written, so we’d rather take the pain up-front and take a hit in storage. There’s nothing wrong with repeated data as long as there is an understanding about which set is the master set and which is duplicated for speed.

Changing data is an interesting topic in itself, but let’s think about a few reasons why we might be OK with the trade-offs.

Firstly, the speed benefit to reading tweets is probably worth the unexpected behavior of changes to master data not being reflected in historical documents; it would be perfectly acceptable to decide to live with this emergent behavior for that reason.

Secondly, we might decide that it makes sense to keep a snapshot of data at a specific moment in time. For example, imagine if someone tweets asking whether people like their profile picture. If the picture changed, the tweet context would be lost. For a more serious example, consider what might happen if you were pointing to a row in an Addresses table for an order delivery and the address later changed. Suddenly, the order might look like it was shipped to a different place.

Finally, storage is becoming increasingly cheap, so the need for normalizing data to save space is lessened. Twitter even goes as far as copying the entire tweet document for each of your followers. 100 followers on Twitter means that your tweet will be copied at least 100 times, maybe more for redundancy. This sounds like madness to relational database enthusiasts, but Twitter is making smart trade-offs based on its user experience; they’ll happily spend a lot of time writing a tweet and storing it many times to ensure that when you refresh your feed, you don’t have to wait very long to get updates.

If you want to get a sense of the scale of this, check out the Twitter API and look at what a tweet document consists of. It’s a lot of data. Then, go and look at how many followers Lady Gaga has. This has become known in some circles as “the Lady Gaga problem” and is addressed by a variety of different technologies and techniques that are out of the scope of this article.

Now that we have an understanding of good NoSQL design practices, let’s implement the types, functions, and methods required to drive the data part of our API.

Entities and data access

To persist data in Google Cloud Datastore, we need a struct to represent each entity. These entity structures will be serialized and deserialized when we save and load data through the datastore API. We can add helper methods to perform the interactions with the data store, which is a nice way to keep such functionality physically close to the entities themselves. For example, we will model an answer with a struct called Answer and add a Create method that in turn calls the appropriate function from the datastore package. This prevents us from bloating our HTTP handlers with lots of data access code and allows us to keep them clean and simple instead.

One of the foundation blocks of our application is the concept of a question. A question can be asked by a user and answered by many. It will have a unique ID so that it is addressable (referable in a URL), and we’ll store a timestamp of when it was created.

type Question struct {
  Key          *datastore.Key `json:"id" datastore:"-"`
  CTime        time.Time      `json:"created"`
  Question     string         `json:"question"`
  User         UserCard       `json:"user"`
  AnswersCount int            `json:"answers_count"`
}

The UserCard struct represents a denormalized User entity, both of which we’ll add later.

You can import the datastore package in your Go project using this:

import "google.golang.org/appengine/datastore"

It’s worth spending a little time understanding the datastore.Key type.

Keys in Google Cloud Datastore

Every entity in Datastore has a key, which uniquely identifies it. Keys can be made up of either a string or an integer, depending on what makes sense for your case. You are free to decide the keys for yourself or let Datastore automatically assign them for you; again, your use case will usually decide which is the best approach to take. Keys are created using the datastore.NewKey and datastore.NewIncompleteKey functions and are used to put and get data into and out of Datastore via the datastore.Put and datastore.Get functions.

In Datastore, keys and entity bodies are distinct, unlike in MongoDB or SQL technologies, where the key is just another field in the document or record. This is why we are excluding Key from our Question struct with the datastore:"-" field tag. In the same way that json tags control JSON encoding, this tag tells Datastore to ignore the Key field altogether when it is getting and putting data.
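The "-" convention is easiest to see with the standard encoding/json package, which treats a json:"-" tag the same way. The sketch below is only an analogy: the cut-down question type and the encode helper are hypothetical, the key is a plain string rather than a *datastore.Key, and the datastore package itself is not involved.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// question is a cut-down stand-in for the Question struct. The "-" tag
// tells the encoder to skip the Key field entirely.
type question struct {
	Key      string `json:"-"`
	Question string `json:"question"`
}

// encode (hypothetical helper) returns the JSON form of q, or an empty
// string if encoding fails.
func encode(q question) string {
	b, err := json.Marshal(q)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	q := question{Key: "abc123", Question: "What is a datastore key?"}
	fmt.Println(encode(q)) // prints {"question":"What is a datastore key?"}
}
```

The field still exists on the Go value and can be read and written as usual; it is only left out of the serialized form.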

Keys may optionally have parents, which is a nice way of grouping associated data together. Datastore makes certain assurances about such groups of entities, which you can read more about in the Google Cloud Datastore documentation online.

Putting data into Google Cloud Datastore

Before we save data into Datastore, we want to ensure that our question is valid. Add the following method underneath the Question struct definition:

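// OK returns an error describing what is wrong with the question,
// or nil if the question is valid.
func (q Question) OK() error {
  if len(q.Question) < 10 {
    return errors.New("question is too short")
  }
  return nil
}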

The OK method will return an error if something is wrong with the question, or else it will return nil. In this case, we just check to make sure the question has at least 10 characters.
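A short usage sketch shows how calling code might use this check before saving anything. The Question type here is trimmed down to the one field that validation needs, and the sample question texts are made up.

```go
package main

import (
	"errors"
	"fmt"
)

// Question is reduced to the single field that OK inspects.
type Question struct {
	Question string
}

// OK returns an error if the question text is shorter than 10 bytes.
func (q Question) OK() error {
	if len(q.Question) < 10 {
		return errors.New("question is too short")
	}
	return nil
}

func main() {
	for _, text := range []string{"Why?", "How do datastore keys work?"} {
		q := Question{Question: text}
		if err := q.OK(); err != nil {
			fmt.Printf("%q rejected: %v\n", text, err)
			continue
		}
		fmt.Printf("%q accepted\n", text)
	}
}
```

Note that len counts bytes rather than characters, so text containing multi-byte characters reaches the threshold sooner; utf8.RuneCountInString from the unicode/utf8 package is one option if the limit should be counted in characters.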

To persist this data in the data store, we are going to add a method to the Question struct itself. At the bottom of questions.go, add the following code:

func (q *Question) Create(ctx context.Context) error {
  log.Debugf(ctx, "Saving question: %s", q.Question)
  if q.Key == nil {
    q.Key = datastore.NewIncompleteKey(ctx, "Question", nil)
  }
  user, err := UserFromAEUser(ctx)
  if err != nil {
    return err
  }
  q.User = user.Card()
  q.CTime = time.Now()
  q.Key, err = datastore.Put(ctx, q.Key, q)
  if err != nil {
    return err
  }
  return nil
}

The Create method takes a pointer to Question as the receiver, which is important because we want to make changes to the fields.

If the receiver was (q Question)—without *, we would get a copy of the question rather than a pointer to it, and any changes we made to it would only affect our local copy and not the original Question struct itself.
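The difference between the two kinds of receiver can be checked with a tiny experiment. This is a sketch with hypothetical method names (incrementCopy and increment) on a one-field version of Question.

```go
package main

import "fmt"

// Question is reduced to one field for the purposes of this experiment.
type Question struct {
	AnswersCount int
}

// incrementCopy has a value receiver, so it changes only its own copy.
func (q Question) incrementCopy() {
	q.AnswersCount++
}

// increment has a pointer receiver, so it changes the caller's value.
func (q *Question) increment() {
	q.AnswersCount++
}

func main() {
	q := Question{}

	q.incrementCopy()
	fmt.Println(q.AnswersCount) // prints 0: the original is untouched

	q.increment()
	fmt.Println(q.AnswersCount) // prints 1: the original was modified
}
```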

The first thing we do is use log (from the google.golang.org/appengine/log package) to write a debug statement saying we are saving the question. When you run your code in a development environment, you will see this appear in the terminal; in production, it goes into a dedicated logging service provided by Google Cloud Platform.

If the key is nil (that means this is a new question), we assign an incomplete key to the field, which informs Datastore that we want it to generate a key for us. The three arguments we pass are context.Context (which we must pass to all datastore functions and methods), a string describing the kind of entity, and the parent key; in our case, this is nil.

Once we know there is a key in place, we call a method (which we will add later) to get or create User from an App Engine user and set it to the question and then set the CTime field (created time) to time.Now—timestamping the point at which the question was asked.

Once we have our Question in good shape, we call datastore.Put to actually place it inside the data store. As usual, the first argument is context.Context, followed by the question key and the question entity itself.

Since Google Cloud Datastore treats keys as separate and distinct from entities, we have to do a little extra work if we want to keep them together in our own code. The datastore.Put function returns two values: the complete key and an error. The returned key is useful because we’re sending in an incomplete key and asking the data store to create one for us, which it does during the put operation. If successful, it returns a new datastore.Key object to us, representing the completed key, which we then store in the Key field of the Question object.
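The incomplete-key round trip can be imitated with a toy in-memory store. Everything in this sketch is a hypothetical stand-in (the key type, fakeStore, put, and create); it does not use or reproduce the datastore package, and only mirrors the shape of the Create method: send in an incomplete key, get a completed one back, and keep it on the entity.

```go
package main

import "fmt"

// key is a toy stand-in for *datastore.Key; an ID of 0 means "incomplete".
type key struct {
	Kind string
	ID   int64
}

// Question keeps its key next to its data in our own code.
type Question struct {
	Key      *key
	Question string
}

// fakeStore is a hypothetical in-memory store that hands out integer IDs.
type fakeStore struct {
	nextID int64
	data   map[key]Question
}

func newFakeStore() *fakeStore {
	return &fakeStore{data: make(map[key]Question)}
}

// put saves q under k. If k is incomplete, the store allocates an ID.
// Either way, the completed key is returned to the caller.
func (s *fakeStore) put(k *key, q Question) *key {
	complete := *k
	if complete.ID == 0 {
		s.nextID++
		complete.ID = s.nextID
	}
	q.Key = nil // the key is stored separately from the entity body
	s.data[complete] = q
	return &complete
}

// create mirrors the shape of Question.Create: use an incomplete key for a
// new question, then keep the completed key that comes back from put.
func create(s *fakeStore, q *Question) {
	if q.Key == nil {
		q.Key = &key{Kind: "Question"}
	}
	q.Key = s.put(q.Key, *q)
}

func main() {
	s := newFakeStore()
	q := &Question{Question: "How are keys assigned?"}
	create(s, q)
	fmt.Println(q.Key.Kind, q.Key.ID) // prints "Question 1"
}
```

Saving the same question again reuses its completed key instead of allocating a new one, which is also why an update helper can share most of this logic.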

If all is well, we return nil.

Add another helper to update an existing question:

// Update saves the question without touching the CTime or User fields.
func (q *Question) Update(ctx context.Context) error {
  if q.Key == nil {
    q.Key = datastore.NewIncompleteKey(ctx, "Question", nil)
  }
  var err error
  q.Key, err = datastore.Put(ctx, q.Key, q)
  if err != nil {
    return err
  }
  return nil
}

This method is very similar except that it doesn’t set the CTime or User fields, as they will already have been set.

Reading data from Google Cloud Datastore

Reading data with the datastore.Get function is as simple as putting it, but since we want to maintain keys in our entities (and the datastore functions don’t work like that), it’s common to add a helper function like the following one, which we are going to add to questions.go:

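// GetQuestion loads the question with the given key and sets the Key
// field on the returned entity.
func GetQuestion(ctx context.Context, key *datastore.Key) (*Question, error) {
  var q Question
  err := datastore.Get(ctx, key, &q)
  if err != nil {
    return nil, err
  }
  q.Key = key
  return &q, nil
}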

The GetQuestion function takes context.Context and the datastore.Key of the question to get. It then does the simple task of calling datastore.Get and assigning the key to the entity before returning it. Of course, errors are handled in the usual way.

This is a nice pattern to follow so that users of your code know that they never have to interact with datastore.Get and datastore.Put directly but rather use the helpers that can ensure the entities are properly populated with the keys (along with any other tweaks that they might want to do before saving or after loading).

Summary

This article gave us an idea of how Go applications work on Google App Engine: how to create a simple application, configure it, and deploy it to Google App Engine. We also looked at modules in Google App Engine and at Google Cloud Datastore, Google’s cloud data storage facility for App Engine developers.
