Predicting Migration Times With Google Prediction API

The Challenge:
After a series of challenges on the GAMME Estimation tool, I decided to write a blog post proudly showing my development, and when the CloudSpokes team told me I could do a guest post, I took it even more seriously! When I came across the GAMME Estimation contest, I had recently won my first challenge and was just getting used to this awesome community. So I read the description and started a proof of concept with the Google Prediction API. This API is awesome: it works by “learning” from samples and then predicting a result for new data. For a guy who holds the sad record of failing an easy statistics course five times in college, this is gold! It abstracts away the really complex statistics “under the hood” and lets developers focus on the functionality the app requires.

Diving In:
The challenge was to get an accurate estimate of the migration time for the GAMME tool, an app developed by Google to migrate from Microsoft Exchange servers to Gmail, up in the clouds. Logs with some sample migration data were provided.

I started with a proof of concept using the Prediction API; Google provides some cool examples that can be used for free. After crafting a CSV from the challenge assets, I got successful results, so I dove into the coding!
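To give an idea of what such a CSV can look like, here is a minimal sketch of how one training row could be built. It assumes the layout the Prediction API expected for training data (the value to predict in the first column, the features after it, no header row). The column choice, the class name and the sample numbers are all hypothetical and not taken from the contest assets.

```java
import java.util.Locale;

final class TrainingRowSketch {

    /**
     * Builds one CSV training row: the value to predict (hours spent)
     * comes first, followed by the features describing the migration.
     */
    static String toCsvRow(double hours, int servers, int threads,
                           long emails, long contacts, long calendars) {
        // Locale.ROOT keeps the decimal separator a dot on every machine.
        return String.format(Locale.ROOT, "%.2f,%d,%d,%d,%d,%d",
                hours, servers, threads, emails, contacts, calendars);
    }

    public static void main(String[] args) {
        // Made-up sample: 3.5 hours with 2 servers and 4 threads.
        System.out.println(toCsvRow(3.5, 2, 4, 12000, 800, 150));
    }
}
```

Each finished migration found in the logs would become one such row, and the resulting file is what gets uploaded for training.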

I went with Google App Engine for the development because it provides seamless access to the Google APIs, and this project uses three: Cloud Storage, Google Drive and, of course, the Prediction API. Looking back, I don’t think I would choose it again because of some constraints it put on the project, like the 60-second limit for requests or the restricted set of Java classes allowed by the JRE whitelist, but I still think it is a great platform to develop on.

For the frontend I used Twitter Bootstrap, which I am a huge fan of (who isn’t?), for its general awesomeness. I imagine this app being run by a consultant in front of the client prior to a migration, perhaps on a tablet, so its responsive design also makes it cool to run on a portable device. This is how it looks on an iPad:

Nostra Gamme:
The initial task was simply to upload the CSV to Google Drive and then trigger the prediction with the data provided. The app should predict the time spent migrating, depending on the number of servers and threads, for the given number of emails, contacts and calendars. It also predicts the other way around: it gives the number of servers required to migrate within a given time span. To achieve this, two groups of prediction models were created from the original data, because the output of the prediction changes. I thought it would be cool to display the data visually, so I added jqPlot, which is great, although a little tricky sometimes. Here is a screenshot of the first version:

The Great Refinement:
After some positive feedback from the judges, a follow-up contest was launched to make some enhancements. I really enjoyed doing the original one, so this was a chance to make some enhancements of my own. The requirements were to add client maintenance to manage several models per client. That gave me the chance to learn (and fight against) the App Engine Datastore. I am a traditional SQL guy, but the Google approach to storing data has its advantages, as it is declarative and incredibly fast!

I thought it would be cool to expose some functionality for adding clients, so I added an Ajax-based maintenance screen, which works with a REST API so it can also be called from a client program. This is a screenshot of the frontend:

Fully Entering The Clouds:
I was happy to learn there was another challenge open for GAMME prediction, to polish it a bit more. An average calculation was added so it can be compared with the result given by the Prediction API, to have a reference and a better understanding of the data provided. The client program that extracts the logs from GAMME was also pushed into the clouds, which rounded the project out as a fully cloud-based development! Here is a screenshot of the prediction for the final app:

I would like to thank the whole CloudSpokes team for the feedback during the development. You rock, guys! Keep these cool challenges coming!

Install MySQL jdbc driver on Oracle Weblogic

Oracle Weblogic comes with the Oracle JDBC drivers bundled, but if we have to access a database from another vendor, we have to install the drivers separately. Since Oracle acquired Sun (and MySQL with it), we should expect at least the MySQL driver to be included in future releases, but for now we have to go through a manual install. This tutorial explains how to do it.

For this tutorial I have used Oracle Weblogic 12c and the MySQL JDBC driver, but it should work with a JDBC driver from any vendor.

First, we will download the jdbc connector from this link (registration needed)

After that, we copy the jar to the server classpath. The path should be:

{ORACLE_HOME}/Middleware/oracle_common/lib/java

Now we will edit the startup script, so when we start the server it loads the driver. We will have to edit the following script:

{ORACLE_HOME}/Middleware/user_projects/domains/{DOMAIN_NAME}/bin/setDomainEnv.sh

Now, we locate the following lines, and add mysql.jar as shown:


# Append the MySQL driver to whatever POST_CLASSPATH already contains
if [ "${POST_CLASSPATH}" != "" ] ; then
  POST_CLASSPATH="${POST_CLASSPATH}${CLASSPATHSEP}${COMMON_COMPONENTS_HOME}/lib/java/mysql.jar"
  export POST_CLASSPATH
else
  # Nothing set yet: the driver becomes the only entry
  POST_CLASSPATH="${COMMON_COMPONENTS_HOME}/lib/java/mysql.jar"
  export POST_CLASSPATH
fi

Now we only have to restart the server, and we are done!
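If you want to confirm that the driver was really picked up after the restart, a small check like the following sketch can help. It only tries to load a class by name, so it has to run with the same classpath as the server (for example from code deployed to it) to say anything useful. The class name com.mysql.jdbc.Driver is an assumption that matches the 5.x connector; other connector versions may use a different name, and DriverCheck itself is just an illustrative helper.

```java
final class DriverCheck {

    /** Returns true when the given class can be loaded from the current classpath. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class missing, or present but not loadable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed driver class name; pass another one as the first argument.
        String driver = args.length > 0 ? args[0] : "com.mysql.jdbc.Driver";
        System.out.println(driver + " available: " + isOnClasspath(driver));
    }
}
```

If the check fails, the jar location and the edited POST_CLASSPATH lines are the first things to review.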

Create maven multi-module project using eclipse

A maven multi-module project helps with some special situations we face when developing with maven. A typical one is developing a web app and a desktop app that both depend on the same backend logic, such as the database access layer. If we are using eclipse as an IDE, this can be a bit complicated to configure; this tutorial shows how to do it.

For this tutorial we have used

But it should work with any version of eclipse and maven.

First we will create the parent project using the mvn tool. We browse to the folder where we want to create our project, and then issue:

mvn archetype:create -DgroupId=multi.module.eclipse -DartifactId=multi_module_project

After this, maven will create the pom and the src folders from the basic archetype.

Now, we are going to edit the pom.xml, and change the packaging to pom, to indicate this is a parent project:


<project xmlns="http://maven.apache.org/POM/4.0.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

  <modelVersion>4.0.0</modelVersion>
  <groupId>multi.module.eclipse</groupId>
  <artifactId>multi_module_project</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>pom</packaging>
  <name>multi_module_project</name>
  <url>http://maven.apache.org</url>
</project>

Now we will create three submodules: one for web, one for desktop (named main here), and one common:

cd multi_module_project
mvn archetype:create -DgroupId=multi.module.project -DartifactId=web
mvn archetype:create -DgroupId=multi.module.project -DartifactId=common
mvn archetype:create -DgroupId=multi.module.project -DartifactId=main

Now, if we open the parent pom.xml, we will see that the three modules have been added automatically:

<modules>
  <module>web</module>
  <module>common</module>
  <module>main</module>
</modules>

And that the parent has been added automatically to the child POMs:

<parent>
  <groupId>multi.module.eclipse</groupId>
  <artifactId>multi_module_project</artifactId>
  <version>1.0-SNAPSHOT</version>
</parent>

Now, we will add the common dependency to the pom.xml of the web and desktop (main) projects:

<dependency>
  <groupId>multi.module.project</groupId>
  <artifactId>common</artifactId>
  <version>1.0-SNAPSHOT</version>
</dependency>
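To make the purpose of that dependency more concrete, here is a small sketch of the idea in Java. In a real project each part would live in its own module and source file; they are squeezed into one class here only so the example runs on its own. All names (SharedBackendSketch, CustomerRepository, the render methods) are hypothetical and not generated by the archetype.

```java
import java.util.Arrays;
import java.util.List;

final class SharedBackendSketch {

    // Would live in the "common" module: backend logic shared by both apps.
    static final class CustomerRepository {
        List<String> findAll() {
            // Stand-in for real database access.
            return Arrays.asList("alice", "bob");
        }
    }

    // Would live in the "web" module (packaged as a war).
    static String renderHtml(CustomerRepository repo) {
        StringBuilder html = new StringBuilder("<ul>");
        for (String name : repo.findAll()) {
            html.append("<li>").append(name).append("</li>");
        }
        return html.append("</ul>").toString();
    }

    // Would live in the "main" module (packaged as a jar).
    static String renderConsole(CustomerRepository repo) {
        return String.join(", ", repo.findAll());
    }

    public static void main(String[] args) {
        CustomerRepository repo = new CustomerRepository();
        System.out.println(renderHtml(repo));
        System.out.println(renderConsole(repo));
    }
}
```

Both front ends call the same repository class, which is exactly what declaring common as a dependency makes possible across modules.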

We want the desktop project to output a jar, so we set the packaging to jar:

<packaging>jar</packaging>

Now we set the packaging of the web project to war in its pom.xml:

<packaging>war</packaging>

After that, our multi-module project is ready. We open eclipse and open the import menu, selecting existing maven projects:

We go to the parent directory and then click next. The parent and the three subprojects should appear:

Now we check they have been imported correctly:

You should have the complete directory layout like this:

This configuration resolves the dependencies on common through the workspace. This is handy because we do not have to perform a maven install each time we change it. To disable this feature, just click on disable workspace resolution in the maven menu:

Please note that if we disable workspace resolution, we have to do a Run as -> Maven install each time we change the common project. That pushes the project to the local maven repository, where the builds of the other child projects can then resolve it.

When we change any pom.xml, we should do a maven update:

There we update the configuration of the three child projects and the parent, to be sure that all changes are picked up.

Now, when we perform a maven build on the parent, it triggers the builds of all three children, so we can automate the build process even more.

Starting with Google Prediction API

Google offers a ton of cool APIs, but one of the coolest is the Google Prediction API. It works as some sort of crystal ball with data. You only have to feed sample data into the service, and then it will “predict” future results for new data. The possibilities are endless: we can predict future sales from previous ones, or detect patterns in texts to gauge the mood of the writer.

However, I found the process of starting a new project with it a little difficult, which is why I am writing this kickstart guide to configure a project from scratch. This guide covers the steps to set up the Prediction API so it is ready to use with a Google App Engine project. In another guide I will cover how to develop an app that accesses the Prediction API.

Requirements:

Create project on Google App Engine

Go to My applications on the App Engine control panel, and click create.

Google App Engine menu

This will show the new application creation wizard. Fill in the gaps with your app name and view permissions.

New application wizard

Once the app is created, go to the administration panel, and copy the Service Account Name, because we will need it later.

Detail of app administration

Congratulations! Your app is ready to go! Now let’s give it the necessary permissions.

Enable APIs

Go to the Google API console and create a new project.

Screenshot of the console, after creating the project

Using the Prediction API is free, but we need Google Cloud Storage space to hold the training data. At the moment, Google offers 5GB for free, but we must have billing enabled to use it. So, we go to the billing option and enable billing.

Enable billing screen

This will require a credit card, but don’t worry, you won’t be charged anything as long as you stay within the courtesy limits. We will cap the calls at the courtesy limit later.

Once billing is enabled, enable the following APIs from the Services menu:

  • Google Cloud Storage
  • Google Prediction API

Then go to the Team menu, and add the App Engine service account we copied before as a team member. This will allow the app to access Cloud Storage.

App added as a team member

This step is optional, but I find it useful. Once you pass the courtesy limit, you will be charged for API usage, and if we are only test driving the API, we don’t want that. Let’s set the billable limit to match the courtesy limit (100 calls / day) so we won’t be charged. This can be changed later when we need more API calls.

Go to quotas, and click on set billable limit:

Setting the billable limit

Now go to the Cloud Storage Console, where we will create a bucket to store our app files. Note this name for later as well. In my case it will be gammeprediction.

Bucket creation

Now, we have several ways to authenticate the app to use Google APIs. In this example I will use the combination of service account + key, because we won’t be accessing personal user data. Feel free to use others.

Downloading the key

We will need a key (a .p12 file) and the service account name for our application to work. Go to API access and create a service account. Click on Generate new key.

Downloading the key

When we click on download private key, a .p12 file with our key will be generated. The default password is notasecret; this can be changed from the console, but for now we will keep it as is. After downloading it, note the Email address value, because we will use it as the service account. It is a combination of alphanumeric characters, followed by @developer.gserviceaccount.com.
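As a quick sanity check of the downloaded file, the key can be read with the standard Java KeyStore API. This is only a sketch: the Google client libraries typically handle this step for you, and the alias privatekey is an assumption about how these .p12 files were packaged, so adjust it if your file differs. P12KeySketch and loadPrivateKey are illustrative names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.GeneralSecurityException;
import java.security.Key;
import java.security.KeyStore;
import java.security.PrivateKey;

final class P12KeySketch {

    /**
     * Reads a private key from a PKCS#12 stream.
     * Returns null when no private key is stored under the alias.
     */
    static PrivateKey loadPrivateKey(InputStream p12, String alias, char[] password) {
        try {
            KeyStore store = KeyStore.getInstance("PKCS12");
            store.load(p12, password);
            Key key = store.getKey(alias, password);
            return (key instanceof PrivateKey) ? (PrivateKey) key : null;
        } catch (IOException | GeneralSecurityException e) {
            // Wrong password, damaged file, or not a PKCS#12 file at all.
            throw new IllegalStateException("Could not read the .p12 key", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the downloaded .p12 file.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            // Assumed alias "privatekey" and the default password "notasecret".
            PrivateKey key = loadPrivateKey(in, "privatekey", "notasecret".toCharArray());
            System.out.println(key == null ? "no key found" : "loaded " + key.getAlgorithm() + " key");
        }
    }
}
```

If the password was changed in the console, pass the new one instead of notasecret.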

So, this is all we need for now. After the whole process we should have:

  • .p12 file with our key
  • Service account email
  • Bucket with permissions on Cloud Storage

That’s it for now! Check the continuation of this tutorial, how to create a google prediction api app in java