Archiving Gmail (Part 2)

In my last post, I addressed one of the issues I had with a script I found that archives Gmail messages to PDF files: it created needless temporary files that count toward your daily quota.

I addressed the remaining issues by moving the entire script from a Google spreadsheet to a Google Apps Script web app.  This was the original script UI:

Screen Shot 2013-11-20 at 8.38.27 AM.png

And here is the new UI:

DemoApp.png

As you can see, it has a few more options than the original.  The gist of it is:

  • Select the Gmail label you want to use as a filter, and optionally a label you want to ignore (for example, you could create two labels, 'Archive' and 'Processed').  You can also add your own custom filter to include or exclude additional messages.
  • Choose whether to save the attachments (if you are archiving the emails to PDF).
  • Optionally, have it remove the first label (e.g. 'Archive') and add a new label (e.g. 'Processed').  Both are helpful when you have a large number of emails to process and may bump against your daily quotas for creating files or forwarding mail, since they let you pick up where you left off the next day.
  • To archive the messages, you can save them to PDF with the attachments saved as separate files, or to EML files with the attachments embedded in the original email.  You can change the naming convention to one of a few formats, which makes the files easier to sort later (for example, you could specify From-Date-Id instead).
  • Or you can bulk forward them to one or more specified recipients.
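
The label and filter options above boil down to a Gmail search string. Here is a minimal sketch of how that string could be assembled; this is not the app's actual code, and the function name buildSearchQuery and its parameters are made up for illustration. It assumes label names without spaces.

```javascript
// Hypothetical helper (not from the app): turns the UI options into a
// Gmail search string. Assumes label names contain no spaces.
function buildSearchQuery(includeLabel, excludeLabel, customFilter) {
  // Always filter on the label the user selected.
  var parts = ['label:' + includeLabel];

  // Optionally skip messages that already carry the "ignore" label.
  if (excludeLabel) {
    parts.push('-label:' + excludeLabel);
  }

  // Optionally append the user's own filter, e.g. 'has:attachment'.
  if (customFilter && customFilter.trim() !== '') {
    parts.push(customFilter.trim());
  }

  return parts.join(' ');
}

// Example: messages labeled Archive, not yet Processed, with attachments.
var query = buildSearchQuery('Archive', 'Processed', 'has:attachment');
// query is 'label:Archive -label:Processed has:attachment'
```

Inside Apps Script, a string like this could then be passed to GmailApp.search(query) to get the matching threads.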

If you want to see the finished product, here is the link.  You will need to authorize it to access your mailbox, since it has to read your mail in order to process it.

 

So where to begin?

First you need to set up your Google Drive to write scripts.

  • In Drive, click Create -> Connect More Apps
Screen Shot 2013-11-20 at 8.56.36 AM.png
  • Search for "script" and connect to Google Apps Script
Screen Shot 2013-11-20 at 8.59.24 AM.png
  • Now create your first script.  The type you want is Script as Web App, which becomes available after you choose Script from the menu shown below.
Screen Shot 2013-11-20 at 9.01.05 AM.png
  • You will now have your development environment all set to start scripting!
Screen Shot 2013-11-20 at 9.04.23 AM.png
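
To give you a feel for what a web app script looks like before we get to the real code, here is a bare-bones sketch. It is not the app from this post: the page title and HTML are placeholders I made up. The one fixed piece is the doGet function, which Apps Script calls when someone opens the web app's URL.

```javascript
// Entry point: Apps Script calls doGet(e) when the web app URL is opened.
// The HTML and title below are placeholders, not the real app's UI.
function doGet(e) {
  var html = '<h1>Gmail Archiver</h1><p>The options form would go here.</p>';

  // HtmlService is the built-in Apps Script service for serving HTML.
  return HtmlService.createHtmlOutput(html).setTitle('Gmail Archiver');
}
```

This only runs inside the Apps Script editor, where HtmlService is available as a global.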

Well, this is getting rather long, so in the next post I will just dump the code for the app I wrote and walk you through it.  Click Here to see Part 3

Archiving Google Mail To Files

I was recently tasked with extracting 200-300 Gmail messages, including attachments.  The requirements seemed simple enough:

  • Every message must include its original metadata (from/to/cc/bcc/date etc.)
  • All attachments must be saved in their original format

Obviously one way would have been to sync the messages to Outlook, create a folder in Outlook, copy and/or move the messages to that folder, then export it to PST.  

But it seemed like there should be a 'native' Gmail way to do it.  So I started poking around and found that a lot of Google Apps Scripts had been written to bulk forward or archive messages.   This one by Marcello Scacchetti caught my eye. 

But as I started using it, I realized there were a few aspects I didn't really like:

  1. It doesn't save the metadata.
  2. It creates temp files in Google Drive.
  3. It hard-codes the naming conventions.
  4. It hard-codes PDF output.
  5. It hard-codes the labels.
  6. It was built in a spreadsheet.

To keep this post short, I will break this into several posts and only address one issue here.  Since #1 is easy enough for the average bear to fix themselves, I won't bother addressing it.  The first real problem was #2. Here we see the offending code:

 // Create the message PDF inside the new folder
var htmlBodyFile = newFolder.createFile('body.html', messageBody, "text/html");
var pdfBlob = htmlBodyFile.getAs('application/pdf');
pdfBlob.setName(newFolderName + ".pdf");
newFolder.createFile(pdfBlob);
htmlBodyFile.setTrashed(true);

Before it creates the PDF file, it creates an HTML file.  Why is this a problem? Well, Apps Script (particularly on individual, non-paid Gmail accounts) has quotas on how many files a script can create per day: 250 as of the time of this writing.  Since every email costs two files (the HTML file and the PDF), you could only archive 125 emails with this, and then only if none of them had attachments.  Luckily this is easy enough to fix:

// Build the HTML in memory instead of writing a temp file to Drive
var output = HtmlService.createHtmlOutput(messageBody);
var pdfBlob = output.getAs('application/pdf');
pdfBlob.setName(newFolderName + ".pdf");
newFolder.createFile(pdfBlob);

Here we use HtmlService to create an in-memory object rather than a file. It's an easy fix, and it doubles the number of emails we can process per day.
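
To put numbers on that, here is a quick back-of-the-envelope sketch. The 250 figure is the quota mentioned above; the function name maxEmailsPerDay is just something I made up for the illustration, and it assumes every email creates the same number of files.

```javascript
// Hypothetical helper: how many emails fit in the daily file-creation quota,
// assuming each email creates the same number of files.
function maxEmailsPerDay(dailyFileQuota, filesPerEmail) {
  return Math.floor(dailyFileQuota / filesPerEmail);
}

var DAILY_FILE_QUOTA = 250; // quota quoted above, as of this writing

// Original script: temp HTML file + PDF = 2 files per email.
var before = maxEmailsPerDay(DAILY_FILE_QUOTA, 2); // 125

// Fixed script: PDF only = 1 file per email.
var after = maxEmailsPerDay(DAILY_FILE_QUOTA, 1); // 250

// Fixed script, but each email also has one attachment saved as a file.
var withAttachment = maxEmailsPerDay(DAILY_FILE_QUOTA, 1 + 1); // 125
```

Attachments still count against the quota either way, so the doubling only holds exactly for emails without them.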

I'll post some solutions to my other issues with the script in the next few days.