Author: mannetjieopennetworkscom

Time Travel

Sometimes I could just kick my own backside, a feeling I believe all developers have felt at some stage (like after forgetting a semicolon).

So I have a cron job on Google App Engine that stops and starts a server for a client every day (don't ask why). For the last week that this cron job has been running, it stops the server as expected, but every morning the server has not been started.

So I tested my code, and it ran perfectly every time. This morning, after the server did not start up AGAIN, I had a second look at the cron jobs, and that's when I saw it: I had forgotten to set the time zone for the cron jobs. They were running on GMT and not our local time zone, which put my cron jobs out by two hours. So the server would start up, just not at the time I wanted it to.

So when setting up a cron job in App Engine, please remember to set the time zone so that you do not end up with the same issue I had. It is as easy as adding the following line to your schedule:

timezone: Australia/NSW
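For context, a complete cron.yaml entry with the timezone field might look like this (the description, URL and schedule here are made up for illustration):

```yaml
cron:
- description: start the client server
  url: /tasks/start-server
  schedule: every day 07:00
  timezone: Australia/NSW
```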

Your time zone can be found on the following Wikipedia page:

https://en.wikipedia.org/wiki/List_of_tz_database_time_zones

So do not let your code time travel.

Gmail Logs in BigQuery – The Plot Thickens

As you all know, I have been knee-deep in the Gmail logs that get streamed into BigQuery.

Today's musing is just a short heads-up on something we have found in the logs.

While searching for a specific email, I was for some reason getting duplicates in my results. Here is what is happening. I run a SQL query with message_info.source.address, message_info.flattened_destinations and message_info.subject in the WHERE clause, each equal to specific search criteria. This produces duplicate results, meaning that a single email is shown twice. Now this got my mind spinning.

So I started digging. I first got the message ID of the email and did a select on just that specific message ID (message_info.rfc2822_message_id). Looking at the results, I found the following. Firstly, the email gets listed once for each message_info.message_set.type, which all looks good. Then the email gets listed with the same set types but without any subject in the log. Lastly, the email gets listed again (and we are talking microseconds apart) with a message_info.message_set.type of 16. That was the reason for the duplicates.
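For anyone wanting to reproduce the digging, a query along these lines shows every log row for one message (legacy BigQuery SQL; the table name and message ID are placeholders):

```sql
SELECT
  message_info.rfc2822_message_id,
  message_info.subject,
  message_info.message_set.type,
  event_info.timestamp_usec
FROM [your_dataset.daily_20170220]
WHERE message_info.rfc2822_message_id = '<message-id>'
ORDER BY event_info.timestamp_usec
```

Ordering on the timestamp makes the microseconds-apart duplicate rows easy to spot.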

Now set type 16 is not documented, so I have no idea what is happening there. I have logged a support call with Google for further explanation and will keep you up to date.

In the meantime, to make sure my queries return correct results, I have added the following two lines to my WHERE clause.

message_info.source.address <> 
and message_info.message_set.type <> 16

Till Next Time 🙂

Mounting Google Storage on an OpenVZ Debian Jessie container

So I have a few OpenVZ containers running Debian Jessie. I needed to be able to move files from the containers to Google Storage. I jumped on Google and started looking at my options, and GCSFUSE was the answer I got.

Now, as with any documentation out there, half of it works and half of it does not, and you end up spending a lot of time searching for weird and wonderful errors. So this is what I found and what worked for me.

Firstly, you need to enable access for the containers to the fuse file system on the host. On the host system, run the following commands:

modprobe fuse 
vzctl set VEID --devnodes fuse:rw --save

Where VEID is the veid of the container you want to give access to.

I spin up a new Debian Jessie container and run the following commands:

apt-get update
apt-get install sudo
apt-get install curl
apt-get install apt-transport-https
apt-get install lsb-release   # lsb_release is used in the next step; install it if your image does not ship it

These packages are not installed by default, so installing them gives us the basic tools for the installation. Then we add the correct repository to apt for easy installation:

export GCSFUSE_REPO=gcsfuse-`lsb_release -c -s`
echo "deb http://packages.cloud.google.com/apt $GCSFUSE_REPO main" | sudo tee /etc/apt/sources.list.d/gcsfuse.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

Then I install the gcsfuse package on the system.

apt-get update
apt-get install gcsfuse

Next we create the mount point where we want to mount our Google storage bucket.

mkdir /target/folder

There is a bug in Debian Jessie where the permissions on the fuse device are not correct, so we need to make sure we have the access we need:

chmod g+rw /dev/fuse

After this it is a good idea to do a good old reboot of the system to make sure everything starts up and works as expected. I have found a lot of forum posts where after all the setup, people were getting errors ( me included ) and a reboot of the system got everything working as expected.

Now if you have not done the setup on your Google cloud project, here is a quick rundown of what needs to be done.

  1. If you have not created a project on your Google Cloud Console yet, I would suggest popping over there and creating one.
  2. Create your storage bucket.
  3. Create a service account and give it the role of Storage – Storage Object Creator.
  4. For the service account, download the JSON key file.
  5. Remember to give the service account access to your bucket, or else you will get authorisation errors.

The steps above are well documented, and a quick Google search will furnish you with what you need to know.

Make sure to copy your key file to your container.

This brings us to the moment of truth, as they say. Mounting the storage bucket to our file system:

gcsfuse --key-file /path/keyfile.json bucket-name /target/folder

At this point your storage bucket should be mounted to your target folder, and you can interact with it like any other folder on your system. For a last bit of admin work, making life easier, we add our mount to /etc/fstab. Note that with the noauto option below the bucket is not mounted automatically at boot; instead, any user can bring it up with a plain mount /target/folder. Drop noauto if you do want it mounted at boot. Add the following to your /etc/fstab file:

bucket-name /target/folder gcsfuse rw,noauto,user,key_file=/path/keyfile.json

And that's it for today. As usual, I hope this helps someone out there.

Email Logs in BigQuery


So Google has added an Admin feature where you can stream your email logs to BigQuery. Read up on it here: Email Logs in BigQuery

Oh, this is so nice for all of you out there who want to run reports on your emails. One catch is understanding the logs. After spending some time on this, I would like to give you a quick and dirty run-through.

On the link I shared earlier you will find links to query examples and to the schema documentation. I personally did not find the schema documentation very user friendly and had to spend some time working through the document and the queries.

So to help you out on a quick rundown this is what I found:

Each email has multiple entries in the logs, so to count all emails you will have to do a “count” with a “DISTINCT” on message_info.rfc2822_message_id.

count(DISTINCT(message_info.rfc2822_message_id))
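Put together, a minimal count over the last seven days might look like this (legacy BigQuery SQL; the dataset prefix is a placeholder):

```sql
SELECT
  COUNT(DISTINCT message_info.rfc2822_message_id) AS EmailCount
FROM (TABLE_DATE_RANGE([your_dataset.daily_],
      DATE_ADD(CURRENT_TIMESTAMP(), -7, 'DAY'),
      CURRENT_TIMESTAMP()))
```

Keep in mind that in legacy BigQuery SQL, COUNT(DISTINCT ...) is a statistical approximation above a threshold; use EXACT_COUNT_DISTINCT if you need exact numbers.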

To go one step further, let's say you want to split your results into incoming and outgoing emails. For this we will have to specify the message set type in our WHERE clause.

WHERE
(message_info.message_set.type = 8 or
message_info.message_set.type = 9)

The numbers mean the following:

  • 8 – Total outgoing messages
  • 9 – Total incoming messages

Now let's take this one step further. Let's say you want to split that up into messages that come from outside your domain and those that are internal to your domain. For this we need to look not only at message_info.message_set.type but also at message_info.source.service.

Firstly, let's have a look at message_info.message_set.type. We already know what 8 and 9 mean, so let's add to that list:

  • 1 – ‘Inbound email from outside domain’
  • 2 – ‘Sent outside domain’
  • 8 – ‘Total outgoing messages’
  • 9 – ‘Total incoming messages’
  • 10 – ‘Message internal to your domain’

As you can see, there is only one entry for emails internal to the domain, so how would we know the difference between incoming and outgoing ones? This is where message_info.source.service comes into play. To keep it simple I am only going to stick to two possible values of message_info.source.service: 'smtp-inbound' and 'gmail-ui'. As the name states, smtp-inbound means inbound email; what is not stated so clearly is that gmail-ui means outbound email (yes, that one took me a while to figure out). So our WHERE clause would look something like this:

(message_info.message_set.type = 2 or
message_info.message_set.type = 10 or
message_info.message_set.type = 1 or
message_info.message_set.type = 8 or
message_info.message_set.type = 9)
and
(message_info.source.service = 'gmail-ui' or
message_info.source.service = 'smtp-inbound')
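This mapping is easy to get wrong, so here is a tiny Python helper (my own sketch, not part of the log schema) that I find useful as a mental model when post-processing exported rows; the type numbers and service names are the ones from the tables above:

```python
# Map a log row's message_set.type and source.service to human-readable
# labels, following the tables above. Unknown values return None.

SET_TYPES = {
    1: "Inbound email from outside domain",
    2: "Sent outside domain",
    8: "Total outgoing messages",
    9: "Total incoming messages",
    10: "Message internal to your domain",
}

def classify(set_type, source_service):
    """Return (set_type_label, direction) for a log row, or None if unknown."""
    label = SET_TYPES.get(set_type)
    if label is None or source_service not in ("smtp-inbound", "gmail-ui"):
        return None
    # smtp-inbound rows are incoming; gmail-ui rows are outgoing.
    direction = "Incoming" if source_service == "smtp-inbound" else "Outgoing"
    return (label, direction)
```

For example, classify(10, "gmail-ui") tells you the row is an internal message leaving via the Gmail UI, i.e. outgoing; the undocumented type 16 falls through to None.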

So to finish off, here is the SQL for a query that gets all email counts for a single email address in your domain, broken up into outgoing and incoming, internal and external to the domain, for the last 7 days.

SELECT
count(DISTINCT message_info.rfc2822_message_id) as MessageIDCount,
CASE
WHEN message_info.message_set.type = 1 then 'Inbound email from outside domain'
WHEN message_info.message_set.type = 2 then 'Sent outside domain'
WHEN message_info.message_set.type = 8 then 'Total outgoing messages'
WHEN message_info.message_set.type = 9 then 'Total incoming messages'
WHEN message_info.message_set.type = 10 then 'Message internal to your domain'
end as SetType,
CASE
WHEN message_info.source.service = 'smtp-inbound' then 'Incoming'
WHEN message_info.source.service = 'gmail-ui' then 'Sending'
end as EmailType,
sum(message_info.payload_size) / 1024 / 1024 as TotalSize,
DATE(USEC_TO_TIMESTAMP(event_info.timestamp_usec)) as QDate
FROM (TABLE_DATE_RANGE([2017_2_20_gmail_logs_dataset2.daily_],
DATE_ADD(CURRENT_TIMESTAMP(), -7, 'DAY'),
CURRENT_TIMESTAMP()))
where
(message_info.flattened_destinations like '%email@address.com%' OR
message_info.source.address = 'email@address.com')
and
(message_info.message_set.type = 2 or
message_info.message_set.type = 10 or
message_info.message_set.type = 1 or
message_info.message_set.type = 8 or
message_info.message_set.type = 9)
and
(message_info.source.service = 'gmail-ui' or
message_info.source.service = 'smtp-inbound')
group by EmailType, SetType, QDate
order by QDate, SetType, EmailType

I hope this helps clear up some of the Google emails logs for some of you.