Email Logs in BigQuery

Email Logs in BigQuery

So Google has added an Admin feature where you can stream your email logs to Bigquery. Readup on it here: Email Logs in Big Query

Oh this is so nice for all of you out there that want to run reports on your emails. One catch to this is understanding the logs. After spending some time on this I would like to bring you a quick and dirty run through.

On the link I shared earlier you will find links to query examples and the Schema Documentation. I personally did not find the schema documentation very user friendly and had to spend some time working through the document and queries.

So to help you out on a quick rundown this is what I found:

Each email has multiple entries in the logs. So to do a count of all emails you will have to do a “count” with a “Distinct” on message_info.rfc2822_message_id.

count(DISTINCT(message_info.rfc2822_message_id))

To go one step further, lets say you want to split your results into incoming and outgoing emails. For this we will have have to specify the message set type in our where clause.

WHERE
(message_info.message_set.type = 8 or
message_info.message_set.type = 9)

The number mean the following:

  • 8 – Total outgoing messages
  • 9 – Total incoming messages

Now lets take this one more step further. Lets say you want to split that up to messages that is from outside your domain and those that is internal to your domain. For this we need to not just have a look at message_info.message_set.type but also message_info.source.service.

Firstly lets have a look at message_info.message_set.type. We know already what numbers 8 and 9 mean. So lets add to that list:

  • 1 – ‘Inbound email from outside domain’
  • 2 – ‘Sent outside domain’
  • 8 – ‘Total outgoing messages’
  • 9 – ‘Total incoming messages’
  • 10 – ‘Message internal to your domain’

As you can see there is only one entry for emails internal to the domain. So how would we know the difference between incoming and outgoing emails. This is where message_info.source.service comes into play. To keep it simple I am only going to stick to 2 possible values of message_info.source.service, ‘smtp-inbound’ and ‘gmail-ui’. As stated smtp-inbound is inbound emails, but on the other hand what does not state so clearly gmail-ui means outbound emails ( Yea took me a while to figure that one out). So our where clause would look something like this:

(message_info.message_set.type = 2 or message_info.message_set.type = 10or message_info.message_set.type = 1 or
message_info.message_set.type = 8 or
message_info.message_set.type = 9)
and
(message_info.source.service = ‘gmail-ui’ or
message_info.source.service = ‘smtp-inbound’).

So to finish this off here is the SQL to a query to get all email counts from a single email address in your domain broken up into outgoing and incoming, internal to domain and external for the last 7 days.

SELECT
count (DISTINCT message_info.rfc2822_message_id) as MessageIDCount,
CASE
WHEN message_info.message_set.type = 1 then ‘Inbound email from outside domain’
WHEN message_info.message_set.type = 2 then ‘Sent outside domain’
WHEN message_info.message_set.type = 8 then ‘Total outgoing messages’
WHEN message_info.message_set.type = 9 then ‘Total incoming messages’
WHEN message_info.message_set.type = 10 then ‘Message internal to your domain’
end as SetType,
CASE
WHEN message_info.source.service = ‘smtp-inbound’ then ‘Incoming’
WHEN message_info.source.service = ‘gmail-ui’ then ‘Sending’
end as EmailType,
sum (message_info.payload_size) / 1024 / 1024 as TotalSize,
Date(event_info.timestamp_usec) as QDate
FROM (TABLE_DATE_RANGE([2017_2_20_gmail_logs_dataset2.daily_],
DATE_ADD(CURRENT_TIMESTAMP(), -7, ‘DAY’),
CURRENT_TIMESTAMP()))
where
(message_info.flattened_destinations like ‘%email@address.com%’ OR
message_info.source.address = ’email@address.com’ )
and
(message_info.message_set.type = 2 or message_info.message_set.type = 10
or message_info.message_set.type = 1 or
message_info.message_set.type = 8 or
message_info.message_set.type = 9)
and
(message_info.source.service = ‘gmail-ui’ or
message_info.source.service = ‘smtp-inbound’)
group by EmailType, SetType, QDate
order by QDate, SetType, EmailType

I hope this helps clear up some of the Google emails logs for some of you.

 

One thought on “Email Logs in BigQuery

Leave a comment