Upping your Analytics Game: The Progression from Google Analytics to Server-Side Event Tracking…
There comes a point, when building a product, that you get serious and formal about product data collection and analysis. Over the past 18 months, we’ve moved from relying solely on Google Analytics, to using various turnkey analytics platforms, to formalizing and centralizing the collection of event data in support of real business goals and objectives. This story outlines the phases we’ve been through, why we made changes, which tools we used during each phase, how we ended up centralizing analytics on the server instead of in our various clients, and some tips we learned while implementing server-side analytics.
Flying Solo with Google Analytics
When ZeeMee first launched, we were a web-only platform with very little extra engineering bandwidth. Using Google Analytics at this phase is a great choice.
When to use only Google Analytics
I’d recommend using only GA when you’re first starting, before you’ve “learned what you want to learn” about your users, when you don’t have tons of time to dig deep into user behavior, and when you don’t have data or analytics as a first-rate product (meaning, you don’t sell or re-package any event data).
Pros of Google Analytics
GA is free, easy to turn on, gives you good basic metrics, and lets you really dig into web behavior. You can fine-tune it, enable goals, track searches, and so on. You can scale this out pretty far.
When to move beyond
Start thinking about more nuanced event data when you start to wonder about specific user actions, the order of those actions, or finer-grained detail on how users use your apps, or when you start to look at analytics across platforms (web, mobile apps, etc.). For example, we wanted to know in which order users built their ZeeMee pages (profile photos, “My Story” section, “Meet Me” video, etc.). This was the right time for us to bring in some different event processing tools.
Off the Shelf Event Processing Tools
When to use off-the-shelf analytics tools
Use off-the-shelf analytics tools when you don’t yet fully understand what you want to know about your users, and when you don’t need custom dashboards, executive summaries, etc, on the user behavior data.
We used MixPanel to visualize funnels (e.g., “Show me the drop-off between signing up, adding a profile photo, and adding a Meet Me video”) and to see user retention, both in general and for specific features (cohort analysis, MAUs, stickiness of site search, etc.).
We added various events like “Did Search” and “Joined Group”. We replicated the event names, properties, and logic across iOS, Android, and Web. This allowed us to study user behavior across the various clients, and was a big improvement over using just Google Analytics. However, we still use GA for various metrics.
These tools are easy to get going with. With Segment, a lot of analytics tools are nearly one-click on and off for web apps. These tools give you a nice way to explore your data and learn what you really want to know about your users. This is great for an exploratory phase when you have resources available to study user behavior.
When to move beyond
We ran through the Google Ventures HEART framework and figured out new things that we wanted to learn from our users. This exercise forced us to think deeper about what we wanted to know, and made sure we were measuring the right things in the right way. We learned that to study certain metrics, we needed more flexibility and power than MixPanel.
Also, we were running into some pain with keeping event names, properties, and logic the same across web, iOS, and Android — all of which had separate implementations of the same logic.
Finally, we wanted to store the event data in a place that we controlled, so that we could run studies over time without worrying about losing any historical data.
Server-side event tracking and event warehousing
Server-side event tracking allows the server, rather than each individual client, to generate events. Event warehousing is long-term storage of events over time. These are two independent ideas, but we implemented them at the same time.
When to use server-side event tracking
Use server-side event tracking to reduce duplicated event code across various clients. For example, we had a “Did Search” event that had to be implemented with the same name, properties, and timing across our web, iOS, and Android clients. It was a mess!
By moving to server-side event tracking, all event names, properties, and logic were in a single location and therefore much easier to maintain.
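To make “a single location” concrete, here is a minimal sketch. It is written in Python purely for illustration (our server used Segment’s Ruby library), and the `EVENTS` registry, the `track` helper, the `send` callable, and the property names are all hypothetical, not part of any library.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registry: each event name is declared once, together with the
# properties it must carry. Clients never define event names themselves.
EVENTS: Dict[str, List[str]] = {
    "Did Search": ["query", "resultCount"],
    "Joined Group": ["groupId"],
}


def track(
    send: Callable[[str, str, Dict[str, Any]], None],
    user_id: str,
    event: str,
    properties: Dict[str, Any],
) -> None:
    """Validate an event against the registry, then hand it to `send`.

    `send` stands in for whatever analytics client the server uses.
    """
    if event not in EVENTS:
        raise ValueError(f"unknown event: {event!r}")
    missing = [name for name in EVENTS[event] if name not in properties]
    if missing:
        raise ValueError(f"{event!r} is missing properties: {missing}")
    send(user_id, event, properties)


# Usage: collect events in a list instead of sending them anywhere.
sent: List[tuple] = []
track(
    lambda user, name, props: sent.append((user, name, props)),
    "user-1",
    "Did Search",
    {"query": "stanford", "resultCount": 3},
)
```

Because every event passes through one function, a misspelled name or a missing property fails loudly on the server instead of silently diverging between web, iOS, and Android.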
When to store events in a data warehouse
Store event data in a data warehouse, such as Amazon Redshift, if you want to run custom analysis not supported by off-the-shelf tools, if you want to create custom dashboards of your data, or if you want to store your event data for long-term analysis.
A reasonable, easy way to implement server-side events and warehousing
We architected our analytics and event tracking with the following goals:
- Generate events from our servers, instead of from the clients, whenever possible. This prevents the need to keep event names, properties, and logic in sync across client platforms.
- Allow events to continue to flow to MixPanel, Intercom, and various off-the-shelf tools for exploratory analytics.
- Save events in storage that we control, so that we can add charts to a dashboard to follow our identified HEART metrics.
Luckily for us, Segment has a really nice solution to address these goals. We used Segment’s server-side Ruby library to send events from the server and used the REST API to send client-only events (we use React Native on mobile, so we didn’t use Segment’s native iOS or Android libraries). Then we configured Segment to fork events to MixPanel, Google Analytics, and Intercom, and finally used Segment’s warehouse feature to save the events into an Amazon Redshift database that we own and control. It’s amazing!
Once our data was being saved in a Redshift database, we looked at off-the-shelf SQL dashboarding tools. We evaluated Mode and Chartio, and ended up picking Chartio for its ease of use.
Mode seemed to be a good choice for exploring large amounts of data, and would make a lot of sense for a data science team (you can import warehouse data into a Python notebook for use with numpy, pandas, etc., in one click). But it cannot currently do cross-database joins, which was a problem for us because we wanted to join event data with data from our production database for certain types of charts. College database IDs are stored in the events as properties, but the colleges’ actual names are only in the production database, so a nice chart with names required a join across databases.
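To illustrate what that cross-database join amounts to, here is a rough Python sketch that does the same lookup in application code. The function name, the sample IDs, and the “unknown” fallback label are assumptions for illustration; in practice the dashboarding tool performed the join for us.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def events_per_college(
    event_college_ids: Iterable[int], college_names: Dict[int, str]
) -> List[Tuple[str, int]]:
    """Count events per college and label each count with the college name.

    `event_college_ids` plays the role of the warehouse (IDs stored as event
    properties); `college_names` plays the role of the production database.
    """
    counts = Counter(event_college_ids)
    labeled = [
        (college_names.get(college_id, f"unknown ({college_id})"), count)
        for college_id, count in counts.items()
    ]
    # Largest counts first; ties broken alphabetically for stable output.
    return sorted(labeled, key=lambda row: (-row[1], row[0]))


rows = events_per_college([7, 7, 3, 9], {7: "College A", 3: "College B"})
```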
Chartio has a nice visual tool for creating basic charts. At first, I thought the UI chart-building was silly (why don’t I just write SQL to get the data instead?), but once I learned how the tool worked, I quickly came to like it. Chartio also has a really neat “pipeline” feature that lets you manipulate the data after the query but before charting, which makes certain types of graphs much easier to create (you can add new columns that are lagged in time, averages of the rows, etc.). Chartio hooked up pretty easily to our Redshift database and allowed us to create charts that visualize the HEART metrics we identified. Building the charts was a fun culmination of our quest to formalize and centralize our data collection and analysis.
Both Mode and Chartio (and a slew of other SQL dashboard solutions) easily work with the schema that Segment’s data warehouse sets up for you.
Some random tips for server-side analytics
We found some reasonable solutions to various things you might want to do on server-side analytics.
Differentiate between client types
We wanted to study events across all client types (web and mobile), but also wanted to know about actions on clients individually. Since most events are generated from the API server, we have our API clients send an extra request header with every API request:
X-ZeeMee-Application-Details: <client>
Here, <client> can be “ios”, “android”, or “web_app”.
Then, the server adds a property to each event called “applicationDetails” with the same value. This makes it easy to do analysis on events for each client, or for all clients as a whole.
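A minimal sketch of that step, in Python for illustration. The header name, the three client values, and the “applicationDetails” property come from our setup; the function name and the “unknown” fallback for missing or unexpected values are assumptions.

```python
from typing import Any, Dict, Mapping

HEADER_NAME = "X-ZeeMee-Application-Details"
KNOWN_CLIENTS = {"ios", "android", "web_app"}


def with_application_details(
    headers: Mapping[str, str], properties: Dict[str, Any]
) -> Dict[str, Any]:
    """Return a copy of `properties` with `applicationDetails` added."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    client = lowered.get(HEADER_NAME.lower(), "").strip().lower()
    # Assumed fallback: anything missing or unexpected becomes "unknown".
    if client not in KNOWN_CLIENTS:
        client = "unknown"
    return {**properties, "applicationDetails": client}


tagged = with_application_details(
    {"X-ZeeMee-Application-Details": "ios"}, {"query": "stanford"}
)
```

Filtering on `applicationDetails` then gives per-client numbers, and ignoring it gives totals across all clients.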
Higher-level events from the API server
We want to study when ZeeMee users view other users’ ZeeMee pages. So we wanted to send an event any time a ZeeMee page was viewed.
At first glance, this seems like the type of information that only the client has available (“I’m rendering a profile view screen right now”), but in reality there are portions of our API that are only called during a profile view, so we could still log that event from the server side.
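One way to sketch this idea is a small table that maps API routes to the higher-level event they imply. The route pattern and the event name below are hypothetical, chosen only to show the shape; our real endpoints differ.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical routes that are assumed to be called only while a client is
# rendering a particular screen.
ROUTE_EVENTS = [
    (
        "GET",
        re.compile(r"^/api/users/(?P<viewed_user_id>[^/]+)/page$"),
        "Viewed ZeeMee Page",
    ),
]


def event_for_request(
    method: str, path: str
) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (event_name, properties) if this request implies an event."""
    for route_method, pattern, event_name in ROUTE_EVENTS:
        match = pattern.match(path)
        if method == route_method and match:
            # Named groups in the route become event properties.
            return event_name, match.groupdict()
    return None


implied = event_for_request("GET", "/api/users/42/page")
```

The catch is that the mapping is only as reliable as the assumption behind it: if another screen starts calling the same endpoint, the event will over-count.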
Sometimes you should rely on post-processing the data
We wanted an event called “Did Search” that fired each time the user does a search. But our search interface continuously updates as the user is typing, which means that the search API is called over and over during a single search. Our HEART exercise, however, helped us see that we really wanted to study search events when the user intended to be done searching, effectively ignoring the intermediate “Did Search” events.
We worked through various ways to tackle this. We threw out ideas like “only send event when the user stops typing,” which is really hard to implement on the server side, but moderately easy on the client side. We also thought about “only send the event when there are > 0 results”, to weed out searches as people start typing. This could be done from the server side, but has way too many edge cases to be useful (what if we want to know legitimate things users are searching for that have 0 results?). We had several other solutions that were overly complicated.
We finally settled on sending the event every time the search API is called, even for not-yet-completed searches, and then using SQL window functions for time series analysis to treat a series of searches within a certain time window as a single search. This ended up working well: we could still send events from the server, at the cost of more complicated analysis queries later.
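The grouping logic is easier to see outside of SQL, so here is a sketch of the same idea in Python rather than our actual queries. Comparing each event with the next one from the same user is roughly what a `LEAD` window function partitioned by user and ordered by timestamp does. The 5-second gap and the tuple layout are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple

Search = Tuple[str, float, str]  # (user_id, unix_timestamp, query)


def collapse_searches(
    events: List[Search], max_gap_seconds: float = 5.0
) -> List[Search]:
    """Keep only the last "Did Search" event of each burst, per user.

    Two consecutive events from the same user belong to the same burst when
    they are at most `max_gap_seconds` apart.
    """
    by_user: Dict[str, List[Search]] = {}
    for event in sorted(events, key=lambda e: e[1]):
        by_user.setdefault(event[0], []).append(event)

    final: List[Search] = []
    for user_events in by_user.values():
        following_events: List[Optional[Search]] = [*user_events[1:], None]
        for current, following in zip(user_events, following_events):
            # An event ends a burst when nothing follows it soon enough.
            if following is None or following[1] - current[1] > max_gap_seconds:
                final.append(current)
    return sorted(final, key=lambda e: e[1])


raw = [
    ("u1", 0.0, "s"),
    ("u1", 0.4, "st"),
    ("u1", 0.9, "stan"),
    ("u1", 60.0, "mit"),
    ("u2", 1.0, "yale"),
]
completed = collapse_searches(raw)
```

Here the three keystroke-level events from “u1” collapse into the single search for “stan”, while the later “mit” search and the “u2” search are kept as separate searches.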
Don’t be afraid to duplicate properties
Even though events are being put into a SQL database where you can do joins between tables, it’s often simpler to duplicate properties into every event. We’ve often ended up sending properties and context along with every event, rather than relying on joins in the SQL later.
For example, to study the percent of messages sent to college counselors versus sent to students, it is easier to send “recipient is a college counselor” as a property for each “message sent” event, rather than doing a join across the “message sent event” table and the user table during analysis later. A join would have to cross between the data warehouse and the production database, which isn’t necessarily fun. Also, a college counselor could delete their account later in time, or transition away from being a counselor, but that doesn’t change the fact that the message was to an admissions counselor at the time it was sent. Sending the information as a property of the event solves those issues.
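As a sketch of the counselor example, the snippet below copies the recipient’s role into the event at send time and then computes the percentage from the events alone. The event name, property names, and the shape of the `recipient` record are hypothetical.

```python
from typing import Any, Dict, List


def message_sent_event(sender_id: str, recipient: Dict[str, Any]) -> Dict[str, Any]:
    """Build a "message sent" event that snapshots the recipient's role."""
    return {
        "event": "Message Sent",
        "userId": sender_id,
        "properties": {
            "recipientId": recipient["id"],
            # Copied at send time, so later account changes do not
            # rewrite history.
            "recipientIsCounselor": recipient["role"] == "counselor",
        },
    }


def percent_to_counselors(events: List[Dict[str, Any]]) -> float:
    """Share of messages sent to counselors, computed without any join."""
    if not events:
        return 0.0
    to_counselors = sum(
        1 for e in events if e["properties"]["recipientIsCounselor"]
    )
    return 100.0 * to_counselors / len(events)


counselor = {"id": "c1", "role": "counselor"}
student = {"id": "s1", "role": "student"}
events = [
    message_sent_event("u1", counselor),
    message_sent_event("u1", student),
    message_sent_event("u2", student),
    message_sent_event("u3", counselor),
]
# A later role change leaves the already-recorded events untouched.
counselor["role"] = "student"
share = percent_to_counselors(events)
```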
Migrating away from basic event collection is a natural step as a company grows. It’s helpful to choose the easiest tool for the stage that you’re in: don’t get too full-featured before you need it, because you may not yet know what you actually want to know. Overall, we are happy with the change toward mostly server-side events, and are generally satisfied with using Segment’s products as a good way to get there quickly.