Analysis at scale: How UserLeap leverages AI to supercharge user research
July 8, 2021
By: Kevin Mandich, VP of Artificial Intelligence & Machine Learning at UserLeap
Thematic Clustering as a Research Goal
Open-text survey responses are one of the richest sources of customer experience data. Historically, they have also been one of the most difficult formats from which to extract meaningful insights, especially at scale.
When user researchers run surveys with open-text questions, a common goal is to group the huge number of responses into a small number of bite-size and actionable take-aways. The identified themes are shared with product stakeholders and play a critical role in determining how to improve the product experience. An example with some responses and a summarizing theme could be:
Response: “I’m lost, can’t really find anything easily in the product”
Response: “It’d be nice if there was a way to find users by name”
Response: “Please add a search field”
Theme: “Add search functionality”
Performed manually, this analysis takes the form of placing the responses into a large spreadsheet, reading through them to locate patterns, defining themes that represent actionable groups of responses, and then assigning all responses to one or more of these themes (a.k.a. “coding”).
As you can imagine, this is a painstaking process and certainly can’t scale easily beyond a few hundred responses. Automating this process can be a powerful way to increase the leverage of researchers and bring the survey life cycle from weeks down to hours. The ability to do this accurately is also one of the key differentiators between UserLeap and other customer survey tools.
Existing Attempts at Automation
Many methods of grouping customer experience data exist in the industry, but they tend to lack the nuance required by product researchers and stakeholders to make informed decisions. Examples of existing methods include, in order of increasing complexity:
- Word and phrase counts and string matching
- Topic extraction and modeling
- Keyword extraction and post-processing
- Similarity matching of keywords (e.g. using a thesaurus, or maybe a neural network)
These natural language processing (NLP) methods are useful for a surface-level analysis, but they all have shortcomings. Take the following three sample responses received from users of a fictional web service:
“The subscription fee is a bit steep”
“The cost to change vendors might be too much for us at the moment”
“You should include a pricing plan based on usage”
The topic shared by all three responses is cost. However, each response uses a different word for it (“fee”, “cost”, “pricing”), and the surrounding phrases differ as well. This rules out word frequency counts, string matching, and keyword extraction as viable techniques, even if stemming or lemmatization is applied. Moreover, the intent of each response is different:
- Response 1 is complaining about the subscription costs
- Response 2 is referring to the switching costs of adopting a new tool, which include time, effort, and other resources in addition to just monetary value
- Response 3 is requesting a different subscription cost tier
Even if more sophisticated techniques such as topic modeling or neural network-based similarity scoring are used, grouping these three responses together would not make sense from a product perspective, and could adversely affect the decision-making process that follows from such an analysis.
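The keyword-matching failure described above can be checked directly. The following is a minimal Python sketch, not a description of any production system: the stopword list is a small hand-picked assumption, and the function names (content_words, shared_keywords, pairwise_overlap) are hypothetical helpers introduced only for this illustration. It extracts the content words of the three sample responses and looks for any word they have in common.

```python
import re

# The three sample responses quoted in the text above.
RESPONSES = [
    "The subscription fee is a bit steep",
    "The cost to change vendors might be too much for us at the moment",
    "You should include a pricing plan based on usage",
]

# A tiny hand-picked stopword list (an assumption for this illustration only).
STOPWORDS = {
    "the", "is", "a", "to", "be", "for", "us", "at", "on", "you",
    "might", "too", "should", "bit", "much",
}


def content_words(text):
    """Lowercase the text, split it into words, and drop stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def shared_keywords(responses):
    """Return the content words that occur in every response."""
    word_sets = [content_words(r) for r in responses]
    return set.intersection(*word_sets)


def pairwise_overlap(responses):
    """Return the shared content words for each pair of responses."""
    word_sets = [content_words(r) for r in responses]
    overlaps = {}
    for i in range(len(word_sets)):
        for j in range(i + 1, len(word_sets)):
            overlaps[(i, j)] = word_sets[i] & word_sets[j]
    return overlaps


if __name__ == "__main__":
    print(shared_keywords(RESPONSES))
    print(pairwise_overlap(RESPONSES))
```

No pair of responses shares a single content word, so any grouping based on exact word or keyword overlap would place each response in its own group, even though all three are about cost.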
How We Do It at UserLeap
To capture the nuance of themes seen in open-text survey responses, we employ a multi-axis approach. Instead of just considering the topic, we also take into account various other derived information for each response. Here’s an example showing this information split into three possible axes, but in reality we employ many more than three:
At a minimum, to accurately describe an actionable theme, you need to identify the topic and describe the intent - oftentimes implicit - of the respondent. In the first example response above, “The subscription fee is a bit steep”, the respondent’s intent is to express negative sentiment toward the topic. Suppose a new response arrives: “It’s too expensive for me to use”. Here the topic and intent match those of the first example response, so the two are considered part of the same actionable theme.
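To make the idea of a theme as a combination of axes concrete, here is a minimal Python sketch. It is an illustration under stated assumptions rather than UserLeap's implementation: the topic and intent labels are assigned by hand (in practice they would come from models), the label strings themselves are hypothetical, and only two axes are shown. The CodedResponse class and the group_by_theme and group_by_topic functions are names invented for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedResponse:
    """A response with one hand-assigned label per axis (hypothetical labels)."""
    text: str
    topic: str
    intent: str


# The three sample responses plus the new one from the text above.
# Labels are assigned by hand purely for illustration.
CODED = [
    CodedResponse("The subscription fee is a bit steep",
                  "cost", "negative sentiment"),
    CodedResponse("The cost to change vendors might be too much for us at the moment",
                  "cost", "switching concern"),
    CodedResponse("You should include a pricing plan based on usage",
                  "cost", "feature request"),
    CodedResponse("It's too expensive for me to use",
                  "cost", "negative sentiment"),
]


def group_by_topic(responses):
    """Group response texts by topic only (the single-axis approach)."""
    groups = defaultdict(list)
    for r in responses:
        groups[r.topic].append(r.text)
    return dict(groups)


def group_by_theme(responses):
    """Group response texts by the (topic, intent) pair."""
    themes = defaultdict(list)
    for r in responses:
        themes[(r.topic, r.intent)].append(r.text)
    return dict(themes)


if __name__ == "__main__":
    for theme, texts in group_by_theme(CODED).items():
        print(theme, texts)
```

Grouping by topic alone puts all four responses into a single "cost" group, while grouping by the (topic, intent) pair yields three themes, with the first sample response and the new response sharing one of them.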
Another aspect of this problem is what information the model behind each axis has access to. Some elements are global and don’t depend on the context of the survey. An example is the question “Is the respondent frustrated or not?” Other elements are specific to the domain of the survey results. An example here is the question “What portion of the product or service is the respondent referring to?”
Answering these questions is trivial in some cases but much more difficult in others. In all cases we use state-of-the-art deep neural networks as the basis for models whose job is to answer these questions. It’s by splitting the problem into separate portions - topic vs intent, global elements vs domain-specific attributes - that we are able to successfully replicate the efforts of expert human researchers.
The End Result
Here’s a screenshot showing the output of thematic clustering produced by UserLeap AI on our dashboard:
All themes identified have both a topic and an intent so that the takeaway is clear and immediately useful. We also identify an element of emotional response - sentiment - as well as a recommendation based on the urgency of the theme’s responses. It’s by using advanced machine learning techniques that we’re able to produce this analysis quickly, accurately, and at scale.