The things no one tells you about tree testing
A few practical considerations of designing and running a tree test

Tree testing is a powerful method for evaluating the hierarchical structure of a design. It ventures into quantitative research territory and can generate a large amount of data. It is also usually conducted remotely and unmoderated, and therefore lacks context for the data we collect. Before I ran my first tree test, I read a lot about the method, but reading was never quite the same as running one myself. There are a few things no one ever told me that could have really improved the quality of my study. If you’re planning a tree test for the first time and have some knowledge of experiments and statistics, this article is for you. I will share a few tips that may boost your planning and execution.
Basics of Experimental Design
Tree tests are usually treated like experiments. Experimental design typically involves a planned manipulation of certain factors and an observation of the results (e.g. A/B testing). To deduce the correct relationship between variables, the researcher needs to carefully consider which factors should be manipulated. A few questions to start (with examples from my previous study):
1. What questions are you trying to answer by running this experiment?
The north star of your study; it will help you answer the questions that follow. For my recent study, I wanted to know how changes to the navigation structure affect users’ wayfinding accuracy.
2. What are the independent variables?
These are the factors you will manipulate. Changing multiple variables at once may introduce additional interaction effects. In that case, consider a factorial experimental design, which helps you isolate the effects of the different variables as well as measure their interactions. Using the same study as an example, I only changed the placement of one item in the primary-level navigation in each condition.
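To make the factorial idea concrete, here is a minimal sketch of how the conditions of a full factorial design can be enumerated. The two variables and their levels below are invented for illustration; they are not from the study described in this article.

```python
from itertools import product

# Hypothetical example: two independent variables, each with two levels.
# A full factorial design tests every combination, so the effect of each
# variable and their interaction can be estimated separately.
item_placement = ["top-level", "nested"]    # variable A: where the item sits
label_wording = ["original", "rewritten"]   # variable B: how it is labelled

conditions = list(product(item_placement, label_wording))
for i, (placement, wording) in enumerate(conditions, start=1):
    print(f"Tree {i}: placement={placement}, wording={wording}")
# A 2 x 2 factorial design yields 4 trees to test
```

Each combination becomes one tree variant, which is why factorial designs grow quickly: adding a third two-level variable would double the number of trees to eight.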
3. What is the dependent variable?
This is the factor you want to observe, which will be affected by the independent variables. The dependent variable in my study was the accuracy of locating the target area for completing preset tasks.
4. Do you have a Control Group if you’re looking for comparisons?
A Control Group, as opposed to a Treatment Group, is not exposed to the changes and acts as a benchmark. You can later compare the manipulated results against this group to see how different they are. In my study, I had 2 Treatment Groups and 1 Control Group.
5. If you are testing multiple trees, are you using a between-subject design or a within-subject design?
In other words, will the same group of participants test multiple trees or just one? There are pros and cons to each approach. In my case, I used a between-subject design, so each participant only tested one tree. This helped reduce participant fatigue and learning effects, and, in some cases, can be more time efficient.
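A between-subject design needs each participant assigned to exactly one condition, ideally at random and in balanced numbers. A minimal sketch, assuming one control and two treatment groups as in the example study (the group names and participant IDs are made up for illustration):

```python
import random

# 1 control + 2 treatments, mirroring the example study's setup.
GROUPS = ["control", "treatment_a", "treatment_b"]

def assign_groups(participant_ids, seed=42):
    """Randomly assign each participant to a single condition."""
    rng = random.Random(seed)  # fixed seed keeps the assignment reproducible
    ids = list(participant_ids)
    rng.shuffle(ids)
    # Round-robin over the shuffled list keeps group sizes balanced.
    return {pid: GROUPS[i % len(GROUPS)] for i, pid in enumerate(ids)}

assignment = assign_groups(f"p{i:03d}" for i in range(90))
print(assignment["p000"])  # each participant maps to exactly one tree
```

In practice, tree-testing tools often handle this randomization for you, but it is worth understanding what balanced assignment looks like when you set up conditions yourself.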
Know your target audience
Who is your target audience? If the answer is “everyone”, then you need to think twice about your product goals. It’s easy to assume that because the change / new design can potentially be seen by everyone, everyone is the target audience. In truth, some portion of this group is more likely to be influenced than others. To define your target audience, think about who will most likely be impacted if the design changes. That impact can manifest as a change in behaviours and / or decision-making processes.
Power analysis and sample size
In most situations, we do not have access to the entire population of interest. In quantitative studies, choosing the right sampling frame and sample size generates meaningful data while keeping the research cost low. Depending on your experiment design and targeted audience, there are different sampling strategies.
In a comparative tree test, the sample size is also influenced by the degree of difference between the options / conditions being tested (the effect size, which determines the statistical power of the study). For instance: let’s say you’d like to survey people on the differences between an apple and an orange.
Because of the stark difference, you may only need to ask 3 people before you can draw a confident conclusion. Alternatively, if you were comparing a Royal Gala and a Fuji Apple, you may need a lot more people before you can reach the same level of confidence.
Keep this in mind when you are designing your trees. Ideally, the experiment should be designed so that a reasonable difference between the versions can be detected by statistical analysis. This handy little tool is great for estimating the sample size needed for adequate power. However, sometimes the difference can be difficult to estimate up front. In my recent study, due to the potentially small difference, I opted for a “rule of thumb”.
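The apples-and-oranges intuition can be sketched with the standard normal-approximation formula for comparing two proportions. The success rates below are illustrative guesses, not data from the study in this article:

```python
import math

def sample_size_two_proportions(p1, p2, z_alpha=1.96, z_beta=0.8416):
    """Approximate per-group sample size to detect a difference between two
    task-success proportions. Defaults correspond to a two-sided alpha of
    0.05 and power of 0.80 (hence z values 1.96 and 0.8416)."""
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = ((z_alpha + z_beta) ** 2 * variance) / (p1 - p2) ** 2
    return math.ceil(n)

# Apples vs. oranges: a large expected difference needs few participants.
print(sample_size_two_proportions(0.30, 0.90))
# Royal Gala vs. Fuji: a subtle difference needs many more participants
# to reach the same level of confidence.
print(sample_size_two_proportions(0.60, 0.75))
```

The formula makes the trade-off explicit: the required sample size grows with the inverse square of the expected difference, which is why subtle tree changes demand much larger samples.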
Post-task vs. Post-test Questionnaires
The advantage of tree testing is that it can gather behavioural insights from a wide audience. However, it doesn’t necessarily tell you how participants feel or how easy / difficult the tasks are. One way to get that information is to use post-task or post-test questionnaires.
In my recent project, I used a standardized scale of measurement. Upon completing each task, we asked participants to rate how confident they felt about their answers. This data was later compared with the behavioural data, and we were able to identify where participants felt very confident but had low success rates. This suggested that users could get frustrated when they realized their actions didn’t lead to the desired outcome.
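Cross-referencing confidence ratings with success rates can be as simple as flagging tasks that fall into the “confident but failing” quadrant. The task names, rates, and thresholds below are invented for illustration:

```python
# Hypothetical per-task results: behavioural success rate plus the mean
# post-task confidence rating on a 1-5 scale.
tasks = {
    "find_pricing":  {"success_rate": 0.42, "mean_confidence": 4.6},
    "find_support":  {"success_rate": 0.88, "mean_confidence": 4.4},
    "find_invoices": {"success_rate": 0.35, "mean_confidence": 2.1},
}

def flag_frustration_risks(tasks, max_success=0.5, min_confidence=4.0):
    """Flag tasks where participants felt confident but mostly failed --
    the 'confidently wrong' pattern that signals likely frustration."""
    return [name for name, t in tasks.items()
            if t["success_rate"] <= max_success
            and t["mean_confidence"] >= min_confidence]

print(flag_frustration_risks(tasks))
```

Tasks with both low success and low confidence (like the third one above) tell a different story: participants already know they are lost, which points to a discoverability problem rather than a misleading label.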
Soft launch
One of the common fears of launching a quantitative research study is the lack of control — once the study is out there, for the sake of study integrity, it should not be modified mid-way. One mitigation strategy is a soft launch, which helps identify unforeseen issues in a more controlled environment. Unlike a pilot test with stakeholders, in a soft launch you open the study up to a small portion of participants from your sample. You can use this opportunity to evaluate how well the study is set up, and identify any potential blind spots, prior to the full launch.
Monitor progress
Once the study is launched, your job isn’t done. I usually check in once a day to make sure everything is running smoothly, especially in the first 24 hours. If something does come up, you’ll spot it early and have it under control.
Analysis plan
Ah, now we finally have the wonderful data we’ve been waiting for. Where do you start? I find having an analysis plan makes it a lot easier to compartmentalize and collaborate when multiple people work on the analysis at the same time.
A general approach I take starts from high-level themes then drills into the specifics:
1. Start with the data set
Before diving into the detailed analysis, spend some time going through the data. If you’re using a tool like Optimal Workshop, you can do so by looking at the general stats, including success rate, paths and first clicks. Remember to take note of any questions you may have.
2. Participant analysis
Your results are only as good as your participants. It’s always good practice to compare the actual participants (the sample) against the desired population on a few key metrics. In my recent study, I found that my participants had less experience running their businesses than the population, which helped me interpret the results more accurately.
3. Overall trends
- Overall task success rate, directness and failure rate.
- Overall time taken to complete tasks.
4. Key metrics for each question
- Success rate
- First clicks
- Paths / destinations
- Post-task questionnaire
- Comments
5. Detailed analysis of each question
Depending on your study, you may want to compare the performance of different participant subgroups and / or other characteristics that could be correlated with performance. Usually you’d perform statistical analysis for this part.
6. Interpreting the results
Last but not least, you need to put everything together and interpret the observed trends / differences. Keep in mind that not everyone is familiar with stats. Make sure that you convey the key takeaways of your analysis for the right audience.
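For the statistical analysis in step 5, a common building block is testing whether two groups’ success rates genuinely differ. A minimal sketch using a two-proportion z-test with made-up counts (this is a standard technique, not necessarily the exact test used in the study described above):

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for the difference between two task-success rates.

    Uses the pooled-proportion normal approximation, so it assumes
    reasonably large group sizes.
    """
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical example: control scored 54/100 vs. treatment 71/100.
z, p = two_proportion_z_test(54, 100, 71, 100)
print(f"z = {z:.2f}, p = {p:.4f}")
```

When reporting to a non-statistical audience, translate the result into plain language (“the treatment tree’s success rate was meaningfully higher than the control’s”) rather than leading with the z-score and p-value.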
There is a lot more to preparing and conducting a tree test study. This article only outlines a few aspects to complement your process. What tips do you have for running a tree test? Share them in the comments below!