ℹ️ Note: By 1 October 2021, Facebook will require advertisers to implement Conversions API in order to run lift tests.
This article discusses advanced best practices for lift testing. If you are new to lift testing, start from our introductory article.
Most common lift testing challenges
Not enough data
The most common problem we see with Lift Tests is that it is surprisingly difficult to gather enough data to get reliable results. Lift Tests often require big budgets, long run times and sufficiently large control groups to generate enough data to separate the causal effect from noise.
- Run the lift test long enough, and with big enough budgets.
- Combining campaigns from, for example, multiple countries helps make your budgets bigger.
- The closer to a 50%–50% split your treatment and control groups are, the sooner you'll get results.
- Splitting the test into a multi-cell Lift Test means that each cell has less data.
How much data is enough? Read about Power Analysis below.
Too short a duration
On top of statistical reasons (getting enough data), there is another aspect that dictates the minimum duration of your Lift Test: causality.
Remember that a Lift Test measures the causal effect of your ads. How long does it take for your ads to actually cause somebody to convert? Your test needs to run longer than this. If you sell expensive products it is unlikely that you cause anyone to convert by showing ads for just a week or two. It could take several months to build brand awareness and convince customers to make the purchase (or for that awareness to fade in the control group).
When measuring brand lift for a well known brand, this could mean running the Lift Test for as long as 6-12 months.
A contaminated control group (overlapping campaigns)
If you have campaigns outside of the Lift Test targeting the same audience, even partially, the people in the control group are exposed to your ads. This results in a smaller difference between the treatment and control groups, and hence a smaller measured lift. This should be avoided at all costs.
Here are some best practices:
- Activate the campaigns only after the test has started. The Lift Test starts gathering the "test populations" only when it starts.
- Don't have any other campaigns, in any ad accounts, targeting the same people as the Lift Test campaigns are targeting.
- Alternatively, include all of your campaigns (or the entire account) in the Lift Test to measure their combined effect.
- For best results, have a "cooldown period" where you pause all ads targeting the test audience for some time before the test. The purpose of this is to minimize the effect of other/previous campaigns on the control group during the test.
- Start (or activate) the test campaigns only after the Lift Test has started
- Or, if you want to use an old campaign, preferably have a cooldown period
- For comparing the lift of multiple campaigns, all campaigns should be created anew
A faulty setup in lift test objectives
Setting up lift test objectives is crucial! Facebook will only measure lift based on the objectives you define when creating the test, and you won't be able to edit them afterwards. A mistake in the setup could mean that your test does not measure the results you want.
- Double-check that your objectives are set up correctly.
- Check that the selected conversion events come from the correct Pixel or app.
- If you included multiple conversion events in a single objective, you won't be able to separate them later; add individual conversion events as separate objectives.
- Create as many objectives as you want; you can always choose not to look at all of them later!
Lift test power analysis
Power Analysis helps you estimate how many conversions you need to collect during the Lift Test, in order to have a good chance of getting a statistically significant result. Getting a statistically significant result means that we can be relatively confident that the actual lift is non-zero. In other words, it helps estimate how long you need to run your test, and how much money you need to spend.
The main purpose of Power Analysis is to avoid running under-powered Lift Tests. It is pointless to run a Lift Test that collects too little data: you will not get a result, and the effort is wasted.
Read more in the dedicated article: Lift Test power analysis
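As a rough illustration of what a power analysis computes, the sketch below uses the textbook sample-size formula for comparing two conversion rates with a two-sided z-test. It is not Facebook's or Smartly.io's actual method. The function name, the 50%–50% split, and the example baseline rate are assumptions made for illustration only.

```python
from math import ceil, sqrt
from statistics import NormalDist


def required_users_per_group(base_rate, relative_lift, alpha=0.05, power=0.8):
    """Users needed in EACH group (assuming a 50/50 split) to detect the lift.

    base_rate:      conversion rate expected in the control group (assumption)
    relative_lift:  lift you hope to detect, e.g. 0.10 for 10%
    alpha, power:   significance level and desired statistical power
    """
    p_control = base_rate
    p_treatment = base_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treatment) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_power * sqrt(
            p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
        )
    ) ** 2
    return ceil(numerator / (p_treatment - p_control) ** 2)


# Hypothetical inputs: 2% baseline conversion rate, 10% expected lift.
print(required_users_per_group(base_rate=0.02, relative_lift=0.10))
```

Because the required population grows with the inverse square of the difference you want to detect, halving the expected lift roughly quadruples the number of users you need.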
How to structure a lift test?
To answer this question, let's start from the beginning.
A Lift Test helps you measure how many incremental conversions your test campaigns generate, compared to not delivering those campaigns. This means that:
- If you only include one campaign in a Lift Test, you are measuring how many more conversions you got by deciding to launch that campaign on top of every other campaign you have running. If the test campaign is only one out of many campaigns targeting the same audience segment, the test is likely to indicate a small lift.
- If you include your whole portfolio of Facebook campaigns in a Lift Test, you are measuring the combined effect of those campaigns — how big of an effect does Facebook advertising as a whole have on your business, on top of every other channel (search, display, TV, billboards...) you might be active on.
Sometimes there are several possible setups to measure the same thing. For example, if you want to measure the lift effect caused by your retargeting campaigns, you could either:
- Option A: Include only the retargeting campaign(s) in the Lift Test, to measure the lift of your retargeting on top of all other campaigns
- Pros: a simple 1-cell Lift Test gets you results sooner
- Cons: you won't measure the lift of your prospecting campaigns
- Example result: "By spending $10,000 on retargeting, you got 100 incremental conversions"
- Option B: Make a 2-cell Lift Test, where cell A contains a copy of all of your campaigns, and cell B contains identical copies of all of your campaigns except the retargeting campaigns. Now, you are measuring the lift effect of two different scenarios: with and without retargeting.
- Pros: you measure the lift of prospecting campaigns as well
- Cons: a 2-cell Lift Test requires more spend to get results
- Example result: "Prospecting alone spent $5,000 and generated 500 incremental conversions = $10 per incremental conversion. By spending $4,000 on prospecting and $1,000 on retargeting, you got 450 incremental conversions = $11.11 per incremental conversion."
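The arithmetic behind the Option B example result can be reproduced with a few lines. This is only a sketch: the helper name and the dictionary layout are made up for illustration, and the figures are the ones from the example above.

```python
def cost_per_incremental_conversion(spend, incremental_conversions):
    """Spend divided by the incremental conversions measured by the Lift Test."""
    if incremental_conversions <= 0:
        raise ValueError("no positive lift measured; the cost is undefined")
    return spend / incremental_conversions


# Figures taken from the example result above.
cells = {
    "prospecting only": {"spend": 5_000, "incremental_conversions": 500},
    "prospecting + retargeting": {
        "spend": 4_000 + 1_000,
        "incremental_conversions": 450,
    },
}

for name, cell in cells.items():
    cost = cost_per_incremental_conversion(
        cell["spend"], cell["incremental_conversions"]
    )
    print(f"{name}: ${cost:.2f} per incremental conversion")
# prospecting only: $10.00 per incremental conversion
# prospecting + retargeting: $11.11 per incremental conversion
```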
Generally, it's safer to compare holistic approaches in your Lift Tests. This means including all Facebook campaigns in the Lift Test, across the funnel. It is dangerous to leave some campaigns out of the Lift Test. For example: assume you are a business with retail outlets, and you want to compare the incrementality of video ads versus link ads. On top of those tactics, you also run regular website visitor retargeting campaigns. If you leave the retargeting campaign outside of the lift test, it will probably "help" the link ad campaign more, because that campaign is generating website visitors to retarget in the first place. The video campaign might be effective in driving customers to your outlet, but won't be helped by the retargeting campaign. However, with the retargeting campaign outside of the Lift Test, the test results are not taking this unfair advantage (and extra ad spend) into account when comparing results.
- Measuring your overall lift:
- Cell A: a copy of your video ad campaign + a copy of your retargeting campaign
- Cell B: a copy of your link ad campaign + a copy of your retargeting campaign
- [no campaigns active outside the Lift Test]
- Measuring the lift of adding a video/link ad campaign on top of retargeting campaign:
- Cell A: a copy of your video ad campaign
- Cell B: a copy of your link ad campaign
- [retargeting campaign left outside the Lift Test]
Best practices, tips and tricks
Create a separate objective for each conversion event
Even if you can't measure statistically significant lift for the purchase event, you might have enough data for the add-to-cart event! Getting results for that upper-funnel event is better than getting no results at all.
If you include both Pixel purchases and app purchases, create three objectives: one for each conversion event alone, and one with both together to get results on the total purchases.
Don't make the observation period longer than necessary
During the observation period, your campaign will deliver normally but the lift study audience is no longer updated. Conversions that happen during the observation period are included in the results only if the user was targeted (or, in the case of the control group, considered for delivery) during the test period.
If you use an observation period, you should pause the campaigns when the observation period starts. You can then use the observation period to include conversions that are caused by your ads but happen after the test period has ended.
It is better to make the observation period too short rather than too long. After the campaign ends and its causal effect wears off, both the treatment and the control group will collect conversions at the same rate. If you continued the observation period indefinitely, the relative difference between the two groups would eventually disappear in the noise. This would turn a test with a statistically significant difference into a test with no result.
A good rule of thumb for the duration of the observation period is the time it takes for half of the conversions caused by your campaigns to happen. Typically this means a day or two, rarely more than one week.
If the conversion delay is very long, make the Lift Test itself longer rather than increasing the duration of the observation period.
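To see why an overly long observation period hurts, consider the simplified sketch below. It assumes equal-sized treatment and control groups, a fixed number of incremental conversions that have all already happened, and a constant baseline of conversions per day in each group. It treats the conversion counts as Poisson-distributed, so the noise of their difference is approximately the square root of their sum. All numbers are hypothetical, and this is not how Facebook computes significance.

```python
from math import sqrt


def approximate_z_score(incremental, baseline_per_day, days):
    """Signal-to-noise ratio of the treatment/control difference.

    incremental:       conversions caused by the ads (fixed once the effect fades)
    baseline_per_day:  conversions per day that each group gets without ads
    days:              total measurement window (test + observation period)
    """
    control = baseline_per_day * days
    treatment = control + incremental
    # For independent Poisson counts, Var(treatment - control) = treatment + control.
    return (treatment - control) / sqrt(treatment + control)


for days in (14, 30, 60):
    z = approximate_z_score(incremental=200, baseline_per_day=100, days=days)
    print(f"{days} days: z = {z:.2f}")
```

The incremental conversions stay the same, but every extra day adds baseline conversions (and therefore noise) to both groups, so the z-score keeps falling as the window grows.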
If the lift is 10%, what does it mean?
A 10% lift means that during the lift study you got 10% more conversions from the reached audience than you would have gotten if you had not shown those people any ads.
In particular, it does not mean that by spending more and more, you will always get the same 10% lift. Facebook tries to give you the cheapest conversions first, so increasing your spend will have diminishing returns as Facebook needs to expand the reach to people less likely to convert. The unreached audience is most likely worse than the audience you have reached so far.
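The definition of lift above can be sketched as a small calculation. This is a simplification for illustration: it compares the whole treatment group with the control group scaled to the same size, and it ignores the distinction between reached and unreached users that Facebook's own results make. The function name and the example numbers are hypothetical.

```python
def estimate_lift(treatment_conversions, treatment_size,
                  control_conversions, control_size):
    """Return (incremental conversions, relative lift).

    The control group's conversions are scaled to the size of the treatment
    group to estimate the baseline: what the treatment group would have
    converted without ads.
    """
    baseline = control_conversions * treatment_size / control_size
    incremental = treatment_conversions - baseline
    return incremental, incremental / baseline


# Hypothetical 90%/10% split: 900,000 treatment users, 100,000 control users.
incremental, lift = estimate_lift(
    treatment_conversions=990, treatment_size=900_000,
    control_conversions=100, control_size=100_000,
)
print(f"{incremental:.0f} incremental conversions, {lift:.0%} lift")
# 90 incremental conversions, 10% lift
```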
How big of a control group (in %) should I choose?
It is a trade-off: a smaller control group means you have to gather more data to get statistically significant results. A 50%–50% split gives you results fastest. Thus, it all comes down to the Power Analysis: how much budget and time do you have for the Lift Test, and how big or small a lift can you expect? Read more in the article on Lift Test Power Analysis.
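The cost of an uneven split can be approximated with a standard result: for a fixed total population and similar per-user variance in both groups, the variance of the difference between group means is proportional to 1 / (c × (1 − c)), where c is the control group's share. The sketch below expresses this relative to a 50%–50% split. The function name is made up for illustration, and the equal-variance assumption is a simplification.

```python
def relative_population_needed(control_share):
    """Total population needed for the same precision, relative to a 50/50 split."""
    if not 0 < control_share < 1:
        raise ValueError("control_share must be strictly between 0 and 1")
    return 0.25 / (control_share * (1 - control_share))


for share in (0.5, 0.3, 0.2, 0.1, 0.05):
    factor = relative_population_needed(share)
    print(f"control group {share:.0%}: {factor:.2f}x the population of a 50/50 split")
```

Under these assumptions, a 10% control group needs roughly 2.8 times the total population of a 50%–50% split to reach the same precision.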
Which users are included in the population shown in Lift Test results?
The Lift Test populations are defined by a complex process, for the purpose of including only relevant people in the test to keep signal-to-noise ratio high, while also ensuring that the treatment and control populations stay comparable.
- The populations can only contain people that match the audience targetings of your test campaigns
- Only people who would actually have been delivered an ad are included:
- The user is logged onto Facebook
- They match the audience targeting
- Your bid is competitive
- Only after this, Facebook checks if the user belongs to the treatment or control group
- If the user is in the control group, they are not shown an ad, but they are added to the control population
- If the user is in the treatment group, your ad will enter the auction
- If your ad won the auction, and the ad was visible on screen, the user will be added to the reached treatment population
- If you lost the auction, or the ad was not visible, the user is added to the unreached treatment population
In other words, Facebook only includes a user in the populations if they were under consideration for ad delivery. This makes the share of reached users higher in the treatment population, while also ensuring that similar people are selected for the treatment and control populations. The result is a better signal-to-noise ratio, and you get reliable results faster!
During the observation period, when the test period has ended, Facebook stops gathering these populations, but still keeps measuring conversions from them.
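The selection steps described in this section can be summarized as a small decision function. This is only an illustrative model of the process as described here, not Facebook's implementation. The class, field, and population names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeliveryOpportunity:
    """One moment at which a user could have been delivered an ad (hypothetical model)."""
    logged_in: bool
    matches_targeting: bool
    bid_competitive: bool
    in_control_group: bool
    won_auction: bool = False
    ad_visible: bool = False


def assign_population(opp: DeliveryOpportunity) -> Optional[str]:
    """Return the population the user is added to, or None if they are not included."""
    # Step 1: only users who were genuinely under consideration for delivery qualify.
    if not (opp.logged_in and opp.matches_targeting and opp.bid_competitive):
        return None
    # Step 2: only now is the treatment/control assignment checked.
    if opp.in_control_group:
        return "control"  # no ad is shown
    # Step 3: treatment users are reached only if the ad won and was visible.
    if opp.won_auction and opp.ad_visible:
        return "treatment_reached"
    return "treatment_unreached"


print(assign_population(DeliveryOpportunity(True, True, True, in_control_group=True)))
# control
```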
If my campaigns use a dynamic audience, which conversions are included in the results?
A person must be included in the experiment population before their conversions count in the experiment. For example, consider a customer who
- buys something on the first day of the lift test
- is considered for an ad (but not shown one) on the second day, and
- buys again on the third day.
This customer would be included in the population on the second day, and only the conversion on the third day would be counted in the lift test results.
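The example above can be expressed as a simple filter. This is a sketch only: days are numbered from the start of the test, the function name is hypothetical, and the handling of a conversion on the same day as the population entry is an assumption (here it is not counted), since the description above does not specify it.

```python
def counted_conversions(population_entry_day, conversion_days):
    """Return the conversion days that count toward the Lift Test results.

    population_entry_day: day the user was first considered for delivery,
                          or None if they never entered the population
    conversion_days:      days on which the user converted
    """
    if population_entry_day is None:
        return []
    # Assumption: only conversions strictly after the entry day are counted.
    return [day for day in conversion_days if day > population_entry_day]


# The customer from the example: buys on day 1, enters the population on day 2,
# and buys again on day 3.
print(counted_conversions(population_entry_day=2, conversion_days=[1, 3]))
# [3]
```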
Lift Test results show different numbers than my campaign reporting. Why?
- Lift Tests do not care about attribution, quite literally. Any and all conversions happening in the treatment and control audiences are counted. This differs from regular campaign reporting, where only conversions that happen within 1, 7 or 28 days of seeing or clicking an ad are counted.
- The time window is also longer in Lift Tests: conversions are counted from the whole Lift Test period (including the observation period). This differs from regular campaigns where the conversion window can be shorter (1 or 7 days).
- In case your campaigns already delivered before the Lift Test started, conversions caused by those impressions will be shown in reporting (especially when using the 7-day attribution window). However, the Lift Test only starts measuring the users exposed to ads when the test starts.
- The Lift Test objectives can be set to track different Pixels or different apps than your reporting views. For regular reporting, you can select which Pixels to track in Account Settings > Pixels. For each campaign, you can separately select which apps to track conversions in. For example, if your campaign's goal is to only promote one of two e-commerce stores you operate, it should only track conversions from the Pixel/app of that store.
- Similarly, when you specify a Lift Test objective, you can choose which Pixel or app that event should come from (see below). To avoid surprises, make sure your selections for campaign app tracking, default Pixel tracking and lift test objectives align!
Lift Test results show different numbers in Smartly.io and Ads Manager's "Test & Learn" section. Why?
- Check your settings
- There are two types of Lift Test results available: "basic" and "advanced". Check that you are using the same setting in both platforms.
- Facebook data discrepancy
- While this shouldn't normally happen, in some cases Ads Manager ("Test & Learn") might show slightly different numbers from those shown in Smartly.io.
- Smartly.io gets its numbers directly from the Facebook Ads API, and we synchronize the numbers several times per day. However, we stop syncing the results 10 days after the test has ended. In some rare cases, Facebook has updated the test results even after this.
- This could happen for a few reasons: 1) Facebook needs to process a lot of data for Lift Tests and the pipeline could occasionally get stuck. 2) If there were some data connection issues during your Lift Test, Facebook could backfill the conversions later from a data file (such as in Offline Conversions).