ID Resolution (ID Stitching)

Statsig Warehouse Native supports resolving multiple IDs to one identified user, so you can expose an experiment on one identifier and analyze data from one or many mapped identities associated with that experimental unit. Common scenarios include:

Exposing logged-out users and analyzing logged-in metrics such as revenue, or a funnel that runs from a logged-out marketing page landing to a logged-in subscription purchase.
Utilizing one:many relationships, e.g. a single user who owns multiple accounts. ID resolution lets you aggregate metrics from the user’s mapped accounts. Note that you lose power when using this approach, but it is statistically sound.

ID resolution is a common need in experimentation; generally the responsibility for this mapping is put onto data users or PMs running experiment analysis, which leads to inconsistent results and expensive query logic. Using Advanced ID Resolution streamlines the process, making it consistent and performant and allowing all users to point to trusted identity tables.

The Challenge: Connecting User Identifiers

A common challenge in experimentation is linking user identifiers before and after an event boundary—most often, signups. Experimenters usually have a logged-out ID (e.g., a cookie or Statsig stableID) and, for users who sign up, a userID created afterward. Since business metrics are typically computed at the userID level, teams often want to randomize on logged-out identifiers but measure outcomes on logged-in metrics like revenue or LTV. Most platforms require manual joins or preprocessing to connect these identifiers, leading to complex, error-prone queries that must reconcile exposures across time and mapping tables. Statsig Warehouse Native eliminates this overhead with an automatic, no-code way to connect identifiers across these boundaries—centralized, consistent, and reproducible.

Mapping Modes

When using ID resolution, you can choose from one of three modes:

Strict 1:1 mapping enforces that identities have a singular mapping. If you have a mapping between two IDs that should always be 1:1, this mode enforces that and warns you when the data contains cases where it does not hold. Users with a single identity can use downstream metrics from the secondary identity; multi-mapped users are considered corrupted and discarded from the analysis.
First-touch mapping is a way to attribute activities of secondary ID(s) to one primary ID by recognizing the treatment effect comes from the first time the user is exposed to the experiment.
Last-touch mapping is a way to attribute activities of secondary ID(s) to one primary ID by recognizing the treatment effect comes from the most recent time the user is exposed to the experiment.

Strict 1:1 Mapping

All potential mappings between identifiers within the experiment date range, on the exposed population, are collected. If the primary ID has multiple secondary IDs, or vice versa, it is considered polluted and dropped from the analysis. Choosing this mode will change the exposures on the primary ID as it disqualifies any records outside of a 1:1 mapping.
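The strict filter described above can be sketched in a few lines of Python. This is an illustration only, not Statsig's implementation: the function name `strict_one_to_one` and the sample IDs are hypothetical, and the sketch only looks at observed (primary, secondary) pairs, so it does not model primary IDs that have no mapping at all.

```python
from collections import defaultdict

def strict_one_to_one(pairs):
    """Keep only (primary, secondary) pairs where each ID has exactly one partner."""
    secondaries_by_primary = defaultdict(set)
    primaries_by_secondary = defaultdict(set)
    for primary, secondary in pairs:
        secondaries_by_primary[primary].add(secondary)
        primaries_by_secondary[secondary].add(primary)

    kept = {}
    for primary, secondaries in secondaries_by_primary.items():
        if len(secondaries) != 1:
            continue  # one primary ID with many secondary IDs: dropped
        (secondary,) = secondaries
        if len(primaries_by_secondary[secondary]) != 1:
            continue  # many primary IDs sharing one secondary ID: dropped
        kept[primary] = secondary
    return kept

# Hypothetical mapping rows collected within the experiment date range.
pairs = [
    ("stable_1", "user_a"),  # clean 1:1 mapping
    ("stable_2", "user_b"),  # stable_2 maps to two users
    ("stable_2", "user_c"),
    ("stable_3", "user_d"),  # user_d is shared by two stable IDs
    ("stable_4", "user_d"),
]
clean = strict_one_to_one(pairs)
print(clean)
```

Only `stable_1` survives in this sample, which mirrors how strict mode shrinks the exposed population on the primary ID.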

First Touch Mapping (Mixed Population)

The direction of first-touch mapping is based on the experiment: all secondary IDs resolve to one primary ID, and a single primary ID can have multiple mapped secondary IDs. Data is attributed to the group of the first associated primary ID seen in the exposures. If a secondary ID has multiple associated primary IDs, the group of the first primary ID is used. Note that this means users who cross groups are not discarded from analysis but are instead assigned based on the first experience they had. Primary ID records that are associated with another primary ID, but are not the first observed records, are dropped from the analysis. If a user is exposed twice on different primary IDs that resolve to the same secondary IDs, only the primary ID metrics from the first-exposed user are kept in the analysis.

Last Touch Mapping (Mixed Population)

Same as first-touch mapping, but data is attributed to the most recent primary ID.
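As a rough sketch of the first-touch and last-touch rules, the snippet below resolves each secondary ID to a single primary ID and its group. The function `resolve_touch`, the tuple layouts, and the use of each primary ID's first exposure timestamp to decide "first" and "most recent" are assumptions made for illustration; Statsig's actual query logic is not shown in this document.

```python
def resolve_touch(exposures, mapping, mode="first"):
    """Map each secondary ID to one (primary_id, group) by first or last touch.

    exposures: iterable of (primary_id, group, timestamp)
    mapping:   iterable of (primary_id, secondary_id)
    """
    if mode not in ("first", "last"):
        raise ValueError("mode must be 'first' or 'last'")

    # Earliest exposure per primary ID: primary_id -> (timestamp, group)
    first_exposure = {}
    for primary, group, ts in exposures:
        if primary not in first_exposure or ts < first_exposure[primary][0]:
            first_exposure[primary] = (ts, group)

    resolved = {}  # secondary_id -> (timestamp, primary_id, group)
    for primary, secondary in mapping:
        if primary not in first_exposure:
            continue  # mapping row for a unit that was never exposed
        ts, group = first_exposure[primary]
        current = resolved.get(secondary)
        if current is None:
            better = True
        elif mode == "first":
            better = ts < current[0]
        else:
            better = ts > current[0]
        if better:
            resolved[secondary] = (ts, primary, group)
    return {s: (p, g) for s, (_, p, g) in resolved.items()}

# Hypothetical data: user_a was seen on two stable IDs in different groups.
exposures = [("stable_1", "Control", 1), ("stable_2", "Test", 5), ("stable_3", "Test", 2)]
mapping = [("stable_1", "user_a"), ("stable_2", "user_a"), ("stable_3", "user_b")]

first = resolve_touch(exposures, mapping, mode="first")
last = resolve_touch(exposures, mapping, mode="last")
```

With this sample, `user_a` is attributed to `stable_1` (Control) under first-touch and to `stable_2` (Test) under last-touch, while `user_b` is unaffected by the mode.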

What does Mixed Population mean?

Both first-touch and last-touch mapping show Pulse results based on a mixed population. This means each metric is computed on the population that corresponds to its own unit type. For example, if an experiment randomizes on Stable ID and the scorecard metrics are a mixture of Stable ID and User ID metrics, then for Stable ID metrics Pulse uses the “raw” exposure population of the experiment, since that is true to the randomization process. For User ID metrics, Pulse uses the resolved population, which depends on the mapping mode.

Explanation of Methodology

Primary IDs are preferred over secondary IDs if present in the data.

As a best practice, the unit type of analysis should match that of randomization. Statsig will always prefer the primary ID if present in the data. If the primary ID is not present, Statsig will use the most recent secondary ID.

Secondary IDs are only used to join metrics to exposures, but the unit of analysis is still the primary ID.

The unit counts of each metric’s results are calculated using the primary ID.

Many (primary) to one (secondary) mapping is handled through attributing the secondary ID to ONE primary ID.
One (primary) to many (secondary) mapping is implicitly handled by treating all secondary IDs as the same unit.

For example, the metric value from each mapped secondary ID is added in a sum metric or counted in a count metric.

Statsig supports a mixture of primary and secondary IDs in the same experiment.

You can use both primary and secondary IDs in the same experiment. For example, when you run a signup experiment, you can measure session-level metrics on the primary ID and user-level metrics on the secondary ID. To do this, Statsig maintains two populations: one for the primary ID and one for the secondary ID. The primary ID population is the same as if you had only used the primary ID.

How to Enable ID Resolution in a Statsig Experiment

Setting up identity resolution in Statsig is very simple. You can either log or join data to provide both IDs on your assignment source, or provide one ID in the assignment source along with a mapping table between the IDs in the form of an Entity Property Source.

Using Property Source

To use Identity Resolution across experiments in your project, you will need a lookup table that has both the ID you are exposing on and the selected targeted ID. This table can be configured by setting up an Entity Property Source with both IDs present. Once that’s done, you can simply select this source when configuring your secondary ID type, and Statsig handles the join for you.

ID resolution source configuration interface

If you want to use a Statsig SDK to populate this table, you can log an event such as a “Signup” event that carries both the logged-out identifier and the user ID. Events sent via the Statsig SDK are written into your warehouse, and you can configure an Identity Resolution source on top of them, for example:

Identity resolution configuration interface

Using Assignment Source

When creating an assignment source, provide a column for both ID types. It is assumed that your ‘Primary ID’ will be non-null for exposure records. Your secondary ID can be null. If your secondary ID is sparse (some records are null, and some are not due to logging), Statsig will back-attribute any identified secondary ID to other records from the same Primary ID.
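The back-attribution of a sparse secondary ID can be pictured with a small Python sketch. The record shape (`primary_id`, `secondary_id` keys) and the choice to reuse the first non-null secondary ID seen for a primary ID are assumptions for illustration; the document does not specify how Statsig picks a value when several are present.

```python
def backfill_secondary(records):
    """Fill null secondary IDs with a non-null value seen on the same primary ID."""
    known = {}
    for rec in records:
        if rec["secondary_id"] is not None:
            # Assumption: keep the first non-null secondary ID per primary ID.
            known.setdefault(rec["primary_id"], rec["secondary_id"])

    filled = []
    for rec in records:
        secondary = rec["secondary_id"]
        if secondary is None:
            secondary = known.get(rec["primary_id"])  # stays None if never identified
        filled.append({**rec, "secondary_id": secondary})
    return filled

# Hypothetical exposure records: stable_1 is identified on its second record only.
records = [
    {"primary_id": "stable_1", "secondary_id": None},
    {"primary_id": "stable_1", "secondary_id": "user_a"},
    {"primary_id": "stable_2", "secondary_id": None},
]
filled = backfill_secondary(records)
```

After the backfill, both `stable_1` records carry `user_a`, and `stable_2` keeps a null secondary ID because it was never identified.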

When you create an analysis-only experiment or power analysis with this ID type, you can optionally select a Secondary ID. If you do so, you can now use metrics from either ID type in your analysis. For E2E experiments that use the Statsig SDK, this is configurable on the experiment setup page, under Advanced settings. Behind the scenes:

For metric sources with the primary ID, metrics are joined to exposures on that primary ID.
For metric sources with only the secondary ID, metrics are joined to exposures on that secondary ID.
In strict mode, users with a duplicate mapping are dropped from the analysis. In first-touch mode, units use their first exposure record and merge data from all mapped secondary IDs.

This works natively across Metric Sources, so you can easily set up funnel or ratio metrics across the two ID types. Analysis is done using the primary ID - this process associates metric values that are on an associated secondary ID.

Mapping Changes

If a change is made to the entity property source or assignment source’s definition or underlying data, that will be reflected on the next reload. This is why a full reload is required, since otherwise historical changes to the mapping can lead to inconsistent data on incremental reloads or explore queries.

Best Practices

We strongly recommend using an Entity Property Source to provide a cleaned unit mapping from your warehouse. However, you can also provide mappings on your exposure source by logging multiple identifiers in the exposure data; Statsig will greedily use this to match across identifiers.
For both modes, an experiment can currently only have one mapped ID type, e.g. secondary_id->user_id or secondary_id->account_id, but not both.
All modes require a full reload, so that historical mappings being changed or new mappings being introduced do not cause data inconsistency.
The property source or assignment source used to provide mappings is filtered to records within the experiment’s date range. If a mapping is “evergreen”, or not scoped to a specific time period, you can omit the timestamp on the Entity Property Source.

Example of a supported schema

If your assignment source data contains:
{stableID: 'unknown_123', exp_id: 'PDP Test', test_group: 'Control'}
and your metric sources contain data that represents a metric as:
{userID: 'known_abc', event: 'page_load'}
then your Entity Property Source or assignment source must contain the secondary identity (in this case, userID) that enables Statsig to join your assignment data with your metric data:
{stableID: 'unknown_123', userID: 'known_abc', country: 'USA'}
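To make the join concrete, here is a minimal Python sketch that uses the three sample records from this section. The variable names and the dictionary-based join are illustrative assumptions; in practice Statsig performs this join in your warehouse.

```python
# The three sample records from the supported schema example.
assignments = [{"stableID": "unknown_123", "exp_id": "PDP Test", "test_group": "Control"}]
metrics = [{"userID": "known_abc", "event": "page_load"}]
entity_map = [{"stableID": "unknown_123", "userID": "known_abc", "country": "USA"}]

# Lookups: secondary ID -> primary ID, and primary ID -> experiment group.
user_to_stable = {row["userID"]: row["stableID"] for row in entity_map}
group_by_stable = {row["stableID"]: row["test_group"] for row in assignments}

joined = []
for event in metrics:
    stable_id = user_to_stable.get(event["userID"])
    if stable_id in group_by_stable:  # skip events from unmapped or unexposed users
        joined.append(
            {**event, "stableID": stable_id, "test_group": group_by_stable[stable_id]}
        )

print(joined)
```

The `page_load` event logged on `known_abc` ends up attributed to `unknown_123` in the Control group, which is the association the mapping row makes possible.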

Considerations

Deduplicating records can lead to biased results, so Statsig performs two extra health checks on this kind of experiment.

Statsig will check your deduplication rate and warn you if it is unusually high. It’s expected that some secondary IDs will have multiple logged-out IDs due to users using different devices or clearing browser history.
Statsig will perform a chi-squared test evaluating whether the deduplication rate is identical across arms of the experiment. In some cases, an experiment may cause more users to come back (for example, an email resurrection campaign), in which case duplicates are expected to be more frequent in that arm and can be a positive outcome. In this case, you can use first-touch attribution to maintain a common identifier.
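For intuition, the second health check can be approximated with a Pearson chi-squared test on a 2x2 table of duplicated versus non-duplicated units per arm. The sketch below is an assumption about the general shape of such a test, not Statsig's exact computation; the function name and the counts are hypothetical. It uses only the standard library, relying on the identity that the upper tail of a chi-squared distribution with one degree of freedom is `erfc(sqrt(x / 2))`.

```python
import math

def dedup_rate_chi_squared(dup_a, total_a, dup_b, total_b):
    """Pearson chi-squared test (1 degree of freedom) comparing duplicate rates of two arms."""
    observed = [[dup_a, total_a - dup_a], [dup_b, total_b - dup_b]]
    row_totals = [total_a, total_b]
    col_totals = [dup_a + dup_b, (total_a - dup_a) + (total_b - dup_b)]
    grand = total_a + total_b

    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand
            if expected == 0:
                return 0.0, 1.0  # degenerate table: no evidence of a difference
            stat += (observed[i][j] - expected) ** 2 / expected

    # Survival function of chi-squared with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: 50 of 1000 units deduplicated in one arm, 100 of 1000 in the other.
stat, p_value = dedup_rate_chi_squared(50, 1000, 100, 1000)
same_stat, same_p = dedup_rate_chi_squared(50, 1000, 50, 1000)
```

A small p-value suggests the arms deduplicate at different rates, which is the situation where the document recommends considering first-touch attribution.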
