Guide

A/B Testing Glossary: 50+ Key Terms Every CRO Professional Should Know

Master the essential vocabulary of conversion optimization and transform your tests into measurable results

Bichoy B. CRO Expert & Conversion Optimization
July 2, 2026 · 14 min read

Conversion rate optimization has its own language — and if you don't master it, you risk misinterpreting your results, misaligning your teams, and making decisions based on faulty assumptions. Whether you're a CRO beginner seeking to decode your first experimentation report, or a seasoned marketer wanting to standardize vocabulary across your organization, this comprehensive glossary covers every essential term in the A/B testing and experimentation ecosystem. Bookmark it, share it with your team, and return to it whenever a concept needs clarification.

Fundamental A/B Testing Concepts

A/B Test (Split Test): A controlled experiment in which two versions of a single element (a web page, an email subject line, a CTA button, or anything else you can vary) are presented simultaneously to different, randomly assigned segments of your audience to determine which performs better on a defined metric.

Control (Variant A): The original, unmodified version of the tested element. It serves as the reference point against which all other variants are measured. Every experiment must have a clearly defined control to produce valid comparisons.

Variant (Variant B, C, D…): The modified version(s) of the tested element. Each variant differs from the control on at least one specific point, such as a different headline, color, layout, or text. When several variants of the same element are tested against the control at once, the experiment is an A/B/n test. It becomes a multivariate test only when multiple elements are varied in combination.

Hypothesis: A structured and falsifiable prediction that articulates the change made, the reason you expect a performance improvement, and the measured metric. A strong hypothesis follows this structure: "If we [change X], then [metric Y] will [increase/decrease] because [behavioral justification Z]." Weak hypotheses produce inconclusive tests.

PRO TIP: WRITE YOUR HYPOTHESIS BEFORE YOU BUILD
Teams that document their hypotheses before launching tests are better positioned to extract actionable learnings, even from losing variants. The discipline of articulating your reasoning forces clarity and prevents post-hoc rationalization of results.

Statistical Terms You Must Master

Statistical Significance: A result is statistically significant when the observed difference between your control and variant would be unlikely to arise from chance alone if there were truly no difference. It is assessed by comparing a p-value against a preset threshold (alpha). The common 95% confidence level corresponds to alpha = 0.05: if the two versions really performed the same, a difference at least this large would show up less than 5% of the time. Declaring a winner before reaching significance is one of the most common and costly mistakes in CRO.

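P-Value: The probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true. A p-value of 0.03 means that, if there were truly no difference between control and variant, a difference at least this large would occur in about 3% of experiments. It is not the probability that the result is due to chance, nor the probability that the null hypothesis is true. Lower p-values indicate stronger evidence against the null hypothesis.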
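
To make the definition concrete, here is a minimal Python sketch of one common way to compute a p-value for a difference in conversion rates: a pooled two-proportion z-test. The function name and the visitor and conversion counts are illustrative assumptions, not figures from a real test, and your testing tool may well use a different method.

```python
import math


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for a pooled two-proportion z-test."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Under the null hypothesis both groups share one conversion rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_error
    # Two-sided tail area of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 10,000 visitors per group.
z, p = two_proportion_p_value(conv_a=500, n_a=10_000, conv_b=570, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these made-up counts the p-value falls below 0.05, so the difference would be called significant at the 95% level.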

Confidence Interval (CI): A range of plausible values for the true effect size, built by a procedure that captures the true value in a stated share of repeated experiments (e.g., 95%). A narrow confidence interval suggests a more precise estimate; a wide interval signals high uncertainty, and a larger sample may be needed to narrow it.
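
As a rough illustration, the sketch below computes a 95% interval for the difference between two conversion rates using the simple normal-approximation (Wald) formula. The function name and the counts are assumptions chosen for the example; other interval methods exist and may behave better with small samples.

```python
import math


def diff_confidence_interval(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Wald interval for (variant rate - control rate); z_crit=1.96 gives ~95%."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Unpooled standard error of the difference between the two rates.
    std_error = math.sqrt(
        rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b
    )
    diff = rate_b - rate_a
    return diff - z_crit * std_error, diff + z_crit * std_error


# Hypothetical counts: same observed rates, two different sample sizes.
small = diff_confidence_interval(500, 10_000, 570, 10_000)
large = diff_confidence_interval(5_000, 100_000, 5_700, 100_000)
print("10k per group: ", small)
print("100k per group:", large)
```

The second interval is narrower than the first: the observed rates are identical, but ten times more data gives a more precise estimate.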

Statistical Power: The probability that a test correctly detects a real effect when it exists. Typically set at 80%, power depends on sample size, effect size, and significance threshold. Low-power tests produce high rates of false negatives, meaning you miss real improvements.
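
The dependence of power on sample size can be sketched with a standard normal approximation for a two-sided test on two conversion rates. This is an approximation under stated assumptions (equal group sizes, and the negligible far tail is ignored); the function name and the rates are hypothetical.

```python
import math
from statistics import NormalDist


def approx_power(p_control, p_variant, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test for two proportions."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)
    std_error = math.sqrt(
        p_control * (1 - p_control) / n_per_group
        + p_variant * (1 - p_variant) / n_per_group
    )
    effect = abs(p_variant - p_control)
    # Probability that the test statistic clears the significance cutoff.
    return norm.cdf(effect / std_error - z_alpha)


# Hypothetical scenario: 5.0% baseline, 5.7% true variant rate.
for n in (5_000, 10_000, 20_000):
    print(n, round(approx_power(0.05, 0.057, n), 2))
```

In this scenario, 10,000 visitors per group falls short of the usual 80% target, while 20,000 per group clears it.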

Type I Error (False Positive): Incorrectly concluding that a variant outperforms the control when no real difference exists. Controlled by your significance threshold (alpha level).

Type II Error (False Negative): Failing to detect a real improvement that actually exists. Its probability is denoted beta, and statistical power equals 1 - beta, so raising power lowers the Type II error rate.

Null Hypothesis: The default hypothesis that there is no difference between the control and the variant. Your experiment attempts to gather enough evidence to reject this hypothesis.

95%: Standard confidence threshold for declaring a winner
80%: Minimum recommended statistical power per test
2–4 weeks: Typical minimum duration to capture weekly traffic cycles

Experimentation Metrics and KPIs

Conversion Rate (CR): The number of visitors who complete a desired action (purchase, signup, form submission, etc.) divided by the total number of visitors, expressed as a percentage. This is the primary metric in most A/B tests and the foundation of CRO work.

Primary Metric (Objective Metric): The most important KPI your test is designed to move. Each experiment should have exactly one primary metric to avoid the multiple comparisons problem. Secondary metrics provide additional context but should not drive the final decision.
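
The multiple comparisons problem is easy to quantify. The short sketch below assumes the metrics are statistically independent (a simplification, since real metrics are usually correlated) and computes the chance of at least one false positive when several metrics are each tested at alpha = 0.05. The function name is illustrative.

```python
def family_wise_error_rate(num_metrics, alpha=0.05):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** num_metrics


for k in (1, 3, 5, 10):
    print(f"{k} metric(s): {family_wise_error_rate(k):.1%}")
```

Under this assumption, judging a test on ten metrics instead of one raises the chance of a spurious "win" from 5% to roughly 40%.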

Secondary Metrics (Guardrail Metrics): Complementary KPIs monitored to ensure a winning variant doesn't negatively impact other important business outcomes. For example, a variant that increases add-to-cart rate but reduces average order value may not constitute a net gain.

Average Order Value (AOV): The average monetary value of transactions over a given period. A critical metric for e-commerce A/B tests, particularly when optimizing upsell flows, price displays, or bundle offers.

Revenue Per Visitor (RPV): Total revenue divided by the total number of visitors. RPV is often preferred over conversion rate in e-commerce contexts because it captures both conversion rate and order value simultaneously, giving a more complete picture of variant performance.
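
A quick worked example shows why RPV can tell a different story from conversion rate alone. The visitor, order, and revenue figures below are invented for illustration, and the helper name is an assumption.

```python
def summarize(visitors, orders, revenue):
    """Return conversion rate, average order value and revenue per visitor."""
    conversion_rate = orders / visitors
    average_order_value = revenue / orders if orders else 0.0
    revenue_per_visitor = revenue / visitors
    return {
        "cr": conversion_rate,
        "aov": average_order_value,
        "rpv": revenue_per_visitor,
    }


# Hypothetical results: the variant converts more visitors at a lower basket size.
control = summarize(visitors=10_000, orders=300, revenue=24_000.0)
variant = summarize(visitors=10_000, orders=330, revenue=23_100.0)

for name, stats in (("control", control), ("variant", variant)):
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

Here the variant wins on conversion rate but loses on RPV, because RPV is the product of conversion rate and average order value.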

Bounce Rate: The percentage of visitors who leave a page without further interaction. While not always a primary metric, a significant increase in bounce rate on a variant can signal a negative user experience worth investigating.

Click-Through Rate (CTR): The ratio of users who click on a specific element (CTA, link, image) to the total number of users who saw it. It is commonly used as a primary metric when testing above-the-fold elements or email campaigns.

Design Terms and Testing Methodology

Sample Size: The number of visitors (or sessions) required in each variant to obtain reliable statistical results. Insufficient sample sizes lead to underpowered tests and unreliable conclusions. Use a sample size calculator before launching any experiment to avoid premature conclusions.
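
For readers curious about what a sample size calculator does under the hood, here is a minimal sketch using the standard normal-approximation formula for comparing two proportions. The function name, the 5% baseline, and the 10% relative lift are assumptions for the example; commercial calculators may use slightly different formulas and give somewhat different numbers.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Visitors needed per variant to detect the given relative lift."""
    norm = NormalDist()
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = norm.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = norm.inv_cdf(power)           # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


# Hypothetical scenario: 5% baseline conversion rate.
print(sample_size_per_variant(0.05, 0.10))  # detect a 10% relative lift
print(sample_size_per_variant(0.05, 0.05))  # detect a 5% relative lift
```

Halving the lift you want to detect roughly quadruples the traffic required, which is why small expected effects demand long tests.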

Traffic Allocation: The percentage of total site traffic assigned to an experiment and the distribution across variants. A 50/50 split between control and one variant is the statistically most efficient allocation for a standard A/B test.

Randomization: The process of assigning visitors to control or variant groups without bias. Proper randomization ensures that the only systematic difference between groups is the tested variant, making causal inference valid.
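
One common way to randomize in practice is deterministic hashing: the same visitor always lands in the same group, and groups fill up in proportion to the traffic allocation. The sketch below is an assumption-laden illustration, not the method of any particular testing tool; the function name, the ID format, and the experiment key are hypothetical.

```python
import hashlib


def assign_variant(user_id, experiment_id, weights=None):
    """Deterministically map a user to a variant according to traffic weights."""
    weights = weights or {"control": 0.5, "variant": 0.5}
    key = f"{experiment_id}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Turn the first 32 bits of the hash into a number in [0, 1).
    point = int(digest[:8], 16) / 0x100000000
    cumulative = 0.0
    for name, weight in weights.items():
        cumulative += weight
        if point < cumulative:
            return name
    return list(weights)[-1]  # guard against floating-point rounding


counts = {"control": 0, "variant": 0}
for i in range(10_000):
    counts[assign_variant(f"user-{i}", "homepage-headline")] += 1
print(counts)
```

Including the experiment ID in the hash key means a visitor's group in one experiment does not determine their group in another.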

Segmentation: Division of your audience into sub-groups based on attributes (device type, traffic source, geography, behavior) to analyze how different segments respond to variants. Segment-level insights can reveal opportunities invisible at the aggregate level.

Novelty Effect: A temporary change in behavior caused by the novelty of a variant rather than its true superiority. Users may interact differently with a new design simply because it is unfamiliar. Running tests long enough to move past initial novelty responses is essential for accurate results.

Seasonal Bias: Distortion of test results caused by running experiments during periods of atypical traffic (promotional events, holidays, etc.) that do not represent normal user behavior. Always account for your testing calendar against your business cycles.

Multivariate Testing (MVT): An experiment that tests multiple variables and their interactions simultaneously. Unlike A/B tests, MVT reveals which combination of changes produces the best result. It requires significantly more traffic to achieve statistical significance.

A/A Test: A test in which both variants are identical (control vs control). Used to validate that your testing tool correctly randomizes traffic and that your baseline conversion rate is stable before launching real experiments.
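
One check you can run on an A/A test is whether the traffic split matches what you configured, often called a sample ratio mismatch check. The sketch below uses a chi-square goodness-of-fit test with one degree of freedom; the function name and visitor counts are illustrative assumptions.

```python
import math


def split_check_p_value(n_a, n_b, expected_share_a=0.5):
    """P-value for the observed split versus the configured allocation."""
    total = n_a + n_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = (n_a - expected_a) ** 2 / expected_a + (n_b - expected_b) ** 2 / expected_b
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical visitor counts from two A/A runs configured as 50/50.
print(split_check_p_value(5_030, 4_970))  # small imbalance, plausible by chance
print(split_check_p_value(5_000, 5_400))  # large imbalance, worth investigating
```

A very small p-value here suggests a problem with assignment or tracking rather than a difference in user behavior.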

WARNING: DON'T PEEK AT RESULTS DURING TESTING
Checking results before reaching your predetermined sample size and stopping early when you see a "winner" is called peeking — and it dramatically inflates your false positive rate. Always set your stopping criteria before launching a test.
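
A small simulation makes the cost of peeking visible. The sketch below runs many simulated tests in which control and variant are truly identical, then compares two decision rules: stop at the first "significant" look, or evaluate once at the planned end. All parameters (400 tests, 1,000 visitors per arm, 10 looks, 10% conversion rate, the seed) are arbitrary assumptions for illustration.

```python
import math
import random


def p_value(conv_a, conv_b, n):
    """Two-sided pooled z-test p-value for two groups of equal size n."""
    pooled = (conv_a + conv_b) / (2 * n)
    std_error = math.sqrt(pooled * (1 - pooled) * 2 / n)
    if std_error == 0:
        return 1.0
    z = (conv_b - conv_a) / n / std_error
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_peeking(num_tests=400, n_per_arm=1_000, looks=10,
                     rate=0.10, alpha=0.05, seed=7):
    """Return (false positive rate with peeking, rate with one final look)."""
    rng = random.Random(seed)
    step = n_per_arm // looks
    peeking_hits = fixed_hits = 0
    for _ in range(num_tests):
        conv_a = conv_b = 0
        any_look_significant = False
        for look in range(1, looks + 1):
            # Both arms share the same true rate, so any "win" is a false positive.
            conv_a += sum(rng.random() < rate for _ in range(step))
            conv_b += sum(rng.random() < rate for _ in range(step))
            significant = p_value(conv_a, conv_b, look * step) < alpha
            any_look_significant = any_look_significant or significant
        peeking_hits += any_look_significant
        fixed_hits += significant  # result at the final, planned look only
    return peeking_hits / num_tests, fixed_hits / num_tests


peek_rate, fixed_rate = simulate_peeking()
print(f"false positives with peeking: {peek_rate:.1%}")
print(f"false positives at fixed end: {fixed_rate:.1%}")
```

Because the peeking rule also includes the final look, its false positive rate can never be lower than the fixed-horizon rate, and in simulations like this one it is typically several times higher.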

CRO Process and Strategy Terms

Conversion Rate Optimization (CRO): The systematic process of increasing the percentage of website visitors who complete a desired action. CRO combines quantitative data (analytics, heatmaps), qualitative research (user interviews, surveys) and controlled experimentation to deliver evidence-based improvements.

Experimentation Roadmap: A prioritized backlog of planned tests, organized by expected impact, ease of implementation and strategic alignment. A well-maintained roadmap ensures your testing program runs continuously and accumulates learnings over time.

ICE Score: A prioritization framework that ranks test ideas by Impact (potential effect on the primary metric), Confidence (certainty that the change will work) and Ease (implementation effort). Each dimension is scored from 1 to 10 and averaged. Other popular frameworks include PIE (Potential, Importance, Ease) and PXL.
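
The scoring described above is simple enough to keep in a spreadsheet, but a short sketch shows the mechanics. The test ideas and their scores are made up for illustration, and the class name is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    name: str
    impact: int      # 1-10: potential effect on the primary metric
    confidence: int  # 1-10: certainty that the change will work
    ease: int        # 1-10: how easy the change is to implement

    @property
    def ice(self):
        # Average of the three dimensions, as described in the definition.
        return (self.impact + self.confidence + self.ease) / 3


# Hypothetical backlog.
backlog = [
    Idea("Simplify checkout form", impact=8, confidence=6, ease=4),
    Idea("Rewrite hero headline", impact=6, confidence=5, ease=9),
    Idea("Add trust badges", impact=4, confidence=6, ease=9),
]

ranked = sorted(backlog, key=lambda idea: idea.ice, reverse=True)
for idea in ranked:
    print(f"{idea.ice:.2f}  {idea.name}")
```

Note how the easy headline change outranks the higher-impact checkout redesign, which is exactly the kind of trade-off the framework is meant to surface.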

Test Velocity: The number of experiments launched per unit of time (typically per month or quarter). Higher test velocity, combined with appropriate rigor, accelerates the rate at which an organization accumulates optimization learnings and compounds performance gains.

Winning Variant: The variant that statistically outperforms the control on the primary metric at the predetermined confidence level, without harming guardrail metrics. A winning variant is normally rolled out permanently, and its learnings are documented to inform future hypotheses.

Inconclusive Test: A test that does not reach statistical significance within the allocated time or sample size. Rather than a failure, an inconclusive test still provides information: the tested change may have a negligible effect on the metric, the test may have been underpowered to detect a smaller effect, or the hypothesis may need to be refined.

Velocity vs. Quality of Experiments: A common tension in CRO programs. Running many low-quality tests produces noise; running too few with heavy effort creates bottlenecks. The optimal balance depends on available traffic, team capacity, and organizational maturity.

User Experience and Behavioral Terms

Heatmap: A visual representation of user interaction data on a web page, showing where users click, move their cursor, or scroll. Heatmaps are exploratory research tools used to generate hypotheses for A/B tests, not to validate them.

Session Recording: A replay of an individual user's journey on your website, capturing mouse movements, clicks, scrolls, and form interactions. Session recordings are valuable for identifying friction points and unexpected user behaviors that inform test hypotheses.

Friction: Any element of the user experience that creates cognitive load, confusion, or resistance, reducing the likelihood of conversion. Friction can be visual (cluttered layout), functional (slow load time), or psychological (unclear value proposition).

Cognitive Bias: Systematic patterns in human thinking that influence decision-making, often in predictable ways. CRO professionals leverage biases such as social proof, scarcity, anchoring, and loss aversion to design more persuasive experiences.

Above the Fold: The portion of a web page visible to users without scrolling. Above-the-fold elements receive disproportionate attention and are priority candidates for A/B testing, particularly headlines, hero images, and primary CTAs.

Social Proof: Evidence that other people have had a positive experience with a product or service (reviews, ratings, testimonials, user count). Social proof is one of the highest-leverage elements to test on product pages and checkout funnels.

Call to Action (CTA): A button, link, or prompt that directs users toward a desired conversion action. The text, color, size, placement, and surrounding context of the CTA are among the most frequently tested elements in CRO programs.

Technical and Implementation Terms

JavaScript Snippet / Tag: A small piece of code inserted into a website's HTML that enables an A/B testing platform to serve different variants to visitors. Most modern testing tools are deployed via a single asynchronous JavaScript tag.

Flicker Effect: A brief visual flash that occurs when the original page loads before the variant's CSS or JavaScript modifications are applied. Flicker degrades user experience and can introduce bias into test results. It is mitigated by loading the test snippet synchronously or using anti-flicker scripts.

Server-Side Testing: An A/B test implemented at the server level, where variant logic is executed before the page is delivered to the user. Server-side testing eliminates flicker, enables deeper personalization, and is preferred for testing application logic, pricing, or algorithm changes.

Client-Side Testing: An A/B test implemented in the browser, where JavaScript modifies the page as it loads. It is faster to deploy and requires little or no developer intervention for most changes, making it the default approach for visual experiments on landing pages and product pages.

Feature Flag: A software development technique that allows teams to enable or disable features for specific user segments without deploying new code. Feature flags are a foundational tool for server-side experimentation and progressive rollouts.
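
In its simplest form, a feature flag is a configuration lookup plus a stable rule for deciding which users are included. The sketch below is a toy illustration with a hypothetical in-memory flag store and flag names; real feature flag systems typically load this configuration from a remote service.

```python
import hashlib

# Hypothetical flag configuration; in practice this would live outside the code.
FLAGS = {
    "new_checkout": {"enabled": True, "rollout_percent": 20},
    "dark_mode": {"enabled": False, "rollout_percent": 100},
}


def is_enabled(flag_name, user_id, flags=FLAGS):
    """Return True if the flag is on for this user (stable across calls)."""
    flag = flags.get(flag_name)
    if not flag or not flag["enabled"]:
        return False
    key = f"{flag_name}:{user_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest()[:8], 16) % 100
    return bucket < flag["rollout_percent"]


exposed = sum(is_enabled("new_checkout", f"user-{i}") for i in range(10_000))
print(f"{exposed / 10_000:.1%} of users see the new checkout")
```

Raising `rollout_percent` widens exposure without a new deployment, and users already included stay included because their bucket never changes.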

Personalization: The delivery of content, offers, or experiences dynamically tailored to individual users or segments based on behavioral, demographic, or contextual data. Personalization and A/B testing are complementary disciplines — testing validates which personalized experiences generate the most value.

"The goal of CRO is not to run more tests — it's to make better decisions faster. Every term in this glossary represents a decision point where rigor separates winners from noise."

Advanced Experimentation Concepts

Bayesian Statistics: An alternative statistical framework to frequentist methods (p-values) that incorporates prior knowledge and continuously updates probability estimates as data accumulates. Bayesian testing allows more flexible stopping rules and produces results expressed as probability of being best rather than significance thresholds.
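
A "probability of being best" figure can be approximated with a few lines of simulation. The sketch below assumes a uniform Beta(1, 1) prior on each conversion rate and estimates the probability by Monte Carlo sampling; the function name, the counts, and the prior are assumptions, and commercial Bayesian engines may use different priors and models.

```python
import random


def prob_variant_beats_control(conv_a, n_a, conv_b, n_b, draws=20_000, seed=42):
    """Estimate P(variant rate > control rate) from Beta posteriors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior under a Beta(1, 1) prior: Beta(1 + successes, 1 + failures).
        sample_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        sample_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += sample_b > sample_a
    return wins / draws


# Hypothetical counts: 10,000 visitors per group.
probability = prob_variant_beats_control(500, 10_000, 570, 10_000)
print(f"P(variant beats control) = {probability:.1%}")
```

With identical data in both groups the estimate hovers around 50%, which is a useful sanity check on any implementation.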

Frequentist Statistics: The traditional statistical approach used in most A/B testing platforms, based on p-values and fixed sample sizes. Frequentist methods require predetermined sample sizes and significance thresholds to maintain valid error rates.

Sequential Testing: A statistical method that enables continuous monitoring of results with controlled false positive rates, solving the peeking problem inherent in fixed-horizon frequentist tests. Sequential testing is increasingly adopted by mature experimentation programs.

Interaction Effects: When two or more concurrent tests influence the same users, their combined effect may differ from that of each test run in isolation. Interaction effects are a key risk in high-velocity testing programs and require careful experiment planning or mutually exclusive groups.

Regression to the Mean: The statistical tendency of extreme results to move closer to the average over time. CRO professionals must be aware that a variant showing unusually large lift in early data may converge toward a more modest result as sample size increases.

Network Effects: In social or recommendation-based products, assigning users to different variants can create spillover effects where one user's experience is influenced by the variant their contacts are in. This violates the independence assumption of standard A/B tests and requires cluster-based randomization.

Experiment Documentation: The practice of systematically recording the hypothesis, setup, results, and learnings from each test in a shared repository. Organizations with strong documentation practices build institutional knowledge that compounds over time and avoids repeating failed experiments.

Conclusion

Mastering CRO terminology is not an academic exercise — it is a practical prerequisite for conducting rigorous experiments, communicating results clearly across teams, and building a culture of evidence-based decision making. Each term in this glossary represents a concept that, when misunderstood, can lead to wasted traffic, false conclusions, and missed revenue opportunities. Use this reference to audit your current vocabulary, align your team on shared definitions, and elevate the quality of every experiment you launch. The most effective CRO professionals are those who combine statistical rigor with behavioral intuition — and it starts with mastering the language fluently.

Tags: A/B Testing, CRO, Conversion Rate Optimization, Marketing Glossary, Terminology, Conversion Optimization, Statistical Testing, UX/UI, Analytics, Digital Strategy
