s-channel "tb"
Observation of Single Top Quark Production

The DØ Collaboration

March 2009
t-channel "tqb"



        Summary

  On March 4, 2009, the DØ Collaboration submitted a paper to Physical Review Letters announcing the first observation of single top quark production.
  It was accepted for publication on July 20 and appeared in the journal on August 24 as an "Editors' Suggestion" paper with an accompanying "Physics Synopsis" article.

        "Observation of Single Top-Quark Production"         arXiv:0903.0850         Phys. Rev. Lett. 103, 092001 (2009)

  The search history:
    •  "Evidence for Production of Single Top Quarks", March 2008, 47 pages, TOPCITE=50+
    •  "Evidence for Production of Single Top Quarks and First Direct Measurement of |Vtb|", December 2006, SPIRES TOPCITE=100+
    •  "Multivariate Searches for Single Top Quark Production with the DØ Detector", April 2006, 27 pages
    •  "Search for Single Top Quark Production in pp̄ Collisions at √s = 1.96 TeV", May 2005, SPIRES TOPCITE=50+
    •  "Search for Single Top Quark Production at DØ Using Neural Networks", June 2001, SPIRES TOPCITE=50+
    •  "Search for Electroweak Production of Single Top Quarks in pp̄ Collisions", August 2000, SPIRES TOPCITE=50+


        Contents

Abstract
Event Selection
Systematic Uncertainties
Signal-Background Discrimination
Combination of Results
Signal Plots
CKM Matrix Element Vtb
Observation Summary
Talks
In the News
More Images for Talks
Contacts



        Abstract

We present the results of a search for single top quark production in 2.3 fb⁻¹ of data at the Fermilab Tevatron proton-antiproton collider at 1.96 TeV center-of-mass energy. The events have an isolated high transverse-momentum electron or muon, together with missing transverse energy from the decay of a W boson from the top quark decay, and one or two bottom-quark jets. Some events have an additional light-quark jet. The predicted cross section for this process is 3.46 ± 0.18 pb for a top quark mass of 170 GeV. Our measurement is:

σ(pp̄ → tb + X, tqb + X) = 3.94 ± 0.88 pb

where "tb" stands for tb̄ + t̄b production, and "tqb" stands for tqb̄ + t̄qb production. The probability to measure a cross section at this value or higher in the absence of signal is 2.5 × 10⁻⁷, corresponding to a 5.0 standard deviation significance for the presence of signal. This is an unlikely enough occurrence (1 in 4 million) that our measurement meets the standard to be called an observation of a new physics process. The results of our analysis are illustrated in the plot to the right.
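
The correspondence between the quoted probability and the significance can be checked with a few lines of Python. This is only an illustrative sketch: it assumes the usual one-sided Gaussian convention for converting a p-value into standard deviations, and the function name is ours.

```python
from statistics import NormalDist

def significance_from_p_value(p_value: float) -> float:
    """One-sided Gaussian significance (in standard deviations) for a p-value."""
    # inv_cdf(p) is negative for small p; flip the sign to get the upper-tail z.
    return -NormalDist().inv_cdf(p_value)

p_value = 2.5e-7  # probability quoted in the text
z = significance_from_p_value(p_value)
odds = 1.0 / p_value  # "1 in N" chance of a background-only fluctuation
print(f"p = {p_value:.1e} -> {z:.2f} standard deviations, 1 in {odds:,.0f}")
```

Under that convention the result comes out just above 5 standard deviations, in line with the rounded 5.0 quoted above.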

We use the cross section measurement to make a direct measurement of the size of the CKM quark-mixing matrix element Vtb. We find |Vtb f1L| = 1.07 ± 0.12, where f1L is the strength of the left-handed Wtb coupling. Setting f1L = 1, we find |Vtb| > 0.78 at the 95% confidence level.
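
As a rough cross-check of the central value, one can assume that the single top cross section scales as |Vtb f1L|², so that the ratio of measured to predicted cross section gives |Vtb f1L|² directly. The sketch below uses only that assumption and the central values quoted on this page; it is not the procedure used in the analysis and it ignores all uncertainties.

```python
import math

sigma_measured_pb = 3.94   # measured tb+tqb cross section (from the text)
sigma_predicted_pb = 3.46  # predicted cross section for m_top = 170 GeV (from the text)

# Assumption: sigma scales as |Vtb f1L|^2, so the ratio estimates |Vtb f1L|^2.
vtb_f1l = math.sqrt(sigma_measured_pb / sigma_predicted_pb)
print(f"|Vtb f1L| ~ {vtb_f1l:.2f}")
```

The naive ratio lands close to the quoted central value of 1.07.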


        Parameters for the Measurement

The analysis is performed using a top quark mass of 170 GeV, which is close to the published world average value (171.2 GeV, Particle Data Book 2008 edition). The theoretical predictions for the cross sections of the two modes of single top quark production are 1.12 ± 0.04 pb for the s-channel tb mode and 2.34 ± 0.12 pb for the t-channel tqb mode (N. Kidonakis, Phys. Rev. D 74, 114012 (2006) with NNNLO-NLO matching, for mtop = 170 GeV and MRST2004 NNLO parton distribution functions). We do not use these values directly in the tb+tqb cross section measurement, but we assume the SM ratio of the processes, 1.12 : 2.34 ≈ 1 : 2.1, when measuring the signal acceptance, selecting events used to train the BDT and BNN discriminants, and generating pseudo-datasets for the linearity tests.

        Event Selection

The measurement uses data that pass almost any online trigger, for maximum efficiency. We require between two and four jets, exactly one high transverse momentum electron or muon isolated from all jets in the event, and high missing transverse energy. One or two of the jets must be b-tagged.

We model the single-top signal using the COMPHEP-SINGLETOP event generator coupled to PYTHIA for the underlying event and jet fragmentation. We model the tt̄, W+jets, and Z+jets backgrounds using ALPGEN with PYTHIA, with parton-jet matching. The small diboson (WW, WZ, ZZ) backgrounds are modeled with PYTHIA, and the small multijet backgrounds, where a jet has faked an electron or a muon from b decay has traveled wide of its jet, are modeled using data. The tt̄, Z+jets, and diboson backgrounds are normalized to the theory cross sections, and the W+jets and multijet backgrounds are normalized to data. The resulting event yields are shown in the tables below. The proportions of signal and background predicted in the data before and after b-tagging are shown in the pie charts.

After event selection, the signal acceptances (percentage of total cross section that passes the cuts) are (3.7 ± 0.5)% for the s-channel tb process and (2.5 ± 0.3)% for the t-channel tqb process. (The t-channel process has a lower acceptance because the second b-jet has low transverse momentum and is difficult to identify.) These acceptances are ~18% higher than in our previous analysis, mainly because of the change in choice of triggers from lepton+jets ones only to allowing data events to pass almost any trigger.

This analysis uses 85 million Monte Carlo events. After event selection, we have 0.5 million MC signal events, 1.4 million W+jets events, 1.6 million tt̄ events, and a few hundred thousand Z+jets and diboson events (4.1 million MC events in total), plus 0.8 million pretagged multijet data events (31 thousand with b-tags).
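
To give a feel for what the quoted acceptances imply, the sketch below multiplies predicted cross section, integrated luminosity, and acceptance to estimate the number of selected signal events. It uses central values only and assumes the acceptances apply to the full 2.3 fb⁻¹ dataset; the variable and function names are ours, and the yield tables remain the authoritative numbers.

```python
LUMINOSITY_PB_INV = 2300.0  # 2.3 fb^-1 expressed in pb^-1

# (predicted cross section in pb, acceptance) per process, central values from the text
channels = {
    "tb (s-channel)": (1.12, 0.037),
    "tqb (t-channel)": (2.34, 0.025),
}

def expected_events(cross_section_pb, acceptance, luminosity_pb_inv=LUMINOSITY_PB_INV):
    """Expected selected events = cross section x luminosity x acceptance."""
    return cross_section_pb * luminosity_pb_inv * acceptance

yields = {name: expected_events(xs, acc) for name, (xs, acc) in channels.items()}
total = sum(yields.values())

for name, value in yields.items():
    print(f"{name}: {value:.0f} events")
print(f"tb+tqb: {total:.0f} events")
```

Although the s-channel acceptance is higher, the larger t-channel cross section means tqb still contributes more selected events in this estimate.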


        Yields Before b-Tagging


        B-Jet Identification

We use a neural network b-tagging algorithm with two cut-points. The "tight" b-tagging ID used for single-tagged events has an efficiency of 40% for identifying b-jets, with a 9% probability to tag c-jets and 0.4% for light-quark jets. The "loose" b-tagging ID used for double-tagged events has an efficiency of 50% for identifying b-jets, with a 14% probability to tag c-jets and 1.5% for light-quark jets. (These efficiencies include the losses from the incomplete geometric acceptance of the Silicon Microstrip Tracker. For jets within the SMT acceptance, the efficiencies are 47% (b's), 10% (c's), and 0.5% (light-jets) for tight tagging, and 58% (b's), 17% (c's), and 1.8% (light-jets) for loose tagging.)

To model the b-tagging in Monte Carlo events, we parametrize the efficiency in "tag-rate functions" as a function of jet transverse momentum and jet pseudorapidity separately for each jet flavor, and then apply these probabilities to every combination of jets in every MC event (using "the permuter") to obtain a tag probability for each event.

Before b-tagging, the signal:background ratios vary from 1:300 in the 2-jets channel to 1:170 in the 4-jets channel (average 1:260). After b-tagging, this is improved to values ranging from 1:10 to 1:37, depending on the analysis channel; the most powerful channel (2 jets/1 tag) and the average of all channels both have S:B = 1:20.
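
The idea of applying per-jet tag probabilities to every combination of jets can be sketched as follows. This is not the permuter itself, whose internals are not described here. It is a minimal illustration that assumes the jets are tagged independently, uses flat per-flavor probabilities instead of tag-rate functions of transverse momentum and pseudorapidity, and ignores the tight/loose distinction between single and double tags. The function name and example event are hypothetical.

```python
from itertools import product

def tag_multiplicity_probabilities(jet_tag_probs):
    """Return {n_tags: probability} for one event, given per-jet tag probabilities."""
    result = {n: 0.0 for n in range(len(jet_tag_probs) + 1)}
    # Enumerate every tagged/untagged assignment of the jets.
    for pattern in product((False, True), repeat=len(jet_tag_probs)):
        weight = 1.0
        for tagged, p in zip(pattern, jet_tag_probs):
            weight *= p if tagged else (1.0 - p)
        result[sum(pattern)] += weight
    return result

# Hypothetical two-jet event: one b-jet (40%) and one light-quark jet (0.4%),
# using the tight-tag efficiencies quoted in the text as flat probabilities.
probs = tag_multiplicity_probabilities([0.40, 0.004])
for n_tags, p in probs.items():
    print(f"P({n_tags} tags) = {p:.4f}")
```

Weighting each MC event by such probabilities, rather than tagging or rejecting it outright, keeps the full simulated sample available for every tag multiplicity.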

        Yields After b-Tagging


        More Detailed Yield Information


        Cross Checks

We perform the analysis in 24 independent analysis channels (Run IIa, Run IIb; electron, muon; 2, 3, 4 jets; 1, 2 b-tags) to take advantage of the different signal:background ratios and dominant sources of background. In addition to checking the distributions of about 160 variables for data-background agreement in all analysis channels separately, before and after b-tagging, we also define two cross-check samples to check the background model components separately. The first sample has low total energy (exactly two jets and the total transverse energy HT(lepton,neutrino,alljets) < 175 GeV) and only one b-tagged jet, to maximize the W+jets content and minimize the top pairs contribution. The second sample has high total energy (exactly four jets and HT > 300 GeV) and one or two b-tagged jets, to maximize the top pairs component and minimize the W+jets contribution. We find good agreement for both normalization and shape in all variables studied. The W boson transverse mass distribution is shown here as an example.



        Systematic Uncertainties

The uncertainties in all searches are dominated by the statistical uncertainty from the size of the data sample. However, once there is enough data to observe and measure something, systematic contributions to the total uncertainty become important. The total uncertainty on the single top cross section measured in this observation analysis is ±22%. When we perform the calculation without including any systematics, the uncertainty is 18% (i.e., this is the statistical uncertainty). Thus, subtracting in quadrature, the systematic component of the total cross section uncertainty is approximately 13%. We consider both normalization systematic uncertainties and shape-dependent systematic uncertainties separately for each signal and background source in each analysis channel. The overall background uncertainty varies between 7% and 15% for the individual channels. Shape uncertainties result in 20% to 40% uncertainties in the discriminant output region near one. The following two tables show the sources of systematic uncertainty included in this measurement, in ranked order of contribution to the total cross section uncertainty. Other potential sources of systematic uncertainty were studied and found to have a negligible effect.
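
The arithmetic behind the approximate 13% can be reproduced in a few lines, assuming that the statistical and systematic components combine in quadrature. The variable names are ours.

```python
import math

total_pct = 22.0        # total uncertainty quoted in the text
statistical_pct = 18.0  # uncertainty obtained with systematics switched off

# Assumption: total^2 = statistical^2 + systematic^2.
systematic_pct = math.sqrt(total_pct**2 - statistical_pct**2)
print(f"systematic component ~ {systematic_pct:.1f}%")

# Cross-check of the total against the measured cross section 3.94 +- 0.88 pb.
relative_total_pct = 100.0 * 0.88 / 3.94
print(f"0.88 / 3.94 = {relative_total_pct:.1f}%")
```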



        Signal-Background Discrimination

We apply three methods to separate signal from background:
  • Boosted Decision Trees. A decision tree applies sequential cuts to the events but does not reject events that fail the cuts. Boosting averages the results over many trees and improves the performance by about 20%. We use 50 boosting cycles. The most important part of a boosted decision tree analysis is the choice of variables. We use the highest ranked (most discriminating) 64 variables from the 97 shown below (chosen to have good agreement between data and background in all analysis channels as well as having different distributions for signal and at least one background component). We use the same variables in all analysis channels.
  • Bayesian Neural Networks. A neural network is trained on signal and background samples to obtain weights between the network nodes and thresholds at the nodes. Bayesian neural networks average over a large number of such networks to improve the performance. We average over 100 networks. The most important part of a Bayesian neural network analysis is the choice of variables. We use the highest ranked 18–28 variables from the ones shown below, with a different set optimized for each analysis channel. The networks each have 20 hidden nodes.
  • Matrix elements. This method was pioneered by DØ in the top quark mass measurement. It uses the 4-vectors of the reconstructed lepton and jets (including the jet flavor information) and the Feynman diagrams for 2-jet and 3-jet events to compute an event probability density for the signal and background hypotheses. We use the matrix elements for 19 Feynman diagrams for signal and backgrounds (×2 for charge conjugate states) to separate them. To improve the performance, we split the data into events with low total transverse energy (that have mainly W+jets in the background) and events with high total transverse energy (that have most of the top pair background events).
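
To illustrate how boosting combines many weak trees into one discriminant, here is a toy sketch. It is not the analysis code: it assumes AdaBoost-style reweighting, uses one-cut "stumps" in place of full decision trees, and runs on a hypothetical one-dimensional dataset in which signal (+1) sits on both sides of the background (-1), so that no single cut separates them.

```python
import math

def stump_predict(x, threshold, polarity):
    """A one-cut 'tree': returns polarity below the threshold, -polarity otherwise."""
    return polarity if x < threshold else -polarity

def train_adaboost(xs, ys, n_cycles):
    """Return a list of (alpha, threshold, polarity); labels must be -1 or +1."""
    n = len(xs)
    weights = [1.0 / n] * n
    sorted_x = sorted(xs)
    # Candidate cuts: midpoints between neighbours, plus one cut above all points.
    thresholds = [(a + b) / 2 for a, b in zip(sorted_x, sorted_x[1:])]
    thresholds.append(sorted_x[-1] + 1.0)
    ensemble = []
    for _ in range(n_cycles):
        best = None
        for threshold in thresholds:
            for polarity in (+1, -1):
                error = sum(w for w, x, y in zip(weights, xs, ys)
                            if stump_predict(x, threshold, polarity) != y)
                if best is None or error < best[0]:
                    best = (error, threshold, polarity)
        error, threshold, polarity = best
        error = min(max(error, 1e-12), 1 - 1e-12)  # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        # Boost: misclassified events gain weight, correctly classified ones lose it.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for w, x, y in zip(weights, xs, ys)]
        norm = sum(weights)
        weights = [w / norm for w in weights]
    return ensemble

def discriminant(x, ensemble):
    """Weighted average of the stump votes, in the range [-1, +1]."""
    total_alpha = sum(alpha for alpha, _, _ in ensemble)
    return sum(a * stump_predict(x, t, p) for a, t, p in ensemble) / total_alpha

xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]               # hypothetical input variable
ys = [+1, +1, +1, -1, -1, -1, +1, +1, +1]      # +1 = signal, -1 = background
ensemble = train_adaboost(xs, ys, n_cycles=3)
for x, y in zip(xs, ys):
    print(x, y, round(discriminant(x, ensemble), 3))
```

In this toy, every single cut misclassifies at least three events, while the weighted average of three boosted cuts gives every event the correct sign. The real analysis uses full trees, many variables, and 50 boosting cycles.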

        Cartoons of the BDT and BNN Techniques and an Example of BNN Averaging


        The Matrix Elements


        Discriminating Variables

These tables show the variables used by the boosted decision trees and the Bayesian neural networks. (Plots of all variables are at the bottom of the page.) Some comments on the notation are in order:
  • The numbering n of jetn, tagn, lightn, etc. refers to the transverse momentum ordering of the jets: 1 is the highest-pT jet of that type, 2 is the second-highest, and so on.
  • "tag" means a b-tagged jet.
  • "light" means an untagged jet (one that failed the b-tag criteria).
  • "best" means the jet which, when combined with the lepton and missing transverse energy, produces a reconstructed top quark mass closest to 170 GeV (the value at which we did the analysis).
  • "notbest" means any jet that is not the best jet.
  • "alljets" means that all the jets in the event (2, 3, or 4 of them) are included in the global variable.
  • pT is the transverse momentum, E is the particle energy, and Q is the particle's charge.
  • H is the scalar sum of the particles' energies, and HT is the scalar sum of the transverse energies.
  • M is the invariant mass of the objects, and MT is the transverse mass of the objects.
  • √ŝ is the total center-of-mass energy in the event.
  • pTrel is the transverse momentum of the muon relative to the closest jet.
  • S1 and S2 are the two solutions for the neutrino longitudinal momentum when solving the W boson mass equation; S1 is the one with the smaller absolute value (the preferred value).
  • MtopΔMmin is the reconstructed top quark mass using the jet and neutrino solution that make the mass closest to 170 GeV.
  • ΔMtopmin is the difference in GeV between MtopΔMmin and 170 GeV.
  • Mtopsig is the reconstructed top quark mass using the jet and neutrino solution that gives the lowest value for "significance," where Significancemin(Mtop) is the natural logarithm of the jet and missing transverse energy resolution functions calculated at Mtop divided by the resolution functions at 170 GeV.
  • ΔR is √(Δφ² + Δη²).
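
The two neutrino solutions S1 and S2 come from a quadratic equation. The sketch below shows one standard way to obtain them, under assumptions that are ours rather than stated on this page: the lepton and neutrino are treated as massless, the missing transverse energy vector is taken as the neutrino transverse momentum, the W mass is fixed (80.4 GeV by default), and a negative discriminant is handled by keeping only the real part. The function name and the example kinematics are hypothetical.

```python
import math

def neutrino_pz_solutions(lepton, met_x, met_y, m_w=80.4):
    """Solve the W mass constraint for the neutrino longitudinal momentum.

    lepton is (px, py, pz) in GeV; (met_x, met_y) is the missing transverse
    energy vector in GeV. Returns (S1, S2) with |S1| <= |S2|.
    """
    px, py, pz = lepton
    energy = math.sqrt(px * px + py * py + pz * pz)
    pt2_lepton = px * px + py * py
    pt2_nu = met_x * met_x + met_y * met_y
    mu = 0.5 * m_w * m_w + px * met_x + py * met_y
    disc = mu * mu - pt2_lepton * pt2_nu
    # Assumption: if the discriminant is negative, keep only the real part.
    root = energy * math.sqrt(max(disc, 0.0))
    solutions = [(mu * pz + sign * root) / pt2_lepton for sign in (+1, -1)]
    s1, s2 = sorted(solutions, key=abs)
    return s1, s2

# Hypothetical W -> lepton + neutrino decay used to check the solver.
lepton = (30.0, 0.0, 40.0)
true_nu = (-20.0, 10.0, 25.0)
e_lepton = math.sqrt(sum(c * c for c in lepton))
e_nu = math.sqrt(sum(c * c for c in true_nu))
dot = sum(a * b for a, b in zip(lepton, true_nu))
m_w_true = math.sqrt(2.0 * (e_lepton * e_nu - dot))  # invariant mass of the pair

s1, s2 = neutrino_pz_solutions(lepton, true_nu[0], true_nu[1], m_w=m_w_true)
print(f"S1 = {s1:.2f} GeV, S2 = {s2:.2f} GeV (true pz = {true_nu[2]:.2f} GeV)")
```

In this example the true longitudinal momentum is recovered as one of the two solutions, and it happens to be the one with the smaller absolute value.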


        Which Variables are Most Powerful?


        Example Variable Distributions for Each Variable Category


        Discriminant Output Transformation

All raw discriminant output distributions undergo a monotonic transformation of the binning to ensure that every bin (50 in each distribution) has at least 40 background events, so that there are no bins with a nonzero signal prediction or data but not enough background in the model to use that information. The bins from the matrix elements outputs are then also reordered in descending signal:background ratio from 1 towards 0. After transformation, analysis channels with lower statistics do not have entries in all 50 bins; the filled bins start at one and end before reaching zero. The following plots illustrate the output transformation process for one channel in the boosted decision trees analysis.
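
The requirement of a minimum background content per bin can be illustrated with a simple merging procedure. This is only one possible way to satisfy such a constraint and is not the transformation used in the analysis, whose details are not given here. It assumes a finely binned background histogram and merges neighbouring bins, starting from the signal-like end near one. The function name and the example counts are hypothetical.

```python
def merge_bins_for_min_background(background_counts, min_background=40.0):
    """Group fine bins, scanning from the high-output end, so that each group
    holds at least min_background background events.

    Returns (start, stop) index ranges (stop exclusive), ordered from the
    high-output end to the low-output end.
    """
    groups = []
    stop = len(background_counts)
    running = 0.0
    for i in range(len(background_counts) - 1, -1, -1):
        running += background_counts[i]
        if running >= min_background:
            groups.append((i, stop))
            stop = i
            running = 0.0
    # Leftover low-output bins that never reached the minimum join the last group.
    if stop > 0:
        if groups:
            _, last_stop = groups.pop()
            groups.append((0, last_stop))
        else:
            groups.append((0, stop))
    return groups

# Hypothetical background histogram: index 0 is output near zero, index 9 near one.
counts = [500, 300, 120, 60, 30, 15, 8, 4, 2, 1]
groups = merge_bins_for_min_background(counts)
for start, stop in groups:
    print(f"fine bins {start}..{stop - 1}: {sum(counts[start:stop])} background events")
```

Because merging only ever joins adjacent bins, the ordering of events by discriminant output is preserved, which is the sense in which such a rebinning is monotonic.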

        Discriminant Performance

The three plots on the right show that each of the discriminant methods is able to accurately measure the single top cross section. These plots were produced using eight ensembles of pseudo-data. Each pseudo-dataset contains signal and background events and their uncertainties that model the real 2.3 fb⁻¹ dataset. Each ensemble contains thousands of pseudo-datasets. The difference between the ensembles is the cross section value chosen for the single top events. This input cross section is reproduced by each discriminant analysis, as illustrated in the left-hand plot for the boosted decision tree discriminants and the ensemble with SM signal cross section.
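
The logic of such a linearity test can be sketched with a toy counting experiment. This is far simpler than the analysis, which measures the cross section from the binned discriminant outputs with systematic uncertainties. Here each pseudo-dataset is a single Poisson-fluctuated event count, and the signal yield per pb, the background level, the input cross sections, and the ensemble size are all hypothetical.

```python
import math
import random

def poisson(rng, mean):
    """Draw a Poisson variate with Knuth's algorithm (fine for means of a few hundred)."""
    limit = math.exp(-mean)
    count, running = 0, rng.random()
    while running > limit:
        count += 1
        running *= rng.random()
    return count

def mean_measured_cross_section(input_xs_pb, n_pseudo, rng,
                                events_per_pb=50.0, background=200.0):
    """Average the counting-experiment measurement over one ensemble."""
    total = 0.0
    for _ in range(n_pseudo):
        observed = poisson(rng, background + events_per_pb * input_xs_pb)
        total += (observed - background) / events_per_pb
    return total / n_pseudo

rng = random.Random(12345)
input_values_pb = [2.0, 3.46, 5.0, 7.0]  # hypothetical inputs; 3.46 pb is the SM value
response = {xs: mean_measured_cross_section(xs, 1000, rng) for xs in input_values_pb}
for xs, measured in response.items():
    print(f"input {xs:.2f} pb -> mean measured {measured:.2f} pb")
```

A method passes the test if the mean measured value tracks the input value across the ensembles, which is what this toy estimator does by construction.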


        Discriminant Outputs for the Cross-Check Samples
