1 Introduction

Supersymmetry (SUSY) [1,2,3,4,5,6] is a generalisation of space-time symmetries that predicts new bosonic partners for the fermions of the Standard Model (SM) and new fermionic partners for its bosons. In SUSY models, if R-parity is conserved [7], SUSY particles are produced in pairs and the lightest supersymmetric particle (LSP) is stable. The scalar partners of the left- and right-handed quarks, the squarks \(\tilde{q}_{\textrm{L}} \) and \(\tilde{q}_{\textrm{R}},\) can mix to form two mass eigenstates \(\tilde{q}_1\) and \(\tilde{q}_2,\) ordered by increasing mass. SUSY can reduce unnatural tuning in the Higgs sector by orders of magnitude [8,9,10,11] provided that the superpartners of the top quark (the top squarks, \(\tilde{t} _1\) and \(\tilde{t} _2\)) have masses not too far above the weak scale [12]. Because of the SM weak-isospin symmetry, the mass of the lighter bottom squark \(\tilde{b}_1 \) is also expected to be close to the weak scale. The fermionic partners of the gluons, the gluinos \((\tilde{g}),\) are also motivated by naturalness [13] to have a mass around the TeV scale in order to limit their contributions to the radiative corrections to the top squark masses. For these reasons, and because gluinos are expected to be pair-produced with a high cross-section at the Large Hadron Collider (LHC) [14], the search for gluino production with decays via top and bottom squarks is highly motivated.

This paper presents a search for pair-produced gluinos decaying via top or bottom squarks in events with multiple jets containing b-hadrons (b-jets in the following), high missing transverse momentum of magnitude \(E_{\text {T}}^{\text {miss}}\), and potentially additional jets and/or an isolated electron or muon (referred to as ‘leptons’ hereafter). The results constitute an update of those obtained using 36.1 fb\(^{-1}\) of proton–proton (pp) collision data [15] from the ATLAS detector [16]. They exploit an expanded dataset of 139 fb\(^{-1}\) of pp collision data acquired at a centre-of-mass energy of 13 TeV. To make best use of the expanded dataset, the simple kinematic selections used in the earlier analysis have been re-optimised, and a new event selection based upon a deep neural network optimised to discern the gluino signatures from background is employed. The latter optimally combines selections requiring zero leptons or one lepton.

Interpretations are provided in the context of several simplified models [17,18,19] probing gluino decays into the LSP via off-shell top or bottom squarks. In these models, the LSP is assumed to be the lightest neutralino \(\tilde{\chi }^0_1,\) a linear superposition of the superpartners of the neutral electroweak and Higgs bosons. One model also features the lighter charginos \(\tilde{\chi }^\pm _1,\) which are linear superpositions of the superpartners of the charged Higgs and SM electroweak bosons. Several benchmark simplified model scenarios studied in the earlier instances of the analysis [15, 20] are considered: two models, referred to as ‘Gtt’ and ‘Gbb’ respectively, which feature exclusively gluino decays to the LSP via off-shell top or bottom squarks (Fig. 1), and a third model, referred to as ‘Gtb’, with variable branching ratios for \(\tilde{g} \rightarrow t\bar{t}\tilde{\chi }^0_1,\) \(\tilde{g} \rightarrow b\bar{b}\tilde{\chi }^0_1\) and \(\tilde{g} \rightarrow t\bar{b}\tilde{\chi }^-_1\)/\(\bar{t}b\tilde{\chi }^+_1\) (Fig. 2 shows the additional decay processes that this model permits). In the Gtb models the mass difference between the \(\tilde{\chi }^\pm _1\) and the \(\tilde{\chi }^0_1\) is fixed to a small value (2 \(\text {GeV}\)), motivated by natural SUSY models in which the \(\tilde{\chi }^0_1\) is an almost pure higgsino (see e.g. Ref. [12]).

Pair-produced gluinos with top-squark-mediated decays have also been sought in events containing either pairs of same-sign leptons or three leptons [21, 22]. The same-sign/three-lepton search is comparable in sensitivity to the search presented in this paper only when the masses of the gluino and the LSP are of similar magnitude. Sensitivity to such scenarios is also obtained by searching for events with large jet multiplicity, \(N_\text {jet} \ge \text {7--12}\) [23]. Similar searches for pair-produced gluinos by the CMS experiment with 36–137 fb\(^{-1}\) of 13 \(\text {TeV}\) collisions [24,25,26,27,28,29,30] produced results comparable to the previous ATLAS results.

This paper is organised as follows. Section 2 describes the ATLAS detector, and Sect. 3 describes the data and simulated event samples used in the analysis. Section 4 introduces the event reconstruction methodology, and Sect. 5 introduces the analysis strategy. The event selection is discussed in Sect. 6, and systematic uncertainties in Sect. 7. The results of the analysis are presented and interpreted in Sect. 8. Section 9 gives the conclusions.

Fig. 1

The decay processes in the (a) Gtt and (b) Gbb simplified models

Fig. 2

The additional decay processes permitted by the variable gluino branching ratio (Gtb) model, in addition to those shown in Fig. 1. In diagram (a), both gluinos decay via \(\tilde{g} \rightarrow t\bar{b}\tilde{\chi }^-_1\) with \(\tilde{\chi }^-_1 \rightarrow \bar{f}f'\tilde{\chi }^0_1.\) In diagrams (b) and (c), only one gluino decays via the \(\tilde{\chi }^\pm _1\) while the other decays via \(\tilde{g} \rightarrow t\bar{t}\tilde{\chi }^0_1\) or \(\tilde{g} \rightarrow b\bar{b}\tilde{\chi }^0_1,\) respectively. In diagram (d), one gluino decays via \(\tilde{g} \rightarrow t\bar{t}\tilde{\chi }^0_1\) and the other via \(\tilde{g} \rightarrow b\bar{b}\tilde{\chi }^0_1.\) In each case the charge conjugate processes are implied. The fermions originating from the \(\tilde{\chi }^\pm _1\) decay have low momentum and are not detected because the mass difference between the \(\tilde{\chi }^\pm _1\) and the \(\tilde{\chi }^0_1\) is fixed to 2 \(\text {GeV}\)

2 ATLAS detector

The ATLAS detector is a multipurpose particle physics detector with a forward–backward symmetric cylindrical geometry and nearly 4\(\pi \) coverage in solid angle.\(^{1}\) The inner tracking detector (ID) consists of silicon pixel and microstrip detectors covering the pseudorapidity region \(|\eta |<2.5,\) surrounded by a transition radiation tracker, which enhances electron identification in the region \(|\eta |<2.0.\) The ID is surrounded by a thin superconducting solenoid providing an axial 2 T magnetic field and by a fine-granularity lead/liquid-argon (LAr) electromagnetic calorimeter covering \(|\eta |<3.2.\) A stainless-steel/scintillator tile calorimeter provides coverage for hadronic showers in the central pseudorapidity range \((|\eta |<1.7).\) The endcaps \((1.5<|\eta |<3.2)\) of the hadronic calorimeter are made of LAr active layers with copper as the absorber material. The forward region \((3.1< |\eta | < 4.9)\) is instrumented with a LAr calorimeter for both the electromagnetic and hadronic measurements. A muon spectrometer with an air-core toroidal magnet system surrounds the calorimeters. Three layers of high-precision tracking chambers provide coverage in the range \(|\eta |<2.7,\) while dedicated fast chambers allow triggering in the region \(|\eta |<2.4.\) The ATLAS trigger system [31] consists of a hardware-based level-1 trigger followed by a software-based high-level trigger (HLT). An extensive software suite [32] is used in data simulation, in the reconstruction and analysis of real and simulated data, in detector operations, and in the trigger and data acquisition systems of the experiment.

3 Data and simulated event samples

The data analysed in this paper were collected between 2015 and 2018 at a centre-of-mass energy of 13 \(\text {TeV}\) with a 25 ns proton bunch crossing interval. The average number of pp interactions per bunch crossing (pile-up) ranged from 13 in 2015 to around 38 in 2017–2018. Application of beam, detector and data-quality criteria [33] results in a total integrated luminosity of 139 fb\(^{-1}\). The uncertainty in the combined 2015–2018 integrated luminosity is 1.7% [34], obtained using the LUCID-2 detector [35] for the primary luminosity measurements and cross-checked by a suite of other systems.

Events are required to pass an \(E_{\text {T}}^{\text {miss}} \) trigger [31, 36] with thresholds of 70 GeV, 100 GeV and 110 GeV in the HLT for the 2015, early 2016, and late 2016–2018 datasets, respectively. These triggers are fully efficient for events passing the preselection defined in Sect. 6.1, which requires the offline reconstructed \(E_{\text {T}}^{\text {miss}} \) to exceed 200 \(\text {GeV}\)  [36].

Samples of Monte Carlo (MC) simulated events are used to model the signal and background processes in this analysis, except multijet processes, which are estimated using a data-driven method (Sect. 5). The MC simulation strategy is largely the same as that described in Ref. [15], with one exception, which is described below. A summary of the generators used can be found in Table 1.

Table 1 List of MC generators used to simulate different signal and background processes. The third column (‘Tune’) describes the tuned set of underlying event and hadronisation parameters, the fourth column the PDF set used, and the fifth column the perturbative accuracy in the strong coupling constant used for the calculation of the cross-section used to normalise the sample

The only significant change with respect to Ref. [15] concerns the modelling of the dominant background in the signal regions, namely the production of \(t\bar{t} \) pairs with additional high transverse momentum (\(p_{\text {T}}\)) jets. It was simulated using the Powheg Box [54] v2 event generator and the NNPDF3.0NLO [55] PDF set with \(\alpha _{\text {s}} (m_Z^2)=0.118.\) The parton shower, fragmentation, and the underlying event were simulated using Pythia 8.230 [56]. The \(h_{\textrm{damp}}\) parameter in Powheg, which controls the \(p_{\text {T}}\) of the first additional emission beyond the Born level and thus regulates the \(p_{\text {T}}\) of the recoil emission against the \(t\bar{t} \) system, was set to 1.5 times the mass of the top quark (assumed to be \(m_\textrm{top} = 172.5\) GeV) [57].

Other changes in generator settings for the modelling of sources of minor backgrounds were found not to affect the sensitivity of this analysis significantly. All simulated background processes are normalised using the best available theoretical calculation for their respective cross-sections.

The SUSY signal samples are normalised using cross-section calculations at next-to-leading order (NLO) in the strong coupling constant, adding the resummation of soft gluon emission at next-to-leading-logarithm (NLL) accuracy [40,41,42,43,44]. The masses of the top and bottom squarks are assumed to be much greater than that of the gluino, such that the gluinos undergo three-body decays via off-shell squarks with phase-space kinematics. For the Gtb benchmark models, the mass difference between the \(\tilde{\chi }^\pm _1\) and the \(\tilde{\chi }^0_1\) is fixed to 2 \(\text {GeV}\). For each signal model, the nominal cross-section and its uncertainty are taken from an envelope of cross-section predictions using different parton shower models and factorisation and renormalisation scales, as described in Ref. [45].

The EvtGen program [58] was used to describe the properties of the b- and c-hadron decays in the signal samples and in the background samples, except those produced with Sherpa. For all samples the response of the detector to particles was modelled with the full ATLAS detector simulation [59] based on Geant4  [60]. All simulated events were overlaid with multiple pp collisions simulated with Pythia 8 using the A3 tune [38] and the NNPDF2.3LO PDF set. The MC samples were generated with variable levels of pile-up in the same and neighbouring bunch crossings, and were reweighted to match the distribution of the mean number of interactions observed in data in 2015–2018.

4 Event reconstruction

Events are required to have a primary vertex [61, 62] reconstructed from at least two tracks [63] with \(p_{\text {T}} >500~\text {MeV}.\) Among the vertices found, the vertex with the largest summed \(p_{\text {T}} ^2\) of the associated tracks [62] is designated as the primary vertex.

Jets are built from topological clusters of energy in the calorimeter [64], calibrated to the electromagnetic scale, using the anti-\(k_t\) algorithm [65, 66] with radius parameter \(R=0.4.\) Jet energy scale corrections, derived from MC simulation and data, are used to calibrate the average energies of jet candidates to the scale of their constituent particles [67]. Remaining differences between data and simulated events are evaluated and corrected for using in situ techniques, which exploit the transverse momentum balance between a jet and a reference object such as a photon, Z boson, or multijet system in data. After these calibrations, all jets in the event with \(p_{\text {T}} > 30\) \(\text {GeV}\) and \(|\eta | < 2.8\) must satisfy a set of loose jet-quality requirements [68]. These requirements are designed to reject jets originating from sporadic bursts of detector noise, large coherent noise or isolated pathological cells in the calorimeter system, hardware issues, beam-induced background or cosmic-ray muons [69]. If these jet requirements are not met, the event is discarded. If the event is retained, only the jets with \(p_{\text {T}} > 30\) \(\text {GeV}\) and \(|\eta | < 2.8\) are considered by the analysis. In addition, the ‘medium’ working point of the track-based jet vertex tagger [70, 71] must be satisfied for jets with \(p_{\text {T}} <120\) \(\text {GeV}\) and \(|\eta |<2.5,\) to reject jets that originate from pile-up interactions.

Jets which contain b-hadrons and are within the inner-detector acceptance \((|\eta |<2.5)\) are identified and ‘b-tagged’ using a multivariate algorithm (‘MV2c10’) that exploits the impact parameters\(^{2}\) of the charged-particle tracks, the presence of secondary vertices, and the reconstructed flight paths of b- and c-hadrons inside the jet [72]. The output of the multivariate algorithm is a single b-tagging score, which signifies the likelihood of a jet to contain b-hadrons. For the chosen selection working point applied to this score, the average identification efficiency for jets containing b-hadrons is \(77\%,\) determined with simulated \(t\bar{t} \) events. Using the same simulated sample, a rejection factor of approximately 110 (5) is obtained for jets initiated by light quarks and gluons (charm quarks). Differences in efficiency and mis-tag rate between data and MC simulation are taken into account with correction factors as described in Ref. [72].

Electron candidates are reconstructed from clusters of energy deposits in the electromagnetic calorimeter that are matched to a track in the inner detector [73]. They are required to have \(|\eta |<2.47\) and \(p_{\text {T}} >20\) \(\text {GeV}\), and must meet ‘Loose’ likelihood-based identification criteria [73]. The impact parameter along the beam direction is required to satisfy \(|z_{0} \sin (\theta ) | < 0.5\) mm. The electromagnetic shower of an electron can also be reconstructed as a jet such that a procedure (‘overlap removal’) is required to resolve this ambiguity. In the case where the separation \({\Delta }{R_{y}}\) (\({\Delta }{R_{y}}\equiv \sqrt{(\Delta y)^2+(\Delta \phi )^2},\) for rapidity y) between an electron candidate and a non-b-tagged (b-tagged) jet is \({\Delta }{R_{y}}< 0.2,\) the candidate is considered to be an electron (b-tagged jet). This procedure uses a b-tagged jet definition that is looser than that described earlier, to avoid selecting electrons from heavy-flavour hadron decays. If the separation between an electron candidate and any jet satisfies \(0.2<{\Delta }{R_{y}}< 0.4,\) the candidate is considered to be a jet, and the electron candidate is removed.

Muons are reconstructed by matching tracks in the inner detector to tracks in the muon spectrometer and must have \(|\eta |<2.5\) and \(p_{\text {T}} >20\) \(\text {GeV}\)  [74]. The impact parameter along the beam direction is required to satisfy \(|z_{0} \sin (\theta ) | < 0.5\) mm. Events containing muons identified as originating from cosmic rays, with \(|d_0| > 0.2\) mm and \(|z_0| > 1\) mm, or as being poorly reconstructed, with \(\sigma (q/p)/|(q/p)| > 0.2,\) are removed. Here, \(\sigma (q/p)/|(q/p)|\) is a measure of the momentum uncertainty for a particle with charge q. Muons are discarded if they are within \({\Delta }{R}= 0.4\) of jets that survive the electron–jet overlap removal, except when the number of tracks associated with the jet is less than three, in which case the muon is kept and the jet is discarded. This last exception retains muons which have undergone bremsstrahlung.
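The overlap-removal logic of the two preceding paragraphs can be summarised in a short sketch. This is not ATLAS code: objects are represented by plain dictionaries with rapidity ‘y’ and azimuth ‘phi’, and the jet ‘btagged’ flag stands for the looser b-tagging definition mentioned above.

```python
import math

def delta_r_y(a, b):
    """Rapidity-azimuth separation, with the azimuthal difference wrapped to [-pi, pi]."""
    dphi = math.remainder(a["phi"] - b["phi"], 2.0 * math.pi)
    return math.hypot(a["y"] - b["y"], dphi)

def overlap_removal(electrons, muons, jets):
    # Electron-jet: within dR_y < 0.2 the jet is removed unless it is b-tagged;
    # any electron still within dR_y < 0.4 of a surviving jet is then removed.
    jets = [j for j in jets
            if j["btagged"] or all(delta_r_y(e, j) >= 0.2 for e in electrons)]
    electrons = [e for e in electrons
                 if all(delta_r_y(e, j) >= 0.4 for j in jets)]
    # Muon-jet: for overlapping pairs within dR < 0.4, the jet is dropped if it
    # has fewer than three associated tracks, otherwise the muon is dropped.
    kept_muons = []
    for m in muons:
        close = [j for j in jets if delta_r_y(m, j) < 0.4]
        low_track = [j for j in close if j["ntracks"] < 3]
        jets = [j for j in jets if j not in low_track]
        if not any(j["ntracks"] >= 3 for j in close):
            kept_muons.append(m)
    return electrons, kept_muons, jets
```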

After resolving the overlap with leptons, the candidate \(R=0.4\) jets are reclustered [75] into larger-radius jets using the anti-\(k_{t}\) algorithm with a radius parameter of 0.8. The calibration from the input \(R=0.4\) jets propagates directly to the reclustered jets. These reclustered jets are then trimmed [75,76,77,78] by removing subjets with \(p_{\text {T}}\) below 10% of the \(p_{\text {T}}\) of the original reclustered jet. The resulting larger-radius jets are required to have \(p_{\text {T}} >100\) GeV and \(|\eta |<2.0.\) No additional overlap removal procedure is applied to such jets after reclustering. When it is not explicitly stated otherwise, in this paper the term ‘jets’ refers to the smaller-radius \(R=0.4\) jets, while the reclustered larger-radius \(R=0.8\) jets are called ‘large-R jets’.
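A minimal sketch of the trimming step is given below, operating on the input \(R=0.4\) subjets of one reclustered jet, represented as \((p_{\text {T}},\eta ,\phi ,m)\) tuples. It is illustrative only, assumes consistent units (GeV), and omits the jet-finding itself.

```python
import math

def to_p4(pt, eta, phi, m):
    """Convert (pt, eta, phi, m) to Cartesian four-momentum components."""
    px, py, pz = pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta)
    return px, py, pz, math.sqrt(px**2 + py**2 + pz**2 + m**2)

def trim_reclustered_jet(subjets, f_cut=0.10):
    """Remove subjets below f_cut of the reclustered-jet pT; return trimmed (pT, mass)."""
    px, py, pz, e = map(sum, zip(*(to_p4(*s) for s in subjets)))
    jet_pt = math.hypot(px, py)                       # untrimmed reclustered-jet pT
    kept = [s for s in subjets if s[0] > f_cut * jet_pt]
    px, py, pz, e = map(sum, zip(*(to_p4(*s) for s in kept)))
    pt = math.hypot(px, py)
    mass = math.sqrt(max(e**2 - px**2 - py**2 - pz**2, 0.0))
    return pt, mass
```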

The requirements on electrons and muons are tightened for the selection of events in signal regions and background control regions requiring at least one electron or muon (Sect. 5). The electrons and muons passing the tight selection are called ‘signal’ electrons or muons in the following, as opposed to ‘baseline’ electrons and muons, which need only pass the requirements described above. Signal electrons (muons) must satisfy the ‘FCLoose’ [73, 79] (‘FixedCutTightTrackOnly’ [74, 80]) \(p_{\text {T}} \)-dependent track-based and calorimeter-based isolation criteria. The calorimeter-based isolation is determined by taking the ratio of the sum of energy deposits in a cone of \(\Delta R=0.2\) around the electron or muon candidate to the sum of energy deposits associated with the electron or muon. The track-based isolation is estimated in a similar way but using a variable cone size with a maximum value of \(\Delta R=0.2\) for electrons and \(\Delta R=0.3\) for muons. Signal electrons are also required to pass a ‘Tight’ likelihood-based selection [73, 79]. The impact parameter of the electron in the transverse plane is required to be less than five times the transverse impact parameter uncertainty \((\sigma _{d_0}).\) Further selection criteria are also imposed on signal muons: muon candidates are required to pass a ‘Medium’ quality selection and meet a \(|d_0| < 3\sigma _{d_0}\) requirement [74, 80].

The missing transverse momentum \(\vec {p}_{\textrm{T}}^{\textrm{miss}},\) with magnitude \(E_{\text {T}}^{\text {miss}},\) is defined as the negative vector sum of the \(p_{\text {T}}\) of all selected and calibrated electrons, muons and jets in the event, with an extra term added to account for energy in the event that is not associated with any of these objects [81]. This last ‘soft term’ contribution is calculated from ID tracks with \(p_{\text {T}} >500~\text {MeV} \) which are matched to the primary vertex, thus ensuring that this contribution is robust against pile-up contamination, and which are not associated with selected objects [81, 82].

This analysis does not consider the contribution of reconstructed photons or hadronically decaying \(\tau \)-leptons when considering possible overlaps with other objects. They are also not included explicitly in the calculation of \(\vec {p}_{\textrm{T}}^{\textrm{miss}},\) but the associated energy deposits contribute to this calculation via the overlapping reconstructed jets.

5 Analysis strategy

Evidence for the presence of SUSY signal events in the data sample is sought by selecting events populating multiple signal regions (SRs), which are expected to be enriched in such events on the basis of simulation studies. The expected yields of major SM background processes in the SRs are determined with a profile likelihood fit (Sect. 8) using MC samples with normalisations constrained to data in dedicated SR-dependent control regions (CRs). The yields of subdominant backgrounds are estimated directly from MC samples, except in the case of multijet backgrounds, where a data-driven procedure is used. The accuracy of the background estimation procedure is verified by comparing data with background predictions in validation regions (VRs) with low signal contamination, which are located between the CRs and SRs in the multidimensional space of event selection variables.

Signal regions are defined using two alternative methodologies. The first methodology, the ‘cut-and-count’ (CC) analysis, defines SRs by applying selections independently to a series of observables sensitive to differences in kinematics between signal and background (‘discriminating variables’). These are expected to provide rejection of SM background events while retaining events from a broad range of Gtt, Gbb and Gtb signal models to provide maximum discovery power. This methodology is well suited to subsequent reinterpretation of the results in the context of other theories not considered in this paper. The CC SR event selection criteria fall into two broad categories targeting final states which contain no leptons or at least one lepton (referred to as ‘0-lepton’ and ‘1-lepton’ SRs henceforth). The second event selection methodology, the neural network (NN) analysis, classifies events using a supervised machine-learning technique in which correlations between discriminating variables are further exploited to maximise exclusion power for specific Gtt and Gbb signal models. The event selection for each NN SR optimally selects a mixture of events containing different lepton multiplicities, depending on the values of the other discriminating variables. A set of CRs and VRs is associated with each CC or NN SR. The event selection criteria are discussed in Sect. 6.

5.1 Discriminating variables

The following discriminating variables are used in the CC and NN SR, CR and VR event selections, in addition to more widely used variables such as \(E_{\text {T}}^{\text {miss}} \) and the momenta and multiplicities of jets, b-jets and leptons.

The ‘inclusive effective mass’ \((m_{\textrm{eff}})\) is defined by the following scalar sum:

$$\begin{aligned} m_{\textrm{eff}} = \sum _{i} p_{\text {T}} ^{{\textrm{jet}}_{i}} + \sum _{j} p_{\text {T}} ^{\ell _j} + E_{\text {T}}^{\text {miss}}, \end{aligned}$$

where the first and second sums run over the selected jets \((N_\text {jet})\) and signal leptons \((N_{\textrm{lepton}}),\) respectively. This variable is correlated with the invariant mass scale of the final-state particles in the event, and typically takes a higher value for pair-produced gluino events than for SM background events with lower mass scales.
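As an illustration, a minimal Python sketch of this scalar sum is given below; it is not ATLAS analysis code, and the function name and GeV inputs are purely illustrative.

```python
def m_eff(jet_pts, lepton_pts, met):
    """Inclusive effective mass: scalar sum of jet pT, signal-lepton pT and E_T^miss (GeV)."""
    return sum(jet_pts) + sum(lepton_pts) + met

# Example: a hypothetical event with four jets, one lepton and 300 GeV of missing pT.
print(m_eff([250.0, 180.0, 90.0, 45.0], [60.0], 300.0))  # 925.0 GeV
```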

In regions with at least one selected lepton, the transverse mass \(m_{\textrm{T}}\) is calculated from the \(p_{\text {T}} \) of the leading selected lepton \((\ell )\) and \(E_{\text {T}}^{\text {miss}},\) and is defined as:

$$\begin{aligned} m_{\textrm{T}}= \sqrt{2p_{\text {T}} ^\ell E_{\text {T}}^{\text {miss}} \{1-\cos [\Delta \phi (\vec {p}_{\textrm{T}}^{\textrm{miss}},\vec {p}_{\textrm{T}}^{\ell })]\}}. \end{aligned}$$

This variable is used to reject \(t\bar{t} \) and W+jets background events in which one W boson decays leptonically. The \(m_{\textrm{T}}\) distribution for these backgrounds, assuming on-mass-shell W bosons, has an upper bound corresponding to the W boson mass, leading to a Jacobian edge in the observed distribution, and typically has higher values for Gtt signal model events. In addition, the minimum transverse mass formed by \(E_{\text {T}}^{\text {miss}} \) and any of the three highest-\(p_{\text {T}}\) b-tagged jets in the event, \(m_{{\textrm{T, min}}}^{b{\text {-jets}}},\) is used in regions with any lepton multiplicity:

$$\begin{aligned} m_{{\textrm{T, min}}}^{b{\text {-jets}}} = {\textrm{min}}_{i\le 3} \left( \sqrt{2p_{\text {T}} ^{b{\text {-jet}}_i} E_{\text {T}}^{\text {miss}} \{1-\cos [\Delta \phi (\vec {p}_{\textrm{T}}^{\textrm{miss}}, \vec {p}_{\textrm{T}}^{b{\text {-jet}}_i} )]\}} \right) . \end{aligned}$$

The \(m_{{\textrm{T, min}}}^{b{\text {-jets}}} \) distribution has an upper bound corresponding to \(\sqrt{m_{\textrm{top}}^2 - m_W^2}\) for \(t\bar{t} \) events with a single leptonic W boson decay, while extending to higher values for signal events.
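The two transverse-mass variables reduce to simple functions of the lepton or b-jet kinematics and \(\vec {p}_{\textrm{T}}^{\textrm{miss}}\); a minimal sketch follows (not ATLAS code), with illustrative function names and \((p_{\text {T}},\phi )\) inputs in GeV and radians.

```python
import math

def m_T(lep_pt, lep_phi, met, met_phi):
    """Transverse mass built from the leading lepton and E_T^miss."""
    return math.sqrt(2.0 * lep_pt * met * (1.0 - math.cos(met_phi - lep_phi)))

def m_T_min_bjets(bjets, met, met_phi):
    """Minimum transverse mass of E_T^miss and the three leading b-tagged jets.

    bjets is a pT-ordered list of (pt, phi) pairs.
    """
    return min(m_T(pt, phi, met, met_phi) for pt, phi in bjets[:3])
```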

Another powerful variable for selecting signal events is the total jet mass variable, defined as:

$$\begin{aligned} M_{J}^{\Sigma } = \sum _{i\le 4} m_{J,i}, \end{aligned}$$

where \(m_{J,i}\) is the mass of large-R jet i in the event. The decay products of a high-\(p_{\text {T}} \) (boosted) hadronically decaying top quark can be reconstructed in a single large-R jet, resulting in a jet with a high mass. This variable typically takes larger values for Gtt signal model events than for background events, because the former can contain as many as four hadronically decaying top quarks while the latter typically contain a maximum of two.

The requirement of a selected lepton, together with the additional requirements on jets, \(E_{\text {T}}^{\text {miss}} \) and event variables described above, makes the multijet background negligible for the \(\ge \) 1-lepton signal regions. For the 0-lepton signal regions, the distribution of the minimum azimuthal angle \({\Delta }\phi _{\textrm{min}}^{4{\textrm{j}}}\) between \(\vec {p}_{\textrm{T}}^{\textrm{miss}}\) and the \(\vec {p}_{\textrm{T}}\) of the four leading \(R=0.4\) jets in the event, defined as:

$$\begin{aligned} {\Delta }\phi _{\textrm{min}}^{4{\textrm{j}}}= {\text {min}}_{i\le 4} \left( |\phi _{{\textrm{jet}}_i} - \phi _{\vec {p}_{\textrm{T}}^{\textrm{miss}}}| \right) , \end{aligned}$$

peaks near zero for multijet background events in which large values of \(E_{\text {T}}^{\text {miss}} \) have been generated by poorly measured jets or by neutrinos emitted close to the axis of a jet. The distributions of \({\Delta }\phi _{\textrm{min}}^{4{\textrm{j}}}\) for signal events and other backgrounds are more uniformly distributed, reflecting the reduced correlation between the missing transverse momentum direction and the transverse momenta of the leading jets.
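A minimal sketch of these two remaining variables is given below (not ATLAS code); the helper names are illustrative and the azimuthal difference is wrapped to \([-\pi ,\pi ]\).

```python
import math

def total_jet_mass(large_r_jet_masses):
    """M_J^Sigma: sum of the masses (GeV) of the up-to-four leading large-R jets."""
    return sum(large_r_jet_masses[:4])

def dphi_min_4j(jet_phis, met_phi):
    """Minimum |Delta phi| between p_T^miss and the four leading R = 0.4 jets."""
    return min(abs(math.remainder(phi - met_phi, 2.0 * math.pi))
               for phi in jet_phis[:4])
```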

5.2 Kinematic reweighting of MC samples

Table 2 Definitions of the reweighting regions used to derive the \(m_{\textrm{eff}}\) reweighting factors applied to the MC samples. The \(N_{\textrm{lepton}}\) requirements apply to signal leptons. The Z and VV RR uses a definition of \(E_{\textrm{T}}^{\textrm{miss}}\) \((\hat{E}_{\textrm{T}}^{\textrm{miss}})\) that adds the lepton pair’s transverse momentum to the missing transverse momentum, to simulate \(Z\rightarrow \nu \nu \) events

In signal-depleted regions requiring the presence of exactly one lepton with loose event selections, discrepancies are observed in the shapes of \(p_{\text {T}} \)-related observables, such as \(m_{\textrm{eff}}\), \(M_{J}^{\Sigma }\) and \(E_{\text {T}}^{\text {miss}}\), between data and the MC background expectations. Similar discrepancies are also observed in other similar analyses, e.g. Refs. [57, 83]. No similar discrepancies are visible in the 0-lepton regions, or for events with \(\ge 2\) leptons. To reduce these discrepancies, a kinematic reweighting procedure is applied to simulated background events containing at least one signal lepton, prior to their use in the main analysis fits. Dedicated reweighting regions (RRs) with loose event selection criteria are defined for the \(t\bar{t} \) process, the single-top, \(t\bar{t} +W/Z/H\) and \({t\bar{t}}{t\bar{t}}\) processes, the W + jets process, and the Z + jets and electroweak diboson processes, as set out in Table 2. Requirements on the number of reconstructed b-jets \((N_{b{\text {-jets}}})\) are applied to ensure that the RRs are orthogonal to all analysis signal regions, which include a \(N_{b{\text {-jets}}} \ge 3\) requirement. Simulated events for these processes are first normalised to the total number of observed events in the respective RR. The ratio of data events to normalised MC events is then computed as a function of \(m_{\textrm{eff}},\) for exclusive bins of \(N_\text {jet} =4,5,6\text { and }\ \ge 7\) (for the \(t\bar{t} \) and W + jets RRs) or for \(N_\text {jet} \ge 4\) (for the single-top, \(t\bar{t} +W/Z/H\) and \({t\bar{t}}{t\bar{t}}\) RR, and the Z+jets and electroweak diboson RR). This ratio is found to be well-fitted with a decreasing exponential function of \(m_{\textrm{eff}} \) for each \(N_\text {jet} \) bin. The resulting fitted functions are then used to reweight MC simulated events with exactly one lepton. The reweighting factors typically take values between \(\sim \) 1.17 and \(\sim \) 0.19 for \(m_{\textrm{eff}} \) ranging from threshold to 4200 \(\text {GeV}\) for the \(t\bar{t} \) and W + jets processes, and between \(\sim \) 1.7 and \(\sim \) 0.43 for the same range of values of \(m_{\textrm{eff}} \) for the single-top, \(t\bar{t} +W/Z/H\) and \({t\bar{t}}{t\bar{t}}\) processes, and the Z + jets and electroweak diboson processes. The reweighting procedure reduces the discrepancies between data and MC background expectations in the validation regions used in the analysis (Sect. 8.1).
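The core of the reweighting procedure amounts to a binned data/MC ratio fitted with a falling exponential in \(m_{\textrm{eff}}\). The sketch below, using NumPy/SciPy with illustrative inputs and a simple overall normalisation, is not the ATLAS implementation; it only indicates the arithmetic for a single reweighting region and \(N_\text {jet}\) bin.

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_reweighting(data_meff, mc_meff, mc_weights, bins):
    """Return a function of m_eff giving the per-event weight for 1-lepton MC events."""
    data_counts, _ = np.histogram(data_meff, bins=bins)
    mc_counts, _ = np.histogram(mc_meff, bins=bins, weights=mc_weights)
    mc_counts *= data_counts.sum() / mc_counts.sum()      # normalise MC to data in the RR
    centres = 0.5 * (bins[:-1] + bins[1:])
    ratio = data_counts / mc_counts                        # data/MC ratio vs m_eff
    expo = lambda x, a, b: a * np.exp(-b * x)              # decreasing exponential model
    (a, b), _ = curve_fit(expo, centres, ratio, p0=(1.5, 1e-4))
    return lambda meff: a * np.exp(-b * meff)

# Usage (toy inputs): weight = fit_reweighting(data_meff, mc_meff, mc_w, bins)(event_meff)
```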

5.3 Background estimation

The dominant background process in most signal regions is the production of a \(t\bar{t} \) pair in association with heavy- and light-flavour jets. In both the CC and NN SRs the dominant contribution to this background arises from events in which exactly one of the top quarks decays via a W boson to a lepton and a neutrino. In selected background events containing no leptons, the lepton is outside the acceptance of the analysis or is a hadronically decaying \(\tau \)-lepton. A normalisation factor for the \(t\bar{t} \) background is extracted for each CC or NN SR from a CR which is defined for a similar but orthogonal region of kinematic phase space. The normalisation factors are derived by dividing the data event yields in the CRs by the equivalent MC predictions, and then applied as factors multiplying the event yields in the SRs predicted by MC simulation. This procedure is equivalent to propagating the CR event yields to the SRs by multiplying with transfer factors derived from MC simulation. Systematic uncertainties associated with the MC simulations used in this procedure are taken into account in the final fit (Sect. 7).
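Schematically, and simplifying the simultaneous profile-likelihood fit to a single floating background, the normalisation-factor and transfer-factor arithmetic can be illustrated as follows; all yields are hypothetical, and the subtraction of the non-\(t\bar{t}\) MC contribution in the CR stands in for the full fit treatment.

```python
def ttbar_sr_prediction(n_data_cr, n_non_tt_cr_mc, n_tt_cr_mc, n_tt_sr_mc):
    """Normalise the ttbar MC in the CR and apply the result to the SR prediction."""
    mu_tt = (n_data_cr - n_non_tt_cr_mc) / n_tt_cr_mc   # CR normalisation factor
    transfer_factor = n_tt_sr_mc / n_tt_cr_mc           # MC extrapolation CR -> SR
    # Equivalent formulations: mu_tt * n_tt_sr_mc == (n_data_cr - n_non_tt_cr_mc) * transfer_factor
    return mu_tt * n_tt_sr_mc

# Example with illustrative yields: 120 data events in the CR, 20 expected non-ttbar
# events, 80 expected ttbar events in the CR, and 4.0 expected ttbar events in the SR.
print(ttbar_sr_prediction(120.0, 20.0, 80.0, 4.0))  # 5.0 ttbar events predicted in the SR
```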

In the CC analysis the CRs are defined with a lepton multiplicity requirement \(N_{\textrm{lepton}}=1,\) for both 0-lepton and 1-lepton SRs, and a requirement on \(m_{\textrm{T}}\) inverted with respect to that used in the 1-lepton SRs [15]. In the NN analysis they are defined with orthogonal selections on the neural network output. In each CR, \(t\bar{t} \) production is the dominant process and the contribution from rare background processes, such as \(t\bar{t} +W/Z/H,\) or \(t\bar{t} t\bar{t},\) is low. At least 20 data events are also required in each \(t\bar{t} \) CR to minimise the data statistical uncertainties of the normalisation factors. The CRs are required to possess a signal-to-background ratio which is expected to be less than 5% for Gtt, Gbb and Gtb signal models near the expected 95% CL exclusion contours of the analysis. The CRs are also found to possess less than 10% potential contamination from signal models beyond the exclusion contours of previous analyses.

The normalised \(t\bar{t} \) background estimates extracted from the CRs are cross-checked with VRs that share similar background compositions with the SRs but use an orthogonal event selection. In the CC analysis the VRs incorporate an inverted requirement on one of the SR observables: \(M_{J}^{\Sigma },\) \(m_{\textrm{eff}} \) or \(E_{\text {T}}^{\text {miss}}.\) In the NN analysis the VRs apply an orthogonal requirement to the neural network output together with inverted requirements on \(M_{J}^{\Sigma } \) and \(m_{\textrm{eff}}.\) The signal-to-background ratio in the VRs is expected to be lower than 30% for benchmark Gtt, Gbb and Gtb signal points near the expected 95% CL exclusion contours of the analysis. The purity of the CRs and VRs in \(t\bar{t} \) events is expected to be higher than 50% and 33%, respectively.

The leading non-\(t\bar{t} \) backgrounds in this analysis consist of single-top, W+jets, Z+jets, \(t\bar{t} +W/Z/H,\) \(t\bar{t}t\bar{t}\) and electroweak diboson (VV) events, which are estimated using simulated samples (Table 1) normalised to theoretical cross-sections. There is one exception to this procedure – the Z + jets process makes a significant contribution to the total background in the NN Gbb SRs and is therefore normalised, for these SRs only, with dedicated 2-lepton CRs (Sect. 6.3). Due to their relatively low selection efficiency, these CRs contain fewer data events than the \(t\bar{t} \) CRs described above.

The remaining multijet background in the 0-lepton CC regions, and in the NN regions, is estimated following the strategy of Ref. [84]. This method estimates the multijet background from a CR with the same requirements as the SR, but with a selection requiring \({\Delta }\phi _{\textrm{min}}^{4{\textrm{j}}}< 0.1\) to enhance the yield of events in which the missing transverse momentum is correlated with the \(p_{\text {T}} \) of a leading jet. The yield is extrapolated from the multijet CRs to their corresponding SRs with exponential functions. The decay parameters of these functions are fixed by fits to the \({\Delta }\phi _{\textrm{min}}^{4{\textrm{j}}}\) distribution of events passing the \(E_{\text {T}}^{\text {miss}} \) trigger and a \(E_{\text {T}}^{\text {miss}} >200\) \(\text {GeV}\) requirement. The multijet background prediction is validated by comparing the data with the total prediction in the range \(0.1<{\Delta }\phi _{\textrm{min}}^{4{\textrm{j}}}< 0.2.\) The contribution of multijet background events to the SRs is found to be \(\lesssim 5\%.\)
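The extrapolation arithmetic can be illustrated with a short sketch that assumes a single falling exponential in \({\Delta }\phi _{\textrm{min}}^{4{\textrm{j}}}\) and treats the events entering the fit as purely multijet; it is not the ATLAS implementation, and the binning and starting values are illustrative.

```python
import numpy as np
from scipy.optimize import curve_fit

def multijet_estimate(dphi_values, n_cr, dphi_sr_cut=0.4, dphi_cr_cut=0.1):
    """Extrapolate the low-dphi CR yield n_cr into the SR region dphi > dphi_sr_cut."""
    counts, edges = np.histogram(dphi_values, bins=np.linspace(0.0, 0.5, 11))
    centres = 0.5 * (edges[:-1] + edges[1:])
    expo = lambda x, a, k: a * np.exp(-k * x)            # falling exponential in dphi
    (_, k), _ = curve_fit(expo, centres, counts, p0=(counts[0], 10.0))
    # Ratio of exponential integrals above the SR cut and below the CR cut
    # (the upper integration bound is taken to infinity for simplicity).
    ratio = np.exp(-k * dphi_sr_cut) / (1.0 - np.exp(-k * dphi_cr_cut))
    return n_cr * ratio
```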

6 Event selection

6.1 Preselection

Events used in the analysis are required to meet a set of loose preselection criteria. All events (except those used in the Z + jets CR described below) are required to possess at least four jets, of which at least three must be b-tagged, and \(E_{\text {T}}^{\text {miss}} > 200\) GeV, which ensures that the efficiency of the \(E_{\text {T}}^{\text {miss}} \) triggers used in this analysis is close to 100% for selected events. Events selected in the CC 0-lepton regions are additionally required to contain no baseline leptons and possess \({\Delta }\phi _{\textrm{min}}^{4{\textrm{j}}}>0.4,\) while those selected in the CC 1-lepton regions are required to contain at least one signal lepton. No additional requirements on lepton multiplicity are applied at this stage in the NN analysis.

Figures 3 and 4 show the multiplicities of selected jets and b-tagged jets, and the distributions of \(E_{\text {T}}^{\text {miss}}\), \(m_{\textrm{eff}}\), \(m_{{\textrm{T, min}}}^{b{\text {-jets}}}\), and \(M_{J}^{\Sigma }\) for events meeting the 0-lepton and 1-lepton preselection criteria, respectively. The reweighting described in Sect. 5.2 is applied in the 1-lepton preselection to events with at least one lepton. The uncertainty bands depict the statistical and experimental systematic uncertainties, as described in Sect. 7, but not the theoretical uncertainties in the background modelling. Distributions for example SUSY signal models are overlaid for comparison.

Fig. 3

Distributions of (top-left) the number of selected jets \((N_{\textrm{jet}}),\) (top-right) the number of selected b-tagged jets \((N_{b-{\textrm{jets}}}),\) (centre-left) \(E_{\textrm{T}}^{\textrm{miss}},\) (centre-right) \(m_{\textrm{eff}},\) (bottom-left) \(M_J^{\Sigma }\) and (bottom-right) \(m_{\textrm{T,min}}^{b-{\textrm{jets}}}\) for events meeting the 0-lepton preselection criteria. The statistical and experimental systematic uncertainties (as defined in Sect. 7) are included in the uncertainty bands. The final bin in each case includes overflow events. The lower panel of each figure shows the ratio of data to the background prediction. All backgrounds (including \(t\bar{t}\)) are normalised using the theoretical calculations described in Sect. 3. The background category ‘\(t\bar{t}+X\)’ includes \(t\bar{t} W/Z,\) \(t\bar{t} H\) and \(t\bar{t}t\bar{t}\) events. Distributions for example SUSY signal models, applying a normalisation scaling of 50, are overlaid for comparison

Fig. 4

Distributions of (top-left) the number of selected jets \((N_{\textrm{jet}}),\) (top-right) the number of selected b-tagged jets \((N_{b-{\textrm{jets}}}),\) (centre-left) \(E_{\textrm{T}}^{\textrm{miss}},\) (centre-right) \(m_{\textrm{eff}},\) (bottom-left) \(M_J^{\Sigma }\) and (bottom-right) \(m_{\textrm{T,min}}^{b-{\textrm{jets}}}\) for events meeting the 1-lepton preselection criteria, after applying the kinematic reweighting to the \(m_{\textrm{eff}}\) distribution as described in the text. The statistical and experimental systematic uncertainties (as defined in Sect. 7) are included in the uncertainty bands. The final bin in each case includes overflow events. The lower panel of each figure shows the ratio of data to the background prediction before (red empty circles) and after (black filled circles with error bars) the kinematic reweighting. All backgrounds (including \(t\bar{t}\)) are normalised using the theoretical calculations described in Sect. 3. The background category ‘\(t\bar{t}+X\)’ includes \(t\bar{t} W/Z,\) \(t\bar{t} H\) and \(t\bar{t}t\bar{t}\) events. Distributions for example SUSY signal models, applying a normalisation scaling of 50, are overlaid for comparison

6.2 Cut-and-count analysis

The cut-and-count analysis employs a set of single-bin SRs which overlap and are therefore not statistically independent. The event selection criteria for the CC SRs, CRs and VRs are listed in Tables 3, 4, 5 and 6. The CC SR event selection criteria are optimised to maximise the expected significance of Gtt, Gbb and Gtb models close to the 95% CL exclusion contours in the \(m(\tilde{g}) \)–\(m(\tilde{\chi }^0_1) \) mass plane set by the previous ATLAS search in this channel using a smaller dataset [15]. Separate SRs are defined for each of these three classes of models. The SRs are further categorised according to whether they target models with large \(\tilde{g} \) mass splitting (\(\Delta m = m(\tilde{g})- m(\tilde{\chi }^0_1) \gtrsim 1.5\) TeV, Regions B), moderate mass splitting (0.3 TeV \(\lesssim \Delta m \lesssim 1.5\) TeV, Regions M1 and M2) or small mass splitting (\(\Delta m \lesssim 0.3\) TeV, Regions C). These regions differ mainly in the selections applied to the \(m_{\textrm{eff}},\) \(E_{\text {T}}^{\text {miss}},\) \(m_{\textrm{T}}\) and \(M_{J}^{\Sigma } \) variables.

For each SR, a CR is defined to constrain the \(t\bar{t} \) background (Sect. 5.3). The CRs for the 0-lepton SRs require the presence of exactly one signal lepton. The 1-lepton SRs and the associated CRs require at least one signal lepton, with the latter also implementing an inverted requirement on \(m_{\textrm{T}}.\) The CRs also place looser requirements on jet multiplicity and the other discriminating variables.

The VRs for the 0-lepton SRs use inverted requirements on \(M_{J}^{\Sigma },\) \(m_{\textrm{eff}} \) or \(E_{\text {T}}^{\text {miss}} \) to remove overlap with the respective SRs. For the 1-lepton SRs, two VRs are defined to validate the background prediction in high-\(m_{{\textrm{T, min}}}^{b{\text {-jets}}} \) and high-\(m_{\textrm{T}}\) regions by increasing the threshold for \(N_\text {jet} \) and/or inverting the \(M_{J}^{\Sigma } \) or \(m_{\textrm{T}}\) requirements to remove overlap with both the corresponding SR and the CR.

Table 3 Event selection requirements for the CC Gtt 0-lepton SRs together with the associated \(t\bar{t}\) CRs and VRs, classified according to the \(\tilde{g}\) mass splitting (\(\Delta m\)) targeted. The thresholds in bold for each control and validation region ensure orthogonality with the corresponding signal region. \(N_{\textrm{lepton}}=0\) requires zero baseline leptons, while \(N_{\textrm{lepton}}=1\) requires one signal lepton
Table 4 Event selection requirements for the CC Gtt 1-lepton SRs together with the associated \(t\bar{t}\) CRs and VRs, classified according to the \(\tilde{g}\) mass splitting (\(\Delta m\)) targeted. The thresholds in bold for each control and validation region ensure orthogonality with the corresponding signal region
Table 5 Event selection requirements for the CC Gbb 0-lepton SRs together with the associated \(t\bar{t}\) CRs and VRs, classified according to the \(\tilde{g}\) mass splitting (\(\Delta m\)) targeted. The thresholds in bold for each control and validation region ensure orthogonality with the corresponding signal region. \(N_{\textrm{lepton}}=0\) requires zero baseline leptons, while \(N_{\textrm{lepton}}=1\) requires one signal lepton
Table 6 Event selection requirements for the CC Gtb 0-lepton SRs together with the associated \(t\bar{t}\) CRs and VRs, classified according to the \(\tilde{g}\) mass splitting (\(\Delta m\)) targeted. The thresholds in bold for each control and validation region ensure orthogonality with the corresponding signal region. \(N_{\textrm{lepton}}=0\) requires zero baseline leptons, while \(N_{\textrm{lepton}}=1\) requires one signal lepton

6.3 Neural network analysis

The NN analysis uses low-level kinematic variables [85] as inputs to a dense neural network with three hidden layers trained to discriminate Gtt or Gbb model events from SM background events. Each SR is defined by event selection criteria applied to the outputs of the neural network, optimised to maximise the statistical significance of the SUSY model considered. The neural network input variables are:

  • The four-momenta (\(p_{\textrm{T}},\) \(\eta ,\) \(\phi ,\) m) of the 10 leading jets, in decreasing order of \(p_{\text {T}}\), and a set of binary variables indicating which jets are b-tagged;

  • The four-momenta of the four leading large-R jets, in decreasing order of \(p_{\text {T}}\);

  • The four-momenta of the four leading leptons (e or \(\mu \)), in decreasing order of \(p_{\text {T}}\);

  • The two components of the vector \(\vec {p}_{\textrm{T}}^{\textrm{miss}}\).

These four-momenta describe the expected final state of a typical Gtt signal event. If a given event contains fewer jets or leptons than specified above, the remaining inputs are set to zero. This procedure enables the analysis to optimally select a mixture of events with different jet and lepton multiplicities. The neural network generates three output scores measuring the probability of a given event being a signal event (P(Gtt) or P(Gbb), depending on the signal model targeted), a \(t\bar{t} \) background event \((P(t\bar{t})),\) or a Z + jets background event (P(Z)). The output scores for all processes are normalised such that they sum to unity.

The neural network was trained using the Keras [86] library with the TensorFlow [87] backend. Training was performed once for each representative model. The neural network hyperparameters were optimised with a random search [88] followed by a line scan of the learning rate. The number of training epochs was not fixed but instead defined through an early-stopping strategy using the cross-entropy loss as a figure of merit. The training samples consisted of a total of \(9.2\times 10^5\) signal events and \(5.6\times 10^5\) background events. Following event preselection, the training samples for all signal and background processes corresponded to the same integrated luminosity as that of the data sample used in the analysis.
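For illustration, a classifier of this type can be written with Keras as sketched below. The layer widths, activation functions and optimiser settings are assumptions rather than the tuned configuration of the analysis, and the three mass/model parameters appended to the inputs anticipate the parameterised training strategy described in the next paragraph.

```python
import tensorflow as tf

# Flattened object inputs: 10 jets (pT, eta, phi, m) + 10 b-tag flags,
# 4 large-R jets, 4 leptons, and the two components of p_T^miss.
N_OBJECT_INPUTS = 10 * 4 + 10 + 4 * 4 + 4 * 4 + 2
N_PARAMETERS = 3   # m(gluino), m(neutralino), Gtt-vs-Gbb flag (illustrative encoding)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(N_OBJECT_INPUTS + N_PARAMETERS,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),   # P(signal), P(ttbar), P(Z) sum to unity
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="categorical_crossentropy")

# The number of epochs is not fixed; training stops when the validation
# cross-entropy no longer improves, as described in the text.
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                              restore_best_weights=True)
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           callbacks=[early_stop], epochs=500)
```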

To ensure optimum discrimination power across the Gtt and Gbb model planes, a parameterised training method [89] was employed in which the neural network was also given the \((m(\tilde{g}), m(\tilde{\chi }^0_1))\) pair of the signal point under consideration as well as a binary variable indicating if discrimination of background versus Gtt or Gbb is required. Background events were assigned random parameter values. After training, the neural network outputs were evaluated by unconditionally setting the parameters to those of the model point under consideration and processing all MC and data events. To reduce the large number of potential SRs that could emerge from such a strategy, i.e. one SR per model point, a set-cover algorithm [90] was used to iteratively select the SR which excludes the most as-yet non-excluded model points until all such points are exhausted. The result is a minimum set of SRs that are expected to exclude the same number of Gtt or Gbb models as a more extensive set with one SR for each model point. The resulting SRs are optimal for excluding models that are representative of regions of the Gtt or Gbb model parameter space. The resulting representative models are specified by the following \((m(\tilde{g}), m(\tilde{\chi }^0_1))\) mass values (in \(\text {GeV}\)):

  • Gtt: (2100, 1), (1800, 1), (2300, 1200), (1900, 1400)

  • Gbb: (2800, 1400), (2300, 1000), (2100, 1600), (2000, 1800).

There is one SR for each of these eight models. The acceptance times efficiency of the SR selections is typically 1–10% for the representative models that they target. For example, the acceptance times efficiency of the SR-Gtt-2300-1200 selection for the Gtt (2300, 1200) representative model is 6.4%.
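The greedy set-cover step can be sketched as follows; the mapping from candidate SRs to the sets of model points they are expected to exclude is assumed to be available from the expected-limit calculation, and the function name is illustrative.

```python
def greedy_set_cover(excluded_points_per_sr):
    """Pick SRs iteratively by how many still-uncovered model points each excludes.

    excluded_points_per_sr: dict mapping SR name -> set of excludable model points.
    """
    uncovered = set().union(*excluded_points_per_sr.values())
    chosen = []
    while uncovered:
        best = max(excluded_points_per_sr,
                   key=lambda sr: len(excluded_points_per_sr[sr] & uncovered))
        gain = excluded_points_per_sr[best] & uncovered
        if not gain:
            break
        chosen.append(best)
        uncovered -= gain
    return chosen
```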

The eight NN SRs are defined in Tables 7 and 8, together with the associated CRs and VRs. For each SR, a CR is defined to constrain the \(t\bar{t} \) background (Sect. 5.3). The CRs are defined by placing requirements on the neural network outputs orthogonal to those used in the SRs, and placing additional requirements on \(m_{\textrm{eff}}\) and \(M_{J}^{\Sigma }\) to select events with kinematics similar to those in the corresponding SR. For the NN Gbb SRs, the \(Z \,+\,{{\textrm{jets}}}\) background process (principally \(Z(\rightarrow \nu \nu )+\) jets) also contributes significantly to the background yield. Dedicated control regions for this process (labelled ‘CRZ’ in Table 9 and thereafter) are therefore defined for the Gbb SRs only. The CRs are used to normalise the \(Z \,+\,{{\textrm{jets}}}\) background estimates with the same procedure as is used for the \(t\bar{t} \) background estimates (Sect. 5.3). These CRs (Table 9) select \(Z (\rightarrow \ell \ell )+\)jets events with a requirement of two opposite-sign same-flavour (OSSF) signal electrons/muons (both with \(p_{\text {T}} >30\) \(\text {GeV}\)) with an invariant mass in the range \(60~\text {GeV}< m_{\ell \ell } < 120\) \(\text {GeV}\) and combined \(p_{\text {T}} > 70\) \(\text {GeV}\). The events must pass the lowest-threshold unprescaled single-lepton triggers used in 2015–2018, which are well modelled by MC simulation and have approximately constant efficiency for leptons with offline \(p_{\text {T}} > 27\) \(\text {GeV}\). The momenta of the leptons in selected events are added to \(\vec {p}_{\textrm{T}}^{\textrm{miss}}\) and the leptons discarded, to mimic \(Z(\rightarrow \nu \nu )+\)jets events, and then the resulting modified value of \(E_{\text {T}}^{\text {miss}} \) \((\hat{E}_{\textrm{T}}^{\textrm{miss}})\) is required to exceed 200 \(\text {GeV}\), to replicate the SR event preselection criterion. These events are then passed through the SR neural network and selections are applied to the neural network outputs that are orthogonal to the SR selection criteria.

VRs are defined for the NN SRs in a similar way to the CRs, with modified, orthogonal, selections on the neural network outputs and additional selections applied to high-level kinematic variables including \({\Delta }\phi _{\textrm{min}}^{4{\textrm{j}}},\) \(m_{\textrm{eff}} \) and \(M_{J}^{\Sigma }.\) These VRs are defined in Tables 7 and 8 and their relationship to the SRs and \(t\bar{t} \) CRs is illustrated in Fig. 5. Two VRs are defined for each NN Gbb SR to validate both the \(t\bar{t} \) and \(Z \,+\,{{\textrm{jets}}}\) background estimates.

Table 7 Definitions of the NN Gtt SRs together with the associated \(t\bar{t}\) CRs and VRs. In the first column, the two numbers separated by a hyphen specify the values of \(m(\tilde{g})\) and \(m(\tilde{\chi }^0_1) \) in GeV for the targeted representative Gtt model. The third and fourth columns specify the ranges of probability for the targeted Gtt signal model and \(t\bar{t}\) background, respectively, generated by the selection applied to the NN output. The fifth and sixth columns specify the values of additional requirements applied to \(m_{\textrm{eff}}\) and \(M_J^{\Sigma }\) in the CRs and VRs to select events with kinematics similar to those in the corresponding SRs
Table 8 Definitions of the NN Gbb SRs together with the associated \(t\bar{t}\) CRs and VRs. In the first column, the two numbers separated by a hyphen specify the values of \(m(\tilde{g})\) and \(m(\tilde{\chi }^0_1) \) in GeV for the targeted representative Gbb model. The third, fourth and fifth columns specify the ranges of probability for the targeted Gbb signal model and the \(t\bar{t}\) or \(Z+\)jets background, respectively, generated by the selection applied to the NN output. The sixth, seventh and eighth columns specify the values of additional requirements applied to \(\Delta \phi _{\textrm{min}}^{\textrm{4j}},\) \(m_{\textrm{eff}}\) and \(M_J^{\Sigma }\) in the CRs and VRs to select events with kinematics similar to those in the corresponding SRs
Table 9 Definitions of the NN Gbb \(Z+\) jets CRs (CRZ). The first seven rows specify the event selection criteria. The final three rows specify the ranges of probability for the targeted Gbb signal model and the \(t\bar{t}\) or \(Z+\) jets background, respectively, generated by the selection applied to the NN output. \(N_{\textrm{lepton}}=2\) requires two signal leptons

7 Systematic uncertainties

The magnitudes of the post-fit uncertainties in the background estimates for the various signal regions, obtained following the profile likelihood fit described in Sect. 8.1, are summarised in Fig. 6. The uncertainties considered include experimental and theoretical systematic uncertainties, and statistical uncertainties in data CR yields and MC background samples.

Fig. 5

Schematic diagram of the inter-relationship of the SRs, \(t\bar{t}\) CRs and VRs used in the NN analysis for the Gtt (left) and Gbb (right) SRs. The \(t\bar{t}\) CRs and VRs apply additional selections to \(\Delta \phi _{\textrm{min}}^{\textrm{4j}},\) \(m_{\textrm{eff}}\) and \(M_J^{\Sigma }\) beyond those used in the SRs

Fig. 6

A summary of the uncertainties in the background estimates for each of the signal regions of the CC (top) and NN (bottom) analyses. The individual experimental and theoretical uncertainties are assumed to be uncorrelated and are combined by adding in quadrature

Detector-related systematic uncertainties affect both the background estimate and the signal yield. The sources of the largest experimental uncertainties are related to the jet energy scale (JES) and resolution (JER) [91], the jet mass scale (JMS), and the b-tagging efficiencies and mis-tagging rates [92, 93]. The jet energy-related uncertainties are also propagated to the reclustered large-R jets, which use them as inputs. Jet mass scales are evaluated by comparing the masses reconstructed via calorimeter- and track-based measurements [94]. The impact of the JES uncertainties on the expected background yields is between 0.6 and 38% (with the largest uncertainty observed in SR-Gbb-2000-1800), while JER uncertainties affect the background yields by 1–49% in the various regions (with the largest uncertainty observed in SR-Gtt-1900-1400). This JER variation is the principal uncertainty contributing to the large total uncertainty observed in SR-Gtt-1900-1400 in Fig. 6. The impact of JMS uncertainties on the expected background yields is 0.3–41%, depending on the region (with the largest uncertainty observed in SR-Gtt-2100-1). Uncertainties in the measured b-tagging efficiencies and mis-tagging rates are 0.3–17% (with the largest uncertainty observed in SR-Gbb-B).

The experimental uncertainties due to lepton reconstruction, identification and isolation efficiency differences between data and simulation [80, 95] are also taken into account, and so are the uncertainties in lepton energy measurements [96]. These uncertainties contribute at most 14% to the total uncertainty (in SR-Gtt-0L-B). All lepton and jet measurement uncertainties are propagated to the calculation of \(E_{\text {T}}^{\text {miss}}\), and additional uncertainties in the scale and resolution of the soft term [82] are included. The overall impact of the \(E_{\text {T}}^{\text {miss}}\) soft-term uncertainties is at most 13% (in SR-Gtt-0L-B).

Fig. 7

Pre-fit event yield in control regions and related background normalisation factors after the background-only fit for the CC (top) and NN (bottom) analyses. The upper panel shows the observed number of events and the MC background yield before the fit. The uncertainty bands include both the experimental and theoretical systematic uncertainties. The background category ‘\(t\bar{t}+X\)’ includes \(t\bar{t} W/Z,\) \(t\bar{t} H\) and \(t\bar{t}t\bar{t}\) events. The background normalisation factors obtained from the fit are displayed in the bottom panels, with the error bars representing the combined statistical + systematic uncertainty of each normalisation factor, obtained from the background-only fit. Red (blue) points correspond to the \(t\bar{t}\) (Z + jets) CRs

Fig. 8

Data and fitted background yields in the VRs for the CC (top) and NN (bottom) analyses. The upper panels show the observed numbers of events and the expected background yields obtained from the background-only fits. The uncertainty bands include both the experimental and theoretical systematic uncertainties. The background category ‘\(t\bar{t}+X\)’ includes \(t\bar{t} W/Z,\) \(t\bar{t} H\) and \(t\bar{t}t\bar{t}\) events. The bottom panels show the Gaussian significance \((n_{\text {obs}} - n_{\text {pred}}) / \sigma _{\text {tot}}\) of the deviation of the observed yield \((n_{\text {obs}})\) from the background expectation \((n_{\text {pred}})\) in each VR, obtained using the combined statistical + systematic background uncertainty \((\sigma _{\text {tot}})\)

Fig. 9

Data and fitted background yields in the SRs for the CC (top) and NN (bottom) analyses. The upper panels show the observed numbers of events and the expected background yields obtained from the background-only fits. Bins without data-points correspond to zero observed events. The uncertainty bands include both the experimental and theoretical systematic uncertainties. The background category ‘\(t\bar{t}+X\)’ includes \(t\bar{t} W/Z,\) \(t\bar{t} H\) and \(t\bar{t}t\bar{t}\) events. The bottom panels show the Gaussian significance \((n_{\text {obs}} - n_{\text {pred}}) / \sigma _{\text {tot}}\) of the deviation of the observed yield \((n_{\text {obs}})\) from the background expectation \((n_{\text {pred}})\) in each SR, obtained using the combined statistical + systematic background uncertainty \((\sigma _{\text {tot}})\)

Considering theoretical uncertainties, the hadronisation and parton-shower modelling uncertainties of the \(t\bar{t} \) background are evaluated by comparing two samples generated with Powheg Box and showered by either Herwig 7 or Pythia 8.230 [15]. In addition, systematic uncertainties in the modelling of initial- and final-state radiation are explored with Powheg Box samples showered with two alternative settings [97] of the A14 tune [98]. The uncertainty due to the choice of matrix-element event generator is estimated by comparing the expected yields obtained using \(t\bar{t} \) samples generated with either Powheg Box or MadGraph5_aMC@NLO. The total theoretical uncertainty in the \(t\bar{t} \) background estimation is taken as the sum in quadrature of these individual components, corresponding to an impact of 6–42% (with the largest uncertainty observed in SR-Gbb-2000-1800). Moreover, an additional uncertainty of 30% is assigned to the fraction of \(t\bar{t} \) events produced in association with additional heavy-flavour jets [15, 99] (i.e. \(t\bar{t} ~+ \ge 1b\) and \(t\bar{t} ~+ \ge 1c\)), which has an impact of at most 10%.

Modelling uncertainties affecting single-top processes arise especially from the interference between the \(t\bar{t} \) and Wt processes. This uncertainty is estimated using an inclusive WWbb sample, which is compared with the sum of the \(t\bar{t} \) and Wt samples. Furthermore, as for the \(t\bar{t} \) modelling uncertainties, variations of the parton-shower settings that increase or decrease the amount of radiation are also used. An additional uncertainty in the cross-sections of single-top processes [48] is included, amounting to at most 5%. Overall, the modelling uncertainties affecting single-top processes lead to changes of 4–19% in total yields in the various regions (with the largest uncertainty observed in SR-Gbb-2800-1400).

Uncertainties related to the factorisation and renormalisation scales and to the matching procedure between the matrix element and parton shower in the W/Z + jets backgrounds are also taken into account [15]. The resulting uncertainties in the total yield range up to 53% in the various regions (with the largest uncertainty in SR-Gbb-2100-1600). A constant 30% uncertainty in the heavy-flavour content of W/Z+jets is assumed, motivated by Ref. [100], which contributes at most 8% (with the largest uncertainty observed in SR-Gtt-0L-B). Furthermore, a 50% uncertainty is assigned to the \(t\bar{t} +W/Z/H,\) \(t\bar{t} t\bar{t} \) and diboson backgrounds, and is assumed to be uncorrelated across all regions. It is found to have no significant impact on the sensitivity of this analysis, and contributes at most 15% (with the largest uncertainty observed in SR-Gtt-2100-1) to the total background uncertainty. The uncertainties related to the parton distribution functions affect the background yields by less than 2% and are therefore neglected. Uncertainties due to the limited number of events in the MC background samples are included and can reach 30% in regions targeting moderate or large mass splittings.

Systematic uncertainties are also assigned to the kinematic reweighting procedure by propagating the statistical uncertainties in the parameters of the exponential fits (Sect. 5.2). In addition, to conservatively assess the impact of the procedure on the final results, the changes in the estimated background yield obtained by omitting the reweighting are added in quadrature; these contribute total yield variations of 0–49% across all regions (with the largest observed in SR-Gtt-1L-B). These uncertainties are applied to all simulated background events containing at least one signal lepton, and are treated as nuisance parameters in the fits, in the same way as the other uncertainties.
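In schematic form, if \(N^{\text {rw}}\) and \(N^{\text {no\,rw}}\) denote the background yields estimated with and without the reweighting, and \(\sigma _{\text {fit}}\) the uncertainty propagated from the exponential-fit parameters (symbols introduced here only for illustration), the assigned uncertainty corresponds to
\[
\sigma _{\text {rw}} = \sqrt{\sigma _{\text {fit}}^{2} + \left( N^{\text {rw}} - N^{\text {no\,rw}} \right)^{2}}\,.
\]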

Table 10 Post-fit results of the background-only fit extrapolated to the Gtt 0-lepton and Gtt 1-lepton SRs in the CC analysis, for both the total expected background yields and the main contributing background processes. The quoted uncertainties include both the experimental and theoretical systematic uncertainties. The data in the SRs are not included in the fit. The background category ‘\(t\bar{t}+X\)’ includes \(t\bar{t} W/Z,\) \(t\bar{t} H\) and \(t\bar{t}t\bar{t}\) events. The row ‘Pre-fit background’ provides the total background prediction when the \(t\bar{t}\) normalisation is obtained from a theoretical calculation [46], taking into account the kinematic weights described in Sect. 5.2. Yields are obtained for the large \(\Delta m\) (‘B’), moderate \(\Delta m\) (‘M1’ and ‘M2’) and small \(\Delta m\) (‘C’) scenarios
Table 11 Post-fit results of the background-only fit extrapolated to the Gbb and Gtb SRs in the CC analysis, for both the total expected background yields and the main contributing background processes. The quoted uncertainties include both the experimental and theoretical systematic uncertainties. The data in the SRs are not included in the fit. The background category ‘\(t\bar{t}+X\)’ includes \(t\bar{t} W/Z,\) \(t\bar{t} H\) and \(t\bar{t}t\bar{t}\) events. The row ‘Pre-fit background’ provides the total background prediction when the \(t\bar{t}\) normalisation is obtained from a theoretical calculation [46], taking into account the kinematic weights described in Sect. 5.2. Yields are obtained for the large \(\Delta m\) (‘B’), moderate \(\Delta m\) (‘M’) and small \(\Delta m\) (‘C’) scenarios
Table 12 Post-fit results of the background-only fit extrapolated to the SRs in the NN analysis, for both the total expected background yields and the main contributing background processes. The quoted uncertainties include both the experimental and theoretical systematic uncertainties. The data in the SRs are not included in the fit. The background category ‘\(t\bar{t}+X\)’ includes \(t\bar{t} W/Z,\) \(t\bar{t} H\) and \(t\bar{t}t\bar{t}\) events. The row ‘Pre-fit background’ provides the total background prediction when the \(t\bar{t}\) and \(Z+\)jets normalisations are obtained from theoretical calculations [46, 53], taking into account the kinematic weights described in Sect. 5.2
Table 13 Summary of compatibility of SR observations with background expectations in the CC and NN SRs. The second column shows the \(p_0\)-values of the background-only hypothesis together with the equivalent Gaussian significances Z. Also shown are the resulting 95\(\%\) CL upper limits on the visible cross-section \((\sigma ^{95}_{\textrm{vis}}),\) and the observed and expected 95\(\%\) CL upper limits on the number of beyond the Standard Model signal events (\(S_{\text {obs}}^{95}\) and \(S_{\text {exp}}^{95}\)). The \(p_0\)-values are capped at 0.5

The uncertainties in the cross-sections of signal processes are determined from different cross-section predictions extracted using alternative event generators and parton shower variations, as described in Sect. 3. These are at most 30% in all SRs.

Fig. 10

Exclusion limits in the \(\tilde{g}\)–\(\tilde{\chi}^0_1\) mass plane for the Gtt (top) and Gbb (bottom) models obtained from the NN analysis. The dashed and solid bold lines show the 95% CL expected and observed limits, respectively. The shaded bands around the expected limits show the impact of the experimental and background uncertainties. The dotted lines show the impact on the observed limit of varying the nominal signal cross-section by \(\pm 1 \sigma \) of its theoretical uncertainty. The blue contours indicate the exclusion limits from the previously published result [15]

8 Results and interpretation

8.1 Results

The SM background yields are estimated separately for each SR with a profile likelihood fit [101] implemented in the HistFitter framework [102], referred to as a background-only fit. The fit (one for each SR) uses the observed event yield in the associated CR(s) to adjust the \(t\bar{t} \) normalisation (and \(Z+\) jets normalisation for the NN Gbb regions), assuming that no signal contributes to this yield, and applies that normalisation factor to the number of background events predicted by simulation for the equivalent process in the SR. The normalisation factors are allowed to vary freely, without application of constraints derived from uncertainties in theoretical cross-sections. The mean yields predicted by simulation in the SR and CR(s) are used for all background processes. The numbers of events observed in each region are described by Poisson probability density functions. The systematic uncertainties in the expected values are included in the fit as nuisance parameters. They are constrained by Gaussian distributions with widths corresponding to the sizes of the uncertainties and are treated as correlated, when appropriate, between the various regions. The product of the various probability density functions forms the likelihood, which the fit maximises by adjusting the \(t\bar{t} \) (and, in the NN Gbb regions, the \(Z+\) jets) normalisation and the nuisance parameters. The values of the normalisation factors range from 0.8 to 1.9 depending on the CRs (Fig. 7). The normalisation factors are broadly consistent with unity within uncertainties, with the largest pull across the 26 CRs being 2.3\(\sigma \) (for the \(t\bar{t} \) normalisation factor in CC SR Gtt-0L-M2).
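A schematic form of the likelihood for the background-only fit of a given SR, with \(\mu _{t\bar{t}}\) the free \(t\bar{t} \) normalisation factor, \(b^{t\bar{t}}_{i}\) and \(b^{\text {other}}_{i}\) the predicted \(t\bar{t} \) and remaining background yields in region \(i,\) and \(\theta _j\) the standardised nuisance parameters (notation introduced here only to summarise the procedure described above), is
\[
L\!\left( \mu _{t\bar{t}}, \boldsymbol{\theta } \right) = \prod _{i \,\in \, \text {CR(s)}} \text {Pois}\!\left( n_{i} \,\middle |\, \mu _{t\bar{t}}\, b^{t\bar{t}}_{i}(\boldsymbol{\theta }) + b^{\text {other}}_{i}(\boldsymbol{\theta }) \right) \times \prod _{j} \text {Gauss}\!\left( \theta _{j} \,\middle |\, 0, 1 \right) ,
\]
with an analogous free \(Z+\)jets normalisation factor included for the NN Gbb regions.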

The results of the background-only fits are extrapolated to the VRs, following the definitions of Sect. 6. A comparison of the observed and expected yields in the VRs after the fit is shown in Fig. 8 for the CC and NN analyses. The largest yield differences in the CC and NN VRs are observed for VR2-Gtt-1L-C (1.3\(\sigma \) across 18 VRs) and VR2-Gbb-2000-1800 (1.8\(\sigma \) across 12 VRs), respectively. The MC reweighting improves the agreement between the observed and expected yields in the VRs; in the CC analysis the change in the predicted yields due to the MC reweighting, which is applied to events containing exactly one signal lepton, is on average \(5\%\) and \(10\%\) for the 0-lepton and 1-lepton channels, respectively. The reweighting affects the CC 0-lepton channels, which require exactly zero signal leptons, via the \(t\bar{t} \) normalisation factors obtained from CRs requiring exactly one signal lepton.

The observed and expected event yields in the SRs for the CC and NN analyses are shown in Fig. 9 and Tables 10, 11 and 12. The significances of the deviations of the observed data from the background expectations, evaluated using a model-independent fit (described in Sect. 8.2) with pseudo-experiments, are presented in the lower panels of Fig. 9, and in Table 13. No statistically significant deviation from the background expectation is found for any of the presented SRs or analysis methodologies. The largest deviation across the 22 SRs is observed in SR-Gtb-B, with a significance of 2.3\(\sigma .\) In some cases the SR event selections are not orthogonal and hence the significances can be correlated between regions.
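For reference, the equivalent Gaussian significance \(Z\) quoted in Table 13 follows the standard convention
\[
Z = \Phi ^{-1}\!\left( 1 - p_{0} \right) ,
\]
where \(\Phi ^{-1}\) is the inverse of the cumulative distribution function of the standard normal distribution.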

8.2 Interpretation

In the absence of any significant excess over the expected background from SM processes, the data are used to derive one-sided upper limits at 95% confidence level (CL). Two levels of interpretation are provided in this paper: model-independent exclusion limits and model-dependent exclusion limits set on the Gtt, Gbb and Gtb models. Exclusion limits are evaluated using pseudo-experiments.

To set upper limits on the number of ‘beyond the Standard Model’ (BSM) signal events in each SR, a model-independent fit is used [102]. This fit proceeds in the same way as the background-only fit, with the yields in the CRs used to constrain the background predictions in each SR, except that the SR yield is also included in the likelihood function together with an additional parameter of interest describing a potential signal contribution. The observed and expected upper limits at 95% CL on the number of events from BSM phenomena for each signal region (\(S_{\text {obs}}^{95}\) and \(S_{\text {exp}}^{95}\)) are derived using the CL\(_{\textrm{s}}\) prescription [103], neglecting any possible signal contamination in the CRs. These limits, when normalised by the integrated luminosity of the data sample, may be interpreted as upper limits on the visible cross-section of BSM physics \((\sigma ^{95}_{\textrm{vis}}),\) where the visible cross-section is defined as the product of production cross-section, acceptance and efficiency. All SRs from both the CC and NN analyses are considered, to aid in the reinterpretation of the results of this analysis. The results of the model-independent fit are presented in Table 13.
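In explicit form, the visible cross-section limits quoted in Table 13 correspond to
\[
\sigma ^{95}_{\textrm{vis}} = \frac{S^{95}_{\text {obs}}}{\int \! L \, dt} \,, \qquad \sigma _{\textrm{vis}} \equiv \sigma _{\text {prod}} \times A \times \epsilon ,
\]
where \(\int \! L \, dt = 139\) fb\(^{-1}\) is the integrated luminosity, and \(\sigma _{\text {prod}},\) \(A\) and \(\epsilon \) denote the production cross-section, acceptance and efficiency of a generic BSM signal.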

Model-dependent fits [102] in all the SRs are used to set limits in the mass or branching-ratio planes of the Gtt, Gbb and Gtb models. Such a fit proceeds in the same way as the model-independent fit, except that both the signal yield in the SR and the signal contamination in the CR(s) are taken into account. Correlations between signal and background systematic uncertainties are included where appropriate. Systematic uncertainties in the assumed signal yields due to detector effects and the theoretical uncertainties in the signal acceptance are included in the fit. The NN analysis SRs provide the higher expected exclusion sensitivity for the Gtt and Gbb models and hence are used to obtain the exclusion limits for these models. For the Gtb models, for which the NN analysis SRs were not optimised, the CC analysis SRs are used. In all cases, at each model point the result obtained from the SR with the best expected CL\(_{\textrm{s}}\) value is used. For the CC analysis applied to the Gtb models, all CC SRs are considered in this optimisation. The Gtt or Gbb SRs are found to be optimal for Gtb models in which the gluino branching ratio is dominated by \(\tilde{g}\rightarrow t\bar{t}\tilde{\chi}^0_1\) or \(\tilde{g}\rightarrow b\bar{b}\tilde{\chi}^0_1\) decays, respectively, while the Gtb SRs are found to be optimal otherwise. All the fits for the various model points and parameter spaces considered give fitted SUSY signal cross-sections consistent with zero within uncertainties.
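In schematic terms, the model-dependent fit extends the background-only likelihood above with a signal-strength parameter \(\mu _{\text {sig}}\) (notation introduced here only to summarise the text), so that the expected yield in each region \(i\) (SR or CR) becomes
\[
n^{\text {exp}}_{i} = \mu _{\text {sig}}\, s_{i}(\boldsymbol{\theta }) + \mu _{t\bar{t}}\, b^{t\bar{t}}_{i}(\boldsymbol{\theta }) + b^{\text {other}}_{i}(\boldsymbol{\theta }),
\]
where \(s_{i}\) is the predicted signal yield, including the signal contamination in the CR(s).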

The 95% CL observed and expected exclusion limits for the Gtt and Gbb models, obtained from the NN analysis, are shown in the \(\tilde{g}\)–\(\tilde{\chi}^0_1\) mass plane in Fig. 10. The \(\pm 1 \sigma ^{\text {SUSY}}_{\text {theory}}\) lines around the observed limits are obtained by changing the SUSY production cross-section by one standard deviation \((\pm 1\sigma ),\) as described in Sect. 3. The yellow band around the expected limit shows the \(\pm 1\sigma \) uncertainty, including all statistical and systematic uncertainties except the theoretical uncertainties in the SUSY cross-section. The expected limits on the gluino mass, assuming a massless neutralino LSP, obtained from the CC analysis are \({\sim }150\) \(\text {GeV}\) and \({\sim }50\) \(\text {GeV}\) lower than those from the NN analysis for the Gtt and Gbb models, respectively. Compared to the previous results [15], the expected limits on the gluino mass from the current search (assuming a massless \(\tilde{\chi}^0_1\)) have improved by 280 GeV and 330 GeV for the Gtt and Gbb models, respectively. Gluinos with masses below 2.44 \(\text {TeV}\) (2.35 \(\text {TeV}\)) are excluded at 95% CL for massless neutralinos in the Gtt (Gbb) models. The most stringent exclusion limits on the neutralino mass are approximately 1.35 \(\text {TeV}\) and 1.65 \(\text {TeV}\) for the Gtt and Gbb models, obtained for a gluino mass of approximately 2.20 \(\text {TeV}\) and 2.10 \(\text {TeV}\), respectively.

The 95% CL observed and expected exclusion limits for the Gtb model, with \(m(\tilde{\chi }^0_1) \) set to 1 \(\text {GeV}\), are shown in Fig. 11. The results, which are obtained from the CC analysis, are presented as limits on the gluino mass as a function of the branching ratios for \(\tilde{g}\rightarrow b\bar{b}\tilde{\chi}^0_1\) and \(\tilde{g}\rightarrow t\bar{t}\tilde{\chi}^0_1\). The sum of these branching ratios and those for the decays \(\tilde{g}\rightarrow t\bar{b}\tilde{\chi}^-_1\) and \(\tilde{g}\rightarrow \bar{t}b\tilde{\chi}^+_1\) (which are set to be equal in the simulation) is fixed to unity. The exclusion limits are strongest when the branching ratio for \(\tilde{g}\rightarrow b\bar{b}\tilde{\chi}^0_1\) or for \(\tilde{g}\rightarrow t\bar{t}\tilde{\chi}^0_1\) saturates the total branching ratio (top-left and bottom-right corners of the figures), and weakest when the branching ratio for \(\tilde{g}\rightarrow t\bar{b}\tilde{\chi}^-_1/\bar{t}b\tilde{\chi}^+_1\) saturates it. Similar results are obtained for \(m(\tilde{\chi }^0_1) = 600\) \(\text {GeV}\) (Fig. 12) and \(m(\tilde{\chi }^0_1) = 1000\) \(\text {GeV}\) (Fig. 13).

9 Conclusion

This paper has presented a search for supersymmetry involving the pair production of gluinos decaying via off-shell third-generation squarks into a lightest-neutralino LSP. The analysis exploits proton–proton collision data at a centre-of-mass energy of \(\sqrt{s} = 13\) \(\text {TeV}\) with an integrated luminosity of 139 fb\(^{-1}\), collected with the ATLAS detector at the LHC from 2015 to 2018. The search uses events containing large missing transverse momentum, zero or one electron or muon, and several energetic jets, at least three of which must be identified as containing b-hadrons. Two analysis methodologies are used: one applying selections independently to a series of kinematic observables in multiple signal regions sensitive to a broad range of SUSY models, and one targeting specific signal models with a deep neural network to further exploit correlations between observables constructed from the four-vectors of the reconstructed final-state particles. The latter methodology, which also combines events with differing numbers of final-state electrons or muons, provides enhanced discovery and exclusion power for the specific signals targeted. No significant excess is found with respect to the predicted background. Model-independent limits are set on the visible cross-section for new-physics processes. Exclusion limits are set on the gluino and neutralino LSP masses in two simplified models where the gluino decays exclusively as \(\tilde{g}\rightarrow t\bar{t}\tilde{\chi}^0_1\) or \(\tilde{g}\rightarrow b\bar{b}\tilde{\chi}^0_1\). For a massless neutralino, gluino masses of less than 2.44 \(\text {TeV}\) or 2.35 \(\text {TeV}\), respectively, are excluded at 95% CL in these two scenarios. The results are also interpreted in a model with variable gluino branching ratios to \(b\bar{b}\tilde{\chi }^0_1,\) \(t\bar{t}\tilde{\chi }^0_1\) and \(t\bar{b}\tilde{\chi }^-_1\)/\(\bar{t}b\tilde{\chi }^+_1.\) These limits represent a substantial improvement over previous ATLAS results obtained with smaller datasets collected at the LHC.

Fig. 11

The observed (left) and expected (right) 95% CL exclusion limits on the gluino mass as a function of the branching ratios for \(\tilde{g}\rightarrow b\bar{b}\tilde{\chi}^0_1\) and \(\tilde{g}\rightarrow t\bar{t}\tilde{\chi}^0_1\) for Gtb models with \(m(\tilde{\chi }^0_1) = 1\) \(\text {GeV}\), obtained from the CC analysis. The shading indicates the highest excluded gluino mass for each point in the plane. The white line indicates the contour of fixed \(m(\tilde{g}) = 2150\) \(\text {GeV}\)

Fig. 12

The observed (left) and expected (right) 95% CL exclusion limits on the gluino mass as a function of the branching ratios for \(\tilde{g}\rightarrow b\bar{b}\tilde{\chi}^0_1\) and \(\tilde{g}\rightarrow t\bar{t}\tilde{\chi}^0_1\) for Gtb models with \(m(\tilde{\chi }^0_1) = 600\) \(\text {GeV}\), obtained from the CC analysis. The shading indicates the highest excluded gluino mass for each point in the plane. The white line indicates the contour of fixed \(m(\tilde{g}) = 2200\) \(\text {GeV}\)

Fig. 13

The observed (left) and expected (right) 95% CL exclusion limits on the gluino mass as a function of the branching ratios for \(\tilde{g}\rightarrow b\bar{b}\tilde{\chi}^0_1\) and \(\tilde{g}\rightarrow t\bar{t}\tilde{\chi}^0_1\) for Gtb models with \(m(\tilde{\chi }^0_1) = 1000\) \(\text {GeV}\), obtained from the CC analysis. The shading indicates the highest excluded gluino mass for each point in the plane. The white line indicates the contour of fixed \(m(\tilde{g}) = 2100\) \(\text {GeV}\)