Scripts

The project contains scripts that make the analysis easier by automating as much work as possible when steering the XAMPP framework. Most are simple wrappers around XAMPPplotting scripts. You are not required to use them, but they can make your life a bit easier.

The scripts are located in XAMPPmonoH/python and in XAMPPmonoH/util.

You first have to make sure that you have an Athena TestArea set up (e.g. by executing the setup script). Then you can run the steering scripts to produce histograms and plots.

Common tasks

These scripts automate the most common tasks like histogram creation and plotting.

Download the XAMPP ntuples of a production

This script helps to download all samples that match a common identifier, e.g.

  • group.phys-exotics.*.0L0604*
  • group.phys-exotics.*.1L0604*
  • group.phys-exotics.*.2L0604*
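To preview which dataset names such an identifier would select, the wildcard can be tested locally. The following is a minimal sketch, assuming the identifier behaves like a shell-style wildcard as implemented by Python's fnmatch module. The function name match_identifier and the dataset names are made up for illustration; the actual matching is done by rucio.

```python
from fnmatch import fnmatchcase


def match_identifier(names, identifier):
    """Return the dataset names that match a shell-style wildcard identifier."""
    return [name for name in names if fnmatchcase(name, identifier)]


# Hypothetical dataset names, for illustration only.
datasets = [
    "group.phys-exotics.mc16_13TeV.364100.0L0604a_XAMPP",
    "group.phys-exotics.mc16_13TeV.364100.1L0604a_XAMPP",
    "group.phys-exotics.data17_13TeV.periodB.2L0604a_XAMPP",
]

zero_lepton = match_identifier(datasets, "group.phys-exotics.*.0L0604*")
print(zero_lepton)
```

Only the first name is selected here, because it is the only one containing the 0L0604 tag.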

Note: You need to have set up rucio (lsetup rucio) before using this script.

usage: downloadXAMPPntuples [-h] [-d DESTINATION] [-l] [-w WRITEFILE]
                            identifier

Positional Arguments

identifier Identifier for dataset, e.g. group.phys-exotics.*.0L0500a.*XAMPP

Named Arguments

-d, --destination
 

Directory where datasets should be downloaded to.

Default: “./”

-l, --listOnly

Only list datasets and do not download.

Default: False

-w, --writeFile
 Write dataset list to file located at this path.

Example: Download all ntuples in the 0lepton region with tag 0604 in the current directory:

python MonoHSteeringScripts/getXAMPPntuples.py group.phys-exotics.*.0L0604*XAMPP

Check that download/production of XAMPP ntuples of a production was complete

The script validateXAMPPntuples.py is a wrapper for XAMPPplotting’s CheckMetaData.py script. It compares the metadata with information stored in AMI to see whether the local set of XAMPP ntuples is complete. For MC, it also checks the PRW file information stored in the ntuples against the information in AMI.

It is assumed that you will have stored the XAMPP ntuples in a separate folder for 0 lepton, 1 lepton and 2 lepton ntuples, so you have to run the script for each lepton multiplicity, which is specified by -l 0L/1L/2L.

The script checks all four groups of ntuples (data1516, mc16a, data17, mc16d) and generates LaTeX/PDF files that document the completeness as tables. It takes quite some time to run, usually more than 10 minutes. It also writes text files that store the luminosity of the data files. The most efficient way to inspect them is tail lumi_15.txt (only the last line is relevant). The three numbers are “lumi in local ntuples”, “reference lumi according to GRL for local ntuples” and the difference between the two.
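If the luminosity check should be automated instead of read off with tail, the last line can be parsed in a few lines of Python. This is only a sketch: it assumes that the three numbers are the last three whitespace-separated fields of the last non-empty line, which is not confirmed by the script documentation. The function names and the tolerance are hypothetical.

```python
def read_lumi_summary(path):
    """Return (local, reference, difference) from the last non-empty line."""
    with open(path) as handle:
        lines = [line.strip() for line in handle if line.strip()]
    if not lines:
        raise ValueError("no content in {}".format(path))
    fields = lines[-1].split()
    if len(fields) < 3:
        raise ValueError("expected three numbers, got: {!r}".format(lines[-1]))
    # Assumption: the three numbers are the last three fields of the line.
    local, reference, difference = (float(field) for field in fields[-3:])
    return local, reference, difference


def is_complete(local, reference, tolerance=1e-3):
    """True if the local luminosity agrees with the reference within tolerance."""
    return abs(local - reference) <= tolerance * reference
```

A call such as read_lumi_summary("lumi_15.txt") would then return the three values, which can be passed on to is_complete.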

If you want to run the XAMPPplotting/python/CheckMetaData.py script without the wrapper, please make sure that you include both the input config and the sample list containing the names of the DxAODs used to create the samples listed in the input config. Otherwise it is not clear what should be compared with AMI. Currently the EXOT derivation sample lists are the reference. To use a set of sample lists belonging to another derivation, specify it with the --deriv argument.

Note: You need to have set up rucio and pyami (lsetup rucio pyami) before using this script. Best practice is to run this command before setting up AthAnalysis; otherwise conflicts may arise during execution.

Note: Also make sure that you use the appropriate datasets in XAMPPmonoH/data/SampleLists for the production you are checking (e.g. by checking out the appropriate tag for the production before running this script).

Note: It does not hurt to look at the script in a text editor before executing it. For example, if you only want to validate mc16a and not data, you can comment out the lines that would call XAMPPplotting/python/CheckMetaData.py for data ntuples.

usage: validateXAMPPntuples [-h] -l {0L,1L,2L} [--deriv {EXOT,HIGG}]
                            [-d DESTINATION]
                            [-i [INPUTCONFIGS [INPUTCONFIGS ...]]]
                            [-o OUTPUTPATH] [-t]

Named Arguments

-l, -L, --leptonRegion
 

Possible choices: 0L, 1L, 2L

Choose the lepton region (0L, 1L or 2L)

--deriv

Possible choices: EXOT, HIGG

Derivation which should be used for checking. Possible choices: EXOT, HIGG

Default: “EXOT”

-d, --destination
 Directory where datasets are stored which should be checked. Input config is automatically created.
-i, --inputConfigs
 

Input configs which should be used instead of automatically creating input config for files in directory

Default: []

-o, --outputPath
 

Directory where result will be stored.

Default: “result_NtupleValidation”

-t, --nolatex

Do not create a latex file

Default: False

Example: Check the completeness of the downloaded ntuples in the 0lepton region with tag 0604 in the directory XAMPPmonoH-00-06-04.0lep:

python MonoHSteeringScripts/validateXAMPPntuples.py -d XAMPPmonoH-00-06-04.0lep/ -o validation_0604_0lep -l 0L --deriv HIGG

Create Input Configs

This is a script to create input configs for the Mono-h(bb) analysis based on the settings defined in XAMPPmonoH/data/inputConfig_config.json. There are two ways to run this script:

  1. Create input configs with ntuples lying in a local directory: Use the option -i to point to the input directory.

  2. Create input configs with ntuples located on a localgroupdisk: Do not use -i; instead specify with -p a pattern that all datasets of interest have in common. With -e you can specify patterns that must not be part of the dataset name. Note that the pattern elements must be separated by spaces. If the ntuples are not located on your own localgroupdisk, specify the groupdisk with -r. Example:

    python createInputConfig.py -o /project/etp5/amatic/MonoH/XAMPP/source/XAMPPplotting/data/InputConf/MonoH/ZeroLepton/0L0601/mc16d --addPRWFiles -p "group.phys-exotics mc16 1L0601 r10201 XAMPP" -e "0L0601a" -r LRZ-LMU_LOCALGROUPDISK
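The effect of the pattern and exclude options can be sketched as follows. This is an illustration only: it assumes plain substring matching, where a dataset is kept if its name contains every space-separated pattern element and none of the excluded elements. The function select_datasets and the dataset names are hypothetical and not taken from createInputConfig.py.

```python
def select_datasets(names, pattern, exclude=()):
    """Keep names containing every pattern element and no excluded element."""
    required = pattern.split()  # pattern elements are separated by spaces
    return [
        name for name in names
        if all(element in name for element in required)
        and not any(element in name for element in exclude)
    ]


# Hypothetical dataset names, for illustration only.
candidates = [
    "group.phys-exotics.mc16_13TeV.364100.1L0601.r10201_XAMPP",
    "group.phys-exotics.mc16_13TeV.364100.1L0601.r9364_XAMPP",
    "group.phys-exotics.mc16_13TeV.364100.0L0601a.1L0601.r10201_XAMPP",
]
selected = select_datasets(
    candidates,
    pattern="group.phys-exotics mc16 1L0601 r10201 XAMPP",
    exclude=["0L0601a"],
)
print(selected)
```

In this sketch only the first candidate survives: the second lacks r10201 and the third contains the excluded element 0L0601a.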

Note: The script supports the creation of input configs for different campaigns. Use the --campaigns argument to select the campaign(s) among possible choices of mc16a,mc16d,mc16e which should be filtered out from the set of input files.

Script for creating Data input configs

usage: createInputConfig [-h] [-i INDIR] [-o OUTDIR] [-c CONFIG]
                         [--campaigns {mc16a,mc16d,mc16e,truth} [{mc16a,mc16d,mc16e,truth} ...]]
                         [-v] [--excludeExtensions] [-p PATTERN]
                         [-e [EXCLUDE [EXCLUDE ...]]] [-r RSE]

Named Arguments

-i, --input

Folder with files (must be an absolute path); to be used only when reading local directory

Default: “”

-o, --output

Output directory to put file list(s) into

Default: “./InputConfigs/MonoH/”

-c, --config

Path to configuration file in JSON format.

Default: “/xampp/build/x86_64-slc6-gcc62-opt/data/XAMPPmonoH/inputConfig_config.json”

--campaigns

Possible choices: mc16a, mc16d, mc16e, truth

Choose which data and mc campaigns should be processed for input config creation

Default: [‘mc16a’, ‘mc16d’, ‘mc16e’]

-v, --verbose

Print out in verbose mode

Default: False

--excludeExtensions
 

Exclude files with multiple e-tags or s-tags

Default: False

-p, --pattern

specify a pattern which is part of dataset name; to be used only when reading RSE

Default: “”

-e, --exclude

specify a pattern which must not be part of dataset name. argument 0 is used for RSE.

Default: []

-r, --RSE specify RSE storage element which should be read

Example: Create input config for all 0 lepton ntuples:

python MonoHSteeringScripts/createInputConfig.py -i /ptmp/mpp/pgadow/monoH/data/0700/0lep/ -o XAMPPplotting/data/InputConf/MonoH/MPPMU/XAMPPmonoH-00-07-00.SR

Create DSConfigs

Automatically create DS configs to be used in XAMPPplotting to create plots out of the histograms.

The names of the histograms are defined in the configuration file XAMPPmonoH/data/DSConfig_config.json. This is also the file that defines the colour scheme of the plots. Usually only the background samples are defined in this file.

If additional samples, e.g. some signals, should be appended to the DS config, the option --additional can be used to add the content of other files (e.g. those stored in XAMPPmonoH/data/DSConfig_signal) to the DSConfig.

Note: The --inputConfigs option takes the path to the folder with your data input configs (absolute path required!) so that the luminosity used to scale the MC in the plots is calculated automatically. If this option is used, the luminosity set with the -l option is overwritten.

Script for creating DSconfigs for XAMPPplotting

usage: createDSConfig [-h] -i INPUT [-o OUTPUT] [-c CONFIG]
                      [-a ADDITIONAL [ADDITIONAL ...]] [-l LUMI]
                      [--inputConfigs [INPUTCONFIGS [INPUTCONFIGS ...]]]

Named Arguments

-i, --input Folder with root files from WriteDefaultHistos (must be an absolute path)
-o, --output

Output file for the DSConfig

Default: “./DSConfig_MonoH.py”

-c, --config

Configuration file containing information about MC samples.

Default: “/xampp/build/x86_64-slc6-gcc62-opt/data/XAMPPmonoH/DSConfig_config.json”

-a, --additional
 Path to file(s) containing additional entries for DSConfig file, e.g. data-driven background estimates.
-l, --lumi

Luminosity (overwritten by --inputConfigs and auto lumi calculation)

Default: “80.”

--inputConfigs

Path to folder containing input configs or input config(s) to calculate luminosity automatically. Need to give absolute paths here (no ../)

Default: []

Example: Create DSConfig for background ntuples and include signals defined in XAMPPmonoH/data/DSConfig_signal/signal.conf:

python XAMPPmonoH/python/createDSConfig.py -a XAMPPmonoH/data/DSConfig_signal/signal.conf -i /ptmp/mpp/pgadow/Cluster/OUTPUT/2018-11-22/monoHbb_output/

Produce plots

This script produces plots from a collection of XAMPP ntuples almost automatically.

It scans the input ntuples, creates the input configs required to produce histograms, and creates the DSConfigs required to produce plots from those histograms.

To allow for quick identification of the plotting output it is possible to define a job name using the option -n. All other settings are configured using a configuration file. The default configuration file is XAMPPmonoH/data/runHistos_config.json. It is possible to edit this file before running the script using the option -e.

The default settings of this file are:

"inputBasePath": "XAMPPplotting/data/InputConf/MonoH/MPPMU",
"outputBasePath": "/ptmp/mpp/pgadow/monoH/plottingOutput/",
"tag": "00-08-00",
"ntuplesPath": {
    "SR": "/ptmp/mpp/pgadow/monoH/data/XAMPPmonoH-00-07-00.0lep/",
    "CR1": "",
    "CR2ee": "/ptmp/mpp/pgadow/monoH/data/XAMPPmonoH-00-07-01.2lep/",
    "CR2mumu": "/ptmp/mpp/pgadow/monoH/data/XAMPPmonoH-00-07-01.2lep/",
    "PH": ""
},
"regions": ["SR"],
"samples": ["data15", "data16", "data17", "data18", "stops", "stopt", "stopWt", "ttbar", "VHbb", "Wenu_cl", "Wenu_hf", "Wenu_hpt", "Wenu_l", "Wmunu_cl", "Wmunu_hf", "Wmunu_hpt", "Wmunu_l", "Wtaunu_cl", "Wtaunu_hf", "Wtaunu_hpt", "Wtaunu_l", "WW", "WZ", "Zee_cl", "Zee_hf", "Zee_hpt", "Zee_l", "Zmumu_cl", "Zmumu_hf", "Zmumu_hpt", "Zmumu_l", "Ztautau_cl", "Ztautau_hf", "Ztautau_hpt", "Ztautau_l", "Znunu_cl", "Znunu_hf", "Znunu_hpt", "Znunu_l", "ZZ"],
"signals": ["zp2hdm", "zphxxbb", "shxxbb", "2HDMa", "monoSbb"],
"doBlinding": true,
"doSystematics": false,
"doPRW": true,
"doFitInputs": false,
"skipInputConfigCreation": false,
"campaigns": ["mc16a", "mc16d"],
"luminosity": "auto",
"engine": "LOCAL"

Explanation of settings:

Setting         Explanation
inputBasePath   Input configs created on-the-fly will be stored here
outputBasePath  Histogram and plot output will be stored here
tag             Input config filename: XAMPPmonoH-<tag>.<identifier>
regions         Regions to be processed (SR, CR1, CR2ee, CR2mumu, PH)
samples         Process these background samples (filters input configs)
signals         Process these signal samples (filters input configs)
doBlinding      Blind signal region mass window? true / false
doSystematics   Run with systematics? true / false
doPRW           Run with PRW? true / false (should be always true)
doFitInputs     Create strongly reduced set of output histograms
campaigns       Process these MC campaigns (match dataXX in samples!)
luminosity      Which luminosity to scale MC (“auto”: calc from input)
engine          Which batch system (or “LOCAL”) for processing?
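Before starting a long job it can help to check that every requested region has an ntuple path configured. The sketch below assumes that the configuration file is a single JSON object containing the settings listed above and that an empty string means "no ntuples available". Whether runHistos.py performs such a check itself is not known; the function name and the paths are hypothetical.

```python
import json


def regions_without_ntuples(config):
    """Return the requested regions whose ntuplesPath entry is missing or empty."""
    paths = config.get("ntuplesPath", {})
    return [region for region in config.get("regions", []) if not paths.get(region)]


# Reduced, hypothetical configuration for illustration only.
config_text = """
{
    "ntuplesPath": {"SR": "/path/to/0lep/", "CR1": "", "PH": ""},
    "regions": ["SR", "CR1"]
}
"""
config = json.loads(config_text)
print(regions_without_ntuples(config))
```

Here CR1 is reported because it is requested in "regions" but has an empty path.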

Produce histograms for mono-h(bb) analysis.

usage: runHistos [-h] [-n NAME] [-c CONFIG] [-e]

Named Arguments

-n, --name

String to be added to the output directory to help with bookkeeping of plots.

Default: “”

-c, --config

Path to configuration file for histogram/plot creation.

Default: “/xampp/build/x86_64-slc6-gcc62-opt/data/XAMPPmonoH/runHistos_config.json”

-e, --edit

Modify config file for this script prior to execution with a text editor.

Default: False

Example: Create plots and view + edit configuration file before running script:

python XAMPPmonoH/python/runHistos.py -e

Get yields from XAMPPplotting histograms

This script calculates event yields from any ROOT histogram, including those created by XAMPPplotting’s WriteDefaultHistos.

usage: getYields [-h] [--useOverflow] [--printAllVariables] [--csv]
                 [-o OUTPUTPATH] [-l LUMINOSITY]
                 [--resolvedVariable RESOLVEDVARIABLE]
                 [--mergedVariable MERGEDVARIABLE]
                 [inputFiles [inputFiles ...]]

Positional Arguments

inputFiles input files

Named Arguments

--useOverflow

take into account overflow and underflow bins

Default: False

--printAllVariables
 

print all variables

Default: False

--csv

save output as csv file

Default: False

-o, --outputPath
 

name of the directory with output yields

Default: “yields”

-l, --luminosity
 

Luminosity for backgrounds to be scaled to

Default: 36.1

--resolvedVariable
 

name of the variable for the resolved fit input histogram

Default: “m_jj”

--mergedVariable
 

name of the variable for the merged fit input histogram

Default: “m_J”
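The effect of the --useOverflow option can be illustrated without ROOT. The sketch below represents a one-dimensional histogram as a plain list that follows ROOT's bin numbering, where index 0 is the underflow bin and the last index is the overflow bin. The function integrate is hypothetical and only mirrors the idea; it is not the implementation used by getYields.

```python
def integrate(bin_contents, use_overflow=False):
    """Sum the bin contents of a 1D histogram.

    bin_contents follows ROOT's numbering: index 0 is the underflow bin,
    indices 1..N are the regular bins and index N+1 is the overflow bin.
    """
    if len(bin_contents) < 2:
        raise ValueError("need at least the underflow and overflow bins")
    if use_overflow:
        return sum(bin_contents)
    # Skip the first (underflow) and last (overflow) entries.
    return sum(bin_contents[1:-1])


# Hypothetical histogram: underflow, two regular bins, overflow.
contents = [1.0, 10.0, 20.0, 2.0]
print(integrate(contents), integrate(contents, use_overflow=True))
```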

Convert XAMPPplotting output histograms to WSMaker input format

In order to run WSMaker easily we have a script to convert the XAMPPplotting output format into WSMaker input format.

Script for converting XAMPPplotting output to WSMaker input

usage: XAMPPplottingToWSMaker [-h] [-i INPUT_DIR] [-o OUTPUT_DIR] [-l LUMI]
                              [-c] [--mee] [--mmumu]

Named Arguments

-i, --input

Folder with XAMPPplotting output files

Default: “./”

-o, --output

Output directory to put WSMaker input file

Default: “./WSMakerInput.root”

-l, --lumi Integrated luminosity to be applied as signal and background scaling.
-c, --charge

Use muon charge instead of Higgs candidate mass.

Default: False

--mee

Use dielectron mass instead of Higgs candidate mass.

Default: False

--mmumu

Use dimuon mass instead of Higgs candidate mass.

Default: False

Currently it expects an input file format that matches the output of runHistos.py (i.e. file names like WW.root).

When systematics jobs are run, it is important to switch on the jet flavour splitting for W and Z+jets samples (pick the dedicated RunConfig region files).

Special care must be taken with W+jets and Z+jets input! Currently this script works only if all W+jets ROOT files and all Z+jets ROOT files have been merged beforehand. For instance, if the W+jets background is split into Wenu_l.root, Wenu_cl.root, Wenu_hf.root, Wenu_hpt.root, [enu -> munu], [enu -> taunu], all these files must first be merged into W.root using hadd. The same applies to the Z+jets splittings. Only the resulting W.root and Z.root files must then be passed to this script.

Data input files must also be prepared before running this script. Assume there is one folder with data15.root and data16.root and another folder with data17.root, and the script is run once per folder. Then data15.root and data16.root first need to be merged with hadd into a file called data.root, while data17.root just needs to be renamed to data.root:

# data1516
hadd data.root data{15,16}.root
rm data{15,16}.root

# data17
mv data17.root data.root

# W/Z+jets
hadd W.root W{enu,munu,taunu}_{cl,hf,hpt,l}.root
hadd Z.root Z{ee,mumu,tautau,nunu}_{cl,hf,hpt,l}.root
rm W{enu,munu,taunu}_{cl,hf,hpt,l}.root
rm Z{ee,mumu,tautau,nunu}_{cl,hf,hpt,l}.root
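To see which files the brace expressions above refer to, or to check that all of them exist before running hadd, the expansion can be reproduced in Python. This is a small sketch; the helper name expand is hypothetical.

```python
from itertools import product


def expand(prefix, channels, flavours):
    """List the file names covered by prefix{channels}_{flavours}.root."""
    return [
        "{}{}_{}.root".format(prefix, channel, flavour)
        for channel, flavour in product(channels, flavours)
    ]


flavours = ["cl", "hf", "hpt", "l"]
w_files = expand("W", ["enu", "munu", "taunu"], flavours)
z_files = expand("Z", ["ee", "mumu", "tautau", "nunu"], flavours)
print(len(w_files), len(z_files))
```

This gives 12 W+jets files and 16 Z+jets files, which are the inputs merged into W.root and Z.root respectively.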

It often happens that some data runs are missing (sorry ATLAS operations). In this case, the integrated luminosity of the data at hand must be determined. A convenient way to do this is to run the script https://gitlab.cern.ch/atlas-mpp-xampp/XAMPPplotting/blob/master/python/CheckMetaData.py with the --lumi option. The result must then be passed to this script via the -l option.

What next?

In case some systematics still need to be renamed, an example is provided with XAMPPmonoH/utils/formatSystematicHistos.C.

Currently the fitting code is developed here:

https://gitlab.cern.ch/atlas-phys/exot/jdm/ANA-EXOT-2018-46/fitting-and-limits/WSMaker_MonoH

Bookkeeping

The scripts in XAMPPmonoH/python/bookkeeping are designed to simplify the bookkeeping of

  • sample lists: these lists are stored in XAMPPmonoH/data/SampleLists and contain the DxAODs used for ntuple creation
  • ntuple lists: these lists are circulated to the analysis team once a new ntuple production is finished and contain the XAMPP ntuples required for analysis with XAMPPplotting

Create new MC sample list

This script works both for data and MC and creates sample lists for both AODs and all derivations that are defined in the configuration files:

  • XAMPPmonoH/data/samplelist.txt: list of all dsids + sample names with option to put a veto (will remove sample from DAOD lists) + option to limit sample only to a specific derivation (if multiple derivations are used)
  • XAMPPmonoH/data/samplelist_config.json: configuration file with settings for sample list creation, e.g. output folder, good run lists, all derivations for which sample lists should be created, rtags, MC campaign, signal keywords, blacklisted samples, and fastsim / fullsim switches

usage: CreateSampleList [-h] [-c CONFIG] [--appyVetoForAOD] [--skipCheckGRL]
                        [--noDataLists] [--pTags PTAGS [PTAGS ...]]

Named Arguments

-c, --config

Path to configuration file

Default: “/xampp/build/x86_64-slc6-gcc62-opt/data/XAMPPmonoH/samplelist_config.json”

--appyVetoForAOD
 

Skip samples with veto flag on the samplelist.txt also for AOD lists.

Default: False

--skipCheckGRL

Skip check if the GRLs in the config file are up to date.

Default: False

--noDataLists

Don’t update data sample lists, only update MC sample lists (e.g. if you have defined period containers and don’t want those overwritten).

Default: False

--pTags Option to choose specific p-tags (e.g. --pTags p3840 p3841 to get the MC samples with p3840 and the data samples with p3841). This is useful when more than one derivation version is available.

Check sample list for completeness

This script can be used to check a sample list for completeness.

A sample list containing data is checked against the good run list (GRL) for completeness.

A sample list containing MC is checked against the content of XAMPPmonoH/data/samplelist.txt for completeness.

The script also checks whether your sample list (i.e. the list of DxAOD names used for ntuple production) contains duplicate DSIDs. The file name of the sample list must follow the format “data[15,16,17]_[derivationname].txt” or “mc16[a,d,e]_[signals,bkgs]_[derivationname].txt”.
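The file name convention and the duplicate check can be sketched as follows. The regular expressions are one reading of the format quoted above. The duplicate search assumes that the DSID is the second dot-separated field of a DxAOD name. All function names and dataset names are hypothetical and not taken from the CheckSampleList script.

```python
import re
from collections import Counter

# One interpretation of the documented file name formats.
DATA_NAME = re.compile(r"^data(15|16|17)_(\w+)\.txt$")
MC_NAME = re.compile(r"^mc16(a|d|e)_(signals|bkgs)_(\w+)\.txt$")


def is_valid_list_name(filename):
    """True if the file name follows one of the two documented formats."""
    return bool(DATA_NAME.match(filename) or MC_NAME.match(filename))


def duplicate_dsids(dataset_names):
    """Return the DSIDs that occur more than once.

    Assumption: the DSID is the second dot-separated field of each name.
    """
    counts = Counter(
        name.split(".")[1] for name in dataset_names if name.count(".") >= 1
    )
    return sorted(dsid for dsid, count in counts.items() if count > 1)


# Hypothetical DxAOD names, for illustration only.
sample_list = [
    "mc16_13TeV.364100.SampleA.deriv.DAOD_EXOT27.e1_s1_r1_p1",
    "mc16_13TeV.364101.SampleB.deriv.DAOD_EXOT27.e1_s1_r1_p1",
    "mc16_13TeV.364100.SampleA.deriv.DAOD_EXOT27.e2_s1_r1_p1",
]
print(is_valid_list_name("mc16a_bkgs_EXOT27.txt"), duplicate_dsids(sample_list))
```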

usage: CheckSampleList [-h] [--GRL GRL] [--derivation DERIVATION] inputFile

Positional Arguments

inputFile your input file with data/MC lists

Named Arguments

--GRL

your input GRL

Default: “”

--derivation

your input derivation (choose HIGG or EXOT), leave blank for automatic detection

Default: “”

Check XAMPP ntuple list for completeness

This script can be used to check if the content of a filelist with XAMPP ntuples is complete with respect to a set of sample lists (e.g. those sample lists that were used to create the ntuples during production).

This script checks whether a text file listing XAMPP ntuples (i.e. the output of XAMPPmonoH) contains all samples specified in a set of sample lists with the names of the DxAODs that went into the ntuple production. Typical use case: the analysis team has just finished an ntuple production on the grid and compiled a text file containing all ntuples for download with “rucio download --ndownloader 5 `cat textfile.txt`”. Before sending the list to the team, use this script to check that the text file is complete with respect to the sample lists containing the DxAODs that went into the ntuple production.
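The idea behind this check can be sketched as a set comparison of dataset identifiers. The sketch assumes that both ntuple names and DxAOD names contain the DSID (or run number) as a purely numeric, dot-delimited field of six to eight digits. This naming assumption, the function names and the example names are hypothetical, and the actual CheckNtupleList script may work differently.

```python
import re

# Assumption: the identifier is a dot-delimited field of 6 to 8 digits.
ID_FIELD = re.compile(r"\.(\d{6,8})\.")


def extract_ids(names):
    """Collect the numeric identifier of every name that contains one."""
    ids = set()
    for name in names:
        match = ID_FIELD.search(name)
        if match:
            ids.add(match.group(1))
    return ids


def missing_from_ntuples(ntuple_names, sample_names):
    """Return identifiers present in the sample lists but not in the ntuples."""
    return sorted(extract_ids(sample_names) - extract_ids(ntuple_names))


# Hypothetical names, for illustration only.
samples = [
    "mc16_13TeV.364100.SampleA.deriv.DAOD_EXOT27.e1_s1_r1_p1",
    "mc16_13TeV.364101.SampleB.deriv.DAOD_EXOT27.e1_s1_r1_p1",
]
ntuples = ["group.phys-exotics.mc16_13TeV.364100.0L0604a_XAMPP"]
print(missing_from_ntuples(ntuples, samples))
```

In this example the identifier 364101 is reported as missing from the ntuple list.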

usage: CheckNtupleList [-h] [--sampleLists [SAMPLELISTS [SAMPLELISTS ...]]]
                       inputFile

Positional Arguments

inputFile your input file listing XAMPP ntuples

Named Arguments

--sampleLists

sample lists with DxAODs used as a reference to check the ntuple file list for completeness (be sure that those are complete!)

Default: []

Systematics

Some scripts plot the systematic up/down variations in support-note quality and create tables listing the magnitude of the up/down variations:

https://gitlab.cern.ch/atlas-mpp-xampp/XAMPPmonoH/tree/master/XAMPPmonoH/python/studysystematics

Other

Update DSID proc list

The background modeling uncertainties are assigned based on a process list that matches DSIDs to processes: https://gitlab.cern.ch/atlas-mpp-xampp/XAMPPmonoH/blob/master/XAMPPmonoH/data/mc16_dsid_proc.txt

If a sample is not included in this list, the framework will crash when processing this sample.

To simplify the task of updating this list, a very basic script that extracts the samples and DSIDs from the sample lists exists:

https://gitlab.cern.ch/atlas-mpp-xampp/XAMPPmonoH/blob/master/XAMPPmonoH/python/assignDSIDtoProc.py

Signal significance studies

There are some scripts that can be used to study the signal significance:

https://gitlab.cern.ch/atlas-mpp-xampp/XAMPPmonoH/tree/master/XAMPPmonoH/python/studysignalsignificance