Ntuple production¶
Checklist for Launching an Ntuple Production¶
A new ntuple production should be launched when all items below are fulfilled.
Checklist¶
- Get the latest version of XAMPPbase (AnalysisTools/XAMPPplotting), or a tag created for the production, and update the submodules
- If you check out a tag, make sure that you use a recent version of git; running setupATLAS followed by lsetup git provides one. Otherwise you might end up with the wrong submodules (see the sketch after this checklist).
- Check that the master branch of the framework compiles against the latest AthAnalysis release (hopefully it already does; see the CI)
- Check the sample list you want to run on (located here; sorting the list by DSID makes this easier)
- Use the latest version of the MC/data datasets (check the p-tag via AMI)
- Check for duplicated datasets in the list (especially important for data, e.g. the same run with different m/f/p-tags) and check that the runs are on the GRL
- A full production requires checking data (15/16/17), background MC and signal MC
- Also check the lifetime of your derivations beforehand, for MC here and for data here
- Check for the latest GRL, ilumicalc and PRW files (and set them up here)
- PRW (pileup reweighting) has recently become more involved when running on mc16a/c/d, so check that the configuration matches the data-taking period, e.g. data15/16 versus data17
- Check that all of the desired triggers (MET and single lepton) are included in the 3 job option files
- Update the b-tagging CDI file:
- For small-R jets here
- For track-jets, the same file as for small-R jets is used.
- Check the large-R jet uncertainties config, e.g. look for Jet.LargeRuncConfig in SUSYTools_MonoH.conf
- Check that all 4 job option files have the right configuration: MonoHToolSetup.py, runMonoH_0lep.py, runMonoH_1lep.py and runMonoH_2lep.py
- Check that the SUSYTools file is correctly configured
- Check that the framework runs locally for all lepton channels (0/1/2) using both MC and data (6 test jobs in total; see the sketch after this checklist). For MC it is reasonable to use the ttbar sample, as it contributes to all analysis channels.
- Check that a file exists which properly assigns scattering process names to DSIDs. Normally it is available under XAMPPmonoH/data, but re-running python XAMPPmonoH/scripts/assignDSIDtoProc.py is a safe precaution.
If everything works out fine, create a new tag, announce it on the mailing list and keep your fingers crossed that all jobs finish successfully.
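For the tag checkout mentioned in the checklist, a minimal sketch looks like this (the tag name is a placeholder):
setupATLAS
lsetup git
# check out the production tag and bring the submodules in sync with it
git checkout <production-tag>
git submodule update --init --recursive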
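For the six local test jobs, a sketch of the 0-lepton MC and data tests, assuming the standard athena command-line flags (the input file paths are placeholders):
# 0-lepton channel, MC test using the ttbar sample
athena XAMPPmonoH/share/runMonoH_0lep.py --filesInput /path/to/ttbar.DAOD.root --evtMax 500
# 0-lepton channel, data test
athena XAMPPmonoH/share/runMonoH_0lep.py --filesInput /path/to/data.DAOD.root --evtMax 500
# repeat with the 1- and 2-lepton job options for the remaining four tests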
Tips for efficient grid jobs¶
There exists a very helpful collection of scripts for monitoring panda jobs. You can find them here: https://github.com/dguest/pandamonium
Try to submit using the option --filesPerJob 1. This option significantly speeds up the production. It will print a warning, which you can ignore; however, this setting can lead to broken jobs.
For the production tag ALWAYS follow this naming convention:
[0,1,2]L[production_id][a,d][n]
Explanation:
- [0,1,2]L: number of leptons in the region
- [production_id]: four-digit production tag number, e.g. 0700 or 0604
- [a,d]: mc16a or mc16d
- [n]: number of the submission attempt (e.g. to recover broken jobs); 0 for the first submission
Example name for a 0-lepton data1516/mc16a production with production id 0604, first attempt:
0L0604a0
Example submission command using the exotics production role:
python XAMPPmonoH/python/SubmitToGrid.py --list XAMPPmonoH/data/SampleLists/data15_HIGG5D1.txt --jobOptions XAMPPmonoH/share/runMonoH_0lep.py --production 0L0604a0 --productionRole --filesPerJob 1
Retry jobs at high-performance sites¶
Sometimes it is necessary to request the transfer of a dataset to the scratch disk of a high-performance site, such as ANALY_RAL_SL6; a sketch of such a transfer request is shown below.
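One way to request such a transfer is via a rucio replication rule. This is a sketch, assuming you have transfer quota on the target RSE; the dataset and RSE names are illustrative:
setupATLAS
lsetup rucio
voms-proxy-init -voms atlas
# request one replica of the dataset on the RAL scratch disk
rucio add-rule mc16_13TeV.<dataset-name> 1 RAL-LCG2_SCRATCHDISK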
The slowly submitting job can then be aborted using
setupATLAS
lsetup panda
pbook
# wait a bit until the internal database is ready
kill(<enter task_id of job>)
The job will then go into the aborted state. After some time (~30 min), when the transfer has completed, the job can be restarted using this option:
retry(<enter here the task_id>, newOpts={'site': ['ANALY_RAL_SL6']})
Of course there are sites other than RAL, such as TOKYO, DESY, …
Mass-retry of failed and finished jobs¶
Please download the pandamonium tools and copy them to a location that is in your bash PATH (a sketch is shown below). Before using them, set up python and panda by entering lsetup python panda.
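A minimal installation sketch, assuming the pandamonium executables sit at the top level of the clone:
# clone pandamonium and put its scripts on your PATH
git clone https://github.com/dguest/pandamonium.git
export PATH="$PWD/pandamonium:$PATH"   # add this line to your ~/.bashrc to make it permanent
lsetup python panda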
Add these lines to your ~/.bashrc, then source ~/.bashrc:
function resubmitXAMPPmonoH(){
    # select the data tasks matching the given tag that are in the finished, failed
    # or running state and pipe their task ids (second column) to panda-resub-taskid
    pandamon -d 60 -f group.phys-exotics.data*_13TeV.*.*${1}* | grep "finished\|failed\|running" | awk '{print $2}' | panda-resub-taskid
    # the same for the MC tasks
    pandamon -d 60 -f group.phys-exotics.mc16_13TeV.*.*${1}* | grep "finished\|failed\|running" | awk '{print $2}' | panda-resub-taskid
}
After this you can retry parts of the production en masse using
resubmitXAMPPmonoH [0,1,2]L[production_id][a,d]
Example:
resubmitXAMPPmonoH 0L0604a
Get the list of input datasets for broken or scouting jobs¶
As above, the pandamonium tools must be in your bash PATH, and python and panda must be set up via lsetup python panda.
Add the following lines to your ~/.bashrc:
function listScoutingJobsXAMPPmonoH(){
    # select tasks in the scouting state, take their task names (column 4)
    # and let pandamon list the corresponding input datasets (-s IN)
    pandamon -d 60 -f group.phys-exotics.data*_13TeV.*.*${1}* | awk '$1 ~ /scouting/ {print $4}' | pandamon - -s IN
    pandamon -d 60 -f group.phys-exotics.mc16_13TeV.*.*${1}* | awk '$1 ~ /scouting/ {print $4}' | pandamon - -s IN
}
function listBrokenJobsXAMPPmonoH(){
    # the same, but for tasks in the broken state
    pandamon -d 60 -f group.phys-exotics.data*_13TeV.*.*${1}* | awk '$1 ~ /broken/ {print $4}' | pandamon - -s IN
    pandamon -d 60 -f group.phys-exotics.mc16_13TeV.*.*${1}* | awk '$1 ~ /broken/ {print $4}' | pandamon - -s IN
}
Now you can use these commands to efficiently get the input datasets of broken or scouting jobs, e.g. for resubmission under another identifier (e.g. 0L0604a1 instead of 0L0604a0) or for transfer to high-performance grid sites.
Example:
listBrokenJobsXAMPPmonoH 2L0604d0
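A sketch of such a resubmission under a new attempt number, assuming SubmitToGrid.py also accepts a plain text file of input dataset names via --list (the file name is a placeholder):
# collect the input datasets of the broken jobs
listBrokenJobsXAMPPmonoH 0L0604a0 > broken_inputs.txt
# resubmit them with the attempt number increased from 0 to 1
python XAMPPmonoH/python/SubmitToGrid.py --list broken_inputs.txt --jobOptions XAMPPmonoH/share/runMonoH_0lep.py --production 0L0604a1 --productionRole --filesPerJob 1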
Create file lists of downloadable ntuples¶
The pandamonium tools must again be in your bash PATH, with python and panda set up via lsetup python panda.
Once the production is almost done, you will need to create file lists to share with the analysis team, so that they can download the ntuples using the commands
setupATLAS
lsetup rucio
# get grid proxy
voms-proxy-init -voms atlas
rucio download --ndownloader 5 `cat filelist.txt`
You can efficiently create such a file list by adding the following lines to your ~/.bashrc:
function listDoneJobsXAMPPmonoH(){
    # for tasks in the done, finished or running state, print the output dataset
    # name, derived from the task name (column 4) by replacing '/' with '_XAMPP'
    pandamon -d 60 -f group.phys-exotics.data*_13TeV.*.*${1}* | awk '$1 ~ /done/ {gsub("/","_XAMPP",$4); print $4}' | sort
    pandamon -d 60 -f group.phys-exotics.mc16_13TeV.*.*${1}* | awk '$1 ~ /done/ {gsub("/","_XAMPP",$4); print $4}' | sort
    pandamon -d 60 -f group.phys-exotics.data*_13TeV.*.*${1}* | awk '$1 ~ /finished/ {gsub("/","_XAMPP",$4); print $4}' | sort
    pandamon -d 60 -f group.phys-exotics.mc16_13TeV.*.*${1}* | awk '$1 ~ /finished/ {gsub("/","_XAMPP",$4); print $4}' | sort
    pandamon -d 60 -f group.phys-exotics.data*_13TeV.*.*${1}* | awk '$1 ~ /running/ {gsub("/","_XAMPP",$4); print $4}' | sort
    pandamon -d 60 -f group.phys-exotics.mc16_13TeV.*.*${1}* | awk '$1 ~ /running/ {gsub("/","_XAMPP",$4); print $4}' | sort
}
Example:
listDoneJobsXAMPPmonoH 0L0604d
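To produce the filelist.txt used in the rucio download command above, redirect the output to a file:
listDoneJobsXAMPPmonoH 0L0604d > filelist.txt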