The general method of specification is:
#GQSUB -W stagein=jobName@sourceName
or
#GQSUB -W stageout=jobName@targetName
The syntax is derived from the PBS/Torque/SGE model for specifying stagein/out files, with a few extensions.
The jobName is the name the file should have on the worker node at execution time. The sourceName and targetName specify where to get the file from or where to put it. If the jobName is omitted, the file arrives with the same name it had remotely.
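To make the naming rule concrete, here is a rough Python sketch of how a `jobName@sourceName` value could be split. It is not the gqsub code: the function name `parse_stage_spec` is made up for illustration, and it assumes that an omitted jobName shows up either as an empty string before the `@` or as no `@` at all.

```python
def parse_stage_spec(spec):
    """Split 'jobName@remoteName' into (job_name, remote_name).

    Hypothetical helper, not part of gqsub. Assumes the first '@'
    separates the two halves, and that a missing jobName appears as
    an empty left-hand side or as a value with no '@' in it.
    """
    job_name, sep, remote = spec.partition("@")
    if not sep:
        # No '@' at all: treat the whole value as the remote name.
        job_name, remote = "", spec
    if not remote:
        raise ValueError("missing source/target name in %r" % spec)
    if not job_name:
        # Default: keep the name the file had remotely, i.e. the last
        # path component, with any 'host:' or 'lfn:' prefix removed.
        job_name = remote.rstrip("/").rsplit("/", 1)[-1].rsplit(":", 1)[-1]
    return job_name, remote


if __name__ == "__main__":
    for value in ("input.dat@gsiftp://host/path/file",
                  "@host:path/to/file",
                  "simpleFileName"):
        print(value, "->", parse_stage_spec(value))
```

Splitting on the first `@` keeps the sketch short, but it would misread a source such as `user@host:file` given without a jobName, so a real parser would need a firmer rule there.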
The valid ways of requesting a file are:
- simpleFileName
- local/path/to/file
  - A file local to the submission machine
- host:file
- host:path/to/file
  - An SCP destination, which the submission machine can access.
- gsiftp://host/path/file
  - GridFTP URL
- srm://host/path/to/file
  - SRM Storage Element
- lfn:/path/to/file
  - Logical File Name (via LFC)
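The forms above can be told apart by their prefixes alone. The following is a minimal sketch of that classification, assuming nothing beyond the list itself; `classify_source` and the labels it returns are illustrative names, not anything gqsub defines.

```python
def classify_source(spec):
    """Return a label for the kind of file specification.

    Hypothetical helper: the labels ('gridftp', 'srm', 'lfn', 'scp',
    'local') are made up for this sketch.
    """
    if spec.startswith("gsiftp://"):
        return "gridftp"
    if spec.startswith("srm://"):
        return "srm"
    # Check 'lfn:' before the generic host:file test, since it also
    # contains a colon.
    if spec.startswith("lfn:"):
        return "lfn"
    head, sep, _ = spec.partition(":")
    # 'host:file' or 'host:path/to/file': a colon whose left side is a
    # non-empty name without slashes is taken to be an SCP host.
    if sep and head and "/" not in head:
        return "scp"
    # Anything else is a file local to the submission machine.
    return "local"


if __name__ == "__main__":
    for value in ("simpleFileName", "local/path/to/file", "host:file",
                  "gsiftp://host/path/file", "srm://host/path/to/file",
                  "lfn:/path/to/file"):
        print(value, "->", classify_source(value))
```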
For SRM and LFC modes, the names are converted into GridFTP URLs at submission time. This is fine if the data is only in one place, but I'll change that later so it automatically pulls from the nearest copy.
Unfortunately, determining the topologically nearest SE is ... tricky. Most of the usual tools (ping times, packet pairing for bandwidth assessment, etc.) are unreliable, as ICMP is not guaranteed to get through. So I think I'll have to get down and dirty with TCP/IP and HTTPS, and use those to make the estimates. Fun ... in an ironic sort of way.
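As a rough idea of what a TCP-based estimate could look like, the sketch below times a plain TCP connect to each candidate host and picks the quickest. It is only an illustration of the approach, not what gqsub does: the function names are invented, port 2811 (the usual GridFTP control port) is an assumed default, and a single connect time is a crude proxy for "nearest" that says nothing about bandwidth.

```python
import socket
import time


def tcp_connect_time(host, port, timeout=3.0):
    """Seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Refused, unreachable, timed out, or name lookup failed.
        return None


def pick_nearest(hosts, port=2811, measure=tcp_connect_time):
    """Return the host with the smallest measured time, or None.

    'measure' can be swapped out, e.g. for an HTTPS-based probe or a
    fake in tests. Port 2811 is an assumption for this sketch.
    """
    best_host, best_time = None, None
    for host in hosts:
        elapsed = measure(host, port)
        if elapsed is None:
            continue  # skip hosts that could not be reached
        if best_time is None or elapsed < best_time:
            best_host, best_time = host, elapsed
    return best_host
```

In practice the measurement would want repeating and averaging, since one connect can easily be skewed by a busy server or a slow DNS lookup.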
Still, it's there, it works, and it does the job.
I'm going to do some more testing and stressing before I wrap up the 1.5 release, probably next week.
Oh, look - our cluster is quiet right now. Time to find out what breaks when I throw thousands of jobs at it...