Thursday, 4 March 2010

Data transfers - now with workyness

As of the current SVN tree, data handling from SEs is supported. (It would have been yesterday, but there was a blip in the SVN servers.)

The general method of specification is:

#GQSUB -W stagein=jobName@sourceName

or

#GQSUB -W stageout=jobName@targetName

The syntax is derived from the PBS/Torque/SGE model for specifying stagein/out files, with a few extensions.

The jobName is the name the file should have at the time of execution on the worker node. The sourceName and targetName specify where to get the file from or put it to. If the jobName is omitted, the file arrives with the same name as it had remotely.
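To make the naming rule concrete, here is a minimal Python sketch of how a stagein/stageout value could be split. It is an illustration only, not the actual implementation: the function name parse_stage_spec is hypothetical, and it assumes the value is split at the first '@' and that an omitted jobName may be written either as "@sourceName" or as a bare "sourceName".

```python
def parse_stage_spec(value):
    """Split 'jobName@remoteName' into (job_name, remote_name).

    Assumption: the first '@' separates the two parts, so a remote
    name that itself contains '@' is not handled by this sketch.
    """
    job_name, sep, remote = value.partition("@")
    if not sep:
        # No '@' at all: treat the whole value as the remote name.
        job_name, remote = "", value
    if not job_name:
        # Omitted jobName: keep the name the file had remotely,
        # i.e. whatever follows the last '/' (or the last ':').
        job_name = remote.rstrip("/").rsplit("/", 1)[-1].rsplit(":", 1)[-1]
    return job_name, remote
```

For example, parse_stage_spec("input.dat@srm://se.example.org/data/run1.dat") gives ("input.dat", "srm://se.example.org/data/run1.dat"), where se.example.org is a placeholder host.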

The valid ways of requesting a file are:

simpleFileName
local/path/to/file
    A file local to the submission machine

host:file
host:path/to/file
    An SCP destination, which the submission machine can access.

gsiftp://host/path/file
    GridFTP URL

srm://host/path/to/file
    SRM Storage Element

lfn:/path/to/file
    Logical File Name (via LFC)
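The forms above can be told apart by their prefix. The following Python sketch shows one way to do that; the function name classify_source and the returned labels are made up for illustration, and the order of the checks is an assumption (URL-style prefixes first, so that "lfn:/..." is not mistaken for an SCP host called "lfn").

```python
def classify_source(spec):
    """Return a label for the kind of file specification given."""
    if spec.startswith("gsiftp://"):
        return "gridftp"
    if spec.startswith("srm://"):
        return "srm"
    if spec.startswith("lfn:"):
        return "lfn"
    # host:path looks like SCP, as long as the part before the
    # first ':' is not itself a path containing '/'.
    host, sep, _path = spec.partition(":")
    if sep and "/" not in host:
        return "scp"
    # Anything else is taken to be local to the submission machine.
    return "local"
```

A side effect of this ordering is that a machine literally named "lfn" could not be used as an SCP host.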



For SRM and LFC modes, these are converted at submission time into GridFTP URLs. This is fine if the data is only in one place, but I'll change that later so that it automatically pulls from the nearest copy.

Unfortunately, determining the topologically nearest SE is ... tricky. Most of the usual tools (ping times, packet-pair bandwidth assessment, etc.) are unreliable, as ICMP is not guaranteed to get through. So I think I'll have to get down and dirty with TCP/IP and HTTPS, and use that to make the estimates. Fun ... in an ironic sort of way.
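As a rough idea of what a TCP-based estimate might look like, the Python sketch below times the TCP handshake to each candidate and picks the fastest. This is only one possible approach under stated assumptions, not what the code will necessarily end up doing: the function names tcp_connect_time and nearest are hypothetical, the handshake time is used as a stand-in for "nearness", and no bandwidth measurement is attempted.

```python
import socket
import time

def tcp_connect_time(host, port, timeout=2.0):
    """Seconds taken to open a TCP connection, or None on failure."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.perf_counter() - start
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return None

def nearest(endpoints, samples=3):
    """Return (seconds, host, port) for the fastest endpoint, or None."""
    best = None
    for host, port in endpoints:
        times = [tcp_connect_time(host, port) for _ in range(samples)]
        times = [t for t in times if t is not None]
        if not times:
            continue  # unreachable, so never a candidate
        fastest = min(times)  # minimum is least affected by jitter
        if best is None or fastest < best[0]:
            best = (fastest, host, port)
    return best
```

It could be called as nearest([("se1.example.org", 2811), ("se2.example.org", 2811)]), where the host names are placeholders and 2811 is the usual GridFTP control port.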

Still, it's there, it works, and it does the job.

I'm going to do some more testing and stressing before I wrap up the 1.5 release, probably next week.

Oh, look - our cluster is quiet right now. Time to find out what breaks when I throw thousands of jobs at it...
