Monday 29 March 2010

CREAM Engine

As Stuart mentioned, I'm putting a CREAM engine together. A basic first draft has been checked in, and I'm in the process of writing a set of tests specifically for CREAM, rather than amending the original test set (so I can make sure I haven't broken gLite submission, as well as checking that CREAM submission works).

Thursday 18 March 2010

1.5.0 release, and further developments

That's the 1.5.0 release up on the gqsub page.

This handles data staging to and from Storage Elements. It's a little ... direct ... in a few cases, so if you are staging multiple files and they have multiple replicas, it isn't guaranteed to use the optimal replica. This is, however, a fairly minor issue - if you have to handle that much data then we're quite a bit beyond something that could run on a local cluster. Nevertheless, I will work on that part and tighten it up. As it stands, it's comparable with other user tools that weren't written for a particular VO.

In parallel with that, Morag is working on a second submission engine on the back end - this one for direct submission to CREAM. It's not very different from WMS submission, which makes it an excellent second target, and a solid step on the way to backend independence. In particular, direct submission to CREAM requires a GridFTP server, or something similar, in order to get the data back, so per-backend requirements need to be handled. She's already cleaned up some of the job handling code, so look for direct CREAM submission (no WMS needed) in around the 1.6 - 1.7 timeframe.

Now, on to more data handling code ...

Thursday 4 March 2010

Data transfers - now with workyness

As of the current SVN tree, data handling from SEs is supported. (It would have been yesterday, but there was a blip in the SVN servers.)

The general method of specification is:

#GQSUB -W stagein=jobName@sourceName

or

#GQSUB -W stageout=jobName@targetName

The syntax is derived from the PBS/Torque/SGE model for specifying stagein/out files, with a few extensions.

The jobName is the name the file should have at the time of execution on the worker node. The sourceName and targetName specify where to get the file from / put it to. If the jobName is omitted, then the file arrives with the same name as it had remotely.
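
For example (the host and file names here are invented), this fetches a remote file and presents it to the job as input.dat:

#GQSUB -W stagein=input.dat@myhost.example.org:run1/output-42.dat

Dropping the "input.dat@" part would leave the file on the worker node with its remote name, output-42.dat.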

The valid ways of requesting a file are:

simpleFileName

local/path/to/file

A file local to the submission machine

host:file

host:path/to/file

A remote file in SCP notation, which the submission machine can access.

gsiftp://host/path/file

GridFTP URL

srm://host/path/to/file

SRM Storage Element

lfn:/path/to/file

Logical File Name (via LFC)
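
Putting those together, stagein directives for each form would look something like this (the hosts, VO and paths are all made up):

#GQSUB -W stagein=params.txt@inputs/params.txt
#GQSUB -W stagein=data.dat@myhost.example.org:run1/data.dat
#GQSUB -W stagein=big.root@gsiftp://se.example.org/storage/dteam/big.root
#GQSUB -W stagein=big.root@srm://se.example.org/dpm/example.org/home/dteam/big.root
#GQSUB -W stagein=big.root@lfn:/grid/dteam/user/big.root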



For SRM and LFC modes, these are converted at submission time into GridFTP URLs. That's fine if the data only exists in one place, but I'll change that later so it automatically pulls from the nearest replica.
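
By way of illustration, the same lookup can be done by hand with the lcg-utils commands - lcg-lr to list the replicas behind an LFN, and lcg-gt to turn one of the resulting SURLs into a GridFTP TURL (the VO, host and paths below are invented):

# List the replicas registered in the LFC for this logical file name
lcg-lr --vo dteam lfn:/grid/dteam/user/big.root
#   srm://se.example.org/dpm/example.org/home/dteam/user/big.root
# Ask that SE for a gsiftp transfer URL for one of the replicas
lcg-gt srm://se.example.org/dpm/example.org/home/dteam/user/big.root gsiftp
#   gsiftp://se.example.org/storage/dteam/user/big.root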

Unfortunately, determining the topologically nearest SE is ... tricky. Most of the usual tools (ping times, packet pairing for bandwidth assessment, etc.) are unreliable, as ICMP is not guaranteed to get through. So I think I'll have to get down and dirty with TCP/IP and HTTPS, and use that to make the estimates. Fun ... in an ironic sort of way.
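
As a back-of-the-envelope sketch of the TCP side of that (the hosts are placeholders; 2811 is the standard GridFTP control port), just timing how long it takes to open a connection to each SE already gives a rough ordering:

# Time the TCP connection setup to each SE's GridFTP control port
for se in se1.example.org se2.example.org; do
    echo -n "$se: "
    { time bash -c "exec 3<>/dev/tcp/$se/2811; exec 3>&-"; } 2>&1 | grep real
done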

Still, it's there, it works, and it does the job.

I'm going to do some more testing and stress testing before I wrap up the 1.5 release - probably next week.

Oh, look - our cluster is quiet right now. Time to find out what breaks when I throw thousands of jobs at it...