Thursday, 25 February 2010

Buckets and sieves

Deep into data management at the moment - hence buckets of data.

One of the more annoying aspects of working with computers is the leaky abstraction; and that's where the sieve comes in.

A leaky abstraction is a particularly cruel thing - it appears to promise things that don't hold true. For example, if something acts like a list that you can add things to, you expect to be able to add things to it. If it didn't allow things that started with the word 'aardvark', most people wouldn't notice. The moment an entomologist gets involved, though, the abstraction breaks apart.

The particular problem I've been having is that when staging in files, there's an established syntax for naming files, which allows them to have different names on the worker node than on the source filesystem. So what should happen if you want several files with the same source filename, but distinct names on the worker node?

Well, at the moment, you get a right mess. Files are staged in to a local file with the same filename as the remote. Then, if needed, gqsub will rename them. If there are two files with the same remote name, then the last specified one clobbers the others. That's a problem that the gLite middleware has had for a while (mostly as it doesn't support renaming files at all), but it's a leak - it breaks the mental model we're trying to support here.

In addition, what happens if the user wants 2 copies of the same file, under different names, on the worker node? It's perhaps a rare case, but I can think of a few examples (mostly where the file is mutated by the work, and it's the differences that are important, not the absolute data). In this case, we have to consider where we do the duplication, and how. I plumped for doing it on the worker node - most filesystems will be able to do copy-on-write, and the network transfer is likely to be more expensive than the local copy.
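
To make that concrete, here is a minimal sketch of the behaviour I'm after. It is not gqsub's actual code: it assumes each stage-in request is a (source, worker name) pair, and that `fetch` is some hypothetical callable that transfers one remote file to a local path. Each distinct source is transferred once into a uniquely named temporary file, then copied locally to every name asked for, so same-named sources can't clobber each other and a duplicate only costs a local copy.

```python
import os
import shutil
import tempfile
from collections import OrderedDict


def plan_stage_in(requests):
    """Group (source, worker_name) pairs by source, keeping request order."""
    plan = OrderedDict()
    seen_names = set()
    for source, worker_name in requests:
        if worker_name in seen_names:
            # Two requests for the same worker-node name cannot both be honoured.
            raise ValueError("duplicate worker node name: %s" % worker_name)
        seen_names.add(worker_name)
        plan.setdefault(source, []).append(worker_name)
    return plan


def stage_in(requests, fetch, workdir):
    """Transfer each distinct source once, then copy it to every requested name.

    `fetch(source, local_path)` is a hypothetical transfer function,
    standing in for whatever actually moves the data.
    """
    for source, worker_names in plan_stage_in(requests).items():
        # A unique temporary name, so two sources that share a filename
        # cannot overwrite each other during transfer.
        fd, tmp_path = tempfile.mkstemp(dir=workdir, prefix="stagein-")
        os.close(fd)
        try:
            fetch(source, tmp_path)
            for name in worker_names:
                shutil.copyfile(tmp_path, os.path.join(workdir, name))
        finally:
            os.remove(tmp_path)
```

With this shape, asking for the same source twice under two names results in one transfer and two local copies.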

Anyway, this is mostly about the issues, as a way of thinking about the resolutions.

Ultimately, I'd like the abstractions presented to be as leak-free as possible, so that people can reason about what will happen without having to make reference to the underlying system.

Monday, 15 February 2010

Data, data everywhere ...

and not a drop to download.

(I know it doesn't scan right - work with me here...)

Currently working on the data management aspects; aiming to make data as easy to work with as compute. And as efficient as possible...

There's a certain amount of mismatch with the various data tools in gLite / LCG. It's as if one group went with round wheels and the other built roads of inverted catenaries. Both fine ideas, but not when used together...

Within the Job Description Language, we can specify hard requirements on data files that we want to be 'close' to [0]. However, we can't get the job wrapper to handle staging in those files for us - so the job has to do that itself.

The primary tools [1] for handling data are the lcg-utils, which are pretty good. They work nearly transparently with gsiftp (single file) URLs, srm (storage element) URLs, and lfn (logical file name) URIs - so as you move through the hierarchy, the tools are very similar. They're also auto load balancing, picking a random replica each time when the same data is in multiple locations.

Alas, these two don't play well - if we use the JDL DataRequirements, we'll be on a worker node 'close' to a Storage Element that has the data; but using lcg-utils naively, we'll pull the data from somewhere random. That's not good, given we already know we're close to one.
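
The refinement I have in mind is simple enough to sketch. This is only an illustration, assuming we already have the list of replica URLs for a file and the hostnames of the 'close' Storage Elements from somewhere; `pick_replica` and its arguments are made-up names, not part of lcg-utils or gqsub. It prefers a replica on a close Storage Element, and only falls back to a random one when none is close.

```python
import random
from urllib.parse import urlsplit


def pick_replica(replicas, close_hosts, rng=random):
    """Prefer a replica hosted on a 'close' Storage Element.

    Falls back to a random replica when no replica is on a close host.
    """
    if not replicas:
        raise ValueError("no replicas to choose from")
    # urlsplit() lower-cases hostnames, so normalise the close hosts to match.
    close = {host.lower() for host in close_hosts}
    local = [url for url in replicas if urlsplit(url).hostname in close]
    return rng.choice(local or replicas)
```

If several replicas are close, one of those is still picked at random, so the load balancing is kept among the sensible choices.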

So the plan is, in the short term, to do it that way, in all its simple-to-write, inefficient glory. Once I have a working infrastructure for data staging, then I'll refine it. There are interesting trade-offs in the load on the Logical File Catalogue, versus runtime on the worker node, against reliability. But the first stab will be one that 'works', in the sense of getting the job and the data in one place.


[0] Although if you specify a set of files that are not all present in a single Storage Element, then nowhere matches. Not the most helpful ...
[1] Read: most usable ones. You can do the same thing in more complicated ways, if you really want to...

Tuesday, 9 February 2010

... how many jobs‽

During the course of trying to track down a problem with one user's jobs, we ran gqstat.

Which started collecting status on a job. And then another. And more.

Then it kept going.

Turned out that there were around 1500 jobs (each with their own set of subjobs) which had status to collect. This took around 10 minutes to gather...

A little bit sub-optimal methinks. I've done some profiling since then, and it looks like the dominating factor in the time taken is the connection setup / tear down - no doubt due in large part to the SSL certificate overhead. gqstat collects each job individually, so we incur that hit for every job.

In a way, this is a good problem to have - it shows that the tool is sufficiently usable for an end user to submit thousands of jobs, and it validates the illusionary shared filesystem approach. Had that illusion not been created, then the monitoring tools would have to be used more.

Of course, just because that fits one person's workload doesn't mean that it's going to be great for everyone, so it's not the end of the road yet. Still, a nice milestone to hit.

Anyway, back to this performance issue - thankfully I had it report when it collects job information, so it was clear _what_ was happening. This does suggest an improvement, however - collect a group of jobs, and then display them, then collect, display etc.

The speed hit from the SSL setup/tear down can be mitigated if we can collect information about several jobs at once. gqstat keeps the status on disc, so I'd need to be able to separate the status data for each job after that, but I think this is straightforward. It'll involve creating a temporary file listing the JobIDs of several jobs, and then querying that. Of course, the ideal way would be to have a nice Python API for querying job information... Given that it's gLite 3.2 now, it might be worth giving the Python API another shot, when I get some free time (ha!).
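
Roughly, the batching idea looks like this. It's a sketch rather than what gqstat does today: `query_batch` is a hypothetical callable that takes the path of a file of JobIDs (one per line) and returns a dict mapping each JobID to its status, and the batch size of 50 is an arbitrary guess. Because it's a generator, the caller can display each group before the next one is collected.

```python
import os
import tempfile


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def collect_status(job_ids, query_batch, size=50):
    """Yield one {job_id: status} dict per batch of jobs.

    `query_batch(path)` is a hypothetical function: it is handed a file
    listing one JobID per line and returns a dict of JobID -> status.
    """
    for batch in batches(list(job_ids), size):
        # Write this batch's JobIDs to a temporary file for the query.
        with tempfile.NamedTemporaryFile(
            "w", suffix=".jobids", delete=False
        ) as handle:
            handle.write("\n".join(batch) + "\n")
        try:
            statuses = query_batch(handle.name)
        finally:
            os.remove(handle.name)
        # Split the combined answer back out per job, in request order.
        yield {job_id: statuses.get(job_id, "unknown") for job_id in batch}
```

With 1500 jobs and batches of 50, that would be 30 connection setups instead of 1500, assuming the query side really can answer for a whole file of JobIDs in one connection.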

So, in summary, there are performance issues with gqstat at around 1000 jobs in flight, and I have a plan for how to deal with that. As to _when_ ... I think that'll have to wait till I get the data staging sorted out, putting it in the 1.6 timeframe.

Friday, 5 February 2010

1.4.1 release, and MyProxy

Small update over yesterday's 1.4.0, fixing problems with existing default.jdl - mostly as a way of retiring that feature. This had been used to enable use of MyProxy proxy renewals.

However, the 1.4 separation work was with the express purpose of supporting MyProxy natively. So, if you are in the situation where you want long running jobs, here's how to enable that:

credentials = MyProxy


in your gqsubrc file.

And then forget about it. When the credentials need refreshing, you will be prompted for the passphrase for the certificate. Let the computer work out what's needed, and get on with the research ... Hrm, that would make a good motto for gqsub.

This does assume that the user interface machine is set up for MyProxy appropriately.

The other alternative for credentials is 'PlainVoms', which means to use an ordinary proxy certificate.

At some point, I'll be extending this to allow for the voms-proxy-from-proxy mechanism used in some UIs - that will make it simpler to use on non-institutional UI machines.

Thursday, 4 February 2010

Beginning in the middle

After all, beginning at the start is such a cliché...

So this is a blog primarily to talk about gqsub, a hunk of Python code that exists to provide a better interface to Grid job submission and management. I'll be talking about the development process, and showing some of the ways in which it can be used.

gqsub has been around for a while now, currently on the 1.4.0 release. I'll put a post up shortly talking about the latest changes in 1.4.0.

Previously I've stuck a couple of posts on Scotgrid on Fire! about it, but it's really a bit separate from the Scotgrid work - and I don't want to overly clutter that blog with the minutiae of gqsub development.