1. Introduction of SRM

  • Concepts of SRM
    Storage Resource Managers (SRMs) are middleware components whose function is to provide dynamic space allocation and file management on shared storage components on the grid. They complement Compute Resource Managers in providing storage reservation and dynamic information on storage availability for data movement, and for planning and execution of Grid jobs. [1]

    SRMs are designed to allocate and reuse space dynamically. Space can be allocated dynamically to a client, and the decision of which files to keep in the storage space is controlled dynamically by the SRM. [1]

    SRMs are designed to provide effective sharing of files, by monitoring the activity of shared files, and making dynamic decisions on which files to replace when space is needed. In addition, SRMs perform automatic garbage collection of unused files by removing files whose lifetime has expired when space is needed. [2]

    SRMs do not perform file movement operations themselves; rather, they interact with operating systems and mass storage systems (MSSs) to perform file archiving and file staging, and invoke middleware components (such as GridFTP) to perform file transfer operations. [1]

    LBNL SRM is an SRM implementation developed by the Scientific Data Management (SDM) Group at LBNL. There are several types of LBNL SRM: Disk Resource Managers (DRMs), Tape Resource Managers (TRMs), and Hierarchical Resource Managers (HRMs).

 
  • Types of LBNL SRM
Disk Resource Managers (DRM)
A Disk Resource Manager (DRM) dynamically manages a single shared disk cache. This disk cache can be a single disk, a collection of disks, or a RAID system. The disk cache is available to the client through the operating system, which provides a file system view of the disk cache with the usual capabilities to create and delete directories and files, and to open, read, write, and close files. However, space is not pre-allocated to clients. Rather, the amount of space allocated to each client is managed dynamically by the DRM. The function of a DRM is to manage the disk cache using a client resource management policy that can be set by the administrator of the disk cache. The policy may restrict the number of simultaneous requests by each client, or may give preferential access to clients based on their assigned priority. In addition, a DRM may perform operations to get files from other SRMs on the grid. Sharing a DRM among multiple clients provides the added advantage of file sharing among the clients and repeated use of files. This is especially useful for scientific communities that are likely to have overlapping file access patterns. One can use cache management policies that minimize repeated file transfers to the disk cache from remote grid sites. The cache management policies can be based on use history or anticipated requests. [1]
Tape Resource Manager (TRM)
A Tape Resource Manager (TRM) is a middleware layer that interfaces to systems that manage robotic tapes. The tapes are accessible to a client through fairly sophisticated Mass Storage Systems (MSSs) such as HPSS, Unitree, Enstore, etc. Such systems usually have a disk cache that is used to stage files temporarily before transferring them to clients. MSSs typically provide a client with a file system view and a directory structure, but do not allow dynamic open, read, write, and close of files. Instead they provide some way to transfer files to the client's space, using transfer protocols such as FTP, and various variants of FTP (e.g. Parallel FTP, called PFTP, in HPSS). The TRM's function is to accept requests for file transfers from clients, queue such requests in case the MSS is busy or temporarily down, and apply a policy on the use of the MSS resources. As in the case of a DRM, the policy may restrict the number of simultaneous transfer requests by each client, or may give preferential access to clients based on their assigned priority. [1]
Hierarchical Resource Manager (HRM)
A Hierarchical Resource Manager (HRM) is a TRM that has a staging disk cache for its use. Thus, it can be viewed as a combination of a DRM and a TRM. It can use the disk cache for pre-staging files for clients, and for sharing files between clients. This functionality can be very useful in a data grid, since a request from a client may be for multiple files. Even if the client can only process one file at a time, the HRM can use its cache to pre-stage the next files. Furthermore, the transfer of large files on a shared wide area network may be sufficiently slow that, while one file is being transferred, another can be staged from tape. Because robotic tape systems are mechanical in nature, they have a latency of mounting a tape and seeking to the location of a file. Pre-staging can help mask this latency. Another advantage of using a staging disk in an HRM is that it can be used for file sharing. Given that multiple clients can make requests for multiple files to an HRM, the HRM can choose to leave a file in cache longer so that it can be shared with other clients, based on use history or anticipated requests. The goal is to minimize staging files from the robotic tape system. [1]
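The cache management described above (keeping files that are in use, and replacing others based on use history) can be illustrated with a small sketch. This is not the LBNL implementation; the class name DiskCache, its methods, and the choice of a least-recently-used policy are assumptions made only for illustration.

```python
from collections import OrderedDict


class DiskCache:
    """Toy disk cache: least-recently-used eviction that never evicts pinned files."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.files = OrderedDict()  # name -> size, least recently used first
        self.pins = {}              # name -> number of active pins

    def pin(self, name):
        """Mark a cached file as in use and refresh its position in the LRU order."""
        self.files.move_to_end(name)
        self.pins[name] = self.pins.get(name, 0) + 1

    def release(self, name):
        """Drop one pin; the file becomes evictable once no pins remain."""
        self.pins[name] -= 1
        if self.pins[name] == 0:
            del self.pins[name]

    def add(self, name, size):
        """Insert a file, evicting unpinned files in LRU order if space is short."""
        if name in self.files:
            self.files.move_to_end(name)  # already cached: just refresh it
            return
        if size > self.capacity:
            raise ValueError("file is larger than the whole cache")
        for victim in list(self.files):
            if self.used + size <= self.capacity:
                break
            if victim not in self.pins:
                self.used -= self.files.pop(victim)
        if self.used + size > self.capacity:
            raise RuntimeError("cache full: all remaining files are pinned")
        self.files[name] = size
        self.used += size
```

A real SRM would also take file lifetimes and anticipated requests into account; the sketch only shows how pinning and a use-history policy interact.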
 
  • The "Read" and "Write" functionality of SRMs
"Read" functionality
When a request to read a file is made to an SRM, the SRM may already have the file in its cache. In this case it pins the file and returns the location of the file in its cache. The client can then read the file directly from the disk cache (if it has access permission), or can copy or transfer the file into its local disk. In either case, the SRM will be expected to pin the file in cache for the client for a period of time. A well-behaved client will be expected to "release" the file when it is done with it. This case applies to both DRMs and HRMs. [1]
If the file is not in the disk cache, the SRM will be expected to get the file from its source location. For a DRM this means getting the file from some remote location. For an HRM, this means getting the file from the MSS. This capability simplifies the tasks that the client has to perform. Rather than return to the client with "file not found", the SRM provides the service of getting the file from its source location. Since getting a file from a remote location or a tape system may take a relatively long time, it should be possible for the client to make a non-blocking request. To accommodate this possibility, the SRMs provide a callback function that notifies the client when a requested file arrives in the disk cache and reports the location of that file. If the client cannot be called back, SRMs also provide a "status" function call that the client can use to find out when the file arrives. The status function can return estimates of the file arrival time if the file has not arrived yet. [1]
HRMs can also maintain a queue for scheduling the file staging from tape to disk by the MSS. This is especially needed if the MSS is temporarily busy. When a request to stage a file is made, the HRM checks its queue. If the HRM's queue is empty, it schedules its staging immediately. The HRM can take advantage of its queue to stage files in an order optimized for the MSS. In particular, it can schedule the order of file staging according to the tape ID to minimize tape mounts and dismounts. Like a DRM, the HRM needs to notify the client that the file was staged by issuing a callback, or the client can find that out by using "status". [1]
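The idea of ordering the staging queue by tape ID can be sketched as follows. The function names and the (file, tape_id) tuple layout are hypothetical, and the mount count assumes a single tape drive that remounts whenever the tape changes; the actual HRM scheduler may work differently.

```python
from itertools import groupby


def order_by_tape(requests):
    """Reorder (file, tape_id) staging requests so files on the same tape are adjacent.

    Tapes are visited in the order they first appear in the queue, and files
    on one tape keep their original relative order (sorted() is stable).
    """
    first_seen = {}
    for position, (_, tape_id) in enumerate(requests):
        first_seen.setdefault(tape_id, position)
    return sorted(requests, key=lambda req: first_seen[req[1]])


def count_mounts(requests):
    """Count tape mounts: one mount per run of consecutive requests on the same tape."""
    return sum(1 for _ in groupby(requests, key=lambda req: req[1]))
```

For a queue that alternates between two tapes, reordering reduces the number of mounts from one per file to one per tape.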
"Write" functionality
A request to "write" a file requires a different functionality. In the case of a DRM, if the file size is provided, then that space is allocated, and the client can write the file to it. If the file size is not provided, a large default size is assumed, and the available space is adjusted after the file is written. In the case of an HRM, the file is first written to its disk cache in exactly the same way as in the DRM description above. The HRM then notifies the client, using a callback, that the file has arrived on its disk, and schedules the file to be archived to tape by the MSS. After the file is archived by the MSS, the HRM notifies the client again using a callback. Thus, the HRM's disk cache serves as a temporary buffer for files being written to tape. The advantage of this functionality is that writing a file to a remote MSS can be performed in two stages: first transferring the file to the HRM's disk cache as fast as the network permits, and then archiving the file to tape as a background task. In this way the HRM relieves the client of the burden of dealing with a busy MSS, as well as with temporary failures of the MSS. [1]
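The space accounting for a "write" request (reserve the declared size, or a large default, then adjust once the file is written) might look like the sketch below. The class WriteSpace and its methods are hypothetical, and the 2 GB default is an assumption borrowed from the "-b" default of the HRM-client commands, not a documented server setting.

```python
class WriteSpace:
    """Toy space accounting for writes into a disk cache (all sizes in bytes)."""

    DEFAULT_RESERVATION = 2 * 1024**3  # assumed default when no size is declared

    def __init__(self, capacity):
        self.free = capacity
        self.reserved = {}  # file name -> bytes currently reserved

    def reserve(self, name, size=None):
        """Reserve the declared size, or the large default when no size is given."""
        amount = self.DEFAULT_RESERVATION if size is None else size
        if amount > self.free:
            raise RuntimeError("not enough free space")
        self.free -= amount
        self.reserved[name] = amount
        return amount

    def finish(self, name, actual_size):
        """After the write completes, return the unused part of the reservation."""
        surplus = self.reserved[name] - actual_size
        if surplus < 0:
            raise ValueError("file exceeded its reservation")
        self.free += surplus
        self.reserved[name] = actual_size
```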

 

  • Third party transfer
SRMs can also be used to coordinate a third party file movement. Essentially, an SRM at site Y can be asked to "pull" a file from site X. This request can be made by a client in a third location. The SRMs at the two sites X and Y then coordinate space allocation, file pinning, and file release. The actual transfer of the file is a regular two-way file transfer from X to Y. This functionality is useful for clients that produce files, store them temporarily in some location X, and then request their movement to an archive at site Y. The inverse functionality can also be provided, where the SRM at site X is asked to "push" the file to site Y. [1]

 

  • Benefits of using SRMs
o SRMs can insulate clients from storage system failures.
This is an important capability that is especially useful for HRMs since MSSs are complex systems that fail from time to time, and may become temporarily unavailable. For long lasting jobs accessing many files, which are typical of scientific applications, it is prohibitive to abort and restart a job. Typically, the burden of dealing with an MSS's temporary failure falls on the client. Instead, an HRM can insulate clients from such failures, by monitoring the transfer to the HRM's disk, and if a failure occurs, the HRM can wait for the MSS to recover, and re-stage the file. All that the client perceives is a slower response. [1]
o SRMs can transparently deal with network failures.
SRMs can monitor file transfers, and if failures occur, re-try the request. They can provide clients the information of such failures, so that clients can find other alternatives, such as getting the file from its original archive if a transfer from a replication site failed. [1]
o SRMs can provide a "streaming model" to the client.
SRMs provide a stream of files to the client programs, rather than all the files at once. This is especially important for large computing tasks, such as processing hundreds or even thousands of files. Typically, the client does not have the space for hundreds of files to be brought in at once. When such a request is made to an SRM, the SRM can provide the client with a few files at a time, streaming files in as earlier ones are used and released. This is managed by the SRM enforcing a quota per client, expressed as the amount of space allocated, the number of files allocated, or both. As soon as files are used by the client and released, the SRM brings in the next files for processing in a streaming fashion. [1]
The advantage to this "streaming model" is that clients can set up a long running task, and have the SRM manage the streaming of files, the pre-staging of files, the dynamic allocation of space, and the transferring of files in the most efficient way possible. [1]
o Enhance efficiency of the grid, eliminating unnecessary file transfers by sharing files.
It is typical of scientific investigations that multiple clients at the same site use overlapping sets of files. This presents an opportunity for the SRM at that site to keep the most popular files in its disk cache longer, and to provide clients first with files that are already in the disk cache. Managing this behavior is referred to as a "replacement policy", that is, deciding dynamically which file to replace when space is needed. Deploying efficient replacement policies in the SRMs can lead to significant reductions in repeated file transfers over the grid. [1]
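The "streaming model" listed among the benefits above can be sketched as a loop that never holds more than a per-client quota of files. The function stream_files and its stage, process, and release callbacks are hypothetical stand-ins for SRM and client operations; the quota here counts files only, not space.

```python
from collections import deque


def stream_files(requested, max_files, stage, process, release):
    """Feed files to the client a few at a time, never holding more than max_files."""
    if max_files < 1:
        raise ValueError("max_files must be at least 1")
    pending = deque(requested)
    staged = deque()
    while pending or staged:
        # Top up the staged set until the per-client quota is reached.
        while pending and len(staged) < max_files:
            name = pending.popleft()
            stage(name)
            staged.append(name)
        # The client uses the oldest staged file and then releases it,
        # which frees a quota slot for the next pending file.
        current = staged.popleft()
        process(current)
        release(current)
```

With a quota of two files and a request for thousands, the client processes every file in order while at most two are staged at any moment.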

 

  • Several SRM web sites
Berkeley Storage Resource Management (SRM) Middleware Project: http://sdm.lbl.gov/projectindividual.php?ProjectID=SRM
Berkeley SRM Software Distribution page: http://sdm.lbl.gov/srm-dist/
Storage Resource Management Working Group: http://sdm.lbl.gov/srm-wg/
SRM Publications and Documents

2. HRM server at BNL

An HRM server has been deployed at RCF/ACF of BNL. From Oct. 13, 2004, the server is also open to outside users. Clients can get data from or put data to BNL HPSS via our HRM server. The benefits of using HRM to interact with HPSS have been stated in Introduction of SRM.
The hostname of our server is srm.bnl.gov. Several components of HRM are installed on the server, including Naming Service, DRM (Disk Resource Manager), TRM (Tape Resource Manager), HRM-WSG (HRM Web service gateway), and HRM-GWS (HRM-Gateway to Web Services).
There are several parameters of the server (several of them may be used in HRM-client commands):
Disk cache size: 350GB
Hostname of Naming Service: srm.bnl.gov
Port number of Naming Service: 6181
HRMObjectName: HRMServerBNL
MSS hostname: hpss.rcf.bnl.gov
Several important notes for users:
    (1) For security reasons, only GSI authentication is allowed, which means that users need a valid grid proxy certificate to access our HRM server. After the client authenticates, the HRM server first invokes GSI-enabled HSI to get information about the file in HPSS, such as its size or tape ID, which is used for optimization. HRM then invokes GSI-enabled PFTP to move data into or out of HPSS.

    (2) From Oct. 13, 2004, the server is also open to outside users.

    (3) Your DN needs to be in the gridmap file of our HRM server. We will import all permitted users' DNs later. Until then, please email us your DN.

    (4) You need to have an HPSS account at RCF/ACF.

    (5) Currently, only one HRM server is open for public use. We may have multiple HRM servers in the future.

    (6) The HRM-WSG (HRM web service gateway) service is now open to clients. The HRM-WSG service lets HRM talk to other kinds of SRM implementations, e.g., dCache SRM.

    (7) There are two ways to access HRM. One is through the CORBA interface, e.g., the CORBA interface API or the HRM-client tool. The other is through the web service interface, e.g., by programming a web service client to invoke HRM services, or by using HRM-WSG Client, a simple and limited client tool provided by the LBNL SRM group. For those who plan to program a web service client to access the HRM web service, please refer to http://sdm.lbl.gov/srm-wg/documents.html for the SRM Web Service Interface Specification.

3. Installation of HRM-client package

To access our HRM service, clients need to install the HRM-client package, which has been developed by the Scientific Data Management Group at LBNL.
Installation Steps:
(1) Please read LBNL SRM license from LBNL SRM software distribution site: http://sdm.lbl.gov/srm-dist/DRM-LICENSE

(2) If you access the HRM server from RCF/ACF nodes at BNL, please skip Steps (3) and (4) and go directly to Step (5). Otherwise, please go to Step (3).

Note: The HRM-client package has been deployed to all RCF/ACF nodes, so clients on RCF/ACF nodes do not need to install it and can use HRM-client directly. The HRM-client commands are located in /user/local/bin/.

(3) Download the LBNL HRM-client binary package V1.2.1 for your platform:
Static binary for Redhat 7.3 - 3 Sep 2004:
Static binary for Redhat 8.0 - 3 Sep 2004:
Static binary for Redhat 9: New version 1.2.1 binary for RH9 is not ready yet.
Static binary for Redhat Enterprise WS - 3 Sep 2004:
http://www.atlasgrid.bnl.gov/srm/hrm-client-pkg/drm-1.2.1-bin.rhws.tar.gz
(4) Extract the installation package.
On RH73 platform,
Extract drm-1.2.1-bin.rh73.tar.gz by using commands:
gzip -d drm-1.2.1-bin.rh73.tar.gz
tar -xvf drm-1.2.1-bin.rh73.tar
Create a new directory, e.g., ./drm-rh73/, and extract the HRM-client package to this directory. In the directory, you can see five client commands: srm-copy.linux.7.3, srm-ping.linux.7.3, srm-ls.linux.7.3, srm-get.linux.7.3, srm-status.linux.7.3.
For convenience, please make symbolic links to these client commands:
ln -s srm-copy.linux.73 srm-copy
ln -s srm-get.linux.73 srm-get
ln -s srm-ls.linux.73 srm-ls
ln -s srm-ping.linux.73 srm-ping
ln -s srm-status.linux.73 srm-status
 
On RH8 platform,
Extract drm-1.2.1-bin.rh8.tar.gz by using commands:
gzip -d drm-1.2.1-bin.rh8.tar.gz
tar -xvf drm-1.2.1-bin.rh8.tar
Suppose the HRM-client package is extracted to some directory, e.g., ./drm-rh8/. In this directory, you can see five client commands: srm-copy.linux.8, srm-ping.linux.8, srm-ls.linux.8, srm-get.linux.8, srm-status.linux.8.
For convenience, please make symbolic links to these client commands:
ln -s srm-copy.linux.8 srm-copy
ln -s srm-get.linux.8 srm-get
ln -s srm-ls.linux.8 srm-ls
ln -s srm-ping.linux.8 srm-ping
ln -s srm-status.linux.8 srm-status
On Redhat Enterprise WS,
Extract drm-1.2.1-bin.rhws.tar.gz by using commands:
gzip -d drm-1.2.1-bin.rhws.tar.gz
tar -xvf drm-1.2.1-bin.rhws.tar
Suppose the HRM-client package is extracted to some directory, e.g., ./drm-rhws/. In this directory, you can see five client commands: srm-copy.linux.ws, srm-ping.linux.ws, srm-ls.linux.ws, srm-get.linux.ws, srm-status.linux.ws.
For convenience, please make symbolic links to these client commands:
ln -s srm-copy.linux.ws srm-copy
ln -s srm-get.linux.ws srm-get
ln -s srm-ls.linux.ws srm-ls
ln -s srm-ping.linux.ws srm-ping
ln -s srm-status.linux.ws srm-status
(5) Download the configuration file bnl_hrm_client.rc from http://www.atlasgrid.bnl.gov/srm/hrm-client-pkg/bnl_hrm_client.rc (This file was last updated on Oct. 13, 2004. If your rc file is older than that, please update it now.)
Note: bnl_hrm_client.rc is an HRM-client configuration file customized for our HRM server.
Copy it to ./drm-rh73/ on RH73, ./drm-rh8/ on RH8, or ./drm-rhws/ on RHWS.
For convenience, please make a symbolic link to the rc file named hrm.rc, or copy it as hrm.rc. "hrm.rc" is the default configuration file name used by HRM-client commands.
ln -s bnl_hrm_client.rc hrm.rc
or: cp bnl_hrm_client.rc hrm.rc
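The symbolic-link commands in Step (4) can also be scripted. The following is a minimal sketch; the function link_commands is hypothetical, and the suffix you pass (for example "linux.8" or "linux.ws") must match the file names actually present in your extracted directory.

```python
from pathlib import Path

COMMANDS = ["srm-copy", "srm-get", "srm-ls", "srm-ping", "srm-status"]


def link_commands(install_dir, suffix):
    """Create short-name symlinks such as srm-copy -> srm-copy.<suffix>.

    Returns the names that were linked. A command is skipped when its
    binary is missing or when the short name already exists.
    """
    install_dir = Path(install_dir)
    linked = []
    for name in COMMANDS:
        target = install_dir / f"{name}.{suffix}"
        link = install_dir / name
        if not target.exists() or link.exists() or link.is_symlink():
            continue
        # Relative link, equivalent to running `ln -s` inside the directory.
        link.symlink_to(target.name)
        linked.append(name)
    return linked
```

For example, link_commands("./drm-rh8", "linux.8") would create the same five links as the RH8 commands listed in Step (4).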


4. Installation of HRM-WSG Client package

The HRM-WSG Client tool is a simple and limited client tool for accessing HRM through the web service interface. Note: another way to access HRM through the web service interface is to program a web service client that invokes the HRM web services. Please refer to http://sdm.lbl.gov/srm-wg/documents.html for the SRM Web Service Interface Specification.

Please go to http://sdm.lbl.gov/srm-dist/hrm-ws-bin-1.2.1.tar.gz to download the Java binary package of HRM-WSG (which includes HRM-WSG Client).

Note: the client machine needs J2SDK1.4.X installed to run HRM-WSG Client. In our experience, J2SDK1.4.2_05 does NOT work well, whereas J2SDK1.4.2_04 is OK. You may get J2SDK1.4.2_04 from http://java.sun.com/products/archive/j2se/1.4.2_04/index.html

5. Several notes for running HRM-client or HRM-WSG Client commands

  • Initiate a valid grid proxy before running HRM-client commands.
    Your DN needs to be in the gridmap file on the host where HRM is running. For now, we need to add your DN manually. We will import all permitted users' DNs later. Until then, please email us your DN and we will add it manually.
    Since we only allow GSI authentication for access to the HRM server, please make sure that you have a valid grid proxy certificate before you run any HRM-client command. Our HRM server only recognizes DOE grid certificates.
    If you don't have a DOE personal certificate, please see how to request a certificate from the DOEGrids CA at this site: http://www.doegrids.org/pages/cert-request.html .
    If you have one, please use the command "grid-proxy-init" to initiate or renew a proxy certificate.
  • Please read generic help from LBNL about HRM-client commands before you start.

    (1) Generic DataMover (HRM-client) user guide from LBNL: (HTML) (PDF)
    You will see instructions on five HRM-client commands srm-copy, srm-ping, srm-ls, srm-get, srm-status from this manual.
    Note: the most important client command is srm-copy. We suggest that you at least read the help on the srm-copy command before you try the HRM-client commands.
    (2) Generic DRM user guide from LBNL: (HTML) (PDF)
    Please read Chapter One and the parts before it, which contain samples of HRM-client commands. Please also read Chapter 3, "Client programs to HRM-WSG".
  • You need to supply a resource configuration file to HRM-client commands.  You can use bnl_hrm_client.rc provided by us. The HRM-client commands srm-* will look for the configuration file "hrm.rc" in the running directory automatically unless you specify your configuration file via "-conf" option.
  • From Oct. 13, 2004, BNL HRM server is also open to outside users
  • The HRM-client package has been deployed to all RCF/ACF nodes. So clients from RCF/ACF nodes don't need to install the package and can use HRM-client directly. The HRM-client commands are located in /user/local/bin/ on RCF/ACF nodes.
  • The clients can just use web browsers to check the status of their data transfer jobs which have been submitted through HRM-client commands.
    File Monitoring Tool (FMT) has been set up on our HRM server. FMT is a tool to track the progress of transfers. After clients submit their data transfer jobs to HRM server, they can just use web browsers to check the status of the transfer jobs submitted by HRM-clients.
    Steps for clients to use FMT:
    * Start the HRM client programs on a client machine. For example, the client submits some data transfer jobs by using srm-copy.
    * Go to http://srm.bnl.gov:6178/fmt_applet.html by using your web browser.
    * A login window will appear in a moment and ask for your userID. The format of the userID is login@hostname, where login is the login ID used to submit the HRM-client job and hostname is the host name of the client machine, e.g., zhliu@client.usatlas.bnl.gov.
    * Once logged in, all the requests for the user will appear in the left-side list. Click on the request list to see the monitor.
  • Now you may go to the next sections "Sample scenarios to transfer data via BNL HRM by using HRM-client tool" and "Sample scenarios to access BNL HRM through Web Service Interface by HRM-WSG Client", which will give you a quick and simple start to try our HRM service.
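The login@hostname userID that FMT asks for can be assembled as in the sketch below. The helper fmt_user_id is hypothetical, and the fallback values are assumptions: the locally reported login and fully qualified host name may not match what the HRM server recorded for your job, in which case pass both values explicitly.

```python
import getpass
import socket


def fmt_user_id(login=None, hostname=None):
    """Build the login@hostname string requested by the FMT login window."""
    login = login or getpass.getuser()        # assumed to match the submitting login
    hostname = hostname or socket.getfqdn()   # assumed to match the client host name
    return f"{login}@{hostname}"
```

For the example in the steps above, fmt_user_id("zhliu", "client.usatlas.bnl.gov") gives zhliu@client.usatlas.bnl.gov.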

 

 

6. Sample scenarios to transfer data via BNL HRM by using HRM-client tool

 

Note: Please initiate a valid grid proxy before running HRM-client commands.

  • srm-ping
The first client command you may want to try is srm-ping. It can tell you whether DRM and TRM are alive and running well.
Please use the following command:
srm-ping
or: srm-ping -conf bnl_hrm_client.rc
Note: the first command "srm-ping" uses the default rc file "hrm.rc" as configuration file. The second command uses the specified rc file "bnl_hrm_client.rc" as configuration file.
If the output resembles the following, then our HRM server is running well.
Client CONFIGURATION
HRM status:
<DRM> => alive
<TRM> =>HPSS_OKAY: 0.000 MB/sec
Wed Aug 18 13:48:56 2004
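A script could check this output automatically. The sketch below relies only on the sample output shown above; the function hrm_is_healthy is hypothetical, and the real srm-ping output may contain other lines or states that it does not handle.

```python
import re


def hrm_is_healthy(ping_output):
    """Return True when the text reports DRM as alive and TRM as HPSS_OKAY."""
    status = {}
    for line in ping_output.splitlines():
        # Match lines of the form "<NAME> => value" (spaces around => are optional).
        match = re.match(r"\s*<(\w+)>\s*=>\s*(.*)", line)
        if match:
            status[match.group(1)] = match.group(2).strip()
    return status.get("DRM") == "alive" and status.get("TRM", "").startswith("HPSS_OKAY")
```

The text passed in would typically be the captured standard output of srm-ping.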
 
  • srm-ls
    Sample command:
    srm-ls -debug -s "srm://hpss.rcf.bnl.gov/home/zhliu" -at GSI
    This command lists the content of directory /home/zhliu/ in BNL HPSS.
    Note: here,
    "-debug" enables the printing of debugging messages; it is optional.
    "-s $source_url" specifies the source URL. Here, $source_url is "srm://hpss.rcf.bnl.gov/home/zhliu". "hpss.rcf.bnl.gov" is our MSS hostname, so the source is the directory "/home/zhliu" in BNL HPSS.
    "-at" specifies the login type for the source (BNL HPSS system). The default is GSI, so here we can omit "-at GSI".
    If you want to view the content recursively, use the "-r" option.
    srm-ls -debug -s "srm://hpss.rcf.bnl.gov/home/zhliu" -r -at GSI

 

  • srm-get
    Note: Suppose the following srm-get commands are submitted from client.usatlas.bnl.gov and the HRM server is running on srm.bnl.gov. The HRM server srm.bnl.gov is an HRM "local" to the client client.usatlas.bnl.gov.
To bring a file from BNL HPSS to local disk (where the client is):
Sample command:
srm-get -debug -s "srm://hpss.rcf.bnl.gov/home/zhliu/test.dat" -t "file:/home/zhliu/test.dat" -b 11 -at GSI
This command brings an MSS file from HPSS to the local disk where the client is (client.usatlas.bnl.gov).

Note: the caller is responsible for making sure the target path exists on the local client. srm-get will NOT create directories on the client.
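Since srm-get does not create directories, a client script can prepare the target directory before calling it. The following is a minimal sketch; the helper names are hypothetical, and it assumes the target is a plain file: URL with an absolute path and no percent-encoded characters.

```python
import os
from urllib.parse import urlsplit


def local_path(target_url):
    """Return the filesystem path named by a file: URL such as file:/home/zhliu/test.dat."""
    parts = urlsplit(target_url)
    if parts.scheme != "file":
        raise ValueError("expected a file: URL")
    return parts.path


def ensure_target_dir(target_url):
    """Create the parent directory of the target file if it does not exist yet."""
    parent = os.path.dirname(local_path(target_url))
    os.makedirs(parent, exist_ok=True)
    return parent
```

Calling ensure_target_dir("file:/home/zhliu/test.dat") before the sample command would create /home/zhliu if it were missing.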

Note: here,
"-debug" enables the printing of debugging messages; it is optional.
"-s $source_url" specifies the source URL for the request. Here, $source_url is "srm://hpss.rcf.bnl.gov/home/zhliu/test.dat". "hpss.rcf.bnl.gov" is our HPSS hostname, so the source is the file "test.dat" under the directory /home/zhliu/ in the BNL HPSS system.
"-t $target_url" specifies the target URL for the request. Here, $target_url is "file:/home/zhliu/test.dat". So the target file is the file "test.dat" under the directory /home/zhliu/ on the local client.
"-b $size" specifies the file size of the source file. In this example, it's 11 bytes. If the size is not specified, the default value "2GB" will be used.
"-at" specifies the login type for the source (BNL HPSS). The default is GSI, so here we can omit "-at GSI".
  • srm-copy
    Note: Suppose the following srm-copy commands are submitted from client.usatlas.bnl.gov and the HRM server is running on srm.bnl.gov. The HRM server srm.bnl.gov is an HRM "local" to the client client.usatlas.bnl.gov.
To bring a file from a remote location to BNL HPSS (through gsiftp):
Sample command:
srm-copy -debug -s "gsiftp://stargrid03.rcf.bnl.gov/direct/u0b/zhliu/test_star.dat" -t "srm://hpss.rcf.bnl.gov/home/zhliu/test.dat" -b 11 -at GSI -et GSI

This command brings a data file from a remote location through gsiftp and puts it into BNL HPSS via our HRM.

Note: the GridFTP service needs to be running at the remote location, i.e., on stargrid03.rcf.bnl.gov in this sample command.

Note: here,
"-debug" enables the printing of debugging messages; it is optional.
"-s $source_url" specifies the source URL for the request. Here, $source_url is "gsiftp://stargrid03.rcf.bnl.gov/direct/u0b/zhliu/test_star.dat". So the source is the file "test_star.dat" under the directory /direct/u0b/zhliu/ on the node "stargrid03.rcf.bnl.gov".
"-t $target_url" specifies the target URL for the request. Here, $target_url is "srm://hpss.rcf.bnl.gov/home/zhliu/test.dat". "hpss.rcf.bnl.gov" is our HPSS hostname. So the target is the file "test.dat" under the directory /home/zhliu/ in the BNL HPSS system.
"-b $size" specifies the file size of the source file. In this example, it's 11 bytes. If the size is not specified, the default value "2GB" will be used.
"-at" specifies the login type for the source (stargrid03.rcf.bnl.gov). The default is GSI, so here we can omit "-at GSI".
"-et" specifies the login type for the destination (BNL HPSS system). The default is GSI, so here we can omit "-et GSI".

Note: if there is already a test.dat in the target directory of HPSS, the existing file will not be overwritten by this command.
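When srm-copy is driven from a script, the options explained above can be assembled programmatically. This sketch uses only the flags shown in the sample command ("-debug", "-s", "-t", "-b"); the helper srm_copy_args is hypothetical, and "-at"/"-et" are left out because GSI is described as the default.

```python
def srm_copy_args(source_url, target_url, size_bytes=None, debug=False):
    """Assemble an srm-copy argument list from the options described above."""
    args = ["srm-copy"]
    if debug:
        args.append("-debug")
    args += ["-s", source_url, "-t", target_url]
    if size_bytes is not None:
        # Without -b, the client falls back to its default size (2GB per the text above).
        args += ["-b", str(size_bytes)]
    return args
```

The resulting list can be handed to subprocess.run(args, check=True) on a machine where srm-copy is installed and a valid grid proxy exists.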

 
To bring a file from BNL HPSS to local HRM:
Sample command: