Perhaps this will get us on the same page.
At this time, we have three distinct user data stores: (1) NFS systems attached to the LSST development hardware, (2) Nebula’s Cinder services, and (3) storage local to VMware instances. We also have some storage on the GPFS Condo and Nearline but, as far as I can tell, we are using these for administrative purposes (a secondary store for certain data, e.g. data challenges). In the future, we plan to add Swift storage as well.
From what I can tell, developers seem to be somewhat content (or perhaps not vocally discontent) using NFS storage for their development. Some developers may be using VMware instances, some may be moving to OpenStack; the choice is theirs. In either VM scenario, the user must replicate portions of the data sets from NFS (or from an external source if the data is new) to the VMs.
@nidever is about to start (using this timing phrase loosely, most likely meaning at some point in 2016) larger integration testing. This will necessitate a larger compute and storage infrastructure than we have now. We are planning to deploy this new infrastructure in NPCF (the current development systems and OpenStack are in the NCSA lab). This is the planned location of the verification data sets, though it need not be.
@frossie and her group are making the largest strides with cloud technologies. This is really a specific case of the VM usage mentioned above. Portions of the data sets must be copied as needed.
There is also a copy of Pan-STARRS within a new Qserv environment for SUI integration, but that is not necessary for this discussion.
As a side note, I would be very interested in discussing whether object stores could replace GPFS for LSST use. My gut feeling is, at this time, this would have a negative impact on productivity since current software expects a POSIX interface. Correct me as needed.
So … straw man:
New data sets land in GPFS attached to the new integration cluster.
(1) integration testing has typical ‘local’ POSIX access from the new cluster
(2) portions of raw data sets are copied to NFS stores for developer use on the developer hardware
(3) VM users will copy data as needed from the NFS stores as they do now
As a consequence, the organization of data sets on the integration cluster can be independent of the organization of data on the development systems. No computed data sets are shared between the integration cluster and the development systems.
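To make the straw man concrete, here is a rough sketch of the copy flow, with temporary directories standing in for the three stores. Everything in it is hypothetical and only for illustration: the directory names (`gpfs`, `nfs`, `vm`), the data set names (`raw_a`, `raw_b`, `raw_c`, `computed`) and the `copy_subset` helper are my own placeholders, not existing paths or tools.

```python
import shutil
import tempfile
from pathlib import Path


def copy_subset(src_root, dst_root, names):
    """Copy the named data set directories from one store to another."""
    copied = []
    for name in names:
        dst = dst_root / name
        shutil.copytree(src_root / name, dst, dirs_exist_ok=True)
        copied.append(dst)
    return copied


def demo():
    """Walk through the straw man and report what each store ends up holding."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-ins for the three stores.
        gpfs = Path(tmp) / "gpfs"  # attached to the new integration cluster
        nfs = Path(tmp) / "nfs"    # attached to the developer hardware
        vm = Path(tmp) / "vm"      # storage local to a VM
        for store in (gpfs, nfs, vm):
            store.mkdir()

        # New (raw) data sets land in GPFS.
        for name in ("raw_a", "raw_b", "raw_c"):
            (gpfs / name).mkdir()
            (gpfs / name / "image.dat").write_text("placeholder")

        # (1) Integration testing reads and writes GPFS directly;
        # computed data sets stay on the integration cluster.
        (gpfs / "computed").mkdir()

        # (2) Portions of the raw data sets are copied to NFS.
        copy_subset(gpfs, nfs, ["raw_a", "raw_b"])

        # (3) VM users copy what they need from NFS.
        copy_subset(nfs, vm, ["raw_a"])

        return {
            store.name: sorted(p.name for p in store.iterdir())
            for store in (gpfs, nfs, vm)
        }


if __name__ == "__main__":
    for store, contents in demo().items():
        print(store, contents)
```

The point of the sketch is that data only flows one way (GPFS to NFS to VM) and only raw data moves, so the layout on each store can differ.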
Correct as needed.