WebDAV vs HTTP method semantics

Jim Whitehead ejw@cse.ucsc.edu
Mon, 27 Aug 2001 10:52:40 -0700


I agree with Roy's reply -- this post should be viewed as building upon
Roy's.

Mark Baker writes:
> My hypothesis is that, at least in theory, all operations with side
> effects on resources can be represented with only PUT or POST.  Here's
> an attempt to recast LOCK, COPY, and MOVE in that way.

An early proposal I developed for WebDAV (unfortunately, one I never put
online) simply added properties to Web resources, plus a "test-and-set"
operation for properties. Locking could be implemented using the
test-and-set protocol. At the time, I was influenced by the NUCM system
<www.cs.colorado.edu/serl/cm/NUCM.html>, which provided a very generic
repository that versioning clients could use. The drawback of this scheme is
that, in order to have interoperability, all clients need to use the same
conventions when interacting with the repository (i.e., they need to
represent version histories, configurations, etc. in the same way).
Furthermore, since the repository doesn't know ahead of time how common
abstractions (version histories, configurations, logical changes,
workspaces, etc.) are represented, it cannot be optimized for them. Of
course, the big benefit is that many different versioning styles can be
accommodated by the same system at the same time, which is not the case if
you hard-wire specific representations into the system.
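To make the idea concrete, here is a minimal Python sketch, assuming a hypothetical in-memory property store. The names `PropertyStore`, `test_and_set`, `try_lock`, `unlock`, and the `lockowner` property are illustrative inventions, not part of that proposal or of WebDAV. A lock is simply a property that a client sets only if it still holds the expected value (`None` meaning "unlocked"), with the check and the update done atomically.

```python
import threading
from typing import Dict, Optional, Tuple


class PropertyStore:
    """Hypothetical store mapping (resource URL, property name) -> value."""

    def __init__(self) -> None:
        self._props: Dict[Tuple[str, str], Optional[str]] = {}
        self._mutex = threading.Lock()  # makes test-and-set atomic

    def get(self, url: str, name: str) -> Optional[str]:
        with self._mutex:
            return self._props.get((url, name))

    def test_and_set(self, url: str, name: str,
                     expected: Optional[str], new: Optional[str]) -> bool:
        """Set the property to `new` only if it currently equals `expected`."""
        with self._mutex:
            if self._props.get((url, name)) != expected:
                return False
            self._props[(url, name)] = new
            return True


# Locking built purely on test-and-set, by (hypothetical) client convention.
def try_lock(store: PropertyStore, url: str, owner: str) -> bool:
    return store.test_and_set(url, "lockowner", None, owner)


def unlock(store: PropertyStore, url: str, owner: str) -> bool:
    # Only the current holder can release: it must match the stored owner.
    return store.test_and_set(url, "lockowner", owner, None)


store = PropertyStore()
assert try_lock(store, "/doc.txt", "alice")
assert not try_lock(store, "/doc.txt", "bob")    # already held by alice
assert not unlock(store, "/doc.txt", "bob")      # bob is not the holder
assert unlock(store, "/doc.txt", "alice")
```

Note that the meaning of `lockowner`, and the convention that `None` means unlocked, live entirely in the clients: that is exactly the interoperability drawback described above.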

> I believe MOVE would simply be COPY+DELETE, aka GET+PUT+DELETE.

Nope. We learned this one the hard way in DAV. There are two reasons why you
really, really do not want to define MOVE as COPY + DELETE.

1) Multiple containment.  Most of the time we think of a resource as being a
member of only one collection. But, for many reasons, including creating
multiple hierarchies for the same set of resources, and representing
workspaces and configurations in CM systems, you want multiple containment,
where a resource is contained in multiple collections simultaneously.

Consider:

     C1      C2     C3
    / | \   / |
   /  |  \ /  |
  R1 R2  R3  R4

That is, there are three collections, C1, C2, C3, and four resources, R1,
R2, R3, R4. The membership of each collection is:

C1 = {R1, R2, R3}
C2 = {R3, R4}
C3 = {}

Now, consider the outcome of the two definitions of MOVE:

1.1) MOVE = make resource available under new name

This allows the server to do pointer manipulation.

MOVE C2/R3 to C3

     C1      C3     C2
    / | \   /        |
   /  |  \ /         |
  R1 R2  R3         R4

(I swapped the position of C3 and C2 since it was easier to draw.)
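One way to model multiple containment, as a hypothetical sketch: each collection binds member names to resource identifiers, and the same identifier may be bound in several collections. `Repository`, `create`, `bind`, and `move` are illustrative names, not WebDAV APIs. Under 1.1, MOVE is just a rebinding, so C1 and C3 end up pointing at the same resource.

```python
from typing import Dict


class Repository:
    """Hypothetical model: collections bind member names to resource ids."""

    def __init__(self) -> None:
        self.state: Dict[str, str] = {}                   # resource id -> content
        self.collections: Dict[str, Dict[str, str]] = {}  # collection -> {name: id}
        self.next_id = 1

    def create(self, content: str) -> str:
        rid = f"R{self.next_id}"  # every new resource gets a fresh identity
        self.next_id += 1
        self.state[rid] = content
        return rid

    def bind(self, coll: str, name: str, rid: str) -> None:
        self.collections.setdefault(coll, {})[name] = rid

    def move(self, src_coll: str, name: str, dst_coll: str) -> None:
        """MOVE as pointer manipulation: rebind the same resource id."""
        rid = self.collections[src_coll].pop(name)
        self.bind(dst_coll, name, rid)


def example() -> Repository:
    """Build the C1/C2/C3, R1..R4 example from the text."""
    repo = Repository()
    r1, r2, r3, r4 = (repo.create(f"contents {i}") for i in range(1, 5))
    for rid in (r1, r2, r3):
        repo.bind("C1", rid, rid)
    for rid in (r3, r4):
        repo.bind("C2", rid, rid)
    repo.collections["C3"] = {}
    return repo


repo = example()
repo.move("C2", "R3", "C3")
# C1 and C3 now contain the *same* resource; there are still four resources.
assert repo.collections["C1"]["R3"] == repo.collections["C3"]["R3"] == "R3"
assert len(repo.state) == 4
```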

1.2) MOVE = copy + delete

MOVE C2/R3 to C3

The outcome depends on the semantics of delete.

1.2.1) delete = remove identifier from collection

     C1      C2     C3
    / | \     |      |
   /  |  \    |      |
  R1 R2  R3  R4     R5

There are now five resources. The fact that C1 and C3 contain the same
resource is now lost. That is, the new resource has different *identity*
than the original.
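The same hypothetical model, now with MOVE defined as COPY followed by a DELETE that only removes the identifier from the source collection (1.2.1). The assumption, as in the text, is that COPY always creates a resource with a fresh identity; all names remain illustrative.

```python
from typing import Dict


class Repository:
    """Hypothetical model: collections bind member names to resource ids."""

    def __init__(self) -> None:
        self.state: Dict[str, str] = {}                   # resource id -> content
        self.collections: Dict[str, Dict[str, str]] = {}  # collection -> {name: id}
        self.next_id = 1

    def create(self, content: str) -> str:
        rid = f"R{self.next_id}"  # every new resource gets a fresh identity
        self.next_id += 1
        self.state[rid] = content
        return rid

    def bind(self, coll: str, name: str, rid: str) -> None:
        self.collections.setdefault(coll, {})[name] = rid

    def copy(self, src_coll: str, name: str, dst_coll: str) -> None:
        """COPY: duplicate the state into a brand-new resource."""
        src = self.collections[src_coll][name]
        self.bind(dst_coll, name, self.create(self.state[src]))

    def delete_unbind(self, coll: str, name: str) -> None:
        """DELETE (1.2.1): remove the identifier from this collection only."""
        del self.collections[coll][name]

    def move(self, src_coll: str, name: str, dst_coll: str) -> None:
        """MOVE defined as COPY + DELETE."""
        self.copy(src_coll, name, dst_coll)
        self.delete_unbind(src_coll, name)


def example() -> Repository:
    repo = Repository()
    r1, r2, r3, r4 = (repo.create(f"contents {i}") for i in range(1, 5))
    for rid in (r1, r2, r3):
        repo.bind("C1", rid, rid)
    for rid in (r3, r4):
        repo.bind("C2", rid, rid)
    repo.collections["C3"] = {}
    return repo


repo = example()
repo.move("C2", "R3", "C3")
assert len(repo.state) == 5                  # R1..R5
assert repo.collections["C1"]["R3"] == "R3"  # C1 still holds the original
assert repo.collections["C3"]["R3"] == "R5"  # C3 holds a different resource
```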

1.2.2) delete = eliminate resource's state from the system, and remove
identifier from collection(s)

     C1      C2     C3
    / |       |      |
   /  |       |      |
  R1 R2      R4     R5

There are now four resources. The MOVE command had a (possibly unintended)
side-effect on collection C1. The new resource has different identity than
the original (which has now gone away).
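And the 1.2.2 variant, again a sketch on the same hypothetical model: here DELETE destroys the resource's state and removes every binding to it, which is how the MOVE ends up reaching into C1.

```python
from typing import Dict


class Repository:
    """Hypothetical model: collections bind member names to resource ids."""

    def __init__(self) -> None:
        self.state: Dict[str, str] = {}                   # resource id -> content
        self.collections: Dict[str, Dict[str, str]] = {}  # collection -> {name: id}
        self.next_id = 1

    def create(self, content: str) -> str:
        rid = f"R{self.next_id}"  # every new resource gets a fresh identity
        self.next_id += 1
        self.state[rid] = content
        return rid

    def bind(self, coll: str, name: str, rid: str) -> None:
        self.collections.setdefault(coll, {})[name] = rid

    def copy(self, src_coll: str, name: str, dst_coll: str) -> None:
        """COPY: duplicate the state into a brand-new resource."""
        src = self.collections[src_coll][name]
        self.bind(dst_coll, name, self.create(self.state[src]))

    def delete_destroy(self, coll: str, name: str) -> None:
        """DELETE (1.2.2): destroy the resource and every binding to it."""
        rid = self.collections[coll][name]
        del self.state[rid]
        for members in self.collections.values():
            for member, target in list(members.items()):
                if target == rid:
                    del members[member]

    def move(self, src_coll: str, name: str, dst_coll: str) -> None:
        """MOVE defined as COPY + DELETE."""
        self.copy(src_coll, name, dst_coll)
        self.delete_destroy(src_coll, name)


def example() -> Repository:
    repo = Repository()
    r1, r2, r3, r4 = (repo.create(f"contents {i}") for i in range(1, 5))
    for rid in (r1, r2, r3):
        repo.bind("C1", rid, rid)
    for rid in (r3, r4):
        repo.bind("C2", rid, rid)
    repo.collections["C3"] = {}
    return repo


repo = example()
repo.move("C2", "R3", "C3")
assert sorted(repo.state) == ["R1", "R2", "R4", "R5"]
assert "R3" not in repo.collections["C1"]    # side effect on C1
assert repo.collections["C3"]["R3"] == "R5"  # new identity
```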

2) Versioning. It is common in versioning systems to separate the "resource
being edited" from the "resource(s) that represent the version history".
Think of RCS. You don't directly edit the ",v" file; instead, you edit the
file checked out from the ",v" file.

DeltaV's data model is slightly more complex than RCS's: it has one resource
per revision, one resource representing the version history, and multiple
"handle" resources for each revision history. Editing operations are
performed on the handles, which exist in the user-editable URL space.

OK, let's revisit the example, adding versioning:

     C1      C2     C3
    / | \   / |
   /  |  \ /  |
  R1 R2  R3  R4
          ^
          ^
         V-H
          +- R3.rev1
          +- R3.rev2
          +- R3.rev3

That is, R3 is the "handle" for making edits to a revision history. The
revision history is represented by V-H. V-H points off to three revision
resources, R3.rev1, R3.rev2, R3.rev3.

Now, let's try the two different move semantics again:

2.1) MOVE = make resource available under new name

MOVE C2/R3 to C3

     C1      C3     C2
    / | \   /        |
   /  |  \ /         |
  R1 R2  R3         R4
          ^
          ^
         V-H
          +- R3.rev1
          +- R3.rev2
          +- R3.rev3

Again, I have swapped C2 and C3 for ease of drawing.

Under these semantics, the server only does pointer manipulations, and is
able to preserve the mapping between the handle, R3, and its version history
and associated revisions.
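Extending the hypothetical model with versioning: a sketch, assuming the link from a handle to its version history is server-side metadata keyed by the handle's resource identity (`history_of` and `histories` are invented names, not DeltaV structures). Because a pointer-manipulation MOVE keeps that identity, the link survives.

```python
from typing import Dict, List


class VersionedRepository:
    """Hypothetical model: handles link to version histories by resource id."""

    def __init__(self) -> None:
        self.state: Dict[str, str] = {}                   # resource id -> content
        self.collections: Dict[str, Dict[str, str]] = {}  # collection -> {name: id}
        self.history_of: Dict[str, str] = {}              # handle id -> history id
        self.histories: Dict[str, List[str]] = {}         # history id -> revisions
        self.next_id = 1

    def create(self, content: str) -> str:
        rid = f"R{self.next_id}"  # every new resource gets a fresh identity
        self.next_id += 1
        self.state[rid] = content
        return rid

    def bind(self, coll: str, name: str, rid: str) -> None:
        self.collections.setdefault(coll, {})[name] = rid

    def move(self, src_coll: str, name: str, dst_coll: str) -> None:
        """MOVE as pointer manipulation: the handle keeps its identity."""
        rid = self.collections[src_coll].pop(name)
        self.bind(dst_coll, name, rid)


def example() -> VersionedRepository:
    repo = VersionedRepository()
    r1, r2, r3, r4 = (repo.create(f"contents {i}") for i in range(1, 5))
    for rid in (r1, r2, r3):
        repo.bind("C1", rid, rid)
    for rid in (r3, r4):
        repo.bind("C2", rid, rid)
    repo.collections["C3"] = {}
    repo.histories["V-H"] = ["R3.rev1", "R3.rev2", "R3.rev3"]
    repo.history_of[r3] = "V-H"  # R3 is the handle for V-H
    return repo


repo = example()
repo.move("C2", "R3", "C3")
handle = repo.collections["C3"]["R3"]
assert repo.history_of[handle] == "V-H"  # link to the history is intact
```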

2.2.1) MOVE = COPY + DELETE (where delete is "remove identifier from
collection", and copy is defined as "duplicate the resource state and
properties at the destination")

MOVE C2/R3 to C3

     C1      C2     C3
    / | \     |      |
   /  |  \    |      |
  R1 R2  R3  R4     R5
          ^
          ^
         V-H
          +- R3.rev1
          +- R3.rev2
          +- R3.rev3

Careful readers saw this one coming from miles away: the connection between
the handle and its revision history has been severed. R5 is no longer under
version control.
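Under the same assumption, COPY duplicates content but not the server-side link to the version history, so a COPY + DELETE move (2.2.1) leaves the new handle unversioned. Again a hypothetical sketch, not DeltaV's actual model.

```python
from typing import Dict, List


class VersionedRepository:
    """Hypothetical model: handles link to version histories by resource id."""

    def __init__(self) -> None:
        self.state: Dict[str, str] = {}                   # resource id -> content
        self.collections: Dict[str, Dict[str, str]] = {}  # collection -> {name: id}
        self.history_of: Dict[str, str] = {}              # handle id -> history id
        self.histories: Dict[str, List[str]] = {}         # history id -> revisions
        self.next_id = 1

    def create(self, content: str) -> str:
        rid = f"R{self.next_id}"  # every new resource gets a fresh identity
        self.next_id += 1
        self.state[rid] = content
        return rid

    def bind(self, coll: str, name: str, rid: str) -> None:
        self.collections.setdefault(coll, {})[name] = rid

    def copy(self, src_coll: str, name: str, dst_coll: str) -> None:
        """COPY: duplicate state only; the history link is not copied."""
        src = self.collections[src_coll][name]
        self.bind(dst_coll, name, self.create(self.state[src]))

    def delete_unbind(self, coll: str, name: str) -> None:
        """DELETE: remove the identifier from this collection only."""
        del self.collections[coll][name]

    def move(self, src_coll: str, name: str, dst_coll: str) -> None:
        """MOVE defined as COPY + DELETE."""
        self.copy(src_coll, name, dst_coll)
        self.delete_unbind(src_coll, name)


def example() -> VersionedRepository:
    repo = VersionedRepository()
    r1, r2, r3, r4 = (repo.create(f"contents {i}") for i in range(1, 5))
    for rid in (r1, r2, r3):
        repo.bind("C1", rid, rid)
    for rid in (r3, r4):
        repo.bind("C2", rid, rid)
    repo.collections["C3"] = {}
    repo.histories["V-H"] = ["R3.rev1", "R3.rev2", "R3.rev3"]
    repo.history_of[r3] = "V-H"  # R3 is the handle for V-H
    return repo


repo = example()
repo.move("C2", "R3", "C3")
assert repo.collections["C3"]["R3"] == "R5"
assert "R5" not in repo.history_of  # R5 is not under version control
```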

2.2.2) MOVE = COPY + DELETE (where delete is "remove the state of the
source from the repository, plus pointer cleanup", and copy is
"duplicate the resource state and properties at the destination").

MOVE C2/R3 to C3

     C1      C2     C3
    / |       |      |
   /  |       |      |
  R1 R2      R4     R5

The worst possible outcome. R5 is severed from its version history, and the
version history itself is no longer linked to any user-visible object. Some
systems, seeing no further links to the version history, will have performed
garbage collection and expunged the version history and its revisions.
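Finally, the 2.2.2 case as a sketch on the same hypothetical model: DELETE destroys the source and cleans up pointers to it, including the handle's history link, and an optional `collect_garbage` step (an illustrative stand-in for what some systems do, not a real API) then expunges the orphaned history.

```python
from typing import Dict, List


class VersionedRepository:
    """Hypothetical model: handles link to version histories by resource id."""

    def __init__(self) -> None:
        self.state: Dict[str, str] = {}                   # resource id -> content
        self.collections: Dict[str, Dict[str, str]] = {}  # collection -> {name: id}
        self.history_of: Dict[str, str] = {}              # handle id -> history id
        self.histories: Dict[str, List[str]] = {}         # history id -> revisions
        self.next_id = 1

    def create(self, content: str) -> str:
        rid = f"R{self.next_id}"  # every new resource gets a fresh identity
        self.next_id += 1
        self.state[rid] = content
        return rid

    def bind(self, coll: str, name: str, rid: str) -> None:
        self.collections.setdefault(coll, {})[name] = rid

    def copy(self, src_coll: str, name: str, dst_coll: str) -> None:
        """COPY: duplicate state only; the history link is not copied."""
        src = self.collections[src_coll][name]
        self.bind(dst_coll, name, self.create(self.state[src]))

    def delete_destroy(self, coll: str, name: str) -> None:
        """DELETE: destroy the state, then clean up every pointer to it."""
        rid = self.collections[coll][name]
        del self.state[rid]
        self.history_of.pop(rid, None)  # the handle -> history link goes too
        for members in self.collections.values():
            for member, target in list(members.items()):
                if target == rid:
                    del members[member]

    def move(self, src_coll: str, name: str, dst_coll: str) -> None:
        """MOVE defined as COPY + DELETE."""
        self.copy(src_coll, name, dst_coll)
        self.delete_destroy(src_coll, name)

    def collect_garbage(self) -> None:
        """Optional: expunge histories that no handle refers to any more."""
        live = set(self.history_of.values())
        for vh in [h for h in self.histories if h not in live]:
            del self.histories[vh]


def example() -> VersionedRepository:
    repo = VersionedRepository()
    r1, r2, r3, r4 = (repo.create(f"contents {i}") for i in range(1, 5))
    for rid in (r1, r2, r3):
        repo.bind("C1", rid, rid)
    for rid in (r3, r4):
        repo.bind("C2", rid, rid)
    repo.collections["C3"] = {}
    repo.histories["V-H"] = ["R3.rev1", "R3.rev2", "R3.rev3"]
    repo.history_of[r3] = "V-H"  # R3 is the handle for V-H
    return repo


repo = example()
repo.move("C2", "R3", "C3")
assert "V-H" in repo.histories and repo.history_of == {}  # orphaned history
repo.collect_garbage()
assert repo.histories == {}  # history and revisions are gone
```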



Conclusions:

1. To achieve desirable outcomes for multiple containment cases, clients
must have a way of clearly indicating that they want move semantics.
2. The server must be free to implement move using pointer manipulations.
3. Defining MOVE as COPY + DELETE is often the *worst* possible semantic,
leading to the most undesirable outcome in several scenarios.
4. Defining MOVE as COPY + DELETE unnecessarily couples the semantics of
MOVE and DELETE.
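On the wire, conclusions 1 and 2 amount to this: the client states its intent with a single MOVE request, naming the target in the `Destination` header that RFC 2518 defines (with `Overwrite: F` asking the server not to replace an existing target), and the server is free to satisfy it by rebinding. A minimal Python sketch that only assembles the raw request; the function name, host, and paths are hypothetical.

```python
def build_move_request(host: str, source_path: str, destination_url: str,
                       overwrite: bool = False) -> bytes:
    """Assemble a raw WebDAV MOVE request with no body (hypothetical helper)."""
    lines = [
        f"MOVE {source_path} HTTP/1.1",
        f"Host: {host}",
        # RFC 2518 requires Destination to be an absolute URI.
        f"Destination: {destination_url}",
        # "F": fail rather than overwrite an existing resource at the target.
        f"Overwrite: {'T' if overwrite else 'F'}",
        "",  # blank line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_move_request("dav.example.com", "/C2/R3",
                             "http://dav.example.com/C3/R3")
```

To actually send it, Python's `http.client.HTTPConnection.request()` accepts an arbitrary method name, so the same intent can be expressed as `conn.request("MOVE", "/C2/R3", headers={"Destination": "http://dav.example.com/C3/R3", "Overwrite": "F"})`.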

Now you're starting to get a glimpse of why COPY and MOVE were surprisingly
difficult to get right in RFC 2518. :-) The other surprising thing is that,
even though this information is relatively obvious (once you've banged your
head against it for a few years), I have never seen a detailed containment
model described anywhere, nor have I found a good description of why
move = copy + delete is a bad idea for document management systems.

- Jim
