Saturday, February 07, 2004

More on Pass-by-Reference

Stefan Tilkov made an interesting point: "I believe anytime you make a copy of some data - which is unavoidable if caller and callee don't reside in the same address space - you can't do pass by reference."

Here are some things that I've been thinking about:

----
Pass by Value implies the creation of multiple copies of the data and any changes made to one copy do not propagate to the other copies.

Pass by Reference implies a single 'master' instance of the data; thus any change made through one reference is visible through every other reference.
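
To make the difference concrete, here is a minimal in-process sketch. The `order` record and both function names are made up for illustration; a real service call would cross an address-space boundary, which this deliberately ignores.

```python
import copy

# Hypothetical record owned by the caller.
order = {"id": 42, "items": ["widget"]}

def add_item_by_value(data):
    # The callee works on its own deep copy, so the caller's data is untouched.
    local = copy.deepcopy(data)
    local["items"].append("gadget")
    return local

def add_item_by_reference(data):
    # The callee mutates the single shared 'master' instance.
    data["items"].append("gadget")
    return data

changed_copy = add_item_by_value(order)
items_after_value_call = list(order["items"])      # caller still sees one item

add_item_by_reference(order)
items_after_reference_call = list(order["items"])  # caller now sees the change
```

After the first call the caller's `order` still holds only the original item; after the second call the change shows up for everyone holding the reference.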
----
Pass by Value implies that the data 'value' is created at the time determined by the sender.

Pass by Reference implies the data 'value' is populated at the time determined by the receiver.
----
Pass by Value implies that each copy of the data is a snapshot in time. Thus the data may become stale between accesses.

Pass by Reference implies that the data is 'current' - however, it may not reflect the state of the data as intended by the sender.

Both Pass by Reference and Pass by Value can use a 'leasing' mechanism to deal with stale information or stale resources.
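
A lease can be sketched as a payload with an expiry attached. The `Lease` class below is a hypothetical illustration, not taken from Jini or any other library; the payload can be either a copied value or a reference, and the clock is injectable only so the example can be exercised without waiting.

```python
import time

class Lease:
    """Hypothetical lease: the payload is trusted only until it expires."""

    def __init__(self, payload, ttl_seconds, clock=time.monotonic):
        self.payload = payload
        self._clock = clock
        self._expires_at = clock() + ttl_seconds

    def is_valid(self):
        return self._clock() < self._expires_at

    def get(self):
        # Refuse to hand out data (or a pointer) once the lease has lapsed.
        if not self.is_valid():
            raise LookupError("lease expired; re-fetch the data or renew the lease")
        return self.payload

# Usage with a fake clock so time can be advanced by hand.
now = [0.0]
lease = Lease({"price": 10}, ttl_seconds=30, clock=lambda: now[0])
fresh = lease.get()            # within the lease period
now[0] = 31.0                  # 31 seconds later
expired = not lease.is_valid()
```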
----
Pass by Value implies that the contents of the message are data. (Out of scope: mobile agents and Jini-style proxies.)

Pass by Reference implies that the contents of the message are either a pointer to data, a pointer to functionality, or both.
----
Pass by Value implies that the contents of the message are fully contained within the message; thus you pass *everything*.

Pass by Reference implies that the contents of the message are not contained within the message and the recipient can choose at runtime which piece of the data they are interested in.
----
Pass by Value can be implemented in a single call.

Pass by Reference requires at least two calls: the first sends the pointer, the second retrieves the data (by value).
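
The two-call shape might look like the sketch below. `DataStore`, its `put`/`fetch` methods, and the message layout are all assumptions for illustration; the store stands in for whatever remote service holds the master instance.

```python
import copy
import uuid

class DataStore:
    """Hypothetical stand-in for the service holding the 'master' instance."""

    def __init__(self):
        self._records = {}

    def put(self, record):
        ref = str(uuid.uuid4())
        self._records[ref] = record
        return ref

    def fetch(self, ref, field=None):
        record = self._records[ref]
        # What crosses the boundary is a copy, i.e. the data arrives by value.
        return copy.deepcopy(record if field is None else record[field])

store = DataStore()
ref = store.put({"customer": "ACME", "lines": [1, 2, 3], "notes": "rush"})

# Call 1: the message carries only the pointer, not the data.
message = {"ref": ref}

# Call 2: the receiver dereferences the pointer and picks the piece it wants.
lines = store.fetch(message["ref"], field="lines")
```

Note that the receiver asked only for `lines`; the rest of the record never travelled.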
----
Pass by Value implies that the message contains the exact information desired by the sender (it maintains data integrity).

Pass by Reference does not guarantee that the data being passed is that which was desired by the sender. This is due to the potential modifications of the data between 'send time' and 'retrieve time'. This problem can be overcome by using some sort of locking or checksum mechanism.
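
One way the checksum idea could work: the sender puts a digest of the data into the message next to the pointer, and the receiver compares it at retrieve time. This is only a sketch; the `checksum` and `retrieve` helpers, the `"master"` reference name, and the choice of SHA-256 over canonical JSON are my assumptions.

```python
import hashlib
import json

def checksum(record):
    # Serialize with sorted keys so equal records always hash the same.
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

# Hypothetical master instance, held wherever the reference points.
master = {"sku": "A-1", "qty": 5}

# Send time: the message carries the pointer plus a digest of what the sender saw.
message = {"ref": "master", "checksum": checksum(master)}

def retrieve(message, current):
    # Retrieve time: reject data that no longer matches what the sender intended.
    if checksum(current) != message["checksum"]:
        raise ValueError("data changed since the reference was sent")
    return dict(current)  # shallow copy is enough for this flat record

unchanged = retrieve(message, master)

master["qty"] = 6  # someone modifies the data between send and retrieve
try:
    retrieve(message, master)
    detected = False
except ValueError:
    detected = True
```

The checksum only detects the change; it does not prevent it. Preventing it is where locking would come in.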
----

- I like the idea of not passing around 'fat messages' (extra information that the ultimate receiver may not be interested in).
- In some cases, I like the idea of passing pointers to data.
- In some cases, I like the idea of passing XQuery expressions that point at virtual endpoints where the predicate is populated by the ultimate receiver and a lease verifies that the data/resource isn't stale. Here, I'm not sending the data - I'm sending a partial evaluation expression for the data.

However, you'll notice that this style of programming is starting to sound like what we did in the client/server days: stuffing all of our messages into a relational database and then passing around SQL statements that referenced the data. That approach had its issues too.

Stefan is right in that remotely referenced information eventually becomes a *value*. But the Pass by Reference model has additional attributes that should be considered at design time. It is my opinion that PbV and PbR will both exist in the web services world and should be part of the toolbelt of the service designer. The aforementioned differences should be weighed when deciding between them.
