[vos-d] Van Jacobson: "named data"
Peter Amstutz
tetron at interreality.org
Mon May 7 22:57:05 EDT 2007
On Mon, May 07, 2007 at 05:51:45AM +0000, Lalo Martins wrote:
> Aaron Bentley posted to the bzr list about a Van Jacobson talk:
> > I was watching this talk by Van Jacobson about a new networking
> > paradigm, and I started going "hey, I know this stuff".
> >
> > http://video.google.com/videoplay?docid=-6972678839686672840&hl=en
> >
> > Around 37:31, he starts talking about a new dissemination mechanism in
> > which you look for named data, rather than having conversations with
> > servers.
>
> I can't actually *watch* the talk, though, as stupid google video doesn't
> work in China. If anyone is interested, can you please watch, and post a
> summary? In particular, how much it's relevant to the way we're already
> doing things ("named data" sounds a lot like "vobject" from my chair).
Summary:
The first 40 or so minutes explain why networks up until now evolved
the way they did. Circuit-oriented telephone networks evolved the way
they did because of how the underlying circuit-switching technology
worked (going back to human operators working a switchboard!).
Packet-switched networks were revolutionary because, unlike the phone
system, they were agnostic to the underlying transport medium. TCP/IP
was designed for point-to-point communication based on the assumption
that the primary use of data networks would still be for point-to-point
"conversations". Also, TCP/IP was designed in an environment where each
computer had many users, by contrast with today, where you have many
computers per user.
The second part of the talk describes where we are today, and how
networks can be adapted to make it better. Modern Internet usage has
evolved such that the vast majority of traffic is better described as
broadcast traffic rather than point-to-point: publishing web pages,
streaming video, file sharing, even email in the case of mailing lists.
Serving this kind of traffic over point-to-point connections is very
inefficient if many users are requesting the same data at once. Another
problem posed by current architectures is the challenge of data
synchronization between devices, which can also be traced to the fact
that devices are often required to synchronize on a peer-to-peer basis,
rather than having a mechanism to broadcast changes to other devices.
The proposed solution is a bit light on details, but big on ideas: to
deal with problems of scale in the age of Internet publishing, we step
away from our notions of purely fixed-address, point-to-point
communication, and consider that in many cases, it is highly desirable
to be able to automatically replicate and propagate published data. In the
example given, when you access the New York Times (newspaper) front
page, you shouldn't care whether the actual data you get is served from
the NYT web server, or from some other downstream server that has a copy
-- provided you can verify that it originated from the NYT by checking
the digital signature. One significant idea mentioned was that, in the
way that TCP/IP abstracts the underlying physical transport layer, such
a system ought to be abstracted from the protocol layer -- so that data
can be propagated by whatever physical or virtual means are most
appropriate or available.
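To make the "don't care which server the copy came from" idea concrete,
here is a minimal sketch. It is not from the talk. It uses an HMAC with
a shared key purely as a stand-in for a real public-key signature, so
that the example needs nothing beyond the Python standard library, and
the names (PUBLISHER_KEY, publish, verify) are made up for illustration.

```python
import hashlib
import hmac

# Hypothetical publisher key. A real system would use a public/private
# key pair, so that anyone can verify but only the publisher can sign.
PUBLISHER_KEY = b"nyt-demo-key"

def _sign(name, data, key):
    """Sign the name together with the data, so neither can be swapped."""
    return hmac.new(key, name.encode() + b"\0" + data, hashlib.sha256).hexdigest()

def publish(name, data, key=PUBLISHER_KEY):
    """Return a (name, data, signature) tuple that any host may replicate."""
    return (name, data, _sign(name, data, key))

def verify(item, key=PUBLISHER_KEY):
    """Check the signature; which host handed us the copy is irrelevant."""
    name, data, sig = item
    return hmac.compare_digest(sig, _sign(name, data, key))

original = publish("nyt/frontpage", b"Today's headlines")
mirror_copy = tuple(original)  # the same item, served by a downstream host
tampered = (original[0], b"Fake headlines", original[2])
```

Here verify() accepts both the original and the mirrored copy and
rejects the tampered one, without ever asking where the item came from.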
He points to Gnutella and Bittorrent as examples of trends in this
direction. Each system demonstrates the two key properties of this type
of approach: once something is published and replicated a few times it
may stay in the network even if the original source is no longer
available, and popular resources are inherently load balanced, because
the more people access a resource, the more intermediate servers will
have a copy. Unfortunately he didn't seem to
mention Freenet (http://freenetproject.org), which to my knowledge is
the most complete implementation of many of the ideas he's promoting.
Commentary:
This talk is primarily aimed at spurring people to do more research in
this area. For this reason, it poses many questions but provides few
concrete answers as to how such a system would be put together in
practice. He helpfully separates it out into the "easy stuff" (problems
for which reasonable solutions already exist) and the "hard stuff"
(everything else).
He doesn't really touch on the highly dynamic nature of current web
sites. When every user is served a custom web site, complete with
widgets and ads personalized to their zip code, it's much more difficult
to replicate in a useful way. Of course media (sound, images, video,
maybe 3D meshes later on) are usually not (yet) dynamically generated,
and account for quite a lot of bandwidth, so there are still gains to be
made there. Resources like HTML pages could also be divided up into
finer grained representations that distinguish static and dynamic
elements.
He does mention that timestamps and versioning would need to be an
inherent part of this system so that published resources can be updated.
It's worth noting that a key difference from caching seems to be that
this would be inherently a "push" system -- when you publish something,
you go and bang on the doors of nearby hosts and ask them to pretty
please replicate your data and pass it on if they know of any other
hosts that might be interested. This is interesting, because this kind
of "push/flood" system ends up being similar to store-and-forward
message routing as new data is directed through several hops to
eventually reach every host that's expressed an interest in that data.
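As a rough illustration of that "push/flood" behavior, here is a small
sketch of my own (nothing like it is shown in the talk). The Host class
and link() helper are hypothetical, the version number stands in for
the timestamp/versioning he mentions, and for simplicity every host is
assumed to be interested in every resource.

```python
class Host:
    def __init__(self, name):
        self.name = name
        self.neighbors = []
        self.store = {}  # resource name -> (version, data)

    def receive(self, resource, version, data):
        """Keep the update if it is newer, then pass it on to neighbors."""
        current = self.store.get(resource)
        if current is not None and current[0] >= version:
            return  # already have this version or a newer one: stop here
        self.store[resource] = (version, data)
        for peer in self.neighbors:
            peer.receive(resource, version, data)

def link(a, b):
    """Connect two hosts in both directions."""
    a.neighbors.append(b)
    b.neighbors.append(a)

# Four hosts in a ring: a - b - c - d - a
a, b, c, d = (Host(n) for n in "abcd")
link(a, b)
link(b, c)
link(c, d)
link(d, a)

a.receive("page", 1, "first draft")   # publish at host a
a.receive("page", 2, "second draft")  # publish an update
c.receive("page", 1, "stale copy")    # an old version arrives late
```

The version check is what keeps the flood from looping forever, and it
is also what makes the late, stale copy get dropped instead of
overwriting the newer one.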
How this might influence VOS:
Replication, migration and versioning are essential for the long-term
scalability of a distributed system like VOS, and VOS is in many ways a
great example of the kind of "data dissemination" system that he talks
about. Something I've also come around to realize is that some notion
of time in the system is critical, and that "time" and "versioning" are
fairly closely related concepts when describing a series of changes to a
particular resource.
So, it is useful to think about how the s5 design will accommodate object
replication and migration and their relationship to time and versioning.
Something that we also need to consider is the fact that vobjects are
both declarative (well defined data fields, not opaque) and
computational objects. Replicating data is relatively straightforward,
but what about computation? I can think of at least three cases when
making a call on a replicated object:
- No replicated computation: no chance for local processing, always
send a message to the master vobject. Example: talk messages.
- Predictive computation: send a message, try to guess the result but
there's a chance we'll be overruled. Example: movement interpolation,
physics.
- Deterministic computation: the behavior will have an effectively
identical outcome whether run on the local replica or on the master
vobject. Example: a mouse rollover graphic effect.
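The three cases above could be sketched roughly like this. This is only
an illustration, not s5 code: the Replication enum, call() and the two
movement functions are hypothetical names, and the "master" is just a
local function standing in for a round trip over the network.

```python
from enum import Enum

class Replication(Enum):
    NONE = "none"                    # always ask the master
    PREDICTIVE = "predictive"        # guess locally, master may overrule
    DETERMINISTIC = "deterministic"  # local result is as good as the master's

def call(mode, local_fn, master_fn, arg):
    """Return (value shown immediately, final value) for a call on a replica."""
    if mode is Replication.NONE:
        final = master_fn(arg)
        return final, final
    if mode is Replication.PREDICTIVE:
        guess = local_fn(arg)
        final = master_fn(arg)  # in a real system this would arrive later
        return guess, final
    # DETERMINISTIC: no message to the master is needed at all
    result = local_fn(arg)
    return result, result

def predict_move(x):
    """The replica assumes the move always succeeds."""
    return x + 1

def master_move(x):
    """The master enforces a wall at x = 5."""
    return min(x + 1, 5)

shown, final = call(Replication.PREDICTIVE, predict_move, master_move, 5)
```

In the last line the replica shows 6 right away, and the master then
overrules it with 5, which is the "chance we'll be overruled" case.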
To really support replication in the presence of versioning, the vobject
"descriptor" needs to incorporate time and versioning to get a fully
qualified vobject identifier. A descriptor suitable for replication and
caching (including routing and security bits) might include:
* site id
* vobject id
* embedded child id
* last modification time
* version number
* capability key
* hash code
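As a sketch, those fields could be bundled up like this. The field
types, the use of SHA-256 for the hash code, and the cache_key() and
matches() helpers are all assumptions for the sake of illustration, not
a settled s5 design.

```python
from dataclasses import dataclass, replace
import hashlib

@dataclass(frozen=True)
class VobjectDescriptor:
    site_id: str
    vobject_id: str
    embedded_child_id: str
    last_modified: float   # seconds since the epoch (assumed)
    version: int
    capability_key: str
    hash_code: str         # hex digest of the state (SHA-256 assumed)

    def cache_key(self):
        """Identity plus version, so two versions never collide in a cache."""
        return (self.site_id, self.vobject_id,
                self.embedded_child_id, self.version)

    def matches(self, state):
        """True if the given state bytes are what this descriptor names."""
        return hashlib.sha256(state).hexdigest() == self.hash_code

state_v1 = b"position=0,0,0"
v1 = VobjectDescriptor("example-site", "avatar", "", 1178592000.0, 1,
                       "demo-capability", hashlib.sha256(state_v1).hexdigest())

state_v2 = b"position=1,0,0"
v2 = replace(v1, last_modified=1178592060.0, version=2,
             hash_code=hashlib.sha256(state_v2).hexdigest())
```

Since the version is part of the key, a cache can hold v1 and v2 side
by side, and the hash lets a replica check that the state it was handed
really is the version the descriptor names.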
So, Lalo, this is probably a bit more than you expected :-) I think the
answer to your question ("could VOS be useful for the things Van
Jacobson talks about") is yes, if we incorporate a robust notion of time
and version as related to state changes. If anyone thinks this is
fanciful, this actually cuts right to the core of how remote vobjects
work, and how we eventually handle caching -- central issues to the s5
redesign.
--
[ Peter Amstutz ][ tetron at interreality.org ][ peter.amstutz at gdit.com ]
[Lead Programmer][Interreality Project][Virtual Reality for the Internet]
[ VOS: Next Generation Internet Communication][ http://interreality.org ]
[ http://interreality.org/~tetron ][ pgpkey: pgpkeys.mit.edu 18C21DF7 ]