[vos-d] Van Jacobson: "named data"
Len Bullard
cbullard at hiwaay.net
Mon May 7 23:20:55 EDT 2007
Versioning yes, but also vetting and revetting of sources. The further you
get from original sources in any communication system, the more noise you
incur without adequate checks. Shannon 101. Names alone won't do it.
I put a trivia test at my personal blog just for a "Do you trust Google and
Wikipedia" test. The problem is one of not starting from an authenticated
or original source. If you start from Wikipedia to answer those questions
without the original source, you will get about half of them wrong or near
wrong.
Modern Internet traffic worries about efficiency but typically the data is
short lived. If you live where I live you get to watch a fascinating
change: NASA is hiring as many sixty and even 70 plus year old engineers as
they can find if they have actual J2 series engine experience. The original
sources and digital systems failed to keep enough documents alive. They
have the designs but like the Canadians who tried to rebuild the V2 engines
for their contest submission, they don't know how to run them and it turns
out the devil is really in the details.
len
-----Original Message-----
From: vos-d-bounces at interreality.org [mailto:vos-d-bounces at interreality.org]
On Behalf Of Peter Amstutz
Summary:
First 40 or so minutes explaining why networks up until now evolved the
way they did. Circuit-oriented telephone networks evolved the way they
did due to specific ways the underlying circuit switching technology
worked (going back to human operators working a switchboard!).
Packet-switched networks were revolutionary because, unlike the phone
system, they were agnostic to the underlying transport medium. TCP/IP
was designed for point-to-point communication based on the assumption
that the primary use of data networks would still be for point-to-point
"conversations". Also, TCP/IP was designed in an environment where each
computer had many users, by contrast with today, where you have many
computers per user.
The second part of the talk describes where we are today, and how
networks can be adapted to make it better. Modern Internet usage has
evolved such that the vast majority of traffic is better described as
broadcast traffic rather than point-to-point: publishing web pages,
streaming video, file sharing, even email in the case of mailing lists.
This is very inefficient if many users are requesting the same data at
once. Another problem posed by current architectures is the
challenge of data synchronization between devices, which can also be
traced to the fact that devices are often required to synchronize on a
peer-to-peer basis, rather than having a mechanism to broadcast changes
to other devices.
The proposed solution is a bit light on details, but big on ideas: to
deal with problems of scale in the age of Internet publishing, we step
away from our notions of purely fixed-address, point-to-point
communication, and consider that in many cases, it is highly desirable
to be able to automatically replicate and propagate that data. In the
example given, when you access the New York Times (newspaper) front
page, you shouldn't care whether the actual data you get is served from
the NYT web server, or from some other downstream server that has a copy
-- provided you can verify that it originated from the NYT by checking
the digital signature. One significant idea mentioned was that, in the
way that TCP/IP abstracts the underlying physical transport layer, such
a system ought to be abstracted from the protocol layer -- so that data
can be propagated by whatever physical or virtual means are most
appropriate or available.
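
A minimal Python sketch of the "don't care which server it came from" idea, under loud assumptions: the standard library has no public-key signatures, so an HMAC over a shared key stands in here for the digital signature described in the talk. The names PUBLISHER_KEY, sign and fetch_verified are made up for illustration, and each replica is just a dict mapping a name to (data, tag).

```python
import hashlib
import hmac

# Hypothetical publisher key. A real system would use a public-key
# signature so that replicas cannot forge content; HMAC is a stand-in.
PUBLISHER_KEY = b"demo-publisher-key"


def sign(data: bytes, key: bytes = PUBLISHER_KEY) -> str:
    """Return a hex tag binding the data to the publisher's key."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def fetch_verified(name, replicas, key=PUBLISHER_KEY):
    """Try each replica in turn; return the first copy whose tag verifies.

    The caller never learns or cares which replica supplied the data.
    """
    for replica in replicas:
        entry = replica.get(name)
        if entry is None:
            continue  # this replica has no copy under that name
        data, tag = entry
        if hmac.compare_digest(sign(data, key), tag):
            return data
    return None  # no replica held a copy that verifies
```

A tampered copy on a nearby replica is skipped and the next replica is tried, so trust rests on the data itself rather than on the host that served it.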
He points to Gnutella and BitTorrent as examples of trends in this
direction. Each system demonstrates the two key properties of this type
of approach, that once something is published and replicated a few times
it may stay in the network even if the original source is no longer
available, and that popular resources are inherently load balanced by
virtue of the fact that the more people access a resource the more
intermediate servers will have a copy. Unfortunately he didn't seem to
mention Freenet (http://freenetproject.org), which to my knowledge is
the most complete implementation of many of the ideas he's promoting.
Commentary:
This talk is primarily aimed at spurring people to do more research in
this area. For this reason, it poses many questions but provides few
concrete answers as to how such a system would be put together in
practice. He helpfully separates it out into the "easy stuff" (problems
for which reasonable solutions already exist) and the "hard stuff"
(everything else).
He doesn't really touch on the highly dynamic nature of current web
sites. When every user is served a custom web site, complete with
widgets and ads personalized to their zip code, it's much more difficult
to replicate in a useful way. Of course media (sound, images, video,
maybe 3D meshes later on) are usually not (yet) dynamically generated,
and account for quite a lot of bandwidth, so there are still gains to be
made there. Resources like HTML pages could also be divided up into
finer grained representations that distinguish static and dynamic
elements.
He does mention that timestamps and versioning would need to be an
inherent part of this system so that published resources can be updated.
It's worth noting that a key difference from caching seems to be that
this would be inherently a "push" system -- when you publish something,
you go and bang on the doors of nearby hosts and ask them to pretty
please replicate your data and pass it on if they know of any other
hosts that might be interested. This is interesting, because this kind
of "push/flood" system ends up being similar to store-and-forward
message routing as new data is directed through several hops to
eventually reach every host that's expressed an interest in that data.
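
The push/flood behaviour can be sketched in a few lines of Python. This is only an illustration under assumptions: the neighbors map (each host's list of peers it knows might be interested) and the function name flood are hypothetical, and real hosts would compare version numbers rather than a simple "already seen" check.

```python
from collections import deque


def flood(neighbors, origin):
    """Push data outward from origin, one hop at a time.

    neighbors maps each host to the peers it will pass new data on to.
    Returns a dict of host -> number of hops the data took to arrive.
    """
    hops = {origin: 0}
    queue = deque([origin])
    while queue:
        host = queue.popleft()
        for peer in neighbors.get(host, ()):
            if peer not in hops:  # peer already has this data; don't re-send
                hops[peer] = hops[host] + 1
                queue.append(peer)
    return hops
```

With neighbors = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}, publishing at "a" reaches "b" and "c" in one hop and "d" in two, which is the store-and-forward pattern described above.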
How this might influence VOS:
Replication, migration and versioning are essential for long-term
scalability of a distributed system like VOS, and VOS is in many
ways a great example of the kind of "data dissemination" system that he
talks about. Something I've also come around to realize is that some notion
of time in the system is critical, and that "time" and "versioning" are
fairly closely related concepts when describing a series of changes to a
particular resource.
So, it is useful to think about how the s5 design will accommodate object
replication and migration and their relationship to time and versioning.
Something that we also need to consider is the fact that vobjects are
both declarative (well defined data fields, not opaque) and
computational objects. Replicating data is relatively straightforward,
but what about computation? I can think of at least three cases when
making a call on a replicated object:
- No replicated computation: no chance for local processing, always
send a message to the master vobject. Example: talk messages.
- Predictive computation: send a message, try to guess the result but
there's a chance we'll be overruled. Example: movement interpolation,
physics.
- Deterministic computation: the behavior will have effectively
identical outcome whether run in the local replica or
the master vobject. Example: a mouse rollover graphic effect.
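
The three cases above can be sketched as a small dispatch in Python. Everything here is hypothetical (Mode, call_on_replica and the two callables are not part of any VOS API), and the sketch is synchronous for brevity; in practice the predictive guess would be used immediately and the master's reply would arrive later.

```python
from enum import Enum


class Mode(Enum):
    REMOTE_ONLY = "no replicated computation"
    PREDICTIVE = "predictive computation"
    DETERMINISTIC = "deterministic computation"


def call_on_replica(mode, local_compute, send_to_master):
    """Return (result, overruled) for a call made on a replicated object.

    local_compute runs the behaviour on the local replica;
    send_to_master asks the master vobject for the authoritative result.
    """
    if mode is Mode.REMOTE_ONLY:
        # No local processing: always ask the master.
        return send_to_master(), False
    if mode is Mode.DETERMINISTIC:
        # Same outcome either way, so no message is needed.
        return local_compute(), False
    # Predictive: guess locally, then let the master's answer win.
    guess = local_compute()
    authoritative = send_to_master()
    return authoritative, authoritative != guess
```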
To really support replication in the presence of versioning, the vobject
"descriptor" needs to incorporate time and versioning to get a fully
qualified vobject identifier. An identifier suitable for replication and
caching (including routing and security bits) might include:
* site id
* vobject id
* embedded child id
* last modification time
* version number
* capability key
* hash code
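
As a rough Python sketch of how those fields might hang together (the class name, field types, and the choice of SHA-256 for the hash code are all assumptions for illustration, not part of the s5 design):

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes it hashable, so it can be a cache key
class VobjectDescriptor:
    site_id: str
    vobject_id: str
    child_id: str          # embedded child id
    modified: float        # last modification time, seconds since the epoch
    version: int
    capability_key: str
    content_hash: str      # hex SHA-256 of the serialized state (assumed)

    def same_object(self, other: "VobjectDescriptor") -> bool:
        """True if both descriptors name the same vobject, at any version."""
        return (self.site_id, self.vobject_id, self.child_id) == (
            other.site_id, other.vobject_id, other.child_id)

    def supersedes(self, other: "VobjectDescriptor") -> bool:
        """True if this is a newer version of the same vobject."""
        return self.same_object(other) and self.version > other.version

    def matches(self, content: bytes) -> bool:
        """Check a replicated copy against the descriptor's hash code."""
        return hashlib.sha256(content).hexdigest() == self.content_hash


def describe(site_id, vobject_id, child_id, modified, version,
             capability_key, content: bytes) -> VobjectDescriptor:
    """Build a descriptor for the given serialized state."""
    return VobjectDescriptor(site_id, vobject_id, child_id, modified, version,
                             capability_key,
                             hashlib.sha256(content).hexdigest())
```

A cache holding a descriptor can then tell whether an advertised descriptor supersedes its copy, and whether bytes received from some other replica really are the state the descriptor names.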
So, Lalo, this is probably a bit more than you expected :-) I think the
answer to your question ("could VOS be useful for the things Van
Jacobson talks about") is yes, if we incorporate a robust notion of time
and version as related to state changes. If anyone thinks this is
fanciful, this actually cuts right to the core of how remote vobjects
work, and how we eventually handle caching -- central issues to the s5
redesign.