Vectorcast: A Proposal Regarding the Efficient Distribution of Data On High Bandwidth Networks
This was my Systems Programming Term Project a while back. I wrote all the
conceptual stuff, and came up with the original idea, while my group and I
collaborated on the sample implementation. The implementation is…who
knows where. This is what I have. Comments to me at email@example.com.
Oh, most of this text was written in December of 1997, but some minor
updates have been done as of January 1998.
Introduction: Why Do We Network?
Think of a network, any network, in fact, every network. Ignore the how,
the when, the who, the whatever, just think of the why. Why do we seek to
connect one computer to another? It is not an inexpensive investment for
any institution, so how do we justify the substantial TCO (Total Cost of
Ownership) that computer networking incurs? We can derive our justification
by examining computers as an extension of those who utilize them: The
purpose of civilization is to uncouple existence from the means necessary to
sustain existence, i.e. I do not need to know how to farm in order to eat,
nor do I need to know how to fabricate a CPU in order to take advantage of
one. Computers are networked according to the same logic: Since computers
cannot self-fabricate all that is necessary for them to be most effective,
they must be able to go elsewhere for what they require. Thus, computer
networks are quite simply about getting information from point A to point B.
Unicast: The Old Standby
The major model utilized for file distribution on the global Internet, as
well as on most smaller networks, is the unicast model: One server sends a
requested file to the client who requested it. This process is repeated for
each additional client that requests the file. The more client requests,
the more files the original server must send. If an error is detected in
the transmission, such as packets received out of order, the server
automatically retransmits packets to compensate for that computer’s error.
To use a more human analogy, it’s like being a teacher who uses one-on-one
sessions to educate students. This has the advantage that the teacher
can quickly deduce what concepts the student failed to grasp and teach
accordingly. While this model works for teaching a few students, it fails if
the teacher becomes responsible for hundreds of students simultaneously.
Multicast: Heir Apparent?
A newer model that is emerging is the multicast model. In the multicast
model the server sends out the data only once, and the network takes care of
sending the file to everybody who requires it. The multicast model
radically simplifies batch file transmission for the server but demands
substantial modifications in network infrastructure, as routers must be
reprogrammed to send the same packet to multiple locations. It also demands
a massive change in client behavior because the required information is no
longer available whenever the client desires it. Instead, the client must
wait until every other client is scheduled to receive. Furthermore,
individual error checking is much harder to implement because the server is
built to send the same message out to everyone. To extend our analogy, the
teacher, unable to handle the load of hundreds of students demanding
individual attention, requires that the students come to a classroom at a
specified hour and listen to the lecture. If the students aren’t there at
the specified time, they don’t get the knowledge they want. The problem is,
since the classroom is so large and the teacher wants to teach to the group
as a whole, a single student’s lack of understanding is much more difficult
to address.
Reality: Neither Paradigm Suffices
Neither of these distribution methods effectively fulfills the design
requirement of getting needed information from point A to point B. The
unicast model makes the server a slave to the masses and the multicast model
makes the masses a slave to the server. There must be something better, and
I propose that there is: Vectorcast.
Vectorcast: An Analogy
For students, one of the most effective strategies for comprehending lessons
is asking a fellow student for assistance. If students waited to ask the
teacher every time they didn’t understand a concept, they would spend more
time waiting for the teacher to be free than learning. Instead, a student
puts the question to classmates, one of whom might know the answer. The
required information has thus been transferred
to the student, not through the original source, but from a fellow student.
This is the heart of Vectorcast.
Sharing The Load
Vectorcast is based on the simple idea that, presuming there isn’t a
considerable barrier between “vectors”, a location that was once the
destination of information should automatically become a source. Suppose
you have two computers in a lab. If computer A spends an hour downloading
an application from Japan, and computer B then wants that file, B should
be able to download it from computer A. It is not rational to use
intercontinental links when there is a computer on the local subnet
that has already retrieved the desired information. Vectorcast says that not
only should the computer seize what it needs from the closest possible
source, but also that once it obtains the data, it becomes a possible
source. Hence, if a file is really popular, by definition, it is available
on a large number of computers, and thus a lot of systems will be available
to send that file out to the remaining hordes that still haven’t gotten the
file.
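The effect of every recipient becoming a source can be sketched with a toy
model. The assumptions are mine, not part of the original project: time is
divided into rounds, every machine holding the file can serve exactly one
client per round, and all links are equally fast. The function names are
hypothetical.

```python
def rounds_unicast(clients: int) -> int:
    """Only the origin serves, one client per round."""
    return clients


def rounds_vectorcast(clients: int) -> int:
    """Every machine that already holds the file serves one client per round."""
    sources = 1          # the origin server
    remaining = clients  # clients still waiting for the file
    rounds = 0
    while remaining > 0:
        served = min(sources, remaining)
        remaining -= served
        sources += served  # each served client becomes a source
        rounds += 1
    return rounds


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, rounds_unicast(n), rounds_vectorcast(n))
```

Under these assumptions, 1,000 clients need 1,000 rounds with unicast but
only 10 with Vectorcast, because the pool of sources doubles each round.
Real networks will not be this tidy, but the shape of the curve is the point.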
Order From Chaos
Another issue for Vectorcast is how each computer knows which system has
which files. Every computer could query every computer it knows and request
a listing, but the network traffic caused by this arrangement grows with the
square of the number of machines. Vectorcast thus depends on a director: a
single machine that tells all the rest of the machines where to get what
they’re looking for. The machines are still going to a single source to
fulfill requests, but the director doesn’t repeatedly distribute massive
files. Rather, it tracks who has which huge files and redirects the vector to the
closest available vector with the requested data. It is a paradigm shift
from a model of the central powerful computer that provides information to
the weak ones to an egalitarian scenario where everybody serves everybody
and the only purpose of the “central” computer is to guide and list instead
of to actually distribute.
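A minimal sketch of such a director follows. It is an illustration under
stated assumptions, not the lost sample implementation: the class and method
names are hypothetical, hosts are identified by IPv4 address strings, and
"closest" is reduced to a toy metric that only asks whether two addresses
share their first three octets.

```python
from collections import defaultdict


def subnet_distance(a: str, b: str) -> int:
    """Toy metric (an assumption): 0 if the first three octets match, else 1."""
    return 0 if a.rsplit(".", 1)[0] == b.rsplit(".", 1)[0] else 1


class Director:
    """Tracks which hosts hold which files; it never stores the files."""

    def __init__(self, distance=subnet_distance):
        self._holders = defaultdict(set)  # file name -> hosts holding a copy
        self._distance = distance         # callable(host_a, host_b) -> cost

    def register(self, filename: str, host: str) -> None:
        """Record that a host now holds the file and can act as a source."""
        self._holders[filename].add(host)

    def forget(self, filename: str, host: str) -> None:
        """Record that a host has discarded its copy."""
        self._holders[filename].discard(host)

    def locate(self, filename: str, requester: str):
        """Return the closest known holder other than the requester, or None."""
        candidates = self._holders.get(filename, set()) - {requester}
        if not candidates:
            return None
        # Ties are broken by host string so the answer is deterministic.
        return min(candidates, key=lambda h: (self._distance(requester, h), h))


if __name__ == "__main__":
    director = Director()
    director.register("app.tar", "203.0.113.5")       # the distant origin
    print(director.locate("app.tar", "192.0.2.10"))   # A is sent to the origin
    director.register("app.tar", "192.0.2.10")        # A finished downloading
    print(director.locate("app.tar", "192.0.2.11"))   # B is sent to A
```

The director only ever answers with an address, so its traffic stays small
no matter how large the files are. A real deployment would need a better
distance measure than subnet matching, and some way to learn when a host
has dropped a file.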
A major alternate purpose for the director is verification: how does the
client know that what it is receiving is authentic? Checksum archives hosted
on directors or trusted director-networks are the solution to this. While
this increases the load somewhat, it is far preferable to serve a 128-bit
checksum than an 8MB file. As a side bonus, the download of the 8MB file can
be tracked even though it’s being sent from an alternate location, thus
solving the cached hits quandary: the net (AOL in particular) cannot run
without caching, but optional reporting of cached hits inevitably creates
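The verification step described above might look like the following sketch.
The text only says "128 bit checksum" and names no algorithm. MD5 is used
here purely because it produces a 128-bit digest; that choice is my
assumption, as are the function names. MD5 is no longer considered
collision-resistant, so anyone building this today would likely pick a
longer digest such as SHA-256.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return a 128-bit digest as 32 hex characters (MD5 is an assumption)."""
    return hashlib.md5(data).hexdigest()


def is_authentic(data: bytes, trusted_checksum: str) -> bool:
    """Compare a file fetched from any peer with the director's checksum."""
    return checksum(data) == trusted_checksum.lower()


if __name__ == "__main__":
    original = b"contents of the popular file"
    published = checksum(original)  # what the director would hand out

    print(is_authentic(original, published))                # untouched copy
    print(is_authentic(original + b" tampered", published)) # altered copy
```

The client fetches the large file from whichever peer is closest and only
the short digest from the director, so the trusted machine serves a few
dozen bytes per download instead of megabytes.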
Vectorcast does seem to me to be the only solution to the growing demand for
bandwidth. Infinitely scalable, self-tuning (the less popular a file
becomes, the fewer people keep it around), and judicious in its use of
network resources, it should eventually overtake all other non-interactive
data transmission methodologies, and possibly some interactive ones.
I am willing to work with those implementing Vectorcast applications.