Vectorcast: A Proposal Regarding the Efficient Distribution of Data On High Bandwidth Networks

January 12, 1998 Dan Kaminsky

Preface

This was my Systems Programming Term Project a while back. I wrote all the
conceptual stuff, and came up with the original idea, while my group and I
collaborated on the sample implementation. The implementation is…who
knows where. This is what I have. Comments to me at effugas@best.com.
Oh, most of this text was written in December of 1997, but some minor
updates have been done as of January 1998.

Introduction: Why Do We Network?

Think of a network, any network, in fact, every network. Ignore the how,
the when, the who, the whatever, just think of the why. Why do we seek to
connect one computer to another? It is not an inexpensive investment for
any institution, so how do we justify the substantial TCO (Total Cost of
Ownership) that computer networking incurs? We can derive our justification
by examining computers as an extension of those who utilize them: The
purpose of civilization is to uncouple existence from the means necessary to
sustain existence, i.e. I do not need to know how to farm in order to eat,
nor do I need to know how to fabricate a CPU in order to take advantage of
one. Computers are networked according to the same logic: Since computers
cannot self-fabricate all that is necessary for them to be most effective,
they must be able to go elsewhere for what they require. Thus, computer
networks are quite simply about getting information from point A to point B.

Unicast: The Old Standby

The major model utilized for file distribution on the global Internet, as
well as on most smaller networks, is the unicast model: One server sends a
requested file to the client who requested it. This process is repeated for
each additional client that requests the file. The more clients request it,
the more copies the original server must send. If an error is detected in
the transmission, such as lost or out-of-order packets, the server
automatically retransmits packets to compensate for that client’s error.
To use a more human analogy, it’s like being a teacher who uses one-on-one
sessions to educate students. This has the advantage that the teacher
can quickly deduce what concepts the student failed to grasp and teach
accordingly. While this model works for teaching a few students, it fails if
the teacher becomes responsible for hundreds of students simultaneously.

Multicast: Heir Apparent?

A newer model that is emerging is the multicast model. In the multicast
model the server sends out the data only once, and the network takes care of
sending the file to everybody who requires it. The multicast model
radically simplifies batch file transmission for the server but demands
substantial modifications in network infrastructure, as routers must be
reprogrammed to send the same packet to multiple locations. It also demands
a massive change in client behavior because the required information is no
longer available whenever the client desires it. Instead, the client must
wait until every other client is scheduled to receive. Furthermore,
individual error checking is much harder to implement because the server is
built to send the same message out to everyone. To extend our analogy, the
teacher, unable to handle the load of hundreds of students demanding
individual attention, requires that the students come to a classroom at a
specified hour and listen to the lecture. If the students aren’t there at
the specified time, they don’t get the knowledge they want. The problem is,
since the classroom is so large and the teacher wants to teach to the group
as a whole, a single student’s lack of understanding is much more difficult
to handle.

Reality: Neither Paradigm Suffices

Neither of these distribution methods effectively fulfills the design
requirement of getting needed information from point A to point B. The
unicast model makes the server a slave to the masses and the multicast model
makes the masses a slave to the server. There must be something better, and
I propose that there is: Vectorcast.

Vectorcast: An Analogy

For students, one of the most effective strategies for comprehending lessons
is asking a fellow student for assistance. If students waited to ask the
teacher every time they didn’t understand a concept, they would spend more
time waiting for the teacher to be free than learning. Therefore, the
student asks classmates a question, and one of them might know the answer.
The required information has thus been transferred to the student, not from
the original source, but from a fellow student.
This is the heart of Vectorcast.

Sharing The Load

Vectorcast is based on a simple idea: presuming there isn’t a
considerable barrier between “vectors”, a location that was once the
destination of information should automatically become a source. Suppose
you have two computers in a lab. If computer A spends an hour downloading
an application from Japan, and computer B then wants that same file, B
should be able to download it from computer A. It is not rational to
utilize intercontinental links when there is a computer on the local subnet
that has already retrieved the desired information. Vectorcast says that not
only should the computer seize what it needs from the closest possible
source, but also that once it obtains the data, it becomes a possible
source. Hence, if a file is really popular, by definition, it is available
on a large number of computers, and thus a lot of systems will be available
to send that file out to the remaining hordes that still haven’t gotten the
file.
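
The two-computer lab example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the original sample implementation: the `Vector` class, its `fetch` method, the subnet labels, and the crude same-subnet distance measure are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Vector:
    """Hypothetical machine that can both fetch and serve files."""
    name: str
    subnet: str
    files: dict = field(default_factory=dict)  # filename -> bytes

    def distance_to(self, other: "Vector") -> int:
        # Assumed stand-in for network distance: 0 on the same subnet, 1 otherwise.
        return 0 if self.subnet == other.subnet else 1

    def fetch(self, filename: str, candidates: list) -> "Vector":
        """Copy `filename` from the nearest candidate holding it; return that source."""
        holders = [v for v in candidates if v is not self and filename in v.files]
        if not holders:
            raise LookupError(f"no source holds {filename!r}")
        source = min(holders, key=self.distance_to)
        # Having received the file, this vector becomes a possible source itself.
        self.files[filename] = source.files[filename]
        return source


origin = Vector("origin", "remote", {"app.bin": b"payload"})
a = Vector("A", "lab")
b = Vector("B", "lab")
everyone = [origin, a, b]

first = a.fetch("app.bin", everyone)   # only the distant origin has the file
second = b.fetch("app.bin", everyone)  # A now holds a copy and is closer
print(first.name, second.name)         # prints "origin A"
```

Computer B never touches the distant link: once A has the file, A is the nearest holder, and after B's download there are two local sources for the next requester.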

Order From Chaos

Another issue for Vectorcast is how each computer knows which system has
which files. One computer could query every computer it knows and request a
listing, but if every machine did so, the resulting network traffic would
grow roughly with the square of the number of machines. Vectorcasting thus
depends on a director: a single machine that tells all the rest of the
machines where to get what they’re looking for. The machines are still going
to a single source to fulfill requests, but the director doesn’t repeatedly
distribute massive files. Rather, it tracks who has which huge files and
redirects the requesting vector to the closest available vector with the
requested data. It is a paradigm shift
from a model of the central powerful computer that provides information to
the weak ones to an egalitarian scenario where everybody serves everybody
and the only purpose of the “central” computer is to guide and list instead
of to actually distribute.
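
A director of this kind might look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the `Director` class, its `register` and `locate` methods, and the rule that "closest" means "same subnet first" are not taken from the original implementation. The small `listing_queries` helper just counts requests: with n machines, every machine polling every other takes n(n-1) listing requests, while asking a director takes one lookup per machine.

```python
from collections import defaultdict


class Director:
    """Hypothetical director: tracks who holds which file, never the files themselves."""

    def __init__(self):
        self._holders = defaultdict(set)  # filename -> {(vector_name, subnet)}

    def register(self, filename, vector_name, subnet):
        """A vector reports that it now holds a copy of `filename`."""
        self._holders[filename].add((vector_name, subnet))

    def locate(self, filename, requester_subnet):
        """Return the name of the closest known holder, or None if nobody has it."""
        holders = self._holders.get(filename)
        if not holders:
            return None
        # Assumed policy: same-subnet holders first, then by name for a stable tie-break.
        ranked = sorted(holders, key=lambda h: (h[1] != requester_subnet, h[0]))
        return ranked[0][0]


def listing_queries(n_machines):
    """Requests if every machine polls every other, versus one director lookup each."""
    return n_machines * (n_machines - 1), n_machines


director = Director()
director.register("app.bin", "origin", "remote")
before = director.locate("app.bin", "lab")  # only the remote origin holds it
director.register("app.bin", "A", "lab")    # A finished its download
after = director.locate("app.bin", "lab")   # the local copy now wins
print(before, after, listing_queries(100))  # prints "origin A (9900, 100)"
```

The director only ever exchanges names and locations, so its load depends on the number of requests rather than on the size of the files being moved.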

A major alternate purpose for the director is verification–how does the
client know that what it is receiving is authentic? Checksum archives hosted
on directors or trusted director-networks are the solution to this. While
this increases the load somewhat, it is far preferable to serve a 128-bit
checksum than an 8MB file. As a side bonus, the download of the 8MB
file can be tracked even though it’s being sent from an alternate location,
thus solving the cached hits quandary–the net (AOL in particular) cannot run
without caching, but optional reporting of cached hits inevitably creates
unlogged downloads.
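
As a rough illustration of the verification step, the sketch below checks a file received from a peer against a digest obtained from the director. The text does not name a checksum algorithm; MD5 is assumed here only because it produces a 128-bit digest, and it is no longer collision-resistant, so a present-day design would likely pick something stronger such as SHA-256. The `CHECKSUMS` archive and the `verify_download` function are hypothetical names.

```python
import hashlib
import hmac

# Hypothetical director-side archive: filename -> hex digest of the authentic file.
CHECKSUMS = {"app.bin": hashlib.md5(b"authentic payload").hexdigest()}


def verify_download(filename: str, data: bytes, archive: dict = CHECKSUMS) -> bool:
    """Return True only if `data` hashes to the digest the director lists."""
    expected = archive.get(filename)
    if expected is None:
        return False  # the director does not vouch for this file
    actual = hashlib.md5(data).hexdigest()
    # Constant-time comparison of the two hex strings.
    return hmac.compare_digest(actual, expected)


good = verify_download("app.bin", b"authentic payload")
bad = verify_download("app.bin", b"tampered payload")
print(good, bad)  # prints "True False"
```

Because the client has to ask the director for the digest anyway, that request doubles as the download record described above, whichever peer actually supplied the bytes.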

Conclusion

Vectorcast does seem to me to be the only solution to the growing demand for
bandwidth. Infinitely scalable, self-tuning (the less popular a file
becomes, the fewer people keep it around), and judicious in its use of
network resources, it should eventually overtake all other non-interactive
data transmission methodologies, and possibly some interactive ones.

I am willing to work with those implementing Vectorcast applications.
Email me.
