In order to protect both the privacy and anonymity of Internet communication against both eavesdropping and traffic analysis, Onion Routing employs a route selection algorithm to generate random routes for communicating endpoints. The goal of the route selection algorithm is to produce a route for a source to reach its destination, while keeping the length within a pre-specified minimum and maximum number of hops. Further, the algorithm must be non-deterministic, such that given the same input on subsequent invocations the algorithm should return different paths with an evenly distributed probability of repeating routes.

In order to analyze different route selection algorithms, we view the system as a graph, where all nodes and edges in the system are known. During initialization and at event-driven intervals, node and edge information propagates through the network allowing each node to keep track of the current state of the network by updating local adjacency list information. In addition to this information, each node distributes its access control policy, which specifies a node's willingness to be an exit point from the network. Currently, policies are specified via a white-list/black-list scheme. For example, a node may have the following white-list/black-list:

```
+ 1-1000:.itd.nrl.navy.mil, 20-30,80:.edu, 80:.com
- 1-1000:.foo.bar.com, 80:.onion-router.net
```

This access policy states that this node will act as an exit point
for any itd.nrl.navy.mil site, .edu site, or .com site wishing to
communicate on the specified ports. This node will NOT be an exit
point for any site ending in foo.bar.com or onion-router.net on the
specified ports. Any discrepancies or unspecified sites or ports is
handled by assuming it is in the black list. In the above example,
*23:.onion-router.net* is not specified and therefore is not
allowed. Currently, the grammar for specifying access control lists
is as follows:

```
lines: lines line
| line
line: "acl" BOOLEAN items
items: items ";" item
| item
item: port_question ":" address_list
port_question: port_list
| "*"
port_list: port_list "," INTEGER
| port_list "," INTEGER "-" INTEGER
| INTEGER "-" INTEGER
| INTEGER
address_list: address_list "," hostdom
| hostdom
hostdom: HOSTNAME
| DOMAIN
| ADDRESS
| "*"
INTEGER: [1-9][0-9]*
BOOLEAN: "true"
| "t"
| "yes"
| "false"
| "f"
| "no"
HOSTNAME: [a-zA-Z][^ \t\n]*
DOMAINNAME: \.[a-zA-Z][^ \t\n]*
| [0-9]+"."[0-9]+"."[0-9]+"."[0-9]+"/"[1-9][0-9]*
ADDRESS: [0-9]+"."[0-9]+"."[0-9]+"."[0-9]+
```

Thus, access control lists will be used to select an eligible destination point. Selecting a destination site is likely to be a recursive function that utilizes a hash table. For example, suppose a user wants to talk to itd.nrl.navy.mil, we first call the recursive function with the full destination: "itd.nrl.navy.mil." Using a hash function, we can locate the list of nodes in the system that will act as an exit point for "itd.nrl.navy.mil."

- If this list is empty, then we recurse with the substring ".nrl.navy.mil" to find any site that is willing to be an exit node for this site. If we fail to find any nodes, we recurse again with the substring ".navy.mil.", and then with ".mil." If all attempts fail and the string "*" is not included in the white list, then the request is denied.
- Otherwise, the list returned is not empty. At this point, we remove any possible exit nodes containing the destination string or substring as a black list entry. Then, we randomly select an exit point node (destination) from the remaining list of nodes which contain the destination string or substring as a white list entry. Using the results of computing the single-source shortest-path, we eliminate the randomly selected node if its shortest path exceeds the maximum number of hops specified by the source. We repeat this until we find an eligible node. If the list is empty before we find an eligible node, then we return to step 1.

In our initial discussions of access control lists, we have identified one possible vulnerability. Since the goal is to find a destination point with the longest matching canonical name, it is likely that there will be favoritism toward selecting certain exit points. Thus, it may be desirable to require that there are at least a certain number of eligible destination points from which one is randomly selected. Any ideas for determining this lower bound list requirement would be greatly appreciated. For our initial investigation of route selection algorithms, we have not employed the use of access control lists. Therefore, we assume that an eligible destination point has already been selected. We will address access control lists in the near future.

At this point the communicating source node knows adjacency list information of the entire system and the destination point for sending data. Using the adjacency list information, each of our potential route selection algorithms requires that the source node computes (and maintains) single-source shortest-path (the path from itself to every other node in the system). Currently, single-source-shortest-path is calculated using randomly assigned edge weights. We have opted to not use link capacities to weight graph edges as we are uncertain about the security ramifications of this.

The route selection algorithm will most likely be called with the
parameters: *src, dest, hardMin, softMin, hardMax and softMax*.
The *src* parameter is the node initiating communication and the
*dest* parameter is the destination selected as a suitable exit
point from the network. The parameters *hardMin* and
*hardMax* correspond to the absolute minimum and maximum number
of hops in the route from *src* to *dest*. Typically, we
will want a route that is at least 6 hops (hardMin), because fewer
than 6 hops yields a higher probability of determining the
communicating end points. Likewise, we will want to impose a limit on
the number of hops (hardMax), because an enormously long route hinders
performance. The *softMin* and *softMax* parameters
correspond to the "desired" bounds of the length of the route. Thus,
if the route generated is not within the bounds of *softMin* and
*softMax*, then it will only be discarded if it exceeds our
absolute bounds. The goal is to generate the "bulk" of routes within
the confines of the specified *softMin* and *softMax*, and
few will fall outside of these bounds. The specification of these
bounds can be viewed as a tradeoff between performance and security.
Note, the parameters of this algorithm may change and suggestions are
welcome. The goal is to make the route selection algorithm API
generic enough to allow different route selection algorithms to be
used without changing much of the code. For simplicity, the
pseudocode of the algorithms below will only utilize the parameters:
*src, dest, min,* and *max* (*i.e.,* the soft and hard
bounds are equal).

It is likely that the graph will be fairly dense (edge-rich), however, our simulations utilize five graphs with different topologies. This allows us to investigate the behavior of each route selection algorithm under other circumstances which may occur due to link or node failures in the network. Currently, we have two algorithms to generate random targeted routes. To compare these algorithms, we have five, undirected twenty-node graphs.

**Three-Connected:**Our first graph has many edges, though not a complete graph (clique). We refer to this graph as a three-connected graph, because three is the minimum number of edges that if removed, disconnects the graph. Thus, no node in the graph has a degree (number of incident edges) less than three.**Tree:**Our second graph is a tree, a connected, acyclic graph. This graph was selected to analyze each algorithm's behavior on a network with very few edges.**Bridge:**This graph contains a "central" bridge. A bridge is an edge that if removed, disconnects the graph. Both sides of this "central" bridge, contain regions with many edges as well as nodes that have a degree of one (leaf nodes).**Four-Connected:**The topology of this graph is likely to be the norm for network topologies. This graph contains two highly connected areas which are joined by a few edges.**Clique:**Our final graph is a clique, where every node in the graph is connected by an edge to every other node in the graph. This topology is also known as a complete graph.

**Intermediate Node Algorithm:** In our first algorithm, a node
is randomly selected that is neither the source nor the destination
node. Using the adjacency list information, we calculate the shortest
path from the source node to the randomly selected node and the
shortest path from the randomly selected node to the destination node.
If the total number of hops necessary to pass through this randomly
selected node is within our Max and Min boundaries, then we are done.
Otherwise, we continue to select other nodes in the graph to be the
random node in the "middle." Naturally, when this loop iterates the
random selection of the node would not select any previously selected
node that failed to meet the boundary limits. The pseudocode below is
probably easier to follow:

```
do {
X = randomly select a node in the graph;
/* where X != src, X != dest and X does not equal any
previously rejected nodes */
total number of hops = shortest-path from src to X
+ shortest-path from dest to X;
} while( Min > total number of hops || total number of hops > Max );
```

Our simulation executed the above algorithm 1000 times for each graph topology with randomly assigned edge weights which were regenerated (assignment of new random edge weights) every 200 runs. The frequency of edge weight regeneration was picked arbitrarily, but can be a configurable parameter if this algorithm is employed in Onion Routing.

Though this may seem odd, the results on the following pages did not specify a min or max. The reason we did not impose soft or hard bounds on the length of the route is because we want to see all routes returned by the algorithm before we use the bounds to eliminate certain routes.

**Random Walk Algorithm:** Our second algorithm combines random
walking of the graph with a shortest-path segment to keep the total
number of hops under the pre-specified maximum. Unlike the previous
algorithm, this algorithm requires that all-pairs shortest-paths is
computed (instead of single-source shortest-path). The algorithm is
as follows. A route length is randomly selected. In our algorithm
below, we selected a number, *Length*, between *Min* and
*Max*, where *Min* is equal to 6 and *Max* is equal to
twice the number of nodes in the graph (2*20). Then, a coin is tossed
to determine whether the random walk from the source or the random
walk from the destination takes one random hop to an adjacent node.
The coin may or may not be a fair coin. That is, it may be desirable
to have more random walking close to the source node or close to the
destination node. In either case the coin toss can easily reflect
this preference. In the algorithm below, we use a fair coin to select
which random walk proceeds. After an adjacent node is selected, the
total number of hops including this new node plus the shortest-path to
connect the two random walks is computed before the node is
incorporated into the route. The above steps continue until the route
length is reached or once the next random step exceeds the route
length. If the later occurs, then we simply back up one hop (do not
include the last random hop in the route). The pseudocode for this
algorithm is below (for simplicity, storing the actual route in a data
structure has not been included).

```
LengthFromSrc = 0;
LengthFromDest = 0;
TotalNumberHops = 0;
X = SRC; /*Last Node Visited from Random walk starting at SRC;*/
Y = DEST; /*Last Node Visited from Random walk starting at DEST;*/
/* Randomly select a route length */
do {
Length = rand( ) % Max;
while( Length < Min );
while( TotalNumberHops < Length ) {
Next = Toss Coin to Pick Random Walk from Src or from Dest;
if( Next == RandWalkFromSrc ) {
Z = Randomly select an adjacent node to X;
TotalNumberHops = 1 + LengthFromSrc + LengthFromDest
+ shortest-path from Z to Y;
if( TotalNumberHops > Length )
break;
X = Z; /*include the node in the route*/
Store X in the route data structure
LengthFromSrc++;
}
else { /* Next = RandWalkFromDest */
Z = Randomly select an adjacent node to Y;
TotalNumberHops = 1 + LengthFromSrc + LengthFromDest
+ shortest-path from Z to X;
if( TotalNumberHops > Length )
break;
Y = Z;
Store Y in the route data structure
LengthFromDest++;
}
}
```

The number of unique paths generated by this algorithm is approximately 13 times the length of routes generated by the Intermediate Node Algorithm. In many cases, the Random Walk Algorithm did not repeat any routes over 1000 executions for every source and destination pair. Due to the larger number of routes generated by this algorithm, we only displayed two communicating pairs for each graph.

We have a few noteworthy extensions and future experiments that we wish to investigate.

- Modify the Intermediate Node algorithm to allow an arbitrary number of intermediate nodes.
- Experiment with the frequency of regenerating edge weights.
- Execute the above algorithms with other network topologies
- Experiment with Min and Max boundaries

Historical page reflecting onion-router.net as of 2005, not regularly
maintained. Address questions to Paul Syverson.