Investigation of Route Selection Algorithms

In order to protect both the privacy and anonymity of Internet communication against both eavesdropping and traffic analysis, Onion Routing employs a route selection algorithm to generate random routes for communicating endpoints. The goal of the route selection algorithm is to produce a route for a source to reach its destination, while keeping the length within a pre-specified minimum and maximum number of hops. Further, the algorithm must be non-deterministic, such that given the same input on subsequent invocations the algorithm should return different paths with an evenly distributed probability of repeating routes.

In order to analyze different route selection algorithms, we view the system as a graph, where all nodes and edges in the system are known. During initialization and at event-driven intervals, node and edge information propagates through the network allowing each node to keep track of the current state of the network by updating local adjacency list information. In addition to this information, each node distributes its access control policy, which specifies a node's willingness to be an exit point from the network. Currently, policies are specified via a white-list/black-list scheme. For example, a node may have the following white-list/black-list:


	+ 1-1000:.itd.nrl.navy.mil, 20-30,80:.edu, 80:.com
	- 1-1000:.foo.bar.com, 80:.onion-router.net

This access policy states that this node will act as an exit point for any itd.nrl.navy.mil site, .edu site, or .com site wishing to communicate on the specified ports. This node will NOT be an exit point for any site ending in foo.bar.com or onion-router.net on the specified ports. Any discrepancies or unspecified sites or ports is handled by assuming it is in the black list. In the above example, 23:.onion-router.net is not specified and therefore is not allowed. Currently, the grammar for specifying access control lists is as follows:


     lines:  lines line
           | line

     line:  "acl" BOOLEAN items
     
     items:  items ";" item
           | item

     item:  port_question ":" address_list

     port_question:  port_list
                   | "*"

     port_list:  port_list "," INTEGER
	       | port_list "," INTEGER "-" INTEGER
	       | INTEGER "-" INTEGER
               | INTEGER

     address_list:  address_list "," hostdom
                  | hostdom

     hostdom:  HOSTNAME
             | DOMAIN
	     | ADDRESS
	     | "*"

     INTEGER:  [1-9][0-9]*

     BOOLEAN:  "true"
             | "t"
	     | "yes"
	     | "false"
	     | "f"
	     | "no"     

     HOSTNAME: [a-zA-Z][^ \t\n]*

     DOMAINNAME:  \.[a-zA-Z][^ \t\n]*
            | [0-9]+"."[0-9]+"."[0-9]+"."[0-9]+"/"[1-9][0-9]*

     ADDRESS:  [0-9]+"."[0-9]+"."[0-9]+"."[0-9]+

Thus, access control lists will be used to select an eligible destination point. Selecting a destination site is likely to be a recursive function that utilizes a hash table. For example, suppose a user wants to talk to itd.nrl.navy.mil, we first call the recursive function with the full destination: "itd.nrl.navy.mil." Using a hash function, we can locate the list of nodes in the system that will act as an exit point for "itd.nrl.navy.mil."

If this list is empty, then we recurse with the substring ".nrl.navy.mil" to find any site that is willing to be an exit node for this site. If we fail to find any nodes, we recurse again with the substring ".navy.mil.", and then with ".mil." If all attempts fail and the string "*" is not included in the white list, then the request is denied.
Otherwise, the list returned is not empty. At this point, we remove any possible exit nodes containing the destination string or substring as a black list entry. Then, we randomly select an exit point node (destination) from the remaining list of nodes which contain the destination string or substring as a white list entry. Using the results of computing the single-source shortest-path, we eliminate the randomly selected node if its shortest path exceeds the maximum number of hops specified by the source. We repeat this until we find an eligible node. If the list is empty before we find an eligible node, then we return to step 1.

In our initial discussions of access control lists, we have identified one possible vulnerability. Since the goal is to find a destination point with the longest matching canonical name, it is likely that there will be favoritism toward selecting certain exit points. Thus, it may be desirable to require that there are at least a certain number of eligible destination points from which one is randomly selected. Any ideas for determining this lower bound list requirement would be greatly appreciated. For our initial investigation of route selection algorithms, we have not employed the use of access control lists. Therefore, we assume that an eligible destination point has already been selected. We will address access control lists in the near future.

At this point the communicating source node knows adjacency list information of the entire system and the destination point for sending data. Using the adjacency list information, each of our potential route selection algorithms requires that the source node computes (and maintains) single-source shortest-path (the path from itself to every other node in the system). Currently, single-source-shortest-path is calculated using randomly assigned edge weights. We have opted to not use link capacities to weight graph edges as we are uncertain about the security ramifications of this.

The route selection algorithm will most likely be called with the parameters: src, dest, hardMin, softMin, hardMax and softMax. The src parameter is the node initiating communication and the dest parameter is the destination selected as a suitable exit point from the network. The parameters hardMin and hardMax correspond to the absolute minimum and maximum number of hops in the route from src to dest. Typically, we will want a route that is at least 6 hops (hardMin), because fewer than 6 hops yields a higher probability of determining the communicating end points. Likewise, we will want to impose a limit on the number of hops (hardMax), because an enormously long route hinders performance. The softMin and softMax parameters correspond to the "desired" bounds of the length of the route. Thus, if the route generated is not within the bounds of softMin and softMax, then it will only be discarded if it exceeds our absolute bounds. The goal is to generate the "bulk" of routes within the confines of the specified softMin and softMax, and few will fall outside of these bounds. The specification of these bounds can be viewed as a tradeoff between performance and security. Note, the parameters of this algorithm may change and suggestions are welcome. The goal is to make the route selection algorithm API generic enough to allow different route selection algorithms to be used without changing much of the code. For simplicity, the pseudocode of the algorithms below will only utilize the parameters: src, dest, min, and max (i.e., the soft and hard bounds are equal).

It is likely that the graph will be fairly dense (edge-rich), however, our simulations utilize five graphs with different topologies. This allows us to investigate the behavior of each route selection algorithm under other circumstances which may occur due to link or node failures in the network. Currently, we have two algorithms to generate random targeted routes. To compare these algorithms, we have five, undirected twenty-node graphs.

Three-Connected: Our first graph has many edges, though not a complete graph (clique). We refer to this graph as a three-connected graph, because three is the minimum number of edges that if removed, disconnects the graph. Thus, no node in the graph has a degree (number of incident edges) less than three.
Tree: Our second graph is a tree, a connected, acyclic graph. This graph was selected to analyze each algorithm's behavior on a network with very few edges.
Bridge: This graph contains a "central" bridge. A bridge is an edge that if removed, disconnects the graph. Both sides of this "central" bridge, contain regions with many edges as well as nodes that have a degree of one (leaf nodes).
Four-Connected: The topology of this graph is likely to be the norm for network topologies. This graph contains two highly connected areas which are joined by a few edges.
Clique: Our final graph is a clique, where every node in the graph is connected by an edge to every other node in the graph. This topology is also known as a complete graph.

Intermediate Node Algorithm: In our first algorithm, a node is randomly selected that is neither the source nor the destination node. Using the adjacency list information, we calculate the shortest path from the source node to the randomly selected node and the shortest path from the randomly selected node to the destination node. If the total number of hops necessary to pass through this randomly selected node is within our Max and Min boundaries, then we are done. Otherwise, we continue to select other nodes in the graph to be the random node in the "middle." Naturally, when this loop iterates the random selection of the node would not select any previously selected node that failed to meet the boundary limits. The pseudocode below is probably easier to follow:


     do {
	X = randomly select a node in the graph; 
	    /* where X != src, X != dest and X does not equal any 
	       previously rejected nodes */
	total number of hops = shortest-path from src to X
       			     + shortest-path from dest to X;
     } while( Min > total number of hops || total number of hops > Max );

Our simulation executed the above algorithm 1000 times for each graph topology with randomly assigned edge weights which were regenerated (assignment of new random edge weights) every 200 runs. The frequency of edge weight regeneration was picked arbitrarily, but can be a configurable parameter if this algorithm is employed in Onion Routing.

Though this may seem odd, the results on the following pages did not specify a min or max. The reason we did not impose soft or hard bounds on the length of the route is because we want to see all routes returned by the algorithm before we use the bounds to eliminate certain routes.

Random Walk Algorithm: Our second algorithm combines random walking of the graph with a shortest-path segment to keep the total number of hops under the pre-specified maximum. Unlike the previous algorithm, this algorithm requires that all-pairs shortest-paths is computed (instead of single-source shortest-path). The algorithm is as follows. A route length is randomly selected. In our algorithm below, we selected a number, Length, between Min and Max, where Min is equal to 6 and Max is equal to twice the number of nodes in the graph (2*20). Then, a coin is tossed to determine whether the random walk from the source or the random walk from the destination takes one random hop to an adjacent node. The coin may or may not be a fair coin. That is, it may be desirable to have more random walking close to the source node or close to the destination node. In either case the coin toss can easily reflect this preference. In the algorithm below, we use a fair coin to select which random walk proceeds. After an adjacent node is selected, the total number of hops including this new node plus the shortest-path to connect the two random walks is computed before the node is incorporated into the route. The above steps continue until the route length is reached or once the next random step exceeds the route length. If the later occurs, then we simply back up one hop (do not include the last random hop in the route). The pseudocode for this algorithm is below (for simplicity, storing the actual route in a data structure has not been included).


     LengthFromSrc   = 0;
     LengthFromDest  = 0;
     TotalNumberHops = 0;

     X = SRC;  /*Last Node Visited from Random walk starting at SRC;*/
     Y = DEST; /*Last Node Visited from Random walk starting at DEST;*/

     /* Randomly select a route length */
     do {
        Length = rand( ) % Max;
     while( Length < Min );


     while( TotalNumberHops < Length ) {           

          Next = Toss Coin to Pick Random Walk from Src or from Dest;
	  
	  if( Next == RandWalkFromSrc ) {
	  	  Z = Randomly select an adjacent node to X;
		  TotalNumberHops = 1 + LengthFromSrc + LengthFromDest
	                            + shortest-path from Z to Y; 
		  if( TotalNumberHops > Length )  
		      break;
		  X = Z;            /*include the node in the route*/
                  Store X in the route data structure 
	          LengthFromSrc++;
	  }
	  else {  /* Next = RandWalkFromDest */
	  	  Z = Randomly select an adjacent node to Y;
		  TotalNumberHops = 1 + LengthFromSrc + LengthFromDest
	                            + shortest-path from Z to X; 
		  if( TotalNumberHops > Length )  
		      break;
		  Y = Z;            
                  Store Y in the route data structure 
	          LengthFromDest++;
	  }
     }

The number of unique paths generated by this algorithm is approximately 13 times the length of routes generated by the Intermediate Node Algorithm. In many cases, the Random Walk Algorithm did not repeat any routes over 1000 executions for every source and destination pair. Due to the larger number of routes generated by this algorithm, we only displayed two communicating pairs for each graph.

We have a few noteworthy extensions and future experiments that we wish to investigate.

Modify the Intermediate Node algorithm to allow an arbitrary number of intermediate nodes.
Experiment with the frequency of regenerating edge weights.
Execute the above algorithms with other network topologies
Experiment with Min and Max boundaries

Historical page reflecting onion-router.net as of 2005, not regularly maintained.