IPv6 Multihoming/PI using Tunnel Endpoint Lookup (pi-in-6)
Version 1.0
2005-10-17

Discussion:

The simplicity and flexibility of 6to4 (RFC 3056/3068) make it tempting to adapt it for multihoming/PI purposes in IPv6. The idea is that dynamic stateless tunnels could be used to isolate end site addresses from the Default Free Zone (DFZ). The limitations of the 6to4 method are the inability to embed full IPv6 addresses inside IPv6 addresses and the desire to have PI based addresses. To solve this we could use reverse DNS to look up tunnel endpoints and assign unique /48 prefixes out of special "unroutable" space.

Operation:

A unique /48 out of special "multihoming/PI" space is assigned to an end site. For discussion let's say this space is in 3000::/16. The corresponding section of 0.0.0.3.ip6.arpa is then delegated to the end site.

Outbound traffic from those addresses is not tunneled and needs no special handling unless the destination is also in 3000::/16. The site can use conventional BGP feeds, default routes, PPLB etc. for outbound path selection. No prefixes are inserted into the DFZ to facilitate this. The source address would be from within the site's /48 in 3000::/16.

Packets sent to 3000::/16 addresses would be encapsulated in another IPv6 packet (6in6) with a tunnel endpoint as the destination of the outer header. The tunnel endpoint is determined by a DNS lookup in 0.0.0.3.ip6.arpa for one or more "TEP" records. The TEP records are set and modified by the end site as desired. Tunnel endpoints would typically be addresses on the edge routers of the multihomed/PI site.

A device encapsulating a pi-in-6 packet should try to cache the TEP record(s) for some minimum time (say five minutes) or the DNS TTL, whichever is longer. Refresh of the TEP record(s) should be attempted before they expire from the cache so packet flow is normally uninterrupted. Attempting a refresh every 60 seconds might be reasonable.

Like 6to4, the encapsulation could theoretically occur anywhere on the Internet but will be most efficient when done closest to the node originating the packet, optimally at the sending node itself. 3000::/16 could be anycast (relay routers performing the encapsulation, as with the RFC 3068 6to4 relays) to facilitate migration until this is widely deployed. If the node doing the encapsulation has access to routing information it may use that information to pick which TEP to use, but it is not expected to have a specific route for the TEP address. Various methods of testing availability of a given TEP could be used, but it is ultimately the responsibility of the multihomed/PI end site to make the TEP records reflect current availability/preference.

When the packet reaches the tunnel endpoint (or optionally any router that has a /48 or longer route for the inner header destination address) it is extracted and forwarded as a regular packet to the destination.
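As a concrete sketch of the lookup step, the Python fragment below (using the dnspython library) derives the ip6.arpa name for an inner destination address and fetches its TEP records. No "TEP" DNS record type exists today, so the sketch assumes the records are published as TXT records whose data is a single tunnel endpoint address; lookup_teps() and pick_tep() are illustrative names, not part of this proposal.

    import random

    import dns.resolver      # dnspython
    import dns.reversename

    def lookup_teps(inner_dst):
        """Return (tunnel endpoint addresses, DNS TTL) for a pi-in-6 address."""
        # from_address("3000:0:A::1") yields the 32-nibble reverse name
        # under ip6.arpa that appears in the example below.
        qname = dns.reversename.from_address(inner_dst)
        answer = dns.resolver.resolve(qname, "TXT")   # stand-in for "TEP"
        teps = [rdata.strings[0].decode() for rdata in answer]
        return teps, answer.rrset.ttl

    def pick_tep(teps):
        # When more than one TEP is determined (or assumed) reachable,
        # choose among them at random.
        return random.choice(teps)

A site might cover every address in its /48 with a single wildcard entry in the delegated reverse zone, so that one record set answers for all of its hosts.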
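The caching and refresh behavior can be sketched the same way: hold each entry for the longer of five minutes or the DNS TTL, and re-resolve entries as they approach expiry so packet flow is not interrupted by lookups. Structure and names below are illustrative only.

    import time

    MIN_HOLD = 300          # five minutes, the suggested minimum cache time
    REFRESH_WINDOW = 60     # attempt a refresh this long before expiry

    class TepCache:
        def __init__(self):
            self._entries = {}              # inner address -> (teps, expiry)

        def put(self, key, teps, dns_ttl):
            # Cache for max(5 minutes, DNS TTL).
            hold = max(MIN_HOLD, dns_ttl)
            self._entries[key] = (teps, time.monotonic() + hold)

        def get(self, key):
            entry = self._entries.get(key)
            if entry and entry[1] > time.monotonic():
                return entry[0]
            return None                     # missing or expired: look up again

        def refresh_candidates(self):
            # Entries nearing expiry; a periodic task (say every 60 seconds)
            # would re-run lookup_teps() for each of these.
            now = time.monotonic()
            return [k for k, (_, exp) in self._entries.items()
                    if exp - now < REFRESH_WINDOW]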
Example:

Site A is assigned 3000:0:A::/48 and assigns 3000:0:A::1 to a node (node A). Site A also has two upstream ISPs and is assigned 2001:db8:1::/48 from one and 2001:db8:2::/48 from the other. On its site exit router it has loopback addresses of 2001:db8:1::1 and 2001:db8:2::1 assigned. Site B is assigned 2001:db8:B::/48 and assigns 2001:db8:B::1 to a node (node B).

For node A to send a packet to node B it simply creates a normal IPv6 packet with 3000:0:A::1 as the source and 2001:db8:B::1 as the destination. No special handling or tunneling is necessary. At node A's site exit router(s) the decision of which path to use is made the same way that those decisions are made in IPv4 today. A site exit router may have full BGP feeds from upstream ISPs though no prefixes are advertised to those ISPs.

Fig. 1  Packets to normal addresses need no special handling:

                  2001:db8:1::1
                  +--------+
   3000:0:A::1  +-| router |-+                    2001:db8:B::1
   +--------+   | +--------+ |                    +--------+
   | node A |---+            +---- Internet ------| node B |
   +--------+   | +--------+ |                 |  +--------+
                +-| router |-+                 |
                  +--------+                   |
                  2001:db8:2::1                |
                                               |
                              3000:0:A::1->2001:db8:B::1

For node B to send a packet to node A, it also creates a normal IPv6 packet with 2001:db8:B::1 as the source and 3000:0:A::1 as the destination. The routing/forwarding system on that node, or perhaps the next hop router from that node, sees that the destination is in 3000::/16 and encapsulates the packet inside another IPv6 packet. To determine the destination address (tunnel endpoint) of the outer header, the encapsulating node would perform a DNS lookup for "TEP" records at 1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.A.0.0.0.0.0.0.0.0.0.3.ip6.arpa, the reverse name of 3000:0:A::1 inside the zone delegated from 0.0.0.3.ip6.arpa. In this example TEP records of "2001:db8:1::1" and "2001:db8:2::1" are found. If both addresses are determined (or assumed) to be reachable, the destination address of the outer header is set to one of the two at random. We'll pick 2001:db8:1::1 for this example. Any TEP records found are cached by the encapsulating node to speed up future processing. The source address of the outer header is set to any valid globally unique unicast address on the node performing encapsulation. The encapsulated packet is then forwarded to 2001:db8:1::1 where the inner packet is extracted and forwarded to 3000:0:A::1.

Fig. 2  If encapsulation is done on the originating node:

                              2001:db8:B::1->3000:0:A::1
                                                       |
                                                       |
                                         2001:db8:1::1 |
                                          +--------+   |
 2001:db8:B::1                          +-| router |-+ | 3000:0:A::1
 +--------+                             | +--------+ | |  +--------+
 | node B |------------------ Internet -+            +---| node A |
 +--------+                             |            |   +--------+
                                        | +--------+ |
                                        +-| router |-+
                   |                      +--------+
                   |                      2001:db8:2::1
                   |
    2001:db8:B::1->2001:db8:1::1[2001:db8:B::1->3000:0:A::1]
                              or
    2001:db8:B::1->2001:db8:2::1[2001:db8:B::1->3000:0:A::1]

Fig. 3  If encapsulation is done on the next hop router:

  2001:db8:B::1->3000:0:A::1    2001:db8:B::1->3000:0:A::1
               |                                         |
               |                                         |
               |                           2001:db8:1::1 |
               |                            +--------+   |
 2001:db8:B::1 |  2001:db8:B::2           +-| router |-+ | 3000:0:A::1
 +--------+    |  +--------+              | +--------+ | | +--------+
 | node B |-------| router |--- Internet -+            +---| node A |
 +--------+       +--------+              |            |   +--------+
                                          | +--------+ |
                                          +-| router |-+
                                  |         +--------+
                                  |         2001:db8:2::1
                                  |
      2001:db8:B::1->2001:db8:1::1[2001:db8:B::1->3000:0:A::1]
                               or
      2001:db8:B::1->2001:db8:2::1[2001:db8:B::1->3000:0:A::1]
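The decision logic of the example condenses to a few lines. The sketch below reuses the hypothetical lookup_teps()/pick_tep() helpers from the Operation section; Packet6 is a stand-in for a real IPv6 header rather than anything this proposal defines.

    from dataclasses import dataclass
    from ipaddress import IPv6Address, IPv6Network

    PI_SPACE = IPv6Network("3000::/16")   # the illustrative multihoming/PI space

    @dataclass
    class Packet6:
        src: IPv6Address
        dst: IPv6Address
        payload: object = None            # inner packet or upper-layer data

    def outbound(pkt, encap_src):
        """Encapsulate only when the destination is in pi-in-6 space."""
        if pkt.dst not in PI_SPACE:
            return pkt                    # normal address: no special handling
        teps, _ttl = lookup_teps(str(pkt.dst))
        outer_dst = IPv6Address(pick_tep(teps))
        # The outer source is any valid globally unique unicast address on
        # the encapsulating node; the original packet rides inside (6in6).
        return Packet6(src=encap_src, dst=outer_dst, payload=pkt)

    def at_tunnel_endpoint(pkt):
        # The TEP (or any router with a /48 or longer route for the inner
        # destination) extracts the inner packet and forwards it normally.
        return pkt.payload

Running node B's side of Fig. 2 through outbound() with inner source 2001:db8:B::1 and inner destination 3000:0:A::1 yields an outer header of 2001:db8:B::1->2001:db8:1::1 or 2001:db8:B::1->2001:db8:2::1, matching the two encapsulated forms shown above.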
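On the wire, the encapsulation is simply an outer IPv6 header carrying Next Header 41 (IPv6-in-IPv6) in front of the unmodified original packet. A scapy fragment for the first alternative in Fig. 2, with an assumed ICMPv6 echo request as the inner payload:

    from scapy.layers.inet6 import IPv6, ICMPv6EchoRequest

    inner = IPv6(src="2001:db8:b::1", dst="3000:0:a::1") / ICMPv6EchoRequest()
    outer = IPv6(src="2001:db8:b::1", dst="2001:db8:1::1", nh=41) / inner
    # outer is 2001:db8:B::1->2001:db8:1::1[2001:db8:B::1->3000:0:A::1]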
Problems:

Obviously there are concerns about these supposedly "unroutable" prefixes leaking into the DFZ. Since nobody has control over what prefixes others accept or filter, at least this method would allow multihoming to keep working when those routes are (hopefully) filtered.

Then there is the issue of the delay in looking up the tunnel endpoint and the resources consumed in looking up and caching the results. For typical end user workstations this should not be a significant problem. Servers communicating with large numbers of pi-in-6 clients may need specialized hardware to optimize the lookup and encapsulation, but it is not unusual for such operations to already involve specialized hardware (load balancers, caches, etc.) between the servers and clients. At least traffic from pi-in-6 addresses to non-pi-in-6 addresses can be sent natively without any lookup or tunneling.

Consideration should also be given to the impact this would have on the global DNS system. The suggestion of widely anycasting 3000::/16 is intended only as a transitional step. The goal should be for any node that generates IPv6 packets to support lookup and encapsulation, or at least to have them occur at the first hop.

Kevin Loch
kloch@hotnic.net