File: embedaddon/bird/doc/prog-5.html, revision 1.1.1.2 (vendor branch)
Wed Mar 17 19:50:23 2021 UTC (3 years, 9 months ago) by misho
Branches: bird, MAIN
CVS tags: v1_6_8p3, HEAD
bird 1.6.8

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
 <META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 1.0.9">
 <TITLE>BIRD Programmer's Documentation: Protocols</TITLE>
 <LINK HREF="prog-6.html" REL=next>
 <LINK HREF="prog-4.html" REL=previous>
 <LINK HREF="prog.html#toc5" REL=contents>
</HEAD>
<BODY>
<A HREF="prog-6.html">Next</A>
<A HREF="prog-4.html">Previous</A>
<A HREF="prog.html#toc5">Contents</A>
<HR>
<H2><A NAME="s5">5.</A> <A HREF="prog.html#toc5">Protocols</A></H2>

<H2><A NAME="ss5.1">5.1</A> <A HREF="prog.html#toc5.1">The Babel protocol</A>
</H2>

<P>
<P>Babel (RFC6126) is a loop-avoiding distance-vector routing protocol that is
robust and efficient both in ordinary wired networks and in wireless mesh
networks.
<P>The Babel protocol keeps state for each neighbour in a <I>babel_neighbor</I>
struct, tracking received Hello and I Heard You (IHU) messages. A
<I>babel_interface</I> struct keeps hello and update times for each interface, and
a separate hello seqno is maintained for each interface.
<P>For each prefix, Babel keeps track of both the possible routes (with next hop
and router IDs), as well as the feasibility distance for each prefix and
router id. The prefix itself is tracked in a <I>babel_entry</I> struct, while the
possible routes for the prefix are tracked as <I>babel_route</I> entries and the
feasibility distance is maintained through <I>babel_source</I> structures.
<P>The main route selection is done in <B>babel_select_route()</B>. This is called when
an entry is updated by receiving updates from the network or when modified by
internal timers. It performs feasibility checks on the available routes for
the prefix and selects the one with the lowest metric to be announced to the
core.
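<P>The selection rule above can be sketched as follows. This is a minimal illustration, not the code in <CODE>babel.c</CODE>: the structures <CODE>source</CODE> and <CODE>route</CODE> and the functions <CODE>seqno_newer()</CODE>, <CODE>route_feasible()</CODE> and <CODE>select_route()</CODE> are hypothetical, simplified stand-ins, and the feasibility condition is the one given in RFC 6126.

```c
#include <stddef.h>
#include <stdint.h>

#define METRIC_INFINITY 0xffff   /* Babel's "unreachable" metric */

/* Hypothetical, simplified stand-ins for babel_source and babel_route. */
struct source { uint16_t seqno; uint16_t metric; };   /* feasibility distance */
struct route  { uint16_t seqno; uint16_t metric; const struct source *src; };

/* Sequence numbers wrap around, so they are compared modulo 2^16. */
static int seqno_newer(uint16_t a, uint16_t b)
{
  uint16_t diff = (uint16_t)(a - b);
  return diff != 0 && diff < 0x8000;
}

/* Feasibility condition: retractions and routes without a recorded
 * feasibility distance are always feasible; otherwise the route must be
 * strictly better than the recorded (seqno, metric) pair. */
static int route_feasible(const struct route *r)
{
  if (r->metric == METRIC_INFINITY || r->src == NULL)
    return 1;
  return seqno_newer(r->seqno, r->src->seqno) ||
         (r->seqno == r->src->seqno && r->metric < r->src->metric);
}

/* Return the feasible, reachable route with the lowest metric, or NULL. */
static const struct route *select_route(const struct route *routes, size_t n)
{
  const struct route *best = NULL;
  for (size_t i = 0; i < n; i++)
  {
    const struct route *r = &routes[i];
    if (r->metric == METRIC_INFINITY || !route_feasible(r))
      continue;
    if (best == NULL || r->metric < best->metric)
      best = r;
  }
  return best;
}
```

<P>A NULL result corresponds to the case where no feasible route exists and the entry has to be retracted or marked unreachable.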
<P>
<P><HR><H3>Function</H3>
<P><I>void</I>
<B>babel_announce_rte</B>
(<I>struct babel_proto *</I> <B>p</B>, <I>struct babel_entry *</I> <B>e</B>) --     announce selected route to the core
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct babel_proto *</I> <B>p</B><DD><P>Babel protocol instance
<DT><I>struct babel_entry *</I> <B>e</B><DD><P>Babel route entry to announce
</DL>
<H3>Description</H3>
<P>This function announces a Babel entry to the core if it has a selected
incoming path, and retracts it otherwise. If the selected entry has infinite
metric, the route is announced as unreachable.


<HR><H3>Function</H3>
<P><I>void</I>
<B>babel_select_route</B>
(<I>struct babel_entry *</I> <B>e</B>) --     select best route for given route entry
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct babel_entry *</I> <B>e</B><DD><P>Babel entry to select the best route for
</DL>
<H3>Description</H3>
<P>Select the best feasible route for a given prefix among the routes received
from peers, and propagate it to the nest. This just selects the feasible
route with the lowest metric.
<P>If no feasible route is available for a prefix that previously had a route
selected, a seqno request is sent to try to get a valid route. In the
meantime, the route is marked as infeasible in the nest (to blackhole packets
going to it, as per the RFC).
<P>If no feasible route is available, and no previous route is selected, the
route is removed from the nest entirely.


<HR><H3>Function</H3>
<P><I>void</I>
<B>babel_send_update</B>
(<I>struct babel_iface *</I> <B>ifa</B>, <I>bird_clock_t</I> <B>changed</B>) --     send route table updates
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct babel_iface *</I> <B>ifa</B><DD><P>Interface to transmit on
<DT><I>bird_clock_t</I> <B>changed</B><DD><P>Only send entries changed since this time
</DL>
<H3>Description</H3>
<P>This function produces update TLVs for all entries changed since the time
indicated by the <I>changed</I> parameter and queues them for transmission on the
selected interface. During the process, the feasibility distance for each
transmitted entry is updated.


<HR><H3>Function</H3>
<P><I>void</I>
<B>babel_handle_update</B>
(<I>union babel_msg *</I> <B>m</B>, <I>struct babel_iface *</I> <B>ifa</B>) --     handle incoming route updates
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>union babel_msg *</I> <B>m</B><DD><P>Incoming update TLV
<DT><I>struct babel_iface *</I> <B>ifa</B><DD><P>Interface the update was received on
</DL>
<H3>Description</H3>
<P>This function is called as a handler for update TLVs and handles the updating
and maintenance of route entries in Babel's internal routing cache. The
handling follows the actions described in the Babel RFC, and at the end of
each update handling, <B>babel_select_route()</B> is called on the affected entry to
optionally update the selected routes and propagate them to the core.


<HR><H3>Function</H3>
<P><I>void</I>
<B>babel_iface_timer</B>
(<I>timer *</I> <B>t</B>) --     Babel interface timer handler
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>timer *</I> <B>t</B><DD><P>Timer
</DL>
<H3>Description</H3>
<P>This function is called by the per-interface timer and triggers sending of
periodic Hellos and both triggered and periodic updates. Periodic Hellos
and updates are simply handled by setting the next_{hello,regular} variables
on the interface, and triggering an update (and resetting the variable)
whenever 'now' exceeds that value.
<P>For triggered updates, <B>babel_trigger_iface_update()</B> will set the
want_triggered field on the interface to a timestamp value. If this is set
(and the next_triggered time has passed; this is a rate limiting mechanism),
<B>babel_send_update()</B> will be called with this timestamp as the second
parameter. This causes an update to be sent that consists only of the routes that
have changed since the time saved in want_triggered.
<P>In most cases, when an update is triggered, the timestamp of the modified route is set to
the value of 'now' at the time of the trigger; the &gt;= comparison for
selecting which routes to send in the update will make sure this is included.
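<P>The timer logic described above might look roughly like the sketch below. The structure <CODE>iface_timers</CODE>, the interval fields, the action flags and the functions <CODE>trigger_update()</CODE> and <CODE>iface_tick()</CODE> are hypothetical names chosen for illustration. The sketch also assumes that a regular (full) update cancels a pending triggered one, and that 0 is never a valid timestamp.

```c
enum { ACT_HELLO = 1, ACT_REGULAR = 2, ACT_TRIGGERED = 4 };

/* Hypothetical per-interface timer state, times in seconds. */
struct iface_timers {
  long next_hello, next_regular;   /* deadlines for periodic Hello / full update */
  long next_triggered;             /* earliest time a triggered update may go out */
  long want_triggered;             /* 0, or "send entries changed since this time" */
  long hello_interval, update_interval, triggered_holdoff;
};

/* Request a triggered update; the oldest pending timestamp is kept so
 * that no change made since then is missed. */
static void trigger_update(struct iface_timers *t, long now)
{
  if (t->want_triggered == 0)
    t->want_triggered = now;
}

/* Timer hook: returns a bitmask of actions to perform. For a triggered
 * update, *changed receives the cut-off time to pass on to the sender. */
static int iface_tick(struct iface_timers *t, long now, long *changed)
{
  int actions = 0;

  if (now >= t->next_hello)
  {
    actions |= ACT_HELLO;
    t->next_hello += t->hello_interval;
  }

  if (now >= t->next_regular)
  {
    actions |= ACT_REGULAR;
    t->next_regular += t->update_interval;
    t->want_triggered = 0;           /* assumption: a full update covers it */
  }
  else if (t->want_triggered && now >= t->next_triggered)
  {
    actions |= ACT_TRIGGERED;
    *changed = t->want_triggered;
    t->want_triggered = 0;
    t->next_triggered = now + t->triggered_holdoff;   /* rate limiting */
  }

  return actions;
}
```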


<HR><H3>Function</H3>
<P><I>void</I>
<B>babel_timer</B>
(<I>timer *</I> <B>t</B>) --     global timer hook
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>timer *</I> <B>t</B><DD><P>Timer
</DL>
<H3>Description</H3>
<P>This function is called by the global protocol instance timer and handles
expiration of routes and neighbours as well as pruning of the seqno request
cache.


<HR><H3>Function</H3>
<P><I>uint</I>
<B>babel_write_queue</B>
(<I>struct babel_iface *</I> <B>ifa</B>, <I>list *</I> <B>queue</B>) --  Write a TLV queue to a transmission buffer
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct babel_iface *</I> <B>ifa</B><DD><P>Interface holding the transmission buffer
<DT><I>list *</I> <B>queue</B><DD><P>TLV queue to write (containing internal-format TLVs)
</DL>
<H3>Description</H3>
<P>This function writes a packet to the interface transmission buffer with as
many TLVs from the <I>queue</I> as will fit in the buffer. It returns the number of
bytes written (NOT counting the packet header). The function is called by
<B>babel_send_queue()</B> and <B>babel_send_unicast()</B> to construct packets for
transmission, and uses per-TLV helper functions to convert the
internal-format TLVs to their wire representations.
<P>The TLVs in the queue are freed after they are written to the buffer.
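<P>A minimal sketch of this "write as many TLVs as fit" pattern is shown below. The types <CODE>tlv_node</CODE> and <CODE>tlv_queue</CODE> and the functions <CODE>queue_init()</CODE>, <CODE>queue_push()</CODE> and <CODE>write_queue()</CODE> are hypothetical. The real code converts each internal TLV with a per-type helper, whereas this sketch copies an opaque body. The 4-byte packet header (magic 42, version 2, body length) follows RFC 6126.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PKT_HEADER_LEN 4   /* magic, version, 16-bit body length */

/* Hypothetical internal-format TLV: a type plus an opaque body. */
struct tlv_node {
  struct tlv_node *next;
  uint8_t type;
  uint8_t len;             /* body length */
  uint8_t body[32];
};

struct tlv_queue { struct tlv_node *head, **tail; };

static void queue_init(struct tlv_queue *q)
{
  q->head = NULL;
  q->tail = &q->head;
}

static int queue_push(struct tlv_queue *q, uint8_t type, const void *body, uint8_t len)
{
  struct tlv_node *n;
  if (len > sizeof(n->body) || !(n = malloc(sizeof(*n))))
    return -1;
  n->next = NULL;
  n->type = type;
  n->len = len;
  memcpy(n->body, body, len);
  *q->tail = n;
  q->tail = &n->next;
  return 0;
}

/* Fill buf with a packet header and as many queued TLVs as fit.
 * Returns the number of TLV bytes written, not counting the header.
 * TLVs that were written are removed from the queue and freed. */
static unsigned write_queue(struct tlv_queue *q, uint8_t *buf, unsigned buflen)
{
  unsigned pos = PKT_HEADER_LEN;
  if (buflen < PKT_HEADER_LEN)
    return 0;

  while (q->head)
  {
    struct tlv_node *n = q->head;
    unsigned need = 2u + n->len;        /* type + length + body */
    if (pos + need > buflen)
      break;                            /* does not fit, keep it queued */
    buf[pos] = n->type;
    buf[pos + 1] = n->len;
    memcpy(buf + pos + 2, n->body, n->len);
    pos += need;
    q->head = n->next;
    if (!q->head)
      q->tail = &q->head;
    free(n);
  }

  unsigned body = pos - PKT_HEADER_LEN;
  buf[0] = 42;                          /* magic */
  buf[1] = 2;                           /* version */
  buf[2] = (uint8_t)(body >> 8);
  buf[3] = (uint8_t)(body & 0xff);
  return body;
}
```

<P>A caller would typically repeat the call until the queue is empty, sending one packet per call.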


<HR><H3>Function</H3>
<P><I>void</I>
<B>babel_send_unicast</B>
(<I>union babel_msg *</I> <B>msg</B>, <I>struct babel_iface *</I> <B>ifa</B>, <I>ip_addr</I> <B>dest</B>) --  send a single TLV via unicast to a destination
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>union babel_msg *</I> <B>msg</B><DD><P>TLV to send
<DT><I>struct babel_iface *</I> <B>ifa</B><DD><P>Interface to send via
<DT><I>ip_addr</I> <B>dest</B><DD><P>Destination of the TLV
</DL>
<H3>Description</H3>
<P>This function is used to send a single TLV via unicast to a designated
receiver. This is used for replying to certain incoming requests, and for
sending unicast requests to refresh routes before they expire.


<HR><H3>Function</H3>
<P><I>void</I>
<B>babel_enqueue</B>
(<I>union babel_msg *</I> <B>msg</B>, <I>struct babel_iface *</I> <B>ifa</B>) --  enqueue a TLV for transmission on an interface
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>union babel_msg *</I> <B>msg</B><DD><P>TLV to enqueue (in internal TLV format)
<DT><I>struct babel_iface *</I> <B>ifa</B><DD><P>Interface to enqueue to
</DL>
<H3>Description</H3>
<P>This function is called to enqueue a TLV for subsequent transmission on an
interface. The transmission event is triggered whenever a TLV is enqueued;
this ensures that TLVs will be transmitted in a timely manner, but that TLVs
which are enqueued in rapid succession can be transmitted together in one
packet.


<HR><H3>Function</H3>
<P><I>void</I>
<B>babel_process_packet</B>
(<I>struct babel_pkt_header *</I> <B>pkt</B>, <I>int</I> <B>len</B>, <I>ip_addr</I> <B>saddr</B>, <I>struct babel_iface *</I> <B>ifa</B>) --  process incoming data packet
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct babel_pkt_header *</I> <B>pkt</B><DD><P>Pointer to the packet data
<DT><I>int</I> <B>len</B><DD><P>Length of received packet
<DT><I>ip_addr</I> <B>saddr</B><DD><P>Address of packet sender
<DT><I>struct babel_iface *</I> <B>ifa</B><DD><P>Interface packet was received on.
</DL>
<H3>Description</H3>
<P>This function is the main processing hook of incoming Babel packets. It
checks that the packet header is well-formed, then processes the TLVs
contained in the packet. This is done in two passes: First all TLVs are
parsed into the internal TLV format. If a TLV parser fails, processing of the
rest of the packet is aborted.
<P>After the parsing step, the TLV handlers are called for each parsed TLV in
order.
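<P>The two-pass structure can be illustrated with the following sketch. The names <CODE>parsed_tlv</CODE>, <CODE>parse_tlvs()</CODE>, <CODE>handle_tlvs()</CODE> and <CODE>sum_body_len()</CODE> are hypothetical. The generic TLV layout (one type byte, one length byte, then the body, with Pad1 being a single zero byte) follows RFC 6126. The sketch assumes that TLVs parsed before a failure are still passed to the handlers.

```c
#include <stddef.h>
#include <stdint.h>

#define TLV_PAD1 0    /* single-byte TLV without a length field */

struct parsed_tlv { uint8_t type; uint8_t len; const uint8_t *body; };

/* Pass 1: split the packet body into TLVs. Parsing stops at the first
 * malformed TLV; the number of TLVs parsed so far is returned. */
static size_t parse_tlvs(const uint8_t *p, size_t len,
                         struct parsed_tlv *out, size_t max)
{
  size_t pos = 0, n = 0;

  while (pos < len && n < max)
  {
    if (p[pos] == TLV_PAD1)
    {
      pos++;
      continue;
    }
    if (pos + 2 > len)
      break;                               /* truncated TLV header */
    size_t blen = p[pos + 1];
    if (pos + 2 + blen > len)
      break;                               /* body runs past the packet */
    out[n].type = p[pos];
    out[n].len = (uint8_t)blen;
    out[n].body = p + pos + 2;
    n++;
    pos += 2 + blen;
  }
  return n;
}

typedef void (*tlv_handler)(const struct parsed_tlv *t, void *ctx);

/* Pass 2: call the handler for each parsed TLV, in order. */
static void handle_tlvs(const struct parsed_tlv *t, size_t n,
                        tlv_handler h, void *ctx)
{
  for (size_t i = 0; i < n; i++)
    h(&t[i], ctx);
}

/* Example handler: adds up the body lengths into an unsigned counter. */
static void sum_body_len(const struct parsed_tlv *t, void *ctx)
{
  *(unsigned *)ctx += t->len;
}
```

<P>Separating parsing from handling means that no handler ever sees a TLV whose bounds have not been checked.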

<H2><A NAME="ss5.2">5.2</A> <A HREF="prog.html#toc5.2">Bidirectional Forwarding Detection</A>
</H2>

<P>
<P>The BFD protocol is implemented in three files: <CODE>bfd.c</CODE> containing the
protocol logic and the protocol glue with BIRD core, <CODE>packets.c</CODE> handling BFD
packet processing, RX, TX and protocol sockets. <CODE>io.c</CODE> then contains generic
code for the event loop, threads and event sources (sockets, microsecond
timers). This generic code will be merged to the main BIRD I/O code in the
future.
<P>The BFD implementation uses a separate thread with an internal event loop for
handling the protocol logic, which requires high-res and low-latency timing,
so it is not affected by the rest of BIRD, which has several low-granularity
hooks in the main loop, uses second-based timers and cannot offer good
latency. The core of BFD protocol (the code related to BFD sessions,
interfaces and packets) runs in the BFD thread, while the rest (the code
related to BFD requests, BFD neighbors and the protocol glue) runs in the
main thread.
<P>BFD sessions are represented by structure <I>bfd_session</I> that contains a state
related to the session and two timers (TX timer for periodic packets and hold
timer for session timeout). These sessions are allocated from <B>session_slab</B>
and are accessible by two hash tables, <B>session_hash_id</B> (by session ID) and
<B>session_hash_ip</B> (by IP addresses of neighbors). Slab and both hashes are in
the main protocol structure <I>bfd_proto</I>. The protocol logic related to BFD
sessions is implemented in internal functions bfd_session_*(), which are
expected to be called from the context of BFD thread, and external functions
<B>bfd_add_session()</B>, <B>bfd_remove_session()</B> and <B>bfd_reconfigure_session()</B>, which
form an interface to the BFD core for the rest and are expected to be called
from the context of main thread.
<P>Each BFD session has an associated BFD interface, represented by structure
<I>bfd_iface</I>. A BFD interface contains a socket used for TX (the one for RX is
shared in <I>bfd_proto</I>), an interface configuration and reference counter.
Compared to interface structures of other protocols, these structures are not
created and removed based on interface notification events, but according to
the needs of BFD sessions. When a new session is created, it requests a
proper BFD interface by function <B>bfd_get_iface()</B>, which either finds an
existing one in <I>iface_list</I> (from <I>bfd_proto</I>) or allocates a new one. When a
session is removed, an associated iface is discharged by <B>bfd_free_iface()</B>.
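<P>The find-or-allocate scheme with a reference counter can be sketched as follows. The types <CODE>ref_iface</CODE> and <CODE>ref_iface_list</CODE> and the functions <CODE>get_iface()</CODE> and <CODE>free_iface()</CODE> are hypothetical simplifications. In particular, this sketch keys interfaces by name only and omits the TX socket and the configuration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced interface record. */
struct ref_iface {
  struct ref_iface *next;
  char name[16];
  unsigned uc;               /* use count: number of sessions using it */
};

struct ref_iface_list { struct ref_iface *head; };

/* Find an existing interface record or allocate a new one. */
static struct ref_iface *get_iface(struct ref_iface_list *l, const char *name)
{
  struct ref_iface *i;

  for (i = l->head; i; i = i->next)
    if (strncmp(i->name, name, sizeof(i->name) - 1) == 0)
    {
      i->uc++;
      return i;
    }

  i = calloc(1, sizeof(*i));
  if (!i)
    return NULL;
  strncpy(i->name, name, sizeof(i->name) - 1);   /* calloc keeps it terminated */
  i->uc = 1;
  i->next = l->head;
  l->head = i;
  return i;
}

/* Drop one reference; the record is unlinked and freed with the last one. */
static void free_iface(struct ref_iface_list *l, struct ref_iface *ifa)
{
  struct ref_iface **pp;

  if (!ifa || --ifa->uc)
    return;

  for (pp = &l->head; *pp; pp = &(*pp)->next)
    if (*pp == ifa)
    {
      *pp = ifa->next;
      break;
    }
  free(ifa);
}
```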
<P>BFD requests are the external API for the other protocols. When a protocol
wants a BFD session, it calls <B>bfd_request_session()</B>, which creates a
structure <I>bfd_request</I> containing appropriate information and a notify hook.
This structure is a resource associated with the caller's resource pool. When
a BFD protocol is available, a BFD request is submitted to the protocol, an
appropriate BFD session is found or created and the request is attached to
the session. When a session changes state, all attached requests (and related
protocols) are notified. Note that BFD requests do not depend on the BFD protocol
running. When the BFD protocol is stopped or removed (or not available from
the beginning), related BFD requests are stored in <B>bfd_wait_list</B>, where they wait
for a new protocol.
<P>BFD neighbors are just a way to statically configure BFD sessions without
requests from other protocols. Structures <I>bfd_neighbor</I> are part of BFD
configuration (like static routes in the static protocol). BFD neighbors are
handled by the BFD protocol as if it were its own BFD client -- when a BFD neighbor is
ready, the protocol just creates a BFD request like any other protocol.
<P>The protocol uses a new generic event loop (structure <I>birdloop</I>) from <CODE>io.c</CODE>,
which supports sockets, timers and events like the main loop. Timers
(structure <I>timer2</I>) are new microsecond based timers, while sockets and
events are the same. A birdloop is associated with a thread (field <B>thread</B>)
in which event hooks are executed. Most functions for setting event sources
(like <B>sk_start()</B> or <B>tm2_start()</B>) must be called from the context of that
thread. A birdloop allows the main thread to temporarily acquire the context of that
thread by calling <B>birdloop_enter()</B> and then <B>birdloop_leave()</B>, which
also ensures mutual exclusion with all event hooks. Note that resources
associated with a birdloop (like timers) should be attached to the
independent resource pool, detached from the main resource tree.
<P>There are two kinds of interaction between the BFD core (running in the BFD
thread) and the rest of BFD (running in the main thread). The first kind are
configuration calls from main thread to the BFD thread (like <B>bfd_add_session()</B>).
These calls are synchronous and use <B>birdloop_enter()</B> mechanism for mutual
exclusion. The second kind is a notification about session changes from the
BFD thread to the main thread. This is done asynchronously: sessions
with pending notifications are linked (in the BFD thread) to <B>notify_list</B> in
<I>bfd_proto</I>, and then <B>bfd_notify_hook()</B> in the main thread is activated using
<B>bfd_notify_kick()</B> and a pipe. The hook then processes scheduled sessions and
calls hooks from associated BFD requests. This <B>notify_list</B> (and state fields
in structure <I>bfd_session</I>) is protected by a spinlock in <I>bfd_proto</I> and
functions <B>bfd_lock_sessions()</B> / <B>bfd_unlock_sessions()</B>.
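<P>The asynchronous notification path can be sketched with a pending list and a self-pipe, as below. This is a single-threaded illustration with hypothetical names (<CODE>session</CODE>, <CODE>notifier</CODE>, <CODE>notifier_init()</CODE>, <CODE>session_changed()</CODE>, <CODE>notify_hook()</CODE>). It omits the lock that must protect the list and the state fields once two threads are involved, and it records the delivered state in a field instead of calling request hooks.

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

struct session {
  struct session *notify_next;
  int queued;        /* already linked in the pending list? */
  int state;         /* written by the worker side */
  int reported;      /* last state delivered to the main side */
};

struct notifier {
  int fds[2];                /* [0] read end (main loop), [1] write end */
  struct session *pending;   /* sessions with a pending notification */
};

static int notifier_init(struct notifier *n)
{
  n->pending = NULL;
  if (pipe(n->fds) < 0)
    return -1;
  /* Non-blocking on both ends: a full pipe already means "wake-up pending",
   * and draining must not block when the pipe is empty. */
  if (fcntl(n->fds[0], F_SETFL, O_NONBLOCK) < 0 ||
      fcntl(n->fds[1], F_SETFL, O_NONBLOCK) < 0)
    return -1;
  return 0;
}

/* Worker side: record the new state, queue the session once, then kick. */
static void session_changed(struct notifier *n, struct session *s, int state)
{
  s->state = state;
  if (!s->queued)
  {
    s->queued = 1;
    s->notify_next = n->pending;
    n->pending = s;
  }
  if (write(n->fds[1], "x", 1) < 0)
  {
    /* Pipe full: a wake-up is already pending, nothing else to do. */
  }
}

/* Main side: drain the pipe, then process every queued session.
 * Returns the number of sessions processed. */
static int notify_hook(struct notifier *n)
{
  char buf[64];
  int count = 0;

  while (read(n->fds[0], buf, sizeof(buf)) > 0)
    ;

  struct session *s = n->pending;
  n->pending = NULL;
  while (s)
  {
    struct session *next = s->notify_next;
    s->queued = 0;
    s->reported = s->state;   /* a real hook would call request hooks here */
    count++;
    s = next;
  }
  return count;
}
```

<P>Because a session is queued at most once, several state changes between two runs of the hook collapse into a single notification carrying the latest state.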
<P>There are a few data races (accessing <B>p</B>-&gt;p.debug from <B>TRACE()</B> in the BFD
thread, and accessing some private fields of <I>bfd_session</I> from
<B>bfd_show_sessions()</B> in the main thread), but these are expected to be harmless.
<P>TODO: document functions and access restrictions for fields in BFD structures.
<P>Supported standards:
- RFC 5880 - main BFD standard
- RFC 5881 - BFD for IP links
- RFC 5882 - generic application of BFD
- RFC 5883 - BFD for multihop paths
<P>
<P>
<H2><A NAME="ss5.3">5.3</A> <A HREF="prog.html#toc5.3">Border Gateway Protocol</A>
</H2>

<P>
<P>The BGP protocol is implemented in three parts: <CODE>bgp.c</CODE> which takes care of the
connection and most of the interface with BIRD core, <CODE>packets.c</CODE> handling
both incoming and outgoing BGP packets and <CODE>attrs.c</CODE> containing functions for
manipulation with BGP attribute lists.
<P>As opposed to the other existing routing daemons, BIRD has a sophisticated core
architecture which is able to keep all the information needed by BGP in the
primary routing table, therefore no complex data structures like a central
BGP table are needed. This increases the memory footprint of a BGP router with
many connections, though only modestly, and, more importantly, it makes
BGP much easier to implement.
<P>Each instance of BGP (corresponding to a single BGP peer) is described by a <I>bgp_proto</I>
structure to which are attached individual connections represented by <I>bgp_connection</I>
(usually, there exists only one connection, but during BGP session setup, there
can be more of them). The connections are handled according to the BGP state machine
defined in the RFC with all the timers and all the parameters configurable.
<P>In incoming direction, we listen on the connection's socket and each time we receive
some input, we pass it to <B>bgp_rx()</B>. It decodes packet headers and the markers and
passes complete packets to <B>bgp_rx_packet()</B> which distributes the packet according
to its type.
<P>In outgoing direction, we gather all the routing updates and sort them to buckets
(<I>bgp_bucket</I>) according to their attributes (we keep a hash table for fast comparison
of <I>rta</I>'s and a <I>fib</I> which helps us to find if we already have another route for
the same destination queued for sending, so that we can replace it with the new one
immediately instead of sending both updates). There also exists a special bucket holding
all the route withdrawals which cannot be queued anywhere else as they don't have any
attributes. If we have any packet to send (due to either new routes or the connection
tracking code wanting to send an Open, Keepalive or Notification message), we call
<B>bgp_schedule_packet()</B> which sets the corresponding bit in a <B>packet_to_send</B>
bit field in <I>bgp_conn</I> and as soon as the transmit socket buffer becomes empty,
we call <B>bgp_fire_tx()</B>. It inspects state of all the packet type bits and calls
the corresponding <B>bgp_create_xx()</B> functions, eventually rescheduling the same packet
type if we have more data of the same type to send.
<P>The processing of attributes consists of two functions: <B>bgp_decode_attrs()</B> for checking
of the attribute blocks and translating them to the language of BIRD's extended attributes
and <B>bgp_encode_attrs()</B> which does the converse. Both functions are built around a
<B>bgp_attr_table</B> array describing all important characteristics of all known attributes.
Unknown transitive attributes are attached to the route as <I>EAF_TYPE_OPAQUE</I> byte streams.
<P>BGP protocol implements graceful restart in both restarting (local restart)
and receiving (neighbor restart) roles. The first is handled mostly by the
graceful restart code in the nest, BGP protocol just handles capabilities,
sets <B>gr_wait</B> and locks graceful restart until end-of-RIB mark is received.
The second is implemented by internal restart of the BGP state to <I>BS_IDLE</I>
and protocol state to <I>PS_START</I>, but keeping the protocol up from the core
point of view and therefore maintaining received routes. Routing table
refresh cycle (<B>rt_refresh_begin()</B>, <B>rt_refresh_end()</B>) is used for removing
stale routes after reestablishment of BGP session during graceful restart.
<P>
<P><HR><H3>Function</H3>
<P><I>int</I>
<B>bgp_open</B>
(<I>struct bgp_proto *</I> <B>p</B>) --     open a BGP instance
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct bgp_proto *</I> <B>p</B><DD><P>BGP instance
</DL>
<H3>Description</H3>
<P>This function allocates and configures shared BGP resources.
It should be called as the last step during initialization
(when the lock is acquired and the neighbor is ready).
On error, the protocol state is changed to PS_DOWN, -1 is returned and the caller
should return immediately.


<HR><H3>Function</H3>
<P><I>void</I>
<B>bgp_close</B>
(<I>struct bgp_proto *</I> <B>p</B>, <I>int</I> <B>apply_md5</B>) --     close a BGP instance
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct bgp_proto *</I> <B>p</B><DD><P>BGP instance
<DT><I>int</I> <B>apply_md5</B><DD><P>0 to disable unsetting MD5 auth
</DL>
<H3>Description</H3>
<P>This function frees and deconfigures shared BGP resources.
<B>apply_md5</B> is set to 0 when bgp_close is called as a cleanup
from failed <B>bgp_open()</B>.


<HR><H3>Function</H3>
<P><I>void</I>
<B>bgp_start_timer</B>
(<I>timer *</I> <B>t</B>, <I>int</I> <B>value</B>) --     start a BGP timer
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>timer *</I> <B>t</B><DD><P>timer
<DT><I>int</I> <B>value</B><DD><P>time to fire (0 to disable the timer)
</DL>
<H3>Description</H3>
<P>This function calls <B>tm_start()</B> on <B>t</B> with time <B>value</B> and the
amount of randomization suggested by the BGP standard. Please use
it for all BGP timers.
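<P>As an illustration of timer randomization, RFC 4271 recommends multiplying timer values by a random factor in the range 0.75 to 1.0. The hypothetical helper <CODE>jittered_delay()</CODE> below computes such a delay from a caller-supplied random number. Whether BIRD applies exactly this range is an assumption here, not something stated in this document.

```c
#include <stdint.h>

/* Return a delay in the range [value - value/4, value], chosen by rnd.
 * A value of 0 means "timer disabled" and is passed through unchanged.
 * rnd is any random number supplied by the caller. */
static unsigned jittered_delay(unsigned value, uint32_t rnd)
{
  if (value == 0)
    return 0;

  unsigned range = value / 4;
  return value - range + rnd % (range + 1);
}
```

<P>Taking the random number as a parameter keeps the helper deterministic, which makes it easy to check the boundary values.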


<HR><H3>Function</H3>
<P><I>void</I>
<B>bgp_close_conn</B>
(<I>struct bgp_conn *</I> <B>conn</B>) --     close a BGP connection
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct bgp_conn *</I> <B>conn</B><DD><P>connection to close
</DL>
<H3>Description</H3>
<P>This function takes a connection described by the <I>bgp_conn</I> structure,
closes its socket and frees all resources associated with it.


<HR><H3>Function</H3>
<P><I>void</I>
<B>bgp_update_startup_delay</B>
(<I>struct bgp_proto *</I> <B>p</B>) --     update a startup delay
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct bgp_proto *</I> <B>p</B><DD><P>BGP instance
</DL>
<H3>Description</H3>
<P>This function updates the startup delay that is used to postpone the next BGP connect.
It also handles disable_after_error and might stop the BGP instance when an error
has happened and disable_after_error is on.
<P>It should be called when a BGP protocol error has happened.


<HR><H3>Function</H3>
<P><I>void</I>
<B>bgp_handle_graceful_restart</B>
(<I>struct bgp_proto *</I> <B>p</B>) --     handle detected BGP graceful restart
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct bgp_proto *</I> <B>p</B><DD><P>BGP instance
</DL>
<H3>Description</H3>
<P>This function is called when a BGP graceful restart of the neighbor is
detected (when the TCP connection fails or when a new TCP connection
appears). The function activates processing of the restart - starts routing
table refresh cycle and activates BGP restart timer. The protocol state goes
back to <I>PS_START</I>, but changing BGP state back to <I>BS_IDLE</I> is left for the
caller.


<HR><H3>Function</H3>
<P><I>void</I>
<B>bgp_graceful_restart_done</B>
(<I>struct bgp_proto *</I> <B>p</B>) --     finish active BGP graceful restart
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct bgp_proto *</I> <B>p</B><DD><P>BGP instance
</DL>
<H3>Description</H3>
<P>This function is called when the active BGP graceful restart of the neighbor
should be finished - either successfully (the neighbor sends all paths and
reports end-of-RIB on the new session) or unsuccessfully (the neighbor does
not support BGP graceful restart on the new session). The function ends
routing table refresh cycle and stops BGP restart timer.


<HR><H3>Function</H3>
<P><I>void</I>
<B>bgp_graceful_restart_timeout</B>
(<I>timer *</I> <B>t</B>) --     timeout of graceful restart 'restart timer'
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>timer *</I> <B>t</B><DD><P>timer
</DL>
<H3>Description</H3>
<P>This function is a timeout hook for <B>gr_timer</B>, implementing BGP restart time
limit for reestablishment of the BGP session after the graceful restart. When
fired, we just proceed with the usual protocol restart.


<HR><H3>Function</H3>
<P><I>void</I>
<B>bgp_refresh_begin</B>
(<I>struct bgp_proto *</I> <B>p</B>) --     start incoming enhanced route refresh sequence
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct bgp_proto *</I> <B>p</B><DD><P>BGP instance
</DL>
<H3>Description</H3>
<P>This function is called when an incoming enhanced route refresh sequence is
started by the neighbor, demarcated by the BoRR packet. The function updates
the load state and starts the routing table refresh cycle. Note that graceful
restart also uses routing table refresh cycle, but RFC 7313 and load states
ensure that these two sequences do not overlap.


<HR><H3>Function</H3>
<P><I>void</I>
<B>bgp_refresh_end</B>
(<I>struct bgp_proto *</I> <B>p</B>) --     finish incoming enhanced route refresh sequence
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct bgp_proto *</I> <B>p</B><DD><P>BGP instance
</DL>
<H3>Description</H3>
<P>This function is called when an incoming enhanced route refresh sequence is
finished by the neighbor, demarcated by the EoRR packet. The function updates
the load state and ends the routing table refresh cycle. Routes not received
during the sequence are removed by the nest.


<HR><H3>Function</H3>
<P><I>void</I>
<B>bgp_connect</B>
(<I>struct bgp_proto *</I> <B>p</B>) --     initiate an outgoing connection
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct bgp_proto *</I> <B>p</B><DD><P>BGP instance
</DL>
<H3>Description</H3>
<P>The <B>bgp_connect()</B> function creates a new <I>bgp_conn</I> and initiates
a TCP connection to the peer. The rest of connection setup is governed
by the BGP state machine as described in the standard.


<HR><H3>Function</H3>
<P><I>struct bgp_proto *</I>
<B>bgp_find_proto</B>
(<I>sock *</I> <B>sk</B>) --     find existing proto for incoming connection
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>sock *</I> <B>sk</B><DD><P>TCP socket
</DL>


<HR><H3>Function</H3>
<P><I>int</I>
<B>bgp_incoming_connection</B>
(<I>sock *</I> <B>sk</B>, <I>uint dummy</I> <B>UNUSED</B>) --     handle an incoming connection
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>sock *</I> <B>sk</B><DD><P>TCP socket
<DT><I>uint dummy</I> <B>UNUSED</B><DD><P>-- undescribed --
</DL>
<H3>Description</H3>
<P>This function serves as a socket hook for accepting new BGP
connections. It searches for a BGP instance corresponding to the peer
which has connected and if such an instance exists, it creates a
<I>bgp_conn</I> structure, attaches it to the instance and either sends
an Open message or (if there already is an active connection) it
closes the new connection by sending a Notification message.


<HR><H3>Function</H3>
<P><I>void</I>
<B>bgp_error</B>
(<I>struct bgp_conn *</I> <B>c</B>, <I>unsigned</I> <B>code</B>, <I>unsigned</I> <B>subcode</B>, <I>byte *</I> <B>data</B>, <I>int</I> <B>len</B>) --     report a protocol error
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct bgp_conn *</I> <B>c</B><DD><P>connection
<DT><I>unsigned</I> <B>code</B><DD><P>error code (according to the RFC)
<DT><I>unsigned</I> <B>subcode</B><DD><P>error sub-code
<DT><I>byte *</I> <B>data</B><DD><P>data to be passed in the Notification message
<DT><I>int</I> <B>len</B><DD><P>length of the data
</DL>
<H3>Description</H3>
<P><B>bgp_error()</B> sends a notification packet to tell the other side that a protocol
error has occurred (including the data considered erroneous if possible) and
closes the connection.


<HR><H3>Function</H3>
<P><I>void</I>
<B>bgp_store_error</B>
(<I>struct bgp_proto *</I> <B>p</B>, <I>struct bgp_conn *</I> <B>c</B>, <I>u8</I> <B>class</B>, <I>u32</I> <B>code</B>) --     store last error for status report
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct bgp_proto *</I> <B>p</B><DD><P>BGP instance
<DT><I>struct bgp_conn *</I> <B>c</B><DD><P>connection
<DT><I>u8</I> <B>class</B><DD><P>error class (BE_xxx constants)
<DT><I>u32</I> <B>code</B><DD><P>error code (class specific)
</DL>
<H3>Description</H3>
<P><B>bgp_store_error()</B> decides whether the given error is interesting enough
and, if so, stores it in the last_error variables of <B>p</B>.


<HR><H3>Function</H3>
<P><I>int</I>
<B>bgp_fire_tx</B>
(<I>struct bgp_conn *</I> <B>conn</B>) --  transmit packets
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct bgp_conn *</I> <B>conn</B><DD><P>connection
</DL>
<H3>Description</H3>
<P>Whenever the transmit buffers of the underlying TCP connection
are free and we have any packets queued for sending, the socket functions
call <B>bgp_fire_tx()</B> which takes care of selecting the highest priority packet
queued (Notification &gt; Keepalive &gt; Open &gt; Update), assembling its header
and body and sending it to the connection.
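<P>The scheduling bit field and the priority order can be sketched as below. The structure <CODE>tx_conn</CODE> and the functions <CODE>schedule_packet()</CODE> and <CODE>next_packet()</CODE> are hypothetical, and the packet type numbers are the message type codes from RFC 4271. The <CODE>more_updates</CODE> flag models rescheduling of the Update type while more data of that type remains.

```c
#include <stdint.h>

/* BGP message type codes (RFC 4271). */
enum { PKT_OPEN = 1, PKT_UPDATE = 2, PKT_NOTIFICATION = 3, PKT_KEEPALIVE = 4 };

/* Hypothetical connection state: one bit per scheduled packet type. */
struct tx_conn { uint32_t packets_to_send; };

static void schedule_packet(struct tx_conn *c, int type)
{
  c->packets_to_send |= 1u << type;
}

/* Pick the highest-priority scheduled packet type, or 0 if none.
 * An Update stays scheduled while more_updates is nonzero. */
static int next_packet(struct tx_conn *c, int more_updates)
{
  static const int prio[] = {
    PKT_NOTIFICATION, PKT_KEEPALIVE, PKT_OPEN, PKT_UPDATE
  };

  for (unsigned i = 0; i < sizeof(prio) / sizeof(prio[0]); i++)
  {
    uint32_t bit = 1u << prio[i];
    if (!(c->packets_to_send & bit))
      continue;
    if (!(prio[i] == PKT_UPDATE && more_updates))
      c->packets_to_send &= ~bit;
    return prio[i];
  }
  return 0;
}
```

<P>Using one bit per type means that scheduling the same type twice is harmless, which matches the idea that a packet type is either pending or not.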


<HR><H3>Function</H3>
<P><I>void</I>
<B>bgp_schedule_packet</B>
(<I>struct bgp_conn *</I> <B>conn</B>, <I>int</I> <B>type</B>) --  schedule a packet for transmission
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct bgp_conn *</I> <B>conn</B><DD><P>connection
<DT><I>int</I> <B>type</B><DD><P>packet type
</DL>
<H3>Description</H3>
<P>Schedule a packet of type <B>type</B> to be sent as soon as possible.


<HR><H3>Function</H3>
<P><I>const char *</I>
<B>bgp_error_dsc</B>
(<I>unsigned</I> <B>code</B>, <I>unsigned</I> <B>subcode</B>) --  return BGP error description
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>unsigned</I> <B>code</B><DD><P>BGP error code
<DT><I>unsigned</I> <B>subcode</B><DD><P>BGP error subcode
</DL>
<H3>Description</H3>
<P><B>bgp_error_dsc()</B> returns an error description for BGP errors,
which may be either a static string or a string in a temporary buffer.


<HR><H3>Function</H3>
<P><I>void</I>
<B>bgp_rx_packet</B>
(<I>struct bgp_conn *</I> <B>conn</B>, <I>byte *</I> <B>pkt</B>, <I>unsigned</I> <B>len</B>) --  handle a received packet
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct bgp_conn *</I> <B>conn</B><DD><P>BGP connection
<DT><I>byte *</I> <B>pkt</B><DD><P>start of the packet
<DT><I>unsigned</I> <B>len</B><DD><P>packet size
</DL>
<H3>Description</H3>
<P><B>bgp_rx_packet()</B> takes a newly received packet and calls the corresponding
packet handler according to the packet type.


<HR><H3>Function</H3>
<P><I>int</I>
<B>bgp_rx</B>
(<I>sock *</I> <B>sk</B>, <I>uint</I> <B>size</B>) --  handle received data
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>sock *</I> <B>sk</B><DD><P>socket
<DT><I>uint</I> <B>size</B><DD><P>amount of data received
</DL>
<H3>Description</H3>
<P><B>bgp_rx()</B> is called by the socket layer whenever new data arrive from
the underlying TCP connection. It assembles the data fragments to packets,
checks their headers and framing and passes complete packets to
<B>bgp_rx_packet()</B>.
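<P>Reassembling packets from a TCP byte stream can be sketched as follows. The function <CODE>rx_consume()</CODE>, the callback type <CODE>packet_cb</CODE> and the example callback <CODE>count_packet()</CODE> are hypothetical. The header layout (16-byte marker of all ones, 16-bit length, 8-bit type) and the length limits of 19 and 4096 bytes are taken from RFC 4271.

```c
#include <stddef.h>
#include <stdint.h>

#define BGP_HEADER_LEN 19     /* marker(16) + length(2) + type(1) */
#define BGP_MAX_LEN    4096   /* maximum message size in RFC 4271 */

typedef void (*packet_cb)(const uint8_t *pkt, size_t len, void *ctx);

/* Pass every complete packet in buf to cb. Returns the number of bytes
 * consumed, or -1 on a framing error. The caller keeps the unconsumed
 * tail in its buffer and appends newly received data to it. */
static long rx_consume(const uint8_t *buf, size_t avail, packet_cb cb, void *ctx)
{
  size_t pos = 0;

  while (avail - pos >= BGP_HEADER_LEN)
  {
    const uint8_t *p = buf + pos;

    for (int i = 0; i < 16; i++)
      if (p[i] != 0xff)
        return -1;                       /* bad marker */

    size_t len = ((size_t)p[16] << 8) | p[17];
    if (len < BGP_HEADER_LEN || len > BGP_MAX_LEN)
      return -1;                         /* bad length */

    if (avail - pos < len)
      break;                             /* wait for the rest of the packet */

    cb(p, len, ctx);
    pos += len;
  }
  return (long)pos;
}

/* Example callback: counts delivered packets in an int. */
static void count_packet(const uint8_t *pkt, size_t len, void *ctx)
{
  (void)pkt;
  (void)len;
  ++*(int *)ctx;
}
```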


<HR><H3>Function</H3>
<P><I>uint</I>
<B>bgp_encode_attrs</B>
(<I>struct bgp_proto *</I> <B>p</B>, <I>byte *</I> <B>w</B>, <I>ea_list *</I> <B>attrs</B>, <I>int</I> <B>remains</B>) --  encode BGP attributes
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct bgp_proto *</I> <B>p</B><DD><P>BGP instance (or NULL)
<DT><I>byte *</I> <B>w</B><DD><P>buffer
<DT><I>ea_list *</I> <B>attrs</B><DD><P>a list of extended attributes
<DT><I>int</I> <B>remains</B><DD><P>remaining space in the buffer
</DL>
<H3>Description</H3>
<P>The <B>bgp_encode_attrs()</B> function takes a list of extended attributes
and converts it to its BGP representation (a part of an Update message).
<H3>Result</H3>
<P>Length of the attribute block generated or -1 if not enough space.
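<P>The "length or -1" convention can be illustrated by encoding a single path attribute. The helper <CODE>encode_attr()</CODE> is hypothetical and not part of <CODE>attrs.c</CODE>. The wire layout (flags, type code, then a one-byte length or a two-byte length with the Extended Length flag 0x10 set) follows RFC 4271.

```c
#include <stdint.h>
#include <string.h>

#define ATTR_EXT_LENGTH 0x10   /* Extended Length flag (RFC 4271) */

/* Encode one attribute into w, which has 'remains' bytes of free space.
 * Returns the number of bytes written, or -1 if it does not fit. */
static int encode_attr(uint8_t *w, int remains, uint8_t flags, uint8_t code,
                       const uint8_t *data, unsigned len)
{
  int hdr = (len > 255) ? 4 : 3;

  if (len > 65535 || remains < hdr + (int)len)
    return -1;

  w[1] = code;
  if (len > 255)
  {
    w[0] = (uint8_t)(flags | ATTR_EXT_LENGTH);
    w[2] = (uint8_t)(len >> 8);
    w[3] = (uint8_t)(len & 0xff);
  }
  else
  {
    w[0] = (uint8_t)(flags & ~ATTR_EXT_LENGTH);
    w[2] = (uint8_t)len;
  }

  memcpy(w + hdr, data, len);
  return hdr + (int)len;
}
```

<P>A caller encoding a whole attribute list would advance the write pointer and decrease the remaining space by each returned length, stopping at the first -1.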


<HR><H3>Function</H3>
<P><I>struct rta *</I>
<B>bgp_decode_attrs</B>
(<I>struct bgp_conn *</I> <B>conn</B>, <I>byte *</I> <B>attr</B>, <I>uint</I> <B>len</B>, <I>struct linpool *</I> <B>pool</B>, <I>int</I> <B>mandatory</B>) --  check and decode BGP attributes
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct bgp_conn *</I> <B>conn</B><DD><P>connection
<DT><I>byte *</I> <B>attr</B><DD><P>start of attribute block
<DT><I>uint</I> <B>len</B><DD><P>length of attribute block
<DT><I>struct linpool *</I> <B>pool</B><DD><P>linear pool to make all the allocations in
<DT><I>int</I> <B>mandatory</B><DD><P>1 iff presence of mandatory attributes has to be checked
</DL>
<H3>Description</H3>
<P>This function takes a BGP attribute block (a part of an Update message), checks
its consistency and converts it to a list of BIRD route attributes represented
by a <I>rta</I>.

<H2><A NAME="ss5.4">5.4</A> <A HREF="prog.html#toc5.4">Multi-Threaded Routing Toolkit (MRT) protocol</A>
</H2>

<P>
<P>The MRT protocol is implemented in just one file: <CODE>mrt.c</CODE>. It consists of
several parts: generic functions for preparing MRT messages in a buffer,
functions for MRT table dump (called from timer or CLI), functions for MRT
BGP4MP dump (called from BGP), and the usual protocol glue. For the MRT table
dump, the key structure is struct mrt_table_dump_state, which contains all
necessary data and is created when the MRT dump cycle starts and kept for the
duration of the MRT dump. The BGP4MP dump is currently not bound to an MRT
protocol instance and uses the config-&gt;mrtdump_file fd.
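<P>As an illustration of preparing an MRT message in a buffer, the sketch below writes the 12-byte MRT common header defined in RFC 6396. The function and macro names are hypothetical and do not correspond to the helpers in <CODE>mrt.c</CODE>.

```c
#include <stddef.h>
#include <stdint.h>

#define MRT_HDR_LENGTH        12
#define MRT_TABLE_DUMP_V2     13   /* MRT type, RFC 6396 */
#define MRT_PEER_INDEX_TABLE   1   /* TABLE_DUMP_V2 subtype, RFC 6396 */

/* Store a 16-bit value in network byte order */
static void put_u16(uint8_t *p, uint16_t v)
{
  p[0] = (uint8_t) (v >> 8);
  p[1] = (uint8_t) (v & 0xff);
}

/* Store a 32-bit value in network byte order */
static void put_u32(uint8_t *p, uint32_t v)
{
  p[0] = (uint8_t) (v >> 24);
  p[1] = (uint8_t) (v >> 16);
  p[2] = (uint8_t) (v >> 8);
  p[3] = (uint8_t) (v & 0xff);
}

/*
 * Hypothetical helper: write the MRT common header (timestamp, type,
 * subtype, length of the message body) and return the header length.
 */
size_t mrt_put_header(uint8_t *buf, uint32_t timestamp, uint16_t type,
                      uint16_t subtype, uint32_t body_len)
{
  put_u32(buf + 0, timestamp);
  put_u16(buf + 4, type);
  put_u16(buf + 6, subtype);
  put_u32(buf + 8, body_len);
  return MRT_HDR_LENGTH;
}
```

<P>Because the length field counts only the message body, a writer typically fills in the body first and patches the header once the final size is known.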
<P>The protocol is simple: it periodically scans routing tables and exports them
to a file. It does not use the regular update mechanism, but direct access,
in order to handle iteration through multiple routing tables. The table dump
needs to dump all peers first and then use indexes to address the peers, so we
use a hash table (<B>peer_hash</B>) to find the peer index based on BGP protocol key
attributes.
<P>One thing worth documenting is the locking. During processing, the currently
processed table (<B>table</B> field in the state structure) is locked and also the
explicitly named table is locked (<B>table_ptr</B> field in the state structure) if
specified. Between dumps no table is locked. Also the current config is
locked (by <B>config_add_obstacle()</B>) during table dumps as some data (strings,
filters) are shared from the config and the running table dump may be
interrupted by reconfiguration.
<P>Supported standards:
- RFC 6396 - MRT format standard
- RFC 8050 - ADD_PATH extension
<P>
<P>
<H2><A NAME="ss5.5">5.5</A> <A HREF="prog.html#toc5.5">Open Shortest Path First (OSPF)</A>
</H2>

<P>
<P>The OSPF protocol is quite complicated and its complex implementation is split
into many files. In <CODE>ospf.c</CODE>, you will find mainly the interface for
communication with the core (e.g., reconfiguration hooks, shutdown and
initialisation and so on). File <CODE>iface.c</CODE> contains the interface state
machine and functions for allocation and deallocation of OSPF's interface
data structures. Source <CODE>neighbor.c</CODE> includes the neighbor state machine and
functions for election of Designated Router and Backup Designated router. In
<CODE>packet.c</CODE>, you will find various functions for sending and receiving generic
OSPF packets. There are also routines for authentication and checksumming.
In <CODE>hello.c</CODE>, there are routines for sending and receiving of hello packets
as well as functions for maintaining wait times and the inactivity timer.
Files <CODE>lsreq.c</CODE>, <CODE>lsack.c</CODE>, <CODE>dbdes.c</CODE> contain functions for sending and
receiving of link-state requests, link-state acknowledgements and database
descriptions respectively.  In <CODE>lsupd.c</CODE>, there are functions for sending and
receiving of link-state updates and also the flooding algorithm. Source
<CODE>topology.c</CODE> is a place where routines for searching LSAs in the link-state
database, adding and deleting them reside, there also are functions for
originating of various types of LSAs (router LSA, net LSA, external LSA).
File <CODE>rt.c</CODE> contains routines for calculating the routing table. <CODE>lsalib.c</CODE>
is a set of various functions for working with the LSAs (endianness
conversions, calculation of checksum etc.).
<P>One instance of the protocol is able to hold LSA databases for multiple OSPF
areas, to exchange routing information between multiple neighbors and to
calculate the routing tables. The core structure is <I>ospf_proto</I> to which
multiple <I>ospf_area</I> and <I>ospf_iface</I> structures are connected. <I>ospf_proto</I> is
also connected to <I>top_hash_graph</I> which is a dynamic hashing structure that
describes the link-state database. It allows fast search, addition and
deletion. Each LSA is kept in two pieces: header and body. Both of them are
kept in the byte order of the CPU.
<P>In the OSPFv2 specification, it is implied that there is one IP prefix for each
physical network/interface (unless it is a ptp link). But in modern systems,
there might be several independent IP prefixes associated with an interface. To
handle this situation, we have one <I>ospf_iface</I> for each active IP prefix
(instead of one for each active iface); this behaves like a virtual interface for
the purpose of OSPF. When we receive a packet, we associate it with the proper
virtual interface mainly according to its source address.
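<P>The association of a received packet with a per-prefix virtual interface can be sketched like this. The structure <CODE>vif</CODE> and the function <CODE>vif_for_source</CODE> are hypothetical stand-ins, IPv4 addresses are plain host-order integers, and the real code also takes the destination address and the receiving iface into account.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an ospf_iface bound to one IP prefix */
struct vif {
  uint32_t prefix;   /* network address in host byte order */
  int pxlen;         /* prefix length, 0..32 */
};

/* Build a netmask from a prefix length */
static uint32_t prefix_mask(int pxlen)
{
  return pxlen > 0 ? ~(uint32_t) 0 << (32 - pxlen) : 0;
}

/*
 * Return the first virtual interface whose prefix covers the source
 * address of the packet, or NULL if the packet matches none of them.
 */
const struct vif *vif_for_source(const struct vif *vifs, size_t n, uint32_t src)
{
  for (size_t i = 0; i < n; i++)
    if ((src & prefix_mask(vifs[i].pxlen)) == vifs[i].prefix)
      return &vifs[i];

  return NULL;
}
```

<P>With two prefixes 10.0.0.0/24 and 10.0.1.0/24 on one physical interface, a packet from 10.0.1.5 is handed to the second virtual interface, while a packet from an unrelated network is dropped.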
<P>OSPF keeps one socket per <I>ospf_iface</I>. This allows us (compared to one socket
approach) to evade problems with a limit of multicast groups per socket and
with sending multicast packets to appropriate interface in a portable way.
The socket is associated with underlying physical iface and should not
receive packets received on other ifaces (unfortunately, this is not true on
BSD). Generally, one packet can be received by more sockets (for example, if
there are more <I>ospf_iface</I> on one physical iface), therefore we explicitly
filter received packets according to src/dst IP address and received iface.
<P>Vlinks are implemented using a particularly degenerate form of <I>ospf_iface</I>,
which has several exceptions: it does not have its iface or socket (it copies
these from 'parent' <I>ospf_iface</I>) and it is present in iface list even when
down (it is not freed in <B>ospf_iface_down()</B>).
<P>The heartbeat of OSPF is <B>ospf_disp()</B>. It is called at regular intervals
(<I>ospf_proto</I>-&gt;tick). It is responsible for aging and flushing of LSAs in the
database, updating topology information in LSAs and for routing table
calculation.
<P>To every <I>ospf_iface</I>, we connect one or more <I>ospf_neighbor</I>'s -- a structure
containing many timers and queues for building adjacency and for exchange of
routing messages.
<P>BIRD's OSPF implementation respects RFC2328 in every detail, but some of
the internal algorithms do differ. The RFC recommends making a snapshot of the
link-state database when a new adjacency is forming and sending the database
description packets based on the information in this snapshot. The database
can be quite large in some networks, so rather we walk through a <I>slist</I>
structure which allows us to continue even if the actual LSA we were working
with is deleted. New LSAs are added at the tail of this <I>slist</I>.
<P>We also do not keep a separate OSPF routing table, because the core helps us
by being able to recognize when a route is updated to an identical one and it
suppresses the update automatically. Due to this, we can flush all the routes
we have recalculated and also those we have deleted to the core's routing
table and the core will take care of the rest. This simplifies the process
and conserves memory.
<P>Supported standards:
- RFC 2328 - main OSPFv2 standard
- RFC 5340 - main OSPFv3 standard
- RFC 3101 - OSPFv2 NSSA areas
- RFC 6549 - OSPFv2 multi-instance extensions
- RFC 6987 - OSPF stub router advertisement
<P>
<P><HR><H3>Function</H3>
<P><I>void</I>
<B>ospf_disp</B>
(<I>timer *</I> <B>timer</B>) --     invokes routing table calculation, aging and also <B>area_disp()</B>
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>timer *</I> <B>timer</B><DD><P>timer usually called every <B>ospf_proto</B>-&gt;tick seconds; <B>timer</B>-&gt;data
points to <B>ospf_proto</B>
</DL>


<HR><H3>Function</H3>
<P><I>int</I>
<B>ospf_import_control</B>
(<I>struct proto *</I> <B>P</B>, <I>rte **</I> <B>new</B>, <I>ea_list **</I> <B>attrs</B>, <I>struct linpool *</I> <B>pool</B>) --     accept or reject new route from nest's routing table
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct proto *</I> <B>P</B><DD><P>OSPF protocol instance
<DT><I>rte **</I> <B>new</B><DD><P>the new route
<DT><I>ea_list **</I> <B>attrs</B><DD><P>list of attributes
<DT><I>struct linpool *</I> <B>pool</B><DD><P>pool for allocation of attributes
</DL>
<H3>Description</H3>
<P>It's quite simple. It does not accept our own routes and leaves the decision on
import to the filters.


<HR><H3>Function</H3>
<P><I>int</I>
<B>ospf_shutdown</B>
(<I>struct proto *</I> <B>P</B>) --     shut down an OSPF instance
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct proto *</I> <B>P</B><DD><P>OSPF protocol instance
</DL>
<H3>Description</H3>
<P>The RFC does not define any action that should be taken before router
shutdown. To make my neighbors react as fast as possible, I send
them a hello packet with an empty neighbor list. They should start
their neighbor state machine with event <I>NEIGHBOR_1WAY</I>.


<HR><H3>Function</H3>
<P><I>int</I>
<B>ospf_reconfigure</B>
(<I>struct proto *</I> <B>P</B>, <I>struct proto_config *</I> <B>c</B>) --     reconfiguration hook
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct proto *</I> <B>P</B><DD><P>current instance of protocol (with old configuration)
<DT><I>struct proto_config *</I> <B>c</B><DD><P>new configuration requested by user
</DL>
<H3>Description</H3>
<P>This hook tries to be a little bit intelligent. An instance of OSPF
will survive changes of many parameters, such as the hello interval,
a password change, addition or deletion of a neighbor on a
nonbroadcast network, the cost of an interface, etc.


<HR><H3>Function</H3>
<P><I>struct top_hash_entry *</I>
<B>ospf_install_lsa</B>
(<I>struct ospf_proto *</I> <B>p</B>, <I>struct ospf_lsa_header *</I> <B>lsa</B>, <I>u32</I> <B>type</B>, <I>u32</I> <B>domain</B>, <I>void *</I> <B>body</B>) --  install new LSA into database
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct ospf_proto *</I> <B>p</B><DD><P>OSPF protocol instance
<DT><I>struct ospf_lsa_header *</I> <B>lsa</B><DD><P>LSA header
<DT><I>u32</I> <B>type</B><DD><P>type of LSA
<DT><I>u32</I> <B>domain</B><DD><P>domain of LSA
<DT><I>void *</I> <B>body</B><DD><P>pointer to LSA body
</DL>
<H3>Description</H3>
<P>This function installs a new LSA received in an LS update into the LSA
database. The old instance is replaced. Several actions are taken to detect if
new routing table calculation is necessary. This is described in 13.2 of RFC
2328. This function is for received LSA only, locally originated LSAs are
installed by <B>ospf_originate_lsa()</B>.
<P>The LSA body in <B>body</B> is expected to be mb_allocated by the caller and its
ownership is transferred to the LSA entry structure.


<HR><H3>Function</H3>
<P><I>void</I>
<B>ospf_advance_lsa</B>
(<I>struct ospf_proto *</I> <B>p</B>, <I>struct top_hash_entry *</I> <B>en</B>, <I>struct ospf_lsa_header *</I> <B>lsa</B>, <I>u32</I> <B>type</B>, <I>u32</I> <B>domain</B>, <I>void *</I> <B>body</B>) --  handle received unexpected self-originated LSA
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct ospf_proto *</I> <B>p</B><DD><P>OSPF protocol instance
<DT><I>struct top_hash_entry *</I> <B>en</B><DD><P>current LSA entry or NULL
<DT><I>struct ospf_lsa_header *</I> <B>lsa</B><DD><P>new LSA header
<DT><I>u32</I> <B>type</B><DD><P>type of LSA
<DT><I>u32</I> <B>domain</B><DD><P>domain of LSA
<DT><I>void *</I> <B>body</B><DD><P>pointer to LSA body
</DL>
<H3>Description</H3>
<P>This function handles received unexpected self-originated LSA (<B>lsa</B>, <B>body</B>)
by either advancing sequence number of the local LSA instance (<B>en</B>) and
propagating it, or installing the received LSA and immediately flushing it
(if there is no local LSA; i.e., <B>en</B> is NULL or MaxAge).
<P>The LSA body in <B>body</B> is expected to be mb_allocated by the caller and its
ownership is transferred to the LSA entry structure or it is freed.


<HR><H3>Function</H3>
<P><I>struct top_hash_entry *</I>
<B>ospf_originate_lsa</B>
(<I>struct ospf_proto *</I> <B>p</B>, <I>struct ospf_new_lsa *</I> <B>lsa</B>) --  originate new LSA
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct ospf_proto *</I> <B>p</B><DD><P>OSPF protocol instance
<DT><I>struct ospf_new_lsa *</I> <B>lsa</B><DD><P>New LSA specification
</DL>
<H3>Description</H3>
<P>This function prepares a new LSA, installs it into the LSA database and
floods it. If the new LSA cannot be originated now (because the old instance
was originated within MinLSInterval, or because the LSA seqnum is currently
wrapping), the origination is instead scheduled for later. If the new LSA is
equivalent to the current LSA, the origination is skipped. In all cases, the
corresponding LSA entry is returned. The new LSA is based on the LSA
specification (<B>lsa</B>) and the LSA body from lsab buffer of <B>p</B>, which is
emptied after the call. The opposite of this function is <B>ospf_flush_lsa()</B>.
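<P>The three outcomes described above (originate now, postpone, skip) can be sketched as a small decision function. Everything here is a simplified assumption: <CODE>lsa_slot</CODE> and <CODE>originate</CODE> are hypothetical names, the LSA body is reduced to a single integer, sequence number wrapping is ignored, and the order of the two checks is a choice of this sketch. The MinLSInterval value of 5 seconds is taken from RFC 2328.

```c
#include <stdint.h>

#define MIN_LS_INTERVAL 5   /* seconds, RFC 2328 architectural constant */

enum orig_result { ORIG_INSTALLED, ORIG_POSTPONED, ORIG_SKIPPED };

/* Hypothetical stand-in for an LSA entry */
struct lsa_slot {
  int      valid;      /* an instance is currently installed */
  long     inst_time;  /* when that instance was originated */
  uint32_t content;    /* stand-in for the LSA body */
  int      has_next;   /* a postponed origination is pending */
  uint32_t next;       /* content of the postponed instance */
};

enum orig_result originate(struct lsa_slot *s, uint32_t content, long now)
{
  /* Equivalent to the current instance: nothing to do */
  if (s->valid && s->content == content)
  {
    s->has_next = 0;
    return ORIG_SKIPPED;
  }

  /* Current instance is too young: remember the new one for later */
  if (s->valid && now - s->inst_time < MIN_LS_INTERVAL)
  {
    s->next = content;
    s->has_next = 1;
    return ORIG_POSTPONED;
  }

  /* Install the new instance; a real implementation would flood it here */
  s->valid = 1;
  s->inst_time = now;
  s->content = content;
  s->has_next = 0;
  return ORIG_INSTALLED;
}
```

<P>In this sketch a postponed instance stays in the <CODE>next</CODE> field until a periodic pass picks it up once the interval has elapsed.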


<HR><H3>Function</H3>
<P><I>void</I>
<B>ospf_flush_lsa</B>
(<I>struct ospf_proto *</I> <B>p</B>, <I>struct top_hash_entry *</I> <B>en</B>) --  flush LSA from OSPF domain
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct ospf_proto *</I> <B>p</B><DD><P>OSPF protocol instance
<DT><I>struct top_hash_entry *</I> <B>en</B><DD><P>LSA entry to flush
</DL>
<H3>Description</H3>
<P>This function flushes <B>en</B> from the OSPF domain by setting its age to
<I>LSA_MAXAGE</I> and flooding it. That also triggers subsequent events in LSA
lifecycle leading to removal of the LSA from the LSA database (e.g. the LSA
content is freed when flushing is acknowledged by neighbors). The function
does nothing if the LSA is already being flushed. LSA entries are not
immediately removed when being flushed, the caller may assume that <B>en</B> still
exists after the call. The function is the opposite of <B>ospf_originate_lsa()</B>
and is supposed to do the right thing even in cases of postponed
origination.


<HR><H3>Function</H3>
<P><I>void</I>
<B>ospf_update_lsadb</B>
(<I>struct ospf_proto *</I> <B>p</B>) --  update LSA database
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct ospf_proto *</I> <B>p</B><DD><P>OSPF protocol instance
</DL>
<H3>Description</H3>
<P>This function is periodically invoked from <B>ospf_disp()</B>. It does some periodic
or postponed processing related to LSA entries. It originates postponed LSAs
scheduled by <B>ospf_originate_lsa()</B>. It continues the flushing processes started
by <B>ospf_flush_lsa()</B>. It also periodically refreshes locally originated LSAs --
when the current instance is older than <I>LSREFRESHTIME</I>, a new instance is originated.
Finally, it also ages stored LSAs and flushes ones that reached <I>LSA_MAXAGE</I>.
<P>RFC 2328 says that a router should periodically check checksums of all
stored LSAs to detect hardware problems. This is not implemented.
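<P>The aging, refreshing and flushing rules can be sketched for a single LSA as follows. The structure and function names are hypothetical, and the sketch ignores postponed originations and acknowledgements; the constants are the RFC 2328 values of LSRefreshTime (1800 s) and MaxAge (3600 s).

```c
#define LSREFRESHTIME 1800  /* seconds, RFC 2328 */
#define LSA_MAXAGE    3600  /* seconds, RFC 2328 */

/* Hypothetical stand-in for an LSA entry */
struct lsa {
  unsigned age;      /* current LS age in seconds */
  unsigned seq;      /* simplified sequence number */
  int self;          /* nonzero if locally originated */
  int flushing;      /* nonzero once flushing has started */
};

/* Advance the age of one LSA by delta seconds and react to thresholds */
void lsa_tick(struct lsa *l, unsigned delta)
{
  if (l->flushing)
    return;                       /* already on its way out */

  l->age += delta;

  if (l->age >= LSA_MAXAGE)
  {
    l->age = LSA_MAXAGE;          /* flushed LSAs are flooded with MaxAge */
    l->flushing = 1;
    return;
  }

  if (l->self && l->age > LSREFRESHTIME)
  {
    l->seq++;                     /* originate a fresh instance */
    l->age = 0;
  }
}
```

<P>Only locally originated LSAs are refreshed; LSAs learned from other routers simply keep aging until they reach MaxAge or are replaced by a newer instance.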


<HR><H3>Function</H3>
<P><I>void</I>
<B>ospf_originate_ext_lsa</B>
(<I>struct ospf_proto *</I> <B>p</B>, <I>struct ospf_area *</I> <B>oa</B>, <I>ort *</I> <B>nf</B>, <I>u8</I> <B>mode</B>, <I>u32</I> <B>metric</B>, <I>u32</I> <B>ebit</B>, <I>ip_addr</I> <B>fwaddr</B>, <I>u32</I> <B>tag</B>, <I>int</I> <B>pbit</B>) --  new route received from nest and filters
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct ospf_proto *</I> <B>p</B><DD><P>OSPF protocol instance
<DT><I>struct ospf_area *</I> <B>oa</B><DD><P>ospf_area for which LSA is originated
<DT><I>ort *</I> <B>nf</B><DD><P>network prefix and mask
<DT><I>u8</I> <B>mode</B><DD><P>the mode of the LSA (LSA_M_EXPORT or LSA_M_RTCALC)
<DT><I>u32</I> <B>metric</B><DD><P>the metric of a route
<DT><I>u32</I> <B>ebit</B><DD><P>E-bit for route metric (bool)
<DT><I>ip_addr</I> <B>fwaddr</B><DD><P>the forwarding address
<DT><I>u32</I> <B>tag</B><DD><P>the route tag
<DT><I>int</I> <B>pbit</B><DD><P>P-bit for NSSA LSAs (bool), ignored for external LSAs
</DL>
<H3>Description</H3>
<P>If I receive a message that a new route is installed, I try to originate an
external LSA. If <B>oa</B> is an NSSA area, an NSSA-LSA is originated instead.
<B>oa</B> should not be a stub area. <B>mode</B> does not specify whether the LSA
is external or NSSA, but it specifies the source of origination -
the export from <B>ospf_rt_notify()</B>, or the NSSA-EXT translation.


<HR><H3>Function</H3>
<P><I>struct top_graph *</I>
<B>ospf_top_new</B>
(<I>struct ospf_proto *</I> <B>p</B>, <I>pool *</I> <B>pool</B>) --  allocate new topology database
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct ospf_proto *</I> <B>p</B><DD><P>OSPF protocol instance (declared with the UNUSED4 UNUSED6 annotations in the source)
<DT><I>pool *</I> <B>pool</B><DD><P>pool for allocation
</DL>
<H3>Description</H3>
<P>This dynamically hashed structure is used for keeping LSAs. Mainly it is used
for the LSA database of the OSPF protocol, but also for LSA retransmission
and request lists of OSPF neighbors.


<HR><H3>Function</H3>
<P><I>void</I>
<B>ospf_neigh_chstate</B>
(<I>struct ospf_neighbor *</I> <B>n</B>, <I>u8</I> <B>state</B>) --  handles changes related to new or old state of neighbor
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct ospf_neighbor *</I> <B>n</B><DD><P>OSPF neighbor
<DT><I>u8</I> <B>state</B><DD><P>new state
</DL>
<H3>Description</H3>
<P>Many actions have to be taken according to a change of state of a neighbor. It
starts rxmt timers, calls the interface state machine, etc.


<HR><H3>Function</H3>
<P><I>void</I>
<B>ospf_neigh_sm</B>
(<I>struct ospf_neighbor *</I> <B>n</B>, <I>int</I> <B>event</B>) --  ospf neighbor state machine
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct ospf_neighbor *</I> <B>n</B><DD><P>neighbor
<DT><I>int</I> <B>event</B><DD><P>actual event
</DL>
<H3>Description</H3>
<P>This part implements the neighbor state machine as described in 10.3 of
RFC 2328. The only difference is that state <I>NEIGHBOR_ATTEMPT</I> is not
used. We discover neighbors on nonbroadcast networks in the
same way as on broadcast networks. The only difference is in
sending hello packets. These are sent to IPs listed in
<B>ospf_iface</B>-&gt;nbma_list.
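<P>A reduced version of such a state machine, without the Attempt state, might look like the sketch below. It covers only a subset of the RFC 2328 events, the identifiers are hypothetical rather than BIRD's own constants, and all side effects (timers, packet transmission, DR election) are left out.

```c
/* States in the order of RFC 2328, with Attempt omitted */
enum nstate { N_DOWN, N_INIT, N_2WAY, N_EXSTART, N_EXCHANGE, N_LOADING, N_FULL };

/* A subset of the RFC 2328 neighbor events */
enum nevent {
  E_HELLO_RECEIVED, E_2WAY_RECEIVED, E_NEGOTIATION_DONE,
  E_EXCHANGE_DONE, E_LOADING_DONE, E_1WAY_RECEIVED, E_INACTIVITY
};

/* Hypothetical stand-in for ospf_neighbor */
struct nbr {
  enum nstate state;
  int want_adj;           /* should a full adjacency be formed? */
  int pending_requests;   /* LSAs still to be requested from the neighbor */
};

void nbr_event(struct nbr *n, enum nevent ev)
{
  switch (ev)
  {
  case E_HELLO_RECEIVED:
    if (n->state == N_DOWN)
      n->state = N_INIT;
    break;
  case E_2WAY_RECEIVED:
    if (n->state == N_INIT)
      n->state = n->want_adj ? N_EXSTART : N_2WAY;
    break;
  case E_NEGOTIATION_DONE:
    if (n->state == N_EXSTART)
      n->state = N_EXCHANGE;
    break;
  case E_EXCHANGE_DONE:
    if (n->state == N_EXCHANGE)
      n->state = n->pending_requests ? N_LOADING : N_FULL;
    break;
  case E_LOADING_DONE:
    if (n->state == N_LOADING)
      n->state = N_FULL;
    break;
  case E_1WAY_RECEIVED:
    if (n->state >= N_2WAY)
      n->state = N_INIT;     /* the neighbor no longer lists us */
    break;
  case E_INACTIVITY:
    n->state = N_DOWN;
    break;
  }
}
```

<P>The 1-way event is what a neighbor experiences when it receives a hello packet with an empty neighbor list: the adjacency falls back to the Init state.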


<HR><H3>Function</H3>
<P><I>void</I>
<B>ospf_dr_election</B>
(<I>struct ospf_iface *</I> <B>ifa</B>) --  (Backup) Designated Router election
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct ospf_iface *</I> <B>ifa</B><DD><P>actual interface
</DL>
<H3>Description</H3>
<P>When the wait timer fires, it is time to elect the (Backup) Designated Router.
A structure describing this router is added to the list of candidates so that
every electing router has the same list. The Backup Designated Router is elected
before the Designated Router. This process is described in 9.4 of RFC 2328. The function is
supposed to be called only from <B>ospf_iface_sm()</B> as a part of the interface
state machine.


<HR><H3>Function</H3>
<P><I>void</I>
<B>ospf_iface_chstate</B>
(<I>struct ospf_iface *</I> <B>ifa</B>, <I>u8</I> <B>state</B>) --  handle changes of interface state
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct ospf_iface *</I> <B>ifa</B><DD><P>OSPF interface
<DT><I>u8</I> <B>state</B><DD><P>new state
</DL>
<H3>Description</H3>
<P>Many actions must be taken according to interface state changes. Network
LSAs must be originated or flushed, new multicast sockets listening for messages
to <I>ALLDROUTERS</I> have to be opened, etc.


<HR><H3>Function</H3>
<P><I>void</I>
<B>ospf_iface_sm</B>
(<I>struct ospf_iface *</I> <B>ifa</B>, <I>int</I> <B>event</B>) --  OSPF interface state machine
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct ospf_iface *</I> <B>ifa</B><DD><P>OSPF interface
<DT><I>int</I> <B>event</B><DD><P>event coming to state machine
</DL>
<H3>Description</H3>
<P>This fully respects 9.3 of RFC 2328 except we have slightly
different handling of <I>DOWN</I> and <I>LOOP</I> state. We remove interfaces
that are <I>DOWN</I>. <I>DOWN</I> state is used when an interface is waiting
for a lock. <I>LOOP</I> state is used when an interface does not have a
link.


<HR><H3>Function</H3>
<P><I>int</I>
<B>ospf_rx_hook</B>
(<I>sock *</I> <B>sk</B>, <I>uint</I> <B>len</B>)
<H3>Arguments</H3>
<P>
<DL>
<DT><I>sock *</I> <B>sk</B><DD><P>socket on which the packet was received
<DT><I>uint</I> <B>len</B><DD><P>size of the packet
</DL>
<H3>Description</H3>
<P>This is the entry point for messages from neighbors. Many checks (like
authentication, checksums, size) are done before the packet is passed to
non-generic functions.


<HR><H3>Function</H3>
<P><I>int</I>
<B>lsa_validate</B>
(<I>struct ospf_lsa_header *</I> <B>lsa</B>, <I>u32</I> <B>lsa_type</B>, <I>int</I> <B>ospf2</B>, <I>void *</I> <B>body</B>) --  check whether given LSA is valid
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct ospf_lsa_header *</I> <B>lsa</B><DD><P>LSA header
<DT><I>u32</I> <B>lsa_type</B><DD><P>one of <I>LSA_T_xxx</I>
<DT><I>int</I> <B>ospf2</B><DD><P><I>true</I> means OSPF version 2, <I>false</I> means OSPF version 3
<DT><I>void *</I> <B>body</B><DD><P>pointer to LSA body
</DL>
<H3>Description</H3>
<P>Checks internal structure of given LSA body (minimal length,
consistency). Returns true if valid.


<HR><H3>Function</H3>
<P><I>void</I>
<B>ospf_send_dbdes</B>
(<I>struct ospf_proto *</I> <B>p</B>, <I>struct ospf_neighbor *</I> <B>n</B>) --  transmit database description packet
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct ospf_proto *</I> <B>p</B><DD><P>OSPF protocol instance
<DT><I>struct ospf_neighbor *</I> <B>n</B><DD><P>neighbor
</DL>
<H3>Description</H3>
<P>Sending of a database description packet is described in 10.8 of RFC 2328.
Reception of each packet is acknowledged in the sequence number of another.
When I send a packet to a neighbor I keep a copy in a buffer. If the neighbor
does not reply, I don't create a new packet but just send the content
of the buffer.


<HR><H3>Function</H3>
<P><I>void</I>
<B>ospf_rt_spf</B>
(<I>struct ospf_proto *</I> <B>p</B>) --  calculate internal routes
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct ospf_proto *</I> <B>p</B><DD><P>OSPF protocol instance
</DL>
<H3>Description</H3>
<P>Calculation of internal paths in an area is described in 16.1 of RFC 2328.
It's based on Dijkstra's shortest path tree algorithm.
This function is invoked from <B>ospf_disp()</B>.
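<P>For reference, the core of Dijkstra's algorithm is sketched below on a small adjacency matrix. This is a textbook illustration, not the BIRD implementation, which works on LSAs in the link-state database; the fixed node count and the convention that a zero cost means "no link" are assumptions of the sketch.

```c
#include <limits.h>

#define NODES   5
#define NO_LINK 0u   /* zero cost means there is no link (sketch convention) */

/*
 * Compute shortest path distances from root. dist[i] is UINT_MAX for
 * unreachable nodes, parent[i] is the previous hop on the shortest path
 * tree or -1 for the root and unreachable nodes.
 */
void spf(unsigned cost[NODES][NODES], int root,
         unsigned dist[NODES], int parent[NODES])
{
  int done[NODES] = { 0 };

  for (int i = 0; i < NODES; i++)
  {
    dist[i] = UINT_MAX;
    parent[i] = -1;
  }
  dist[root] = 0;

  for (;;)
  {
    /* Pick the closest candidate that is not yet on the tree */
    int u = -1;
    for (int i = 0; i < NODES; i++)
      if (!done[i] && dist[i] != UINT_MAX && (u < 0 || dist[i] < dist[u]))
        u = i;

    if (u < 0)
      break;                     /* no reachable candidates are left */
    done[u] = 1;

    /* Relax all links leaving the newly added node */
    for (int v = 0; v < NODES; v++)
      if (cost[u][v] != NO_LINK && !done[v] && dist[u] + cost[u][v] < dist[v])
      {
        dist[v] = dist[u] + cost[u][v];
        parent[v] = u;
      }
  }
}
```

<P>The linear search for the closest candidate keeps the sketch short; a priority queue is the usual choice for larger topologies.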

<H2><A NAME="ss5.6">5.6</A> <A HREF="prog.html#toc5.6">Pipe</A>
</H2>

<P>
<P>The Pipe protocol is very simple. It just connects to two routing tables
using <B>proto_add_announce_hook()</B> and whenever it receives a <B>rt_notify()</B>
about a change in one of the tables, it converts it to a <B>rte_update()</B>
in the other one.
<P>To avoid pipe loops, Pipe keeps a `being updated' flag in each routing
table.
<P>A pipe has two announce hooks, the first connected to the main
table, the second connected to the peer table. When a new route is
announced on the main table, it gets checked by an export filter in
ahook 1, and, after that, it is announced to the peer table via
<B>rte_update()</B>, where an import filter in ahook 2 is called. When a new
route is announced in the peer table, an export filter in ahook 2
and an import filter in ahook 1 are used. Obviously, there is no
need to filter the same route twice, so both import filters are
set to accept, while user configured 'import' and 'export' filters
are used as export filters in ahooks 2 and 1. Route limits are
handled similarly, but on the import side of ahooks.
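<P>The effect of the `being updated' flag can be shown with a toy model of two tables connected by a pipe. The structure and function names are hypothetical, routes are reduced to integers, and filters are left out; only the loop protection is modelled.

```c
/* Hypothetical stand-in for a routing table with a pipe attached */
struct table {
  int pipe_busy;        /* the `being updated' flag */
  int updates;          /* how many times the table was updated */
  int last_route;       /* stand-in for the stored route */
  struct table *peer;   /* table on the other side of the pipe */
};

void table_update(struct table *t, int route);

/* Called when src has changed: forward the change to the other table */
static void pipe_notify(struct table *src, int route)
{
  struct table *dst = src->peer;

  /* The change originated in dst, sending it back would loop forever */
  if (dst->pipe_busy)
    return;

  src->pipe_busy = 1;
  table_update(dst, route);
  src->pipe_busy = 0;
}

/* Store a route and announce the change through the pipe */
void table_update(struct table *t, int route)
{
  t->last_route = route;
  t->updates++;
  pipe_notify(t, route);
}
```

<P>An update of one table is propagated to the other exactly once: the notification generated by the second table finds the first one marked busy and is dropped.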
<P>
<P>
<H2><A NAME="ss5.7">5.7</A> <A HREF="prog.html#toc5.7">Routing Information Protocol (RIP)</A>
</H2>

<P>
<P>The RIP protocol is implemented in two files: <CODE>rip.c</CODE> containing the protocol
logic, route management and the protocol glue with BIRD core, and <CODE>packets.c</CODE>
handling RIP packet processing, RX, TX and protocol sockets.
<P>Each instance of RIP is described by a structure <I>rip_proto</I>, which contains
an internal RIP routing table, a list of protocol interfaces and the main
timer responsible for RIP routing table cleanup.
<P>RIP internal routing table contains incoming and outgoing routes. For each
network (represented by structure <I>rip_entry</I>) there is one outgoing route
stored directly in <I>rip_entry</I> and a one-way linked list of incoming routes
(structures <I>rip_rte</I>). The list contains incoming routes from different RIP
neighbors, but only routes with the lowest metric are stored (i.e., all
stored incoming routes have the same metric).
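<P>The rule that only lowest-metric incoming routes are kept can be sketched as follows. For brevity the sketch uses a small fixed array instead of a linked list, identifies neighbors by integers, and uses hypothetical names throughout; it is not the code of <B>rip_update_rte()</B>.

```c
#define MAX_RTES 8   /* capacity of the sketch, not a RIP limit */

/* Hypothetical stand-in for rip_rte */
struct in_rte {
  int from;     /* identifier of the announcing neighbor */
  int metric;   /* RIP metric of the route */
};

/* Hypothetical stand-in for the incoming part of rip_entry */
struct entry {
  struct in_rte rtes[MAX_RTES];
  int n;        /* number of stored routes, all with the same metric */
};

/* Process a reachable route for this network received from a neighbor */
void entry_update(struct entry *e, int from, int metric)
{
  /* Forget the previous route from the same neighbor, if any */
  int k = 0;
  for (int i = 0; i < e->n; i++)
    if (e->rtes[i].from != from)
      e->rtes[k++] = e->rtes[i];
  e->n = k;

  /* Worse than what we already have: do not store it */
  if (e->n > 0 && metric > e->rtes[0].metric)
    return;

  /* Strictly better: all stored routes become obsolete */
  if (e->n > 0 && metric < e->rtes[0].metric)
    e->n = 0;

  if (e->n < MAX_RTES)
    e->rtes[e->n++] = (struct in_rte) { from, metric };
}
```

<P>After each call all stored routes share one metric, so any of them can serve as a next hop for the route announced to the core.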
<P>Note that RIP itself does not select the outgoing route; that is done by the core
routing table. When a new incoming route is received, it is propagated to the
RIP table by <B>rip_update_rte()</B> and possibly stored in the list of incoming
routes. Then the change may be propagated to the core by <B>rip_announce_rte()</B>.
The core selects the best route and propagates it to RIP by <B>rip_rt_notify()</B>,
which updates the outgoing route part of <I>rip_entry</I> and possibly triggers route
propagation by <B>rip_trigger_update()</B>.
<P>RIP interfaces are represented by structures <I>rip_iface</I>. A RIP interface
contains a per-interface socket, a list of associated neighbors, interface
configuration, and state information related to scheduled interface events
and running update sessions. RIP interfaces are added and removed based on
core interface notifications.
<P>There are two RIP interface events - regular updates and triggered updates.
Both are managed from the RIP interface timer (<B>rip_iface_timer()</B>). Regular
updates are run at a fixed interval and propagate the whole routing table,
while triggered updates are scheduled by <B>rip_trigger_update()</B> due to some
routing table change and propagate only the routes modified since the time
they were scheduled. There are also unicast-destined requested updates, but
these are sent directly as a reaction to a received RIP request message. The
update session is started by <B>rip_send_table()</B>. There may be at most one
active update session per interface, as the associated state (including the
fib iterator) is stored directly in <I>rip_iface</I> structure.
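<P>The difference between regular and triggered updates comes down to a time limit applied while walking the table. The sketch below uses hypothetical names and a plain array in place of the fib; a limit of zero selects every route, as a regular update would.

```c
#include <stddef.h>

/* Hypothetical stand-in for the outgoing part of rip_entry */
struct rip_rt {
  unsigned net;    /* stand-in for the network prefix */
  long changed;    /* time of the last change of this route */
};

/*
 * Collect the routes that belong into an update. Routes changed before
 * changed_since are skipped. Pointers to selected routes are stored in
 * out, which must have room for n items. Returns the number selected.
 */
size_t collect_update(const struct rip_rt *tab, size_t n, long changed_since,
                      const struct rip_rt **out)
{
  size_t k = 0;

  for (size_t i = 0; i < n; i++)
    if (tab[i].changed >= changed_since)
      out[k++] = &tab[i];

  return k;
}
```

<P>A real update session additionally has to split the selected routes into packets and may be suspended and resumed when the socket buffer fills up, which is why the iterator state is kept in the interface structure.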
<P>RIP neighbors are represented by structures <I>rip_neighbor</I>. Compared to
neighbor handling in other routing protocols, RIP does not have explicit
neighbor discovery and adjacency maintenance, which makes the <I>rip_neighbor</I>
related code a bit peculiar. RIP neighbors are interlinked with core neighbor
structures (<I>neighbor</I>) and use core neighbor notifications to ensure that RIP
neighbors are timely removed. RIP neighbors are added based on received route
notifications and removed based on core neighbor and RIP interface events.
<P>RIP neighbors are linked by RIP routes and use a counter to track the number of
associated routes, but when these RIP routes time out, the associated RIP neighbor
is still alive (with a zero counter). When a RIP neighbor is removed but still
has some associated routes, it is not freed, just changed to the detached state
(core neighbors and RIP ifaces are unlinked), then during the main timer
cleanup phase the associated routes are removed and the <I>rip_neighbor</I>
structure is finally freed.
<P>Supported standards:
- RFC 1058 - RIPv1
- RFC 2453 - RIPv2
- RFC 2080 - RIPng
- RFC 4822 - RIP cryptographic authentication
<P>
<P><HR><H3>Function</H3>
<P><I>void</I>
<B>rip_announce_rte</B>
(<I>struct rip_proto *</I> <B>p</B>, <I>struct rip_entry *</I> <B>en</B>) --     announce route from RIP routing table to the core
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct rip_proto *</I> <B>p</B><DD><P>RIP instance
<DT><I>struct rip_entry *</I> <B>en</B><DD><P>related network
</DL>
<H3>Description</H3>
<P>The function takes a list of incoming routes from <B>en</B>, prepares an appropriate
<I>rte</I> for the core and propagates it by <B>rte_update()</B>.


<HR><H3>Function</H3>
<P><I>void</I>
<B>rip_update_rte</B>
(<I>struct rip_proto *</I> <B>p</B>, <I>ip_addr *</I> <B>prefix</B>, <I>int</I> <B>pxlen</B>, <I>struct rip_rte *</I> <B>new</B>) --     enter a route update to RIP routing table
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct rip_proto *</I> <B>p</B><DD><P>RIP instance
<DT><I>ip_addr *</I> <B>prefix</B><DD><P>network prefix
<DT><I>int</I> <B>pxlen</B><DD><P>network prefix length
<DT><I>struct rip_rte *</I> <B>new</B><DD><P>a <I>rip_rte</I> representing the new route
</DL>
<H3>Description</H3>
<P>The function is called by the RIP packet processing code whenever it receives
a reachable route. The appropriate routing table entry is found and the list
of incoming routes is updated. Eventually, the change is also propagated to
the core by <B>rip_announce_rte()</B>. Note that for unreachable routes,
<B>rip_withdraw_rte()</B> should be called instead of <B>rip_update_rte()</B>.


<HR><H3>Function</H3>
<P><I>void</I>
<B>rip_withdraw_rte</B>
(<I>struct rip_proto *</I> <B>p</B>, <I>ip_addr *</I> <B>prefix</B>, <I>int</I> <B>pxlen</B>, <I>struct rip_neighbor *</I> <B>from</B>) --     enter a route withdraw to RIP routing table
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct rip_proto *</I> <B>p</B><DD><P>RIP instance
<DT><I>ip_addr *</I> <B>prefix</B><DD><P>network prefix
<DT><I>int</I> <B>pxlen</B><DD><P>network prefix length
<DT><I>struct rip_neighbor *</I> <B>from</B><DD><P>a <I>rip_neighbor</I> propagating the withdraw
</DL>
<H3>Description</H3>
<P>The function is called by the RIP packet processing code whenever it receives
an unreachable route. The incoming route for given network from nbr <B>from</B> is
removed. Eventually, the change is also propagated by <B>rip_announce_rte()</B>.


<HR><H3>Function</H3>
<P><I>void</I>
<B>rip_timer</B>
(<I>timer *</I> <B>t</B>) --     RIP main timer hook
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>timer *</I> <B>t</B><DD><P>timer
</DL>
<H3>Description</H3>
<P>The RIP main timer is responsible for routing table maintenance. Invalid or
expired routes (<I>rip_rte</I>) are removed and garbage collection of stale routing
table entries (<I>rip_entry</I>) is done. Changes are propagated to core tables,
route reload is also done here. Note that garbage collection uses a maximal
GC time, while interfaces maintain an illusion of per-interface GC times in
<B>rip_send_response()</B>.
<P>Keeping incoming routes and the selected outgoing route are two independent
functions, therefore after garbage collection some entries now considered
invalid (RIP_ENTRY_DUMMY) still may have non-empty list of incoming routes,
while some valid entries (representing an outgoing route) may have that list
empty.
<P>The main timer is not scheduled periodically but it uses the time of the
current next event and the minimal interval of any possible event to compute
the time of the next run.


<HR><H3>Function</H3>
<P><I>void</I>
<B>rip_iface_timer</B>
(<I>timer *</I> <B>t</B>) --     RIP interface timer hook
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>timer *</I> <B>t</B><DD><P>timer
</DL>
<H3>Description</H3>
<P>RIP interface timers are responsible for scheduling both regular and
triggered updates. A fixed, delay-independent period is used for regular
updates, while a minimal separating interval is enforced for triggered updates.
The function also ensures that a new update is not started when the old one
is still running.


<HR><H3>Function</H3>
<P><I>void</I>
<B>rip_send_table</B>
(<I>struct rip_proto *</I> <B>p</B>, <I>struct rip_iface *</I> <B>ifa</B>, <I>ip_addr</I> <B>addr</B>, <I>bird_clock_t</I> <B>changed</B>) --  start a RIP update session
<P>
<H3>Arguments</H3>
<P>
<DL>
<DT><I>struct rip_proto *</I> <B>p</B><DD><P>RIP instance
<DT><I>struct rip_iface *</I> <B>ifa</B><DD><P>RIP interface
<DT><I>ip_addr</I> <B>addr</B><DD><P>destination IP address
<DT><I>bird_clock_t</I> <B>changed</B><DD><P>time limit for triggered updates
</DL>
<H3>Description</H3>
<P>The function activates an update session and starts sending routing update
packets (using <B>rip_send_response()</B>). The session may be finished during the
call or may continue in <B>rip_tx_hook()</B> until all appropriate routes are
transmitted. Note that there may be at most one active update session per
interface, the function will terminate the old active session before
activating the new one.

<H2><A NAME="ss5.8">5.8</A> <A HREF="prog.html#toc5.8">Router Advertisements</A>
</H2>

<P>
<P>The RAdv protocol is implemented in two files: <CODE>radv.c</CODE> containing the
interface with BIRD core and the protocol logic and <CODE>packets.c</CODE> handling low
level protocol stuff (RX, TX and packet formats). The protocol does not
export any routes.
<P>RAdv is structured in the usual way: for each handled interface there is
a structure <I>radv_iface</I> that contains the state related to that interface
together with its resources (a socket, a timer). There is also a prepared RA
stored in a TX buffer of the socket associated with an iface. These iface
structures are created and removed according to iface events from BIRD core
handled by <B>radv_if_notify()</B> callback.
<P>The main logic of RAdv consists of two functions: <B>radv_iface_notify()</B>, which
processes asynchronous events (specified by RA_EV_* codes), and <B>radv_timer()</B>,
which triggers sending RAs and computes the next timeout.
<P>The RAdv protocol can receive routes (through <B>radv_import_control()</B> and
<B>radv_rt_notify()</B>), but only the configured trigger route is tracked (in the
<I>active</I> variable). When a RAdv protocol is reconfigured, the connected routing
table is examined (in <B>radv_check_active()</B>) to obtain the proper <I>active</I> value in
case the specified trigger prefix was changed.
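<P>The trigger tracking can be sketched as below. The names
<I>radv_sketch</I>, <B>sketch_rt_notify()</B> and <B>sketch_check_active()</B> are
hypothetical, prefixes are compared as plain strings, and the routing table is
modelled as an array. Treating an instance without a trigger as always active
is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical protocol state; not BIRD's RAdv instance structure. */
struct radv_sketch {
  const char *trigger;  /* configured trigger prefix, NULL if none */
  bool active;          /* is the trigger route currently present? */
};

/* Route notification; `present` is false for a withdrawal.
 * Returns true when the active flag changed. */
bool
sketch_rt_notify(struct radv_sketch *p, const char *net, bool present)
{
  bool changed;

  /* Routes other than the trigger are ignored. */
  if (!p->trigger || strcmp(net, p->trigger) != 0)
    return false;

  changed = (p->active != present);
  p->active = present;
  return changed;
}

/* After reconfiguration: recompute the flag from the table contents. */
void
sketch_check_active(struct radv_sketch *p, const char *const *table, size_t n)
{
  size_t i;

  if (!p->trigger) {
    p->active = true;   /* assumption: no trigger means always active */
    return;
  }

  p->active = false;
  for (i = 0; i < n; i++)
    if (strcmp(table[i], p->trigger) == 0)
      p->active = true;
}
```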
<P>Supported standards:
<UL>
<LI>RFC 4861 - main RA standard
<LI>RFC 4191 - Default Router Preferences and More-Specific Routes
<LI>RFC 6106 - DNS extensions (RDNSS, DNSSL)
</UL>
<P>
<P>
<H2><A NAME="ss5.9">5.9</A> <A HREF="prog.html#toc5.9">Static</A>
</H2>

<P>
<P>The Static protocol is implemented in a straightforward way. It keeps
two lists of static routes: one containing interface routes and one
holding the remaining ones. Interface routes are inserted and removed according
to interface events received from the core via the <B>if_notify()</B> hook. Routes
pointing to a neighboring router use a sticky node in the neighbor cache
to be notified about gaining or losing the neighbor. Special
routes like black holes or rejects are always inserted.
<P>Multipath routes are tricky. Because these routes depend on
several neighbors, they must be integrated into the neighbor
notification handling. To do that, we use dummy static_route nodes, one for
each nexthop. Therefore, a multipath route consists of a master
static_route node (of dest RTD_MULTIPATH), which specifies the prefix
and is used in most circumstances, and a list of dummy static_route
nodes (of dest RTD_NONE), which store info about nexthops and are
connected to neighbor entries and neighbor notifications. Dummy
nodes are chained using mp_next, they are not in the other_routes list,
and they abuse some fields (masklen, if_name) for other purposes.
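<P>The master/dummy layout can be pictured with the following sketch. The
structure <I>sroute</I> and its fields are simplified stand-ins rather than the
real <I>static_route</I>, and keeping the route usable while at least one
nexthop is alive is an assumption made for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the RTD_NONE and RTD_MULTIPATH destination types. */
enum dest_kind { DEST_NONE, DEST_MULTIPATH };

/* Hypothetical, reduced route node. */
struct sroute {
  enum dest_kind dest;     /* DEST_MULTIPATH for master, DEST_NONE for dummy */
  struct sroute *mp_next;  /* chain of dummy nexthop nodes */
  const char *via;         /* nexthop address (dummy nodes only) */
  bool neigh_up;           /* updated from neighbor notifications */
};

/* Count the nexthops of a master node whose neighbor is reachable. */
size_t
mp_count_alive(const struct sroute *master)
{
  const struct sroute *d;
  size_t n = 0;

  for (d = master->mp_next; d; d = d->mp_next)
    if (d->neigh_up)
      n++;
  return n;
}

/* Neighbor notification for one dummy node.
 * Returns true if the multipath route still has a usable nexthop. */
bool
mp_neigh_notify(struct sroute *master, struct sroute *dummy, bool up)
{
  dummy->neigh_up = up;
  return mp_count_alive(master) > 0;
}
```

<P>Only the dummy nodes take part in neighbor notifications here; the master
node carries the route itself and is the entry point for everything else.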
<P>The only other thing worth mentioning is that when asked for reconfiguration,
Static not only compares the two configurations, but also calculates the
difference between the lists of static routes, so that it inserts only the
newly added routes and removes only the obsolete ones.
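<P>The list difference can be sketched as below. This is a simplified
illustration: routes are represented only by their prefix strings, the
function names <B>list_contains()</B> and <B>list_diff()</B> are hypothetical, and
a quadratic search is used for brevity.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if `key` occurs in `list`, 0 otherwise. */
int
list_contains(const char *const *list, size_t n, const char *key)
{
  size_t i;

  for (i = 0; i < n; i++)
    if (strcmp(list[i], key) == 0)
      return 1;
  return 0;
}

/* Store the entries of `a` that are missing from `b` into `out`
 * (which must have room for `na` entries) and return their count. */
size_t
list_diff(const char *const *a, size_t na,
          const char *const *b, size_t nb,
          const char **out)
{
  size_t i, k = 0;

  for (i = 0; i < na; i++)
    if (!list_contains(b, nb, a[i]))
      out[k++] = a[i];
  return k;
}
```

<P>Calling <B>list_diff()</B> with the new list first yields the routes to
insert; calling it with the old list first yields the routes to remove. Routes
present in both lists are left untouched.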
<P>
<P>
<H2><A NAME="ss5.10">5.10</A> <A HREF="prog.html#toc5.10">Direct</A>
</H2>

<P>
<P>The Direct protocol works by converting all <B>ifa_notify()</B> events it receives
to <B>rte_update()</B> calls for the corresponding network.
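<P>The conversion can be sketched as follows. The routing table is replaced by
a small fixed-size set of prefixes, and <B>table_update()</B> is a hypothetical
stand-in for <B>rte_update()</B>, not its real signature; an address going down
is modelled as a withdrawal of the corresponding network.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_NETS 8

/* Hypothetical stand-in for a routing table: a set of announced prefixes. */
struct table {
  char nets[MAX_NETS][48];
  bool used[MAX_NETS];
};

/* Stand-in for rte_update(); present == false means withdrawal. */
void
table_update(struct table *t, const char *net, bool present)
{
  int i, free_slot = -1;

  for (i = 0; i < MAX_NETS; i++) {
    if (t->used[i] && strcmp(t->nets[i], net) == 0) {
      t->used[i] = present;   /* keep or withdraw the existing entry */
      return;
    }
    if (!t->used[i] && free_slot < 0)
      free_slot = i;
  }

  if (present && free_slot >= 0) {
    strncpy(t->nets[free_slot], net, sizeof t->nets[free_slot] - 1);
    t->nets[free_slot][sizeof t->nets[free_slot] - 1] = '\0';
    t->used[free_slot] = true;
  }
}

/* Is the given network currently announced? */
bool
table_has(const struct table *t, const char *net)
{
  int i;

  for (i = 0; i < MAX_NETS; i++)
    if (t->used[i] && strcmp(t->nets[i], net) == 0)
      return true;
  return false;
}

/* Address notification: every event becomes one table update. */
void
direct_ifa_notify(struct table *t, const char *prefix, bool addr_up)
{
  table_update(t, prefix, addr_up);
}
```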
<P>
<P>
<P>
<HR>
<A HREF="prog-6.html">Next</A>
<A HREF="prog-4.html">Previous</A>
<A HREF="prog.html#toc5">Contents</A>
</BODY>
</HTML>
