FIB is a data structure designed for storage of routes indexed by their network prefixes. It supports insertion, deletion, searching by prefix, "routing" (in the CIDR sense, that is, searching for the longest prefix matching a given IP address) and (which makes the structure very tricky to implement) asynchronous reading, that is, enumerating the contents of a FIB while other modules add, modify or remove entries.
Internally, each FIB is represented as a collection of nodes of type fib_node indexed using a sophisticated hashing mechanism. We use two-stage hashing: first we calculate a 16-bit primary hash key independent of the hash table size, then we reduce the primary key modulo the table size to get the real hash key that determines the bucket containing the node. The lists of nodes in each bucket are sorted according to the primary hash key, hence if we keep the total number of buckets a power of two, re-hashing of the structure keeps the relative order of the nodes.
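The two-stage scheme can be illustrated with a minimal sketch. The mixing function primary_hash() and the helper bucket_of() are hypothetical names made up for this example (the text does not show the real hash function); only the structure, a size-independent 16-bit key reduced modulo a power-of-two table size, follows the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-bit primary hash of an IPv4 prefix. The multiplicative
 * mixing is an assumption for illustration, not the FIB's real function. */
static uint16_t primary_hash(uint32_t prefix)
{
    uint32_t x = prefix * 2654435761u;
    return (uint16_t)(x >> 16);
}

/* Second stage: reduce the primary key modulo a power-of-two table size
 * (2^order buckets). Masking is equivalent to the modulo here. */
static unsigned bucket_of(uint16_t key, unsigned order)
{
    assert(order <= 16);
    return key & ((1u << order) - 1u);
}
```

With this reduction, doubling the table from 2^k to 2^(k+1) buckets sends every node of bucket b either to b or to b + 2^k, so a bucket list sorted by primary key can be split in one pass without re-sorting.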
To get the asynchronous reading consistent over node deletions, we need to keep a list of readers for each node. When a node gets deleted, its readers are automatically moved to the next node in the table.
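As a rough illustration of the reader hand-over, here is a toy model on a plain singly linked list. All names (toy_node, toy_iter, toy_park, toy_delete) are invented for this sketch and the real FIB layout is different; the point is only that deleting a node re-parks its readers on the following node.

```c
#include <assert.h>
#include <stddef.h>

struct toy_iter;

struct toy_node {
    struct toy_node *next;      /* next node in reading order */
    struct toy_iter *readers;   /* iterators currently parked on this node */
    int value;
};

struct toy_iter {
    struct toy_iter *next_reader;  /* next iterator parked on the same node */
    struct toy_node *at;           /* node where the iteration will resume */
};

/* Park iterator it on node n; n == NULL means "end of table". */
static void toy_park(struct toy_iter *it, struct toy_node *n)
{
    assert(it != NULL);
    it->at = n;
    it->next_reader = NULL;
    if (n) {
        it->next_reader = n->readers;
        n->readers = it;
    }
}

/* Unlink n from the list at *head and move its readers to the next node. */
static void toy_delete(struct toy_node **head, struct toy_node *n)
{
    struct toy_node **pp = head;
    while (*pp && *pp != n)
        pp = &(*pp)->next;
    if (!*pp)
        return;                 /* n is not on this list */
    *pp = n->next;

    while (n->readers) {
        struct toy_iter *it = n->readers;
        n->readers = it->next_reader;
        toy_park(it, n->next);
    }
    n->next = NULL;
}
```

A suspended iterator therefore never points at freed memory: after a deletion it simply resumes one node later.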
Basic FIB operations are performed by functions defined by this module; enumeration of FIB contents is accomplished by using the FIB_WALK() macro, or FIB_ITERATE_START() if you want to do it asynchronously.
For simple iteration just place the body of the loop between FIB_WALK() and FIB_WALK_END(). You can't modify the FIB during the iteration (you can modify data in the node, but not add or remove nodes).
If you need more freedom, you can use the FIB_ITERATE_*() group of macros. First, you initialize an iterator with FIB_ITERATE_INIT(). Then you can put the loop body in between FIB_ITERATE_START() and FIB_ITERATE_END(). In addition, the iteration can be suspended by calling FIB_ITERATE_PUT(). This'll link the iterator inside the FIB. While suspended, you may modify the FIB, exit the current function, etc. To resume the iteration, enter the loop again. You can use FIB_ITERATE_UNLINK() to unlink the iterator (while iteration is suspended) in cases like premature end of FIB iteration.
Note that the iterator must not be destroyed while the iteration is suspended, since the FIB would then contain a pointer to invalid memory. Therefore, after each FIB_ITERATE_INIT() or FIB_ITERATE_PUT() there must be either FIB_ITERATE_START() or FIB_ITERATE_UNLINK() before the iterator is destroyed.
void fib_init (struct fib * f, pool * p, unsigned node_size, unsigned hash_order, fib_init_func init) -- initialize a new FIB
the FIB to be initialized (the structure itself being allocated by the caller)
pool to allocate the nodes in
node size to be used (each node consists of a standard header fib_node followed by user data)
initial hash order (a binary logarithm of hash table size), 0 to use default order (recommended)
pointer to a function to be called to initialize a newly created node
This function initializes a newly allocated FIB and prepares it for use.
void * fib_find (struct fib * f, ip_addr * a, int len) -- search for FIB node by prefix
FIB to search in
pointer to IP address of the prefix
prefix length
Search for a FIB node corresponding to the given prefix, return a pointer to it or NULL if no such node exists.
void * fib_get (struct fib * f, ip_addr * a, int len) -- find or create a FIB node
FIB to work with
pointer to IP address of the prefix
prefix length
Search for a FIB node corresponding to the given prefix and return a pointer to it. If no such node exists, create it.
void * fib_route (struct fib * f, ip_addr a, int len) -- CIDR routing lookup
FIB to search in
IP address of the prefix
prefix length
Search for a FIB node with longest prefix matching the given network, that is a node which a CIDR router would use for routing that network.
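One straightforward way to get longest-prefix semantics out of an exact-match lookup is to retry with ever shorter prefixes. The sketch below does this over a small IPv4 array; toy_lpm_route, toy_find() and toy_route_lookup() are hypothetical names, and the linear search merely stands in for an exact-match lookup such as fib_find(). It is not claimed to be how fib_route() is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_lpm_route { uint32_t prefix; int len; const char *via; };

/* Network mask for a prefix length of 0..32 bits. */
static uint32_t toy_mask(int len)
{
    assert(len >= 0 && len <= 32);
    return len ? ~(uint32_t)0 << (32 - len) : 0;
}

/* Exact-match lookup (linear here; a hash table in a real FIB). */
static const struct toy_lpm_route *
toy_find(const struct toy_lpm_route *t, size_t n, uint32_t prefix, int len)
{
    for (size_t i = 0; i < n; i++)
        if (t[i].len == len && t[i].prefix == prefix)
            return &t[i];
    return NULL;
}

/* Longest-prefix match: try len, len-1, ..., 0 until an entry exists. */
static const struct toy_lpm_route *
toy_route_lookup(const struct toy_lpm_route *t, size_t n, uint32_t addr, int len)
{
    for (; len >= 0; len--) {
        const struct toy_lpm_route *r = toy_find(t, n, addr & toy_mask(len), len);
        if (r)
            return r;
    }
    return NULL;
}
```

The cost is at most one exact lookup per prefix length, which is acceptable when each lookup is a cheap hash probe.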
void fib_delete (struct fib * f, void * E) -- delete a FIB node
FIB to delete from
entry to delete
This function removes the given entry from the FIB, taking care of all the asynchronous readers by shifting them to the next node in the canonical reading order.
void fib_free (struct fib * f) -- delete a FIB
FIB to be deleted
This function deletes a FIB -- it frees all memory associated with it and all its entries.
void fib_check (struct fib * f) -- audit a FIB
FIB to be checked
This debugging function audits a FIB by checking its internal consistency. Use when you suspect somebody of corrupting innocent data structures.
Routing tables are probably the most important structures BIRD uses. They hold all the information about known networks, the associated routes and their attributes.
There are multiple routing tables (a primary one together with any number of secondary ones if requested by the configuration). Each table is basically a FIB containing entries describing the individual destination networks. For each network (represented by structure net), there is a one-way linked list of route entries (rte), the first entry on the list being the best one (i.e., the one we currently use for routing), the order of the other ones is undetermined.
The rte contains information specific to the route (preference, protocol metrics, time of last modification etc.) and a pointer to a rta structure (see the route attribute module for a precise explanation) holding the remaining route attributes which are expected to be shared by multiple routes in order to conserve memory.
rte * rte_find (net * net, struct rte_src * src) -- find a route
network node
route source
The rte_find() function returns a route for destination net which is from route source src.
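The list discipline described above (best route first, remaining order unspecified) and a lookup by source can be sketched as follows. The structures are deliberately simplified stand-ins: toy_rte, toy_net, the integer src_id and the "higher pref wins" comparison are all assumptions of this example, not the real net/rte definitions.

```c
#include <assert.h>
#include <stddef.h>

struct toy_rte {
    struct toy_rte *next;   /* next route for the same network */
    int pref;               /* higher wins; stand-in for the real comparison */
    int src_id;             /* hypothetical identifier of the route source */
};

struct toy_net {
    struct toy_rte *routes; /* the head of the list is the best route */
};

/* Link r into n. A route better than the current best becomes the head;
 * any other route goes right behind the head, because only the head's
 * position carries meaning. */
static void toy_net_add(struct toy_net *n, struct toy_rte *r)
{
    assert(n && r);
    if (!n->routes || r->pref > n->routes->pref) {
        r->next = n->routes;
        n->routes = r;
    } else {
        r->next = n->routes->next;
        n->routes->next = r;
    }
}

/* Return the route that came from the given source, or NULL. */
static struct toy_rte *toy_rte_find(struct toy_net *n, int src_id)
{
    struct toy_rte *r = n->routes;
    while (r && r->src_id != src_id)
        r = r->next;
    return r;
}
```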
rte * rte_get_temp (rta * a) -- get a temporary rte
attributes to assign to the new route (a rta; in case it's un-cached, rte_update() will create a cached copy automatically)
Create a temporary rte and bind it with the attributes a. Also set route preference to the default preference set for the protocol.
rte * rte_cow_rta (rte * r, linpool * lp) -- get a private writable copy of rte with writable rta
a route entry to be copied
a linpool from which to allocate rta
rte_cow_rta() takes a rte and prepares it and its associated rta for modification. There are three possibilities: First, both rte and rta are private copies; in that case they are returned unchanged. Second, rte is a private copy, but rta is cached; in that case rta is duplicated using rta_do_cow(). Third, rte is shared and rta is cached; in that case both structures are duplicated by rte_do_cow() and rta_do_cow().
Note that in the second case, cached rta loses one reference, while private copy created by rta_do_cow() is a shallow copy sharing indirect data (eattrs, nexthops, ...) with it. To work properly, original shared rta should have another reference during the life of created private copy.
a pointer to the new writable rte with writable rta.
void rte_announce (rtable * tab, unsigned type, net * net, rte * new, rte * old, rte * new_best, rte * old_best, rte * before_old) -- announce a routing table change
table the route has been added to
type of route announcement (RA_OPTIMAL or RA_ANY)
network in question
the new route to be announced
the previous route for the same network
the new best route for the same network
the previous best route for the same network
The previous route before old for the same network. If before_old is NULL old was the first.
This function gets a routing table update and announces it to all protocols that accept the given type of route announcement and are connected to the same table by their announcement hooks.
Route announcement of type RA_OPTIMAL is generated when the optimal route (in routing table tab) changes. In that case old stores the old optimal route.
Route announcement of type RA_ANY is generated when any route (in routing table tab) changes. In that case old stores the old route from the same protocol.
For each appropriate protocol, we first call its import_control() hook which performs basic checks on the route (each protocol has a right to veto or force accept of the route before any filter is asked) and adds default values of attributes specific to the new protocol (metrics, tags etc.). Then it consults the protocol's export filter and if it accepts the route, the rt_notify() hook of the protocol gets called.
void rte_free (rte * e) -- delete a rte
rte to be deleted
rte_free() deletes the given rte from the routing table it's linked to.
void rte_update2 (struct announce_hook * ah, net * net, rte * new, struct rte_src * src) -- enter a new update to a routing table
pointer to table announce hook
network node
a rte representing the new route or NULL for route removal.
protocol originating the update
This function is called by the routing protocols whenever they discover a new route or wish to update/remove an existing route. The right announcement sequence is to build route attributes first (either un-cached with aflags set to zero or a cached one using rta_lookup(); in this case please note that you need to increase the use count of the attributes yourself by calling rta_clone()), call rte_get_temp() to obtain a temporary rte, fill in all the appropriate data and finally submit the new rte by calling rte_update().
src specifies the protocol that originally created the route and the meaning of protocol-dependent data of new. If new is not NULL, src has to be the same value as new->attrs->proto. p specifies the protocol that called rte_update(). In most cases it is the same protocol as src. rte_update() stores p in new->sender.
When rte_update() gets any route, it automatically validates it (checks whether the network and next hop address are valid IP addresses and also whether a normal routing protocol doesn't try to smuggle a host or link scope route to the table), converts all protocol dependent attributes stored in the rte to temporary extended attributes, consults import filters of the protocol to see if the route should be accepted and/or its attributes modified, and stores the temporary attributes back to the rte.
Now, having a "public" version of the route, we automatically find any old route defined by the protocol src for network net, replace it by the new one (or remove it if new is NULL), recalculate the optimal route for this destination and finally broadcast the change (if any) to all routing protocols by calling rte_announce().
All memory used for attribute lists and other temporary allocations is taken from a special linear pool rte_update_pool and freed when rte_update() finishes.
void rt_refresh_begin (rtable * t, struct announce_hook * ah) -- start a refresh cycle
related routing table
related announce hook
This function starts a refresh cycle for given routing table and announce hook. The refresh cycle is a sequence where the protocol sends all its valid routes to the routing table (by rte_update()). After that, all protocol routes (more precisely routes with ah as sender) not sent during the refresh cycle but still in the table from the past are pruned. This is implemented by marking all related routes as stale by REF_STALE flag in rt_refresh_begin(), then marking all related stale routes with REF_DISCARD flag in rt_refresh_end() and then removing such routes in the prune loop.
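The mark-and-sweep flavour of the refresh cycle can be shown on a flat array. Everything here is a simplified stand-in: TOY_STALE and TOY_DISCARD mimic REF_STALE and REF_DISCARD, the integer sender replaces the announce hook, and toy_update() models only the effect that a re-announced route is no longer stale.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_STALE   0x1u   /* hypothetical stand-in for REF_STALE */
#define TOY_DISCARD 0x2u   /* hypothetical stand-in for REF_DISCARD */

struct toy_ref_route { int sender; unsigned flags; int present; };

/* Begin: every route of this sender becomes stale. */
static void toy_refresh_begin(struct toy_ref_route *r, size_t n, int sender)
{
    for (size_t i = 0; i < n; i++)
        if (r[i].present && r[i].sender == sender)
            r[i].flags |= TOY_STALE;
}

/* A route re-sent during the cycle replaces the old one, so it is fresh. */
static void toy_update(struct toy_ref_route *r)
{
    assert(r != NULL);
    r->flags &= ~TOY_STALE;
}

/* End: whatever is still stale gets scheduled for removal. */
static void toy_refresh_end(struct toy_ref_route *r, size_t n, int sender)
{
    for (size_t i = 0; i < n; i++)
        if (r[i].present && r[i].sender == sender && (r[i].flags & TOY_STALE))
            r[i].flags |= TOY_DISCARD;
}

/* Prune: remove discarded routes, return how many were removed. */
static size_t toy_prune(struct toy_ref_route *r, size_t n)
{
    size_t removed = 0;
    for (size_t i = 0; i < n; i++)
        if (r[i].present && (r[i].flags & TOY_DISCARD)) {
            r[i].present = 0;
            removed++;
        }
    return removed;
}
```

Splitting the work into marking and a later prune pass keeps rt_refresh_end() cheap; the actual removal can then be spread over time.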
void rt_refresh_end (rtable * t, struct announce_hook * ah) -- end a refresh cycle
related routing table
related announce hook
This function ends a refresh cycle for the given routing table and announce hook. See rt_refresh_begin() for a description of refresh cycles.
void rte_dump (rte * e) -- dump a route
rte to be dumped
This function dumps contents of a rte to debug output.
void rt_dump (rtable * t) -- dump a routing table
routing table to be dumped
This function dumps contents of a given routing table to debug output.
void rt_dump_all (void) -- dump all routing tables
This function dumps contents of all routing tables to debug output.
void rt_init (void) -- initialize routing tables
This function is called during BIRD startup. It initializes the routing table module.
int rt_prune_table (rtable * tab) -- prune a routing table
a routing table for pruning
This function scans the routing table tab and removes routes belonging to flushing protocols, discarded routes and also stale network entries, in a fashion similar to rt_prune_loop(). Returns 1 when all such routes are pruned. Contrary to rt_prune_loop(), this function is not a part of the protocol flushing loop, but it is called from rt_event() for just one routing table.
Note that rt_prune_table() and rt_prune_loop() share (for each table) the prune state (prune_state) and also the pruning iterator (prune_fit).
int rt_prune_loop (void) -- prune routing tables
The prune loop scans routing tables and removes routes belonging to flushing protocols, discarded routes and also stale network entries. Returns 1 when all such routes are pruned. It is a part of the protocol flushing loop.
void rt_lock_table (rtable * r) -- lock a routing table
routing table to be locked
Lock a routing table, because it's in use by a protocol, preventing it from being freed when it gets undefined in a new configuration.
void rt_unlock_table (rtable * r) -- unlock a routing table
routing table to be unlocked
Unlock a routing table formerly locked by rt_lock_table(), that is decrease its use count and delete it if it's scheduled for deletion by configuration changes.
void rt_commit (struct config * new, struct config * old) -- commit new routing table configuration
new configuration
original configuration or NULL if it's boot time config
Scan differences between old and new configuration and modify the routing tables according to these changes. If new defines a previously unknown table, create it, if it omits a table existing in old, schedule it for deletion (it gets deleted when all protocols disconnect from it by calling rt_unlock_table()), if it exists in both configurations, leave it unchanged.
int rt_feed_baby (struct proto * p) -- advertise routes to a new protocol
protocol to be fed
This function performs one pass of advertisement of routes to a newly initialized protocol. It's called by the protocol code as long as it has something to do. (We avoid transferring all the routes in single pass in order not to monopolize CPU time.)
void rt_feed_baby_abort (struct proto * p) -- abort protocol feeding
protocol
This function is called by the protocol code when the protocol stops or ceases to exist before the last iteration of rt_feed_baby() has finished.
net * net_find (rtable * tab, ip_addr addr, unsigned len) -- find a network entry
a routing table
address of the network
length of the network prefix
net_find() looks up the given network in routing table tab and returns a pointer to its net entry or NULL if no such network exists.
net * net_get (rtable * tab, ip_addr addr, unsigned len) -- obtain a network entry
a routing table
address of the network
length of the network prefix
net_get() looks up the given network in routing table tab and returns a pointer to its net entry. If no such entry exists, it's created.
rte * rte_cow (rte * r) -- copy a route for writing
a route entry to be copied
rte_cow() takes a rte and prepares it for modification. The exact action taken depends on the flags of the rte -- if it's a temporary entry, it's just returned unchanged, else a new temporary entry with the same contents is created.
The primary use of this function is inside the filter machinery -- when a filter wants to modify rte contents (to change the preference or to attach another set of attributes), it must ensure that the rte is not shared with anyone else (and especially that it isn't stored in any routing table).
a pointer to the new writable rte.
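The copy-on-write rule for route entries reduces to a few lines. This is a sketch only: toy_cow_rte and its temporary flag are invented for the example, and malloc() replaces whatever allocator the real code uses.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_cow_rte {
    int temporary;   /* nonzero: private entry, safe to modify in place */
    int pref;        /* some payload a filter might want to change */
};

/* Return an entry that may be modified without affecting other users.
 * Returns NULL only when a needed copy cannot be allocated. */
static struct toy_cow_rte *toy_rte_cow(struct toy_cow_rte *r)
{
    assert(r != NULL);
    if (r->temporary)
        return r;                /* already private, nothing to copy */

    struct toy_cow_rte *c = malloc(sizeof *c);
    if (!c)
        return NULL;
    *c = *r;                     /* same contents ... */
    c->temporary = 1;            /* ... but owned by the caller */
    return c;
}
```

A caller always continues with the returned pointer, so the shared original is never written to.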
Each route entry carries a set of route attributes. Several of them vary from route to route, but most attributes are usually common for a large number of routes. To conserve memory, we've decided to store only the varying ones directly in the rte and hold the rest in a special structure called rta which is shared among all the rte's with these attributes.
Each rta contains all the static attributes of the route (i.e., those which are always present) as structure members and a list of dynamic attributes represented by a linked list of ea_list structures, each of them consisting of an array of eattr's containing the individual attributes. An attribute can be specified more than once in the ea_list chain and in such case the first occurrence overrides the others. This semantics is used especially when someone (for example a filter) wishes to alter values of several dynamic attributes, but it wants to preserve the original attribute lists maintained by another module.
Each eattr contains an attribute identifier (split to protocol ID and per-protocol attribute ID), protocol dependent flags, a type code (consisting of several bit fields describing attribute characteristics) and either an embedded 32-bit value or a pointer to a adata structure holding attribute contents.
There exist two variants of rta's -- cached and un-cached ones. Un-cached rta's can have arbitrarily complex structure of ea_list's and they can be modified by any module in the route processing chain. Cached rta's have their attribute lists normalized (that means at most one ea_list is present and its values are sorted in order to speed up searching), they are stored in a hash table to make fast lookup possible and they are provided with a use count to allow sharing.
Routing tables always contain only cached rta's.
struct mpnh * mpnh_merge (struct mpnh * x, struct mpnh * y, int rx, int ry, int max, linpool * lp) -- merge nexthop lists
list 1
list 2
reusability of list x
reusability of list y
max number of nexthops
linpool for allocating nexthops
The mpnh_merge() function takes two nexthop lists x and y and merges them, eliminating possible duplicates. The input lists must be sorted and the result is sorted too. The number of nexthops in result is limited by max. New nodes are allocated from linpool lp.
The arguments rx and ry specify whether corresponding input lists may be consumed by the function (i.e. their nodes reused in the resulting list), in that case the caller should not access these lists after that. To eliminate issues with deallocation of these lists, the caller should use some form of bulk deallocation (e.g. stack or linpool) to free these nodes when the resulting list is no longer needed. When reusability is not set, the corresponding lists are not modified nor linked from the resulting list.
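The core of such a merge (two sorted inputs, duplicates dropped, output capped at max) is shown below on plain integer arrays. This is a simplification: toy_merge() is a made-up helper, integers stand in for nexthops, and node reuse controlled by rx and ry is not modelled. Each input is assumed to be free of internal duplicates.

```c
#include <assert.h>
#include <stddef.h>

/* Merge sorted arrays x and y into out, keeping one copy of values that
 * appear in both, and writing at most max elements. Returns the count. */
static size_t toy_merge(const int *x, size_t nx, const int *y, size_t ny,
                        int *out, size_t max)
{
    size_t i = 0, j = 0, k = 0;

    assert(out != NULL || max == 0);
    while (k < max && (i < nx || j < ny)) {
        int v;
        if (j >= ny || (i < nx && x[i] < y[j]))
            v = x[i++];          /* only x left, or x is smaller */
        else if (i >= nx || y[j] < x[i])
            v = y[j++];          /* only y left, or y is smaller */
        else {
            v = x[i++];          /* equal: take one, skip the other */
            j++;
        }
        out[k++] = v;
    }
    return k;
}
```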
eattr * ea_find (ea_list * e, unsigned id) -- find an extended attribute
attribute list to search in
attribute ID to search for
Given an extended attribute list, ea_find() searches for a first occurrence of an attribute with specified ID, returning either a pointer to its eattr structure or NULL if no such attribute exists.
eattr * ea_walk (struct ea_walk_state * s, uint id, uint max) -- walk through extended attributes
walk state structure
start of attribute ID interval
length of attribute ID interval
Given an extended attribute list, ea_walk() walks through the list looking for first occurrences of attributes with ID in specified interval from id to (id + max - 1), returning pointers to found eattr structures, storing its walk state in s for subsequent calls.
The function ea_walk() is supposed to be called in a loop, with the walk state structure s initially zeroed and then filled with the initial extended attribute list, returning one found attribute in each call or NULL when no other attribute exists. The extended attribute list or the arguments should not be modified between calls. The maximum value of max is 128.
int ea_get_int (ea_list * e, unsigned id, int def) -- fetch an integer attribute
attribute list
attribute ID
default value
This function is a shortcut for retrieving a value of an integer attribute by calling ea_find() to find the attribute, extracting its value or returning a provided default if no such attribute is present.
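The "first occurrence wins" rule is easiest to see on a chain where an override segment sits in front of the original list. The sketch uses invented types (toy_attr, toy_ea_list) with a plain linear scan; it makes no claim about the real ea_list layout or search strategy.

```c
#include <assert.h>
#include <stddef.h>

struct toy_attr { unsigned id; int value; };

struct toy_ea_list {
    const struct toy_ea_list *next;  /* following segment of the chain */
    size_t count;                    /* number of attributes in attrs */
    const struct toy_attr *attrs;
};

/* Return the first attribute with the given ID anywhere in the chain. */
static const struct toy_attr *
toy_ea_find(const struct toy_ea_list *e, unsigned id)
{
    for (; e; e = e->next) {
        assert(e->count == 0 || e->attrs != NULL);
        for (size_t i = 0; i < e->count; i++)
            if (e->attrs[i].id == id)
                return &e->attrs[i];
    }
    return NULL;
}

/* Shortcut: the attribute's value, or def if the attribute is absent. */
static int toy_ea_get_int(const struct toy_ea_list *e, unsigned id, int def)
{
    const struct toy_attr *a = toy_ea_find(e, id);
    return a ? a->value : def;
}
```

A filter can thus change an attribute by prepending a small segment, leaving the list maintained by another module untouched.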
void ea_sort (ea_list * e) -- sort an attribute list
list to be sorted
This function takes a ea_list chain and sorts the attributes within each of its entries.
If an attribute occurs multiple times in a single ea_list, ea_sort() leaves only the first (the only significant) occurrence.
unsigned ea_scan (ea_list * e) -- estimate attribute list size
attribute list
This function calculates an upper bound of the size of a given ea_list after merging with ea_merge().
void ea_merge (ea_list * e, ea_list * t) -- merge segments of an attribute list
attribute list
buffer to store the result to
This function takes a possibly multi-segment attribute list and merges all of its segments to one.
The primary use of this function is for ea_list normalization: first call ea_scan() to determine how much memory the result will take, then allocate a buffer (usually using alloca()), merge the segments with ea_merge() and finally sort and prune the result by calling ea_sort().
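The three normalization steps can be sketched on simplified types. The names toy_seg, toy_scan(), toy_merge_segs() and toy_sort_prune() are invented here, the buffer is supplied by the caller instead of alloca(), and the stable insertion sort is an assumption chosen so that the first occurrence of each ID survives pruning.

```c
#include <assert.h>
#include <stddef.h>

struct toy_nattr { unsigned id; int value; };

struct toy_seg {
    const struct toy_seg *next;
    size_t count;
    const struct toy_nattr *attrs;
};

/* Step 1: upper bound on the merged size (duplicates still counted). */
static size_t toy_scan(const struct toy_seg *e)
{
    size_t n = 0;
    for (; e; e = e->next)
        n += e->count;
    return n;
}

/* Step 2: copy all segments, in chain order, into one flat buffer. */
static size_t toy_merge_segs(const struct toy_seg *e, struct toy_nattr *buf)
{
    size_t n = 0;
    assert(buf != NULL);
    for (; e; e = e->next)
        for (size_t i = 0; i < e->count; i++)
            buf[n++] = e->attrs[i];
    return n;
}

/* Step 3: stable sort by ID, then keep only the first entry per ID. */
static size_t toy_sort_prune(struct toy_nattr *a, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        struct toy_nattr key = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1].id > key.id) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = key;
    }

    size_t out = 0;
    for (size_t i = 0; i < n; i++)
        if (out == 0 || a[out - 1].id != a[i].id)
            a[out++] = a[i];
    return out;
}
```

Stability matters: an unstable sort could reorder equal IDs and let a later, overridden value win.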
int ea_same (ea_list * x, ea_list * y) -- compare two ea_list's
attribute list
attribute list
ea_same() compares two normalized attribute lists x and y and returns 1 if they contain the same attributes, 0 otherwise.
void ea_show (struct cli * c, eattr * e) -- print an eattr to CLI
destination CLI
attribute to be printed
This function takes an extended attribute represented by its eattr structure and prints it to the CLI according to the type information.
If the protocol defining the attribute provides its own get_attr() hook, it's consulted first.
void ea_dump (ea_list * e) -- dump an extended attribute
attribute to be dumped
ea_dump() dumps contents of the extended attribute given to the debug output.
uint ea_hash (ea_list * e) -- calculate an ea_list hash key
attribute list
ea_hash() takes an extended attribute list and calculates a hopefully uniformly distributed hash value from its contents.
ea_list * ea_append (ea_list * to, ea_list * what) -- concatenate ea_list's
destination list (can be NULL)
list to be appended (can be NULL)
This function appends the ea_list what at the end of ea_list to and returns a pointer to the resulting list.
rta * rta_lookup (rta * o) -- look up a rta in attribute cache
an un-cached rta
rta_lookup() gets an un-cached rta structure and returns its cached counterpart. It starts with examining the attribute cache to see whether there exists a matching entry. If such an entry exists, it's returned and its use count is incremented, else a new entry is created with use count set to 1.
The extended attribute lists attached to the rta are automatically converted to the normalized form.
void rta_dump (rta * a) -- dump route attributes
attribute structure to dump
This function takes a rta and dumps its contents to the debug output.
void rta_dump_all (void) -- dump attribute cache
This function dumps the whole contents of route attribute cache to the debug output.
void rta_init (void) -- initialize route attribute cache
This function is called during initialization of the routing table module to set up the internals of the attribute cache.
rta * rta_clone (rta * r) -- clone route attributes
a rta to be cloned
rta_clone() takes a cached rta and returns its identical cached copy. Currently it works by just returning the original rta with its use count incremented.
void rta_free (rta * r) -- free route attributes
a rta to be freed
If you stop using a rta (for example when deleting a route which uses it), you need to call rta_free() to notify the attribute cache the attribute is no longer in use and can be freed if you were the last user (which rta_free() tests by inspecting the use count).
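The interplay of lookup, clone and free comes down to use counting. Below is a toy cache that mirrors the described behaviour; toy_rta, its two integer fields, the global toy_cache list (instead of a hash table) and malloc()-based allocation are all assumptions of this sketch.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_rta {
    struct toy_rta *next;   /* chain of cached entries */
    unsigned uc;            /* use count */
    int gw;                 /* stand-ins for real attribute fields */
    int metric;
};

static struct toy_rta *toy_cache;

/* Return the cached counterpart of o, creating it if necessary. */
static struct toy_rta *toy_rta_lookup(const struct toy_rta *o)
{
    for (struct toy_rta *r = toy_cache; r; r = r->next)
        if (r->gw == o->gw && r->metric == o->metric) {
            r->uc++;                 /* existing entry: one more user */
            return r;
        }

    struct toy_rta *r = malloc(sizeof *r);
    if (!r)
        return NULL;
    r->gw = o->gw;
    r->metric = o->metric;
    r->uc = 1;                       /* new entry: the caller is its user */
    r->next = toy_cache;
    toy_cache = r;
    return r;
}

/* "Copy" a cached entry by taking another reference to it. */
static struct toy_rta *toy_rta_clone(struct toy_rta *r)
{
    r->uc++;
    return r;
}

/* Drop one reference; the last user removes the entry from the cache. */
static void toy_rta_free(struct toy_rta *r)
{
    assert(r && r->uc > 0);
    if (--r->uc)
        return;

    struct toy_rta **pp = &toy_cache;
    while (*pp && *pp != r)
        pp = &(*pp)->next;
    if (*pp)
        *pp = r->next;
    free(r);
}
```

Every successful lookup or clone must eventually be paired with exactly one free, otherwise the entry either leaks or disappears while still in use.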
The routing protocols are the bird's heart and a fine amount of code is dedicated to their management and for providing support functions to them. (-: Actually, this is the reason why the directory with sources of the core code is called nest :-).
When talking about protocols, one needs to distinguish between protocols and protocol instances. A protocol exists exactly once, not depending on whether it's configured or not and it can have an arbitrary number of instances corresponding to its "incarnations" requested by the configuration file. Each instance is completely autonomous, has its own configuration, its own status, its own set of routes and its own set of interfaces it works on.
A protocol is represented by a protocol structure containing all the basic information (protocol name, default settings and pointers to most of the protocol hooks). All these structures are linked in the protocol_list list.
Each instance has its own proto structure describing all its properties: protocol type, configuration, a resource pool where all resources belonging to the instance live, various protocol attributes (take a look at the declaration of proto in protocol.h), protocol states (see below for what they mean), connections to routing tables, filters attached to the protocol and finally a set of pointers to the rest of protocol hooks (they are the same for all instances of the protocol, but in order to avoid extra indirections when calling the hooks from the fast path, they are stored directly in proto). The instance is always linked in both the global instance list (proto_list) and a per-status list (either active_proto_list for running protocols, initial_proto_list for protocols being initialized or flush_proto_list when the protocol is being shut down).
The protocol hooks are described in the next chapter. For more information about configuration of protocols, please refer to the configuration chapter and also to the description of the protos_commit() function.
As startup and shutdown of each protocol are complex processes which can be affected by lots of external events (user's actions, reconfigurations, behavior of neighboring routers etc.), we have decided to supervise them by a pair of simple state machines -- the protocol state machine and a core state machine.
The protocol state machine corresponds to internal state of the protocol and the protocol can alter its state whenever it wants to. There are the following states:
PS_DOWN
The protocol is down and waits for being woken up by calling its start() hook.
PS_START
The protocol is waiting for connection with the rest of the network. It's active, it has resources allocated, but it still doesn't want any routes since it doesn't know what to do with them.
PS_UP
The protocol is up and running. It communicates with the core, delivers routes to tables and wants to hear announcement about route changes.
PS_STOP
The protocol has been shut down (either by being asked by the core code to do so or due to having encountered a protocol error).
Unless the protocol is in the PS_DOWN state, it can decide to change its state by calling the proto_notify_state function.
At any time, the core code can ask the protocol to shut itself down by calling its stop() hook.
The core state machine takes care of the core view of protocol state. The states are traversed according to changes of the protocol state machine, but sometimes the transitions are delayed if the core needs to finish some actions (for example sending of new routes to the protocol) before proceeding to the new state. There are the following core states:
FS_HUNGRY
The protocol is down, it doesn't have any routes and doesn't want them.
FS_FEEDING
The protocol has reached the PS_UP state, but we are still busy sending the initial set of routes to it.
FS_HAPPY
The protocol is up and has complete routing information.
FS_FLUSHING
The protocol is shutting down (it's in either PS_STOP or PS_DOWN state) and we're flushing all of its routes from the routing tables.
The protocol module provides the following functions:
void * proto_new (struct proto_config * c, unsigned size) -- create a new protocol instance
protocol configuration
size of protocol data structure (each protocol instance is represented by a structure starting with generic part [struct proto] and continued with data specific to the protocol)
When a new configuration has been read in, the core code starts initializing all the protocol instances configured by calling their init() hooks with the corresponding instance configuration. The initialization code of the protocol is expected to create a new instance according to the configuration by calling this function and then modifying the default settings to values wanted by the protocol.
struct announce_hook * proto_add_announce_hook (struct proto * p, struct rtable * t, struct proto_stats * stats) -- connect protocol to a routing table
protocol instance
routing table to connect to
per-table protocol statistics
This function creates a connection between the protocol instance p and the routing table t, making the protocol hear all changes in the table.
The announce hook is linked in the protocol ahook list and, when the protocol accepts routes, also in the table ahook list. Announce hooks are allocated from the routing table resource pool. They are linked to the table ahook list and unlinked from it depending on export_state (in proto_want_export_up() and proto_want_export_down()) and they are automatically freed after the protocol is flushed (in proto_fell_down()).
Unless you want to listen to multiple routing tables (as the Pipe protocol does), you needn't worry about this function since the connection to the protocol's primary routing table is initialized automatically by the core code.
struct announce_hook * proto_find_announce_hook (struct proto * p, struct rtable * t) -- find announce hooks
protocol instance
routing table
Returns pointer to announce hook or NULL
void * proto_config_new (struct protocol * pr, int class) -- create a new protocol configuration
protocol the configuration will belong to
SYM_PROTO or SYM_TEMPLATE
Whenever the configuration file says that a new instance of a routing protocol should be created, the parser calls proto_config_new() to create a configuration entry for this instance (a structure starting with the proto_config header containing all the generic items followed by protocol-specific ones). Also, the configuration entry gets added to the list of protocol instances kept in the configuration.
The function is also used to create protocol templates (when class SYM_TEMPLATE is specified). The only difference is that templates are not added to the list of protocol instances and therefore not initialized during protos_commit().
void proto_copy_config (struct proto_config * dest, struct proto_config * src) -- copy a protocol configuration
destination protocol configuration
source protocol configuration
Whenever a new instance of a routing protocol is created from a template, proto_copy_config() is called to copy the contents of the source protocol configuration to the new protocol configuration. The name, class and node in the protos list of dest are kept intact. The copy_config() protocol hook is used to copy protocol-specific data.
void protos_preconfig (struct config * c) -- pre-configuration processing
new configuration
This function calls the preconfig() hooks of all routing protocols available to prepare them for reading of the new configuration.
void protos_postconfig (struct config * c) -- post-configuration processing
new configuration
This function calls the postconfig() hooks of all protocol instances specified in configuration c. The hooks are not called for protocol templates.
void protos_commit (struct config * new, struct config * old, int force_reconfig, int type) -- commit new protocol configuration
new configuration
old configuration or NULL if it's boot time config
force restart of all protocols (used for example when the router ID changes)
type of reconfiguration (RECONFIG_SOFT or RECONFIG_HARD)
Scan differences between old and new configuration and adjust all protocol instances to conform to the new configuration.
When a protocol exists in the new configuration, but it doesn't in the original one, it's immediately started. When a collision with the other running protocol would arise, the new protocol will be temporarily stopped by the locking mechanism.
When a protocol exists in the old configuration, but it doesn't in the new one, it's shut down and deleted after the shutdown completes.
When a protocol exists in both configurations, the core decides whether it's possible to reconfigure it dynamically - it checks all the core properties of the protocol (changes in filters are ignored if type is RECONFIG_SOFT) and if they match, it asks the reconfigure() hook of the protocol to see if the protocol is able to switch to the new configuration. If it isn't possible, the protocol is shut down and a new instance is started with the new configuration after the shutdown is completed.
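The three cases above can be summarized as a small decision function. This is only a sketch: the enum, the structure and the name commit_decide() are hypothetical and are not part of BIRD, and the real protos_commit() also has to deal with locking and with protocols that are still shutting down.

```c
/* Hypothetical sketch of the decision taken for one protocol instance. */
enum commit_action {
  ACT_START,        /* only in the new configuration */
  ACT_SHUTDOWN,     /* only in the old configuration */
  ACT_RECONFIGURE,  /* in both, switched dynamically */
  ACT_RESTART       /* in both, shut down and started again */
};

struct commit_input {
  int in_old;            /* instance exists in the old configuration */
  int in_new;            /* instance exists in the new configuration */
  int force_reconfig;    /* restart forced (e.g. the router ID changed) */
  int core_props_match;  /* core properties are compatible */
  int hook_accepts;      /* value the reconfigure() hook would return */
};

/* The caller is assumed to pass an instance present in at least one config. */
static enum commit_action
commit_decide(const struct commit_input *in)
{
  if (!in->in_old)
    return ACT_START;
  if (!in->in_new)
    return ACT_SHUTDOWN;
  if (!in->force_reconfig && in->core_props_match && in->hook_accepts)
    return ACT_RECONFIGURE;
  return ACT_RESTART;
}
```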
Graceful restart of a router is a process when the routing plane (e.g. BIRD) restarts but both the forwarding plane (e.g kernel routing table) and routing neighbors keep proper routes, and therefore uninterrupted packet forwarding is maintained.
BIRD implements graceful restart recovery by deferring export of routes to protocols until routing tables are refilled with the expected content. After start, protocols generate routes as usual, but routes are not propagated to them, until protocols report that they generated all routes. After that, graceful restart recovery is finished and the export (and the initial feed) to protocols is enabled.
When the need for graceful restart recovery is detected during initialization, enabled protocols are marked with the gr_recovery flag before they start. Such protocols then decide how to proceed with graceful restart; participation is voluntary. Protocols can lock the recovery with proto_graceful_restart_lock() (stored in the gr_lock flag), which means that they want to postpone the end of the recovery until they converge and then unlock it. They can also set gr_wait before advancing to PS_UP, which means that the core should defer route export to that protocol until the end of the recovery. This should be done by protocols that expect their neighbors to keep the proper routes (kernel table, BGP sessions with BGP graceful restart capability).
The graceful restart recovery is finished when either all graceful restart locks are unlocked or when graceful restart wait timer fires.
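The finishing condition can be modelled with a lock counter. The following is a minimal sketch under assumptions: struct gr_state and the gr_*() helpers are hypothetical names invented for this example and do not mirror BIRD's internal variables.

```c
/* Hypothetical model of the recovery state. */
struct gr_state {
  int active;   /* recovery is in progress */
  int locks;    /* number of protocols holding a lock */
};

/* End of recovery: export to waiting protocols would be enabled here. */
static void
gr_finish(struct gr_state *s)
{
  s->active = 0;
  s->locks = 0;
}

/* A protocol asks to postpone the end of the recovery. */
static void
gr_lock(struct gr_state *s)
{
  if (s->active)
    s->locks++;
}

/* A protocol has converged (or was stopped). */
static void
gr_unlock(struct gr_state *s)
{
  if (!s->active || s->locks == 0)
    return;
  if (--s->locks == 0)
    gr_finish(s);     /* the last lock is gone */
}

/* The wait timer fired: finish regardless of remaining locks. */
static void
gr_timeout(struct gr_state *s)
{
  if (s->active)
    gr_finish(s);
}
```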
void graceful_restart_recovery (void) -- request initial graceful restart recovery
Called by the platform initialization code if the need for recovery after graceful restart is detected during boot. It has to be called before protos_commit().
void graceful_restart_init (void) -- initialize graceful restart
When graceful restart recovery was requested, the function starts an active phase of the recovery and initializes the graceful restart wait timer. The function has to be called after protos_commit().
void graceful_restart_done (struct timer *t UNUSED) -- finalize graceful restart
timer (unused)
When there are no locks on graceful restart, the function finalizes the graceful restart recovery. Protocols postponing route export until the end of the recovery are awakened and the export to them is enabled. All other related state is cleared. The function is also called when the graceful restart wait timer fires (even though some locks may still be held).
void proto_graceful_restart_lock (struct proto * p) -- lock graceful restart by protocol
protocol instance
This function allows a protocol to postpone the end of graceful restart recovery until it converges. The lock is removed when the protocol calls proto_graceful_restart_unlock() or when the protocol is stopped.
The function has to be called during the initial phase of graceful restart recovery and only for protocols that are part of graceful restart (i.e. their gr_recovery is set), which means it should be called from protocol start hooks.
void proto_graceful_restart_unlock (struct proto * p) -- unlock graceful restart by protocol
protocol instance
This function releases a lock taken by proto_graceful_restart_lock(). It is also called automatically when the lock-holding protocol goes down.
void protos_dump_all (void) -- dump status of all protocols
This function dumps status of all existing protocol instances to the debug output. It involves printing of general status information such as protocol states, its position on the protocol lists and also calling of a dump() hook of the protocol to print the internals.
void proto_build (struct protocol * p) -- make a single protocol available
the protocol
After the platform specific initialization code uses protos_build() to add all the standard protocols, it should call proto_build() for all platform specific protocols to inform the core that they exist.
void protos_build (void) -- build a protocol list
This function is called during BIRD startup to insert all standard protocols to the global protocol list. Insertion of platform specific protocols (such as the kernel syncer) is in the domain of competence of the platform dependent startup code.
void proto_set_message (struct proto * p, char * msg, int len) -- set administrative message to protocol
protocol
message
message length (-1 for NULL-terminated string)
The function sets an administrative message (string) related to a protocol state change. It is called by the nest code for manual enable/disable/restart commands, and by protocol-specific code when the protocol state change is initiated by the protocol. Using a NULL message clears the last message. The message string may be either NULL-terminated or given with an explicit length.
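The length convention (-1 for a terminated string, NULL to clear) can be sketched as follows. The helper set_message() is hypothetical and uses malloc() only to stay self-contained; it is not the allocation scheme BIRD itself uses.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: store a copy of msg in *slot.
 * Returns 0 on success, -1 if the allocation fails. */
static int
set_message(char **slot, const char *msg, int len)
{
  free(*slot);            /* drop the previous message, if any */
  *slot = NULL;

  if (!msg)               /* NULL clears the message */
    return 0;

  if (len < 0)            /* -1: take the length from the terminator */
    len = (int) strlen(msg);

  *slot = malloc((size_t) len + 1);
  if (!*slot)
    return -1;

  memcpy(*slot, msg, (size_t) len);
  (*slot)[len] = 0;       /* the stored copy is always terminated */
  return 0;
}
```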
void proto_request_feeding (struct proto * p) -- request feeding routes to the protocol
given protocol
Sometimes it is necessary to send all routes to the protocol again. This is called feeding and can be requested by this function. It causes the protocol export state to change to ES_FEEDING (during feeding); when feeding completes, the state switches back to ES_READY. This function can be called even when feeding is already running, in which case it is restarted.
void proto_notify_limit (struct announce_hook * ah, struct proto_limit * l, int dir, u32 rt_count)
announce hook
limit being hit
limit direction (PLD_*)
the number of routes
The function is called by the route processing core when limit l is breached. It activates the limit and takes appropriate action according to l->action.
void proto_notify_state (struct proto * p, unsigned ps) -- notify core about protocol state change
protocol the state of which has changed
the new status
Whenever a state of a protocol changes due to some event internal to the protocol (i.e., not inside a start() or shutdown() hook), it should immediately notify the core about the change by calling proto_notify_state() which will write the new state to the proto structure and take all the actions necessary to adapt to the new state. State change to PS_DOWN immediately frees resources of protocol and might execute start callback of protocol; therefore, it should be used at tail positions of protocol callbacks.
Each protocol can provide a rich set of hook functions referred to by pointers in either the proto or protocol structure. They are called by the core whenever it wants the protocol to perform some action or to notify the protocol about any change of its environment. All of the hooks can be set to NULL which means to ignore the change or to take a default action.
void preconfig (struct protocol * p, struct config * c) -- protocol preconfiguration
a routing protocol
new configuration
The preconfig() hook is called before parsing of a new configuration.
void postconfig (struct proto_config * c) -- instance post-configuration
instance configuration
The postconfig() hook is called for each configured instance after parsing of the new configuration is finished.
struct proto * init (struct proto_config * c) -- initialize an instance
instance configuration
The init() hook is called by the core to create a protocol instance according to supplied protocol configuration.
a pointer to the instance created
int reconfigure (struct proto * p, struct proto_config * c) -- request instance reconfiguration
an instance
new configuration
The core calls the reconfigure() hook whenever it wants to ask the protocol for switching to a new configuration. If the reconfiguration is possible, the hook returns 1. Otherwise, it returns 0 and the core will shut down the instance and start a new one with the new configuration.
After the protocol confirms reconfiguration, it must no longer keep any references to the old configuration since the memory it's stored in can be re-used at any time.
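A reconfigure() hook typically compares the old and new settings and refuses changes it cannot apply on the fly. The sketch below uses hypothetical stand-in structures (demo_config, demo_proto) rather than BIRD's struct proto and struct proto_config, and the split between restart-only and dynamic options is an assumption made for illustration.

```c
/* Hypothetical stand-ins for a protocol's configuration and instance. */
struct demo_config {
  int port;       /* changing this needs a restart in this sketch */
  int interval;   /* can be changed on the fly */
};

struct demo_proto {
  const struct demo_config *cf;  /* current configuration */
  int timer_interval;            /* value derived from cf->interval */
};

static int
demo_reconfigure(struct demo_proto *p, const struct demo_config *new_cf)
{
  if (new_cf->port != p->cf->port)
    return 0;                    /* ask the core for a restart */

  p->timer_interval = new_cf->interval;
  p->cf = new_cf;                /* drop the reference to the old config */
  return 1;
}
```

Note that the instance pointer is switched to the new configuration before returning 1, so nothing refers to the old one afterwards.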
void dump (struct proto * p) -- dump protocol state
an instance
This hook dumps the complete state of the instance to the debug output.
void dump_attrs (rte * e) -- dump protocol-dependent attributes
a route entry
This hook dumps all attributes in the rte which belong to this protocol to the debug output.
int start (struct proto * p) -- request instance startup
protocol instance
The start() hook is called by the core when it wishes to start the instance. Multitable protocols should lock their tables here.
new protocol state
int shutdown (struct proto * p) -- request instance shutdown
protocol instance
The shutdown() hook is called by the core when it wishes to shut the instance down for some reason.
new protocol state
void cleanup (struct proto * p) -- request instance cleanup
protocol instance
The cleanup() hook is called by the core when the protocol becomes hungry/down, i.e. all protocol ahooks and routes are flushed. Multitable protocols should unlock their tables here.
void get_status (struct proto * p, byte * buf) -- get instance status
protocol instance
buffer to be filled with the status string
This hook is called by the core if it wishes to obtain a brief one-line user friendly representation of the status of the instance to be printed by the `show protocols' command.
void get_route_info (rte * e, byte * buf, ea_list * attrs) -- get route information
a route entry
buffer to be filled with the resulting string
extended attributes of the route
This hook is called to fill the buffer buf with a brief user friendly representation of metrics of a route belonging to this protocol.
int get_attr (eattr * a, byte * buf, int buflen) -- get attribute information
an extended attribute
buffer to be filled with attribute information
a length of the buf parameter
The get_attr() hook is called by the core to obtain a user friendly representation of an extended route attribute. It can either leave the whole conversion to the core (by returning GA_UNKNOWN), fill in only attribute name (and let the core format the attribute value automatically according to the type field; by returning GA_NAME) or doing the whole conversion (used in case the value requires extra care; return GA_FULL).
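The three return values can be illustrated with a simplified hook. Everything here is a sketch under assumptions: demo_eattr is a reduced stand-in for eattr, the DEMO_ATTR_* identifiers are made up, and the numeric values given to GA_UNKNOWN, GA_NAME and GA_FULL are placeholders for the definitions in BIRD's headers.

```c
#include <stdio.h>

/* Placeholder values; real code takes these from BIRD's headers. */
enum { GA_UNKNOWN = 0, GA_NAME = 1, GA_FULL = 2 };

/* Hypothetical, simplified attribute. */
struct demo_eattr {
  unsigned id;
  unsigned value;
};

enum { DEMO_ATTR_METRIC = 1, DEMO_ATTR_TAG = 2 };

static int
demo_get_attr(const struct demo_eattr *a, char *buf, int buflen)
{
  switch (a->id)
  {
  case DEMO_ATTR_METRIC:
    /* Only the name; the core formats the value by its type. */
    snprintf(buf, (size_t) buflen, "metric");
    return GA_NAME;

  case DEMO_ATTR_TAG:
    /* The value needs special formatting, so do the whole conversion. */
    snprintf(buf, (size_t) buflen, "tag: %08x", a->value);
    return GA_FULL;

  default:
    /* Not ours: leave everything to the core. */
    return GA_UNKNOWN;
  }
}
```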
void if_notify (struct proto * p, unsigned flags, struct iface * i) -- notify instance about interface changes
protocol instance
interface change flags
the interface in question
This hook is called whenever any network interface changes its status. The change is described by a combination of status bits (IF_CHANGE_xxx) in the flags parameter.
void ifa_notify (struct proto * p, unsigned flags, struct ifa * a) -- notify instance about interface address changes
protocol instance
address change flags
the interface address
This hook is called to notify the protocol instance about an interface acquiring or losing one of its addresses. The change is described by a combination of status bits (IF_CHANGE_xxx) in the flags parameter.
void rt_notify (struct proto * p, net * net, rte * new, rte * old, ea_list * attrs) -- notify instance about routing table change
protocol instance
a network entry
new route for the network
old route for the network
extended attributes associated with the new entry
The rt_notify() hook is called to inform the protocol instance about changes in the connected routing table, that is a route old belonging to network net being replaced by a new route new with extended attributes attrs. Either new or old or both can be NULL if the corresponding route doesn't exist.
If the type of route announcement is RA_OPTIMAL, it is an announcement of optimal route change, new stores the new optimal route and old stores the old optimal route.
If the type of route announcement is RA_ANY, it is an announcement of any route change, new stores the new route and old stores the old route from the same protocol.
p->accept_ra_types specifies which kind of route announcements protocol wants to receive.
void neigh_notify (neighbor * neigh) -- notify instance about neighbor status change
a neighbor cache entry
The neigh_notify() hook is called by the neighbor cache whenever a neighbor changes its state, that is it gets disconnected or a sticky neighbor gets connected.
ea_list * make_tmp_attrs (rte * e, struct linpool * pool) -- convert embedded attributes to temporary ones
route entry
linear pool to allocate attribute memory in
This hook is called by the routing table functions if they need to convert the protocol attributes embedded directly in the rte to temporary extended attributes in order to distribute them to other protocols or to filters. make_tmp_attrs() creates an ea_list in the linear pool pool, fills it with values of the temporary attributes and returns a pointer to it.
void store_tmp_attrs (rte * e, ea_list * attrs) -- convert temporary attributes to embedded ones
route entry
temporary attributes to be converted
This hook is an exact opposite of make_tmp_attrs() -- it takes a list of extended attributes and converts them to attributes embedded in the rte corresponding to this protocol.
You must be prepared for any of the attributes being missing from the list and use default values instead.
int import_control (struct proto * p, rte ** e, ea_list ** attrs, struct linpool * pool) -- pre-filtering decisions on route import
protocol instance the route is going to be imported to
the route in question
extended attributes of the route
linear pool for allocation of all temporary data
The import_control() hook is called as the first step of exporting a route from a routing table to the protocol instance. It can modify route attributes and force acceptance or rejection of the route regardless of user-specified filters. See rte_announce() for a complete description of the route distribution process.
The standard use of this hook is to reject routes having originated from the same instance and to set default values of the protocol's metrics.
1 if the route has to be accepted, -1 if rejected and 0 if it should be passed to the filters.
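The standard use described above (reject own routes, set a default metric, otherwise defer to the filters) might look like the following sketch. The demo_rte structure, the origin pointer and DEMO_DEFAULT_METRIC are hypothetical simplifications; a real hook receives rte ** and ea_list ** arguments instead.

```c
/* Hypothetical, simplified route with only the fields needed here. */
struct demo_rte {
  const void *origin;   /* instance that originated the route */
  int metric;           /* protocol metric, 0 if not set */
};

#define DEMO_DEFAULT_METRIC 1   /* assumed default for this sketch */

static int
demo_import_control(const void *self, struct demo_rte *e)
{
  if (e->origin == self)
    return -1;                        /* our own route: reject */

  if (!e->metric)
    e->metric = DEMO_DEFAULT_METRIC;  /* fill in a default value */

  return 0;                           /* let the filters decide */
}
```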
int rte_recalculate (struct rtable * table, struct network * net, struct rte * new, struct rte * old, struct rte * old_best) -- prepare routes for comparison
a routing table
a network entry
new route for the network
old route for the network
old best route for the network (may be NULL)
This hook is called when a route change (from old to new for a net entry) is propagated to a table. It may be used to prepare routes for comparison by rte_better() in the best route selection. new may or may not be in net->routes list, old is not there.
1 if the ordering implied by rte_better() changes enough that full best route calculation has to be done, 0 otherwise.
int rte_better (rte * new, rte * old) -- compare metrics of two routes
the new route
the original route
This hook gets called when the routing table contains two routes for the same network which have originated from different instances of a single protocol and it wants to select which one is preferred over the other one. Protocols usually decide according to route metrics.
1 if new is better (more preferred) than old, 0 otherwise.
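For a protocol whose only criterion is a numeric metric where lower is better, the hook reduces to a single comparison. This is a sketch with a hypothetical demo_route structure; real protocols compare fields stored in the rte.

```c
/* Hypothetical route carrying just a metric (lower is preferred). */
struct demo_route {
  unsigned metric;
};

/* Returns 1 if n is preferred over o, 0 otherwise (including ties). */
static int
demo_rte_better(const struct demo_route *n, const struct demo_route *o)
{
  return n->metric < o->metric;
}
```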
int rte_same (rte * e1, rte * e2) -- compare two routes
route
route
The rte_same() hook tests whether the routes e1 and e2 belonging to the same protocol instance have identical contents. Contents of rta, all the extended attributes and rte preference are checked by the core code, no need to take care of them here.
1 if e1 is identical to e2, 0 otherwise.
void rte_insert (net * n, rte * e) -- notify instance about route insertion
network
route
This hook is called whenever a rte belonging to the instance is accepted for insertion to a routing table.
Please avoid using this function in new protocols.
void rte_remove (net * n, rte * e) -- notify instance about route removal
network
route
This hook is called whenever a rte belonging to the instance is removed from a routing table.
Please avoid using this function in new protocols.
The interface module keeps track of all network interfaces in the system and their addresses.
Each interface is represented by an iface structure which carries interface capability flags (IF_MULTIACCESS, IF_BROADCAST etc.), MTU, interface name and index and finally a linked list of network prefixes assigned to the interface, each one represented by struct ifa.
The interface module keeps a `soft-up' state for each iface which is a conjunction of link being up, the interface being of a `sane' type and at least one IP address assigned to it.
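Expressed as code, the `soft-up' state is a plain logical AND of the three conditions. The structure and field names below are hypothetical; BIRD keeps this information in flag bits of struct iface.

```c
/* Hypothetical view of the three inputs to the soft-up state. */
struct demo_iface {
  int link_up;      /* link is up */
  int sane_type;    /* interface is of a usable type */
  int addr_count;   /* number of IP addresses assigned */
};

static int
demo_soft_up(const struct demo_iface *i)
{
  return i->link_up && i->sane_type && i->addr_count > 0;
}
```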
void ifa_dump (struct ifa * a) -- dump interface address
interface address descriptor
This function dumps contents of an ifa to the debug output.
void if_dump (struct iface * i) -- dump interface
interface to dump
This function dumps all information associated with a given network interface to the debug output.
void if_dump_all (void) -- dump all interfaces
This function dumps information about all known network interfaces to the debug output.
void if_delete (struct iface * old) -- remove interface
interface
This function is called by the low-level platform dependent code whenever it notices an interface disappears. It is just a shorthand for if_update().
struct iface * if_update (struct iface * new) -- update interface status
new interface status
if_update() is called by the low-level platform dependent code whenever it notices an interface change.
There exist two types of interface updates -- synchronous and asynchronous ones. In the synchronous case, the low-level code calls if_start_update(), scans all interfaces reported by the OS, uses if_update() and ifa_update() to pass them to the core and then it finishes the update sequence by calling if_end_update(). When working asynchronously, the sysdep code calls if_update() and ifa_update() whenever it notices a change.
if_update() will automatically notify all other modules about the change.
void if_feed_baby (struct proto * p) -- advertise interfaces to a new protocol
protocol to feed
When a new protocol starts, this function sends it a series of notifications about all existing interfaces.
struct iface * if_find_by_index (unsigned idx) -- find interface by ifindex
ifindex
This function finds an iface structure corresponding to an interface of the given index idx. Returns a pointer to the structure or NULL if no such structure exists.
struct iface * if_find_by_name (char * name) -- find interface by name
interface name
This function finds an iface structure corresponding to an interface of the given name name. Returns a pointer to the structure or NULL if no such structure exists.
struct ifa * ifa_update (struct ifa * a) -- update interface address
new interface address
This function adds address information to a network interface. It's called by the platform dependent code during the interface update process described under if_update().
void ifa_delete (struct ifa * a) -- remove interface address
interface address
This function removes address information from a network interface. It's called by the platform dependent code during the interface update process described under if_update().
void if_init (void) -- initialize interface module
This function is called during BIRD startup to initialize all data structures of the interface module.
Most routing protocols need to associate their internal state data with neighboring routers, check whether an address given as the next hop attribute of a route is really an address of a directly connected host and which interface is it connected through. Also, they often need to be notified when a neighbor ceases to exist or when their long awaited neighbor becomes connected. The neighbor cache is there to solve all these problems.
The neighbor cache maintains a collection of neighbor entries. Each entry represents one IP address corresponding to either our directly connected neighbor or our own end of the link (when the scope of the address is set to SCOPE_HOST) together with per-neighbor data belonging to a single protocol.
Active entries represent known neighbors and are stored in a hash table (to allow fast retrieval based on the IP address of the node) and two linked lists: one global and one per-interface (allowing quick processing of interface change events). Inactive entries exist only when the protocol has explicitly requested it via the NEF_STICKY flag because it wishes to be notified when the node will again become a neighbor. Such entries are enqueued in a special list which is walked whenever an interface changes its state to up. Neighbor entry VRF association is implied by respective protocol.
When a neighbor event occurs (a neighbor gets disconnected or a sticky inactive neighbor becomes connected), the protocol hook neigh_notify() is called to advertise the change.
neighbor * neigh_find (struct proto * p, ip_addr * a, unsigned flags) -- find or create a neighbor entry.
protocol which asks for the entry.
pointer to IP address of the node to be searched for.
0 or NEF_STICKY if you want to create a sticky entry.
Search the neighbor cache for a node with given IP address. If it's found, a pointer to the neighbor entry is returned. If no such entry exists and the node is directly connected on one of our active interfaces, a new entry is created and returned to the caller with protocol-dependent fields initialized to zero. If the node is not connected directly or *a is not a valid unicast IP address, neigh_find() returns NULL.
void neigh_dump (neighbor * n) -- dump specified neighbor entry.
the entry to dump
This function dumps the contents of a given neighbor entry to debug output.
void neigh_dump_all (void) -- dump all neighbor entries.
This function dumps the contents of the neighbor cache to debug output.
void neigh_if_up (struct iface * i)
interface in question
Tell the neighbor cache that a new interface came up.
The neighbor cache wakes up all inactive sticky neighbors with addresses belonging to prefixes of the interface i.
void neigh_if_down (struct iface * i) -- notify neighbor cache about interface down event
the interface in question
Notify the neighbor cache that an interface has ceased to exist.
It causes all entries belonging to neighbors connected to this interface to be flushed.
void neigh_if_link (struct iface * i) -- notify neighbor cache about interface link change
the interface in question
Notify the neighbor cache that an interface changed link state. All owners of neighbor entries connected to this interface are notified.
void neigh_ifa_update (struct ifa * a)
interface address in question
Tell the neighbor cache that an address was added or removed.
The neighbor cache wakes up all inactive sticky neighbors with addresses belonging to prefixes of the interface belonging to ifa and causes all unreachable neighbors to be flushed.
void neigh_prune (void) -- prune neighbor cache
neigh_prune() examines all neighbor entries cached and removes those corresponding to inactive protocols. It's called whenever a protocol is shut down to get rid of all its heritage.
void neigh_init (pool * if_pool) -- initialize the neighbor cache.
resource pool to be used for neighbor entries.
This function is called during BIRD startup to initialize the neighbor cache module.
This module takes care of the BIRD's command-line interface (CLI). The CLI exists to provide a way to control BIRD remotely and to inspect its status. It uses a very simple textual protocol over a stream connection provided by the platform dependent code (on UNIX systems, it's a UNIX domain socket).
Each session of the CLI consists of a sequence of requests and replies, slightly resembling the FTP and SMTP protocols. Requests are commands encoded as a single line of text. Replies are sequences of lines starting with a four-digit code followed by either a space (if it's the last line of the reply) or a minus sign (when the reply is going to continue with the next line); the rest of the line contains a textual message whose semantics depend on the numeric code. If a reply line has the same code as the previous one and it's a continuation line, the whole prefix can be replaced by a single white space character.
Reply codes starting with 0 stand for `action successfully completed' messages, 1 means `table entry', 8 `runtime error' and 9 `syntax error'.
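On the client side, the reply line format can be decoded with a few character tests. The following is a sketch only: struct cli_reply_line and parse_reply_line() are hypothetical names that exist neither in BIRD nor in its client, and the line is assumed to be a terminated string without the trailing newline.

```c
#include <ctype.h>

struct cli_reply_line {
  int code;          /* four-digit reply code */
  int last;          /* 1 if this line ends the reply */
  const char *text;  /* message following the prefix */
};

/* prev_code is the code of the previous line; it is used when the prefix
 * is abbreviated to a single space. Returns 0 on success, -1 on a
 * malformed line. */
static int
parse_reply_line(const char *line, int prev_code, struct cli_reply_line *out)
{
  if (line[0] == ' ')
  {
    /* Abbreviated prefix: same code as before, reply continues. */
    out->code = prev_code;
    out->last = 0;
    out->text = line + 1;
    return 0;
  }

  int code = 0;
  for (int i = 0; i < 4; i++)
  {
    /* A terminator is not a digit, so short lines stop here safely. */
    if (!isdigit((unsigned char) line[i]))
      return -1;
    code = code * 10 + (line[i] - '0');
  }

  if (line[4] != ' ' && line[4] != '-')
    return -1;

  out->code = code;
  out->last = (line[4] == ' ');   /* space: final line, minus: continues */
  out->text = line + 5;
  return 0;
}
```

The class of a reply (success, table entry, runtime error, syntax error) is then given by the first digit, i.e. code / 1000.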
Each CLI session is internally represented by a cli structure and a resource pool containing all resources associated with the connection, so that it can be easily freed whenever the connection gets closed, not depending on the current state of command processing.
The CLI commands are declared as a part of the configuration grammar by using the CF_CLI macro. When a command is received, it is processed by the same lexical analyzer and parser as used for the configuration, but it's switched to a special mode by prepending a fake token to the text, so that it uses only the CLI command rules. Then the parser invokes an execution routine corresponding to the command, which either constructs the whole reply and returns it back or (in case it expects the reply will be long) it prints a partial reply and asks the CLI module (using the cont hook) to call it again when the output is transferred to the user.
The this_cli variable points to a cli structure of the session being currently parsed, but it's of course available only in command handlers not entered using the cont hook.
TX buffer management works as follows: At cli.tx_buf there is a list of TX buffers (struct cli_out), cli.tx_write is the buffer currently used by the producer (cli_printf(), cli_alloc_out()) and cli.tx_pos is the buffer currently used by the consumer (cli_write(), in system dependent code). The producer uses cli_out.wpos ptr as the current write position and the consumer uses cli_out.outpos ptr as the current read position. When the producer produces something, it calls cli_write_trigger(). If there is not enough space in the current buffer, the producer allocates a new one. When the consumer processes everything in the buffer queue, it calls cli_written(), that frees all buffers (except the first one) and schedules cli.event.
void cli_printf (cli * c, int code, char * msg, ...) -- send reply to a CLI connection
CLI connection
numeric code of the reply, negative for continuation lines
a printf()-like formatting string.
variable arguments
This function sends a single line of reply to a given CLI connection. It works in all aspects like bsprintf() except that it automatically prepends the reply line prefix.
Please note that the connection can already be busy sending some data, in which case cli_printf() stores the output to a temporary buffer, so please avoid sending a large batch of replies without waiting for the buffers to be flushed.
If you want to write to the current CLI output, you can use the cli_msg() macro instead.
void cli_init (void) -- initialize the CLI module
This function is called during BIRD startup to initialize the internal data structures of the CLI module.
The lock module provides a simple mechanism for avoiding conflicts between various protocols which would like to use a single physical resource (for example a network port). It would be easy to say that such collisions can occur only when the user specifies an invalid configuration and therefore he deserves to get what he has asked for, but unfortunately they can also arise legitimately when the daemon is reconfigured and there exists (although for a short time period only) an old protocol instance being shut down and a new one willing to start up on the same interface.
The solution is very simple: when any protocol wishes to use a network port or some other non-shareable resource, it asks the core to lock it and it doesn't use the resource until it's notified that it has acquired the lock.
Object locks are represented by object_lock structures which are in turn a kind of resource. Lockable resources are uniquely determined by resource type (OBJLOCK_UDP for a UDP port etc.), IP address (usually a broadcast or multicast address the port is bound to), port number, interface and optional instance ID.
struct object_lock * olock_new (pool * p) -- create an object lock
resource pool to create the lock in.
The olock_new() function creates a new resource of type object_lock and returns a pointer to it. After filling in the structure, the caller should call olock_acquire() to do the real locking.
void olock_acquire (struct object_lock * l) -- acquire a lock
the lock to acquire
This function attempts to acquire exclusive access to the non-shareable resource described by the lock l. It returns immediately, but as soon as the resource becomes available, it calls the hook() function set up by the caller.
When you want to release the resource, just rfree() the lock.
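The acquire/notify/release cycle can be modelled with an owner pointer and a queue of waiting locks. This is a simplified sketch: all demo_* names are hypothetical, a single resource object replaces the matching on type, address, port, interface and instance ID, and the hook is called directly for brevity rather than at some later point.

```c
#include <stddef.h>

struct demo_lock;

/* One lockable resource with a FIFO of waiting locks (hypothetical). */
struct demo_resource {
  struct demo_lock *owner;
  struct demo_lock *wait_head, *wait_tail;
};

struct demo_lock {
  struct demo_resource *res;
  void (*hook)(struct demo_lock *);   /* called once the lock is held */
  struct demo_lock *next;
  int granted;                        /* set by the example hook below */
};

/* Example hook: just remember that the lock was granted. */
static void
demo_mark_granted(struct demo_lock *l)
{
  l->granted = 1;
}

static void
demo_acquire(struct demo_lock *l)
{
  struct demo_resource *r = l->res;
  l->next = NULL;

  if (!r->owner)
  {
    r->owner = l;          /* resource is free: take it and notify */
    l->hook(l);
    return;
  }

  if (r->wait_tail)        /* otherwise queue behind the other waiters */
    r->wait_tail->next = l;
  else
    r->wait_head = l;
  r->wait_tail = l;
}

/* Stands in for calling rfree() on the lock held by the owner. */
static void
demo_release(struct demo_lock *l)
{
  struct demo_resource *r = l->res;
  if (r->owner != l)
    return;                /* removing a waiting lock is left out here */

  struct demo_lock *n = r->wait_head;
  r->owner = n;            /* hand the resource to the first waiter */
  if (n)
  {
    r->wait_head = n->next;
    if (!r->wait_head)
      r->wait_tail = NULL;
    n->next = NULL;
    n->hook(n);
  }
}
```

This mirrors the reconfiguration case described above: the new instance queues its lock while the old instance still owns the port, and is notified as soon as the old one releases it.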
void olock_init (void) -- initialize the object lock mechanism
This function is called during BIRD startup. It initializes all the internal data structures of the lock module.