Annotation of embedaddon/libiconv/DESIGN, revision 1.1

While some other iconv(3) implementations - like FreeBSD iconv(3) - choose
the "many small shared libraries" and dlopen(3) approach, this implementation
packs everything into a single shared library. Here is a comparison of the
two designs.

* Run-time efficiency
  1. A dlopen() based approach needs a cache of loaded shared libraries.
  Otherwise, every iconv_open() call will result in a call to dlopen()
  and thus to file system related system calls - which is prohibitive
  because some applications use the iconv_open/iconv/iconv_close sequence
  for every single filename, string, or piece of text.
  2. In terms of virtual memory use, both approaches are on par. Being shared
  libraries, the tables are shared between any processes that use them.
  And because of the demand loading used by Unix systems (and because libiconv
  does not have initialization functions), only those parts of the tables
  which are needed (typically only a few kilobytes) will be read from disk and
  paged into main memory.
  3. Even with a cache of loaded shared libraries, the dlopen() based approach
  makes more system calls, because it has to load one or two shared libraries
  for every encoding in use.
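
  As an illustration of the per-string pattern mentioned in point 1, here is
  a minimal sketch using the standard iconv(3) calls. convert_string is a
  hypothetical helper name, not something libiconv provides; the point is
  only that each call pays for one iconv_open(), which is why the cost of
  iconv_open() matters.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated string `in` from encoding
 * `from` to encoding `to`, writing a NUL-terminated result into `out`.
 * Returns the number of bytes written (excluding the NUL), or -1 on error. */
long convert_string(const char *from, const char *to,
                    const char *in, char *out, size_t outsize)
{
    assert(in != NULL && out != NULL);
    if (outsize == 0)
        return -1;

    /* Note the argument order: iconv_open(tocode, fromcode). */
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;

    char *inptr = (char *)in;      /* iconv() takes char **, input is not modified */
    size_t inleft = strlen(in);
    char *outptr = out;
    size_t outleft = outsize - 1;  /* keep one byte for the terminating NUL */

    size_t rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outptr, &outleft);  /* emit any final shift sequence */

    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;

    *outptr = '\0';
    return (long)(outptr - out);
}
```

  A caller might invoke
  convert_string("ISO-8859-1", "UTF-8", name, buf, sizeof buf)
  once per filename, which is exactly the usage that makes a slow
  iconv_open() prohibitive.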

* Total size
  In the dlopen(3) approach, every shared library carries its own symbol
  table and relocation information. All together, FreeBSD iconv installs more
  than 200 shared libraries with a total size of 2.3 MB, whereas libiconv
  installs 0.45 MB.

* Extensibility
  The dlopen(3) approach is good for guaranteeing extensibility if the iconv
  implementation is distributed without source. (Or when, as in glibc, you
  cannot rebuild iconv without rebuilding your libc, thus possibly
  destabilizing your system.)
  The libiconv package achieves extensibility through the LGPL license:
  Every user has access to the source of the package and can extend and
  replace just libiconv.so.
  The places which have to be modified when a new encoding is added are as
  follows: add an #include statement in iconv.c, add an entry in the table in
  iconv.c, and of course, update the README and iconv_open.3 manual page.
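
  The "add an entry in the table" step can be pictured with the following
  sketch. It is an assumption-laden illustration, not the actual layout of
  iconv.c: struct encoding, find_encoding and the simplified function
  signatures are all hypothetical, and the per-encoding functions are written
  inline here where a real tree would pull them in with #include.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified converter signatures. */
typedef int (*mbtowc_fn)(unsigned int *pwc, const unsigned char *s, size_t n);
typedef int (*wctomb_fn)(unsigned char *r, unsigned int wc, size_t n);

struct encoding {
    const char *name;
    mbtowc_fn mbtowc;   /* encoding -> Unicode */
    wctomb_fn wctomb;   /* Unicode -> encoding */
};

/* Existing encoding: ASCII. Each function returns 1 on success, -1 on failure. */
int ascii_mbtowc(unsigned int *pwc, const unsigned char *s, size_t n)
{
    if (n < 1 || s[0] >= 0x80) return -1;
    *pwc = s[0];
    return 1;
}
int ascii_wctomb(unsigned char *r, unsigned int wc, size_t n)
{
    if (n < 1 || wc >= 0x80) return -1;
    r[0] = (unsigned char)wc;
    return 1;
}

/* Newly added encoding: ISO-8859-1. */
int latin1_mbtowc(unsigned int *pwc, const unsigned char *s, size_t n)
{
    if (n < 1) return -1;
    *pwc = s[0];
    return 1;
}
int latin1_wctomb(unsigned char *r, unsigned int wc, size_t n)
{
    if (n < 1 || wc >= 0x100) return -1;
    r[0] = (unsigned char)wc;
    return 1;
}

static const struct encoding encodings[] = {
    { "ASCII",      ascii_mbtowc,  ascii_wctomb  },
    { "ISO-8859-1", latin1_mbtowc, latin1_wctomb },  /* the one new table entry */
};

/* Look up an encoding by exact name; returns NULL if it is not in the table. */
const struct encoding *find_encoding(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof encodings / sizeof encodings[0]; i++)
        if (strcmp(encodings[i].name, name) == 0)
            return &encodings[i];
    return NULL;
}
```

  With a layout like this, supporting a new encoding touches only the
  function pair and one table line, and nothing has to be located or loaded
  at run time.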

* Use within other packages
  If you want to incorporate an iconv implementation into another package
  (such as a mail user agent or web browser), the single library approach
  is easier, because:
  1. In the many-shared-libraries approach you have to provide the right
  directory prefix which will be used at run time.
  2. Incorporating iconv as a static library into the executable is easy -
  it won't need dynamic loading. (This assumes that your package is under
  the LGPL or GPL license.)


All conversions go through Unicode. This is possible because most of the
world's characters have already been allocated in the Unicode standard.
Therefore we have for each encoding two functions:
- For conversion from the encoding to Unicode, a function called xxx_mbtowc.
- For conversion from Unicode to the encoding, a function called xxx_wctomb,
  and for stateful encodings, a function called xxx_reset which returns to
  the initial shift state.
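
A minimal sketch of how such a function pair could be chained, here for a
conversion from ISO-8859-1 to UTF-8. The signatures are simplified and the
names ucs4 and convert_buffer are hypothetical; the functions in the real
sources take further arguments. Each step decodes one character to Unicode
with the source encoding's mbtowc and re-encodes it with the target
encoding's wctomb.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int ucs4;   /* one Unicode code point (hypothetical typedef) */

/* Source side: decode one ISO-8859-1 byte. Returns bytes consumed, or -1. */
int latin1_mbtowc(ucs4 *pwc, const unsigned char *s, size_t n)
{
    if (n < 1) return -1;
    *pwc = s[0];             /* Latin-1 bytes equal the first 256 code points */
    return 1;
}

/* Target side: encode one code point as UTF-8.
 * Returns bytes written, or -1 if wc is invalid or the room is too small. */
int utf8_wctomb(unsigned char *r, ucs4 wc, size_t n)
{
    size_t count;
    if (wc >= 0xD800 && wc < 0xE000) return -1;   /* surrogates are not characters */
    if (wc < 0x80) count = 1;
    else if (wc < 0x800) count = 2;
    else if (wc < 0x10000) count = 3;
    else if (wc < 0x110000) count = 4;
    else return -1;
    if (n < count) return -1;

    if (count == 1) {
        r[0] = (unsigned char)wc;
    } else if (count == 2) {
        r[0] = (unsigned char)(0xC0 | (wc >> 6));
        r[1] = (unsigned char)(0x80 | (wc & 0x3F));
    } else if (count == 3) {
        r[0] = (unsigned char)(0xE0 | (wc >> 12));
        r[1] = (unsigned char)(0x80 | ((wc >> 6) & 0x3F));
        r[2] = (unsigned char)(0x80 | (wc & 0x3F));
    } else {
        r[0] = (unsigned char)(0xF0 | (wc >> 18));
        r[1] = (unsigned char)(0x80 | ((wc >> 12) & 0x3F));
        r[2] = (unsigned char)(0x80 | ((wc >> 6) & 0x3F));
        r[3] = (unsigned char)(0x80 | (wc & 0x3F));
    }
    return (int)count;
}

/* Convert a buffer one character at a time, pivoting through Unicode.
 * Returns the number of output bytes, or -1 on failure. */
long convert_buffer(const unsigned char *in, size_t inlen,
                    unsigned char *out, size_t outlen)
{
    assert(in != NULL && out != NULL);
    size_t ipos = 0, opos = 0;
    while (ipos < inlen) {
        ucs4 wc;
        int used = latin1_mbtowc(&wc, in + ipos, inlen - ipos);
        if (used < 0) return -1;
        int made = utf8_wctomb(out + opos, wc, outlen - opos);
        if (made < 0) return -1;
        ipos += (size_t)used;
        opos += (size_t)made;
    }
    return (long)opos;
}
```

Because every encoding only has to talk to Unicode, N encodings need N
function pairs rather than a dedicated converter for each of the N*(N-1)
source/target combinations.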


All our functions operate on a single Unicode character at a time. This is
obviously less efficient than operating on an entire buffer of characters at
a time, but it makes the coding considerably easier and less bug-prone. Those
who want the best performance should install the Real Thing (TM): GNU libc 2.1
or newer.