File: embedaddon/curl/docs/INTERNALS.md (ELWIX vendor branch, revision 1.1.1.1, committed Wed Jun 3 10:01:15 2020 UTC by misho)

curl internals
==============

 - [Intro](#intro)
 - [git](#git)
 - [Portability](#Portability)
 - [Windows vs Unix](#winvsunix)
 - [Library](#Library)
   - [`Curl_connect`](#Curl_connect)
   - [`multi_do`](#multi_do)
   - [`Curl_readwrite`](#Curl_readwrite)
   - [`multi_done`](#multi_done)
   - [`Curl_disconnect`](#Curl_disconnect)
 - [HTTP(S)](#http)
 - [FTP](#ftp)
 - [Kerberos](#kerberos)
 - [TELNET](#telnet)
 - [FILE](#file)
 - [SMB](#smb)
 - [LDAP](#ldap)
 - [E-mail](#email)
 - [General](#general)
 - [Persistent Connections](#persistent)
 - [multi interface/non-blocking](#multi)
 - [SSL libraries](#ssl)
 - [Library Symbols](#symbols)
 - [Return Codes and Informationals](#returncodes)
 - [API/ABI](#abi)
 - [Client](#client)
 - [Memory Debugging](#memorydebug)
 - [Test Suite](#test)
 - [Asynchronous name resolves](#asyncdns)
   - [c-ares](#cares)
 - [`curl_off_t`](#curl_off_t)
 - [curlx](#curlx)
 - [Content Encoding](#contentencoding)
 - [`hostip.c` explained](#hostip)
 - [Track Down Memory Leaks](#memoryleak)
 - [`multi_socket`](#multi_socket)
 - [Structs in libcurl](#structs)
   - [Curl_easy](#Curl_easy)
   - [connectdata](#connectdata)
   - [Curl_multi](#Curl_multi)
   - [Curl_handler](#Curl_handler)
   - [conncache](#conncache)
   - [Curl_share](#Curl_share)
   - [CookieInfo](#CookieInfo)

<a name="intro"></a>
Intro
=====

 This project is split in two: the library and the client. The client part
 uses the library, but the library is designed to allow other applications to
 use it.

 The largest amount of code and complexity is in the library part.


<a name="git"></a>
git
===

 All changes to the sources are committed to the git repository as soon as
 they are somewhat verified to work. Changes shall be committed as independently
 as possible so that individual changes can be easily spotted and tracked
 afterwards.

 Tagging shall be used extensively, and by the time we release new archives we
 should tag the sources with a name similar to the released version number.

<a name="Portability"></a>
Portability
===========

 We write curl and libcurl to compile with C89 compilers on 32-bit and larger
 machines. Most of libcurl assumes more or less POSIX compliance, but that is
 not a strict requirement.

 We write libcurl to build and work with lots of third-party tools, and we
 want it to remain functional and buildable with these and later versions
 (older versions may still work, but they are not what we work hard to
 maintain):

Dependencies
------------

 - OpenSSL      0.9.7
 - GnuTLS       3.1.10
 - zlib         1.1.4
 - libssh2      0.16
 - c-ares       1.6.0
 - libidn2      2.0.0
 - wolfSSL      2.0.0
 - openldap     2.0
 - MIT Kerberos 1.2.4
 - GSKit        V5R3M0
 - NSS          3.14.x
 - Heimdal      ?
 - nghttp2      1.12.0

Operating Systems
-----------------

 On systems where configure runs, we aim at working on them all - if they have
 a suitable C compiler. On systems that don't run configure, we strive to keep
 curl running correctly on:

 - Windows      98
 - AS/400       V5R3M0
 - Symbian      9.1
 - Windows CE   ?
 - TPF          ?

Build tools
-----------

 When writing code (mostly for generating stuff included in release tarballs)
 we use a few "build tools" and we make sure that we remain functional with
 these versions:

 - GNU Libtool  1.4.2
 - GNU Autoconf 2.57
 - GNU Automake 1.7
 - GNU M4       1.4
 - perl         5.004
 - roffit       0.5
 - groff        ? (any version that supports `groff -Tps -man [in] [out]`)
 - ps2pdf (gs)  ?

<a name="winvsunix"></a>
Windows vs Unix
===============

 There are a few differences in how to program curl the Unix way compared to
 the Windows way. Perhaps the four most notable details are:

 1. Different function names for socket operations.

   In curl, this is solved with defines and macros, so that the source looks
   the same in all places except for the header file that defines them. The
   macros in use are `sclose()`, `sread()` and `swrite()`.

 2. Windows requires a couple of init calls for the socket stuff.

   That is taken care of by the `curl_global_init()` call, but if other libs
   also do it, there might be reasons for applications to alter that
   behaviour.

 3. The file descriptors for network communication and file operations are
    not as easily interchangeable as in Unix.

   We avoid this by not trying any funny tricks on file descriptors.

 4. When writing data to stdout, Windows makes end-of-lines the DOS way, thus
    destroying binary data, although you do want that conversion if it is
    text coming through... (sigh)

   We set stdout to binary mode under Windows.

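On Windows, the switch to binary mode is done with the C runtime's `_setmode()`; a minimal guarded sketch (not curl's actual code) could look like:

```c
#include <stdio.h>
#ifdef _WIN32
#include <io.h>
#include <fcntl.h>
#endif

/* switch stdout to binary mode on Windows so CRLF translation does not
   corrupt binary output; a no-op everywhere else */
static void set_binary_stdout(void)
{
#ifdef _WIN32
  _setmode(_fileno(stdout), _O_BINARY);
#endif
}
```
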
 Inside the source code, we make an effort to avoid `#ifdef [Your OS]`. All
 conditionals that deal with features *should* instead be in the format
 `#ifdef HAVE_THAT_WEIRD_FUNCTION`. Since Windows can't run configure scripts,
 we maintain a `curl_config-win32.h` file in the lib directory that is supposed
 to look exactly like a `curl_config.h` file would have looked on a Windows
 machine!

 Generally speaking: always remember that this will be compiled on dozens of
 operating systems. Don't walk on the edge!

<a name="Library"></a>
Library
=======

 (See [Structs in libcurl](#structs) for the separate section describing all
 major internal structs and their purposes.)

 There are plenty of entry points to the library, namely each publicly defined
 function that libcurl offers to applications. All of those functions are
 rather small and easy to follow. All the ones prefixed with `curl_easy` are
 put in the `lib/easy.c` file.

 `curl_global_init()` and `curl_global_cleanup()` should be called by the
 application to initialize and clean up global stuff in the library. As of
 today, it can handle the global SSL initing if SSL is enabled and it can init
 the socket layer on Windows machines. libcurl itself has no "global" scope.

 All printf()-style functions use the supplied clones in `lib/mprintf.c`. This
 makes sure we stay absolutely platform independent.

 [`curl_easy_init()`][2] allocates an internal struct and makes some
 initializations. The returned handle does not reveal internals. This is the
 `Curl_easy` struct which works as an "anchor" struct for all `curl_easy`
 functions. All connections performed will get connect-specific data allocated
 that should be used for things related to particular connections/requests.

 [`curl_easy_setopt()`][1] takes three arguments, where the option stuff must
 be passed in pairs: the parameter-ID and the parameter-value. The list of
 options is documented in the man page. This function mainly sets things in
 the `Curl_easy` struct.

 `curl_easy_perform()` is just a wrapper function that makes use of the multi
 API. It basically calls `curl_multi_init()`, `curl_multi_add_handle()`,
 `curl_multi_wait()`, and `curl_multi_perform()` until the transfer is done
 and then returns.
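
The wrapper loop can be sketched with mock stand-ins for the multi calls (the `mock_*` names below are illustrative and not part of the libcurl API):

```c
#include <stddef.h>

/* mock stand-in for a multi handle: counts "work" left on the transfer */
typedef struct {
  int pending_steps;
} mock_multi;

/* mock curl_multi_perform(): do one step, report if a transfer still runs */
static int mock_multi_perform(mock_multi *m, int *still_running)
{
  if(m->pending_steps > 0)
    m->pending_steps--;
  *still_running = (m->pending_steps > 0);
  return 0; /* "CURLM_OK" */
}

/* the curl_easy_perform() pattern: wait and perform in a loop
   until the transfer is done */
static int easy_perform_sketch(mock_multi *m)
{
  int still_running = 1;
  while(still_running) {
    /* a real implementation calls curl_multi_wait() here */
    if(mock_multi_perform(m, &still_running))
      return -1;
  }
  return 0; /* "CURLE_OK" */
}
```
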

 Some of the most important key functions in `url.c` are called from
 `multi.c` when certain key steps are to be made in the transfer operation.

<a name="Curl_connect"></a>
Curl_connect()
--------------

   Analyzes the URL. It separates the different components and connects to the
   remote host. This may involve using a proxy and/or using SSL. The
   `Curl_resolv()` function in `lib/hostip.c` is used for looking up host
   names (it does then use the proper underlying method, which may vary
   between platforms and builds).

   When `Curl_connect` is done, we are connected to the remote site. Then it
   is time to tell the server to get a document/file. `multi_do()` arranges
   this.

   This function makes sure there's an allocated and initiated `connectdata`
   struct that is used for this particular connection only (although there may
   be several requests performed on the same connect). A bunch of things are
   inited/inherited from the `Curl_easy` struct.

<a name="multi_do"></a>
multi_do()
---------

   `multi_do()` makes sure the proper protocol-specific function is called.
   The functions are named after the protocols they handle.

   The protocol-specific functions of course deal with protocol-specific
   negotiations and setup. They have access to the `Curl_sendf()` (from
   `lib/sendf.c`) function to send printf-style formatted data to the remote
   host and when they're ready to make the actual file transfer they call the
   `Curl_setup_transfer()` function (in `lib/transfer.c`) to set up the
   transfer and return.

   If this DO function fails and the connection is being re-used, libcurl will
   then close this connection, set up a new connection and re-issue the DO
   request on that. This is because there is no way to be perfectly sure that
   we have discovered a dead connection before the DO function and thus we
   might wrongly be re-using a connection that was closed by the remote peer.

<a name="Curl_readwrite"></a>
Curl_readwrite()
----------------

   Called during the transfer of the actual protocol payload.

   During transfer, the progress functions in `lib/progress.c` are called at
   frequent intervals (or, at the user's choice, a specified callback might get
   called). The speedcheck functions in `lib/speedcheck.c` are also used to
   verify that the transfer is as fast as required.

<a name="multi_done"></a>
multi_done()
-----------

   Called after a transfer is done. This function takes care of everything
   that has to be done after a transfer. This function attempts to leave
   matters in a state so that `multi_do()` should be possible to call again on
   the same connection (in a persistent connection case). It might also soon
   be closed with `Curl_disconnect()`.

<a name="Curl_disconnect"></a>
Curl_disconnect()
-----------------

   When doing normal connections and transfers, no one ever tries to close any
   connections, so this is not normally called when `curl_easy_perform()` is
   used. This function is only used when we are certain that no more transfers
   are going to be made on the connection. A connection can also be closed by
   force, or this can be called to make sure that libcurl doesn't keep too
   many connections alive at the same time.

   This function cleans up all resources that are associated with a single
   connection.

<a name="http"></a>
HTTP(S)
=======

 HTTP offers a lot and is the protocol in curl that uses the most lines of
 code. There is a special file `lib/formdata.c` that offers all the
 multipart post functions.

 base64 functions for user+password stuff (and more) are in `lib/base64.c`
 and all functions for parsing and sending cookies are found in
 `lib/cookie.c`.

 HTTPS uses in almost every case the same procedure as HTTP, with only two
 exceptions: the connect procedure is different and the function used to read
 or write from the socket is different, although the latter fact is hidden in
 the source by the use of `Curl_read()` for reading and `Curl_write()` for
 writing data to the remote server.

 `http_chunks.c` contains functions that understand HTTP 1.1 chunked transfer
 encoding.
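
For illustration, each chunk in a chunked-encoded body is preceded by a hexadecimal size line that may carry extensions; a simplified parser for that line (not libcurl's actual implementation) might look like:

```c
#include <stdlib.h>

/* simplified chunked-encoding size-line parser (illustrative only):
   the line is a hex number, optionally followed by ";extensions" and CRLF */
static long parse_chunk_size(const char *line)
{
  char *end;
  long size = strtol(line, &end, 16);
  if(end == line || size < 0)
    return -1; /* no hex digits found */
  return size;
}
```
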

 An interesting detail with the HTTP(S) request is the `Curl_add_buffer()`
 series of functions we use. They append data to one single buffer, and when
 the building is finished the entire request is sent off in one single write.
 This is done this way to overcome problems with flawed firewalls and lame
 servers.

<a name="ftp"></a>
FTP
===

 The `Curl_if2ip()` function can be used for getting the IP number of a
 specified network interface, and it resides in `lib/if2ip.c`.

 `Curl_ftpsendf()` is used for sending FTP commands to the remote server. It
 was made a separate function to prevent us programmers from forgetting that
 they must be CRLF terminated. They must also be sent in one single `write()`
 to make firewalls and similar happy.
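
The idea can be sketched with a hypothetical helper (not `Curl_ftpsendf()` itself) that formats the complete command, CRLF included, into one buffer ready for a single write:

```c
#include <stdio.h>
#include <string.h>

/* illustrative helper: build a complete, CRLF-terminated FTP command
   in one buffer so it can be sent with a single write() call */
static int format_ftp_cmd(char *buf, size_t buflen,
                          const char *cmd, const char *arg)
{
  int n = snprintf(buf, buflen, "%s %s\r\n", cmd, arg);
  if(n < 0 || (size_t)n >= buflen)
    return -1; /* truncated or error */
  return n; /* number of bytes to write */
}
```
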

<a name="kerberos"></a>
Kerberos
========

 Kerberos support is mainly in `lib/krb5.c` and `lib/security.c`, but also in
 `curl_sasl_sspi.c` and `curl_sasl_gssapi.c` for the email protocols and in
 `socks_gssapi.c` and `socks_sspi.c` for SOCKS5 proxy specifics.

<a name="telnet"></a>
TELNET
======

 Telnet is implemented in `lib/telnet.c`.

<a name="file"></a>
FILE
====

 The `file://` protocol is dealt with in `lib/file.c`.

<a name="smb"></a>
SMB
===

 The `smb://` protocol is dealt with in `lib/smb.c`.

<a name="ldap"></a>
LDAP
====

 Everything LDAP is in `lib/ldap.c` and `lib/openldap.c`.

<a name="email"></a>
E-mail
======

 The e-mail related source code is in `lib/imap.c`, `lib/pop3.c` and
 `lib/smtp.c`.

<a name="general"></a>
General
=======

 URL encoding and decoding, called escaping and unescaping in the source code,
 is found in `lib/escape.c`.

 While transferring data in `Transfer()` a few functions might get used.
 `curl_getdate()` in `lib/parsedate.c` is for HTTP date comparisons (and
 more).

 `lib/getenv.c` offers `curl_getenv()` which is for reading environment
 variables in a neat platform-independent way. That is used in the client, but
 also in `lib/url.c` when checking the proxy environment variables. Note that,
 contrary to the normal Unix `getenv()`, this returns an allocated buffer that
 must be `free()`ed after use.

 `lib/netrc.c` holds the `.netrc` parser.

 `lib/timeval.c` features replacement functions for systems that don't have
 `gettimeofday()` and a few support functions for timeval conversions.

 A function named `curl_version()` that returns the full curl version string
 is found in `lib/version.c`.

<a name="persistent"></a>
Persistent Connections
======================

 The persistent connection support in libcurl requires some considerations on
 how to do things inside of the library.

 - The `Curl_easy` struct returned in the [`curl_easy_init()`][2] call
   must never hold connection-oriented data. It is meant to hold the root data
   as well as all the options etc that the library-user may choose.

 - The `Curl_easy` struct holds the "connection cache" (an array of
   pointers to `connectdata` structs).

 - This enables the 'curl handle' to be reused on subsequent transfers.

 - When libcurl is told to perform a transfer, it first checks for an already
   existing connection in the cache that we can use. Otherwise it creates a
   new one and adds that to the cache. If the cache is full already when a new
   connection is added, it will first close the oldest unused one.

 - When the transfer operation is complete, the connection is left
   open. Particular options may tell libcurl not to, and protocols may signal
   closure on connections and then they won't be kept open, of course.

 - When `curl_easy_cleanup()` is called, we close all still opened connections,
   unless of course the multi interface "owns" the connections.

 The curl handle must be re-used in order for the persistent connections to
 work.
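
The eviction rule mentioned above (close the oldest unused connection when the cache is full) can be sketched as follows; the struct and function are illustrative, not libcurl's actual conncache code:

```c
#include <stddef.h>

/* illustrative connection-cache entry; the real conncache code is
   considerably more involved than this */
struct conn {
  int in_use;     /* is a transfer currently using this connection? */
  long last_used; /* logical timestamp of last use */
};

/* return the index of the oldest unused connection, or -1 if all are busy */
static int oldest_unused(const struct conn *cache, int n)
{
  int best = -1;
  int i;
  for(i = 0; i < n; i++) {
    if(!cache[i].in_use &&
       (best == -1 || cache[i].last_used < cache[best].last_used))
      best = i;
  }
  return best;
}
```
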

<a name="multi"></a>
multi interface/non-blocking
============================

 The multi interface is a non-blocking interface to the library. To make that
 interface work as well as possible, no low-level functions within libcurl
 must be written to work in a blocking manner. (There are still a few spots
 violating this rule.)

 One of the primary reasons we introduced c-ares support was to allow the name
 resolve phase to be perfectly non-blocking as well.

 The FTP and the SFTP/SCP protocols are examples of how we adapt and adjust
 the code to allow non-blocking operations even on multi-stage command-
 response protocols. They are built around state machines that return when
 they would otherwise block waiting for data. The DICT, LDAP and TELNET
 protocols are crappy examples and they are subject to rewrite in the future
 to better fit the libcurl protocol family.
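
A toy version of such a state machine (illustrative only, not a real libcurl protocol handler):

```c
/* toy non-blocking state machine for a command/response protocol:
   each call performs at most one step and returns instead of blocking */
enum xfer_state { ST_SEND_CMD, ST_WAIT_RESP, ST_DONE };

static enum xfer_state statemach_step(enum xfer_state s, int data_ready)
{
  switch(s) {
  case ST_SEND_CMD:
    /* queue the command (non-blocking send), then wait for the reply */
    return ST_WAIT_RESP;
  case ST_WAIT_RESP:
    /* no data yet? return and let the caller poll the socket again */
    return data_ready ? ST_DONE : ST_WAIT_RESP;
  default:
    return ST_DONE;
  }
}
```
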

<a name="ssl"></a>
SSL libraries
=============

 Originally libcurl supported SSLeay for SSL/TLS transports, but that was then
 extended to its successor OpenSSL and has since also been extended to several
 other SSL/TLS libraries. We expect and hope to further extend the support
 in future libcurl versions.

 To deal with this internally in the best way possible, we have a generic SSL
 function API as provided by the `vtls/vtls.[ch]` system, and these are the
 only SSL functions we must use from within libcurl. vtls is then crafted to
 use the appropriate lower-level function calls to whatever SSL library is in
 use, for example `vtls/openssl.[ch]` for the OpenSSL library.

<a name="symbols"></a>
Library Symbols
===============

 All symbols used internally in libcurl must use a `Curl_` prefix if they're
 used in more than a single file. Single-file symbols must be made static.
 Public ("exported") symbols must use a `curl_` prefix. (There are exceptions,
 but they are to be changed to follow this pattern in future versions.) Public
 API functions are marked with `CURL_EXTERN` in the public header files so
 that all others can be hidden on platforms where this is possible.

<a name="returncodes"></a>
Return Codes and Informationals
===============================

 I've made things simple. Almost every function in libcurl returns a CURLcode,
 which must be `CURLE_OK` if everything is OK or otherwise a suitable error
 code as the `curl/curl.h` include file defines. The very spot that detects an
 error must use the `Curl_failf()` function to set the human-readable error
 description.
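
The pattern can be sketched with hypothetical names standing in for `CURLcode` and `Curl_failf()`:

```c
#include <stdio.h>
#include <string.h>

/* sketch of the return-code plus failf() pattern; the names here are
   illustrative stand-ins for CURLcode and Curl_failf() */
typedef enum { MYE_OK = 0, MYE_UNSUPPORTED_PROTOCOL } my_code;

static char my_errbuf[256];

static void my_failf(const char *msg)
{
  /* the spot that detects the error records the readable description */
  snprintf(my_errbuf, sizeof(my_errbuf), "%s", msg);
}

static my_code check_scheme(const char *url)
{
  if(strncmp(url, "http://", 7) != 0) {
    my_failf("unsupported scheme");
    return MYE_UNSUPPORTED_PROTOCOL;
  }
  return MYE_OK;
}
```
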

 In aiding the user to understand what's happening and to debug curl usage, we
 must supply a fair number of informational messages by using the
 `Curl_infof()` function. Those messages are only displayed when the user
 explicitly asks for them. They are best used when revealing information that
 isn't otherwise obvious.

<a name="abi"></a>
API/ABI
=======

 We make an effort to not export or show internals or how internals work, as
 that makes it easier to keep a solid API/ABI over time. See docs/libcurl/ABI
 for our promise to users.

<a name="client"></a>
Client
======

 `main()` resides in `src/tool_main.c`.

 `src/tool_hugehelp.c` is automatically generated by the `mkhelp.pl` perl
 script to display the complete "manual" and the `src/tool_urlglob.c` file
 holds the functions used for the URL-"globbing" support. Globbing in the
 sense that the `{}` and `[]` expansion stuff is there.

 The client mostly sets up its `config` struct properly, then
 it calls the `curl_easy_*()` functions of the library and when it gets back
 control after the `curl_easy_perform()` it cleans up the library, checks
 status and exits.

 When the operation is done, the `ourWriteOut()` function in `src/writeout.c`
 may be called to report about the operation. That function uses the
 `curl_easy_getinfo()` function to extract useful information from the curl
 session.

 It may loop and do all this several times if many URLs were specified on the
 command line or in a config file.

<a name="memorydebug"></a>
Memory Debugging
================

 The file `lib/memdebug.c` contains debug versions of a few functions.
 Functions such as `malloc()`, `free()`, `fopen()` and `fclose()` that
 somehow deal with resources that might give us problems if we "leak" them.
 The functions in the memdebug system do nothing fancy, they do their normal
 function and then log information about what they just did. The logged data
 can then be analyzed after a complete session.

 `memanalyze.pl` is the perl script present in `tests/` that analyzes a log
 file generated by the memory tracking system. It detects if resources are
 allocated but never freed and other kinds of errors related to resource
 management.

 Internally, the preprocessor symbol `DEBUGBUILD` restricts code to
 debug-enabled builds, and the symbol `CURLDEBUG` is used to mark code that
 is _only_ used for memory tracking/debugging.

 Use `-DCURLDEBUG` when compiling to enable memory debugging; this is also
 switched on by running configure with `--enable-curldebug`. Use
 `-DDEBUGBUILD` when compiling to enable a debug build, or run configure with
 `--enable-debug`.

 `curl --version` will list the 'Debug' feature for debug-enabled builds, and
 will list the 'TrackMemory' feature for builds capable of curl debug memory
 tracking. These features are independent and can be controlled when running
 the configure script. When `--enable-debug` is given, both features will be
 enabled, unless some restriction prevents memory tracking from being used.

<a name="test"></a>
Test Suite
==========

 The test suite is placed in its own subdirectory directly off the root in the
 curl archive tree, and it contains a bunch of scripts and a lot of test case
 data.

 The main test script is `runtests.pl`, which will invoke test servers like
 `httpserver.pl` and `ftpserver.pl` before all the test cases are performed.
 The test suite currently only runs on Unix-like platforms.

 You'll find a description of the test suite in the `tests/README` file, and
 the test case data files in the `tests/FILEFORMAT` file.

 The test suite automatically detects if curl was built with memory
 debugging enabled, and if it was, it will detect memory leaks, too.

<a name="asyncdns"></a>
Asynchronous name resolves
==========================

 libcurl can be built to do name resolves asynchronously, using either the
 normal resolver in a threaded manner or by using c-ares.

<a name="cares"></a>
[c-ares][3]
------

### Build libcurl to use c-ares

1. ./configure --enable-ares=/path/to/ares/install
2. make

### c-ares on win32

 First I compiled c-ares. I changed the default C runtime library to be the
 single-threaded rather than the multi-threaded (this seems to be required to
 prevent linking errors later on). Then I simply built the areslib project
 (the other projects adig/ahost seem to fail under MSVC).

 Next was libcurl. I opened `lib/config-win32.h` and added:
 `#define USE_ARES 1`

 The next thing I did was add the path for the ares includes to the include
 path, and the libares.lib to the libraries.

 Lastly, I also changed libcurl to be single-threaded rather than
 multi-threaded, again to prevent some duplicate symbol errors. I'm
 not sure why I needed to change everything to single-threaded, but when I
 didn't I got redefinition errors for several CRT functions (`malloc()`,
 `stricmp()`, etc.)

<a name="curl_off_t"></a>
`curl_off_t`
==========

 `curl_off_t` is a data type provided by the external libcurl include
 headers. It is the type meant to be used for the [`curl_easy_setopt()`][1]
 options that end with LARGE. The type is 64 bits large on most modern
 platforms.

<a name="curlx"></a>
curlx
=====

 The libcurl source code offers a few functions by source only. They are not
 part of the official libcurl API, but the source files might be useful for
 others, so apps can optionally compile/build with these sources to gain
 additional functions.

 We provide them through a single header file for easy access for apps:
 `curlx.h`

`curlx_strtoofft()`
-------------------
   A macro that converts a string containing a number to a `curl_off_t` number.
   This might use the `curlx_strtoll()` function which is provided as source
   code in strtoofft.c. Note that the function is only provided if no
   `strtoll()` (or equivalent) function exists on your platform. If `curl_off_t`
   is only a 32-bit number on your platform, this macro uses `strtol()`.
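
An illustrative stand-in for the idea, assuming a platform that does have `strtoll()` (`my_strtoofft` is a hypothetical name, not the libcurl macro):

```c
#include <stdlib.h>
#include <stdint.h>

/* illustrative stand-in: parse a decimal string into a 64-bit offset,
   here simply delegating to strtoll() where it exists */
static int64_t my_strtoofft(const char *str, char **end)
{
  return (int64_t)strtoll(str, end, 10);
}
```
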

Future
------

 Several functions will be removed from the public `curl_` name space in a
 future libcurl release. They will then only become available as `curlx_`
 functions instead. To make the transition easier, we already today provide
 these functions with the `curlx_` prefix to allow sources to be built
 properly with the new function names. The concerned functions are:

 - `curlx_getenv`
 - `curlx_strequal`
 - `curlx_strnequal`
 - `curlx_mvsnprintf`
 - `curlx_msnprintf`
 - `curlx_maprintf`
 - `curlx_mvaprintf`
 - `curlx_msprintf`
 - `curlx_mprintf`
 - `curlx_mfprintf`
 - `curlx_mvsprintf`
 - `curlx_mvprintf`
 - `curlx_mvfprintf`

<a name="contentencoding"></a>
Content Encoding
================

## About content encodings

 [HTTP/1.1][4] specifies that a client may request that a server encode its
 response. This is usually used to compress a response using one (or more)
 encodings from a set of commonly available compression techniques. These
 schemes include `deflate` (the zlib algorithm), `gzip`, `br` (brotli) and
 `compress`. A client requests that the server perform an encoding by including
 an `Accept-Encoding` header in the request. The value of the header
 should be one of the recognized tokens `deflate`, ... (there's a way to
 register new schemes/tokens, see sec 3.5 of the spec). A server MAY honor
 the client's encoding request. When a response is encoded, the server
 includes a `Content-Encoding` header in the response. The value of the
 `Content-Encoding` header indicates which encodings were used to encode the
 data, in the order in which they were applied.

 It's also possible for a client to attach priorities to different schemes so
 that the server knows which it prefers. See sec 14.3 of RFC 2616 for more
 information on the `Accept-Encoding` header. See sec
 [3.1.2.2 of RFC 7231][15] for more information on the `Content-Encoding`
 header.

## Supported content encodings

 The `deflate`, `gzip` and `br` content encodings are supported by libcurl.
 Both regular and chunked transfers work fine. The zlib library is required
 for the `deflate` and `gzip` encodings, while the brotli decoding library is
 required for the `br` encoding.

## The libcurl interface

 To cause libcurl to request a content encoding use:

  [`curl_easy_setopt`][1](curl, [`CURLOPT_ACCEPT_ENCODING`][5], string)

 where string is the intended value of the `Accept-Encoding` header.

 Currently, libcurl does support multiple encodings but only
 understands how to process responses that use the `deflate`, `gzip` and/or
 `br` content encodings, so the only values for [`CURLOPT_ACCEPT_ENCODING`][5]
 that will work (besides `identity`, which does nothing) are `deflate`,
 `gzip` and `br`. If a response is encoded using `compress` or another
 unsupported method, libcurl will return an error indicating that the response
 could not be decoded. If `<string>` is NULL no `Accept-Encoding` header is
 generated. If `<string>` is a zero-length string, then an `Accept-Encoding`
 header containing all supported encodings will be generated.

 [`CURLOPT_ACCEPT_ENCODING`][5] must be set to any non-NULL value for
 content to be automatically decoded. If it is not set and the server still
 sends encoded content (despite not having been asked), the data is returned
 in its raw form and the `Content-Encoding` type is not checked.

## The curl interface

 Use the [`--compressed`][6] option with curl to cause it to ask servers to
 compress responses using any format supported by curl.

  707: <a name="hostip"></a>
  708: `hostip.c` explained
  709: ====================
  710: 
  711:  The main compile-time defines to keep in mind when reading the `host*.c`
  712:  source file are these:
  713: 
  714: ## `CURLRES_IPV6`
  715: 
  716:  this host has `getaddrinfo()` and family, and thus we use that. The host may
  717:  not be able to resolve IPv6, but we don't really have to take that into
  718:  account. Hosts that aren't IPv6-enabled have `CURLRES_IPV4` defined.
  719: 
  720: ## `CURLRES_ARES`
  721: 
  722:  is defined if libcurl is built to use c-ares for asynchronous name
  723:  resolves. This can be Windows or \*nix.
  724: 
  725: ## `CURLRES_THREADED`
  726: 
  727:  is defined if libcurl is built to use threading for asynchronous name
  728:  resolves. The name resolve will be done in a new thread, and the supported
  729:  asynch API will be the same as for ares-builds. This is the default under
  730:  (native) Windows.
  731: 
  732:  If any of the two previous are defined, `CURLRES_ASYNCH` is defined too. If
  733:  libcurl is not built to use an asynchronous resolver, `CURLRES_SYNCH` is
  734:  defined.
  735: 
  736: ## `host*.c` sources
  737: 
  738:  The `host*.c` sources files are split up like this:
  739: 
  740:  - `hostip.c`      - method-independent resolver functions and utility functions
  741:  - `hostasyn.c`    - functions for asynchronous name resolves
  742:  - `hostsyn.c`     - functions for synchronous name resolves
  743:  - `asyn-ares.c`   - functions for asynchronous name resolves using c-ares
  744:  - `asyn-thread.c` - functions for asynchronous name resolves using threads
  745:  - `hostip4.c`     - IPv4 specific functions
  746:  - `hostip6.c`     - IPv6 specific functions
  747: 
  748:  The `hostip.h` is the single united header file for all this. It defines the
  749:  `CURLRES_*` defines based on the `config*.h` and `curl_setup.h` defines.
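 The relationship between these defines can be illustrated with a simplified
 sketch of the preprocessor logic. The exact `USE_*` trigger defines vary
 between build setups; treat this as an approximation, not the literal
 `hostip.h` contents:

```c
/* simplified sketch, not the literal hostip.h contents */
#if defined(USE_ARES)
#  define CURLRES_ASYNCH
#  define CURLRES_ARES          /* c-ares does the resolving */
#elif defined(USE_THREADS_POSIX) || defined(USE_THREADS_WIN32)
#  define CURLRES_ASYNCH
#  define CURLRES_THREADED      /* resolving done in a separate thread */
#else
#  define CURLRES_SYNCH         /* blocking resolver calls */
#endif

#ifdef ENABLE_IPV6
#  define CURLRES_IPV6          /* getaddrinfo() and family */
#else
#  define CURLRES_IPV4
#endif
```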

<a name="memoryleak"></a>
Track Down Memory Leaks
=======================

## Single-threaded

  Please note that this memory leak system is not adjusted to work in more
  than one thread. If you want or need to use it in a multi-threaded app,
  adjust it accordingly.

## Build

  Rebuild libcurl with `-DCURLDEBUG` (usually, rerunning configure with
  `--enable-debug` fixes this). `make clean` first, then `make` so that all
  files are actually rebuilt properly. It will also make sense to build
  libcurl with the debug option (usually `-g` to the compiler) so that
  debugging it will be easier if you actually do find a leak in the library.

  This will create a library that has memory debugging enabled.

## Modify Your Application

  Add a line in your application code:

       curl_dbg_memdebug("dump");

  This will make the malloc debug system output a full trace of all
  resource-using functions to the given file name. Make sure you rebuild
  your program and that you link with the same libcurl you built for this
  purpose as described above.

## Run Your Application

  Run your program as usual. Watch the specified memory trace file grow.

  Make your program exit and use the proper libcurl cleanup functions etc.,
  so that all non-leaks are returned/freed properly.

## Analyze the Flow

  Use the `tests/memanalyze.pl` perl script to analyze the dump file:

    tests/memanalyze.pl dump

  This outputs a report on which resources were allocated but never freed,
  etc. This report is well suited for posting to the list!

  If this doesn't produce any output, no leak was detected in libcurl. The
  leak is then most likely in your own code.

<a name="multi_socket"></a>
`multi_socket`
==============

 Implementation of the `curl_multi_socket` API

 The main ideas of this API are simply:

 1. The application can use whatever event system it likes as it gets info
    from libcurl about which file descriptors libcurl waits for which
    action on. (The previous API returns `fd_sets` which is very
    `select()`-centric.)

 2. When the application discovers action on a single socket, it calls
    libcurl and informs it that there was action on this particular socket,
    and libcurl can then act on that socket/transfer only and not care
    about any other transfers. (The previous API always had to scan through
    all the existing transfers.)

 The idea is that [`curl_multi_socket_action()`][7] calls a given callback
 with information about what socket to wait for what action on, and the
 callback only gets called if the status of that socket has changed.

 We also added a timer callback that makes libcurl call the application when
 the timeout value changes, and you set that with [`curl_multi_setopt()`][9]
 and the [`CURLMOPT_TIMERFUNCTION`][10] option. To make this work,
 internally there's a struct added to each easy handle in which we store an
 "expire time" (if any). The structs are then "splay sorted" so that we can
 add and remove times from the linked list and yet somewhat swiftly figure
 out both how long there is until the next nearest timer expires and which
 timer (handle) we should take care of now. Of course, the upside of all
 this is that we get a [`curl_multi_timeout()`][8] that should also work
 with old-style applications that use [`curl_multi_perform()`][11].

 We created an internal "socket to easy handles" hash table that, given
 a socket (file descriptor), returns the easy handle that waits for action
 on that socket. This hash is made using the already existing hash code
 (previously only used for the DNS cache).

 To make libcurl able to report plain sockets in the socket callback, we
 had to re-organize the internals of [`curl_multi_fdset()`][12] etc. so
 that the conversion from sockets to `fd_sets` for that function is only
 done in the last step before the data is returned. We also had to extend
 c-ares to get a function that can return plain sockets, as that library
 too returned only `fd_sets`, and that is no longer good enough. The
 changes done to c-ares are available in c-ares 1.3.1 and later.

<a name="structs"></a>
Structs in libcurl
==================

This section should cover 7.32.0 pretty accurately, but will make sense even
for older and later versions as things don't change drastically that often.

<a name="Curl_easy"></a>
## Curl_easy

  The `Curl_easy` struct is the one returned to the outside in the external
  API as a `CURL *`. This is usually known as an easy handle in API
  documentation and examples.

  Information and state that is related to the actual connection is in the
  `connectdata` struct. When a transfer is about to be made, libcurl will
  either create a new connection or re-use an existing one. The particular
  connectdata that is used by this handle is pointed out by
  `Curl_easy->easy_conn`.

  Data and information regarding this particular single transfer are put in
  the `SingleRequest` sub-struct.

  When the `Curl_easy` struct is added to a multi handle, as it must be in
  order to do any transfer, the `->multi` member will point to the
  `Curl_multi` struct it belongs to. The `->prev` and `->next` members will
  then be used by the multi code to keep a linked list of `Curl_easy`
  structs that are added to that same multi handle. libcurl always uses
  multi so `->multi` *will* point to a `Curl_multi` when a transfer is in
  progress.

  `->mstate` is the multi state of this particular `Curl_easy`. When
  `multi_runsingle()` is called, it will act on this handle according to
  which state it is in. The mstate is also what tells which sockets to
  return for a specific `Curl_easy` when [`curl_multi_fdset()`][12] is
  called etc.

  The libcurl source code generally uses the name `data` for the variable
  that points to the `Curl_easy`.

  When doing multiplexed HTTP/2 transfers, each `Curl_easy` is associated
  with an individual stream, sharing the same connectdata struct.
  Multiplexing makes it even more important to keep things associated with
  the right thing!

<a name="connectdata"></a>
## connectdata

  A general idea in libcurl is to keep connections around in a connection
  "cache" after they have been used, in case they will be used again, and
  then re-use an existing one instead of creating a new one, as this gives
  a significant performance boost.

  Each `connectdata` identifies a single physical connection to a server.
  If the connection can't be kept alive, the connection will be closed
  after use and then this struct can be removed from the cache and freed.

  Thus, the same `Curl_easy` can be used multiple times and each time select
  another `connectdata` struct to use for the connection. Keep this in mind,
  as it is then important to consider if options or choices are based on the
  connection or the `Curl_easy`.

  Functions in libcurl will assume that `connectdata->data` points to the
  `Curl_easy` that uses this connection (for the moment).

  As a special complexity, some protocols supported by libcurl require a
  special disconnect procedure that is more than just shutting down the
  socket. It can involve sending one or more commands to the server before
  doing so. Since connections are kept in the connection cache after use,
  the original `Curl_easy` may no longer be around when the time comes to
  shut down a particular connection. For this purpose, libcurl holds a
  special dummy `closure_handle` `Curl_easy` in the `Curl_multi` struct to
  use when needed.

  FTP uses two TCP connections for a typical transfer but it keeps both in
  this single struct and thus can be considered a single connection for
  most internal concerns.

  The libcurl source code generally uses the name `conn` for the variable
  that points to the connectdata.

<a name="Curl_multi"></a>
## Curl_multi

  Internally, the easy interface is implemented as a wrapper around multi
  interface functions. This makes everything use the multi interface.

  `Curl_multi` is the multi handle struct exposed as `CURLM *` in external
  APIs.

  This struct holds a list of `Curl_easy` structs that have been added to
  this handle with [`curl_multi_add_handle()`][13]. The start of the list is
  `->easyp` and `->num_easy` is a counter of added `Curl_easy`s.

  `->msglist` is a linked list of messages to send back when
  [`curl_multi_info_read()`][14] is called. Basically a node is added to
  that list when an individual `Curl_easy`'s transfer has completed.

  `->hostcache` points to the name cache. It is a hash table for looking up
  names to IP addresses. The nodes have a limited lifetime in there and
  this cache is meant to reduce the lookup time when the same name is
  requested again within a short period of time.

  `->timetree` points to a tree of `Curl_easy`s, sorted by the remaining
  time until they should be checked - normally some sort of timeout. Each
  `Curl_easy` has one node in the tree.

  `->sockhash` is a hash table that allows fast lookups from a socket
  descriptor to the `Curl_easy` that uses that descriptor. This is
  necessary for the `multi_socket` API.

  `->conn_cache` points to the connection cache. It keeps track of all
  connections that are kept after use. The cache has a maximum size.

  `->closure_handle` is described in the `connectdata` section.

  The libcurl source code generally uses the name `multi` for the variable
  that points to the `Curl_multi` struct.

<a name="Curl_handler"></a>
## Curl_handler

  Each unique protocol that is supported by libcurl needs to provide at
  least one `Curl_handler` struct. It defines what the protocol is called
  and what functions the main code should call to deal with protocol
  specific issues. In general, there's a source file named `[protocol].c`
  in which there's a `struct Curl_handler Curl_handler_[protocol]`
  declared. In `url.c` there's then a single array with pointers to all the
  individual `Curl_handler` structs, which is scanned through when a URL is
  given to libcurl to work with.

  `->scheme` is the URL scheme name, usually spelled out in uppercase.
  That's "HTTP" or "FTP" etc. SSL versions of the protocol need their own
  `Curl_handler` setup, so HTTPS is separate from HTTP.

  `->setup_connection` is called to allow the protocol code to allocate
  protocol specific data that then gets associated with that `Curl_easy`
  for the rest of this transfer. It gets freed again at the end of the
  transfer. It will be called before the `connectdata` for the transfer has
  been selected/created. Most protocols will allocate their private
  `struct [PROTOCOL]` here and assign `Curl_easy->req.protop` to point to
  it.

  `->connect_it` allows a protocol to do some specific actions after the
  TCP connect is done, that can still be considered part of the connection
  phase.

  Some protocols will alter the `connectdata->recv[]` and
  `connectdata->send[]` function pointers in this function.

  `->connecting` is similarly a function that keeps getting called as long
  as the protocol considers itself still in the connecting phase.

  `->do_it` is the function called to issue the transfer request. What we
  call the DO action internally. If the DO is not enough and things need to
  be kept getting done for the entire DO sequence to complete, `->doing` is
  then usually also provided. Each protocol that needs to do multiple
  commands or similar for do/doing needs to implement its own state machine
  (see SCP, SFTP, FTP). Some protocols (only FTP, and only due to
  historical reasons) have a separate piece of the DO state called
  `DO_MORE`.

  `->doing` keeps getting called while issuing the transfer request
  command(s).

  `->done` gets called when the transfer is complete and DONE. That's after
  the main data has been transferred.

  `->do_more` gets called during the `DO_MORE` state. The FTP protocol uses
  this state when setting up the second connection.

  `->proto_getsock`, `->doing_getsock`, `->domore_getsock` and
  `->perform_getsock` are functions that return socket information: which
  socket(s) to wait for which action(s) during the particular multi state.

  `->disconnect` is called immediately before the TCP connection is shut
  down.

  `->readwrite` gets called during transfer to allow the protocol to do
  extra reads/writes.

  `->defport` is the default TCP or UDP port this protocol uses.

  `->protocol` is one or more bits in the `CURLPROTO_*` set. The SSL
  versions have their "base" protocol set and then the SSL variation. Like
  "HTTP|HTTPS".

  `->flags` is a bitmask with additional information about the protocol
  that makes it get treated differently by the generic engine:

  - `PROTOPT_SSL` - will make it connect and negotiate SSL

  - `PROTOPT_DUAL` - this protocol uses two connections

  - `PROTOPT_CLOSEACTION` - this protocol has actions to do before closing
    the connection. This flag is no longer used by code, yet still set for
    a bunch of protocol handlers.

  - `PROTOPT_DIRLOCK` - "direction lock". The SSH protocols set this bit to
    limit which "direction" of socket actions that the main engine will
    concern itself with.

  - `PROTOPT_NONETWORK` - a protocol that doesn't use the network (read
    `file:`)

  - `PROTOPT_NEEDSPWD` - this protocol needs a password and will use a
    default one unless one is provided

  - `PROTOPT_NOURLQUERY` - this protocol can't handle a query part in the
    URL (?foo=bar)

<a name="conncache"></a>
## conncache

  A hash table with connections for later re-use. Each `Curl_easy` has a
  pointer to its connection cache. Each multi handle sets up a connection
  cache that all added `Curl_easy`s share by default.

<a name="Curl_share"></a>
## Curl_share

  The libcurl share API allocates a `Curl_share` struct, exposed to the
  external API as `CURLSH *`.

  The idea is that the struct can have a set of its own versions of caches
  and pools and then, by providing this struct in the `CURLOPT_SHARE`
  option, specific `Curl_easy`s will use the caches/pools that this share
  handle holds.

  Then individual `Curl_easy` structs can be made to share specific things
  that they otherwise wouldn't, such as cookies.

  The `Curl_share` struct can currently hold cookies, the DNS cache and the
  SSL session cache.

<a name="CookieInfo"></a>
## CookieInfo

  This is the main cookie struct. It holds all known cookies and related
  information. Each `Curl_easy` has its own private `CookieInfo` even when
  they are added to a multi handle. They can be made to share cookies by
  using the share API.


[1]: https://curl.haxx.se/libcurl/c/curl_easy_setopt.html
[2]: https://curl.haxx.se/libcurl/c/curl_easy_init.html
[3]: https://c-ares.haxx.se/
[4]: https://tools.ietf.org/html/rfc7230 "RFC 7230"
[5]: https://curl.haxx.se/libcurl/c/CURLOPT_ACCEPT_ENCODING.html
[6]: https://curl.haxx.se/docs/manpage.html#--compressed
[7]: https://curl.haxx.se/libcurl/c/curl_multi_socket_action.html
[8]: https://curl.haxx.se/libcurl/c/curl_multi_timeout.html
[9]: https://curl.haxx.se/libcurl/c/curl_multi_setopt.html
[10]: https://curl.haxx.se/libcurl/c/CURLMOPT_TIMERFUNCTION.html
[11]: https://curl.haxx.se/libcurl/c/curl_multi_perform.html
[12]: https://curl.haxx.se/libcurl/c/curl_multi_fdset.html
[13]: https://curl.haxx.se/libcurl/c/curl_multi_add_handle.html
[14]: https://curl.haxx.se/libcurl/c/curl_multi_info_read.html
[15]: https://tools.ietf.org/html/rfc7231#section-3.1.2.2
