Annotation of embedaddon/curl/docs/INTERNALS.md, revision 1.1
1.1 ! misho 1: curl internals
! 2: ==============
! 3:
! 4: - [Intro](#intro)
! 5: - [git](#git)
! 6: - [Portability](#Portability)
! 7: - [Windows vs Unix](#winvsunix)
! 8: - [Library](#Library)
! 9: - [`Curl_connect`](#Curl_connect)
! 10: - [`multi_do`](#multi_do)
! 11: - [`Curl_readwrite`](#Curl_readwrite)
! 12: - [`multi_done`](#multi_done)
! 13: - [`Curl_disconnect`](#Curl_disconnect)
! 14: - [HTTP(S)](#http)
! 15: - [FTP](#ftp)
! 16: - [Kerberos](#kerberos)
! 17: - [TELNET](#telnet)
! 18: - [FILE](#file)
! 19: - [SMB](#smb)
! 20: - [LDAP](#ldap)
! 21: - [E-mail](#email)
! 22: - [General](#general)
! 23: - [Persistent Connections](#persistent)
! 24: - [multi interface/non-blocking](#multi)
! 25: - [SSL libraries](#ssl)
! 26: - [Library Symbols](#symbols)
! 27: - [Return Codes and Informationals](#returncodes)
! 28: - [API/ABI](#abi)
! 29: - [Client](#client)
! 30: - [Memory Debugging](#memorydebug)
! 31: - [Test Suite](#test)
! 32: - [Asynchronous name resolves](#asyncdns)
! 33: - [c-ares](#cares)
! 34: - [`curl_off_t`](#curl_off_t)
! 35: - [curlx](#curlx)
! 36: - [Content Encoding](#contentencoding)
! 37: - [`hostip.c` explained](#hostip)
! 38: - [Track Down Memory Leaks](#memoryleak)
! 39: - [`multi_socket`](#multi_socket)
! 40: - [Structs in libcurl](#structs)
! 41: - [Curl_easy](#Curl_easy)
! 42: - [connectdata](#connectdata)
! 43: - [Curl_multi](#Curl_multi)
! 44: - [Curl_handler](#Curl_handler)
! 45: - [conncache](#conncache)
! 46: - [Curl_share](#Curl_share)
! 47: - [CookieInfo](#CookieInfo)
! 48:
! 49: <a name="intro"></a>
! 50: Intro
! 51: =====
! 52:
! 53: This project is split in two: the library and the client. The client part
! 54: uses the library, but the library is designed to allow other applications to
! 55: use it.
! 56:
! 57: The largest amount of code and complexity is in the library part.
! 58:
! 59:
! 60: <a name="git"></a>
! 61: git
! 62: ===
! 63:
! 64: All changes to the sources are committed to the git repository as soon as
! 65: they're somewhat verified to work. Changes shall be committed as independently
! 66: as possible so that individual changes can be easily spotted and tracked
! 67: afterwards.
! 68:
! 69: Tagging shall be used extensively, and by the time we release new archives we
! 70: should tag the sources with a name similar to the released version number.
! 71:
! 72: <a name="Portability"></a>
! 73: Portability
! 74: ===========
! 75:
! 76: We write curl and libcurl to compile with C89 compilers on 32-bit and larger
! 77: machines. Most of libcurl assumes more or less POSIX compliance, but that's
! 78: not a requirement.
! 79:
! 80: We write libcurl to build and work with lots of third party tools, and we
! 81: want it to remain functional and buildable with these and later versions
! 82: (older versions may still work, but that is not what we work hard to maintain):
! 83:
! 84: Dependencies
! 85: ------------
! 86:
! 87: - OpenSSL 0.9.7
! 88: - GnuTLS 3.1.10
! 89: - zlib 1.1.4
! 90: - libssh2 0.16
! 91: - c-ares 1.6.0
! 92: - libidn2 2.0.0
! 93: - wolfSSL 2.0.0
! 94: - openldap 2.0
! 95: - MIT Kerberos 1.2.4
! 96: - GSKit V5R3M0
! 97: - NSS 3.14.x
! 98: - Heimdal ?
! 99: - nghttp2 1.12.0
! 100:
! 101: Operating Systems
! 102: -----------------
! 103:
! 104: On systems where configure runs, we aim at working on them all - if they have
! 105: a suitable C compiler. On systems that don't run configure, we strive to keep
! 106: curl running correctly on:
! 107:
! 108: - Windows 98
! 109: - AS/400 V5R3M0
! 110: - Symbian 9.1
! 111: - Windows CE ?
! 112: - TPF ?
! 113:
! 114: Build tools
! 115: -----------
! 116:
! 117: When writing code (mostly for generating stuff included in release tarballs)
! 118: we use a few "build tools" and we make sure that we remain functional with
! 119: these versions:
! 120:
! 121: - GNU Libtool 1.4.2
! 122: - GNU Autoconf 2.57
! 123: - GNU Automake 1.7
! 124: - GNU M4 1.4
! 125: - perl 5.004
! 126: - roffit 0.5
! 127: - groff ? (any version that supports `groff -Tps -man [in] [out]`)
! 128: - ps2pdf (gs) ?
! 129:
! 130: <a name="winvsunix"></a>
! 131: Windows vs Unix
! 132: ===============
! 133:
! 134: There are a few differences in how to program curl the Unix way compared to
! 135: the Windows way. Perhaps the four most notable details are:
! 136:
! 137: 1. Different function names for socket operations.
! 138:
! 139: In curl, this is solved with defines and macros, so that the source looks
! 140: the same in all places except for the header file that defines them. The
! 141: macros in use are `sclose()`, `sread()` and `swrite()`.
! 142:
! 143: 2. Windows requires a couple of init calls for the socket stuff.
! 144:
! 145: That's taken care of by the `curl_global_init()` call, but if other
! 146: libraries in the application also make those init calls, there might be
! 147: reasons for the application to alter that behaviour.
! 148:
! 149: 3. The file descriptors for network communication and file operations are
! 150: not as easily interchangeable as in Unix.
! 151:
! 152: We avoid this by not trying any funny tricks on file descriptors.
! 153:
! 154: 4. When writing data to stdout, Windows makes end-of-lines the DOS way, thus
! 155: destroying binary data, although you do want that conversion if it is
! 156: text coming through... (sigh)
! 157:
! 158: We set stdout to binary under Windows.
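
As a rough illustration of item 1, a header in this spirit could look like the sketch below. It is not curl's actual header: the `sock_t` typedef and the `send_and_receive()` helper are hypothetical names made up for this example, and only the macro names `sclose()`, `sread()` and `swrite()` come from the text above.

```c
#include <stddef.h>

/* The one place that knows which platform we are building on. */
#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET sock_t;                     /* hypothetical typedef */
#define sclose(s)        closesocket(s)
#define sread(s, b, n)   recv((s), (char *)(b), (int)(n), 0)
#define swrite(s, b, n)  send((s), (const char *)(b), (int)(n), 0)
#else
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
typedef int sock_t;                        /* hypothetical typedef */
#define sclose(s)        close(s)
#define sread(s, b, n)   recv((s), (b), (n), 0)
#define swrite(s, b, n)  send((s), (b), (n), 0)
#endif

/* Platform-neutral code: writes len bytes to 'out', then reads whatever
   has arrived on 'in'. Returns the number of bytes read, or -1 if the
   write was short or failed. */
static long send_and_receive(sock_t out, sock_t in, const char *msg,
                             size_t len, char *buf, size_t bufsize)
{
  if((long)swrite(out, msg, len) != (long)len)
    return -1;
  return (long)sread(in, buf, bufsize);
}
```

Everything below the `#endif` reads the same on both platforms, which is the point of the macros.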
! 159:
! 160: Inside the source code, we make an effort to avoid `#ifdef [Your OS]`. All
! 161: conditionals that deal with features *should* instead be in the format
! 162: `#ifdef HAVE_THAT_WEIRD_FUNCTION`. Since Windows can't run configure scripts,
! 163: we maintain a `config-win32.h` file in the lib directory that is supposed to
! 164: look exactly like the `curl_config.h` file a configure run would have
! 165: produced on a Windows machine!
! 166:
! 167: Generally speaking: always remember that this will be compiled on dozens of
! 168: operating systems. Don't walk on the edge!
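
A minimal sketch of the feature-test style follows. It assumes a configure-style check defines `HAVE_STRCASECMP` when the function exists; the helper name `names_equal()` is hypothetical and not part of curl.

```c
#include <ctype.h>
#ifdef HAVE_STRCASECMP
#include <strings.h>
#endif

/* Returns 1 if the two strings are equal when case is ignored, else 0.
   The test is on the feature, never on the operating system name. */
static int names_equal(const char *a, const char *b)
{
#ifdef HAVE_STRCASECMP
  return strcasecmp(a, b) == 0;
#else
  /* portable fallback for platforms without strcasecmp() */
  while(*a && *b) {
    if(tolower((unsigned char)*a) != tolower((unsigned char)*b))
      return 0;
    a++;
    b++;
  }
  return *a == *b; /* equal only if both strings ended together */
#endif
}
```

A new platform then only needs the right set of `HAVE_*` defines; the calling code stays untouched.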
! 169:
! 170: <a name="Library"></a>
! 171: Library
! 172: =======
! 173:
! 174: (See [Structs in libcurl](#structs) for the separate section describing all
! 175: major internal structs and their purposes.)
! 176:
! 177: There are plenty of entry points to the library, namely each publicly defined
! 178: function that libcurl offers to applications. All of those functions are
! 179: rather small and easy-to-follow. All the ones prefixed with `curl_easy` are
! 180: put in the `lib/easy.c` file.
! 181:
! 182: `curl_global_init()` and `curl_global_cleanup()` should be called by the
! 183: application to initialize and clean up global stuff in the library. As of
! 184: today, it can handle the global SSL initing if SSL is enabled and it can init
! 185: the socket layer on Windows machines. libcurl itself has no "global" scope.
! 186:
! 187: All printf()-style functions use the supplied clones in `lib/mprintf.c`. This
! 188: makes sure we stay absolutely platform independent.
! 189:
! 190: [`curl_easy_init()`][2] allocates an internal struct and makes some
! 191: initializations. The returned handle does not reveal internals. This is the
! 192: `Curl_easy` struct which works as an "anchor" struct for all `curl_easy`
! 193: functions. All connections performed will get connect-specific data allocated
! 194: that should be used for things related to particular connections/requests.
! 195:
! 196: [`curl_easy_setopt()`][1] takes three arguments, where the option stuff must
! 197: be passed in pairs: the parameter-ID and the parameter-value. The list of
! 198: options is documented in the man page. This function mainly sets things in
! 199: the `Curl_easy` struct.
! 200:
! 201: `curl_easy_perform()` is just a wrapper function that makes use of the multi
! 202: API. It basically calls `curl_multi_init()`, `curl_multi_add_handle()`,
! 203: `curl_multi_wait()`, and `curl_multi_perform()` until the transfer is done
! 204: and then returns.
! 205:
! 206: Some of the most important key functions in `url.c` are called from
! 207: `multi.c` when certain key steps are to be made in the transfer operation.
! 208:
! 209: <a name="Curl_connect"></a>
! 210: Curl_connect()
! 211: --------------
! 212:
! 213: Analyzes the URL, separates the different components and connects to the
! 214: remote host. This may involve using a proxy and/or using SSL. The
! 215: `Curl_resolv()` function in `lib/hostip.c` is used for looking up host
! 216: names (it does then use the proper underlying method, which may vary
! 217: between platforms and builds).
! 218:
! 219: When `Curl_connect` is done, we are connected to the remote site. Then it
! 220: is time to tell the server to get a document/file. `multi_do()` arranges
! 221: this.
! 222:
! 223: This function makes sure there's an allocated and initiated `connectdata`
! 224: struct that is used for this particular connection only (although there may
! 225: be several requests performed on the same connect). A bunch of things are
! 226: inited/inherited from the `Curl_easy` struct.
! 227:
! 228: <a name="multi_do"></a>
! 229: multi_do()
! 230: ---------
! 231:
! 232: `multi_do()` makes sure the proper protocol-specific function is called.
! 233: The functions are named after the protocols they handle.
! 234:
! 235: The protocol-specific functions of course deal with protocol-specific
! 236: negotiations and setup. They have access to the `Curl_sendf()` (from
! 237: `lib/sendf.c`) function to send printf-style formatted data to the remote
! 238: host and when they're ready to make the actual file transfer they call the
! 239: `Curl_setup_transfer()` function (in `lib/transfer.c`) to set up the
! 240: transfer and return.
! 241:
! 242: If this DO function fails and the connection is being re-used, libcurl will
! 243: then close this connection, set up a new connection and re-issue the DO
! 244: request on that. This is because there is no way to be perfectly sure that
! 245: we have discovered a dead connection before the DO function and thus we
! 246: might wrongly be re-using a connection that was closed by the remote peer.
! 247:
! 248: <a name="Curl_readwrite"></a>
! 249: Curl_readwrite()
! 250: ----------------
! 251:
! 252: Called during the transfer of the actual protocol payload.
! 253:
! 254: During transfer, the progress functions in `lib/progress.c` are called at
! 255: frequent intervals (or at the user's choice, a specified callback might get
! 256: called). The speedcheck functions in `lib/speedcheck.c` are also used to
! 257: verify that the transfer is as fast as required.
! 258:
! 259: <a name="multi_done"></a>
! 260: multi_done()
! 261: -----------
! 262:
! 263: Called after a transfer is done. This function takes care of everything
! 264: that has to be done after a transfer. This function attempts to leave
! 265: matters in a state so that `multi_do()` should be possible to call again on
! 266: the same connection (in a persistent connection case). It might also soon
! 267: be closed with `Curl_disconnect()`.
! 268:
! 269: <a name="Curl_disconnect"></a>
! 270: Curl_disconnect()
! 271: -----------------
! 272:
! 273: When doing normal connections and transfers, no one ever tries to close any
! 274: connections so this is not normally called when `curl_easy_perform()` is
! 275: used. This function is only used when we are certain that no more transfers
! 276: are going to be made on the connection. A connection can also be closed by
! 277: force, or this function can be called to make sure that libcurl doesn't keep
! 278: too many connections alive at the same time.
! 279:
! 280: This function cleans up all resources that are associated with a single
! 281: connection.
! 282:
! 283: <a name="http"></a>
! 284: HTTP(S)
! 285: =======
! 286:
! 287: HTTP offers a lot and is the protocol in curl that uses the most lines of
! 288: code. There is a special file `lib/formdata.c` that offers all the
! 289: multipart post functions.
! 290:
! 291: Base64 functions for user+password stuff (and more) are in `lib/base64.c`
! 292: and all functions for parsing and sending cookies are found in
! 293: `lib/cookie.c`.
! 294:
! 295: HTTPS uses in almost every case the same procedure as HTTP, with only two
! 296: exceptions: the connect procedure is different and the function used to read
! 297: or write from the socket is different, although the latter fact is hidden in
! 298: the source by the use of `Curl_read()` for reading and `Curl_write()` for
! 299: writing data to the remote server.
! 300:
! 301: `http_chunks.c` contains functions that understand HTTP 1.1 chunked transfer
! 302: encoding.
! 303:
! 304: An interesting detail with the HTTP(S) request is the `Curl_add_buffer()`
! 305: series of functions we use. They append data to one single buffer, and when
! 306: the building is finished the entire request is sent off in one single write.
! 307: This is done this way to overcome problems with flawed firewalls and lame
! 308: servers.
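
The idea of building the whole request in one buffer can be sketched as follows. The `reqbuf` struct and both functions are hypothetical stand-ins written for this example; they do not mirror the real `Curl_add_buffer()` signatures.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; data is always NUL terminated. */
struct reqbuf {
  char *data;
  size_t len;
  size_t cap;
};

/* Appends string s. Returns 0 on success, -1 if out of memory. */
static int reqbuf_add(struct reqbuf *b, const char *s)
{
  size_t n = strlen(s);
  if(b->len + n + 1 > b->cap) {
    size_t newcap = b->cap ? b->cap : 64;
    char *p;
    while(newcap < b->len + n + 1)
      newcap *= 2;
    p = realloc(b->data, newcap);
    if(!p)
      return -1;
    b->data = p;
    b->cap = newcap;
  }
  memcpy(b->data + b->len, s, n + 1); /* copies the terminating NUL too */
  b->len += n;
  return 0;
}

/* Assembles a complete, minimal GET request in the buffer. */
static int build_get_request(struct reqbuf *b, const char *host,
                             const char *path)
{
  if(reqbuf_add(b, "GET ") || reqbuf_add(b, path) ||
     reqbuf_add(b, " HTTP/1.1\r\nHost: ") || reqbuf_add(b, host) ||
     reqbuf_add(b, "\r\n\r\n"))
    return -1;
  return 0;
}
```

Once the request is complete, the caller hands `b->data` and `b->len` to a single write call instead of writing each header separately.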
! 309:
! 310: <a name="ftp"></a>
! 311: FTP
! 312: ===
! 313:
! 314: The `Curl_if2ip()` function can be used for getting the IP number of a
! 315: specified network interface, and it resides in `lib/if2ip.c`.
! 316:
! 317: `Curl_ftpsendf()` is used for sending FTP commands to the remote server. It
! 318: was made a separate function to prevent us programmers from forgetting that
! 319: they must be CRLF terminated. They must also be sent in one single `write()`
! 320: to make firewalls and similar happy.
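
A simplified sketch of why a dedicated function helps: the CRLF is appended in exactly one place, and the result is one contiguous buffer that can go out in a single write. The function name and signature below are hypothetical and are not those of `Curl_ftpsendf()`; the sketch also uses C99 `vsnprintf()` for brevity.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Formats an FTP command into 'out' and terminates it with CRLF.
   Returns the number of bytes to send, or -1 if 'out' is too small. */
static int ftp_format_command(char *out, size_t outsize,
                              const char *fmt, ...)
{
  va_list ap;
  int n;

  va_start(ap, fmt);
  n = vsnprintf(out, outsize, fmt, ap);
  va_end(ap);

  /* need room for the command plus CR, LF and the terminating NUL */
  if(n < 0 || (size_t)n + 3 > outsize)
    return -1;

  out[n] = '\r';
  out[n + 1] = '\n';
  out[n + 2] = '\0';
  return n + 2;
}
```

The caller passes the returned length and the buffer to one write call, so the command never reaches the wire in pieces.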
! 321:
! 322: <a name="kerberos"></a>
! 323: Kerberos
! 324: ========
! 325:
! 326: Kerberos support is mainly in `lib/krb5.c` and `lib/security.c` but also
! 327: `curl_sasl_sspi.c` and `curl_sasl_gssapi.c` for the email protocols and
! 328: `socks_gssapi.c` and `socks_sspi.c` for SOCKS5 proxy specifics.
! 329:
! 330: <a name="telnet"></a>
! 331: TELNET
! 332: ======
! 333:
! 334: Telnet is implemented in `lib/telnet.c`.
! 335:
! 336: <a name="file"></a>
! 337: FILE
! 338: ====
! 339:
! 340: The `file://` protocol is dealt with in `lib/file.c`.
! 341:
! 342: <a name="smb"></a>
! 343: SMB
! 344: ===
! 345:
! 346: The `smb://` protocol is dealt with in `lib/smb.c`.
! 347:
! 348: <a name="ldap"></a>
! 349: LDAP
! 350: ====
! 351:
! 352: Everything LDAP is in `lib/ldap.c` and `lib/openldap.c`.
! 353:
! 354: <a name="email"></a>
! 355: E-mail
! 356: ======
! 357:
! 358: The e-mail related source code is in `lib/imap.c`, `lib/pop3.c` and
! 359: `lib/smtp.c`.
! 360:
! 361: <a name="general"></a>
! 362: General
! 363: =======
! 364:
! 365: URL encoding and decoding, called escaping and unescaping in the source code,
! 366: is found in `lib/escape.c`.
! 367:
! 368: While transferring data in `Transfer()` a few functions might get used.
! 369: `curl_getdate()` in `lib/parsedate.c` is for HTTP date comparisons (and
! 370: more).
! 371:
! 372: `lib/getenv.c` offers `curl_getenv()` which is for reading environment
! 373: variables in a neat platform independent way. That's used in the client, but
! 374: also in `lib/url.c` when checking the proxy environment variables. Note that
! 375: contrary to the normal unix `getenv()`, this returns an allocated buffer that
! 376: must be `free()`ed after use.
! 377:
! 378: `lib/netrc.c` holds the `.netrc` parser.
! 379:
! 380: `lib/timeval.c` features replacement functions for systems that don't have
! 381: `gettimeofday()` and a few support functions for timeval conversions.
! 382:
! 383: A function named `curl_version()` that returns the full curl version string
! 384: is found in `lib/version.c`.
! 385:
! 386: <a name="persistent"></a>
! 387: Persistent Connections
! 388: ======================
! 389:
! 390: The persistent connection support in libcurl requires some considerations on
! 391: how to do things inside of the library.
! 392:
! 393: - The `Curl_easy` struct returned in the [`curl_easy_init()`][2] call
! 394: must never hold connection-oriented data. It is meant to hold the root data
! 395: as well as all the options etc that the library-user may choose.
! 396:
! 397: - The `Curl_easy` struct holds the "connection cache" (an array of
! 398: pointers to `connectdata` structs).
! 399:
! 400: - This enables the 'curl handle' to be reused on subsequent transfers.
! 401:
! 402: - When libcurl is told to perform a transfer, it first checks for an already
! 403: existing connection in the cache that we can use. Otherwise it creates a
! 404: new one and adds that to the cache. If the cache is full already when a new
! 405: connection is added, it will first close the oldest unused one.
! 406:
! 407: - When the transfer operation is complete, the connection is left
! 408: open. Particular options may tell libcurl not to, and protocols may signal
! 409: closure on connections and then they won't be kept open, of course.
! 410:
! 411: - When `curl_easy_cleanup()` is called, we close all still opened connections,
! 412: unless of course the multi interface "owns" the connections.
! 413:
! 414: The curl handle must be re-used in order for the persistent connections to
! 415: work.
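
The cache behaviour listed above (reuse an idle connection, otherwise create one, and close the oldest unused one when full) can be sketched as below. All names, the fixed array and the logical clock are assumptions made for this example; libcurl's real connection cache is organized differently.

```c
#include <string.h>

#define CACHE_SIZE 2 /* tiny on purpose, so eviction is easy to see */

struct conn {
  char host[64];
  int open;                /* slot holds a live connection */
  int in_use;              /* a transfer is using it right now */
  unsigned long last_used; /* logical time of the last release */
};

struct conn_cache {
  struct conn slots[CACHE_SIZE];
  unsigned long clock;
};

/* Returns a connection for host, or NULL if every slot is busy. */
static struct conn *cache_get(struct conn_cache *c, const char *host)
{
  struct conn *victim = NULL;
  int i;

  /* 1. reuse an idle connection to the same host */
  for(i = 0; i < CACHE_SIZE; i++) {
    struct conn *k = &c->slots[i];
    if(k->open && !k->in_use && !strcmp(k->host, host)) {
      k->in_use = 1;
      return k;
    }
  }

  /* 2. take a free slot, else pick the oldest idle connection */
  for(i = 0; i < CACHE_SIZE; i++) {
    struct conn *k = &c->slots[i];
    if(!k->open) {
      victim = k;
      break;
    }
    if(!k->in_use && (!victim || k->last_used < victim->last_used))
      victim = k;
  }
  if(!victim)
    return NULL;

  /* "close" whatever was there and set up the new connection */
  memset(victim, 0, sizeof(*victim));
  strncpy(victim->host, host, sizeof(victim->host) - 1);
  victim->open = 1;
  victim->in_use = 1;
  return victim;
}

/* Called when a transfer completes: the connection stays open. */
static void cache_release(struct conn_cache *c, struct conn *k)
{
  k->in_use = 0;
  k->last_used = ++c->clock;
}
```

Because the cache belongs to the handle, throwing the handle away between transfers also throws away every connection that could have been reused.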
! 416:
! 417: <a name="multi"></a>
! 418: multi interface/non-blocking
! 419: ============================
! 420:
! 421: The multi interface is a non-blocking interface to the library. To make that
! 422: interface work as well as possible, no low-level functions within libcurl
! 423: must be written to work in a blocking manner. (There are still a few spots
! 424: violating this rule.)
! 425:
! 426: One of the primary reasons we introduced c-ares support was to allow the name
! 427: resolve phase to be perfectly non-blocking as well.
! 428:
! 429: The FTP and the SFTP/SCP protocols are examples of how we adapt and adjust
! 430: the code to allow non-blocking operations even on multi-stage command-
! 431: response protocols. They are built around state machines that return when
! 432: they would otherwise block waiting for data. The DICT, LDAP and TELNET
! 433: protocols are crappy examples and they are subject to rewrite in the future
! 434: to better fit the libcurl protocol family.
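
A toy version of such a state machine is sketched below. The states and the `login_step()` function are invented for this example and model a two-command login exchange; the real FTP state machine has many more states and does actual I/O.

```c
/* Hypothetical states for a two-step command/response login. */
enum login_state {
  ST_SEND_USER,
  ST_WAIT_USER_REPLY,
  ST_SEND_PASS,
  ST_WAIT_PASS_REPLY,
  ST_DONE
};

struct login {
  enum login_state st;
};

/* Advances as far as possible without blocking. 'reply_available' says
   whether one server reply is ready to be consumed by this call.
   Returns 1 when the login is complete, 0 when it would block. */
static int login_step(struct login *l, int reply_available)
{
  for(;;) {
    switch(l->st) {
    case ST_SEND_USER:
      l->st = ST_WAIT_USER_REPLY; /* command sent, now await the reply */
      break;
    case ST_WAIT_USER_REPLY:
      if(!reply_available)
        return 0;                 /* would block: hand control back */
      reply_available = 0;        /* consume the reply */
      l->st = ST_SEND_PASS;
      break;
    case ST_SEND_PASS:
      l->st = ST_WAIT_PASS_REPLY;
      break;
    case ST_WAIT_PASS_REPLY:
      if(!reply_available)
        return 0;
      reply_available = 0;
      l->st = ST_DONE;
      break;
    case ST_DONE:
      return 1;
    }
  }
}
```

The caller invokes `login_step()` again whenever the socket becomes readable, so the function never has to sit and wait for the server.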
! 435:
! 436: <a name="ssl"></a>
! 437: SSL libraries
! 438: =============
! 439:
! 440: Originally libcurl supported SSLeay for SSL/TLS transports. That support was
! 441: then extended to its successor OpenSSL, and has since been extended to several
! 442: other SSL/TLS libraries. We expect and hope to further extend the support
! 443: in future libcurl versions.
! 444:
! 445: To deal with this internally in the best way possible, we have a generic SSL
! 446: function API as provided by the `vtls/vtls.[ch]` system, and they are the only
! 447: SSL functions we must use from within libcurl. vtls is then crafted to use
! 448: the appropriate lower-level function calls of whatever SSL library is in
! 449: use, for example `vtls/openssl.[ch]` for the OpenSSL library.
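
One common way to build such a generic layer is a table of function pointers per backend. The sketch below shows that technique under made-up names; the `tls_backend` struct, the "fake" backend and the `tls_*()` wrappers are assumptions for illustration and do not reproduce the actual vtls structs or functions.

```c
#include <stddef.h>

/* Hypothetical backend description: one instance per SSL library. */
struct tls_backend {
  const char *name;
  int (*do_connect)(int sockfd);
  long (*do_send)(int sockfd, const void *buf, size_t len);
};

/* A stand-in backend that pretends every operation succeeds. */
static int fake_connect(int sockfd)
{
  (void)sockfd;
  return 0;
}

static long fake_send(int sockfd, const void *buf, size_t len)
{
  (void)sockfd;
  (void)buf;
  return (long)len;
}

static const struct tls_backend fake_backend = {
  "fake", fake_connect, fake_send
};

/* Selected once, typically at build time or during initialization. */
static const struct tls_backend *active_backend = &fake_backend;

/* The generic layer: the only functions the rest of the code calls. */
static int tls_connect(int sockfd)
{
  return active_backend->do_connect(sockfd);
}

static long tls_send(int sockfd, const void *buf, size_t len)
{
  return active_backend->do_send(sockfd, buf, len);
}

static const char *tls_backend_name(void)
{
  return active_backend->name;
}
```

Supporting another library then means adding one more table, with no changes to the callers of the generic functions.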
! 450:
! 451: <a name="symbols"></a>
! 452: Library Symbols
! 453: ===============
! 454:
! 455: All symbols used internally in libcurl must use a `Curl_` prefix if they're
! 456: used in more than a single file. Single-file symbols must be made static.
! 457: Public ("exported") symbols must use a `curl_` prefix. (There are exceptions,
! 458: but they are to be changed to follow this pattern in future versions.) Public
! 459: API functions are marked with `CURL_EXTERN` in the public header files so
! 460: that all others can be hidden on platforms where this is possible.
! 461:
! 462: <a name="returncodes"></a>
! 463: Return Codes and Informationals
! 464: ===============================
! 465:
! 466: I've made things simple. Almost every function in libcurl returns a CURLcode,
! 467: that must be `CURLE_OK` if everything is OK or otherwise a suitable error
! 468: code as the `curl/curl.h` include file defines. The very spot that detects an
! 469: error must use the `Curl_failf()` function to set the human-readable error
! 470: description.
! 471:
! 472: In aiding the user to understand what's happening and to debug curl usage, we
! 473: must supply a fair number of informational messages by using the
! 474: `Curl_infof()` function. Those messages are only displayed when the user
! 475: explicitly asks for them. They are best used when revealing information that
! 476: isn't otherwise obvious.
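
The convention can be sketched with stand-in types. Everything named `demo_*` below is hypothetical, as are `parse_port()` and `setup_connection()`; in libcurl the return type is `CURLcode` and the message is set with `Curl_failf()`.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for CURLcode and the easy handle. */
typedef enum { DEMO_OK = 0, DEMO_BAD_PORT } demo_code;

struct demo_handle {
  char errbuf[128];
};

/* Stores a human-readable description of the error in the handle. */
static void demo_failf(struct demo_handle *h, const char *fmt, ...)
{
  va_list ap;
  va_start(ap, fmt);
  vsnprintf(h->errbuf, sizeof(h->errbuf), fmt, ap);
  va_end(ap);
}

/* The spot that detects the error is the one that describes it. */
static demo_code parse_port(struct demo_handle *h, const char *str,
                            int *port)
{
  char *end;
  long v = strtol(str, &end, 10);
  if(end == str || *end != '\0' || v < 1 || v > 65535) {
    demo_failf(h, "Invalid port number: %s", str);
    return DEMO_BAD_PORT;
  }
  *port = (int)v;
  return DEMO_OK;
}

/* Callers only pass the code upwards; they do not re-describe it. */
static demo_code setup_connection(struct demo_handle *h,
                                  const char *portstr, int *port)
{
  demo_code result = parse_port(h, portstr, port);
  if(result)
    return result;
  return DEMO_OK;
}
```

Keeping the message at the detection point means it can include details, such as the offending string, that callers higher up no longer have.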
! 477:
! 478: <a name="abi"></a>
! 479: API/ABI
! 480: =======
! 481:
! 482: We make an effort to not export or show internals or how internals work, as
! 483: that makes it easier to keep a solid API/ABI over time. See docs/libcurl/ABI
! 484: for our promise to users.
! 485:
! 486: <a name="client"></a>
! 487: Client
! 488: ======
! 489:
! 490: `main()` resides in `src/tool_main.c`.
! 491:
! 492: `src/tool_hugehelp.c` is automatically generated by the `mkhelp.pl` perl
! 493: script to display the complete "manual" and the `src/tool_urlglob.c` file
! 494: holds the functions used for the URL-"globbing" support. Globbing in the
! 495: sense that the `{}` and `[]` expansion stuff is there.
! 496:
! 497: The client mostly sets up its `config` struct properly, then
! 498: it calls the `curl_easy_*()` functions of the library and when it gets back
! 499: control after the `curl_easy_perform()` it cleans up the library, checks
! 500: status and exits.
! 501:
! 502: When the operation is done, the `ourWriteOut()` function in `src/writeout.c`
! 503: may be called to report about the operation. That function is using the
! 504: `curl_easy_getinfo()` function to extract useful information from the curl
! 505: session.
! 506:
! 507: It may loop and do all this several times if many URLs were specified on the
! 508: command line or config file.
! 509:
! 510: <a name="memorydebug"></a>
! 511: Memory Debugging
! 512: ================
! 513:
! 514: The file `lib/memdebug.c` contains debug-versions of a few functions.
! 515: Functions such as `malloc()`, `free()`, `fopen()`, `fclose()`, etc that
! 516: somehow deal with resources that might give us problems if we "leak" them.
! 517: The functions in the memdebug system do nothing fancy, they do their normal
! 518: function and then log information about what they just did. The logged data
! 519: can then be analyzed after a complete session.
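
In spirit, such wrappers look like the sketch below. The function names, the macros, the counter and the log line format are all invented for this example; the real functions live in `lib/memdebug.c` and write their own format.

```c
#include <stdio.h>
#include <stdlib.h>

static FILE *dbg_log;        /* trace destination; NULL disables logging */
static long dbg_outstanding; /* allocations that have not been freed yet */

/* Does the normal malloc(), then logs what it just did. */
static void *dbg_malloc(size_t size, int line, const char *source)
{
  void *p = malloc(size);
  if(p)
    dbg_outstanding++;
  if(dbg_log)
    fprintf(dbg_log, "MEM %s:%d malloc(%lu) = %p\n",
            source, line, (unsigned long)size, p);
  return p;
}

/* Logs the free() together with its call site, then does it. */
static void dbg_free(void *p, int line, const char *source)
{
  if(p)
    dbg_outstanding--;
  if(dbg_log)
    fprintf(dbg_log, "MEM %s:%d free(%p)\n", source, line, p);
  free(p);
}

/* Call sites use macros so file and line are recorded automatically. */
#define demo_malloc(size) dbg_malloc(size, __LINE__, __FILE__)
#define demo_free(ptr)    dbg_free(ptr, __LINE__, __FILE__)
```

A script can later pair each logged allocation with its free and report the call sites of those left over.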
! 520:
! 521: `memanalyze.pl` is the perl script present in `tests/` that analyzes a log
! 522: file generated by the memory tracking system. It detects if resources are
! 523: allocated but never freed and other kinds of errors related to resource
! 524: management.
! 525:
! 526: Internally, definition of preprocessor symbol `DEBUGBUILD` restricts code
! 527: which is only compiled for debug enabled builds. And symbol `CURLDEBUG` is
! 528: used to differentiate code which is _only_ used for memory
! 529: tracking/debugging.
! 530:
! 531: Use `-DCURLDEBUG` when compiling to enable memory debugging. This is also
! 532: switched on by running configure with `--enable-curldebug`. Use
! 533: `-DDEBUGBUILD` when compiling to enable a debug build or run configure with
! 534: `--enable-debug`.
! 535:
! 536: `curl --version` will list 'Debug' feature for debug enabled builds, and
! 537: will list 'TrackMemory' feature for curl debug memory tracking capable
! 538: builds. These features are independent and can be controlled when running
! 539: the configure script. When `--enable-debug` is given both features will be
! 540: enabled, unless some restriction prevents memory tracking from being used.
! 541:
! 542: <a name="test"></a>
! 543: Test Suite
! 544: ==========
! 545:
! 546: The test suite is placed in its own subdirectory directly off the root in the
! 547: curl archive tree, and it contains a bunch of scripts and a lot of test case
! 548: data.
! 549:
! 550: The main test script is `runtests.pl` that will invoke test servers like
! 551: `httpserver.pl` and `ftpserver.pl` before all the test cases are performed.
! 552: The test suite currently only runs on Unix-like platforms.
! 553:
! 554: You'll find a description of the test suite in the `tests/README` file, and
! 555: the test case data file format is described in the `tests/FILEFORMAT` file.
! 556:
! 557: The test suite automatically detects if curl was built with the memory
! 558: debugging enabled, and if it was, it will detect memory leaks, too.
! 559:
! 560: <a name="asyncdns"></a>
! 561: Asynchronous name resolves
! 562: ==========================
! 563:
! 564: libcurl can be built to do name resolves asynchronously, using either the
! 565: normal resolver in a threaded manner or by using c-ares.
! 566:
! 567: <a name="cares"></a>
! 568: [c-ares][3]
! 569: ------
! 570:
! 571: ### Build libcurl to use c-ares
! 572:
! 573: 1. `./configure --enable-ares=/path/to/ares/install`
! 574: 2. `make`
! 575:
! 576: ### c-ares on win32
! 577:
! 578: First I compiled c-ares. I changed the default C runtime library to be the
! 579: single-threaded rather than the multi-threaded (this seems to be required to
! 580: prevent linking errors later on). Then I simply built the areslib project
! 581: (the other projects adig/ahost seem to fail under MSVC).
! 582:
! 583: Next was libcurl. I opened `lib/config-win32.h` and I added a:
! 584: `#define USE_ARES 1`
! 585:
! 586: Next thing I did was I added the path for the ares includes to the include
! 587: path, and the libares.lib to the libraries.
! 588:
! 589: Lastly, I also changed libcurl to be single-threaded rather than
! 590: multi-threaded, again this was to prevent some duplicate symbol errors. I'm
! 591: not sure why I needed to change everything to single-threaded, but when I
! 592: didn't I got redefinition errors for several CRT functions (`malloc()`,
! 593: `stricmp()`, etc.)
! 594:
! 595: <a name="curl_off_t"></a>
! 596: `curl_off_t`
! 597: ==========
! 598:
! 599: `curl_off_t` is a data type provided by the external libcurl include
! 600: headers. It is the type meant to be used for the [`curl_easy_setopt()`][1]
! 601: options that end with LARGE. The type is 64 bits wide on most modern
! 602: platforms.
! 603:
! 604: <a name="curlx"></a>
! 605: curlx
! 606: =====
! 607:
! 608: The libcurl source code offers a few functions by source only. They are not
! 609: part of the official libcurl API, but the source files might be useful for
! 610: others so apps can optionally compile/build with these sources to gain
! 611: additional functions.
! 612:
! 613: We provide them through a single header file for easy access for apps:
! 614: `curlx.h`
! 615:
! 616: `curlx_strtoofft()`
! 617: -------------------
! 618: A macro that converts a string containing a number to a `curl_off_t` number.
! 619: This might use the `curlx_strtoll()` function which is provided as source
! 620: code in strtoofft.c. Note that the function is only provided if no
! 621: `strtoll()` (or equivalent) function exists on your platform. If `curl_off_t`
! 622: is only a 32-bit number on your platform, this macro uses `strtol()`.
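
The size-dependent choice can be sketched with a stand-in type. `DEMO_SIZEOF_OFF_T`, `demo_off_t`, `demo_strtoofft()` and `parse_offset()` are hypothetical names for this example and are not the real definitions from the curl headers.

```c
#include <errno.h>
#include <stdlib.h>

/* Assumed to be provided by the build configuration in real life. */
#define DEMO_SIZEOF_OFF_T 8

#if DEMO_SIZEOF_OFF_T > 4
typedef long long demo_off_t;
#define demo_strtoofft(s, e, b) strtoll(s, e, b)
#else
typedef long demo_off_t;
#define demo_strtoofft(s, e, b) strtol(s, e, b)
#endif

/* Converts a decimal string to demo_off_t.
   Returns 0 on success, -1 if no digits were found or on overflow. */
static int parse_offset(const char *str, demo_off_t *out)
{
  char *end;
  demo_off_t v;

  errno = 0;
  v = demo_strtoofft(str, &end, 10);
  if(end == str || errno == ERANGE)
    return -1;
  *out = v;
  return 0;
}
```

Code that parses sizes or offsets calls the macro and never needs to know which underlying conversion function the platform offers.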
! 623:
! 624: Future
! 625: ------
! 626:
! 627: Several functions will be removed from the public `curl_` name space in a
! 628: future libcurl release. They will then only become available as `curlx_`
! 629: functions instead. To make the transition easier, we already today provide
! 630: these functions with the `curlx_` prefix to allow sources to be built
! 631: properly with the new function names. The concerned functions are:
! 632:
! 633: - `curlx_getenv`
! 634: - `curlx_strequal`
! 635: - `curlx_strnequal`
! 636: - `curlx_mvsnprintf`
! 637: - `curlx_msnprintf`
! 638: - `curlx_maprintf`
! 639: - `curlx_mvaprintf`
! 640: - `curlx_msprintf`
! 641: - `curlx_mprintf`
! 642: - `curlx_mfprintf`
! 643: - `curlx_mvsprintf`
! 644: - `curlx_mvprintf`
! 645: - `curlx_mvfprintf`
! 646:
! 647: <a name="contentencoding"></a>
! 648: Content Encoding
! 649: ================
! 650:
! 651: ## About content encodings
! 652:
! 653: [HTTP/1.1][4] specifies that a client may request that a server encode its
! 654: response. This is usually used to compress a response using one (or more)
! 655: encodings from a set of commonly available compression techniques. These
! 656: schemes include `deflate` (the zlib algorithm), `gzip`, `br` (brotli) and
! 657: `compress`. A client requests that the server perform an encoding by including
! 658: an `Accept-Encoding` header in the request document. The value of the header
! 659: should be one of the recognized tokens `deflate`, ... (there's a way to
! 660: register new schemes/tokens, see sec 3.5 of the spec). A server MAY honor
! 661: the client's encoding request. When a response is encoded, the server
! 662: includes a `Content-Encoding` header in the response. The value of the
! 663: `Content-Encoding` header indicates which encodings were used to encode the
! 664: data, in the order in which they were applied.
! 665:
! 666: It's also possible for a client to attach priorities to different schemes so
! 667: that the server knows which it prefers. See sec 14.3 of RFC 2616 for more
! 668: information on the `Accept-Encoding` header. See sec
! 669: [3.1.2.2 of RFC 7231][15] for more information on the `Content-Encoding`
! 670: header.
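
Because the header lists encodings in the order they were applied, a receiver has to undo them in the opposite order. The sketch below only works out that order from a header value; the function name and the fixed-size token array are assumptions for this example, and no actual decompression is done.

```c
#include <ctype.h>
#include <string.h>

#define MAX_ENCODINGS 8
#define MAX_TOKEN 16

/* Splits a Content-Encoding value such as "gzip, br" into tokens and
   stores them in the order a receiver must decode them (last applied
   first). Returns the number of tokens, or -1 if there are too many
   tokens or one of them is too long. */
static int decode_order(const char *value, char out[][MAX_TOKEN], int max)
{
  const char *p = value;
  int count = 0;
  int i;

  while(*p) {
    size_t len = 0;
    /* skip separators and whitespace */
    while(*p == ',' || isspace((unsigned char)*p))
      p++;
    if(!*p)
      break;
    while(p[len] && p[len] != ',' && !isspace((unsigned char)p[len]))
      len++;
    if(count == max || len >= MAX_TOKEN)
      return -1;
    memcpy(out[count], p, len);
    out[count][len] = '\0';
    count++;
    p += len;
  }

  /* reverse in place: the last encoding applied is decoded first */
  for(i = 0; i < count / 2; i++) {
    char tmp[MAX_TOKEN];
    strcpy(tmp, out[i]);
    strcpy(out[i], out[count - 1 - i]);
    strcpy(out[count - 1 - i], tmp);
  }
  return count;
}
```

For a response sent with `Content-Encoding: gzip, br`, the brotli layer is removed first and the gzip layer second.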
! 671:
! 672: ## Supported content encodings
! 673:
! 674: The `deflate`, `gzip` and `br` content encodings are supported by libcurl.
! 675: Both regular and chunked transfers work fine. The zlib library is required
! 676: for the `deflate` and `gzip` encodings, while the brotli decoding library is
! 677: for the `br` encoding.
! 678:
! 679: ## The libcurl interface
! 680:
! 681: To cause libcurl to request a content encoding use:
! 682:
! 683: [`curl_easy_setopt`][1](curl, [`CURLOPT_ACCEPT_ENCODING`][5], string)
! 684:
! 685: where string is the intended value of the `Accept-Encoding` header.
! 686:
! 687: Currently, libcurl does support multiple encodings but only
! 688: understands how to process responses that use the `deflate`, `gzip` and/or
! 689: `br` content encodings, so the only values for [`CURLOPT_ACCEPT_ENCODING`][5]
! 690: that will work (besides `identity`, which does nothing) are `deflate`,
! 691: `gzip` and `br`. If a response is encoded using the `compress` or methods,
! 692: libcurl will return an error indicating that the response could
! 693: not be decoded. If `<string>` is NULL no `Accept-Encoding` header is
! 694: generated. If `<string>` is a zero-length string, then an `Accept-Encoding`
! 695: header containing all supported encodings will be generated.
! 696:
! 697: The [`CURLOPT_ACCEPT_ENCODING`][5] must be set to any non-NULL value for
! 698: content to be automatically decoded. If it is not set and the server still
! 699: sends encoded content (despite not having been asked), the data is returned
! 700: in its raw form and the `Content-Encoding` type is not checked.
! 701:
! 702: ## The curl interface
! 703:
! 704: Use the [`--compressed`][6] option with curl to cause it to ask servers to
! 705: compress responses using any format supported by curl.
! 706:
! 707: <a name="hostip"></a>
! 708: `hostip.c` explained
! 709: ====================
! 710:
! 711: The main compile-time defines to keep in mind when reading the `host*.c`
! 712: source files are these:
! 713:
! 714: ## `CURLRES_IPV6`
! 715:
! 716: this host has `getaddrinfo()` and family, and thus we use that. The host may
! 717: not be able to resolve IPv6, but we don't really have to take that into
! 718: account. Hosts that aren't IPv6-enabled have `CURLRES_IPV4` defined.
! 719:
! 720: ## `CURLRES_ARES`
! 721:
! 722: is defined if libcurl is built to use c-ares for asynchronous name
! 723: resolves. This can be Windows or \*nix.
! 724:
! 725: ## `CURLRES_THREADED`
! 726:
! 727: is defined if libcurl is built to use threading for asynchronous name
! 728: resolves. The name resolve will be done in a new thread, and the supported
! 729: asynch API will be the same as for ares-builds. This is the default under
! 730: (native) Windows.
! 731:
! 732: If any of the two previous are defined, `CURLRES_ASYNCH` is defined too. If
! 733: libcurl is not built to use an asynchronous resolver, `CURLRES_SYNCH` is
! 734: defined.
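
The relationship between these defines can be written out as preprocessor logic. This is a sketch of the rule stated above, not a copy of `hostip.h`; the two helper functions are hypothetical and exist only to make the outcome visible.

```c
/* Normally the build decides these; define one of them before this
   point to see the other branches:
     #define CURLRES_ARES
     #define CURLRES_THREADED
*/

#if defined(CURLRES_ARES) || defined(CURLRES_THREADED)
#define CURLRES_ASYNCH
#else
#define CURLRES_SYNCH
#endif

/* Hypothetical helper: names the resolver this build would use. */
static const char *resolver_kind(void)
{
#if defined(CURLRES_ARES)
  return "asynchronous (c-ares)";
#elif defined(CURLRES_THREADED)
  return "asynchronous (threaded)";
#else
  return "synchronous";
#endif
}

/* Hypothetical helper: 1 for any asynchronous resolver, else 0. */
static int resolver_is_async(void)
{
#ifdef CURLRES_ASYNCH
  return 1;
#else
  return 0;
#endif
}
```

Code that only cares whether resolving is asynchronous tests `CURLRES_ASYNCH` and does not need to know which backend provides it.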
! 735:
! 736: ## `host*.c` sources
! 737:
! 738: The `host*.c` source files are split up like this:
! 739:
! 740: - `hostip.c` - method-independent resolver functions and utility functions
! 741: - `hostasyn.c` - functions for asynchronous name resolves
! 742: - `hostsyn.c` - functions for synchronous name resolves
! 743: - `asyn-ares.c` - functions for asynchronous name resolves using c-ares
! 744: - `asyn-thread.c` - functions for asynchronous name resolves using threads
! 745: - `hostip4.c` - IPv4 specific functions
! 746: - `hostip6.c` - IPv6 specific functions
! 747:
! 748: The `hostip.h` is the single united header file for all this. It defines the
! 749: `CURLRES_*` defines based on the `config*.h` and `curl_setup.h` defines.
! 750:
! 751: <a name="memoryleak"></a>
! 752: Track Down Memory Leaks
! 753: =======================
! 754:
! 755: ## Single-threaded
! 756:
! 757: Please note that this memory leak system is not adjusted to work in more
! 758: than one thread. If you want/need to use it in a multi-threaded app, please
! 759: adjust accordingly.
! 760:
! 761: ## Build
! 762:
! 763: Rebuild libcurl with `-DCURLDEBUG` (usually, rerunning configure with
! 764: `--enable-debug` fixes this). `make clean` first, then `make` so that all
! 765: files are actually rebuilt properly. It will also make sense to build
! 766: libcurl with the debug option (usually `-g` to the compiler) so that
! 767: debugging it will be easier if you actually do find a leak in the library.
! 768:
! 769: This will create a library that has memory debugging enabled.
! 770:
! 771: ## Modify Your Application
! 772:
! 773: Add a line in your application code:
! 774:
! 775: `curl_dbg_memdebug("dump");`
! 776:
! 777: This will make the malloc debug system output a full trace of all resource
! 778: using functions to the given file name. Make sure you rebuild your program
! 779: and that you link with the same libcurl you built for this purpose as
! 780: described above.
! 781:
! 782: ## Run Your Application
! 783:
! 784: Run your program as usual. Watch the specified memory trace file grow.
! 785:
! 786: Make your program exit and use the proper libcurl cleanup functions etc., so
! 787: that all non-leaks are returned/freed properly.
! 788:
! 789: ## Analyze the Flow
! 790:
! 791: Use the `tests/memanalyze.pl` perl script to analyze the dump file:
! 792:
! 793:     tests/memanalyze.pl dump
! 794:
! 795: This now outputs a report on which resources were allocated but never
! 796: freed etc. This report is well suited for posting to the list!
! 797:
! 798: If this doesn't produce any output, no leak was detected in libcurl. Then
! 799: the leak is most likely to be in your code.
! 800:
! 801: <a name="multi_socket"></a>
! 802: `multi_socket`
! 803: ==============
! 804:
! 805: Implementation of the `curl_multi_socket` API
! 806:
! 807: The main ideas of this API are simply:
! 808:
! 809: 1. The application can use whatever event system it likes as it gets info
! 810: from libcurl about what file descriptors libcurl waits for what action
! 811: on. (The previous API returns `fd_sets` which is very
! 812: `select()`-centric).
! 813:
! 814: 2. When the application discovers action on a single socket, it calls
! 815: libcurl and informs that there was action on this particular socket and
! 816: libcurl can then act on that socket/transfer only and not care about
! 817: any other transfers. (The previous API always had to scan through all
! 818: the existing transfers.)
! 819:
! 820: The idea is that [`curl_multi_socket_action()`][7] calls a given callback
! 821: with information about what socket to wait for what action on, and the
! 822: callback only gets called if the status of that socket has changed.
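The "only call the callback when the status changed" idea can be sketched as follows. This is a simplified illustration, not libcurl's implementation: the `demo_*` names, the action codes and the per-socket state struct are all hypothetical.

```c
#include <assert.h>

/* Hypothetical action codes, loosely modelled on "wait for read/write". */
enum demo_action { DEMO_NONE = 0, DEMO_IN = 1, DEMO_OUT = 2, DEMO_INOUT = 3 };

/* Application-supplied callback: told which socket to wait on for what. */
typedef void (*demo_socket_cb)(int sockfd, enum demo_action what, void *userp);

struct demo_sockstate {
  int sockfd;
  enum demo_action last_reported; /* what the application was last told */
};

/* Invoke the callback only when the wanted action differs from what was
 * reported before. Returns 1 if the callback was called, 0 otherwise. */
int demo_report(struct demo_sockstate *s, enum demo_action wanted,
                demo_socket_cb cb, void *userp)
{
  if(s->last_reported == wanted)
    return 0;
  s->last_reported = wanted;
  cb(s->sockfd, wanted, userp);
  return 1;
}

/* Example callback that just counts how often it is invoked. */
void demo_count_cb(int sockfd, enum demo_action what, void *userp)
{
  (void)sockfd;
  (void)what;
  ++*(int *)userp;
}
```

Reporting `DEMO_IN` twice in a row for the same socket triggers the callback once; a later switch to `DEMO_INOUT` triggers it again.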
! 823:
! 824: We also added a timer callback that makes libcurl call the application when
! 825: the timeout value changes, and you set that with [`curl_multi_setopt()`][9]
! 826: and the [`CURLMOPT_TIMERFUNCTION`][10] option. To get this to work,
! 827: internally there's an added struct to each easy handle in which we store
! 828: an "expire time" (if any). The structs are then "splay sorted" so that we
! 829: can add and remove times from the linked list and yet somewhat swiftly
! 830: figure out both how long there is until the next nearest timer expires
! 831: and which timer (handle) we should take care of now. Of course, the upside
! 832: of all this is that we get a [`curl_multi_timeout()`][8] that should also
! 833: work with old-style applications that use [`curl_multi_perform()`][11].
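What the timer bookkeeping has to answer is "which handle expires first, and how long until then?". The sketch below answers that with a plain linear scan instead of the splay-sorted structure described above, purely to keep the example short. `struct demo_timer` and `demo_next_timer()` are hypothetical names, and the convention that an expire time of 0 means "no timer" is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-handle timer record; expire_ms == 0 means "no timer". */
struct demo_timer {
  long expire_ms; /* absolute time in milliseconds */
};

/* Return the index of the timer that expires first, or -1 if none is set.
 * Stores the remaining time (clamped at 0) in *timeout_ms when one exists. */
int demo_next_timer(const struct demo_timer *t, size_t n, long now_ms,
                    long *timeout_ms)
{
  int best = -1;
  size_t i;
  for(i = 0; i < n; i++) {
    if(t[i].expire_ms == 0)
      continue; /* this handle has no pending timer */
    if(best < 0 || t[i].expire_ms < t[best].expire_ms)
      best = (int)i;
  }
  if(best >= 0) {
    long left = t[best].expire_ms - now_ms;
    *timeout_ms = left > 0 ? left : 0;
  }
  return best;
}
```

A sorted structure gives the same answer without visiting every handle, which matters once many transfers are in flight.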
! 834:
! 835: We created an internal "socket to easy handles" hash table that given
! 836: a socket (file descriptor) returns the easy handle that waits for action on
! 837: that socket. This hash is made using the already existing hash code
! 838: (previously only used for the DNS cache).
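A socket-to-handle lookup of this kind might look like the sketch below. It is a minimal fixed-size table with linear probing and no removal, not libcurl's generic hash code. All `demo_*` names and the table size are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SLOTS 64 /* hypothetical fixed table size */

struct demo_easy { int id; }; /* stand-in for an easy handle */

struct demo_sockhash {
  int fd[DEMO_SLOTS];                 /* -1 marks an empty slot */
  struct demo_easy *easy[DEMO_SLOTS];
};

void demo_sh_init(struct demo_sockhash *h)
{
  size_t i;
  for(i = 0; i < DEMO_SLOTS; i++) {
    h->fd[i] = -1;
    h->easy[i] = NULL;
  }
}

/* Insert or update the handle for fd (fd >= 0). Returns 0 on success,
 * -1 if the table is full. Collisions are resolved by linear probing. */
int demo_sh_add(struct demo_sockhash *h, int fd, struct demo_easy *e)
{
  size_t i;
  size_t start = (size_t)fd % DEMO_SLOTS;
  for(i = 0; i < DEMO_SLOTS; i++) {
    size_t slot = (start + i) % DEMO_SLOTS;
    if(h->fd[slot] == -1 || h->fd[slot] == fd) {
      h->fd[slot] = fd;
      h->easy[slot] = e;
      return 0;
    }
  }
  return -1;
}

/* Return the handle waiting on fd, or NULL if fd is unknown. */
struct demo_easy *demo_sh_lookup(const struct demo_sockhash *h, int fd)
{
  size_t i;
  size_t start = (size_t)fd % DEMO_SLOTS;
  for(i = 0; i < DEMO_SLOTS; i++) {
    size_t slot = (start + i) % DEMO_SLOTS;
    if(h->fd[slot] == fd)
      return h->easy[slot];
    if(h->fd[slot] == -1)
      return NULL; /* an empty slot ends the probe sequence */
  }
  return NULL;
}
```

When the event system reports activity on a descriptor, a single lookup yields the transfer to act on, so no scan over all transfers is needed.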
! 839:
! 840: To make libcurl able to report plain sockets in the socket callback, we had
! 841: to re-organize the internals of the [`curl_multi_fdset()`][12] etc so that
! 842: the conversion from sockets to `fd_sets` for that function is only done in
! 843: the last step before the data is returned. I also had to extend c-ares to
! 844: get a function that can return plain sockets, as that library too returned
! 845: only `fd_sets` and that is no longer good enough. The changes done to c-ares
! 846: are available in c-ares 1.3.1 and later.
! 847:
! 848: <a name="structs"></a>
! 849: Structs in libcurl
! 850: ==================
! 851:
! 852: This section should cover 7.32.0 pretty accurately, but will make sense even
! 853: for older and later versions as things don't change drastically that often.
! 854:
! 855: <a name="Curl_easy"></a>
! 856: ## Curl_easy
! 857:
! 858: The `Curl_easy` struct is the one returned to the outside in the external API
! 859: as a `CURL *`. This is usually known as an easy handle in API documentation
! 860: and examples.
! 861:
! 862: Information and state that is related to the actual connection is in the
! 863: `connectdata` struct. When a transfer is about to be made, libcurl will
! 864: either create a new connection or re-use an existing one. The particular
! 865: connectdata that is used by this handle is pointed out by
! 866: `Curl_easy->easy_conn`.
! 867:
! 868: Data and information that regard this particular single transfer are put in
! 869: the `SingleRequest` sub-struct.
! 870:
! 871: When the `Curl_easy` struct is added to a multi handle, as it must be in
! 872: order to do any transfer, the `->multi` member will point to the `Curl_multi`
! 873: struct it belongs to. The `->prev` and `->next` members will then be used by
! 874: the multi code to keep a linked list of `Curl_easy` structs that are added to
! 875: that same multi handle. libcurl always uses multi so `->multi` *will* point
! 876: to a `Curl_multi` when a transfer is in progress.
! 877:
! 878: `->mstate` is the multi state of this particular `Curl_easy`. When
! 879: `multi_runsingle()` is called, it will act on this handle according to which
! 880: state it is in. The mstate is also what tells which sockets to return for a
! 881: specific `Curl_easy` when [`curl_multi_fdset()`][12] is called etc.
! 882:
! 883: The libcurl source code generally uses the name `data` for the variable that
! 884: points to the `Curl_easy`.
! 885:
! 886: When doing multiplexed HTTP/2 transfers, each `Curl_easy` is associated with
! 887: an individual stream, sharing the same connectdata struct. Multiplexing
! 888: makes it even more important to keep things associated with the right thing!
! 889:
! 890: <a name="connectdata"></a>
! 891: ## connectdata
! 892:
! 893: A general idea in libcurl is to keep connections around in a connection
! 894: "cache" after they have been used in case they will be used again and then
! 895: re-use an existing one instead of creating a new one, as that gives a
! 896: significant performance boost.
! 897:
! 898: Each `connectdata` identifies a single physical connection to a server. If
! 899: the connection can't be kept alive, the connection will be closed after use
! 900: and then this struct can be removed from the cache and freed.
! 901:
! 902: Thus, the same `Curl_easy` can be used multiple times and each time select
! 903: another `connectdata` struct to use for the connection. Keep this in mind,
! 904: as it is then important to consider if options or choices are based on the
! 905: connection or the `Curl_easy`.
! 906:
! 907: Functions in libcurl will assume that `connectdata->data` points to the
! 908: `Curl_easy` that uses this connection (for the moment).
! 909:
! 910: As a special complexity, some protocols supported by libcurl require a
! 911: special disconnect procedure that is more than just shutting down the
! 912: socket. It can involve sending one or more commands to the server before
! 913: doing so. Since connections are kept in the connection cache after use, the
! 914: original `Curl_easy` may no longer be around when the time comes to shut down
! 915: a particular connection. For this purpose, libcurl holds a special dummy
! 916: `closure_handle` `Curl_easy` in the `Curl_multi` struct to use when needed.
! 917:
! 918: FTP uses two TCP connections for a typical transfer but it keeps both in
! 919: this single struct and thus can be considered a single connection for most
! 920: internal concerns.
! 921:
! 922: The libcurl source code generally uses the name `conn` for the variable that
! 923: points to the connectdata.
! 924:
! 925: <a name="Curl_multi"></a>
! 926: ## Curl_multi
! 927:
! 928: Internally, the easy interface is implemented as a wrapper around multi
! 929: interface functions. This makes everything multi interface.
! 930:
! 931: `Curl_multi` is the multi handle struct exposed as `CURLM *` in external
! 932: APIs.
! 933:
! 934: This struct holds a list of `Curl_easy` structs that have been added to this
! 935: handle with [`curl_multi_add_handle()`][13]. The start of the list is
! 936: `->easyp` and `->num_easy` is a counter of added `Curl_easy`s.
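The list handling described here can be sketched as a small doubly linked list. The member names `easyp`, `num_easy`, `prev` and `next` follow the text. The `last` tail pointer and every `demo_*` name are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for an easy handle; only the list linkage is modelled. */
struct demo_node {
  struct demo_node *prev;
  struct demo_node *next;
};

struct demo_multi {
  struct demo_node *easyp; /* start of the list */
  struct demo_node *last;  /* hypothetical tail pointer for O(1) append */
  int num_easy;            /* number of added handles */
};

/* Append a handle to the end of the multi handle's list. */
void demo_add(struct demo_multi *m, struct demo_node *n)
{
  n->next = NULL;
  n->prev = m->last;
  if(m->last)
    m->last->next = n;
  else
    m->easyp = n;
  m->last = n;
  m->num_easy++;
}

/* Unlink a handle that was previously added with demo_add(). */
void demo_remove(struct demo_multi *m, struct demo_node *n)
{
  if(n->prev)
    n->prev->next = n->next;
  else
    m->easyp = n->next;
  if(n->next)
    n->next->prev = n->prev;
  else
    m->last = n->prev;
  n->prev = n->next = NULL;
  m->num_easy--;
}
```

Keeping both `prev` and `next` lets a handle be unlinked without walking the list from `easyp`.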
! 937:
! 938: `->msglist` is a linked list of messages to send back when
! 939: [`curl_multi_info_read()`][14] is called. Basically a node is added to that
! 940: list when an individual `Curl_easy`'s transfer has completed.
! 941:
! 942: `->hostcache` points to the name cache. It is a hash table for looking up
! 943: name to IP. The nodes have a limited life time in there and this cache is
! 944: meant to reduce the time for when the same name is wanted within a short
! 945: period of time.
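The "limited life time" behaviour can be illustrated with a lookup that ignores entries older than a maximum age. This sketch uses a plain array rather than a hash table, and `struct demo_dnsentry`, `demo_cache_lookup()` and the time units are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cache entry: a resolved name plus the time it was stored. */
struct demo_dnsentry {
  const char *name;
  const char *addr;
  long stored_at; /* seconds */
};

/* Return the cached address for name if it is younger than max_age seconds,
 * otherwise NULL so that the caller resolves the name again. */
const char *demo_cache_lookup(const struct demo_dnsentry *e, size_t n,
                              const char *name, long now, long max_age)
{
  size_t i;
  for(i = 0; i < n; i++) {
    if(strcmp(e[i].name, name) == 0)
      return (now - e[i].stored_at < max_age) ? e[i].addr : NULL;
  }
  return NULL;
}
```

An entry stored at time 100 with a maximum age of 60 is returned at time 130 but treated as missing at time 160.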
! 946:
! 947: `->timetree` points to a tree of `Curl_easy`s, sorted by the remaining time
! 948: until it should be checked - normally some sort of timeout. Each `Curl_easy`
! 949: has one node in the tree.
! 950:
! 951: `->sockhash` is a hash table that allows fast lookups from a socket
! 952: descriptor to the `Curl_easy` that uses that descriptor. This is necessary for
! 953: the `multi_socket` API.
! 954:
! 955: `->conn_cache` points to the connection cache. It keeps track of all
! 956: connections that are kept after use. The cache has a maximum size.
! 957:
! 958: `->closure_handle` is described in the `connectdata` section.
! 959:
! 960: The libcurl source code generally uses the name `multi` for the variable that
! 961: points to the `Curl_multi` struct.
! 962:
! 963: <a name="Curl_handler"></a>
! 964: ## Curl_handler
! 965:
! 966: Each unique protocol that is supported by libcurl needs to provide at least
! 967: one `Curl_handler` struct. It defines what the protocol is called and what
! 968: functions the main code should call to deal with protocol specific issues.
! 969: In general, there's a source file named `[protocol].c` in which there's a
! 970: `struct Curl_handler Curl_handler_[protocol]` declared. In `url.c` there's
! 971: then the main array with all individual `Curl_handler` structs pointed to
! 972: from a single array which is scanned through when a URL is given to libcurl
! 973: to work with.
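The "array of handlers scanned by scheme" pattern can be sketched like this. The struct is a heavily reduced stand-in that keeps only a scheme name and a default port. `struct demo_handler`, `demo_protocols` and `demo_find_handler()` are hypothetical names, not the ones used in `url.c`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Simplified stand-in for a protocol handler; only two fields are kept. */
struct demo_handler {
  const char *scheme; /* e.g. "HTTP" */
  int defport;        /* default port */
};

static const struct demo_handler demo_http = { "HTTP", 80 };
static const struct demo_handler demo_ftp = { "FTP", 21 };

/* NULL-terminated array of pointers to all known handlers. */
static const struct demo_handler *const demo_protocols[] = {
  &demo_http, &demo_ftp, NULL
};

/* Case-insensitive comparison of two ASCII strings; returns 1 if equal. */
static int demo_equal(const char *a, const char *b)
{
  while(*a && *b) {
    if(toupper((unsigned char)*a) != toupper((unsigned char)*b))
      return 0;
    a++;
    b++;
  }
  return *a == *b; /* equal only if both strings ended together */
}

/* Scan the array for the handler matching the given URL scheme. */
const struct demo_handler *demo_find_handler(const char *scheme)
{
  size_t i;
  for(i = 0; demo_protocols[i]; i++) {
    if(demo_equal(demo_protocols[i]->scheme, scheme))
      return demo_protocols[i];
  }
  return NULL;
}
```

A scheme taken from a URL such as `http` matches the `"HTTP"` entry, and an unknown scheme yields `NULL`, which the caller would treat as an unsupported protocol.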
! 974:
! 975: `->scheme` is the URL scheme name, usually spelled out in uppercase. That's
! 976: "HTTP" or "FTP" etc. SSL versions of the protocol need their own
! 977: `Curl_handler` setup, so HTTPS is separate from HTTP.
! 978:
! 979: `->setup_connection` is called to allow the protocol code to allocate
! 980: protocol specific data that then gets associated with that `Curl_easy` for
! 981: the rest of this transfer. It gets freed again at the end of the transfer.
! 982: It will be called before the `connectdata` for the transfer has been
! 983: selected/created. Most protocols will allocate their private
! 984: `struct [PROTOCOL]` here and assign `Curl_easy->req.protop` to point to it.
! 985:
! 986: `->connect_it` allows a protocol to do some specific actions after the TCP
! 987: connect is done, that can still be considered part of the connection phase.
! 988:
! 989: Some protocols will alter the `connectdata->recv[]` and
! 990: `connectdata->send[]` function pointers in this function.
! 991:
! 992: `->connecting` is similarly a function that keeps getting called as long as
! 993: the protocol considers itself still in the connecting phase.
! 994:
! 995: `->do_it` is the function called to issue the transfer request. This is what
! 996: we call the DO action internally. If the DO is not enough and more work is
! 997: needed for the entire DO sequence to complete, `->doing` is then
! 998: usually also provided. Each protocol that needs to do multiple commands or
! 999: similar for do/doing needs to implement its own state machine (see SCP,
! 1000: SFTP, FTP). Some protocols (only FTP and only due to historical reasons) have
! 1001: a separate piece of the DO state called `DO_MORE`.
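The do/doing split can be sketched as a tiny state machine. This is not taken from any libcurl protocol. The states, `struct demo_proto`, `demo_do()` and `demo_doing()` are hypothetical and model a protocol that needs two commands per request.

```c
#include <assert.h>

/* Hypothetical states for a protocol that needs two commands per request. */
enum demo_state { DEMO_SEND_CMD1, DEMO_SEND_CMD2, DEMO_FINISHED };

struct demo_proto {
  enum demo_state state;
};

/* Advance the state machine one step; sets *done to 1 once the whole DO
 * sequence has completed. Real code would perform non-blocking I/O here. */
void demo_doing(struct demo_proto *p, int *done)
{
  switch(p->state) {
  case DEMO_SEND_CMD1:
    p->state = DEMO_SEND_CMD2;
    break;
  case DEMO_SEND_CMD2:
    p->state = DEMO_FINISHED;
    break;
  case DEMO_FINISHED:
    break;
  }
  *done = (p->state == DEMO_FINISHED);
}

/* The DO action starts the sequence and runs the first step. */
void demo_do(struct demo_proto *p, int *done)
{
  p->state = DEMO_SEND_CMD1;
  demo_doing(p, done);
}
```

The generic engine would call the DO function once and then keep calling the doing function until `*done` is set, without blocking in between.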
! 1002:
! 1003: `->doing` keeps getting called while issuing the transfer request command(s).
! 1004:
! 1005: `->done` gets called when the transfer is complete and DONE. That's after the
! 1006: main data has been transferred.
! 1007:
! 1008: `->do_more` gets called during the `DO_MORE` state. The FTP protocol uses
! 1009: this state when setting up the second connection.
! 1010:
! 1011: `->proto_getsock`
! 1012: `->doing_getsock`
! 1013: `->domore_getsock`
! 1014: `->perform_getsock`
! 1015: These are functions that return socket information: which socket(s) to wait
! 1016: for which action(s) during the particular multi state.
! 1017:
! 1018: `->disconnect` is called immediately before the TCP connection is shut down.
! 1019:
! 1020: `->readwrite` gets called during transfer to allow the protocol to do extra
! 1021: reads/writes.
! 1022:
! 1023: `->defport` is the default TCP or UDP port this protocol uses.
! 1024:
! 1025: `->protocol` is one or more bits in the `CURLPROTO_*` set. The SSL versions
! 1026: have their "base" protocol set and then the SSL variation. Like
! 1027: "HTTP|HTTPS".
! 1028:
! 1029: `->flags` is a bitmask with additional information about the protocol that will
! 1030: make it get treated differently by the generic engine:
! 1031:
! 1032: - `PROTOPT_SSL` - will make it connect and negotiate SSL
! 1033:
! 1034: - `PROTOPT_DUAL` - this protocol uses two connections
! 1035:
! 1036: - `PROTOPT_CLOSEACTION` - this protocol has actions to do before closing the
! 1037: connection. This flag is no longer used by code, yet still set for a bunch
! 1038: of protocol handlers.
! 1039:
! 1040: - `PROTOPT_DIRLOCK` - "direction lock". The SSH protocols set this bit to
! 1041: limit which "direction" of socket actions that the main engine will
! 1042: concern itself with.
! 1043:
! 1044: - `PROTOPT_NONETWORK` - a protocol that doesn't use network (read `file:`)
! 1045:
! 1046: - `PROTOPT_NEEDSPWD` - this protocol needs a password and will use a default
! 1047: one unless one is provided
! 1048:
! 1049: - `PROTOPT_NOURLQUERY` - this protocol can't handle a query part on the URL
! 1050: (?foo=bar)
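How such a bitmask is combined and tested can be sketched as follows. The `DEMO_PROTOPT_*` bit values are made up for the example and the real `PROTOPT_*` values in the libcurl sources may differ. `demo_has_flag()` is a hypothetical helper.

```c
#include <assert.h>

/* Hypothetical bit values; the real PROTOPT_* values are defined in the
 * libcurl sources and may differ. */
#define DEMO_PROTOPT_SSL        (1u << 0)
#define DEMO_PROTOPT_DUAL       (1u << 1)
#define DEMO_PROTOPT_NONETWORK  (1u << 2)
#define DEMO_PROTOPT_NEEDSPWD   (1u << 3)

/* Flags of a made-up handler that uses two connections and needs a
 * password. */
static const unsigned int demo_ftp_flags =
  DEMO_PROTOPT_DUAL | DEMO_PROTOPT_NEEDSPWD;

/* Return non-zero if every bit in `bit` is set in `flags`. */
int demo_has_flag(unsigned int flags, unsigned int bit)
{
  return (flags & bit) == bit;
}
```

The generic engine can then branch on a single test such as `demo_has_flag(flags, DEMO_PROTOPT_SSL)` instead of comparing protocol names.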
! 1051:
! 1052: <a name="conncache"></a>
! 1053: ## conncache
! 1054:
! 1055: This is a hash table with connections for later re-use. Each `Curl_easy` has a
! 1056: pointer to its connection cache. Each multi handle sets up a connection
! 1057: cache that all added `Curl_easy`s share by default.
! 1058:
! 1059: <a name="Curl_share"></a>
! 1060: ## Curl_share
! 1061:
! 1062: The libcurl share API allocates a `Curl_share` struct, exposed to the
! 1063: external API as `CURLSH *`.
! 1064:
! 1065: The idea is that the struct can have a set of its own versions of caches and
! 1066: pools and then by providing this struct in the `CURLOPT_SHARE` option, those
! 1067: specific `Curl_easy`s will use the caches/pools that this share handle
! 1068: holds.
! 1069:
! 1070: Then individual `Curl_easy` structs can be made to share specific things
! 1071: that they otherwise wouldn't, such as cookies.
! 1072:
! 1073: The `Curl_share` struct can currently hold cookies, DNS cache and the SSL
! 1074: session cache.
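The selection between a private and a shared store can be sketched as below, using cookies as the example. The structs are hypothetical stand-ins (`demo_handle` for an easy handle, `demo_share` for a share handle) and do not mirror the real struct layouts.

```c
#include <assert.h>
#include <stddef.h>

struct demo_cookies { int count; };  /* stand-in for a cookie store */

struct demo_share {
  struct demo_cookies *cookies;      /* NULL if cookies are not shared */
};

struct demo_handle {
  struct demo_cookies own_cookies;   /* private store */
  struct demo_share *share;          /* NULL if no share handle is set */
};

/* Pick the shared store when one is provided, else the private one. */
struct demo_cookies *demo_cookie_store(struct demo_handle *e)
{
  if(e->share && e->share->cookies)
    return e->share->cookies;
  return &e->own_cookies;
}
```

Two handles that point to the same share object resolve to the same store, while a handle without a share keeps using its own.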
! 1075:
! 1076: <a name="CookieInfo"></a>
! 1077: ## CookieInfo
! 1078:
! 1079: This is the main cookie struct. It holds all known cookies and related
! 1080: information. Each `Curl_easy` has its own private `CookieInfo` even when
! 1081: they are added to a multi handle. They can be made to share cookies by using
! 1082: the share API.
! 1083:
! 1084:
! 1085: [1]: https://curl.haxx.se/libcurl/c/curl_easy_setopt.html
! 1086: [2]: https://curl.haxx.se/libcurl/c/curl_easy_init.html
! 1087: [3]: https://c-ares.haxx.se/
! 1088: [4]: https://tools.ietf.org/html/rfc7230 "RFC 7230"
! 1089: [5]: https://curl.haxx.se/libcurl/c/CURLOPT_ACCEPT_ENCODING.html
! 1090: [6]: https://curl.haxx.se/docs/manpage.html#--compressed
! 1091: [7]: https://curl.haxx.se/libcurl/c/curl_multi_socket_action.html
! 1092: [8]: https://curl.haxx.se/libcurl/c/curl_multi_timeout.html
! 1093: [9]: https://curl.haxx.se/libcurl/c/curl_multi_setopt.html
! 1094: [10]: https://curl.haxx.se/libcurl/c/CURLMOPT_TIMERFUNCTION.html
! 1095: [11]: https://curl.haxx.se/libcurl/c/curl_multi_perform.html
! 1096: [12]: https://curl.haxx.se/libcurl/c/curl_multi_fdset.html
! 1097: [13]: https://curl.haxx.se/libcurl/c/curl_multi_add_handle.html
! 1098: [14]: https://curl.haxx.se/libcurl/c/curl_multi_info_read.html
! 1099: [15]: https://tools.ietf.org/html/rfc7231#section-3.1.2.2