curl internals
==============

 - [Intro](#intro)
 - [git](#git)
 - [Portability](#Portability)
 - [Windows vs Unix](#winvsunix)
 - [Library](#Library)
   - [`Curl_connect`](#Curl_connect)
   - [`multi_do`](#multi_do)
   - [`Curl_readwrite`](#Curl_readwrite)
   - [`multi_done`](#multi_done)
   - [`Curl_disconnect`](#Curl_disconnect)
 - [HTTP(S)](#http)
 - [FTP](#ftp)
 - [Kerberos](#kerberos)
 - [TELNET](#telnet)
 - [FILE](#file)
 - [SMB](#smb)
 - [LDAP](#ldap)
 - [E-mail](#email)
 - [General](#general)
 - [Persistent Connections](#persistent)
 - [multi interface/non-blocking](#multi)
 - [SSL libraries](#ssl)
 - [Library Symbols](#symbols)
 - [Return Codes and Informationals](#returncodes)
 - [API/ABI](#abi)
 - [Client](#client)
 - [Memory Debugging](#memorydebug)
 - [Test Suite](#test)
 - [Asynchronous name resolves](#asyncdns)
   - [c-ares](#cares)
 - [`curl_off_t`](#curl_off_t)
 - [curlx](#curlx)
 - [Content Encoding](#contentencoding)
 - [`hostip.c` explained](#hostip)
 - [Track Down Memory Leaks](#memoryleak)
 - [`multi_socket`](#multi_socket)
 - [Structs in libcurl](#structs)
   - [Curl_easy](#Curl_easy)
   - [connectdata](#connectdata)
   - [Curl_multi](#Curl_multi)
   - [Curl_handler](#Curl_handler)
   - [conncache](#conncache)
   - [Curl_share](#Curl_share)
   - [CookieInfo](#CookieInfo)

<a name="intro"></a>
Intro
=====

This project is split in two: the library and the client. The client part
uses the library, but the library is designed to allow other applications to
use it.

The largest amount of code and complexity is in the library part.

<a name="git"></a>
git
===

All changes to the sources are committed to the git repository as soon as
they're somewhat verified to work. Changes shall be committed as independently
as possible so that individual changes can be easily spotted and tracked
afterwards.

Tagging shall be used extensively, and by the time we release new archives we
should tag the sources with a name similar to the released version number.

<a name="Portability"></a>
Portability
===========

We write curl and libcurl to compile with C89 compilers, on 32-bit and
larger machines. Most of libcurl assumes more or less POSIX compliance, but
that is not a requirement.

We write libcurl to build and work with lots of third party tools, and we
want it to remain functional and buildable with these and later versions
(older versions may still work but are not what we work hard to maintain):

Dependencies
------------

 - OpenSSL 0.9.7
 - GnuTLS 3.1.10
 - zlib 1.1.4
 - libssh2 0.16
 - c-ares 1.6.0
 - libidn2 2.0.0
 - wolfSSL 2.0.0
 - openldap 2.0
 - MIT Kerberos 1.2.4
 - GSKit V5R3M0
 - NSS 3.14.x
 - Heimdal ?
 - nghttp2 1.12.0

Operating Systems
-----------------

On systems where configure runs, we aim at working on them all - if they have
a suitable C compiler. On systems that don't run configure, we strive to keep
curl running correctly on:

 - Windows 98
 - AS/400 V5R3M0
 - Symbian 9.1
 - Windows CE ?
 - TPF ?

Build tools
-----------

When writing code (mostly for generating stuff included in release tarballs)
we use a few "build tools" and we make sure that we remain functional with
these versions:

 - GNU Libtool 1.4.2
 - GNU Autoconf 2.57
 - GNU Automake 1.7
 - GNU M4 1.4
 - perl 5.004
 - roffit 0.5
 - groff ? (any version that supports `groff -Tps -man [in] [out]`)
 - ps2pdf (gs) ?

<a name="winvsunix"></a>
Windows vs Unix
===============

There are a few differences in how to program curl the Unix way compared to
the Windows way. Perhaps the four most notable details are:

1. Different function names for socket operations.

   In curl, this is solved with defines and macros, so that the source looks
   the same in all places except for the header file that defines them. The
   macros in use are `sclose()`, `sread()` and `swrite()`.

2. Windows requires a couple of init calls for the socket stuff.

   That is taken care of by the `curl_global_init()` call, but if other libs
   also do it, there might be reasons for applications to alter that
   behaviour.

3. The file descriptors for network communication and file operations are
   not as easily interchangeable as in Unix.

   We avoid this by not trying any funny tricks on file descriptors.

4. When writing data to stdout, Windows makes end-of-lines the DOS way, thus
   destroying binary data, although you do want that conversion if it is
   text coming through... (sigh)

   We set stdout to binary mode under Windows.

Inside the source code, we make an effort to avoid `#ifdef [Your OS]`. All
conditionals that deal with features *should* instead be in the format
`#ifdef HAVE_THAT_WEIRD_FUNCTION`. Since Windows can't run configure scripts,
we maintain a `curl_config-win32.h` file in the lib directory that is
supposed to look exactly as a `curl_config.h` file would have looked on a
Windows machine.

Generally speaking: always remember that this will be compiled on dozens of
operating systems. Don't walk on the edge!

<a name="Library"></a>
Library
=======

(See [Structs in libcurl](#structs) for the separate section describing all
major internal structs and their purposes.)

There are plenty of entry points to the library, namely each publicly defined
function that libcurl offers to applications. All of those functions are
rather small and easy to follow. All the ones prefixed with `curl_easy` are
put in the `lib/easy.c` file.

`curl_global_init()` and `curl_global_cleanup()` should be called by the
application to initialize and clean up global stuff in the library. As of
today, it can handle the global SSL initing if SSL is enabled and it can init
the socket layer on Windows machines. libcurl itself has no "global" scope.

All printf()-style functions use the supplied clones in `lib/mprintf.c`. This
makes sure we stay absolutely platform independent.

[`curl_easy_init()`][2] allocates an internal struct and makes some
initializations. The returned handle does not reveal internals. This is the
`Curl_easy` struct which works as an "anchor" struct for all `curl_easy`
functions. All connections performed will get connect-specific data allocated
that should be used for things related to particular connections/requests.

[`curl_easy_setopt()`][1] takes three arguments, where the option stuff must
be passed in pairs: the parameter-ID and the parameter-value. The list of
options is documented in the man page. This function mainly sets things in
the `Curl_easy` struct.

`curl_easy_perform()` is just a wrapper function that makes use of the multi
API. It basically calls `curl_multi_init()`, `curl_multi_add_handle()`,
`curl_multi_wait()`, and `curl_multi_perform()` until the transfer is done
and then returns.

Some of the most important key functions in `url.c` are called from
`multi.c` when certain key steps are to be made in the transfer operation.

<a name="Curl_connect"></a>
Curl_connect()
--------------

Analyzes the URL, separates the different components and connects to the
remote host. This may involve using a proxy and/or using SSL. The
`Curl_resolv()` function in `lib/hostip.c` is used for looking up host
names (it then uses the proper underlying method, which may vary
between platforms and builds).

When `Curl_connect` is done, we are connected to the remote site. Then it
is time to tell the server to get a document/file. `multi_do()` arranges
this.

This function makes sure there's an allocated and initiated `connectdata`
struct that is used for this particular connection only (although there may
be several requests performed on the same connect). A bunch of things are
inited/inherited from the `Curl_easy` struct.
<a name="multi_do"></a>
multi_do()
----------

`multi_do()` makes sure the proper protocol-specific function is called.
The functions are named after the protocols they handle.

The protocol-specific functions of course deal with protocol-specific
negotiations and setup. They have access to the `Curl_sendf()` (from
`lib/sendf.c`) function to send printf-style formatted data to the remote
host and when they're ready to make the actual file transfer they call the
`Curl_setup_transfer()` function (in `lib/transfer.c`) to set up the
transfer and return.

If this DO function fails and the connection is being re-used, libcurl will
then close this connection, set up a new connection and re-issue the DO
request on that. This is because there is no way to be perfectly sure that
we have discovered a dead connection before the DO function and thus we
might wrongly be re-using a connection that was closed by the remote peer.

<a name="Curl_readwrite"></a>
Curl_readwrite()
----------------

Called during the transfer of the actual protocol payload.

During transfer, the progress functions in `lib/progress.c` are called at
frequent intervals (or, at the user's choice, a specified callback might get
called). The speedcheck functions in `lib/speedcheck.c` are also used to
verify that the transfer is as fast as required.

<a name="multi_done"></a>
multi_done()
------------

Called after a transfer is done. This function takes care of everything
that has to be done after a transfer. It attempts to leave matters in a
state so that `multi_do()` can be called again on the same connection (in
a persistent connection case). The connection might also soon be closed
with `Curl_disconnect()`.

<a name="Curl_disconnect"></a>
Curl_disconnect()
-----------------

When doing normal connections and transfers, no one ever tries to close any
connections so this is not normally called when `curl_easy_perform()` is
used. This function is only used when we are certain that no more transfers
are going to be made on the connection. It can also be called to close a
connection by force, or to make sure that libcurl doesn't keep too many
connections alive at the same time.

This function cleans up all resources that are associated with a single
connection.

<a name="http"></a>
HTTP(S)
=======

HTTP offers a lot and is the protocol in curl that uses the most lines of
code. There is a special file `lib/formdata.c` that offers all the
multipart post functions.

base64 functions for user+password stuff (and more) are in `lib/base64.c`
and all functions for parsing and sending cookies are found in
`lib/cookie.c`.

HTTPS uses in almost every case the same procedure as HTTP, with only two
exceptions: the connect procedure is different and the function used to read
or write from the socket is different, although the latter fact is hidden in
the source by the use of `Curl_read()` for reading and `Curl_write()` for
writing data to the remote server.

`http_chunks.c` contains functions that understand HTTP 1.1 chunked transfer
encoding.

An interesting detail with the HTTP(S) request is the `Curl_add_buffer()`
series of functions we use. They append data to one single buffer, and when
the building is finished the entire request is sent off in one single write.
This is done this way to overcome problems with flawed firewalls and lame
servers.

<a name="ftp"></a>
FTP
===

The `Curl_if2ip()` function can be used for getting the IP number of a
specified network interface, and it resides in `lib/if2ip.c`.

`Curl_ftpsendf()` is used for sending FTP commands to the remote server. It
was made a separate function to prevent us programmers from forgetting that
the commands must be CRLF terminated. They must also be sent in one single
`write()` to make firewalls and similar happy.

<a name="kerberos"></a>
Kerberos
========

Kerberos support is mainly in `lib/krb5.c` and `lib/security.c`, but also in
`curl_sasl_sspi.c` and `curl_sasl_gssapi.c` for the email protocols and
`socks_gssapi.c` and `socks_sspi.c` for SOCKS5 proxy specifics.

<a name="telnet"></a>
TELNET
======

Telnet is implemented in `lib/telnet.c`.

<a name="file"></a>
FILE
====

The `file://` protocol is dealt with in `lib/file.c`.

<a name="smb"></a>
SMB
===

The `smb://` protocol is dealt with in `lib/smb.c`.

<a name="ldap"></a>
LDAP
====

Everything LDAP is in `lib/ldap.c` and `lib/openldap.c`.

<a name="email"></a>
E-mail
======

The e-mail related source code is in `lib/imap.c`, `lib/pop3.c` and
`lib/smtp.c`.

<a name="general"></a>
General
=======

URL encoding and decoding, called escaping and unescaping in the source code,
is found in `lib/escape.c`.

While transferring data in `Transfer()` a few functions might get used.
`curl_getdate()` in `lib/parsedate.c` is for HTTP date comparisons (and
more).

`lib/getenv.c` offers `curl_getenv()` which is for reading environment
variables in a neat platform independent way. That's used in the client, but
also in `lib/url.c` when checking the proxy environment variables. Note that
contrary to the normal unix `getenv()`, this returns an allocated buffer that
must be `free()`ed after use.

`lib/netrc.c` holds the `.netrc` parser.

`lib/timeval.c` features replacement functions for systems that don't have
`gettimeofday()` and a few support functions for timeval conversions.

A function named `curl_version()` that returns the full curl version string
is found in `lib/version.c`.

<a name="persistent"></a>
Persistent Connections
======================

The persistent connection support in libcurl requires some considerations on
how to do things inside of the library.

 - The `Curl_easy` struct returned in the [`curl_easy_init()`][2] call
   must never hold connection-oriented data. It is meant to hold the root
   data as well as all the options etc that the library-user may choose.

 - The `Curl_easy` struct holds the "connection cache" (an array of
   pointers to `connectdata` structs).

 - This enables the 'curl handle' to be reused on subsequent transfers.

 - When libcurl is told to perform a transfer, it first checks for an already
   existing connection in the cache that we can use. Otherwise it creates a
   new one and adds that to the cache. If the cache is full already when a
   new connection is added, it will first close the oldest unused one.

 - When the transfer operation is complete, the connection is left
   open. Particular options may tell libcurl not to keep it, and protocols
   may signal closure on connections, in which case they won't be kept open,
   of course.

 - When `curl_easy_cleanup()` is called, we close all still opened
   connections, unless of course the multi interface "owns" the connections.

The curl handle must be re-used in order for the persistent connections to
work.

<a name="multi"></a>
multi interface/non-blocking
============================

The multi interface is a non-blocking interface to the library. To make that
interface work as well as possible, no low-level function within libcurl
may be written to work in a blocking manner. (There are still a few spots
violating this rule.)

One of the primary reasons we introduced c-ares support was to allow the name
resolve phase to be perfectly non-blocking as well.

The FTP and the SFTP/SCP protocols are examples of how we adapt and adjust
the code to allow non-blocking operations even on multi-stage
command-response protocols. They are built around state machines that return
when they would otherwise block waiting for data. The DICT, LDAP and TELNET
protocols are crappy examples and they are subject for rewrite in the future
to better fit the libcurl protocol family.

<a name="ssl"></a>
SSL libraries
=============

Originally libcurl supported SSLeay for SSL/TLS transports, but that was then
extended to its successor OpenSSL and has since also been extended to several
other SSL/TLS libraries. We expect and hope to further extend the support in
future libcurl versions.

To deal with this internally in the best way possible, we have a generic SSL
function API as provided by the `vtls/vtls.[ch]` system, and those are the
only SSL functions we must use from within libcurl. vtls is then crafted to
use the appropriate lower-level function calls of whatever SSL library is in
use: for example `vtls/openssl.[ch]` for the OpenSSL library.

<a name="symbols"></a>
Library Symbols
===============

All symbols used internally in libcurl must use a `Curl_` prefix if they're
used in more than a single file. Single-file symbols must be made static.
Public ("exported") symbols must use a `curl_` prefix. (There are exceptions,
but they are to be changed to follow this pattern in future versions.) Public
API functions are marked with `CURL_EXTERN` in the public header files so
that all others can be hidden on platforms where this is possible.

<a name="returncodes"></a>
Return Codes and Informationals
===============================

I've made things simple. Almost every function in libcurl returns a
CURLcode, which must be `CURLE_OK` if everything is OK or otherwise a
suitable error code as the `curl/curl.h` include file defines. The very spot
that detects an error must use the `Curl_failf()` function to set the
human-readable error description.

In aiding the user to understand what's happening and to debug curl usage, we
must supply a fair number of informational messages by using the
`Curl_infof()` function. Those messages are only displayed when the user
explicitly asks for them. They are best used when revealing information that
isn't otherwise obvious.

<a name="abi"></a>
API/ABI
=======

We make an effort to not export or show internals or how internals work, as
that makes it easier to keep a solid API/ABI over time. See docs/libcurl/ABI
for our promise to users.

<a name="client"></a>
Client
======

`main()` resides in `src/tool_main.c`.

`src/tool_hugehelp.c` is automatically generated by the `mkhelp.pl` perl
script to display the complete "manual" and the `src/tool_urlglob.c` file
holds the functions used for the URL-"globbing" support. Globbing in the
sense that the `{}` and `[]` expansion stuff is there.

The client mostly sets up its `config` struct properly, then
it calls the `curl_easy_*()` functions of the library and when it gets back
control after the `curl_easy_perform()` it cleans up the library, checks
status and exits.

When the operation is done, the `ourWriteOut()` function in `src/writeout.c`
may be called to report about the operation. That function uses the
`curl_easy_getinfo()` function to extract useful information from the curl
session.

It may loop and do all this several times if many URLs were specified on the
command line or in a config file.

<a name="memorydebug"></a>
Memory Debugging
================

The file `lib/memdebug.c` contains debug versions of a few functions, such
as `malloc()`, `free()`, `fopen()` and `fclose()`, that deal with resources
that might give us problems if we "leak" them. The functions in the memdebug
system do nothing fancy: they do their normal job and then log information
about what they just did. The logged data can then be analyzed after a
complete session.

`memanalyze.pl` is the perl script present in `tests/` that analyzes a log
file generated by the memory tracking system. It detects if resources are
allocated but never freed and other kinds of errors related to resource
management.

Internally, the preprocessor symbol `DEBUGBUILD` guards code that is only
compiled for debug-enabled builds, while the symbol `CURLDEBUG` guards code
that is _only_ used for memory tracking/debugging.

Use `-DCURLDEBUG` when compiling to enable memory debugging; this is also
switched on by running configure with `--enable-curldebug`. Use
`-DDEBUGBUILD` when compiling to enable a debug build, or run configure with
`--enable-debug`.

`curl --version` will list the 'Debug' feature for debug-enabled builds, and
will list the 'TrackMemory' feature for builds capable of memory tracking.
These features are independent and can be controlled when running the
configure script. When `--enable-debug` is given, both features will be
enabled, unless some restriction prevents memory tracking from being used.

<a name="test"></a>
Test Suite
==========

The test suite is placed in its own subdirectory directly off the root in the
curl archive tree, and it contains a bunch of scripts and a lot of test case
data.

The main test script is `runtests.pl` that will invoke test servers like
`httpserver.pl` and `ftpserver.pl` before all the test cases are performed.
The test suite currently only runs on Unix-like platforms.

You'll find a description of the test suite in the `tests/README` file, and
the test case data files in the `tests/FILEFORMAT` file.

The test suite automatically detects if curl was built with memory debugging
enabled, and if it was, it will detect memory leaks, too.

<a name="asyncdns"></a>
Asynchronous name resolves
==========================

libcurl can be built to do name resolves asynchronously, using either the
normal resolver in a threaded manner or by using c-ares.

<a name="cares"></a>
[c-ares][3]
-----------

### Build libcurl to use c-ares

1. `./configure --enable-ares=/path/to/ares/install`
2. `make`

### c-ares on win32

First I compiled c-ares. I changed the default C runtime library to be the
single-threaded rather than the multi-threaded (this seems to be required to
prevent linking errors later on). Then I simply built the areslib project
(the other projects adig/ahost seem to fail under MSVC).

Next was libcurl. I opened `lib/config-win32.h` and added:
`#define USE_ARES 1`

Next I added the path for the ares includes to the include path, and
`libares.lib` to the libraries.

Lastly, I also changed libcurl to be single-threaded rather than
multi-threaded, again to prevent some duplicate symbol errors. I'm not sure
why I needed to change everything to single-threaded, but when I didn't I
got redefinition errors for several CRT functions (`malloc()`, `stricmp()`,
etc.)

<a name="curl_off_t"></a>
`curl_off_t`
============

`curl_off_t` is a data type provided by the external libcurl include
headers. It is the type meant to be used for the [`curl_easy_setopt()`][1]
options that end with LARGE. The type is 64-bit large on most modern
platforms.

<a name="curlx"></a>
curlx
=====

The libcurl source code offers a few functions by source only. They are not
part of the official libcurl API, but the source files might be useful for
others so apps can optionally compile/build with these sources to gain
additional functions.

We provide them through a single header file for easy access for apps:
`curlx.h`

`curlx_strtoofft()`
-------------------
A macro that converts a string containing a number to a `curl_off_t` number.
This might use the `curlx_strtoll()` function which is provided as source
code in `strtoofft.c`. Note that the function is only provided if no
`strtoll()` (or equivalent) function exists on your platform. If
`curl_off_t` is only a 32-bit number on your platform, this macro uses
`strtol()`.

Future
------

Several functions will be removed from the public `curl_` name space in a
future libcurl release. They will then only become available as `curlx_`
functions instead. To make the transition easier, we already provide these
functions with the `curlx_` prefix to allow sources to be built properly
with the new function names. The concerned functions are:

 - `curlx_getenv`
 - `curlx_strequal`
 - `curlx_strnequal`
 - `curlx_mvsnprintf`
 - `curlx_msnprintf`
 - `curlx_maprintf`
 - `curlx_mvaprintf`
 - `curlx_msprintf`
 - `curlx_mprintf`
 - `curlx_mfprintf`
 - `curlx_mvsprintf`
 - `curlx_mvprintf`
 - `curlx_mvfprintf`

<a name="contentencoding"></a>
Content Encoding
================

## About content encodings

[HTTP/1.1][4] specifies that a client may request that a server encode its
response. This is usually used to compress a response using one (or more)
encodings from a set of commonly available compression techniques. These
schemes include `deflate` (the zlib algorithm), `gzip`, `br` (brotli) and
`compress`. A client requests that the server perform an encoding by
including an `Accept-Encoding` header in the request. The value of the
header should be one of the recognized tokens `deflate`, ... (there's a way
to register new schemes/tokens, see sec 3.5 of the spec). A server MAY honor
the client's encoding request. When a response is encoded, the server
includes a `Content-Encoding` header in the response. The value of the
`Content-Encoding` header indicates which encodings were used to encode the
data, in the order in which they were applied.

It's also possible for a client to attach priorities to different schemes so
that the server knows which it prefers. See sec 14.3 of RFC 2616 for more
information on the `Accept-Encoding` header. See sec
[3.1.2.2 of RFC 7231][15] for more information on the `Content-Encoding`
header.

## Supported content encodings

The `deflate`, `gzip` and `br` content encodings are supported by libcurl.
Both regular and chunked transfers work fine. The zlib library is required
for the `deflate` and `gzip` encodings, while the brotli decoding library is
required for the `br` encoding.

## The libcurl interface

To cause libcurl to request a content encoding use:

    [`curl_easy_setopt`][1](curl, [`CURLOPT_ACCEPT_ENCODING`][5], string)

where string is the intended value of the `Accept-Encoding` header.

Currently, libcurl does support multiple encodings but only
understands how to process responses that use the `deflate`, `gzip` and/or
`br` content encodings, so the only values for [`CURLOPT_ACCEPT_ENCODING`][5]
that will work (besides `identity`, which does nothing) are `deflate`,
`gzip` and `br`. If a response is encoded using `compress` or some other
unsupported method, libcurl returns an error indicating that the response
could not be decoded. If `<string>` is NULL no `Accept-Encoding` header is
generated. If `<string>` is a zero-length string, then an `Accept-Encoding`
header containing all supported encodings will be generated.

The [`CURLOPT_ACCEPT_ENCODING`][5] option must be set to any non-NULL value
for content to be automatically decoded. If it is not set and the server
still sends encoded content (despite not having been asked), the data is
returned in its raw form and the `Content-Encoding` type is not checked.

## The curl interface

Use the [`--compressed`][6] option with curl to cause it to ask servers to
compress responses using any format supported by curl.

<a name="hostip"></a>
`hostip.c` explained
====================

The main compile-time defines to keep in mind when reading the `host*.c`
source files are these:

## `CURLRES_IPV6`

This host has `getaddrinfo()` and family, and thus we use that. The host may
not be able to resolve IPv6, but we don't really have to take that into
account. Hosts that aren't IPv6-enabled have `CURLRES_IPV4` defined.

## `CURLRES_ARES`

is defined if libcurl is built to use c-ares for asynchronous name
resolves. This can be Windows or \*nix.

## `CURLRES_THREADED`

is defined if libcurl is built to use threading for asynchronous name
resolves. The name resolve will be done in a new thread, and the supported
asynch API will be the same as for ares-builds. This is the default under
(native) Windows.

If either of the two previous symbols is defined, `CURLRES_ASYNCH` is
defined too. If libcurl is not built to use an asynchronous resolver,
`CURLRES_SYNCH` is defined.

## `host*.c` sources

The `host*.c` source files are split up like this:

 - `hostip.c` - method-independent resolver functions and utility functions
 - `hostasyn.c` - functions for asynchronous name resolves
 - `hostsyn.c` - functions for synchronous name resolves
 - `asyn-ares.c` - functions for asynchronous name resolves using c-ares
 - `asyn-thread.c` - functions for asynchronous name resolves using threads
 - `hostip4.c` - IPv4 specific functions
 - `hostip6.c` - IPv6 specific functions

`hostip.h` is the single united header file for all this. It defines the
`CURLRES_*` defines based on the `config*.h` and `curl_setup.h` defines.
750:
751: <a name="memoryleak"></a>
752: Track Down Memory Leaks
753: =======================
754:
755: ## Single-threaded
756:
757: Please note that this memory leak system is not adjusted to work in more
758: than one thread. If you want/need to use it in a multi-threaded app. Please
759: adjust accordingly.
760:
761: ## Build
762:
763: Rebuild libcurl with `-DCURLDEBUG` (usually, rerunning configure with
764: `--enable-debug` fixes this). `make clean` first, then `make` so that all
765: files are actually rebuilt properly. It will also make sense to build
766: libcurl with the debug option (usually `-g` to the compiler) so that
767: debugging it will be easier if you actually do find a leak in the library.
768:
769: This will create a library that has memory debugging enabled.
770:
## Modify Your Application

Add a line in your application code:

`curl_dbg_memdebug("dump");`

This will make the malloc debug system output a full trace of all
resource-using functions to the given file name. Make sure you rebuild your
program and that you link with the same libcurl you built for this purpose
as described above.

## Run Your Application

Run your program as usual. Watch the specified memory trace file grow.

Make your program exit and use the proper libcurl cleanup functions etc., so
that all non-leaks are returned/freed properly.

## Analyze the Flow

Use the `tests/memanalyze.pl` perl script to analyze the dump file:

    tests/memanalyze.pl dump

This outputs a report on what resources were allocated but never freed etc.
This report is good to post to the list!

If this doesn't produce any output, no leak was detected in libcurl. Then
the leak is most likely to be in your code.

<a name="multi_socket"></a>
`multi_socket`
==============

Implementation of the `curl_multi_socket` API

The main ideas of this API are simply:

 1. The application can use whatever event system it likes as it gets info
    from libcurl about what file descriptors libcurl waits for what action
    on. (The previous API returns `fd_sets` which is very
    `select()`-centric).

 2. When the application discovers action on a single socket, it calls
    libcurl and informs it that there was action on this particular socket
    and libcurl can then act on that socket/transfer only and not care
    about any other transfers. (The previous API always had to scan through
    all the existing transfers.)

The idea is that [`curl_multi_socket_action()`][7] calls a given callback
with information about what socket to wait for what action on, and the
callback only gets called if the status of that socket has changed.

We also added a timer callback that makes libcurl call the application when
the timeout value changes, and you set that with [`curl_multi_setopt()`][9]
and the [`CURLMOPT_TIMERFUNCTION`][10] option. To make this work, there is
internally a struct added to each easy handle in which we store an "expire
time" (if any). The structs are then "splay sorted" so that we can add and
remove times from the linked list and yet somewhat swiftly figure out both
how long there is until the next nearest timer expires and which timer
(handle) we should take care of now. Of course, the upside of all this is
that we get a [`curl_multi_timeout()`][8] that should also work with
old-style applications that use [`curl_multi_perform()`][11].

We created an internal "socket to easy handles" hash table that given
a socket (file descriptor) returns the easy handle that waits for action on
that socket. This hash is made using the already existing hash code
(previously only used for the DNS cache).

To make libcurl able to report plain sockets in the socket callback, we had
to re-organize the internals of [`curl_multi_fdset()`][12] etc so that the
conversion from sockets to `fd_sets` for that function is only done in the
last step before the data is returned. I also had to extend c-ares to get a
function that can return plain sockets, as that library too returned only
`fd_sets` and that is no longer good enough. The changes done to c-ares are
available in c-ares 1.3.1 and later.

<a name="structs"></a>
Structs in libcurl
==================

This section should cover 7.32.0 pretty accurately, but will make sense even
for older and later versions as things don't change drastically that often.

<a name="Curl_easy"></a>
## Curl_easy

The `Curl_easy` struct is the one returned to the outside in the external API
as a `CURL *`. This is usually known as an easy handle in API documentation
and examples.

Information and state that is related to the actual connection is in the
`connectdata` struct. When a transfer is about to be made, libcurl will
either create a new connection or re-use an existing one. The particular
connectdata that is used by this handle is pointed out by
`Curl_easy->easy_conn`.

Data and information that regards this particular single transfer is put in
the `SingleRequest` sub-struct.

When the `Curl_easy` struct is added to a multi handle, as it must be in
order to do any transfer, the `->multi` member will point to the `Curl_multi`
struct it belongs to. The `->prev` and `->next` members will then be used by
the multi code to keep a linked list of `Curl_easy` structs that are added to
that same multi handle. libcurl always uses multi so `->multi` *will* point
to a `Curl_multi` when a transfer is in progress.

`->mstate` is the multi state of this particular `Curl_easy`. When
`multi_runsingle()` is called, it will act on this handle according to which
state it is in. The mstate is also what tells which sockets to return for a
specific `Curl_easy` when [`curl_multi_fdset()`][12] is called etc.

The libcurl source code generally uses the name `data` for the variable that
points to the `Curl_easy`.

When doing multiplexed HTTP/2 transfers, each `Curl_easy` is associated with
an individual stream, sharing the same connectdata struct. Multiplexing
makes it even more important to keep things associated with the right thing!

<a name="connectdata"></a>
## connectdata

A general idea in libcurl is to keep connections around in a connection
"cache" after they have been used, in case they will be used again, and then
re-use an existing one instead of creating a new one, as this gives a
significant performance boost.

Each `connectdata` identifies a single physical connection to a server. If
the connection can't be kept alive, the connection will be closed after use
and then this struct can be removed from the cache and freed.

Thus, the same `Curl_easy` can be used multiple times and each time select
another `connectdata` struct to use for the connection. Keep this in mind,
as it is then important to consider if options or choices are based on the
connection or the `Curl_easy`.

Functions in libcurl will assume that `connectdata->data` points to the
`Curl_easy` that uses this connection (for the moment).

As a special complexity, some protocols supported by libcurl require a
special disconnect procedure that is more than just shutting down the
socket. It can involve sending one or more commands to the server before
doing so. Since connections are kept in the connection cache after use, the
original `Curl_easy` may no longer be around when the time comes to shut down
a particular connection. For this purpose, libcurl holds a special dummy
`closure_handle` `Curl_easy` in the `Curl_multi` struct to use when needed.

FTP uses two TCP connections for a typical transfer but it keeps both in
this single struct and thus can be considered a single connection for most
internal concerns.

The libcurl source code generally uses the name `conn` for the variable that
points to the connectdata.

<a name="Curl_multi"></a>
## Curl_multi

Internally, the easy interface is implemented as a wrapper around multi
interface functions. This makes everything use the multi interface.

`Curl_multi` is the multi handle struct exposed as `CURLM *` in external
APIs.

This struct holds a list of `Curl_easy` structs that have been added to this
handle with [`curl_multi_add_handle()`][13]. The start of the list is
`->easyp` and `->num_easy` is a counter of added `Curl_easy`s.

`->msglist` is a linked list of messages to send back when
[`curl_multi_info_read()`][14] is called. Basically a node is added to that
list when an individual `Curl_easy`'s transfer has completed.

`->hostcache` points to the name cache. It is a hash table for looking up
names to IP addresses. The entries have a limited lifetime in there and this
cache is meant to reduce the time it takes when the same name is wanted
again within a short period of time.

`->timetree` points to a tree of `Curl_easy`s, sorted by the remaining time
until it should be checked - normally some sort of timeout. Each `Curl_easy`
has one node in the tree.

`->sockhash` is a hash table to allow fast lookups from a socket descriptor
to the `Curl_easy` that uses that descriptor. This is necessary for the
`multi_socket` API.

`->conn_cache` points to the connection cache. It keeps track of all
connections that are kept after use. The cache has a maximum size.

`->closure_handle` is described in the `connectdata` section.

The libcurl source code generally uses the name `multi` for the variable that
points to the `Curl_multi` struct.

<a name="Curl_handler"></a>
## Curl_handler

Each unique protocol that is supported by libcurl needs to provide at least
one `Curl_handler` struct. It defines what the protocol is called and what
functions the main code should call to deal with protocol specific issues.
In general, there's a source file named `[protocol].c` in which there's a
`struct Curl_handler Curl_handler_[protocol]` declared. In `url.c` there's
then the main array with pointers to all the individual `Curl_handler`
structs, and this array is scanned through when a URL is given to libcurl
to work with.

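This scheme-to-handler dispatch can be pictured with a much simplified toy
sketch like the one below. The struct and names here are illustrative only;
the real `Curl_handler` in `urldata.h` has many more members:

```c
#include <stddef.h>
#include <string.h>

/* Toy version of a protocol handler: a scheme name plus the function the
   generic code calls for the DO phase. */
struct toy_handler {
  const char *scheme;
  int (*do_it)(const char *url);
};

static int http_do(const char *url) { (void)url; return 0; }
static int ftp_do(const char *url)  { (void)url; return 0; }

/* The single array that is scanned when a URL comes in, like url.c does
   with its Curl_handler table. */
static const struct toy_handler handlers[] = {
  { "HTTP", http_do },
  { "FTP",  ftp_do },
};

static const struct toy_handler *find_handler(const char *scheme)
{
  size_t i;
  for(i = 0; i < sizeof(handlers)/sizeof(handlers[0]); i++)
    if(strcmp(handlers[i].scheme, scheme) == 0)
      return &handlers[i];
  return NULL;  /* unsupported protocol */
}
```
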
`->scheme` is the URL scheme name, usually spelled out in uppercase. That's
"HTTP" or "FTP" etc. SSL versions of the protocol need their own
`Curl_handler` setup so HTTPS is separate from HTTP.

`->setup_connection` is called to allow the protocol code to allocate
protocol specific data that then gets associated with that `Curl_easy` for
the rest of this transfer. It gets freed again at the end of the transfer.
It will be called before the `connectdata` for the transfer has been
selected/created. Most protocols will allocate their private
`struct [PROTOCOL]` here and assign `Curl_easy->req.protop` to point to it.

`->connect_it` allows a protocol to do some specific actions after the TCP
connect is done that can still be considered part of the connection phase.

Some protocols will alter the `connectdata->recv[]` and
`connectdata->send[]` function pointers in this function.

`->connecting` is similarly a function that keeps getting called as long as
the protocol considers itself still in the connecting phase.

`->do_it` is the function called to issue the transfer request. This is what
we call the DO action internally. If the DO is not enough and things need to
be kept getting done for the entire DO sequence to complete, `->doing` is
then usually also provided. Each protocol that needs to do multiple commands
or similar for do/doing needs to implement its own state machine (see SCP,
SFTP, FTP). Some protocols (only FTP and only due to historical reasons)
have a separate piece of the DO state called `DO_MORE`.

`->doing` keeps getting called while issuing the transfer request
command(s).

`->done` gets called when the transfer is complete and DONE. That's after the
main data has been transferred.

`->do_more` gets called during the `DO_MORE` state. The FTP protocol uses
this state when setting up the second connection.

`->proto_getsock`, `->doing_getsock`, `->domore_getsock` and
`->perform_getsock` are functions that return socket information: which
socket(s) to wait for which action(s) during the particular multi state.

`->disconnect` is called immediately before the TCP connection is shut down.

`->readwrite` gets called during transfer to allow the protocol to do extra
reads/writes.

`->defport` is the default TCP or UDP port this protocol uses.

`->protocol` is one or more bits in the `CURLPROTO_*` set. The SSL versions
have their "base" protocol set and then the SSL variation. Like
"HTTP|HTTPS".

`->flags` is a bitmask with additional information about the protocol that will
make it get treated differently by the generic engine:

- `PROTOPT_SSL` - will make it connect and negotiate SSL

- `PROTOPT_DUAL` - this protocol uses two connections

- `PROTOPT_CLOSEACTION` - this protocol has actions to do before closing the
  connection. This flag is no longer used by code, yet still set for a bunch
  of protocol handlers.

- `PROTOPT_DIRLOCK` - "direction lock". The SSH protocols set this bit to
  limit which "direction" of socket actions that the main engine will
  concern itself with.

- `PROTOPT_NONETWORK` - a protocol that doesn't use network (read `file:`)

- `PROTOPT_NEEDSPWD` - this protocol needs a password and will use a default
  one unless one is provided

- `PROTOPT_NOURLQUERY` - this protocol can't handle a query part on the URL
  (?foo=bar)

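How the generic engine consumes such a bitmask can be sketched in plain C.
The flag names and values below are made up for illustration; the real bits
are defined in `urldata.h`:

```c
/* Illustrative flag bits only -- not the real values from urldata.h. */
#define TOY_PROTOPT_SSL        (1<<0)
#define TOY_PROTOPT_DUAL       (1<<1)
#define TOY_PROTOPT_NONETWORK  (1<<2)

/* A handler combines such bits into a single flags word... */
struct toy_flags_handler {
  const char *scheme;
  unsigned int flags;
};

/* ...and the generic code tests individual bits to decide behavior. */
static int needs_ssl(const struct toy_flags_handler *h)
{
  return (h->flags & TOY_PROTOPT_SSL) != 0;
}
```
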
<a name="conncache"></a>
## conncache

The connection cache is a hash table with connections for later re-use. Each
`Curl_easy` has a pointer to its connection cache. Each multi handle sets up
a connection cache that all added `Curl_easy`s share by default.

<a name="Curl_share"></a>
## Curl_share

The libcurl share API allocates a `Curl_share` struct, exposed to the
external API as `CURLSH *`.

The idea is that the struct can have a set of its own versions of caches and
pools and then by providing this struct in the `CURLOPT_SHARE` option, those
specific `Curl_easy`s will use the caches/pools that this share handle
holds.

Then individual `Curl_easy` structs can be made to share specific things
that they otherwise wouldn't, such as cookies.

The `Curl_share` struct can currently hold cookies, DNS cache and the SSL
session cache.

<a name="CookieInfo"></a>
## CookieInfo

This is the main cookie struct. It holds all known cookies and related
information. Each `Curl_easy` has its own private `CookieInfo` even when
they are added to a multi handle. They can be made to share cookies by using
the share API.

[1]: https://curl.haxx.se/libcurl/c/curl_easy_setopt.html
[2]: https://curl.haxx.se/libcurl/c/curl_easy_init.html
[3]: https://c-ares.haxx.se/
[4]: https://tools.ietf.org/html/rfc7230 "RFC 7230"
[5]: https://curl.haxx.se/libcurl/c/CURLOPT_ACCEPT_ENCODING.html
[6]: https://curl.haxx.se/docs/manpage.html#--compressed
[7]: https://curl.haxx.se/libcurl/c/curl_multi_socket_action.html
[8]: https://curl.haxx.se/libcurl/c/curl_multi_timeout.html
[9]: https://curl.haxx.se/libcurl/c/curl_multi_setopt.html
[10]: https://curl.haxx.se/libcurl/c/CURLMOPT_TIMERFUNCTION.html
[11]: https://curl.haxx.se/libcurl/c/curl_multi_perform.html
[12]: https://curl.haxx.se/libcurl/c/curl_multi_fdset.html
[13]: https://curl.haxx.se/libcurl/c/curl_multi_add_handle.html
[14]: https://curl.haxx.se/libcurl/c/curl_multi_info_read.html
[15]: https://tools.ietf.org/html/rfc7231#section-3.1.2.2