LZMA SDK 4.17
-------------

LZMA SDK 4.17 Copyright (C) 1999-2005 Igor Pavlov

LZMA SDK provides developers with documentation, source code,
and sample code necessary to write software that uses LZMA compression.

LZMA is the default and general compression method of the 7z format
in the 7-Zip compression program (www.7-zip.org). LZMA provides a high
compression ratio and very fast decompression.

LZMA is an improved version of the famous LZ77 compression algorithm.
It was improved to maximize the compression ratio while keeping
decompression fast and the memory requirements for decompression low.


LICENSE
-------

LZMA SDK is licensed under two licenses:

1) GNU Lesser General Public License (GNU LGPL)
2) Common Public License (CPL)

This means that you can select either of these two licenses and
follow the terms of that license.

SPECIAL EXCEPTION
Igor Pavlov, as the author of this code, expressly permits you
to statically or dynamically link your code (or bind by name)
to the files from LZMA SDK without subjecting your linked
code to the terms of the CPL or GNU LGPL.
Any modifications or additions to files from LZMA SDK, however,
are subject to the GNU LGPL or CPL terms.


The GNU LGPL and CPL licenses are quite similar, and both of them
are classified as

1) "Free software licenses" at http://www.gnu.org/
2) "OSI-approved" at http://www.opensource.org/


LZMA SDK can also be made available under a proprietary license for
those who cannot use the GNU LGPL or CPL in their code. To request
such a proprietary license or any additional consultation,
send an email message from this page:
http://www.7-zip.org/support.html


You should have received a copy of the GNU Lesser General Public
License along with this library; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

You should have received a copy of the Common Public License
along with this library.


LZMA SDK Contents
-----------------

LZMA SDK includes:

- C++ source code of LZMA Encoder and Decoder
- C++ source code for file->file LZMA compressing and decompressing
- ANSI-C compatible source code for LZMA decompressing
- Compiled file->file LZMA compressing/decompressing program for Windows

The ANSI-C LZMA decompression code was ported from the original C++
sources to C. It was also simplified and optimized for code size, but
it remains fully compatible with LZMA from 7-Zip.


UNIX/Linux version
------------------
To compile the C++ version of the file->file LZMA utility, go to the directory
SRC/7zip/Compress/LZMA_Alone
and type "make", or "make clean all" to recompile everything.

On some UNIX/Linux versions you must compile LZMA with static libraries.
To compile with static libraries, change the line in the makefile
  LIB = -lm
to
  LIB = -lm -static


Files
-----
SRC           - directory with source code
lzma.txt      - LZMA SDK description (this file)
7zFormat.txt  - 7z Format description
7zC.txt       - 7z ANSI-C Decoder description
methods.txt   - Compression method IDs for .7z
LGPL.txt      - GNU Lesser General Public License
CPL.html      - Common Public License
lzma.exe      - Compiled file->file LZMA encoder/decoder for Windows
history.txt   - history of the LZMA SDK


Source code structure
---------------------

SRC
  Common  - common files for C++ projects
  Windows - common files for Windows-related code
  7zip    - files related to the 7-Zip project
    Common   - common files for 7-Zip
    Compress - files related to compression/decompression
      LZ          - files related to LZ (Lempel-Ziv) compression algorithm
        BinTree     - Binary Tree Match Finder for LZ algorithm
        HashChain   - Hash Chain Match Finder for LZ algorithm
        Patricia    - Patricia Match Finder for LZ algorithm
      RangeCoder  - Range Coder (special code of compression/decompression)
      LZMA        - LZMA compression/decompression in C++
      LZMA_Alone  - file->file LZMA compression/decompression
      LZMA_C      - ANSI-C compatible LZMA decompressor
        LzmaDecode.h      - interface for LZMA decoding in ANSI-C
        LzmaDecode.c      - LZMA decoding in ANSI-C (new fastest version)
        LzmaDecodeSize.c  - LZMA decoding in ANSI-C (old size-optimized version)
        LzmaTest.c        - test application that decodes an LZMA-encoded file
      Branch      - Filters for x86, IA-64, ARM, ARM-Thumb, PowerPC and SPARC code
    Archive  - files related to archiving
      7z_C     - 7z ANSI-C Decoder

The source code of the LZMA SDK is only part of the larger 7-Zip project.
That is why the LZMA SDK uses such a complex source code structure.

You can find the ANSI-C LZMA decompression code in the folder
SRC/7zip/Compress/LZMA_C
7-Zip itself doesn't use that ANSI-C LZMA code; it was developed
specifically for this SDK. The files in LZMA_C do not need files from
other SDK directories to compile.

7-Zip source code can be downloaded from 7-Zip's SourceForge page:

  http://sourceforge.net/projects/sevenzip/


LZMA Decompression features
---------------------------
- Variable dictionary size (up to 256 MB)
- Estimated compressing speed: about 500 KB/s on a 1 GHz CPU
- Estimated decompressing speed:
    - 8-12 MB/s on a 1 GHz Intel Pentium 3 or AMD Athlon
    - 500-1000 KB/s on a 100 MHz ARM, MIPS, PowerPC or other simple RISC CPU
- Small memory requirements for decompressing (8-32 KB + DictionarySize)
- Small code size for decompressing: 2-8 KB (depending on
  speed optimizations)

The LZMA decoder uses only integer operations and can be
implemented on any modern 32-bit CPU (or on a 16-bit CPU under some conditions).

Some critical operations that affect the speed of LZMA decompression:
  1) 32*16 bit integer multiply
  2) Mispredicted branches (the penalty mostly depends on pipeline length)
  3) 32-bit shift and arithmetic operations

The speed of LZMA decompression mostly depends on CPU speed.
Memory speed matters little. But if your CPU has a small data cache,
the overall weight of memory speed increases slightly.
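To make the critical operations listed above concrete, here is a minimal sketch of how an LZMA-style binary range decoder typically decodes one bit with an adaptive probability. The constants (11-bit probabilities, a move shift of 5, a 2^24 normalization threshold) and the names RangeDecoder, rc_init and rc_decode_bit are assumptions for illustration, not code taken from LzmaDecode.c. The (range >> 11) * prob product is the 32*16 bit multiply, the code < bound test is the hard-to-predict branch, and normalization uses 32-bit shifts.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed constants, modelled on a typical LZMA range coder. */
#define kNumBitModelTotalBits 11
#define kBitModelTotal (1u << kNumBitModelTotalBits)
#define kNumMoveBits 5
#define kTopValue (1u << 24)

typedef struct {
  const uint8_t *buf;   /* compressed input */
  size_t pos, size;
  uint32_t range, code;
} RangeDecoder;

/* Read one input byte; returns 0 past the end of the buffer (sketch only). */
static uint8_t rc_next_byte(RangeDecoder *rc) {
  return rc->pos < rc->size ? rc->buf[rc->pos++] : 0;
}

/* range = 0xFFFFFFFF; code = 5 input bytes, big endian (the first byte
   is shifted out of the 32-bit code). */
static void rc_init(RangeDecoder *rc, const uint8_t *buf, size_t size) {
  int i;
  rc->buf = buf;
  rc->pos = 0;
  rc->size = size;
  rc->range = 0xFFFFFFFFu;
  rc->code = 0;
  for (i = 0; i < 5; i++)
    rc->code = (rc->code << 8) | rc_next_byte(rc);
}

/* Decode one bit; *prob is the adaptive probability of a 0 bit. */
static int rc_decode_bit(RangeDecoder *rc, uint16_t *prob) {
  uint32_t bound = (rc->range >> kNumBitModelTotalBits) * *prob; /* 32*16 multiply */
  int bit;
  if (rc->code < bound) {              /* data-dependent branch */
    rc->range = bound;
    *prob += (uint16_t)((kBitModelTotal - *prob) >> kNumMoveBits);
    bit = 0;
  } else {
    rc->range -= bound;
    rc->code -= bound;
    *prob -= (uint16_t)(*prob >> kNumMoveBits);
    bit = 1;
  }
  if (rc->range < kTopValue) {         /* normalize with 32-bit shifts */
    rc->range <<= 8;
    rc->code = (rc->code << 8) | rc_next_byte(rc);
  }
  return bit;
}
```

Each decoded bit updates its probability toward the observed value, which is why the decoder needs only integer operations and a small probability array.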


How To Use
----------

Using LZMA encoder/decoder executable
--------------------------------------

Usage:  LZMA <e|d> inputFile outputFile [<switches>...]

  e: encode file

  d: decode file

  b: Benchmark. There are two tests: compressing and decompressing
     with the LZMA method. Benchmark shows a rating in MIPS (million
     instructions per second). The rating value is calculated from the
     measured speed and is normalized to AMD Athlon XP CPU results.
     Benchmark also checks for possible hardware errors (RAM
     errors in most cases). Benchmark uses these settings:
     (-a1, -d21, -fb32, -mfbt4). You can change only -d. You can also
     change the number of iterations. Example for 30 iterations:
       LZMA b 30
     The default number of iterations is 10.

<Switches>


  -a{N}:  set compression mode 0 = fast, 1 = normal, 2 = max
          default: 2 (max)

  -d{N}:  set dictionary size - [0, 28], default: 23 (8MB)
          The maximum value for dictionary size is 256 MB = 2^28 bytes.
          Dictionary size is calculated as DictionarySize = 2^N bytes.
          To decompress a file compressed by the LZMA method with
          dictionary size D = 2^N, you need about D bytes of memory (RAM).

  -fb{N}: set number of fast bytes - [5, 255], default: 128
          Usually a bigger number gives a slightly better compression
          ratio but slower compression.

  -lc{N}: set number of literal context bits - [0, 8], default: 3
          Sometimes lc=4 gives a gain for big files.

  -lp{N}: set number of literal pos bits - [0, 4], default: 0
          The lp switch is intended for periodic data whose period
          is equal to 2^N. For example, for 32-bit (4-byte)
          periodic data you can use lp=2. If you change the lp switch,
          it's often better to set lc0.

  -pb{N}: set number of pos bits - [0, 4], default: 2
          The pb switch is intended for periodic data
          whose period is equal to 2^N.

  -mf{MF_ID}: set Match Finder. Default: bt4.
          The compression ratio is almost the same for all bt* and pat*
          match finders. Algorithms from the hc* group don't provide a
          good compression ratio, but they often work quite fast in
          combination with fast mode (-a0). Methods from the bt* group
          require less memory than methods from the pat* group. Usually
          bt4 works faster than any pat*, but for some types of files
          pat* can work faster.

          Memory requirements depend on the dictionary size
          (parameter "d" in the table below).

          MF_ID   Memory          Description

          bt2     d*9.5 + 1MB     Binary Tree with 2 bytes hashing.
          bt3     d*9.5 + 65MB    Binary Tree with 2-3(full) bytes hashing.
          bt4     d*9.5 + 6MB     Binary Tree with 2-3-4 bytes hashing.
          bt4b    d*9.5 + 34MB    Binary Tree with 2-3-4(big) bytes hashing.
          pat2r   d*26 + 1MB      Patricia Tree with 2-bits nodes, removing.
          pat2    d*38 + 1MB      Patricia Tree with 2-bits nodes.
          pat2h   d*38 + 77MB     Patricia Tree with 2-bits nodes, 2-3 bytes hashing.
          pat3h   d*62 + 85MB     Patricia Tree with 3-bits nodes, 2-3 bytes hashing.
          pat4h   d*110 + 101MB   Patricia Tree with 4-bits nodes, 2-3 bytes hashing.
          hc3     d*5.5 + 1MB     Hash Chain with 2-3 bytes hashing.
          hc4     d*5.5 + 6MB     Hash Chain with 2-3-4 bytes hashing.

  -eos:   write End Of Stream marker. By default LZMA doesn't write
          the eos marker, since the LZMA decoder knows the uncompressed
          size stored in the .lzma file header.

  -si:    read data from stdin (this also writes the End Of Stream marker)
  -so:    write data to stdout
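As a worked example of the -d switch and the match finder table above, the following sketch converts -d{N} into a dictionary size and estimates encoder memory for a given MF_ID. The names MatchFinderCost and encoder_memory_estimate are hypothetical; the coefficients are copied from the table, "MB" is assumed to mean 2^20 bytes, and the result is only as approximate as the table itself.

```c
#include <stdint.h>
#include <string.h>

/* One row of the memory table: memory = d * (halves_per_byte / 2) + extra_mb MB. */
typedef struct {
  const char *id;
  unsigned halves_per_byte; /* bytes per dictionary byte, times 2 (avoids 9.5) */
  unsigned extra_mb;        /* constant term, in MB (assumed 2^20 bytes) */
} MatchFinderCost;

static const MatchFinderCost kCosts[] = {
  {"bt2", 19, 1},   {"bt3", 19, 65},   {"bt4", 19, 6},    {"bt4b", 19, 34},
  {"pat2r", 52, 1}, {"pat2", 76, 1},   {"pat2h", 76, 77},
  {"pat3h", 124, 85}, {"pat4h", 220, 101},
  {"hc3", 11, 1},   {"hc4", 11, 6},
};

/* Approximate encoder memory in bytes for match finder `mf` and switch -d{n}
   (DictionarySize = 2^n, n in [0, 28]). Rounds down; returns 0 for an
   unknown MF_ID. */
static uint64_t encoder_memory_estimate(const char *mf, unsigned n) {
  uint64_t d = (uint64_t)1 << n;
  size_t i;
  for (i = 0; i < sizeof(kCosts) / sizeof(kCosts[0]); i++)
    if (strcmp(kCosts[i].id, mf) == 0)
      return d * kCosts[i].halves_per_byte / 2 +
             ((uint64_t)kCosts[i].extra_mb << 20);
  return 0;
}
```

For the default settings (-d23 with bt4) this gives 8 MB * 9.5 + 6 MB = 82 MB (85983232 bytes), while hc4 with -d16 needs only about 6.3 MB.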


Examples:

1) LZMA e file.bin file.lzma -d16 -lc0

   compresses file.bin to file.lzma with a 64 KB dictionary (2^16 = 64K)
   and 0 literal context bits. -lc0 reduces the memory requirements
   for decompression.


2) LZMA e file.bin file.lzma -lc0 -lp2

   compresses file.bin to file.lzma with settings suitable
   for 32-bit periodic data (for example, ARM or MIPS code).

3) LZMA d file.lzma file.bin

   decompresses file.lzma to file.bin.


Compression ratio hints
-----------------------

Recommendations
---------------

To increase the LZMA compression ratio, it's desirable to have aligned
data (if possible) and to order data so that code is grouped in one
place and data is grouped in another (this is better than interleaving:
code, data, code, data, ...).


Using Filters
-------------
You can increase the compression ratio for some data types by using
special filters before compressing. For example, it's possible to
increase the compression ratio by 5-10% for code for these CPU ISAs:
x86, IA-64, ARM, ARM-Thumb, PowerPC, SPARC.

You can find C/C++ source code of such filters in the folder "7zip/Compress/Branch".

You can check the compression ratio gain of these filters with the
following 7-Zip commands (example for ARM code):
No filter:
  7z a a1.7z a.bin -m0=lzma

With filter for little-endian ARM code:
  7z a a2.7z a.bin -m0=bc_arm -m1=lzma

With filter for big-endian ARM code (using additional Swap4 filter):
  7z a a3.7z a.bin -m0=swap4 -m1=bc_arm -m2=lzma

It works as follows:
  Compressing   = Filter_encoding + LZMA_encoding
  Decompressing = LZMA_decoding + Filter_decoding

These filters encode and decode very quickly, so they do not increase
decompression time much. Moreover, filtering reduces the LZMA_decoding
time, since the compression ratio with filtering is higher.

These filters convert CALL (calling procedure) instructions
from relative offsets to absolute addresses, so such data becomes more
compressible. The source code of these CALL filters is quite simple
(about 20 lines of C++), so you can port it from the C++ version yourself.

For some ISAs (for example, MIPS) such a filter gives no gain.
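Below is a minimal C sketch of such a CALL filter for little-endian ARM code. It follows the idea described above: a BL instruction (top byte 0xEB) stores a 24-bit word offset relative to the instruction address plus 8, and the filter rewrites it as an absolute word address. The function name arm_bl_convert and its parameters are hypothetical; this is an illustration of the technique, not a copy of the code in the Branch folder.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert ARM BL (branch with link) targets in place.
   encoding != 0: relative offset -> absolute address (before compressing)
   encoding == 0: absolute address -> relative offset (after decompressing)
   startPos is the stream position of data[0]; it is assumed to be a
   multiple of 4 so that encoding and decoding are exact inverses. */
static void arm_bl_convert(uint8_t *data, size_t size, uint32_t startPos,
                           int encoding) {
  size_t i;
  for (i = 0; i + 4 <= size; i += 4) {
    if (data[i + 3] == 0xEB) {          /* BL with condition "always" */
      uint32_t src = ((uint32_t)data[i + 2] << 16) |
                     ((uint32_t)data[i + 1] << 8) | data[i + 0];
      uint32_t pc = startPos + (uint32_t)i + 8; /* ARM PC is 8 bytes ahead */
      uint32_t dest;
      src <<= 2;                        /* word offset -> byte offset */
      dest = encoding ? pc + src : src - pc;
      dest >>= 2;                       /* back to words; low 24 bits kept */
      data[i + 2] = (uint8_t)(dest >> 16);
      data[i + 1] = (uint8_t)(dest >> 8);
      data[i + 0] = (uint8_t)dest;
    }
  }
}
```

For example, two BL instructions at offsets 0 and 4 that call the same target carry different relative offsets (2 and 1), but after encoding both contain the same absolute value (4). That kind of repetition is what makes filtered code more compressible for LZMA.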


LZMA compressed file format
---------------------------
Offset Size Description
  0     1   LZMA properties byte (lc, lp and pb packed into one byte)
  1     4   Dictionary size (little endian)
  5     8   Uncompressed size (little endian). -1 means unknown size
 13         Compressed data
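A minimal sketch of writing and reading this 13-byte header in C. The struct LzmaHeader and the functions lzma_header_write and lzma_header_read are hypothetical names; the field layout follows the table above, and the properties byte uses the same packing that step 1 of "How To decompress data" unpacks, that is (pb * 5 + lp) * 9 + lc.

```c
#include <stdint.h>

#define LZMA_HEADER_SIZE 13

/* Hypothetical container for the header fields. */
typedef struct {
  int lc, lp, pb;            /* literal context bits, literal pos bits, pos bits */
  uint32_t dictionarySize;
  uint64_t uncompressedSize; /* (uint64_t)-1 means "unknown size" */
} LzmaHeader;

/* Write the header to out[0..12]. Returns 0, or -1 if lc/lp/pb are out of range. */
static int lzma_header_write(const LzmaHeader *h, uint8_t out[LZMA_HEADER_SIZE]) {
  int i;
  if (h->lc < 0 || h->lc > 8 || h->lp < 0 || h->lp > 4 || h->pb < 0 || h->pb > 4)
    return -1;
  out[0] = (uint8_t)((h->pb * 5 + h->lp) * 9 + h->lc);
  for (i = 0; i < 4; i++)                  /* little endian */
    out[1 + i] = (uint8_t)(h->dictionarySize >> (8 * i));
  for (i = 0; i < 8; i++)                  /* little endian */
    out[5 + i] = (uint8_t)(h->uncompressedSize >> (8 * i));
  return 0;
}

/* Parse in[0..12]. Returns 0, or -1 if the properties byte is invalid. */
static int lzma_header_read(const uint8_t in[LZMA_HEADER_SIZE], LzmaHeader *h) {
  int i;
  unsigned prop0 = in[0];
  if (prop0 >= 9 * 5 * 5)
    return -1;
  h->lc = (int)(prop0 % 9);
  h->lp = (int)((prop0 / 9) % 5);
  h->pb = (int)(prop0 / (9 * 5));
  h->dictionarySize = 0;
  for (i = 0; i < 4; i++)
    h->dictionarySize |= (uint32_t)in[1 + i] << (8 * i);
  h->uncompressedSize = 0;
  for (i = 0; i < 8; i++)
    h->uncompressedSize |= (uint64_t)in[5 + i] << (8 * i);
  return 0;
}
```

With the default switches (lc=3, lp=0, pb=2) the properties byte is (2 * 5 + 0) * 9 + 3 = 93 (0x5D), and the default 8 MB dictionary is stored as the bytes 00 00 80 00.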


ANSI-C LZMA Decoder
~~~~~~~~~~~~~~~~~~~

To use the ANSI-C LZMA Decoder you need two files:
LzmaDecode.h and one of the following two files:
1) LzmaDecode.c      - LZMA decoding in ANSI-C (new fastest version)
2) LzmaDecodeSize.c  - LZMA decoding in ANSI-C (old size-optimized version)
Use LzmaDecode.c if you need the fastest code.


Memory requirements for LZMA decoding
-------------------------------------

The LZMA decoder doesn't allocate memory itself, so you must
calculate the required amount, allocate it, and pass it to the decoder.

Stack usage of the LZMA functions for local variables does not
exceed 200 bytes.

Memory requirements for decompression depend
on the interface that you want to use:

  a) Memory-to-memory decompression:

     M1 = (inputSize + outputSize + lzmaInternalSize).

  b) Decompression with buffering:

     M2 = (inputBufferSize + outputBufferSize + dictionarySize + lzmaInternalSize)


How To decompress data
----------------------

1) Read the first byte of properties of the LZMA compressed stream,
   check that it has a correct value, and calculate the three
   LZMA property variables:

  int lc, lp, pb;
  unsigned char prop0 = properties[0];
  if (prop0 >= (9 * 5 * 5))
  {
    /* invalid properties byte: it must equal (pb * 5 + lp) * 9 + lc */
    fprintf(stderr, "properties error\n");
    return 1;
  }
  for (pb = 0; prop0 >= (9 * 5);
      pb++, prop0 -= (9 * 5));
  for (lp = 0; prop0 >= 9;
      lp++, prop0 -= 9);
  lc = prop0;

2) Calculate the required size of the LZMA internal data (lzmaInternalSize):

  lzmaInternalSize = (LZMA_BASE_SIZE + (LZMA_LIT_SIZE << (lc + lp))) *
      sizeof(CProb)

  LZMA_BASE_SIZE = 1846
  LZMA_LIT_SIZE = 768

  The LZMA decoder uses an array of CProb variables as its internal structure.
  By default, CProb is (unsigned short),
  but you can define _LZMA_PROB32 to make it (unsigned int).
  That can increase speed on some 32-bit CPUs, but memory usage will
  be doubled in that case.
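The formula from step 2, plus the 100 extra bytes described in step 2b, can be written as a small helper. This is a sketch: the function name lzma_internal_size and its parameters are hypothetical, and sizeof(CProb) is passed in as 2 (unsigned short, the default) or 4 (unsigned int with _LZMA_PROB32) instead of being taken from LzmaDecode.h.

```c
#include <stddef.h>

#define LZMA_BASE_SIZE 1846
#define LZMA_LIT_SIZE 768

/* probSize: sizeof(CProb), 2 by default or 4 with _LZMA_PROB32.
   buffered: nonzero for decompression with buffering (_LZMA_OUT_READ),
   which needs 100 extra bytes. */
static size_t lzma_internal_size(int lc, int lp, size_t probSize, int buffered) {
  size_t size = ((size_t)LZMA_BASE_SIZE + ((size_t)LZMA_LIT_SIZE << (lc + lp)))
                * probSize;
  if (buffered)
    size += 100;
  return size;
}
```

With the default lc=3, lp=0 the result is 15980 bytes (31960 with _LZMA_PROB32); with -lc0, as in example 1 of the switch examples, it drops to 5228 bytes.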


2b) If you use decompression with buffering, add 100 bytes to
    lzmaInternalSize:

  #ifdef _LZMA_OUT_READ
  lzmaInternalSize += 100;
  #endif

3) Allocate that memory with malloc or some other function:

  lzmaInternalData = malloc(lzmaInternalSize);
  if (lzmaInternalData == 0)
    return 1; /* not enough memory */


4) Decompress data:

  4a) If you use simple memory-to-memory decompression:

    /* unsigned char *inStream;  unsigned int inSize;   compressed data
       unsigned char *outStream; unsigned int outSize;  output buffer */
    unsigned int outSizeProcessed;
    int result = LzmaDecode(lzmaInternalData, lzmaInternalSize,
        lc, lp, pb,
        inStream, inSize,
        outStream, outSize,
        &outSizeProcessed);

  4b) If you use decompression with buffering:

  4.1) Read the dictionary size from the properties
       (bytes 1..4, little endian):

    unsigned int dictionarySize = 0;
    int i;
    for (i = 0; i < 4; i++)
      dictionarySize += (unsigned int)properties[1 + i] << (i * 8);

  4.2) Allocate memory for the dictionary:

    unsigned char *dictionary = malloc(dictionarySize);

  4.3) Initialize the LZMA decoder:

    /* bo.ReadCallback: the input callback object used when _LZMA_IN_CB
       is defined (see LzmaTest.c) */
    LzmaDecoderInit((unsigned char *)lzmaInternalData, lzmaInternalSize,
        lc, lp, pb,
        dictionary, dictionarySize,
        &bo.ReadCallback);

  4.4) In a loop, call the LzmaDecode function:

    unsigned int nowPos, outSizeProcessed;
    int res;
    for (nowPos = 0; nowPos < outSize;)
    {
      unsigned int blockSize = outSize - nowPos;
      unsigned int kBlockSize = 0x10000; /* decode at most 64 KB per call */
      if (blockSize > kBlockSize)
        blockSize = kBlockSize;
      res = LzmaDecode((unsigned char *)lzmaInternalData,
          ((unsigned char *)outStream) + nowPos, blockSize, &outSizeProcessed);
      if (res != 0)
      {
        printf("\nerror = %d\n", res);
        break;
      }
      nowPos += outSizeProcessed;
      if (outSizeProcessed == 0)
      {
        /* no more output: the stream ended before outSize bytes */
        outSize = nowPos;
        break;
      }
    }


EXIT codes
----------

The LZMA decoder can return one of the following codes:

#define LZMA_RESULT_OK 0
#define LZMA_RESULT_DATA_ERROR 1
#define LZMA_RESULT_NOT_ENOUGH_MEM 2

If you use a callback function for input data and it returns an
error code, the LZMA decoder also returns that code.
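A small sketch of turning these codes into messages. The helper name lzma_result_message is hypothetical. Any value other than the three defined above is treated as an error code passed through from your own input callback, which assumes your callback does not reuse the values 1 and 2.

```c
/* Values as listed above; normally these come from LzmaDecode.h. */
#define LZMA_RESULT_OK 0
#define LZMA_RESULT_DATA_ERROR 1
#define LZMA_RESULT_NOT_ENOUGH_MEM 2

/* Hypothetical helper: map an LZMA decoder result to a short message. */
static const char *lzma_result_message(int res) {
  switch (res) {
    case LZMA_RESULT_OK:             return "OK";
    case LZMA_RESULT_DATA_ERROR:     return "data error";
    case LZMA_RESULT_NOT_ENOUGH_MEM: return "not enough memory";
    default:                         return "input callback error";
  }
}
```

Keeping callback error codes outside the range 0..2 lets the caller tell decoder errors apart from its own input errors.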



LZMA Defines
------------

_LZMA_IN_CB    - Use callback for input data

_LZMA_OUT_READ - Use read function for output data
                 (decompression with buffering, see step 4b above)

_LZMA_LOC_OPT  - Enable local speed optimizations inside code.
                 _LZMA_LOC_OPT is only for LzmaDecodeSize.c (size-optimized version).
                 _LZMA_LOC_OPT doesn't affect LzmaDecode.c (speed-optimized version).

_LZMA_PROB32   - Makes CProb (unsigned int) instead of (unsigned short).
                 It can increase speed on some 32-bit CPUs,
                 but memory usage will be doubled in that case.

_LZMA_UINT32_IS_ULONG - Define it if int is 16-bit on your compiler
                        and long is 32-bit.
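To illustrate how _LZMA_PROB32 and _LZMA_UINT32_IS_ULONG are meant to affect types, here is a sketch of the usual preprocessor pattern. The typedef names SketchUInt32 and SketchProb are hypothetical and chosen to avoid clashing with the real definitions in LzmaDecode.h, which may be written differently.

```c
#include <limits.h>

/* 32-bit unsigned type: unsigned long on compilers where int is 16-bit. */
#ifdef _LZMA_UINT32_IS_ULONG
typedef unsigned long SketchUInt32;
#else
typedef unsigned int SketchUInt32;
#endif

/* Probability type: 16-bit by default; 32-bit with _LZMA_PROB32
   (may be faster on some 32-bit CPUs, but doubles the probability array). */
#ifdef _LZMA_PROB32
typedef SketchUInt32 SketchProb;
#else
typedef unsigned short SketchProb;
#endif

/* Compile-time check: the array size is -1 (an error) if SketchUInt32
   has fewer than 32 bits, which would mean the define is missing. */
typedef char SketchUInt32_has_32_bits[(sizeof(SketchUInt32) * CHAR_BIT >= 32) ? 1 : -1];
```

The compile-time check turns a missing _LZMA_UINT32_IS_ULONG on a 16-bit-int compiler into a build error instead of silently wrong arithmetic.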


NOTES
-----
1) Please note that LzmaTest.c doesn't free allocated memory in some cases.
   But in your real applications you must free memory after decompression.

2) All numbers above were calculated for the case when int is not larger
   than 32 bits in your compiler. If int is 64-bit or larger in your
   compiler, LZMA may require more memory for some structures.



C++ LZMA Encoder/Decoder
~~~~~~~~~~~~~~~~~~~~~~~~
The C++ LZMA code uses COM-like interfaces. So if you want to use it,
you may want to study the basics of COM/OLE.

By default, the LZMA Encoder contains all Match Finders.
But for compressing it's enough to have just one of them.
So to reduce the size of the compressing code you can define:
  #define COMPRESS_MF_BT
  #define COMPRESS_MF_BT4
and it will use only the bt4 match finder.


---

http://www.7-zip.org
http://www.7-zip.org/support.html
