Annotation of elwix/tools/oldlzma/lzma.txt, revision 1.1
1.1 ! misho 1: LZMA SDK 4.17
! 2: -------------
! 3:
! 4: LZMA SDK 4.17 Copyright (C) 1999-2005 Igor Pavlov
! 5:
! 6: LZMA SDK provides developers with documentation, source code,
! 7: and sample code necessary to write software that uses LZMA compression.
! 8:
! 9: LZMA is the default and general compression method of the 7z format
! 10: in the 7-Zip compression program (www.7-zip.org). LZMA provides a high
! 11: compression ratio and very fast decompression.
! 12:
! 13: LZMA is an improved version of the well-known LZ77 compression algorithm.
! 14: It was improved to maximize the compression ratio while keeping
! 15: decompression speed high and memory requirements for decompression
! 16: low.
! 17:
! 18:
! 19:
! 20: LICENSE
! 21: -------
! 22:
! 23: LZMA SDK is licensed under two licenses:
! 24:
! 25: 1) GNU Lesser General Public License (GNU LGPL)
! 26: 2) Common Public License (CPL)
! 27:
! 28: It means that you can select one of these two licenses and
! 29: follow rules of that license.
! 30:
! 31: SPECIAL EXCEPTION
! 32: Igor Pavlov, as the author of this code, expressly permits you
! 33: to statically or dynamically link your code (or bind by name)
! 34: to the files from LZMA SDK without subjecting your linked
! 35: code to the terms of the CPL or GNU LGPL.
! 36: Any modifications or additions to files from LZMA SDK, however,
! 37: are subject to the GNU LGPL or CPL terms.
! 38:
! 39:
! 40: GNU LGPL and CPL licenses are pretty similar and both these
! 41: licenses are classified as
! 42:
! 43: 1) "Free software licenses" at http://www.gnu.org/
! 44: 2) "OSI-approved" at http://www.opensource.org/
! 45:
! 46:
! 47: LZMA SDK can also be made available under a proprietary license for
! 48: those who cannot use the GNU LGPL or CPL in their code. To request
! 49: such a proprietary license or any additional consultations,
! 50: send an email message from this page:
! 51: http://www.7-zip.org/support.html
! 52:
! 53:
! 54: You should have received a copy of the GNU Lesser General Public
! 55: License along with this library; if not, write to the Free Software
! 56: Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
! 57:
! 58: You should have received a copy of the Common Public License
! 59: along with this library.
! 60:
! 61:
! 62: LZMA SDK Contents
! 63: -----------------
! 64:
! 65: LZMA SDK includes:
! 66:
! 67: - C++ source code of LZMA Encoder and Decoder
! 68: - C++ source code for file->file LZMA compressing and decompressing
! 69: - ANSI-C compatible source code for LZMA decompressing
! 70: - Compiled file->file LZMA compressing/decompressing program for Windows system
! 71:
! 72: The ANSI-C LZMA decompression code was ported from the original C++ sources to C.
! 73: It was also simplified and optimized for code size,
! 74: but it remains fully compatible with LZMA from 7-Zip.
! 75:
! 76:
! 77: UNIX/Linux version
! 78: ------------------
! 79: To compile C++ version of file->file LZMA, go to directory
! 80: SRC/7zip/Compress/LZMA_Alone
! 81: and type "make" or "make clean all" to recompile all.
! 82:
! 83: On some UNIX/Linux versions you must compile LZMA with static libraries.
! 84: To compile with static libraries, change the line in the makefile
! 85: LIB = -lm
! 86: to
! 87: LIB = -lm -static
! 88:
! 89:
! 90: Files
! 91: ---------------------
! 92: SRC - directory with source code
! 93: lzma.txt - LZMA SDK description (this file)
! 94: 7zFormat.txt - 7z Format description
! 95: 7zC.txt - 7z ANSI-C Decoder description
! 96: methods.txt - Compression method IDs for .7z
! 97: LGPL.txt - GNU Lesser General Public License
! 98: CPL.html - Common Public License
! 99: lzma.exe - Compiled file->file LZMA encoder/decoder for Windows
! 100: history.txt - history of the LZMA SDK
! 101:
! 102:
! 103: Source code structure
! 104: ---------------------
! 105:
! 106: SRC
! 107:   Common  - common files for C++ projects
! 108:   Windows - common files for Windows related code
! 109:   7zip    - files related to 7-Zip Project
! 110:     Common   - common files for 7-Zip
! 111:     Compress - files related to compression/decompression
! 112:       LZ         - files related to LZ (Lempel-Ziv) compression algorithm
! 113:         BinTree   - Binary Tree Match Finder for LZ algorithm
! 114:         HashChain - Hash Chain Match Finder for LZ algorithm
! 115:         Patricia  - Patricia Match Finder for LZ algorithm
! 116:       RangeCoder - Range Coder (special code of compression/decompression)
! 117:       LZMA       - LZMA compression/decompression on C++
! 118:       LZMA_Alone - file->file LZMA compression/decompression
! 119:       LZMA_C     - ANSI-C compatible LZMA decompressor
! 120:         LzmaDecode.h     - interface for LZMA decoding on ANSI-C
! 121:         LzmaDecode.c     - LZMA decoding on ANSI-C (new fastest version)
! 122:         LzmaDecodeSize.c - LZMA decoding on ANSI-C (old size-optimized version)
! 123:         LzmaTest.c       - test application that decodes LZMA encoded file
! 124:       Branch     - Filters for x86, IA-64, ARM, ARM-Thumb, PowerPC and SPARC code
! 125:     Archive  - files related to archiving
! 126:       7z_C       - 7z ANSI-C Decoder
! 127:
! 128: The source code of LZMA SDK is only one part of the larger 7-Zip project.
! 129: That is why LZMA SDK uses such a complex source code structure.
! 130:
! 131: You can find the ANSI-C LZMA decompression code in the folder
! 132: SRC/7zip/Compress/LZMA_C
! 133: 7-Zip itself doesn't use that ANSI-C LZMA code; it was developed
! 134: specially for this SDK. Files from LZMA_C do not need files from
! 135: other directories of the SDK to compile.
! 136:
! 137: 7-Zip source code can be downloaded from 7-Zip's SourceForge page:
! 138:
! 139: http://sourceforge.net/projects/sevenzip/
! 140:
! 141:
! 142: LZMA features
! 143: -------------
! 144: - Variable dictionary size (up to 256 MB)
! 145: - Estimated compressing speed: about 500 KB/s on 1 GHz CPU
! 146: - Estimated decompressing speed:
! 147:     - 8-12 MB/s on 1 GHz Intel Pentium 3 or AMD Athlon
! 148:     - 500-1000 KB/s on 100 MHz ARM, MIPS, PowerPC or other simple RISC
! 149: - Small memory requirements for decompressing (8-32 KB + DictionarySize)
! 150: - Small code size for decompressing: 2-8 KB (depending on
! 151:   speed optimizations)
! 152:
! 153: The LZMA decoder uses only integer operations and can be
! 154: implemented on any modern 32-bit CPU (or on a 16-bit CPU with some conditions).
! 155:
! 156: Some critical operations that affect the speed of LZMA decompression:
! 157: 1) 32*16 bit integer multiply
! 158: 2) Mispredicted branches (penalty mostly depends on pipeline length)
! 159: 3) 32-bit shift and arithmetic operations
! 160:
! 161: The speed of LZMA decompression mostly depends on CPU speed.
! 162: Memory speed matters little. But if your CPU has a small data cache,
! 163: the overall weight of memory speed will slightly increase.
! 164:
! 165:
! 166: How To Use
! 167: ----------
! 168:
! 169: Using LZMA encoder/decoder executable
! 170: --------------------------------------
! 171:
! 172: Usage: LZMA <e|d> inputFile outputFile [<switches>...]
! 173:
! 174: e: encode file
! 175:
! 176: d: decode file
! 177:
! 178: b: Benchmark. There are two tests: compressing and decompressing
! 179: with LZMA method. Benchmark shows rating in MIPS (million
! 180: instructions per second). Rating value is calculated from
! 181: measured speed and it is normalized with AMD Athlon XP CPU
! 182: results. Also Benchmark checks possible hardware errors (RAM
! 183: errors in most cases). Benchmark uses these settings:
! 184: (-a1, -d21, -fb32, -mfbt4). You can change only -d. Also you
! 185: can change number of iterations. Example for 30 iterations:
! 186: LZMA b 30
! 187: Default number of iterations is 10.
! 188:
! 189: <Switches>
! 190:
! 191:
! 192: -a{N}: set compression mode 0 = fast, 1 = normal, 2 = max
! 193: default: 2 (max)
! 194:
! 195: -d{N}: set dictionary size - [0, 28], default: 23 (8MB)
! 196: The maximum value for dictionary size is 256 MB = 2^28 bytes.
! 197: Dictionary size is calculated as DictionarySize = 2^N bytes.
! 198: To decompress a file compressed by the LZMA method with dictionary
! 199: size D = 2^N, you need about D bytes of memory (RAM).
! 200:
! 201: -fb{N}: set number of fast bytes - [5, 255], default: 128
! 202: Usually a bigger number gives a slightly better compression ratio
! 203: and a slower compression process.
! 204:
! 205: -lc{N}: set number of literal context bits - [0, 8], default: 3
! 206: Sometimes lc=4 improves compression for big files.
! 207:
! 208: -lp{N}: set number of literal pos bits - [0, 4], default: 0
! 209: The lp switch is intended for periodical data when the period is
! 210: equal to 2^N. For example, for 32-bit (4 bytes)
! 211: periodical data you can use lp=2. Often it's better to set lc0
! 212: if you change the lp switch.
! 213:
! 214: -pb{N}: set number of pos bits - [0, 4], default: 2
! 215: The pb switch is intended for periodical data
! 216: when the period is equal to 2^N.
! 217:
! 218: -mf{MF_ID}: set Match Finder. Default: bt4.
! 219: Compression ratio for all bt* and pat* is almost the same.
! 220: Algorithms from the hc* group don't provide a good compression
! 221: ratio, but they often work pretty fast in combination with
! 222: fast mode (-a0). Methods from the bt* group require less memory
! 223: than methods from the pat* group. Usually bt4 works faster than
! 224: any pat*, but for some types of files pat* can work faster.
! 225:
! 226: Memory requirements depend on dictionary size
! 227: (parameter "d" in table below).
! 228:
! 229: MF_ID   Memory           Description
! 230:
! 231: bt2     d*9.5  +   1MB   Binary Tree with 2 bytes hashing.
! 232: bt3     d*9.5  +  65MB   Binary Tree with 2-3(full) bytes hashing.
! 233: bt4     d*9.5  +   6MB   Binary Tree with 2-3-4 bytes hashing.
! 234: bt4b    d*9.5  +  34MB   Binary Tree with 2-3-4(big) bytes hashing.
! 235: pat2r   d*26   +   1MB   Patricia Tree with 2-bits nodes, removing.
! 236: pat2    d*38   +   1MB   Patricia Tree with 2-bits nodes.
! 237: pat2h   d*38   +  77MB   Patricia Tree with 2-bits nodes, 2-3 bytes hashing.
! 238: pat3h   d*62   +  85MB   Patricia Tree with 3-bits nodes, 2-3 bytes hashing.
! 239: pat4h   d*110  + 101MB   Patricia Tree with 4-bits nodes, 2-3 bytes hashing.
! 240: hc3     d*5.5  +   1MB   Hash Chain with 2-3 bytes hashing.
! 241: hc4     d*5.5  +   6MB   Hash Chain with 2-3-4 bytes hashing.
! 242:
! 243: -eos: write End Of Stream marker. By default LZMA doesn't write
! 244: eos marker, since LZMA decoder knows uncompressed size
! 245: stored in .lzma file header.
! 246:
! 247: -si: Read data from stdin (it will write End Of Stream marker).
! 248: -so: Write data to stdout
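
The match finder memory table above can be turned into a small estimator. The following is only a sketch: the function and struct names are hypothetical (they are not part of the SDK), and it assumes that "MB" in the table means 2^20 bytes and that d = 2^N for the -d{N} switch.

```c
#include <stddef.h>
#include <string.h>

/* One row of the table: memory = d * factor + extra_mb megabytes. */
struct mf_row {
    const char *id;     /* MF_ID as passed to -mf */
    double factor;      /* multiplier applied to the dictionary size d */
    unsigned extra_mb;  /* fixed overhead in MB (assumed 2^20 bytes each) */
};

static const struct mf_row mf_table[] = {
    {"bt2",   9.5,   1}, {"bt3",   9.5,  65}, {"bt4",   9.5,   6},
    {"bt4b",  9.5,  34}, {"pat2r", 26.0,  1}, {"pat2",  38.0,  1},
    {"pat2h", 38.0, 77}, {"pat3h", 62.0, 85}, {"pat4h", 110.0, 101},
    {"hc3",   5.5,   1}, {"hc4",   5.5,   6},
};

/* Estimated encoder memory in bytes for -mf{id} -d{dict_log2}.
   Returns -1.0 for an unknown id or a dict_log2 outside [0, 28]. */
double lzma_encoder_memory(const char *id, int dict_log2)
{
    size_t i;
    if (dict_log2 < 0 || dict_log2 > 28)
        return -1.0;
    for (i = 0; i < sizeof mf_table / sizeof mf_table[0]; i++) {
        if (strcmp(mf_table[i].id, id) == 0) {
            double d = (double)(1UL << dict_log2);
            return d * mf_table[i].factor + mf_table[i].extra_mb * 1048576.0;
        }
    }
    return -1.0;
}
```

For the defaults (-mfbt4 -d23) this gives 8 MB * 9.5 + 6 MB = 82 MB, which is why a smaller -d is often the first thing to try on memory-constrained build machines.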
! 249:
! 250:
! 251: Examples:
! 252:
! 253: 1) LZMA e file.bin file.lzma -d16 -lc0
! 254:
! 255: compresses file.bin to file.lzma with 64 KB dictionary (2^16=64K)
! 256: and 0 literal context bits. -lc0 reduces the memory required
! 257: for decompression.
! 258:
! 259:
! 260: 2) LZMA e file.bin file.lzma -lc0 -lp2
! 261:
! 262: compresses file.bin to file.lzma with settings suitable
! 263: for 32-bit periodical data (for example, ARM or MIPS code).
! 264:
! 265: 3) LZMA d file.lzma file.bin
! 266:
! 267: decompresses file.lzma to file.bin.
! 268:
! 269:
! 270: Compression ratio hints
! 271: -----------------------
! 272:
! 273: Recommendations
! 274: ---------------
! 275:
! 276: To increase the compression ratio of LZMA, it is desirable to have
! 277: aligned data (if possible), and also to arrange the data so that
! 278: code is grouped in one place and data is grouped in another.
! 279: This compresses better than interleaving them (code, data, code,
! 280: data, ...).
! 281:
! 282:
! 283: Using Filters
! 284: -------------
! 285: You can increase the compression ratio for some data types by using
! 286: special filters before compressing. For example, it's possible to
! 287: increase the compression ratio by 5-10% for code for these CPU ISAs:
! 288: x86, IA-64, ARM, ARM-Thumb, PowerPC, SPARC.
! 289:
! 290: You can find C/C++ source code of such filters in folder "7zip/Compress/Branch".
! 291:
! 292: You can check the compression ratio gain of these filters with the
! 293: following 7-Zip commands (example for ARM code):
! 294: No filter:
! 295: 7z a a1.7z a.bin -m0=lzma
! 296:
! 297: With filter for little-endian ARM code:
! 298: 7z a a2.7z a.bin -m0=bc_arm -m1=lzma
! 299:
! 300: With filter for big-endian ARM code (using additional Swap4 filter):
! 301: 7z a a3.7z a.bin -m0=swap4 -m1=bc_arm -m2=lzma
! 302:
! 303: It works as follows:
! 304: Compressing = Filter_encoding + LZMA_encoding
! 305: Decompressing = LZMA_decoding + Filter_decoding
! 306:
! 307: The compression and decompression speed of such filters is very high,
! 308: so they do not increase decompression time much.
! 309: Moreover, filtering reduces the time spent in LZMA_decoding,
! 310: since the compression ratio with filtering is higher.
! 311:
! 312: These filters convert CALL (calling procedure) instructions
! 313: from relative offsets to absolute addresses, so such data becomes more
! 314: compressible. The source code of these CALL filters is pretty simple
! 315: (about 20 lines of C++), so you can port it from the C++ version yourself.
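
To make the relative-to-absolute idea concrete, here is a minimal sketch of such a filter for little-endian ARM BL instructions. It is written for illustration and is not copied from the SDK's Branch folder; the function name and parameters are assumptions. It also assumes that start_pos (the position of the buffer within the whole stream) is a multiple of 4 and that only BL with the "always" condition (top byte 0xEB) is converted.

```c
#include <stddef.h>
#include <stdint.h>

/* Converts the 24-bit word offset of each ARM BL instruction between
   relative form (encoding == 0 restores it) and absolute form
   (encoding != 0 produces it). The buffer is modified in place. */
void arm_bl_filter(uint8_t *data, size_t size, uint32_t start_pos, int encoding)
{
    size_t i;
    for (i = 0; i + 4 <= size; i += 4) {
        if (data[i + 3] == 0xEB) {  /* BL, condition "always" */
            uint32_t v = ((uint32_t)data[i + 2] << 16) |
                         ((uint32_t)data[i + 1] << 8) |
                          (uint32_t)data[i];
            /* On ARM the PC reads 8 bytes ahead of the instruction. */
            uint32_t pc = start_pos + (uint32_t)i + 8;
            v <<= 2;                          /* word offset -> byte offset */
            v = encoding ? v + pc : v - pc;   /* relative <-> absolute */
            v >>= 2;
            data[i + 2] = (uint8_t)(v >> 16);
            data[i + 1] = (uint8_t)(v >> 8);
            data[i]     = (uint8_t)v;
        }
    }
}
```

Two calls to the same routine from different places have different relative offsets but the same absolute target, so after filtering they become identical byte sequences, which the LZ match finder can then exploit.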
! 316:
! 317: For some ISAs (for example, MIPS), such a filter gives no gain.
! 318:
! 319:
! 320: LZMA compressed file format
! 321: ---------------------------
! 322: Offset  Size  Description
! 323:    0      1   Special LZMA properties for compressed data
! 324:    1      4   Dictionary size (little endian)
! 325:    5      8   Uncompressed size (little endian). -1 means unknown size
! 326:   13          Compressed data
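
The header layout above can be read with a few lines of C. This is a sketch under stated assumptions: the struct and function names are hypothetical, the property byte is assumed to pack its fields as (pb * 5 + lp) * 9 + lc (which is what the subtraction loops in the decoder section of this file imply), and a 64-bit integer type from <stdint.h> is assumed to be available.

```c
#include <stdint.h>

struct lzma_header {
    int lc, lp, pb;          /* decoded from the properties byte */
    uint32_t dict_size;      /* bytes 1..4, little endian */
    uint64_t unpacked_size;  /* bytes 5..12; all bits set (-1) = unknown */
};

/* Parses the 13-byte .lzma header.
   Returns 0 on success, 1 if the properties byte is out of range. */
int lzma_parse_header(const unsigned char h[13], struct lzma_header *out)
{
    unsigned prop0 = h[0];
    int i;
    if (prop0 >= 9 * 5 * 5)
        return 1;
    out->lc = (int)(prop0 % 9);
    out->lp = (int)(prop0 / 9 % 5);
    out->pb = (int)(prop0 / 45);
    out->dict_size = 0;
    for (i = 0; i < 4; i++)
        out->dict_size |= (uint32_t)h[1 + i] << (8 * i);
    out->unpacked_size = 0;
    for (i = 0; i < 8; i++)
        out->unpacked_size |= (uint64_t)h[5 + i] << (8 * i);
    return 0;
}
```

With the default switches (lc=3, lp=0, pb=2, -d23) the first five header bytes would be 5D 00 00 80 00 under this layout, and a stream written with an end marker and unknown size carries eight FF bytes in the size field.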
! 327:
! 328:
! 329: ANSI-C LZMA Decoder
! 330: ~~~~~~~~~~~~~~~~~~~
! 331:
! 332: To use the ANSI-C LZMA Decoder you need two files:
! 333: LzmaDecode.h and one of the following two files:
! 334: 1) LzmaDecode.c - LZMA decoding on ANSI-C (new fastest version)
! 335: 2) LzmaDecodeSize.c - LZMA decoding on ANSI-C (old size-optimized version)
! 336: Use LzmaDecode.c if you need the fastest code.
! 337:
! 338:
! 339: Memory requirements for LZMA decoding
! 340: -------------------------------------
! 341:
! 342: The LZMA decoder doesn't allocate memory itself, so you must
! 343: calculate the required memory, allocate it and pass it to LZMA.
! 344:
! 345: Stack usage of the LZMA function for local variables is not
! 346: larger than 200 bytes.
! 347:
! 348: Memory requirements for decompression depend
! 349: on the interface that you want to use:
! 350:
! 351: a) Memory to memory decompression:
! 352:
! 353: M1 = (inputSize + outputSize + lzmaInternalSize).
! 354:
! 355: b) Decompression with buffering:
! 356:
! 357: M2 = (inputBufferSize + outputBufferSize + dictionarySize + lzmaInternalSize)
! 358:
! 359:
! 360: How To decompress data
! 361: ----------------------
! 362:
! 363: 1) Read the first byte of properties for the LZMA compressed stream,
! 364: check that it has a correct value and calculate the three
! 365: LZMA property variables:
! 366:
! 367:   int lc, lp, pb;
! 368:   unsigned char prop0 = properties[0];
! 369:   if (prop0 >= (9*5*5))   /* valid values are 0..224 */
! 370:   {
! 371:     sprintf(rs + strlen(rs), "\n properties error");
! 372:     return 1;
! 373:   }
! 374:   for (pb = 0; prop0 >= (9 * 5);
! 375:     pb++, prop0 -= (9 * 5));
! 376:   for (lp = 0; prop0 >= 9;
! 377:     lp++, prop0 -= 9);
! 378:   lc = prop0;
! 379:
! 380: 2) Calculate the required amount of memory, lzmaInternalSize:
! 381:
! 382: lzmaInternalSize = (LZMA_BASE_SIZE + (LZMA_LIT_SIZE << (lc + lp))) *
! 383: sizeof(CProb)
! 384:
! 385: LZMA_BASE_SIZE = 1846
! 386: LZMA_LIT_SIZE = 768
! 387:
! 388: The LZMA decoder uses an array of CProb variables as its internal structure.
! 389: By default, CProb is (unsigned short).
! 390: But you can define _LZMA_PROB32 to make it (unsigned int).
! 391: This can increase speed on some 32-bit CPUs, but memory usage will
! 392: be doubled in that case.
! 393:
! 394:
! 395: 2b) If you use Decompression with buffering, add 100 bytes to
! 396: lzmaInternalSize:
! 397:
! 398: #ifdef _LZMA_OUT_READ
! 399: lzmaInternalSize += 100;
! 400: #endif
! 401:
! 402: 3) Allocate that memory with malloc or some other function:
! 403:
! 404: lzmaInternalData = malloc(lzmaInternalSize);
! 405:
! 406:
! 407: 4) Decompress data:
! 408:
! 409: 4a) If you use simple memory to memory decompression:
! 410:
! 411: int result = LzmaDecode((unsigned char *)lzmaInternalData, lzmaInternalSize,
! 412:     lc, lp, pb,
! 413:     inStream, inSize,     /* unsigned char *inStream, unsigned int inSize */
! 414:     outStream, outSize,   /* unsigned char *outStream, unsigned int outSize */
! 415:     &outSizeProcessed);
! 416:
! 417: 4b) If you use Decompression with buffering
! 418:
! 419: 4.1) Read dictionary size from properties
! 420:
! 421: unsigned int dictionarySize = 0;
! 422: int i;
! 423: for (i = 0; i < 4; i++)   /* header bytes 1..4, little endian */
! 424:   dictionarySize += (unsigned int)(properties[1 + i]) << (i * 8);
! 425:
! 426: 4.2) Allocate memory for dictionary
! 427:
! 428: unsigned char *dictionary = malloc(dictionarySize);
! 429:
! 430: 4.3) Initialize LZMA decoder:
! 431:
! 432: LzmaDecoderInit((unsigned char *)lzmaInternalData, lzmaInternalSize,
! 433: lc, lp, pb,
! 434: dictionary, dictionarySize,
! 435: &bo.ReadCallback);
! 436:
! 437: 4.4) Call the LzmaDecode function in a loop:
! 438:
! 439: for (nowPos = 0; nowPos < outSize;)
! 440: {
! 441:   unsigned int blockSize = outSize - nowPos;
! 442:   unsigned int kBlockSize = 0x10000;
! 443:   if (blockSize > kBlockSize)
! 444:     blockSize = kBlockSize;
! 445:   res = LzmaDecode((unsigned char *)lzmaInternalData,
! 446:     ((unsigned char *)outStream) + nowPos, blockSize, &outSizeProcessed);
! 447:   if (res != 0)
! 448:   {
! 449:     printf("\nerror = %d\n", res);
! 450:     break;
! 451:   }
! 452:   nowPos += outSizeProcessed;
! 453:   if (outSizeProcessed == 0)
! 454:   {
! 455:     outSize = nowPos;
! 456:     break;
! 457:   }
! 458: }
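
The sizing rules from steps 2) and 2b) and the M2 formula can be collected into two helpers. This is a sketch only: the function names and the SKETCH_ macro names are hypothetical, prob_size stands for sizeof(CProb) (2 by default, 4 with _LZMA_PROB32 on a compiler where unsigned int is 32-bit), and the constants are the ones quoted in this file.

```c
#include <stddef.h>

#define SKETCH_LZMA_BASE_SIZE 1846  /* LZMA_BASE_SIZE as quoted above */
#define SKETCH_LZMA_LIT_SIZE  768   /* LZMA_LIT_SIZE as quoted above */

/* Bytes needed for the decoder's internal probability array.
   buffered != 0 adds the 100 extra bytes required for decompression
   with buffering (_LZMA_OUT_READ). */
size_t lzma_internal_size(int lc, int lp, size_t prob_size, int buffered)
{
    size_t count = SKETCH_LZMA_BASE_SIZE +
                   ((size_t)SKETCH_LZMA_LIT_SIZE << (lc + lp));
    size_t bytes = count * prob_size;
    if (buffered)
        bytes += 100;
    return bytes;
}

/* M2 = inputBufferSize + outputBufferSize + dictionarySize + lzmaInternalSize */
size_t lzma_buffered_total(size_t in_buf, size_t out_buf,
                           size_t dict_size, size_t internal_size)
{
    return in_buf + out_buf + dict_size + internal_size;
}
```

For the default lc=3, lp=0 and 2-byte CProb this gives (1846 + 768 * 8) * 2 = 15980 bytes, which falls inside the "8-32 KB + DictionarySize" range listed among the features. Raising lc + lp doubles the literal part for each extra bit, so -lc0 is the cheapest way to shrink it.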
! 459:
! 460:
! 461: EXIT codes
! 462: -----------
! 463:
! 464: LZMA decoder can return one of the following codes:
! 465:
! 466: #define LZMA_RESULT_OK 0
! 467: #define LZMA_RESULT_DATA_ERROR 1
! 468: #define LZMA_RESULT_NOT_ENOUGH_MEM 2
! 469:
! 470: If you use callback function for input data and you return some
! 471: error code, LZMA Decoder also returns that code.
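
When reporting failures it can help to translate these codes into text. The helper below is a hypothetical sketch, not an SDK function; it redefines the three constants only if LzmaDecode.h has not already done so, and it treats any other value as a code passed through from the input callback, as described above.

```c
#ifndef LZMA_RESULT_OK
#define LZMA_RESULT_OK 0
#define LZMA_RESULT_DATA_ERROR 1
#define LZMA_RESULT_NOT_ENOUGH_MEM 2
#endif

/* Returns a short human-readable description of a decoder result code. */
const char *lzma_result_string(int res)
{
    switch (res) {
    case LZMA_RESULT_OK:
        return "OK";
    case LZMA_RESULT_DATA_ERROR:
        return "data error";
    case LZMA_RESULT_NOT_ENOUGH_MEM:
        return "not enough memory";
    default:
        /* Not one of the decoder's own codes: assume it came from
           the caller's input callback. */
        return "error returned by input callback";
    }
}
```

If your callback returns its own error codes, choosing values above 2 keeps them distinguishable from the decoder's codes.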
! 472:
! 473:
! 474:
! 475: LZMA Defines
! 476: ------------
! 477:
! 478: _LZMA_IN_CB - Use callback for input data
! 479:
! 480: _LZMA_OUT_READ - Use read function for output data
! 481:
! 482: _LZMA_LOC_OPT - Enable local speed optimizations inside code.
! 483: _LZMA_LOC_OPT is only for LzmaDecodeSize.c (size-optimized version).
! 484: _LZMA_LOC_OPT doesn't affect LzmaDecode.c (speed-optimized version)
! 485:
! 486: _LZMA_PROB32 - It can increase speed on some 32-bit CPUs,
! 487: but memory usage will be doubled in that case
! 488:
! 489: _LZMA_UINT32_IS_ULONG - Define it if int is 16-bit on your compiler
! 490: and long is 32-bit.
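
As an illustration of what the last two defines control, the sketch below selects integer types in the way such switches usually work. The typedef names are hypothetical and the SDK header may express this differently; the last line is a compile-time check that the chosen type really has at least 32 bits.

```c
#include <limits.h>

/* Pick a type with at least 32 bits: unsigned long when int is 16-bit. */
#ifdef _LZMA_UINT32_IS_ULONG
typedef unsigned long sketch_uint32;
#else
typedef unsigned int sketch_uint32;
#endif

/* Probability cells: 32-bit for speed, or 16-bit to halve the memory. */
#ifdef _LZMA_PROB32
typedef sketch_uint32 sketch_prob;
#else
typedef unsigned short sketch_prob;
#endif

/* Fails to compile (negative array size) if sketch_uint32 is too narrow. */
typedef char sketch_uint32_is_wide_enough
    [(sizeof(sketch_uint32) * CHAR_BIT >= 32) ? 1 : -1];
```

On a compiler with 16-bit int, building this without _LZMA_UINT32_IS_ULONG stops at the last typedef, which is an early signal that the define is needed.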
! 491:
! 492:
! 493: NOTES
! 494: -----
! 495: 1) Please note that LzmaTest.c doesn't free allocated memory in some cases.
! 496: But in your real applications you must free memory after decompression.
! 497:
! 498: 2) All numbers above were calculated for the case when int is not more than
! 499: 32-bit in your compiler. If int is 64-bit or larger in your compiler,
! 500: LZMA may require more memory for some structures.
! 501:
! 502:
! 503:
! 504: C++ LZMA Encoder/Decoder
! 505: ~~~~~~~~~~~~~~~~~~~~~~~~
! 506: The C++ LZMA code uses COM-like interfaces. So if you want to use it,
! 507: it helps to study the basics of COM/OLE.
! 508:
! 509: By default, the LZMA Encoder contains all Match Finders.
! 510: But for compressing it's enough to have just one of them.
! 511: So to reduce the size of the compression code you can define:
! 512: #define COMPRESS_MF_BT
! 513: #define COMPRESS_MF_BT4
! 514: and it will use only the bt4 match finder.
! 515:
! 516:
! 517: ---
! 518:
! 519: http://www.7-zip.org
! 520: http://www.7-zip.org/support.html
! 521:
! 522: