Annotation of elwix/tools/oldlzma/lzma.txt, revision 1.1.1.1

1.1       misho       1: LZMA SDK 4.17 
                      2: -------------
                      3: 
                      4: LZMA SDK 4.17  Copyright (C) 1999-2005 Igor Pavlov
                      5: 
                      6: LZMA SDK provides developers with documentation, source code,
                      7: and sample code necessary to write software that uses LZMA compression. 
                      8: 
                      9: LZMA is the default and general compression method of the 7z format
                     10: in the 7-Zip compression program (www.7-zip.org). LZMA provides a high
                     11: compression ratio and very fast decompression.
                     12: 
                     13: LZMA is an improved version of the famous LZ77 compression algorithm.
                     14: It was improved to maximize the compression ratio while
                     15: keeping high decompression speed and low memory requirements for
                     16: decompressing.
                     17: 
                     18: 
                     19: 
                     20: LICENSE
                     21: -------
                     22: 
                     23: LZMA SDK is licensed under two licenses:
                     24: 
                     25: 1) GNU Lesser General Public License (GNU LGPL)
                     26: 2) Common Public License (CPL)
                     27: 
                     28: This means that you can select one of these two licenses and
                     29: follow the rules of that license.
                     30: 
                     31: SPECIAL EXCEPTION
                     32: Igor Pavlov, as the author of this code, expressly permits you 
                     33: to statically or dynamically link your code (or bind by name) 
                     34: to the files from LZMA SDK without subjecting your linked 
                     35: code to the terms of the CPL or GNU LGPL. 
                     36: Any modifications or additions to files from LZMA SDK, however, 
                     37: are subject to the GNU LGPL or CPL terms.
                     38: 
                     39: 
                     40: The GNU LGPL and CPL licenses are quite similar, and both
                     41: are classified as
                     42: 
                     43: 1) "Free software licenses" at http://www.gnu.org/ 
                     44: 2) "OSI-approved" at http://www.opensource.org/
                     45: 
                     46: 
                     47: LZMA SDK may also be available under a proprietary license for
                     48: those who cannot use the GNU LGPL or CPL in their code. To request
                     49: such a proprietary license or any additional consultations,
                     50: send an email message from this page:
                     51: http://www.7-zip.org/support.html
                     52: 
                     53: 
                     54: You should have received a copy of the GNU Lesser General Public
                     55: License along with this library; if not, write to the Free Software
                     56: Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
                     57: 
                     58: You should have received a copy of the Common Public License
                     59: along with this library.
                     60: 
                     61: 
                     62: LZMA SDK Contents
                     63: -----------------
                     64: 
                     65: LZMA SDK includes:
                     66: 
                     67:   - C++ source code of LZMA Encoder and Decoder
                     68:   - C++ source code for file->file LZMA compressing and decompressing
                     69:   - ANSI-C compatible source code for LZMA decompressing
                     70:   - Compiled file->file LZMA compressing/decompressing program for Windows system
                     71: 
                     72: ANSI-C LZMA decompression code was ported from original C++ sources to C.
                     73: Also it was simplified and optimized for code size. 
                     74: But it is fully compatible with LZMA from 7-Zip.
                     75: 
                     76: 
                     77: UNIX/Linux version 
                     78: ------------------
                     79: To compile the C++ version of file->file LZMA, go to the directory
                     80: SRC/7zip/Compress/LZMA_Alone
                     81: and type "make", or "make clean all" to recompile everything.
                     82: 
                     83: On some UNIX/Linux versions you must compile LZMA with static libraries.
                     84: To compile with static libraries, change the line in the makefile
                     85: LIB = -lm
                     86: to
                     87: LIB = -lm -static
                     88: 
                     89: 
                     90: Files
                     91: -----
                     92: SRC      - directory with source code
                     93: lzma.txt - LZMA SDK description (this file)
                     94: 7zFormat.txt - 7z Format description
                     95: 7zC.txt  - 7z ANSI-C Decoder description
                     96: methods.txt  - Compression method IDs for .7z
                     97: LGPL.txt - GNU Lesser General Public License
                     98: CPL.html - Common Public License
                     99: lzma.exe - Compiled file->file LZMA encoder/decoder for Windows
                    100: history.txt - history of the LZMA SDK
                    101: 
                    102: 
                    103: Source code structure
                    104: ---------------------
                    105: 
                    106: SRC
                    107:   Common  - common files for C++ projects
                    108:   Windows - common files for Windows related code
                    109:   7zip    - files related to 7-Zip Project
                    110:     Common   - common files for 7-Zip
                    111:     Compress - files related to compression/decompression
                    112:       LZ     - files related to LZ (Lempel-Ziv) compression algorithm
                    113:         BinTree    - Binary Tree Match Finder for LZ algorithm
                    114:         HashChain  - Hash Chain Match Finder for LZ algorithm
                    115:         Patricia   - Patricia Match Finder for LZ algorithm
                    116:       RangeCoder   - Range Coder (special code of compression/decompression)
                    117:       LZMA         - LZMA compression/decompression on C++
                    118:       LZMA_Alone   - file->file LZMA compression/decompression
                    119:       LZMA_C       - ANSI-C compatible LZMA decompressor
                    120:         LzmaDecode.h  - interface for LZMA decoding on ANSI-C
                    121:         LzmaDecode.c      - LZMA decoding on ANSI-C (new fastest version)
                    122:         LzmaDecodeSize.c  - LZMA decoding on ANSI-C (old size-optimized version)
                    123:         LzmaTest.c    - test application that decodes LZMA encoded file
                    124:       Branch       - Filters for x86, IA-64, ARM, ARM-Thumb, PowerPC and SPARC code
                    125:     Archive - files related to archiving
                    126:       7z_C     - 7z ANSI-C Decoder
                    127: 
                    128: The source code of LZMA SDK is only part of the larger 7-Zip project.
                    129: That is why LZMA SDK uses such a complex source code structure.
                    130: 
                    131: You can find the ANSI-C LZMA decompressing code in the folder
                    132:   SRC/7zip/Compress/LZMA_C
                    133: 7-Zip doesn't use that ANSI-C LZMA code; it was developed
                    134: specially for this SDK. Files from LZMA_C do not need files from
                    135: other directories of the SDK for compiling.
                    136: 
                    137: 7-Zip source code can be downloaded from 7-Zip's SourceForge page:
                    138: 
                    139:   http://sourceforge.net/projects/sevenzip/
                    140: 
                    141: 
                    142: LZMA Decompression features
                    143: ---------------------------
                    144:   - Variable dictionary size (up to 256 MB)
                    145:   - Estimated compressing speed: about 500 KB/s on 1 GHz CPU
                    146:   - Estimated decompressing speed: 
                    147:       - 8-12 MB/s on 1 GHz Intel Pentium 3 or AMD Athlon
                    148:       - 500-1000 KB/s on 100 MHz ARM, MIPS, PowerPC or other simple RISC
                    149:   - Small memory requirements for decompressing (8-32 KB + DictionarySize)
                    150:   - Small code size for decompressing: 2-8 KB (depending on
                    151:     speed optimizations)
                    152: 
                    153: LZMA decoder uses only integer operations and can be
                    154: implemented on any modern 32-bit CPU (or on 16-bit CPU with some conditions).
                    155: 
                    156: Some critical operations that affect the speed of LZMA decompression:
                    157:   1) 32*16 bit integer multiply
                    158:   2) Mispredicted branches (penalty mostly depends on pipeline length)
                    159:   3) 32-bit shift and arithmetic operations
                    160: 
                    161: The speed of LZMA decompression mostly depends on CPU speed.
                    162: Memory speed matters much less. But if your CPU has a small data cache,
                    163: the overall weight of memory speed will slightly increase.
                    164: 
                    165: 
                    166: How To Use
                    167: ----------
                    168: 
                    169: Using LZMA encoder/decoder executable
                    170: --------------------------------------
                    171: 
                    172: Usage:  LZMA <e|d> inputFile outputFile [<switches>...]
                    173: 
                    174:   e: encode file
                    175: 
                    176:   d: decode file
                    177: 
                    178:   b: Benchmark. There are two tests: compressing and decompressing 
                    179:      with LZMA method. Benchmark shows rating in MIPS (million 
                    180:      instructions per second). Rating value is calculated from 
                    181:      measured speed and it is normalized with AMD Athlon XP CPU
                    182:      results. Also Benchmark checks possible hardware errors (RAM 
                    183:      errors in most cases). Benchmark uses these settings:
                    184:      (-a1, -d21, -fb32, -mfbt4). You can change only -d. Also you 
                    185:      can change number of iterations. Example for 30 iterations:
                    186:        LZMA b 30
                    187:      Default number of iterations is 10.
                    188: 
                    189: <Switches>
                    190:   
                    191: 
                    192:   -a{N}:  set compression mode 0 = fast, 1 = normal, 2 = max
                    193:           default: 2 (max)
                    194: 
                    195:   -d{N}:  Sets Dictionary size - [0, 28], default: 23 (8MB)
                    196:           The maximum value for dictionary size is 256 MB = 2^28 bytes.
                    197:           Dictionary size is calculated as DictionarySize = 2^N bytes. 
                    198:           For decompressing file compressed by LZMA method with dictionary 
                    199:           size D = 2^N you need about D bytes of memory (RAM).
                    200: 
                    201:   -fb{N}: set number of fast bytes - [5, 255], default: 128
                    202:           Usually big number gives a little bit better compression ratio 
                    203:           and slower compression process.
                    204: 
                    205:   -lc{N}: set number of literal context bits - [0, 8], default: 3
                    206:           Sometimes lc=4 gives gain for big files.
                    207: 
                    208:   -lp{N}: set number of literal pos bits - [0, 4], default: 0
                    209:           lp switch is intended for periodical data when period is 
                    210:           equal 2^N. For example, for 32-bit (4 bytes) 
                    211:           periodical data you can use lp=2. Often it's better to set lc0, 
                    212:           if you change lp switch.
                    213: 
                    214:   -pb{N}: set number of pos bits - [0, 4], default: 2
                    215:           pb switch is intended for periodical data 
                    216:           when period is equal 2^N.
                    217: 
                    218:   -mf{MF_ID}: set Match Finder. Default: bt4. 
                    219:               The compression ratio for all bt* and pat* is almost the same.
                    220:               Algorithms from the hc* group don't provide a good compression
                    221:               ratio, but they often work pretty fast in combination with
                    222:               fast mode (-a0). Methods from the bt* group require less memory
                    223:               than methods from the pat* group. Usually bt4 works faster than
                    224:               any pat*, but for some types of files pat* can work faster.
                    225: 
                    226:               Memory requirements depend on dictionary size
                    227:               (parameter "d" in table below). 
                    228: 
                    229:                MF_ID     Memory                   Description
                    230: 
                    231:                 bt2    d*9.5 +  1MB  Binary Tree with 2 bytes hashing.
                    232:                 bt3    d*9.5 + 65MB  Binary Tree with 2-3(full) bytes hashing.
                    233:                 bt4    d*9.5 +  6MB  Binary Tree with 2-3-4 bytes hashing.
                    234:                 bt4b   d*9.5 + 34MB  Binary Tree with 2-3-4(big) bytes hashing.
                    235:                 pat2r  d*26  +  1MB  Patricia Tree with 2-bits nodes, removing.
                    236:                 pat2   d*38  +  1MB  Patricia Tree with 2-bits nodes.
                    237:                 pat2h  d*38  + 77MB  Patricia Tree with 2-bits nodes, 2-3 bytes hashing.
                    238:                 pat3h  d*62  + 85MB  Patricia Tree with 3-bits nodes, 2-3 bytes hashing.
                    239:                 pat4h  d*110 +101MB  Patricia Tree with 4-bits nodes, 2-3 bytes hashing.
                    240:                 hc3    d*5.5 +  1MB  Hash Chain with 2-3 bytes hashing.
                    241:                 hc4    d*5.5 +  6MB  Hash Chain with 2-3-4 bytes hashing.
                    242: 
                    243:   -eos:   write End Of Stream marker. By default LZMA doesn't write 
                    244:           eos marker, since LZMA decoder knows uncompressed size 
                    245:           stored in .lzma file header.
                    246: 
                    247:   -si:    Read data from stdin (it will write End Of Stream marker).
                    248:   -so:    Write data to stdout
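
The memory column of the match finder table above can be turned into a
quick estimate. The following is a minimal sketch in C and is not part of
the SDK: the names mf_row, mf_table and mf_memory_estimate are hypothetical,
and it assumes that "MB" in the table means 2^20 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of the match finder table: memory = d * factor + base_mb MB. */
struct mf_row { const char *id; double factor; unsigned base_mb; };

static const struct mf_row mf_table[] = {
  {"bt2",   9.5,   1}, {"bt3",   9.5,  65}, {"bt4",   9.5,   6},
  {"bt4b",  9.5,  34}, {"pat2r", 26.0,  1}, {"pat2",  38.0,  1},
  {"pat2h", 38.0, 77}, {"pat3h", 62.0, 85}, {"pat4h", 110.0, 101},
  {"hc3",   5.5,   1}, {"hc4",   5.5,   6}
};

/* Estimated memory in bytes for -mf{id} combined with -d{dict_bits}.
   Returns -1.0 if the match finder id is not in the table.
   Assumption: 1 MB = 1048576 bytes. */
double mf_memory_estimate(const char *id, unsigned dict_bits)
{
  size_t i;
  double d;
  assert(id != NULL && dict_bits <= 28);   /* -d range is [0, 28] */
  d = (double)(1UL << dict_bits);          /* DictionarySize = 2^N bytes */
  for (i = 0; i < sizeof(mf_table) / sizeof(mf_table[0]); i++)
    if (strcmp(mf_table[i].id, id) == 0)
      return d * mf_table[i].factor + mf_table[i].base_mb * 1048576.0;
  return -1.0;
}
```

For the defaults (-mfbt4 -d23) this gives 8 MB * 9.5 + 6 MB = 82 MB under
the stated assumption.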
                    249: 
                    250: 
                    251: Examples:
                    252: 
                    253: 1) LZMA e file.bin file.lzma -d16 -lc0 
                    254: 
                    255: compresses file.bin to file.lzma with 64 KB dictionary (2^16=64K)
                    256: and 0 literal context bits. -lc0 reduces memory requirements
                    257: for decompression.
                    258: 
                    259: 
                    260: 2) LZMA e file.bin file.lzma -lc0 -lp2
                    261: 
                    262: compresses file.bin to file.lzma with settings suitable 
                    263: for 32-bit periodical data (for example, ARM or MIPS code).
                    264: 
                    265: 3) LZMA d file.lzma file.bin
                    266: 
                    267: decompresses file.lzma to file.bin.
                    268: 
                    269: 
                    270: Compression ratio hints
                    271: -----------------------
                    272: 
                    273: Recommendations
                    274: ---------------
                    275: 
                    276: To increase the compression ratio for LZMA compressing, it's desirable
                    277: to have aligned data (if possible) and also to arrange the data
                    278: so that code is grouped in one place and data is
                    279: grouped in another place (this is better than mixing: code, data, code,
                    280: data, ...).
                    281: 
                    282: 
                    283: Using Filters
                    284: -------------
                    285: You can increase the compression ratio for some data types by using
                    286: special filters before compressing. For example, it's possible to
                    287: increase the compression ratio by 5-10% for code for these CPU ISAs:
                    288: x86, IA-64, ARM, ARM-Thumb, PowerPC, SPARC.
                    289: 
                    290: You can find C/C++ source code of such filters in folder "7zip/Compress/Branch"
                    291: 
                    292: You can check compression ratio gain of these filters with such 
                    293: 7-Zip commands (example for ARM code):
                    294: No filter:
                    295:   7z a a1.7z a.bin -m0=lzma
                    296: 
                    297: With filter for little-endian ARM code:
                    298:   7z a a2.7z a.bin -m0=bc_arm -m1=lzma        
                    299: 
                    300: With filter for big-endian ARM code (using additional Swap4 filter):
                    301:   7z a a3.7z a.bin -m0=swap4 -m1=bc_arm -m2=lzma
                    302: 
                    303: It works as follows:
                    304: Compressing    = Filter_encoding + LZMA_encoding
                    305: Decompressing  = LZMA_decoding + Filter_decoding
                    306: 
                    307: Compressing and decompressing speed of such filters is very high,
                    308: so it will not increase decompressing time too much.
                    309: Moreover, it reduces decompression time for LZMA_decoding, 
                    310: since compression ratio with filtering is higher.
                    311: 
                    312: These filters convert CALL (calling procedure) instructions 
                    313: from relative offsets to absolute addresses, so such data becomes more 
                    314: compressible. Source code of these CALL filters is pretty simple
                    315: (about 20 lines of C++), so you can convert it from C++ version yourself.
                    316: 
                    317: For some ISAs (for example, for MIPS) it's impossible to get a gain from such a filter.
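
The relative-to-absolute conversion described above can be illustrated with
a deliberately simplified sketch for x86 CALL instructions (opcode 0xE8
followed by a 32-bit little-endian relative offset). This is not the code
from 7zip/Compress/Branch, which may apply additional checks; the name
call_filter is hypothetical, and the sketch assumes the buffer starts at
address 0.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified CALL filter sketch (NOT the SDK implementation).
   encoding != 0: relative offset -> absolute address (before LZMA encoding)
   encoding == 0: absolute address -> relative offset (after LZMA decoding)
   Assumption: buf[0] is located at address 0. */
void call_filter(unsigned char *buf, size_t len, int encoding)
{
  size_t i = 0;
  assert(buf != NULL || len == 0);
  while (i + 5 <= len) {
    unsigned long value, next;
    if (buf[i] != 0xE8) { i++; continue; }
    value = (unsigned long)buf[i + 1]
          | ((unsigned long)buf[i + 2] << 8)
          | ((unsigned long)buf[i + 3] << 16)
          | ((unsigned long)buf[i + 4] << 24);
    /* Address of the instruction that follows the CALL. */
    next = (unsigned long)(i + 5) & 0xFFFFFFFFUL;
    value = (encoding ? value + next : value - next) & 0xFFFFFFFFUL;
    buf[i + 1] = (unsigned char)(value & 0xFF);
    buf[i + 2] = (unsigned char)((value >> 8) & 0xFF);
    buf[i + 3] = (unsigned char)((value >> 16) & 0xFF);
    buf[i + 4] = (unsigned char)((value >> 24) & 0xFF);
    i += 5;  /* skip the operand so both directions scan identically */
  }
}
```

Two calls to the same procedure from different places have different
relative offsets but the same absolute address, so after filtering the
operand bytes repeat and LZMA can find them as matches.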
                    318: 
                    319: 
                    320: LZMA compressed file format
                    321: ---------------------------
                    322: Offset Size Description
                    323:   0     1   Special LZMA properties for compressed data
                    324:   1     4   Dictionary size (little endian)
                    325:   5     8   Uncompressed size (little endian). -1 means unknown size
                    326:  13         Compressed data
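
The header layout above can be read with a few lines of portable C. This is
a minimal sketch, not SDK code: the names lzma_header, read_le32 and
lzma_parse_header are hypothetical. The 8-byte size is kept as two 32-bit
halves because ANSI C has no guaranteed 64-bit integer type.

```c
#include <assert.h>
#include <stddef.h>

#define LZMA_HEADER_SIZE 13

struct lzma_header {
  unsigned char props;      /* offset 0: LZMA properties byte */
  unsigned long dict_size;  /* offset 1: dictionary size */
  unsigned long size_lo;    /* offset 5: low 32 bits of uncompressed size */
  unsigned long size_hi;    /* offset 9: high 32 bits of uncompressed size */
  int size_known;           /* 0 if the size field is -1 (all bits set) */
};

/* Reads a 32-bit little-endian value from p[0..3]. */
static unsigned long read_le32(const unsigned char *p)
{
  return (unsigned long)p[0] | ((unsigned long)p[1] << 8) |
         ((unsigned long)p[2] << 16) | ((unsigned long)p[3] << 24);
}

/* Returns 1 on success, 0 if fewer than 13 bytes are available. */
int lzma_parse_header(const unsigned char *buf, size_t len,
                      struct lzma_header *h)
{
  assert(buf != NULL && h != NULL);
  if (len < LZMA_HEADER_SIZE)
    return 0;
  h->props = buf[0];
  h->dict_size = read_le32(buf + 1);
  h->size_lo = read_le32(buf + 5);
  h->size_hi = read_le32(buf + 9);
  h->size_known = !(h->size_lo == 0xFFFFFFFFUL && h->size_hi == 0xFFFFFFFFUL);
  return 1;
}
```

Compressed data starts at buf + LZMA_HEADER_SIZE. The sketch does not
validate the properties byte.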
                    327: 
                    328: 
                    329: ANSI-C LZMA Decoder
                    330: ~~~~~~~~~~~~~~~~~~~
                    331: 
                    332: To use ANSI-C LZMA Decoder you need two files:
                    333: LzmaDecode.h and one of the following two files:
                    334: 1) LzmaDecode.c      - LZMA decoding on ANSI-C (new fastest version)
                    335: 2) LzmaDecodeSize.c  - LZMA decoding on ANSI-C (old size-optimized version)
                    336: Use LzmaDecode.c if you need the fastest code.
                    337: 
                    338: 
                    339: Memory requirements for LZMA decoding
                    340: -------------------------------------
                    341: 
                    342: LZMA decoder doesn't allocate memory itself, so you must
                    343: calculate the required memory, allocate it and pass it to LZMA.
                    344: 
                    345: Stack usage of LZMA function for local variables is not
                    346: larger than 200 bytes.
                    347: 
                    348: Memory requirements for decompression depend
                    349: on the interface that you want to use:
                    350: 
                    351:   a) Memory to memory decompression:
                    352:     
                    353:     M1 = (inputSize + outputSize + lzmaInternalSize).
                    354: 
                    355:   b) Decompression with buffering:
                    356: 
                    357:     M2 = (inputBufferSize + outputBufferSize + dictionarySize + lzmaInternalSize)
                    358: 
                    359: 
                    360: How To decompress data
                    361: ----------------------
                    362: 
                    363: 1) Read first byte of properties for LZMA compressed stream, 
                    364:    check that it has correct value and calculate three 
                    365:    LZMA property variables:
                    366: 
                    367:   int lc, lp, pb;
                    368:   unsigned char prop0 = properties[0];
                    369:   if (prop0 >= (9*5*5))
                    370:   {
                    371:     sprintf(rs + strlen(rs), "\n properties error");
                    372:     return 1;
                    373:   }
                    374:   for (pb = 0; prop0 >= (9 * 5); 
                    375:     pb++, prop0 -= (9 * 5));
                    376:   for (lp = 0; prop0 >= 9; 
                    377:     lp++, prop0 -= 9);
                    378:   lc = prop0;
                    379: 
                    380: 2) Calculate required amount for LZMA lzmaInternalSize:
                    381: 
                    382:   lzmaInternalSize = (LZMA_BASE_SIZE + (LZMA_LIT_SIZE << (lc + lp))) * 
                    383:      sizeof(CProb)
                    384: 
                    385:   LZMA_BASE_SIZE = 1846
                    386:   LZMA_LIT_SIZE = 768
                    387: 
                    388:   LZMA decoder uses array of CProb variables as internal structure.
                    389:   By default, CProb is (unsigned short)
                    390:   But you can define _LZMA_PROB32 to make it (unsigned int)
                    391:   It can increase speed on some 32-bit CPUs, but memory usage will 
                    392:   be doubled in that case.
                    393: 
                    394: 
                    395:   2b) If you use Decompression with buffering, add 100 bytes to 
                    396:       lzmaInternalSize:
                    397:      
                    398:       #ifdef _LZMA_OUT_READ
                    399:       lzmaInternalSize += 100;
                    400:       #endif
                    401: 
                    402: 3) Allocate that memory with malloc or some other function:
                    403: 
                    404:   lzmaInternalData = malloc(lzmaInternalSize);
                    405: 
                    406: 
                    407: 4) Decompress data:
                    408: 
                    409:   4a) If you use simple memory to memory decompression:
                    410: 
                    411:     int result = LzmaDecode(lzmaInternalData, lzmaInternalSize,
                    412:         lc, lp, pb,
                    413:         inStream, inSize,     /* unsigned char *, unsigned int */
                    414:         outStream, outSize,   /* unsigned char *, unsigned int */
                    415:         &outSizeProcessed);
                    416: 
                    417:   4b) If you use Decompression with buffering
                    418: 
                    419:     4.1) Read dictionary size from properties
                    420: 
                    421:       unsigned int dictionarySize = 0;
                    422:       int i;
                    423:       for (i = 0; i < 4; i++)
                    424:         dictionarySize += (unsigned int)(properties[1 + i]) << (i * 8);
                    425: 
                    426:     4.2) Allocate memory for dictionary
                    427: 
                    428:       unsigned char *dictionary = malloc(dictionarySize);
                    429: 
                    430:     4.3) Initialize LZMA decoder:
                    431: 
                    432:     LzmaDecoderInit((unsigned char *)lzmaInternalData, lzmaInternalSize,
                    433:         lc, lp, pb,
                    434:         dictionary, dictionarySize,
                    435:         &bo.ReadCallback);
                    436: 
                    437:     4.4) In loop call LzmaDecode function:
                    438: 
                    439:     for (nowPos = 0; nowPos < outSize;)
                    440:     {
                    441:       unsigned int blockSize = outSize - nowPos;
                    442:       unsigned int kBlockSize = 0x10000;
                    443:       if (blockSize > kBlockSize)
                    444:         blockSize = kBlockSize;
                    445:       res = LzmaDecode((unsigned char *)lzmaInternalData, 
                    446:       ((unsigned char *)outStream) + nowPos, blockSize, &outSizeProcessed);
                    447:       if (res != 0)
                    448:       {
                    449:         printf("\nerror = %d\n", res);
                    450:         break;
                    451:       }
                    452:       nowPos += outSizeProcessed;
                    453:       if (outSizeProcessed == 0)
                    454:       {
                    455:         outSize = nowPos;
                    456:         break;
                    457:       }
                    458:     }
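
Steps 1 and 2 above (property decoding and the lzmaInternalSize
calculation) can be wrapped in two small helpers. This is a sketch under
assumptions, not SDK code: the function names lzma_decode_props and
lzma_internal_size are hypothetical, CProb is assumed to be the default
(unsigned short), and division/modulo replaces the subtraction loops from
step 1 with the same result.

```c
#include <assert.h>
#include <stddef.h>

#define LZMA_BASE_SIZE 1846
#define LZMA_LIT_SIZE  768

/* Default per the text; defining _LZMA_PROB32 would make it unsigned int. */
typedef unsigned short CProb;

/* Splits the first properties byte into lc, lp and pb.
   Returns 1 on success, 0 if the byte is out of range (>= 9*5*5). */
int lzma_decode_props(unsigned char prop0, int *lc, int *lp, int *pb)
{
  assert(lc != NULL && lp != NULL && pb != NULL);
  if (prop0 >= 9 * 5 * 5)
    return 0;
  *lc = prop0 % 9;          /* literal context bits */
  *lp = (prop0 / 9) % 5;    /* literal pos bits */
  *pb = prop0 / (9 * 5);    /* pos bits */
  return 1;
}

/* Size in bytes of the internal probability array.
   buffered != 0 adds the 100 bytes from step 2b (_LZMA_OUT_READ). */
size_t lzma_internal_size(int lc, int lp, int buffered)
{
  size_t n;
  assert(lc >= 0 && lc <= 8 && lp >= 0 && lp <= 4);
  n = ((size_t)LZMA_BASE_SIZE + ((size_t)LZMA_LIT_SIZE << (lc + lp)))
      * sizeof(CProb);
  if (buffered)
    n += 100;
  return n;
}
```

With the default switches (lc=3, lp=0, pb=2) the properties byte is
(2 * 5 + 0) * 9 + 3 = 93, and the array holds 1846 + 768 * 8 = 7990 CProb
entries.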
                    459: 
                    460: 
                    461: EXIT codes
                    462: -----------
                    463: 
                    464: LZMA decoder can return one of the following codes:
                    465: 
                    466: #define LZMA_RESULT_OK 0
                    467: #define LZMA_RESULT_DATA_ERROR 1
                    468: #define LZMA_RESULT_NOT_ENOUGH_MEM 2
                    469: 
                    470: If you use callback function for input data and you return some 
                    471: error code, LZMA Decoder also returns that code.
                    472: 
                    473: 
                    474: 
                    475: LZMA Defines
                    476: ------------
                    477: 
                    478: _LZMA_IN_CB    - Use callback for input data
                    479: 
                    480: _LZMA_OUT_READ - Use read function for output data
                    481: 
                    482: _LZMA_LOC_OPT  - Enable local speed optimizations inside code.
                    483:                  _LZMA_LOC_OPT is only for LzmaDecodeSize.c (size-optimized version).
                    484:                  _LZMA_LOC_OPT doesn't affect LzmaDecode.c (speed-optimized version)
                    485: 
                    486: _LZMA_PROB32   - It can increase speed on some 32-bit CPUs, 
                    487:                  but memory usage will be doubled in that case
                    488: 
                    489: _LZMA_UINT32_IS_ULONG  - Define it if int is 16-bit on your compiler
                    490:                          and long is 32-bit.
                    491: 
                    492: 
                    493: NOTES
                    494: -----
                    495: 1) Please note that LzmaTest.c doesn't free allocated memory in some cases.
                    496: But in your real applications you must free memory after decompression.
                    497: 
                    498: 2) All numbers above were calculated for case when int is not more than 
                    499:   32-bit in your compiler. If in your compiler int is 64-bit or larger 
                    500:   probably LZMA can require more memory for some structures.
                    501: 
                    502: 
                    503: 
                    504: C++ LZMA Encoder/Decoder 
                    505: ~~~~~~~~~~~~~~~~~~~~~~~~
                    506: C++ LZMA code uses COM-like interfaces. So if you want to use it,
                    507: you can study the basics of COM/OLE.
                    508: 
                    509: By default, LZMA Encoder contains all Match Finders.
                    510: But for compressing it's enough to have just one of them.
                    511: So for reducing size of compressing code you can define:
                    512:   #define COMPRESS_MF_BT
                    513:   #define COMPRESS_MF_BT4
                    514: and it will use only bt4 match finder.
                    515: 
                    516: 
                    517: ---
                    518: 
                    519: http://www.7-zip.org
                    520: http://www.7-zip.org/support.html
                    521: 
                    522: 
