by pts@fazekas.hu at Tue Jan 31 11:15:10 CET 2017

system calls: (errno) chmod fchmod utimes gettimeofday umask symlink unlink mkdir open read write _llseek close lstat

Ideas to reduce file size:

* Measure how much slower the new (non-table) CrcCalc is and how much it
  contributes to the total execution time.
* Merge Byte flags CSzFileItem into a single-byte bitset? Makes the
  executable larger.

Ideas already implemented:

* DONE: Use GCC 4.8 or later, it has better -Os support.
* DONE: Replace CrcCalc with a non-table-based implementation.
* DONE: Get rid of lstat, use O_EXCL flag in open.
* DONE: Loop in LookInStream_Read2 many not be needed.
* DONE: Remove unused CrcUpdateT8 and CpuArch.c.
* DONE: Remove redundancy between SetMTime and ConvertFileTimeToString.
* DONE: use a tiny libc with system calls only
* DONE: memcmp is used only for k7zSignature, try to eliminate or inline it
* DONE: strcmp is used only for argv parsing, try to eliminate or inline it
* DONE: add output buffering for stdout (stdio.h messages)
* DONE: eliminate __divdi3 (use __udivdi3 instead) by converting signed
  divisions to unsinged
* DONE: Replace __udivdi3 with a shorter implementation on __i386__ if the
  divisor is 32-bit.
* DONE: Implement __umoddi3 in terms of __udivdi3.
* do we really need __udivdi3, can't we do without them?
* DONE: fwrite() is called only once per file, use write(2) instead
* DONE: use read() instead of fread() for reading: the file reading pattern is
  already buffered well enough: LookToRead_Look_Exact reads 16 kB most of
  the time (large enough buffer)
* DONE: FileInStream_CreateVTable can be eliminated
* DONE: Once we use SZ_SEEK_SET only, we can eliminate ftell, use lseek, and
  fail if seeking wasn't possible.
* DONE: malloc()+free() is used in a predictable way, only 1 malloc/free pair per
  file (probably the filename); we can optimize it by making free() a no-op
  execpt if it's free()ing the last malloc().
  See malloc.log malloc_perfile.log malloc_perfile2.log
  input .7z file size: 10887666 bytes
  big malloc()s: 100192, 466238, 521688, 21810108, 26192641
  The malloc(21810108) happens after the free() of 26192641. Can we optimize it?
  How much is the memory usage, who do we need the big malloc for?
  How does it change if we duplicate all files twice in the .7z?
  LZMA extraction of a big file (400000002) seems to need:
  DYNAMIC ALLOC 400000002 = 0xdfa57008
  DYNAMIC ALLOC 15980 = 0x91f1a68
  DYNAMIC FREE 0x91f1a68 was 15980
  DYNAMIC FREE 0xdfa57008 was 400000002
* DONE: Use the lseek64 library call (or _llseek system call) for 64-bit seeks
  in the archive in c-minidiet.sh
* DONE: Inline some syscall functions in minidiet/minidiet.c.
* DONE: Pick a shorter (but slower) CRC32 implementation. It's compatible.
  http://www.hackersdelight.org/hdcodetxt/crc.c.txt
* DONE: Try some more command-line flags for gcc (http://tiny.cc/tinygcc).
* DONE: Eliminate multiple calls to Utf16_To_Char in 7zMain.c.
* DONE: Reuse the output buffer for Utf16_To_Char.
* DONE: Use a malloc stack, make sure it plays well with allocations in
  SzArEx_Extract.
* DONE: Get rid of the memcpy in SzArEx_GetFileNameUtf16 (and the entire function),
  for that FileNamesInHeaderBufPtr needs to be on a 2-aligned address on
  some architectures.
* DONE: Remove LZMA2 optionally (-UUSE_LZMA2).
  * .7z archives by default seem to use LZMA. Why is LZMA2 better?
  * On Wikipedia: LZMA2 is a simple container format that can include both
    uncompressed data and LZMA data, possibly with multiple different LZMA
    encoding parameters. LZMA2 supports arbitrarily scalable multithreaded
    compression and decompression and efficient compression of data which is
    partially incompressible.
   * LZMA2 is faster for 4-threads, if you compress big file (more than 256 MB), so 7-Zip will be able to split it to blocks.
   * LZMA2 was created for XZ format and it includes changes that are good for that stream compression format.
   * Also LZMA2 is better than LZMA, if you compress already compressed data.
  * Lzma2Dec.c reuses code from LzmaDec.c
* DONE: Get rid of FILE_ATTRIBUTE_DIRECTORY, use f->IsDir instead.

Should we compress the tiny7zx with LZMA and reuse the LZMA decoder?

* Probably not worth writing, won't be smaller than the state of the art
  tiny7zx.upx.
* Sizes: 
  lzmadec (7z LzmaDec.c) is 7853 bytes
  -rw-r----- 1 pts eng 14070 Feb  2 08:48 tiny7zx.7z
  -rwxr-x--- 1 pts eng 47488 Jan 19  2014 tiny7zx.dynamic
  -rw-r----- 1 pts eng 14373 Feb  2 08:48 tiny7zx.lzma
  -rwxr-xr-x 1 pts eng 24396 Feb  1 17:35 tiny7zx.unc
  -rwxr-xr-x 1 pts eng 16640 Feb  1 17:35 tiny7zx.upx
  -rwxr-xr-x 1 pts eng 18424 Feb  1 17:35 tiny7zx.upx_lzma
  -rwxr-x--- 1 pts eng 32208 Jan 19  2014 tiny7zx.xstatic
  -rw-r----- 1 pts eng 14392 Feb  2 08:48 tiny7zx.xz
  -rw-r----- 1 pts eng 14384 Feb  2 08:47 tiny7zx.xz7z
* Try1:
** Estimated size of the LZMA decoder (LzmaDec.c): 7600 bytes.
** Estimated size of the ELF stub running the LZMA decoder: 300 bytes.
** Estimated size of tiny7zx.unc without the LZMA decoder: 24400 - 7400 == 17000 bytes.
** Estimated size of tiny7zx.lzma without the LZMA decoder: 17000 * (14400 / 24400) = 10000 bytes.
** Estimated size of the output ELF: 7600 + 300 + 10000 == 17900 bytes.
** ... So probably not worth writing, because tiny7zx.upx is already 16640 bytes.
* Try2:
** Estimated size of the LZMA decoder from UPX, in assembly: 2900 bytes.
** Estimated size of the ELF stub running the LZMA decoder: 300 bytes.
** Estimated size of tiny7zx.unc without the LZMA decoder: 24400 - 7400 == 17000 bytes.
** Estimated size of tiny7zx.lzma without the LZMA decoder: 17000 * (14400 / 24400) = 10000 bytes.
** Estimated size of the output ELF: 2900 + 300 + 10000 == 13200 bytes.
** ... So probably worth douing, because tiny7zx.upx is about 16640 bytes, we could save about 3400 bytes
   -- but only if the LZMA decoder from UPX can be used modularly, e.g. in
   Lzma2Dec.c.


Typical values:

* pts-letsencrypt-0.5.0.2.sfx.7z (solid archive with 3131 files+directories and 2 solid blocks): NumPackStreams=2 NumFolders=2 NumFiles=3131
* manyf1.7z (solid archive with 100000 empty files and 1 directory): NumPackStreams=0 NumFolders=0 NumFiles=100001
* hello_unwritable_nosolid.7z (non-solid archive with 5 nonempty files and 2 directories): NumPackStreams=5 NumFolders=5 NumFiles=7
* NumPackStreams = NumFolders = is_solid ? number_of_solid_blocks : number_of_nonempty_files;
* NumFiles = number_of_files + number_of_directories;

__END__
