Thursday, May 10, 2012

How to Write a Simple Packer/Unpacker with a Self-Extractor (SFX)


Introduction

In this article I will show how to write a file packer/unpacker and how to make a self-extracting version of the archive (SFX).
Please note that this article and its code have been written for learning purposes rather than for complete functionality, so the following limitations apply:
  • Only packing of files (binding them into one file) and no compression
  • Packer doesn't pack files in subdirectories
  • Packer header is not really optimized - just enough for our purposes
  • All code presented here compiles as a console application and no GUI version is provided

The Archive File Format

The idea is to build a structure/format that will allow us to hold a file list and file contents in one file in such a way that we will be able to restore the files to their original state.
This leads to the following pack header design:
  • Signature - Offset 0x00/DWORD
    This will occupy the first 4 bytes of the header. It will contain a simple signature that will allow us to identify our packed files.
  • NumOfFiles - Offset 0x04/DWORD
    Here we store a DWORD holding the number of files in the archive.
  • FilesInfo - Offset 0x08/NumOfFiles * sizeof(packdata_t)
    Here we start storing the file information in a sequence defined as the array packdata_t FileInfo[NumOfFiles].
    The packdata_t structure is defined as:
    struct packdata_t
    {
      char filename[MAX_PATH];
      long filesize;
    };
    As you can see, we simply save the file's name and size. The packdata_t structure is not the optimal way of storing file names or information, because every entry occupies MAX_PATH bytes for the name no matter how short it is. We could instead have used a variable-length packdata_t struct defined as:
    struct packdata_t
    {
      long filesize;
      // Other file info, such as creation date, attributes, ...
      char filenameLength;
      char filename[1];
    };
    But, of course, managing this last struct is beyond the scope of this article.
After the pack header we have the files' contents stored in sequence. So the whole archive file format will look like this:
Signature
NumOfFiles
packdata_t Files[NumOfFiles]
File1 content
File2 content
.
.
.
File(NumOfFiles) content
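
To make the layout concrete, here is a small sketch that computes where each file's content starts inside the archive. The helper name contentOffsets and the constant kMaxPath are my own assumptions for illustration, not part of the original code. The header size is derived from sizeof(long), which is 4 bytes on Windows, where the offsets 0x04 and 0x08 listed above hold.

```cpp
#include <cstddef>
#include <vector>

// Assumption: same value as the Windows MAX_PATH constant (260).
const std::size_t kMaxPath = 260;

// Mirrors the fixed-size record described above.
struct packdata_t
{
  char filename[kMaxPath];
  long filesize;
};

// Hypothetical helper: returns the archive offset at which each file's
// content begins, given the file sizes in header order.
std::vector<long> contentOffsets(const std::vector<long> &sizes)
{
  // Signature + NumOfFiles (each written as a long), then the record array.
  long offset = static_cast<long>(2 * sizeof(long) +
                                  sizes.size() * sizeof(packdata_t));

  std::vector<long> offsets;
  for (std::size_t i = 0; i < sizes.size(); i++)
  {
    offsets.push_back(offset); // file i starts here
    offset += sizes[i];        // the next file follows immediately
  }
  return offsets;
}
```

Since the contents are stored back to back, an empty file shares its starting offset with the file that follows it.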

Writing the Packer

In order to make the code a little extensible, I have defined a structure that will hold callback functions triggered from inside the packer/unpacker routines. These callbacks are used for visual notifications and updates.
The callback struct is defined as:
typedef struct
{
  void (*newfile)(char *name, long size);
  void (*fileprogress)(long pos);
} packcallbacks_t;
The newfile() callback is called whenever the packer/unpacker encounters or processes a new file. It will be passed the file's name and size.
The fileprogress() callback is called while a file is being processed. It will be passed the current position within that file.
Now, let us define the packfilesEx function prototype:
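As an illustration, a console front end could fill the callback struct roughly as follows. This is only a sketch: the function names onNewFile, onFileProgress and makeConsoleCallbacks, as well as the two global counters, are hypothetical and exist only for this example.

```cpp
#include <cstdio>

typedef struct
{
  void (*newfile)(char *name, long size);
  void (*fileprogress)(long pos);
} packcallbacks_t;

// Hypothetical state, kept only so the effect of the callbacks can be inspected.
static int g_filesSeen = 0;
static long g_lastPos = 0;

// Called once per file: print its name and size on a fresh line.
static void onNewFile(char *name, long size)
{
  g_filesSeen++;
  g_lastPos = 0;
  std::printf("\n%s (%ld bytes)\n", name, size);
}

// Called after each chunk: overwrite the same console line with the position.
static void onFileProgress(long pos)
{
  g_lastPos = pos;
  std::printf("\r  %ld bytes processed", pos);
}

// Builds a callback table that can be handed to the packer or unpacker.
packcallbacks_t makeConsoleCallbacks()
{
  packcallbacks_t cb;
  cb.newfile = onNewFile;
  cb.fileprogress = onFileProgress;
  return cb;
}
```

Either pointer may also be left NULL, because the routines check each callback before calling it.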
int packfilesEx(char *path, char *mask, char *archive,
  packcallbacks_t * pcb = NULL);
  • We need a path that will designate the source directory.
  • The mask which will tell us what files to search for and pack.
  • The archive which will hold the archive file name.
  • An optional pcb which will hold a list of callbacks used for visual notifications.
Before going to the code, here is the packfilesEx() code flow:
  1. Build packdata_t array of all files to be packed (storing their names and size)
  2. Create the archive file and write in it the Signature and file count
  3. Write the packdata_t array into the archive
  4. Start reading every file and write its content in the archive
  5. Loop (4) until all files are stored
  6. Close the archive file
This operation is enough to pack all files into one single archive file. Now we go straight to the code:
int packfilesEx(char *path, char *mask, char *archive, packcallbacks_t *pcb)
{
  TCHAR szCurDir[MAX_PATH];

  // define a vector that will hold the packdata_t array.
  // STL vectors are stored in contiguous memory.
  std::vector<packdata_t> filesList;
  
  // make sure the current source directory is valid 
  // and change working directory to it if so.

  // save current directory
  GetCurrentDirectory(MAX_PATH, szCurDir);

  // go to new working directory
  if (!SetCurrentDirectory(path))
    return packerrorPath;
    
  WIN32_FIND_DATA fd;
  HANDLE findHandle;
  packdata_t pdata;

  findHandle = FindFirstFile(mask, &fd);
  if (findHandle == INVALID_HANDLE_VALUE)
  {
    // restore working directory before bailing out
    SetCurrentDirectory(szCurDir);
    return packerrorNoFiles;
  }

  long lTemp;

  // this loop is for storing file's headers only
  // directories are omitted
  do
  {
    // skip directory entries
    if ((fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
      == FILE_ATTRIBUTE_DIRECTORY)
      continue;

    // clear record
    memset(&pdata, 0, sizeof(pdata));

    // fill packdata entry
    strcpy(pdata.filename, fd.cFileName);
    pdata.filesize = fd.nFileSizeLow;

    // save entry
    filesList.push_back(pdata);
  } while(FindNextFile(findHandle, &fd));
  FindClose(findHandle);

  FILE *fpArchive = fopen(archive, "wb");
  if (!fpArchive)
  {
    // restore working directory before bailing out
    SetCurrentDirectory(szCurDir);
    return packerrorCannotCreateArchive;
  }

  // write signature
  lTemp = 'KCPL'; // lallous pack! (L-PCK)
  fwrite(&lTemp, sizeof(lTemp), 1, fpArchive);

  // write entries count
  lTemp = filesList.size();
  fwrite(&lTemp, sizeof(lTemp), 1, fpArchive);

  // store files entries (since std::vector stores elements
  // in a linear manner); skip the write if only directories matched
  if (!filesList.empty())
    fwrite(&filesList[0], sizeof(pdata), filesList.size(), fpArchive);

  // process all files to copy
  for (unsigned int cnt=0;cnt<filesList.size();cnt++)
  {
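    FILE *inFile = fopen(filesList[cnt].filename, "rb");
    if (!inFile)
    {
      // the header already lists this file, so the archive cannot be completed
      fclose(fpArchive);
      SetCurrentDirectory(szCurDir);
      return packerrorCannotCreateArchive;
    }
    long size = filesList[cnt].filesize;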

    // if callback assigned then trigger it
    if (pcb && pcb->newfile)
      pcb->newfile(filesList[cnt].filename, size);

    // copy file contents
    long pos = 0;
    while (size > 0)
    {
      char buffer[4096];
      long toread = size > sizeof(buffer) ? sizeof(buffer) : size;
      fread(buffer, toread, 1, inFile);
      fwrite(buffer, toread, 1, fpArchive);
      pos += toread;
      size -= toread;
      if (pcb && pcb->fileprogress)
        pcb->fileprogress(pos);
    }
    fclose(inFile);
  }

  // close archive and restore working directory
  fclose(fpArchive);

  SetCurrentDirectory(szCurDir);
  return packerrorSuccess;
}
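
The chunked copy loop above ignores the return values of fread() and fwrite(), so a short read or a full disk goes unnoticed. Here is a minimal sketch of a checked variant, using only the standard C I/O functions. The helper name copyBytes is my own and does not appear in the original code.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: copies exactly `size` bytes from `in` to `out`
// in 4096-byte chunks. Returns true only if every byte was read and written.
bool copyBytes(std::FILE *in, std::FILE *out, long size)
{
  char buffer[4096];
  while (size > 0)
  {
    std::size_t chunk = size > static_cast<long>(sizeof(buffer))
                          ? sizeof(buffer)
                          : static_cast<std::size_t>(size);

    // With an element size of 1, the return value is a byte count.
    if (std::fread(buffer, 1, chunk, in) != chunk)
      return false; // source ended early or a read error occurred
    if (std::fwrite(buffer, 1, chunk, out) != chunk)
      return false; // write error, for example a full disk

    size -= static_cast<long>(chunk);
  }
  return true;
}
```

The same loop shape appears in the unpacker and in the SFX builder, so a helper like this could serve all three places, with the progress callback invoked by the caller or added as a parameter.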

Writing the Unpacker

Since the packing process has been explained in detail, the unpacking part becomes fairly obvious; therefore, only the code flow is presented:
  1. Open archive file
  2. Read pack header
  3. Verify signature - if not valid - report and exit
  4. Having read the pack header (Signature, NumOfFiles, packdata_t array), start extracting the files
  5. Create a new file named after the entry's filename and write its contents from the archive file
  6. Process next file
  7. Close archive file and exit
int unpackfileEx(char *archive, char *dest, packcallbacks_t * pcb,
  long startPos)
{
  FILE *fpArchive = fopen(archive, "rb");

  // failed to open archive?
  if (!fpArchive)
    return packerrorCouldNotOpenArchive;

  long nFiles;

  if (startPos)
    fseek(fpArchive, startPos, SEEK_SET);

  // read signature
  fread(&nFiles, sizeof(nFiles), 1, fpArchive);
  if (nFiles != 'KCPL')
    return (fclose(fpArchive), packerrorNotAPackedFile);

  // read files entries count
  fread(&nFiles, sizeof(nFiles), 1, fpArchive);

  // no files?
  if (!nFiles)
    return (fclose(fpArchive), packerrorNoFiles);

  // read all files entries
  std::vector<packdata_t> filesList(nFiles);
  fread(&filesList[0], sizeof(packdata_t), nFiles, fpArchive);

  // loop in all files
  for (unsigned int i=0;i<filesList.size();i++)
  {
    FILE *fpOut;
    char Buffer[4096];
    packdata_t *pdata = &filesList[i];

    // trigger callback
    if (pcb && pcb->newfile)
      pcb->newfile(pdata->filename, pdata->filesize);

    strcpy(Buffer, dest);
    strcat(Buffer, pdata->filename);
    fpOut = fopen(Buffer, "wb");
    if (!fpOut)
      return (fclose(fpArchive), packerrorExtractError);

    // copy the file's contents out of the archive in Buffer-sized chunks
    long size = pdata->filesize;
    long pos = 0;
    while (size > 0)
    {
      long toread =  size > sizeof(Buffer) ? sizeof(Buffer) : size;
      fread(Buffer, toread, 1, fpArchive);
      fwrite(Buffer, toread, 1, fpOut);
      pos += toread;
      size -= toread;
      if (pcb && pcb->fileprogress)
        pcb->fileprogress(pos);
    }
    fclose(fpOut);
    nFiles--;
  }
  fclose(fpArchive);
  return packerrorSuccess;
}

Writing the Self-Extractor (SFX)

The SFX is simply a special version of the unpacker (we will call it UnpackerStub) that, instead of taking the archive file name on the command line, looks for an archive embedded in itself.
If you are a math geek you can think of an SFX as "UnpackerStub.exe + Archive.bin = UnpackerArchive.exe".
Now how to embed the archive file into the unpacker to form an SFX?
In order to do that we need to write some information in the UnpackerStub that will help it locate the Archive.bin body.
For this purpose I use the e_res2 field in the IMAGE_DOS_HEADER to store a pointer to the archive data inside the unpacker stub.
Every executable has a well-documented format that tells the OS how to load and run it. The IMAGE_DOS_HEADER (defined in WINNT.H) is located at offset zero of every executable and has the following fields:
typedef struct _IMAGE_DOS_HEADER {    // DOS .EXE header
  WORD   e_magic;                     // Magic number
  WORD   e_cblp;                      // Bytes on last page of file
  WORD   e_cp;                        // Pages in file
  WORD   e_crlc;                      // Relocations
  WORD   e_cparhdr;                   // Size of header in paragraphs
  WORD   e_minalloc;                  // Minimum extra paragraphs needed
  WORD   e_maxalloc;                  // Maximum extra paragraphs needed
  WORD   e_ss;                        // Initial (relative) SS value
  WORD   e_sp;                        // Initial SP value
  WORD   e_csum;                      // Checksum
  WORD   e_ip;                        // Initial IP value
  WORD   e_cs;                        // Initial (relative) CS value
  WORD   e_lfarlc;                    // File address of relocation table
  WORD   e_ovno;                      // Overlay number
  WORD   e_res[4];                    // Reserved words
  WORD   e_oemid;                     // OEM identifier (for e_oeminfo)
  WORD   e_oeminfo;                   // OEM information; e_oemid specific
  WORD   e_res2[10];                  // Reserved words
  LONG   e_lfanew;                    // File address of new exe header
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;
I store the file offset of the archive data in the e_res2 field, which is an array of ten WORDs and therefore large enough to hold a DWORD. The archive content is appended to the end of the UnpackerStub, so the stored offset equals the stub's original size.
Two functions have been written to get/store the offset of the archive data:
int SfxSetInsertPos(char *filename, long pos)
{
  FILE *fp = fopen(filename, "rb+");
  if (fp == NULL)               
    return packerrorCouldNotOpenArchive;

  IMAGE_DOS_HEADER idh;

  // read dos header
  fread((void *)&idh, sizeof(idh), 1, fp);

  // adjust position value in an unused MZ field
  *(long *)&idh.e_res2[0] = pos;

  // update header
  rewind(fp);
  fwrite((void *)&idh, sizeof(idh), 1, fp);
  fclose(fp);
  return packerrorSuccess;
}
This function will store the pointer. First it reads the header, updates the e_res2 field then writes the header back again.
int SfxGetInsertPos(char *filename, long *pos)
{
  FILE *fp = fopen(filename, "rb");
  if (fp == NULL)
    return packerrorCouldNotOpenArchive;

  IMAGE_DOS_HEADER idh;

  fread((void *)&idh, sizeof(idh), 1, fp);
  fclose(fp);
  *pos = *(long *)&idh.e_res2[0];
  return packerrorSuccess;
}
This function will read the header and extract the value from the e_res2 field.
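Both functions access the field through a cast of the form *(long *)&idh.e_res2[0]. As a hedged alternative, the same 32-bit value can be moved in and out of the WORD array with memcpy(), which avoids reading a WORD array through a pointer of a different type. In the sketch below, DosHeaderStub is a stand-in I introduce so the example builds without the Windows headers, and storeArchivePos/loadArchivePos are hypothetical names.

```cpp
#include <cstdint>
#include <cstring>

// Stand-in for the relevant part of IMAGE_DOS_HEADER (assumption: only
// the e_res2 member matters for this sketch).
struct DosHeaderStub
{
  std::uint16_t e_res2[10]; // Reserved words
};

static_assert(sizeof(DosHeaderStub().e_res2) >= sizeof(std::uint32_t),
              "e_res2 must be able to hold a 32-bit offset");

// Writes the archive offset into the first two reserved words.
void storeArchivePos(DosHeaderStub &hdr, std::uint32_t pos)
{
  std::memcpy(&hdr.e_res2[0], &pos, sizeof(pos));
}

// Reads the archive offset back from the first two reserved words.
std::uint32_t loadArchivePos(const DosHeaderStub &hdr)
{
  std::uint32_t pos = 0;
  std::memcpy(&pos, &hdr.e_res2[0], sizeof(pos));
  return pos;
}
```

Only the first two of the ten reserved words are used; the remaining ones are left untouched.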
In short, the unpacker stub works like this:
  1. Call SfxGetInsertPos() to get the position of the archive file
  2. Call unpackfileEx(), passing the archive filename, which is the executable itself (obtained by calling GetModuleFileName(NULL, ...)), and the position of the embedded archive.bin as startPos
Now I continue to describe how the Packer builds the SFX:
    // check if unpackerstub.exe exists
    if (GetFileAttributes(sfxStubFile) == (DWORD)-1)
    {
      printf("SFX stub file not found!\n");
      return 1;
    }

    // open archive file
    FILE *fpArc = fopen(argv[3], "rb");
    if (!fpArc)
    {
      printf("Failed to open archive!\n");
      return 1;
    }
    // get archive size
    fseek(fpArc, 0, SEEK_END);
    long arcSize = ftell(fpArc);
    rewind(fpArc);

    // form output sfx file name
    char sfxName[MAX_PATH];
    strcpy(sfxName, argv[3]);
    strcat(sfxName, ".sfx.exe");

    // take a copy from SFX
    if (!CopyFile(sfxStubFile, sfxName, FALSE))
    {
      fclose(fpArc);
      printf("Could not create SFX file!\n");
      return 1;
    }

    // append data to SFX
    FILE *fpSfx = fopen(sfxName, "rb+");
    if (!fpSfx)
    {
      fclose(fpArc);
      printf("Could not open SFX file!\n");
      return 1;
    }
    fseek(fpSfx, 0, SEEK_END);

    // get SFX size before archive appending
    long sfxSize = ftell(fpSfx);

    // start appending from archive file to the end of SFX file
    char Buffer[4096 * 2];
    while (arcSize > 0)
    {
      long rw = arcSize > sizeof(Buffer) ? sizeof(Buffer) : arcSize;
      fread(Buffer, rw, 1, fpArc);
      fwrite(Buffer, rw, 1, fpSfx);
      arcSize -= rw;
    }
    fclose(fpArc);
    fclose(fpSfx);

    // mark archive data position inside SFX
    SfxSetInsertPos(sfxName, sfxSize);

    // delete archive file while keeping only the SFX
    DeleteFile(argv[3]);

    printf("SFX created: %s\n", sfxName);
That's all!
