'=========================================================================== ' Subject: HUFFMAN COMPRESSION ROUTINE Date: 12-18-97 (20:04) ' Author: Scott Spiker Code: QB, QBasic, PDS ' Origin: sspiker@hypercom.com Packet: ALGOR.ABC '=========================================================================== 'HUFF.BAS, by Scott Spiker (sspiker@hypercom.com) '12/15/97 'This file is set up to be called from a QB45 main module. Using the QB45 'editor, use File, Load to read in the file. The two subs to use are 'CompressFile and ExpandFile. All other routines are called from these two 'primary subs. ' 'This is a basic Huffman compression routine, written from the excellent book 'The Data Compression Book, second edition. It is basically a translation 'from 'C' code to Basic. ' 'Please feel free to use any or all of this code, and pass it along to 'others interested in data compression. I would suggest getting the Data 'Compression Book if you are interested in following along to get a better 'understanding of how the trees work. I will not explain Huffman encoding 'as there are plenty of good explainations on the Net. All I ask is for 'feedback at sspiker@hypercom.com if you find this code of any use. DEFINT A-Z DECLARE FUNCTION BuildTree% () DECLARE FUNCTION GetCodeBit% (FileHan%) DECLARE SUB CompressFile (InFile$, OutFile$, OrigSize&, CompSize&) DECLARE SUB CountBytes (FileHan%, FileLen&) DECLARE SUB ExpandFile (OrigFile$, CompressedFile$) DECLARE SUB GetBuf (FileHan%) DECLARE SUB OutputBits (code%, Bits%, OutFileHan%) DECLARE SUB PrintNodes () DECLARE SUB ScaleBytes () DECLARE SUB TreeToCode (CodeSoFar%, Bits%, Node%) DECLARE SUB WriteBuf (FileHan%, LastByte%) DECLARE SUB WriteCodes (FileHan%, FileLen&, OutFileHan%) TYPE NodeType 'Define a node Count AS LONG 'char count SavedCount AS LONG 'Use this to build header Child0 AS INTEGER 'pointer to 0 brach Child1 AS INTEGER 'pointer to 1 brach END TYPE TYPE CodeType 'Define a code code AS INTEGER 'Code value CodeBits AS INTEGER 'Number of bits END TYPE CONST BufSize = 1024 'disk buffer size CONST EndOfStream = 256 'used to mark end of file DIM SHARED Node(0 TO 513) AS NodeType DIM SHARED Codes(0 TO 256) AS CodeType DIM SHARED ByteCount(256) AS LONG DIM SHARED OutBuf(BufSize) AS INTEGER FUNCTION BuildTree% 'Building the Huffman tree Node(513).Count = &HFFFF& FOR NextFree = EndOfStream + 1 TO 512 Min1 = 513 Min2 = 513 FOR x = 0 TO NextFree - 1 IF Node(x).Count > 0 THEN IF Node(x).Count < Node(Min1).Count THEN Min2 = Min1 Min1 = x ELSEIF Node(x).Count < Node(Min2).Count THEN Min2 = x END IF END IF NEXT x IF Min2 = 513 THEN 'Only one min found, means EXIT FOR ' this is top node END IF Node(NextFree).Count = Node(Min1).Count + Node(Min2).Count Node(Min1).SavedCount = Node(Min1).Count Node(Min1).Count = 0 Node(Min2).SavedCount = Node(Min2).Count Node(Min2).Count = 0 Node(NextFree).Child0 = Min1 Node(NextFree).Child1 = Min2 NEXT NextFree NextFree = NextFree - 1 Node(NextFree).SavedCount = Node(NextFree).Count BuildTree = NextFree END FUNCTION SUB CompressFile (IFile$, OFile$, OrigSize&, CompSize&) IFileHan = FREEFILE 'Get next available file handle OPEN IFile$ FOR BINARY AS IFileHan 'Open the file in binary mode IFileLen& = LOF(IFileHan) 'Get the input file length CountBytes IFileHan, IFileLen& 'Get character counts ScaleBytes 'Scale to single byte lengths RootNode = BuildTree 'Find the root node TreeToCode 0, 0, RootNode 'Calculate codes OFileHan = FREEFILE 'Get next available file handle OPEN OFile$ FOR OUTPUT AS OFileHan 'Forces file to 0 bytes CLOSE OFileHan ' OPEN OFile$ FOR BINARY AS OFileHan 'Open output in binary mode WriteCodes IFileHan, IFileLen&, OFileHan 'Output huffman codes OrigSize& = LOF(IFileHan) 'Return original file len CompSize& = LOF(OFileHan) 'Return compreesed file len CLOSE IFileHan 'Close the files CLOSE OFileHan END SUB SUB CountBytes (FileHan, FileLen&) ERASE ByteCount, Node, Codes InBuf$ = SPACE$(BufSize) 'Set up a buffer FOR b& = 1 TO FileLen& 'Count bytes thru entire file IF BufPtr = 0 THEN 'Used to trigger buf read GET #FileHan, , InBuf$ 'Read a buffer BufPtr = 1 'set the pointer END IF Byte = ASC(MID$(InBuf$, BufPtr, 1)) 'Get the byte value ByteCount&(Byte) = ByteCount&(Byte) + 1 'Inc byte count BufPtr = BufPtr + 1 'Move the buf pointer IF BufPtr > BufSize THEN 'If past end, reset ptr BufPtr = 0 'Causes new buf to be read END IF NEXT b& SEEK FileHan, 1 'Restore file pointer END SUB SUB ExpandFile (OrigFile$, CompFile$) CompFileHan = FREEFILE 'Get next file handle OPEN CompFile$ FOR BINARY AS #CompFileHan 'Open the compressed file OrigFileHan = FREEFILE 'Get next file handle OPEN OrigFile$ FOR OUTPUT AS #OrigFileHan 'Force file to 0 bytes CLOSE OrigFileHan OPEN OrigFile$ FOR BINARY AS #OrigFileHan 'Open the output file Header$ = SPACE$(257) 'Holds header GET #CompFileHan, , Header$ 'Read header from file FOR x = 2 TO 257 Node(x - 2).Count = ASC(MID$(Header$, x, 1)) NEXT x Node(EndOfStream).Count = 1 'Set up EndOfStream count RootNode = BuildTree 'Find root node DO IF BufPtr = 0 THEN 'If 0, read buffer Buf$ = SPACE$(BufSize) 'Make a buffer BufPtr = 1 'Set pointer END IF Node = RootNode DO IF GetCodeBit(CompFileHan) THEN Node = Node(Node).Child1 ELSE Node = Node(Node).Child0 END IF LOOP WHILE Node > EndOfStream IF Node = EndOfStream THEN Buf$ = MID$(Buf$, 1, BufPtr - 1) PUT OrigFileHan, , Buf$ EXIT DO END IF MID$(Buf$, BufPtr, 1) = CHR$(Node) BufPtr = BufPtr + 1 IF BufPtr > BufSize THEN BufPtr = 0 PUT OrigFileHan, , Buf$ END IF LOOP d = GetCodeBit(-1) 'Reset pointers CLOSE IFileHan CLOSE OFileHan END SUB FUNCTION GetCodeBit (FileHan) STATIC Bit AS INTEGER, BytePtr AS INTEGER, Byte AS INTEGER STATIC Buf AS STRING IF FileHan < 0 THEN 'Set to -1 to reset buffer BytePtr = 0 'reset pointer Buf$ = "" 'Delete buffer EXIT FUNCTION 'All we do here END IF IF BytePtr = 0 THEN 'Is 0 only on first entry Buf$ = SPACE$(BufSize) 'Allocate buffer GET FileHan, , Buf$ 'Get a buffer Bit = -1 'Reset the bit END IF IF Bit = -1 THEN 'Get new byte BytePtr = BytePtr + 1 'Inc BytePtr in buffer IF BytePtr > BufSize THEN 'Need a new buffer GET FileHan, , Buf$ BytePtr = 1 'Reset pointer END IF Byte = ASC(MID$(Buf$, BytePtr, 1)) 'Get the byte Bit = 7 'set test bit END IF IF Byte AND 2 ^ Bit THEN 'test for a 1 GetCodeBit = 1 'set return if 1 END IF Bit = Bit - 1 'dec to test next bit END FUNCTION SUB OutputBits (code%, Bits%, OutFileHan%) STATIC BitPtr IF Bits = -1 THEN 'Flush Buffer if -1 IF BitPtr THEN 'Is there a buf to write? BytePtr = (BitPtr - 1) \ 8 + 1 'Calc which byte we are on WriteBuf OutFileHan, BytePtr 'Send buf to disk BitPtr = 0 'Reset END IF EXIT SUB END IF FOR b = Bits - 1 TO 0 STEP -1 IF BitPtr = 0 THEN ERASE OutBuf 'Sets array to 0's BitPtr = 1 'Set bit pointer END IF IF (code AND 2 ^ b) THEN 'Write a 1 here BytePtr = (BitPtr - 1) \ 8 + 1 'Calc which byte we are on Mask = 7 - ((BitPtr - 1) MOD 8) 'Calc the bit map OutBuf(BytePtr) = OutBuf(BytePtr) OR 2 ^ Mask END IF BitPtr = BitPtr + 1 IF BitPtr > (BufSize * 8) THEN 'Past end of buffer WriteBuf OutFileHan, BufSize 'Write to disk BitPtr = 0 'Reset pointer END IF NEXT b END SUB SUB PrintNodes 'Debug routine, called to disply node table. File$ = "NODES" 'Can write to file instead File$ = "SCRN:" 'Causes output to write to disp FileHan = FREEFILE 'get next file handle OPEN File$ FOR OUTPUT AS #FileHan 'open output file FOR x = 0 TO 256 h$ = HEX$(x) 'hex value IF LEN(h$) = 1 THEN 'Adjust to 2 nibbles h$ = "0" + h$ END IF IF Node(x).Count OR Node(x).SavedCount <> 0 THEN PRINT #FileHan, USING "### "; x; IF x > &H20 AND x < &HFF THEN PRINT #FileHan, CHR$(x); ELSE PRINT #FileHan, " "; END IF PRINT #FileHan, USING " ####"; Node(x).Count; PRINT #FileHan, USING " ####"; Node(x).SavedCount; PRINT #FileHan, USING " ####"; Node(x).Child0; PRINT #FileHan, USING " ####"; Node(x).Child1; IF x <= 256 THEN PRINT #FileHan, " "; FOR b = Codes(x).CodeBits - 1 TO 0 STEP -1 IF Codes(x).code AND 2 ^ b THEN PRINT #FileHan, " 1"; ELSE PRINT #FileHan, " 0"; END IF NEXT b PRINT #FileHan, "" ELSE PRINT #FileHan, "" END IF END IF NEXT x CLOSE FileHan END SUB SUB ScaleBytes FOR x = 0 TO 255 'Find max count in array IF ByteCount&(x) > MaxCount& THEN MaxCount& = ByteCount&(x) END IF NEXT x MaxCount& = MaxCount& / 255 'Adjust scaling factor MaxCount& = MaxCount& + 1 FOR x = 0 TO 255 'write scaled count to nodes Node(x).Count = ByteCount&(x) \ MaxCount& IF Node(x).Count = 0 AND ByteCount(x) > 0 THEN Node(x).Count = 1 'don't let it go to 0 END IF NEXT x Node(EndOfStream).Count = 1 'Set end of stream flag END SUB SUB TreeToCode (CodeSoFar%, Bits%, Node%) IF Node <= EndOfStream THEN Codes(Node).code = CodeSoFar Codes(Node).CodeBits = Bits EXIT SUB END IF CodeSoFar = CodeSoFar * 2 'Shift left 1 bit Bits = Bits + 1 SaveCodeSoFar = CodeSoFar 'Have to save these becuase SaveBits = Bits ' they are changed during ' recursion TreeToCode CodeSoFar, Bits, Node(Node).Child0 TreeToCode (SaveCodeSoFar OR 1), SaveBits, Node(Node).Child1 END SUB SUB WriteBuf (FileHan%, LastByte%) Buf$ = SPACE$(LastByte) 'Build output buffer FOR x = 1 TO LastByte MID$(Buf$, x, 1) = CHR$(OutBuf(x)) 'move byte to buffer NEXT x PUT #FileHan, , Buf$ 'write the buffer to disk END SUB SUB WriteCodes (IFileHan%, IFileLen&, OFileHan%) InBuf$ = SPACE$(BufSize) 'Set up the buffer Header$ = CHR$(0) 'File format, reserved for 'future use. FOR x = 0 TO 255 Header$ = Header$ + CHR$(Node(x).SavedCount) NEXT x PUT #OFileHan, , Header$ 'Write the header FOR b& = 1 TO IFileLen& IF BufPtr = 0 THEN 'Used to trigger buf read GET #IFileHan, , InBuf$ BufPtr = 1 'set the pointer END IF Byte = ASC(MID$(InBuf$, BufPtr, 1)) 'Get the byte value BufPtr = BufPtr + 1 'Move the buf pointer IF BufPtr > BufSize THEN 'If past end, reset ptr BufPtr = 0 'Causes new buf to read END IF OutputBits Codes(Byte).code, Codes(Byte).CodeBits, OFileHan NEXT b& 'Now send out the end of stream character, the decoder uses this to 'know when the decoding is finished. OutputBits Codes(EndOfStream).code, Codes(EndOfStream).CodeBits, OFileHan OutputBits 0, -1, OFileHan 'Flushes buffer END SUB