Fixes for issue #110 by miller-alex · Pull Request #113 · adplug/adplug

miller-alex · 2020-05-19T16:59:04Z

I just wanted to show some WIP on fixing #110. For now, I've

added all the fuzzing samples (redundant samples can be removed/disabled later if tests are too slow),
fixed the crashes in dtm.cpp and cleaned up the file a bit,
fixed some crashes in a2m.cpp (not finished yet) and did some cleanup/refactoring there.

There's still a lot to do. If somebody wants to help or has comments/suggestions, just add your comments below.

Lionel Debroux found multiple additional errors while fuzzing with AFL++. He provided 136 samples triggering errors in various source files, apparently mostly OOB reads of various sizes, at least one overlapping memcpy, and at least one wild read. Add all samples to the tests. Bug: adplug#110 Reported-by: Lionel Debroux <lionel_debroux@yahoo.fr>

CdtmLoader::rewind() assigns a default instrument to each channel matching the channel number, so at least 9 instruments are needed to prevent illegal memory accesses. This fixes crashes for the following fuzzing samples: i-110_001.dtm, i-110_002.dtm, i-110_003.dtm, i-110_004.dtm, i-110_031.dtm, i-110_032.dtm, i-110_039.dtm, i-110_040.dtm, i-110_052.dtm, i-110_053.dtm, i-110_054.dtm, i-110_055.dtm, i-110_059.dtm, i-110_061.dtm, i-110_063.dtm, i-110_064.dtm, i-110_100.dtm, i-110_110.dtm, i-110_111.dtm, i-110_112.dtm, i-110_113.dtm, i-110_114.dtm, i-110_115.dtm, i-110_116.dtm, i-110_117.dtm, i-110_118.dtm, i-110_119.dtm, i-110_120.dtm, i-110_121.dtm, i-110_122.dtm, i-110_125.dtm, i-110_133.dtm. Bug: adplug#110

Improve readability of the code: * Make conversion tables static. * Replace many literal numbers with sizeof or enums. Add a bunch of enums to the class in the header for that purpose. * Use binistream::readString() instead of reading bytes in a loop. * Declare loop counters locally in the loops. * Make pattern a dtm_event[...][...] array with automatic storage. Cast it to (unsigned char *) only for unpack_pattern() and avoid all size and offset calculations. * Add a variable to replace multiple copies of (event->byte1 & 15) during pattern conversion. * Update track counter t along with j directly in for loop. * Update formatting in a few places (mostly for loops; add spaces and move braces into the same line). * Remove unneccessary input/output aliases in unpack_pattern(). * Use memset to repeat bytes for RLE decoding. No functional changes.

* Refuse to load a file with zero patterns. * Fail if an instrument name is too long. * Ignore instrument events with invalid instrument number. * Fail if order list contains invalid pattern numbers or restart position. This commit fixes invalid memory accesses for the following fuzzing samples: i-110_060.dtm, i-110_128.dtm. Bug: adplug#110

Rework unpack_pattern() such that the unpacked pattern is read directly from a file instead of from a memory buffer. Also return a bool indicating success instead of the unpacked length (which is known in advance). The complexity of the method doesn't change much, but its usage becomes much simpler, and the repeated buffer allocation/deallocation goes away. No functional changes.

Lionel Debroux added another batch of fuzzing samples to issue adplug#110 yesterday. Hook up the 43 new samples in stresstest, too. Crashes for the following new fuzzing samples have already been fixed by the last few commits, though: i-110_139.dtm, i-110_143.dtm, i-110_164.dtm, i-110_165.dtm, i-110_166.dtm, i-110_176.dtm, i-110_178.dtm, i-110_179.dtm. Bug: adplug#110 Reported-by: Lionel Debroux <lionel_debroux@yahoo.fr>

In Ca2mLoader::inputcode() and Ca2mLoader::uncompress(), ibufcount should be compated to input_size, not MAXBUF. Moreover, the former method had a type and compared the wrong variable. Divide input_size by two to make it a count of shorts, not bytes. Check for empty compressed input early. Wrapping around the input/output buffer counters on overflow still looks bogus. It would probably better to abort decoding instead. This commit fixes crashes for the following fuzzing samples: i-110_006.a2m, i-110_010.a2m, i-110_044.a2m, i-110_074.a2m, i-110_076.a2m, i-110_108.a2m, i-110_132.a2m, i-110_137.a2m, i-110_170.a2m. i-110_008.a2m loads now, but doesn't finish to play in a reasonable time frame; i-110_042.a2m and i-110_073.a2m hang during load instead of crashing. The following samples now crash in a different place: i-110_005.a2m, i-110_007.a2m, i-110_011.a2m, i-110_041.a2m, i-110_075.a2m, i-110_077.a2m, i-110_078.a2m, i-110_091.a2m. Bug: adplug#110

Due to integer division with wrong rounding, one extra element of len would be added to the sum when numpats was a multiple of 8 (version >= 5) or 16. That was mostly benign, but for a version >= 5 file with 64 patterns that would read past the end of the len array. Fix the computation and simplify block handling by introducing a variable holding the number of blocks. While at it, simplify block decoding substantially by converting the unrolled mess to a loop, fixing the misleading formatting (or missin braces) along the way. Also check that there is at least one pattern in the file. This commit fixes an OOB read for fuzzing sample i-110_129.a2m. Bug: adplug#110

Fix transposed digits in MAXDISTANCE. The correct value is 21839, not 21389. (That was hard to spot!) This typo could cause k to wrap around to a value near 64k, way out of bounds to access buf. Also stop decoding when input or output buffers are exhausted instead of wrapping around to the beginning to avoid possibly endless hangs in such cases. This commit fixes crashes and hangs for the following fuzzing samples: i-110_005.a2m, i-110_009.a2m, i-110_038.a2m, i-110_041.a2m, i-110_042.a2m, i-110_045.a2m, i-110_073.a2m, i-110_107.a2m, i-110_124.a2m, i-110_126.a2m, i-110_138.a2m, i-110_157.a2m, i-110_159.a2m, i-110_168.a2m. The following fuzzing samples now load successfully, but crash or hang during playback: i-110_006.a2m, i-110_007.a2m, i-110_010.a2m, i-110_011.a2m, i-110_075.a2m, i-110_076.a2m, i-110_077.a2m, i-110_078.a2m, i-110_091.a2m. Bug: adplug#110

Fuzzing samples may cause hangs, then stresstest will never complete. Stop playback if we reach the one hour mark in a song or kill the process if a sample runs for a minute real time. Also show a warning for invalid refresh values and improve misleading wording when a sample doesn't load successfully.

Move the class members and methods used for decoding compressed files to a new subclass. This subclass is only instantiated in static method sixdepak::decode() (formerly sixdepak()), so decompression data doesn't clutter the player class. Note that the sixdepak subclass could easily moved out of Ca2mLoader class and made local to the .cpp file or moved to its own header. Remove all preprocessor constants and const members along the way and replace them with enum constants. Replace the lookup tables with static member functions (one of them is still implemented with a lookup table, but the other two are simpler and faster with arithmetic). The old decode() method is renamed to do_decode() and returns the decoded length now. The name is now used for the entry point, and the code from the old sixdepak() partially went into the constructor, do_decode(), and the new decode(). It allocates a sixdepak instance dynamically, so buf became an array an is no longer allocated separately. Note that output_size is now unused. Future changes should use it for the size of the destination buffer instead of hardcoded MAXBUF. No functional changes.

It's possible now to specify a length for the output buffer passed to sixdepak::decode() (and the sixdepak constructor). Change buffer lengths to size_t while at it. Allocated buffer sizes have not changed yet.

It's known in advance how much data is needed when decoding packed blocks. Allocate destination buffers for sixdepak::decode() just large enough to hold the required amount of data. Decoding will stop when the buffer is full and any excess data is ignored.

Using a sliding window buffer to copy from makes only sense for streaming output. Since the sixdepak class writes its output entirely to a memory buffer, we can copy directly from the output buffer. Just remove the buf array and the corresponding code from do_decode(). Also make obufcount a local variable since it's not used outside of do_decode().

…l change)

* Add fixstringlength() helper that limits the length byte of pascal style strings to the available space. This seems to be useful for other players and may be used to a common utilities header later. Replace the manual checks. * Replace some literal numbers with sizeof(...) or enum constants. * Declare some variables where they are used first. * Whitespace changes.

Out of range instrument numbers cause invalid memory accesses during playback. Replace them with zeroes while converting track data in Ca2mLoader::load(); This fixes crashes in CmodPlayer::update() for the fuzzing samples below, but they still access notetable[-1] in CmodPlayer::setnote() and the first 2 now start to write out of range (negative) data to the OPL registers: i-110_006.a2m, i-110_075.a2m, i-110_076.a2m. Bug: adplug#110

With some effects it's possible that setnote() is called with note < 1. Clamp to 1 in that case. Simplify code a bit flow while at it. This fixes invalid array accesses of notetable with negative index for the following fuzzing samples: i-110_006.a2m, i-110_007.a2m, i-110_010.a2m, i-110_075.a2m, i-110_076.a2m. Bug: adplug#110

If all 64 orders are used, length is never initialized, leading to a crash in CmodPlayer::rewind(). Fix that by adding the missing initialization. This fixes loading for the following fuzzing samples, but both still crash during playback: i-110_016.cff, i-110_097.cff. Bug: adplug#110

There is only enough space for 36 patterns. Trying to load more will access memory past the end of the allocated arrays. Fix that by adding a check that the number is valid (also rejecting modules that are too short for the number of patterns or that have no patterns). This commit fixes OOB memory accesses while loading track data for the following fuzzing samples: i-110_012.cff, i-110_013.cff, i-110_014.cff, i-110_015.cff, i-110_017.cff, i-110_018.cff, i-110_019.cff, i-110_020.cff, i-110_021.cff, i-110_022.cff, i-110_023.cff, i-110_024.cff, i-110_092.cff, i-110_093.cff, i-110_101.cff, i-110_102.cff, i-110_103.cff. It also fixes the crashes in CmodPlayer::update() during playback for i-110_016.cff and i-110_097.cff. Bug: adplug#110

Don't even try to unpack the module if the recorded size is less than the unpacker's header id. This should fix the issues with fuzzing samples i-110_048.cff and i-110_105.cff that don't reproduce in my setup (due to different ASAN version, probably). Bug: adplug#110

* Issuing many 0x02 codepoints could increase code_length arbitrarily, leading to undefined behavior in get_code() and effectively ignoring large chunks of the input. Limit code_length to 16 bits, which is enough for a dictionary size of 0x8000. (We might choose to allow 32 bits instead; although not useful, RLE can use 32 bit counts anyway.) * Fix undefined shifts in get_code() and prevent an overflow in unpack() for 32 bit RLE counts when long is 32 bit. * Fail if RLE wants to repeat data beginning before the start of the output buffer. * Handle invalid codes gracefully in translate_code(). * Avoid to overflow heap in expand_dictionary() (also prevents dictionary overflow since the heap is exhausted first).

4 bytes of padding ain't enough to reliably detect end of data before reading past the end of the packed_module buffer. If the data runs out while processing an RLE code, up to 32 bits for repeat_counter an two more code points (up to 16 bits each) will be read before unpacking terminates.

* Replace a few literal numbers with sizeof(...); there are still a lot of numbers that should be replaced. * Change buffer allocation for module loading: allocate only header.size plus padding and swap buffers for packed files. Check for minimum module size before accessing data. * Reorder some blocks/statements to clarify or simplify things, merge identical code, or eliminate branches. * Simplify event pointer address calculation for track conversion by introducing a pointer to multi-dimensional array variable. * Initialize and update track counter in for loops headers. * Merge adjacent string literals. * Shrink scope of variable definitions. * Replace malloc/free with new/delete, use memcpy instead of open coded copy, drop unneccessary memset or use initialization instead. * Use a switch statement in cff_unpacker::unpack(). Differentiate between success and failure by using different labels when jumping out of the surrounding loop. * Remove dead stores to old_code. It's no longer used at all. * Eliminate superfluous temporary string and double copy from cff_unpacker::translate_code(). * Spacing, line breaks, and indentation changes. * Update a few comments.

Malvineous · 2020-05-24T05:29:55Z

I'm happy to merge WIP changes, but it would be nice if we could do it in such a way as the tests don't fail in the master branch, as it makes it difficult to pick up other unrelated failures coming in as PRs. I guess keeping the failing tests in separate branches and then rebase-merging them onto the master branch, along with the fixes when they are ready, would be a way to achieve this.

@miller-alex If this sounds good to you, let me know and I'll give you commit access to the project so you don't have to open PRs all the time.

The string "CUD-FM-File - SEND A POSTCARD -" should be present also in uncompressed files, I suppose (at least it is in the few examples I've seen). Move the check that was executed for packed files only into code that's common for both cases.

Move xadplayer_load() from the header to the .cpp file. Move reading the header and setting psi.instr_table and psi.seq_table from xadplayer_rewind() to xadplayer_load(). Add checks that the table offsets themselves and the table entries don't point past the end of tune. This fixes invalid reads when dereferencing those wild pointers in xadplayer_rewind() for the following fuzzing samples: i-110_030.xad, i-110_035.xad, i-110_036.xad, i-110_089.xad, i-110_096.xad, i-110_144.xad. Bug: adplug#110

…r_update()

The pointers to the channel's positions are updated directly in the tune data in xadplayer_update(). This is a questionable approach. Moreover, they are not reset to the original values in xadplayer_rewind(), so rewinding doesn't work as expected. Use a separate copy of channel pointers which is initialized in xadplayer_rewind() and modified in xadplayer_update() to rectify the situation.

* All arrays are sized for 9 channels, but the player uses only 8. Make the number (8) an enum constant and use that consistently (but leave the unused 9th row of the registers table unchanged). * Convert psi.looping to a scalar, used as bitfield. * Remove the lookup tables from the class interface. Declare them as static variables in the methods where they are used. Change their type to simplify access. * In the xadplayer_update() loop, remove a nesting level by moving the delay check up and using continue. * Make ptr a reference so it needn't be saved explicitly. * Whitespace and comments changes.

…ords

Add a check for minimal tune size to xadplayer_load(), so xadplayer_rewind() doesn't try to access data after the end of tune. Also fix the end check in xadplayer_update(). This fixes crahes in xadplayer_rewind() for the following fuzzing samples: i-110_034.xad, i-110_058.xad, i-110_123.xad, i-110_146.xad, i-110_161.xad, i-110_173.xad. Bug: adplug#110

CxadratPlayer accesses and copies data without checking tune_size, leading to OOB reads for truncated or manipulated files. Fix that by checking tune_size before accessing data and rejecting files that are too short. While at it, also check whether the number of channels is valid. This coommit fixes crashes for the fuzzing samples i-110_037.xad and i-110_095.xad. Bug: adplug#110

* Remove lookup tables and volume helper from class interface. Make the tables static local variables of xadplayer_update() and convert the helper method to a static function. All of them need to be moved to different places for that. Drop the rat prefix from their names, too. * Use memcmp instead of strncmp in xadplayer_load() and simplify a memeset call in xadplayer_rewind(). * Copy a whole row at once when copying data to the "track" array in xadplayer_load(). This gets rid of one loop nesting level and reduces the number of memcpy() calls. * Don't copy the current event in xadplayer_update(), but make event a reference to the array entry. Use new variable "pattern" to improve readability by avoiding nested array subscripts. * Eliminate variable old_order_pos from effects processing by reordering some code. * Whitespace and comment changes.

Lionel Debroux added a 3rd batch with 21 new fuzzing samples to issue adplug#110 yesterday. Hook them up in stresstest, too. That brings the total number of samples from this fuzzing round to 200. Most of the crashes caused by the new samples have already been fixed by previous commits. There are only two new samples still crashing, i-110_188.mtk and i-110_197.sa2. Bug: adplug#110 Reported-by: Lionel Debroux <lionel_debroux@yahoo.fr>

CxadflashPlayer doesn't check tune_size when it tries to access data, so it can read out of bounds when setting instruments in xadplayer_rewind() or when reading events during playback. Check that tune_size isn't below the minimum in xadplayer_load() and handle reaching the end of tune in xadplayer_update(). Ignore "set instrument" commands with bad instrument number and handle missing end marker in order list, too. This commit fixes OOB reads for the fuzzing samples i-110_070.xad and i-110_090.xad. Bug: adplug#110

* Remove lookup tables from class interface and mark them static for the file. * Convert flash_adlib_registers to a two-dimensional array to simplify indexing in the code. * Remove flash_notes_encoded which is a simple division (by 12) table (with some unused entries at the end). Calculate quotient and remainder on the fly. * Simplify control flow in xadplayer_update() a bit: Use continue statement in set instrument event branch to get rid of the big else block. Handle all effects in the switch statement; store frequency slide in a variable and apply it later to maintain correct oder; look up old channel frequency only when applying slide. * Declare a pointer to the order list to clarify accesses to it. * There is space for 128 instruments in the file, so change the return value of xadplayer_getinstruments() accordingly. * Whitespace and formatting changes.

If the order list doesn't contain an end marker, length doesn't get set at all. Fix that by using length as loop counter instead of i. While at it, also add a check for invalid pattern numbers in the order list and ensure at least one pattern has been loaded. This commit fixes OOB reads in rewind() for the following fuzzing samples: i-110_047.sng, i-110_066.sng, i-110_080.sng, i-110_079.sng. Bug: adplug#110

The number of patterns, order length, and restart position are read from the file and used without checking. Add a check that rejects the file if these values anr not in the expected range to avoid OOB accesses or other unwanted behavior later. Also the track data is read until the end of the file, possibly overflowing the tracks array. Add checks to stop reading when the allocated number of patterns is filled. These changes fix crashes in rewind() for the follwing fuzzing samples: i-110_049.sa2, i-110_050.sa2, i-110_067.sat, i-110_081.sa2, i-110_082.sa2, i-110_106.sat, i-110_134.sat, i-110_140.sat, i-110_142.sat, i-110_148.sa2, i-110_150.sa2, i-110_162.sa2, i-110_163.sa2, i-110_167.sa2, i-110_197.sa2, and a crash in load() for sample i-110_149.sa2. Bug: adplug#110

… gettitle()

If the order list contains an entry >= 64 it can lead to an OOB access of the tracks array during playback. Add a check in load() to prevent that (being more strict, enforce entries to be < nop). Note that a similar check for track orders isn't necessary since the number of allocated tracks is greater than the maximal value of a byte.

* Simplify instruments loading, get rid of local insts struct. * Fix spelling of "ARPEGGIO". * Clean up headers (remove duplicate, prefer c++ headers). * A few formatting changes.

Some of the fuzzing samples can be loaded by CmidPlayer, too. So fix the problems that occur in that case as well. Prevent accessing a lookup table with negative index and improve excessively long running tests on broken files that could require billions of update() calls to wait for the next event: * Ignore negative note values in midi_fm_playnote(). * Limit variable length numbers to allowed range. * Fix wait calculation, initital value wasn't big enough. * Limit track length to file size.

A position jump back to the current order entry doesn't set the songend flag. Fix the off-by-one error to avoid playing the pattern endlessly without ever reporting the end of the song. Also return always false in update() when speed is zero, as songend is set only when speed changes to zero, but not if it's already zero initially after rewind().

While uncompressing data in CmtkLoader::load(), memcpy() may be called with overlapping source and destination buffers as the byte count can be larger than the offset. The result of such a call is undefined, so copy byte for byte in a for loop instead. Fixes: adplug#110

Don't load the whole file into memory before decompression, read compressed data directly from the file. This simplifies the code a bit and gets rid of a dynamic memory allocation. Move a few bits around and add some gotos in the switch statement to reduce code duplication. Note that error checking is stricter than before. The size of the decompressed data must match header.size exactly now. It could be made more lenient if existing files need it.

* Remove patterns from struct mtkdata and split unused initial bytes from names. This simplifies size calculations and makes string copying clearer. (The type of patterns was wrong anyway.) * Replace some literal numbers with sizeof(...). * Don't use memset when setting the terminating NUL byte suffices. * Whitespace changes.

miller-alex · 2020-06-10T00:21:04Z

All problems I've seen with the test samples from issue 110 are fixed, with the following two exceptions.

A few files write out of range data to OPL registers. That's an old issue and not really a problem since the high bits are simply discarded, so I haven't looked into it.
Some protracker-based players allow setting tempo=0, and then getrefresh() returns 0. That's no problem for the tests, but other consumers may run into problems with an infinite update interval. (Sample i-110_010.a2m exhibits this issue.) Maybe this should be addressed, but I'm not sure what's the expected behavior (interestingly, in CmodPlayer::update(), command 18 explicitly checks for zero and sets tempo, command 15 sets tempo only for values >= 50, and command 7 doesn't check at all).

I didn't keep separate branches for test, fixes, and cleanup, but the commits will end up in the same master branch anyway. I don't think it makes sense to split them.

From my side that's it for the current batch of fixes. (I can add a commit for the tempo=0 issue if you tell me how that should be handled.) I know there are more problems in the code base, but those have to wait for another PR. @Malvineous I think you can merge this PR now unless you have some change requests.

Malvineous · 2020-06-10T01:16:41Z

Excellent work, thanks so much!

miller-alex added 16 commits May 15, 2020 18:47

a2m.{h,cpp}: Add destination length parameter to sixdepak::decode()

3856827

It's possible now to specify a length for the output buffer passed to sixdepak::decode() (and the sixdepak constructor). Change buffer lengths to size_t while at it. Allocated buffer sizes have not changed yet.

a2m.cpp: Rearrange sixdepak code to improve readability (no functiona…

d004d11

…l change)

miller-alex force-pushed the issue-110 branch from e3a0bac to 663bdc8 Compare May 19, 2020 17:16

miller-alex added 10 commits May 22, 2020 04:09

a2m.cpp: Clean up track conversion part of Ca2mLoader::load()

2efecb1

cff.cpp: Ignore invalid instrument numbers in track data

6ffb533

miller-alex added 2 commits May 24, 2020 20:19

cff.cpp: Don't crash in getinstrument() when instrument index is too big

4cabca6

a2m.cpp: Add check for invalid pattern numbers in order list

24a5143

miller-alex force-pushed the issue-110 branch from d17b1c8 to 32023f0 Compare May 29, 2020 22:04

miller-alex added 22 commits May 30, 2020 13:59

psi.cpp: Check for event pointer reaching the end of tune in xadplaye…

857bf15

…r_update()

psi.cpp: Use helper function to access data as 16 bit little endian w…

b3f2f8b

…ords

rat.cpp: Handle invalid pattern numbers in order list gracefully

428cf4f

sa2.{h,cpp}: Fix string constructor usage in getinstrument(); rewrite…

2e34dd5

… gettitle()

sa2.cpp: Simplify instruments loading and some other clean up

a781982

* Simplify instruments loading, get rid of local insts struct. * Fix spelling of "ARPEGGIO". * Clean up headers (remove duplicate, prefer c++ headers). * A few formatting changes.

miller-alex force-pushed the issue-110 branch from 1db9c6c to f1db24c Compare June 9, 2020 22:53

miller-alex marked this pull request as ready for review June 10, 2020 00:21

Malvineous merged commit 5498d47 into adplug:master Jun 10, 2020

miller-alex deleted the issue-110 branch November 17, 2020 20:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fixes for issue #110#113

Fixes for issue #110#113
Malvineous merged 62 commits into
adplug:masterfrom
miller-alex:issue-110

miller-alex commented May 19, 2020

Uh oh!

Malvineous commented May 24, 2020

Uh oh!

miller-alex commented Jun 10, 2020

Uh oh!

Malvineous commented Jun 10, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

miller-alex commented May 19, 2020

Uh oh!

Malvineous commented May 24, 2020

Uh oh!

miller-alex commented Jun 10, 2020

Uh oh!

Malvineous commented Jun 10, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants