Re: combinations of 4

From: Valdis.Kletnieks@vt.edu
Date: Tue Apr 09 2002 - 00:10:57 EDT


On Mon, 08 Apr 2002 23:08:02 EDT, Michael Greenberg said:

> > 1) "do not contain null" - that's *different* than "A..Z". There's a
> > *lot* of instructions that contain non-printable non-null characters.
> > Most of them, in fact. For a 32-bit instruction, there are 2^32 (or
> > 4,294,967,296, of which 4,228,250,625 (or 255^4) do *not* contain nulls.
> > Only 66,716,671 (or about 1.5%) of all possible 32-bit instructions *DO*
> > contain a null.
>
> While your other comments are insightful (and incredibly interesting),
> I believe KF intended to simply insert all of these generated sequences
> into a call to __asm__() or a similar function and then 'sort the rest
> out in gdb', that is, see what calls were valid. While your method
> will get you better results, isn't a kludge, and will probably work
> while his will not, that's not quite what he was asking.

Well.. Hmm.. 4 billion or so possibilities, at 4 bytes a pop - that's
16G for the .o file - and the .c/.s will be bigger (all those "__asm__()"
chew it up - plus any dorking around you need to get gcc, xlc, /bin/as
or whatever to *create* a 16G .o file (hint - most Power chips are 32-bit ;)

I hope you have a *big* /tmp and lots of time... ;)

But you already *know* that of the 4G entries, only 66M or so are
"interesting" in that they contain a null - so turning it around and
making a list of "everything that DOES have a null" and saying "If
it's not in this list, it doesn't have a null" is a *lot* faster....

What *might* be interesting is generating *just those 66M* bit
patterns, and getting gdb to disassemble them, so you have a list of
66M instructions to avoid... and that's almost plausible to do.

for (a=0;a<256;a++)
  for (b=0;b<256;b++)
    for (c=0;c<256;c++)
      print "\0$a$b$c $a\0$b$c $a$b\0$c $a$b$c\0";

Yes, this generates some duplicates. Deal with it - /bin/uniq ;)

My analysis of instruction formats took me all of 15 minutes with 2
IBM AIX manuals - and hopefully produced a lot more insight into the
problem than simply brute-forcing all 4 billion possibilities For
instance, all the 'X' format instructions have primary opcode 31 or 63
- and the ones with 63 tend be the floating-point instructions while
31 tends to be integer and memory references. (Hmm.. I smell a
5-input AND gate here ;)

4 billion instructions - or 11 instruction formats (of which 3 have only
one instruction of that format), and for each format it's pretty easy
to figure it out. Work smarter, not harder. ;)

-- 
				Valdis Kletnieks
				Computer Systems Senior Engineer
				Virginia Tech




This archive was generated by hypermail 2.1.7 : Wed Apr 09 2008 - 22:28:03 EDT