Re: Optimal wildcard search algorithm

From: Tim (tim-pentest@sentinelchicken.org)
Date: Tue Nov 28 2006 - 09:02:37 EST


> You'd get around your storage problems by using a depth-first search, I
> believe. But at the same time, I don't think it's the problem you think
> it is.
>
> Consider the following algorithm:
>
> function recursiveInject(string S)
> For each character C in [a-z0-9] DO
> inject 'SC'
> if match then store SC // i.e., a username exists that is only the
> value of SC
> inject 'SC%' (for SQL)
> if match then recursiveInject (SC)
> Next C
>
> Your storage usage is the number of total usernames times the average
> length of each username. And, since you only search strings that
> indicate matches, you are spending at most (size of alphabet) * (number
> of usernames) * (average length of usernames) on queries that won't go
> anywhere, which is significantly better than traditional brute-forcing,
> where you expect to waste (size of alphabet) ^ (max username length) -
> (number of usernames) queries. I think if what you've described is the
> extent of your injection exploit's power (i.e., it only indicates
> whether a username matches a given string, which may or may not include
> wildcards), then you can't expect to get any better than this.

I guess you're right... since the number of usernames is probably not
incredibly large, the storage wouldn't be that bad in any real-world
system.

However, might other approaches be faster than a simple,
single-character depth or breadth first attack? For instance, with a
relatively small number of users, narrowing the alphabet first may speed
up subsequent searches significantly. Suppose we try the following:

*a*
*b*
...
*9*

and we determine that just a single letter can be eliminated from the
alphabet. How many users must be in the system, with usernames of say,
8 characters in length, in order for this to be an overall benefit? Of
course, mid-way through a search, one could check the character set on a
remaining subset of users with queries like:

jo*a*
jo*b*
...

I imagine there's some break point where the up-front cost of checking
the whole alphabet is beneficial if there's enough users AND there's at
least one letter that can be eliminated. On the other hand, more users
means there's a greater likelyhood that all characters are used at least
once, which makes it pointless.

Any thoughts on how one would approach this question? One reason I am
so concerned with optimality, is that I often start off by manually
brute forcing apps like this, and I'd like to minimize the guessing if
possible.

thanks for the feedback,
tim

------------------------------------------------------------------------
This List Sponsored by: Cenzic

Need to secure your web apps?
Cenzic Hailstorm finds vulnerabilities fast.
Click the link to buy it, try it or download Hailstorm for FREE.
http://www.cenzic.com/products_services/download_hailstorm.php?camp=701600000008bOW
------------------------------------------------------------------------



This archive was generated by hypermail 2.1.7 : Sat Apr 12 2008 - 10:57:23 EDT