Re: Optimal wildcard search algorithm

From: Andres Riancho (andres.riancho@gmail.com)
Date: Wed Nov 29 2006 - 16:28:32 EST


I have been thinking about this for some time now, and this is what I
came up with:

<pseudo-algorithm>
1) bruteforce N rounds ( where N is a number between 2 and 4, should
do some tests to confirm what is the best choice ). By bruteforce I
mean, search for a* , b* ... z* and keep only the ones that match (
this would be the first round ). Then, for the second round, aa*, ab*
....yz* and keep only the ones that match.

2) "elimination". Lets suppose that we only got one matching group
after the two rounds of bruteforcing, the group is 'ac*' . Now what i
propose is to search for ac*a* , ac*b*... ac*z*. This step will tell
you if there is a string matching ac* that has a letter a somewhere,
if it hasnt, you can safely eliminate that char in the next
iterations.

3) Using the result of step 2, create a subset of the letters and use
more rounds of bruteforcing and "elimination".

</pseudo-algorithm>

The key number here as you can see is N.
- Big N creates less groups -> elimination step is more successfull,
but bruteforcing step is slow.
- Small N creates more groups -> elimination step is less successfull,
but bruteforcing step is faster.

Note: My proposal of N between 2 and 4 is completely arbitrary :)

On 11/28/06, Tim <tim-pentest@sentinelchicken.org> wrote:
> > Do I understand correctly, that the wildcard test you describe only
> > gives an exists/doesn't exist answer? How many tests can you afford
> > per second?
>
> Yes, the attack would be somewhat blind, resulting in yes/no or
> yes/no/error responses. For instance, if this were a login form using
> LDAP, it might return success for a proper username/password
> combination, failure for a nonexistent username or bad password, and an
> error for a wildcarded username that does exist but can't be bound to,
> since there's no injection in the LDAP bind request.
>
> > What you can try is to analyse some sample username lists for
> > ways to equalise their distribution. For example, you wouldn't want to
> > start with a*, better jump directly to aa*, ab*, ..., az*
>
> You think starting with 2-character prefixes is actually faster? It
> doesn't seem that way to me. In the worst case for a single character
> test, you need to make 36+36*m, where m is the number of successful
> results from the first round.
>
> If you start with 2-character tuples, you'll always need to perform
> 36*36 tests to get to the same point in the search, which is certainly
> slower on average.
>
> > You might try taking samples from
> > http://www.galbithink.org/names/agnames.htm
> > http://blogs.parc.com/playon/archives/2006/06/naming_patterns.html
> > or similar, some username/password combo lists, etc.
> >
> > You can also test for suffixes (*a, *b, ..) hoping for a better
> > distribution.
>
> Indeed, I think using common names, or even just starting with the most
> probable characters would speed up the initial findings. This is very
> useful if your goal is simply to find one or a few usernames. However,
> if you want to find them all, it probably doesn't help much.
>
> Besides, the general algorithm I seek would be useable on other fields.
> Take for example, the following ldap injections:
>
> *)(userPassword=A*
> *)(userPassword=B*
> ...
>
> If an LDAP server isn't properly secured, this could allow one to
> brute-force all password hashes using just the blind responses.
> Password hashes wouldn't have predictable distribution. Similar
> situations may exist in SQL injections as well, though there's probably
> easier attacks in that situation.
>
>
> > If you're allowed single-char wildcards, you could do more interesting
> > searches - tests for certain username lengths being the most
> > important. You can also walk the search space based on the *second*
> > letter of the username _a%, _b% etc, which will (I guess) be more
> > equally distributed than the first letter.
>
> Indeed, with single-character wildcards, a lot of additional information
> can be divulged. In fact, one could determine the length range of all
> strings right off, which would make it easy to determine when we're
> "done". I didn't want to complicate the question by introducing these
> though.
>
> > I don't know if searching for substrings in the middle hoping they
> > would prune the search tree will be helpful - but you can analyze
> > those sample lists and see if particular patterns come up. If there is
> > a good set of 2 or 3 character strings that have near-zero frequencies
> > in the sample lists, that's the way to go.
>
> Yes, one of my burning questions is to figure out whether mid-string
> checks can be beneficial. Multi-character mid-string searches are
> likely a bad idea, for the same reasons I described above.
>
> While I appreciate the ideas you've suggested, I'm really looking for a
> more formal answer to the optimality question.
>
> thanks,
> tim
>
> ------------------------------------------------------------------------
> This List Sponsored by: Cenzic
>
> Need to secure your web apps?
> Cenzic Hailstorm finds vulnerabilities fast.
> Click the link to buy it, try it or download Hailstorm for FREE.
> http://www.cenzic.com/products_services/download_hailstorm.php?camp=701600000008bOW
> ------------------------------------------------------------------------
>
>

-- 
Andres Riancho
http://w3af.sourceforge.net/ Web App Attack and Audit Framework
http://www.securearg.net/ Secure from the source
------------------------------------------------------------------------
This List Sponsored by: Cenzic
Need to secure your web apps?
Cenzic Hailstorm finds vulnerabilities fast.
Click the link to buy it, try it or download Hailstorm for FREE.
http://www.cenzic.com/products_services/download_hailstorm.php?camp=701600000008bOW
------------------------------------------------------------------------


This archive was generated by hypermail 2.1.7 : Sat Apr 12 2008 - 10:57:23 EDT