Re: Optimal wildcard search algorithm

From: Sir Mordred (sir.mordred.lists@gmail.com)
Date: Tue Nov 28 2006 - 04:43:06 EST


Hello Tim,

Do I understand correctly, that the wildcard test you describe only
gives an exists/doesn't exist answer? How many tests can you afford
per second?

> One approach to finding all usernames would be a kind of breadth-first
> search based on the character set and character position. Given a
> character set of [a-z0-9], we could try the following:

> a*
> b*
> c*
> ...

> and determine which characters exist in the first position. From there,
> each second letter would be tried for each successfully identified first
> letter, and so on.

What you can try is to analyse some sample username lists for
ways to equalise their distribution. For example, you wouldn't want to
start with a*, better jump directly to aa*, ab*, ..., az*

You might try taking samples from
http://www.galbithink.org/names/agnames.htm
http://blogs.parc.com/playon/archives/2006/06/naming_patterns.html
or similar, some username/password combo lists, etc.

You can also test for suffixes (*a, *b, ..) hoping for a better
distribution.

> Let us assume for now that the only wildcard character is the '*' or '%'
> kind (and not the single-character kind), and that the string we're
> searching for could be of any length.

If you're allowed single-char wildcards, you could do more interesting
searches - tests for certain username lengths being the most
important. You can also walk the search space based on the *second*
letter of the username _a%, _b% etc, which will (I guess) be more
equally distributed than the first letter.

I don't know if searching for substrings in the middle hoping they
would prune the search tree will be helpful - but you can analyze
those sample lists and see if particular patterns come up. If there is
a good set of 2 or 3 character strings that have near-zero frequencies
in the sample lists, that's the way to go.

Mordred

------------------------------------------------------------------------
This List Sponsored by: Cenzic

Need to secure your web apps?
Cenzic Hailstorm finds vulnerabilities fast.
Click the link to buy it, try it or download Hailstorm for FREE.
http://www.cenzic.com/products_services/download_hailstorm.php?camp=701600000008bOW
------------------------------------------------------------------------



This archive was generated by hypermail 2.1.7 : Sat Apr 12 2008 - 10:57:23 EDT