tl;dr: The major password-cracking projects should add support for general hash substring search (vanity hashes, partial collisions, etc.).
What is a vanity hash?
A vanity hash is a hash that contains an interesting substring. For example, my actual Bitcoin address is:
1AzVVHTijMGoWzPtUJoWkF5u6vyS1NcTzj
But it would be cooler if it was this:
1RoycehyNKfRC4nN5jKruZpjMKwNviQRb
That's a vanity hash. Finding one requires a brute-force search. The more characters you want, the harder it gets -- exponentially harder. But even just a few characters can be good enough.
Current examples of vanity hashes
We humans are vain creatures -- so when hashes are used as public identifiers, we want to personalize them:
More than vanity
Searching for a vanity hash is really just an attempt to find hashes with certain content. Other efforts seek hashes with specific content for other reasons - hash collisions, partial collisions, etc. More generally, these are searches for hash substrings.
Why hash substring search is interesting
Finding substrings in hashes isn't just about vanity. Bringing hash substring search into the cracking platforms would enable other interesting work:
- Experimenting with collisions and partial collisions. This is a potential attack surface that is not adequately publicly explored. Nation-states are almost certainly capable of locating near-collisions for some hashes. When you check a download hash, how often is that check only a cursory visual check? True collisions for slow hashes are nigh-infeasible, of course -- but near-collisions may be interesting.
- Demonstrating weaknesses. For example, short GPG key IDs can have collisions.
- Filtering for hashes with broad properties (instead of just substrings). Hashes that contain only digits, or only letters, or letters and digits alternating, or only lower case (for hashes that distinguish case). Perhaps only a curiosity today, but they might be interesting in the future.
- Automatic/loose discovery of similar strings. This would be a nice-to-have at some point, but searching by edit distance would complement substring searching nicely. A minimum or maximum edit distance could be specified to find strings that are "close enough". One of the most common measurements of edit distance is Levenshtein distance, which has been implemented in most languages, and implemented efficiently on GPU.
- Working with non-contiguous substrings. Most vanity-hashing tools won't let you search for a hash that begins with "[aA][bB][lL][eE]" and ends with "[eE][lL][bB][aA]". But password crackers' mask syntax is perfect for this.
- Unforeseeable innovation. Once this capability is available across all hash types, people will tinker with it, revealing new use cases. I've seen some CTFs that involve searching for specific substrings in hashes. I think they're on to something.
Calling them "vanity hashes" doesn't do justice to the potential. I think that the activity of hash substring search -- partial collisions, vanity hashes, etc. -- needs a better name.
Let's call it hash filtering.
Why the password crackers are the best place to do hash filtering
- It's efficient. No one is better positioned to implement hash filtering than the password crackers -- and doing it anywhere else is a waste of resources. Each standalone vanity-search implementation reinvents the hash bruteforcing wheel - usually poorly: This rarely approaches the speeds that JtR and hashcat are capable of (though there are exceptions). The Bitcoin folks have an edge today because of FPGAs, but I expect this gap to close.
- It's high leverage. If implemented as a general framework within password crackers, all current and future hash types automatically inherit substring search capability, with little additional effort.
How the password crackers might respond
- Sound great! That would be awesome, but not likely to be the first reaction. So ...
- OK -- but it should be done as a separate executable. This is a waste of time. The best features of professional-grade password crackers -- core brute-force power, Markov, forking/parallelism, session management, masks, device selection, wordlist management, etc. -- would either be unnecessarily duplicated, left out, or suffer from bit rot.
- No. It will slow down cracking too much. Not necessarily. Hash filtering should be implemented as an early pre-optimization step, so that you only have to take the inevitable speed hit if you want hash filtering. Regular cracking would be unaffected.
- No. We're not interested in vanity hashing as a concept. That would be a mistake. Vanity hashing is already driving interest, contribution -- and even innovation (like entirely new S-boxes).
- No. It might attract noobs. Unavoidable -- but this should be offset by an influx of insight and talent. Also, we have to feed a certain number of noobs to Xanadrel, the Sarlacc of #hashcat. ;)
- No. It adds complexity. Technically true, but it should be a largely one-time cost. Also, I think that implementing Levenshtein distance on GPU would be kinda fun if I was atom or Sayantan. ;)
- No. Hash filtering itself will be too slow. By its nature, it will, indeed, be much slower than password cracking. But if it can be pre-optimized (see previous point), then it can take advantage of the rest of the cracking support structure, so that hash filtering will be faster, more portable, and easier to use than it would be anywhere else.
Implementation suggestions
- Allow the user to specify a "hash filter", using existing mask syntax. This would automatically enable specifying one or more desired substrings anywhere in the hash -- beginning, end, or middle -- as well as both non-contiguous strings and loose matching by character set.
- On the command line, consider long options with names like --hash-filter [mask] and --edit-distance [integer]. Or name it something else -- but keep it consistent across projects.
- Implement in the pre-optimization pass, so that the slowdown introduced by examining each hash will only kick in when hash filtering is explicitly requested.
- Add simple sanity checking. If the user supplies a mask that is longer than the target hash, warn rather than silently truncating. Also, warn if the user wants to filter for a mask that isn't possible (for example, if their descrypt filter's last character is not within [.26AEIMQUYcgkosw]).
Potential bounty?
If this proposal is not obviously compelling, I will consider setting up a bounty (or charitable donation of the winner's choosing). The bounty would go to each major natively Linux-based project (John the Ripper or hashcat) that incorporates hash filtering. Edit distance would be optional.
If you want to go in on a bounty, contact me and I'll do the Bountysource setup.
The ask
Bring vanity hashing and its kin into the fold -- generalize it into hash filtering that can be applied to all current and future hashes. I think that the results will bear interesting fruit.
Updated: to use a valid example Bitcoin vanity.