Sunday, January 25, 2015

Managing and optimizing lists of password masks

I've been working on some password-cracking research on the side. I thought I'd come up with a cool new idea, but it turns out that someone else already thought of it.

It occurred to me last night that a big list of passwords could be abstracted out into their equivalent masks, and then a frequency count of those masks could be generated, which could then be exhausted in frequency order.
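Here's a minimal sketch of that idea, run on a handful of made-up passwords rather than a real breach list. Each password is reduced to a hashcat-style mask (?l lowercase, ?u uppercase, ?d digit, ?s punctuation or space, and anything else lumped into ?a), and identical masks are then counted and listed most-common first. The helper name `to_mask` and the sample passwords are illustrative assumptions.

```bash
#!/bin/bash
export LC_ALL=C   # byte-wise character classes, independent of the user's locale

# to_mask (hypothetical helper): passwords on stdin -> one hashcat-style mask per line.
# l = lowercase, u = uppercase, d = digit, s = punctuation/space, a = anything else.
to_mask() {
    tr '[:lower:]' 'l' \
        | tr '[:upper:]' 'u' \
        | tr '[:digit:]' 'd' \
        | tr '[:punct:] ' 's' \
        | sed 's/[^luds]/a/g; s/./?&/g'
}

# Count identical masks and print "count:mask", most frequent first.
# With these made-up inputs the first line is 2:?l?l?l?l?l?l?d?d.
printf '%s\n' monkey12 dragon99 'Summer!1' abc12345 letmein1 \
    | to_mask \
    | sort | uniq -c | sort -rn \
    | awk '{print $1 ":" $2}'
```

The order of the `tr` calls matters: lowercase letters are collapsed first, so the `u`, `d` and `s` placeholders written by later steps are never re-mapped.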

First, I extracted a frequency count of character-set combinations (masks) from all of the eight-character passwords in the RockYou breach's password list, yielding a list of the form:

100:hundredofthese
95:95ofthese
[...]
2:justtwoofthese
1:onlyoneofthese
1:alsoonlyoneofthese

... using the following script:

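#!/bin/bash

# Reduce every password to its character-class pattern:
#   l = lowercase, u = uppercase, d = digit, s = special, a = anything else.
# Keep only 8-character patterns. freqcount is not a standard utility, so
# sort | uniq -c | sort -rn does the counting and awk reshapes each line
# into "count:pattern".
echo "- Getting frequency of character patterns from RockYou ..."
time gunzip -c rockyou.txt.gz \
        | tr '[:lower:]' 'l' \
        | tr '[:upper:]' 'u' \
        | tr '[:digit:]' 'd' \
        | tr "[\ !\"#$%&\'()*+,-./:;<=>?@\[\\\]^_\`{|}~]" 's' \
        | sed 's/[^luds]/a/g' \
        | strings \
        | awk 'length($0) == 8' \
        | sort | uniq -c | sort -rn \
        | awk '{print $1 ":" $2}' \
        > rockyou.freq.8a
wc -l rockyou.freq.8a
head rockyou.freq.8a

echo "- Generate masks."
echo "- Ignoring all masks with four or more consecutive 'a' positions."
# Filter on the raw pattern before adding the '?' prefixes: after the sed,
# 'aaaa' would read '?a?a?a?a' and the filter would never match.
# Note that hashcat's ?a is ?l?u?d?s, so it cannot match the non-ASCII
# characters that were mapped to 'a' above.
time cut -d: -f2 rockyou.freq.8a \
        | grep -v 'aaaa' \
        | sed 's/./?&/g' \
        > rockyou.masks.8
wc -l rockyou.masks.8
head rockyou.masks.8

echo "- Done."
#end of script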

Next, I wrote a script that uses hashcat to exhaust each mask in turn, most frequent first:

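#!/bin/bash

# Run each mask in frequency order (most common first), one mask per line.
# -a 3 = mask (brute-force) attack; -m 1500 = descrypt, the traditional
# DES-based Unix crypt, which only uses the first 8 characters.
# Masks are read on fd 3 so hashcat's own keyboard input can't swallow them.
while read -r mymask <&3; do
        echo "- Running mask: $mymask ..."
        cudaHashcat64.bin -a 3 -m 1500 \
                target-hashes.list \
                "$mymask"
        echo "$mymask: done - $(date)" >> "$0.log"
done 3< rockyou.masks.8
#end of script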
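Since every mask in that loop is exhausted completely, it helps to know how big each one is before launching it. A mask's keyspace is the product of the sizes of its positions' character sets; with hashcat's built-in sets that is 26 for ?l and ?u, 10 for ?d, 33 for ?s and 95 for ?a. The sketch below (`mask_keyspace` is a hypothetical helper name) computes that in plain bash; dividing the result by your own measured hashcat speed gives a rough run time per mask.

```bash
#!/bin/bash
# mask_keyspace (hypothetical helper): number of candidates a mask expands to,
# using the sizes of hashcat's built-in charsets. Custom charsets are not handled.
mask_keyspace() {
    local mask=$1 total=1 i
    for (( i = 0; i < ${#mask}; i += 2 )); do
        case ${mask:i:2} in
            '?l'|'?u') (( total *= 26 )) ;;
            '?d')      (( total *= 10 )) ;;
            '?s')      (( total *= 33 )) ;;
            '?a')      (( total *= 95 )) ;;
            *) echo "unsupported token: ${mask:i:2}" >&2; return 1 ;;
        esac
    done
    echo "$total"
}

mask_keyspace '?l?l?l?l?d?d?d?d'   # prints 4569760000 (26^4 * 10^4)

# If the mask file exists, print "keyspace mask" in its frequency order.
if [ -r rockyou.masks.8 ]; then
    while read -r mymask; do
        echo "$(mask_keyspace "$mymask") $mymask"
    done < rockyou.masks.8
fi
```

Bash's 64-bit integers are enough here: the largest 8-position mask, ?a x 8, is 95^8, roughly 6.6 x 10^15.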

Then it occurred to me that if someone else had published this info, and had used real corpora of passwords as the input, then our frequency lists would probably look similar. So I did the following Google search:

"?l?l?l?d?d?d?d" "?l?l?l?l?l?d?d?d"

... and the first hit was the KoreLogic blog post.

Dangit! :-) But at least I'm catching up to the state of the art; the KoreLogic article was published in April 2014. :-)

I got the idea from some license-plate-collecting work I do on the side. I came up with it to capture high-level patterns in serials, so that people can search for a plate based on its serial: a plate with "BDT 606" on it would match any plate whose serial "mask" is "AAA 999" in my notation. (I then match more closely, but the mask is used for a high-level first pass.)
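The plate notation works the same way. A sketch of it, assuming letters map to A and digits to 9 while spaces and any other characters pass through unchanged; the helper name `plate_mask` and the sample serials are made up for illustration.

```bash
#!/bin/bash
export LC_ALL=C   # so [A-Za-z] means ASCII letters only

# plate_mask (hypothetical helper): letters -> A, digits -> 9,
# spaces and any other characters are left as they are (an assumption).
plate_mask() {
    printf '%s\n' "$1" | sed 's/[A-Za-z]/A/g; s/[0-9]/9/g'
}

# Coarse first pass: keep only the serials whose mask matches the target's.
target=$(plate_mask 'BDT 606')   # AAA 999
printf '%s\n' 'XYZ 123' 'AB 1234' '123 ABC' 'QRS 007' |
while IFS= read -r serial; do
    if [ "$(plate_mask "$serial")" = "$target" ]; then
        echo "$serial"
    fi
done
```

Only "XYZ 123" and "QRS 007" survive this pass; the closer matching would then run on that much smaller set.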

I haven't watched the KoreLogic presentation yet, but I can definitely improve upon my own approach, because I'm being overly aggressive in turning the entire set of non-alphanumeric-but-printable characters into 's':

        | tr "[\ !\"#$%&\'()*+,-./:;<=>?@\[\\\]^_\`{|}~]" 's' \

... when most folks use only the simple ones (#$%@, etc.). I could create a custom charset for those using the notation as noted here, and then turn the remaining special characters into a second custom charset.
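A rough sketch of that split, assuming #$%@ as the "common" set (taken from the example above, not measured) and everything else that is punctuation or a space as the second set. The helper name `to_mask_split` is hypothetical; ?1 and ?2 are hashcat's custom-charset placeholders.

```bash
#!/bin/bash
export LC_ALL=C   # byte-wise character classes

# to_mask_split (hypothetical helper): like the pattern script above, but splits
# the old 's' class into ?1 (assumed common specials #$%@) and ?2 (all other
# punctuation and space). Reads passwords on stdin, prints masks.
to_mask_split() {
    tr '[:lower:]' 'l' \
        | tr '[:upper:]' 'u' \
        | tr '[:digit:]' 'd' \
        | tr '#$%@' '1' \
        | tr '[:punct:] ' '2' \
        | sed 's/[^lud12]/a/g; s/./?&/g'
}

printf '%s\n' 'Pass@12!' 'hi#there' | to_mask_split
```

The digits are collapsed to `d` before the `1` and `2` placeholders are written, so the placeholders can't be mistaken for real digits. hashcat then needs the custom sets defined on its command line, for example `-1 '#$%@'` for ?1, plus a matching `-2` list of the remaining punctuation for ?2.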

I then found PACK (the Password Analysis and Cracking Kit), which is a set of Python scripts to manage masks, including optimizing a set of masks for a given timeframe (or, "I have 24 hours. Which masks should I use to maximize how many passwords I can crack?").
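To make the timeframe idea concrete, here is a simplified greedy version of it. This is an illustration only, not PACK's algorithm or options. It assumes one fixed cracking rate for every mask, ranks masks by passwords covered per candidate tried, and takes each mask in that order if it still fits in the remaining budget. The helper name `select_masks` and the sample numbers are made up.

```bash
#!/bin/bash
# select_masks (hypothetical helper): greedy mask selection under a time budget.
# Input: "count:pattern" lines on stdin, pattern letters l/u/d/s/a
#        (the rockyou.freq.8a format).
# $1 = time budget in seconds, $2 = assumed guesses per second.
# Output: hashcat masks, best value first.
select_masks() {
    local budget=$1 rate=$2
    awk -F: '
        BEGIN { size["l"] = 26; size["u"] = 26; size["d"] = 10
                size["s"] = 33; size["a"] = 95 }
        $2 == "" { next }                    # skip blank or malformed lines
        {
            ks = 1; mask = ""
            for (i = 1; i <= length($2); i++) {
                c = substr($2, i, 1)
                if (!(c in size)) next       # skip patterns with unknown letters
                ks *= size[c]; mask = mask "?" c
            }
            # passwords covered per candidate tried, the mask, its keyspace
            printf "%.6e %s %.0f\n", $1 / ks, mask, ks
        }' \
    | LC_ALL=C sort -k1,1gr \
    | awk -v budget="$budget" -v rate="$rate" '
        {
            t = $3 / rate                    # estimated seconds for this mask
            if (used + t <= budget) { used += t; print $2 }
        }'
}

# Three made-up patterns, a 60-second budget, an assumed 1e9 guesses/second.
printf '%s\n' '50:lllllldd' '10:llllllll' '5:dddddddd' | select_masks 60 1000000000
```

With those numbers it picks ?d?d?d?d?d?d?d?d (0.1 s) and then ?l?l?l?l?l?l?d?d (about 31 s), and skips ?l?l?l?l?l?l?l?l, whose roughly 209 s would blow the budget.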
