Showing posts with label soapboxes. Show all posts
Showing posts with label soapboxes. Show all posts

Tuesday, October 27, 2015

Hash filtering - an appeal to more than vanity

tl;dr: The major password-cracking projects should add support for general hash substring search (vanity hashes, partial collisions, etc.).

What is a vanity hash?

A vanity hash is a hash that contains an interesting substring. For example, my actual Bitcoin address is:

1AzVVHTijMGoWzPtUJoWkF5u6vyS1NcTzj

But it would be cooler if it was this:

1RoycehyNKfRC4nN5jKruZpjMKwNviQRb

That's a vanity hash. Finding one requires a brute-force search. The more characters you want, the harder it gets -- exponentially harder. But even just a few characters can be good enough.

Current examples of vanity hashes

We humans are vain creatures -- so when hashes are used as public identifiers, we want to personalize them:

More than vanity

Searching for a vanity hash is really just an attempt to find hashes with certain content. Other efforts seek hashes with specific content for other reasons - hash collisions, partial collisions, etc. More generally, these are searches for hash substrings.

Why hash substring search is interesting

Finding substrings in hashes isn't just about vanity. Bringing hash substring search into the cracking platforms would enable other interesting work:

  • Experimenting with collisions and partial collisions. This is a potential attack surface that is not adequately publicly explored. Nation-states are almost certainly capable of locating near-collisions for some hashes. When you check a download hash, how often is that check only a cursory visual check? True collisions for slow hashes are nigh-infeasible, of course -- but near-collisions may be interesting.
  • Demonstrating weaknesses. For example, short GPG key IDs can have collisions.
  • Filtering for hashes with broad properties (instead of just substrings). Hashes that contain only digits, or only letters, or letters and digits alternating, or only lower case (for hashes that distinguish case). Perhaps only a curiosity today, but they might be interesting in the future.
  • Automatic/loose discovery of similar strings. This would be a nice-to-have at some point, but searching by edit distance would complement substring searching nicely. A minimum or maximum edit distance could be specified to find strings that are "close enough". One of the most common measurements of edit distance is Levenshtein distance, which has been implemented in most languages, and implemented efficiently on GPU.
  • Working with non-contiguous substrings. Most vanity-hashing tools won't let you search for a hash that begins with "[aA][bB][lL][eE]" and ends with "[eE][lL][bB][aA]". But password crackers' mask syntax is perfect for this.
  • Unforeseeable innovation. Once this capability is available across all hash types, people will tinker with it, revealing new use cases. I've seen some CTFs that involve searching for specific substrings in hashes. I think they're on to something.

Calling them "vanity hashes" doesn't do justice to the potential. I think that the activity of hash substring search -- partial collisions, vanity hashes, etc. -- needs a better name.

Let's call it hash filtering.

Why the password crackers are the best place to do hash filtering

  • It's efficient. No one is better positioned to implement hash filtering than the password crackers -- and doing it anywhere else is a waste of resources. Each standalone vanity-search implementation reinvents the hash bruteforcing wheel - usually poorly: This rarely approaches the speeds that JtR and hashcat are capable of (though there are exceptions). The Bitcoin folks have an edge today because of FPGAs, but I expect this gap to close.
  • It's high leverage. If implemented as a general framework within password crackers, all current and future hash types automatically inherit substring search capability, with little additional effort.

How the password crackers might respond

  • Sound great! That would be awesome, but not likely to be the first reaction. So ...
  • OK -- but it should be done as a separate executable. This is a waste of time. The best features of professional-grade password crackers -- core brute-force power, Markov, forking/parallelism, session management, masks, device selection, wordlist management, etc. -- would either be unnecessarily duplicated, left out, or suffer from bit rot.
  • No. It will slow down cracking too much. Not necessarily. Hash filtering should be implemented as an early pre-optimization step, so that you only have to take the inevitable speed hit if you want hash filtering. Regular cracking would be unaffected.
  • No. We're not interested in vanity hashing as a concept. That would be a mistake. Vanity hashing is already driving interest, contribution -- and even innovation (like entirely new S-boxes).
  • No. It might attract noobs. Unavoidable -- but this should be offset by an influx of insight and talent. Also, we have to feed a certain number of noobs to Xanadrel, the Sarlacc of #hashcat. ;)
  • No. It adds complexity. Technically true, but it should be a largely one-time cost. Also, I think that implementing Levenshtein distance on GPU would be kinda fun if I was atom or Sayantan. ;)
  • No. Hash filtering itself will be too slow. By its nature, it will, indeed, be much slower than password cracking. But if it can be pre-optimized (see previous point), then it can take advantage of the rest of the cracking support structure, so that hash filtering will be faster, more portable, and easier to use than it would be anywhere else.

Implementation suggestions

  • Allow the user to specify a "hash filter", using existing mask syntax. This would automatically enable specifying one or more desired substrings anywhere in the hash -- beginning, end, or middle -- as well as both non-contiguous strings and loose matching by character set.
  • On the command line, consider long options with names like --hash-filter [mask] and --edit-distance [integer]. Or name it something else -- but keep it consistent across projects.
  • Implement in the pre-optimization pass, so that the slowdown introduced by examining each hash will only kick in when hash filtering is explicitly requested.
  • Add simple sanity checking. If the user supplies a mask that is longer than the target hash, warn rather than silently truncating. Also, warn if the user wants to filter for a mask that isn't possible (for example, if their descrypt filter's last character is not within [.26AEIMQUYcgkosw]).

Potential bounty?

If this proposal is not obviously compelling, I will consider setting up a bounty (or charitable donation of the winner's choosing). The bounty would go to each major natively Linux-based project (John the Ripper or hashcat) that incorporates hash filtering. Edit distance would be optional.

If you want to go in on a bounty, contact me and I'll do the Bountysource setup.

The ask

Bring vanity hashing and its kin into the fold -- generalize it into hash filtering that can be applied to all current and future hashes. I think that the results will bear interesting fruit.

Updated: to use a valid example Bitcoin vanity.

Wednesday, March 18, 2009

Pidgin episode #8726

... in which I argue that frecency, while good for digging into a list of URLs, is not a good metaphor for presenting IM status.

Pidgin Trac #8726 (new enhancement): User selection of saved statuses for the quick/popular list

In a nutshell: having a predictable, configurable list of saved IM statuses is better than an algorithm that tries to guess based on what your most-often-and-recently-used ones are.

I also am insufferably pleased with myself for coining the phrase power user limbo: a state in which a user has graduated from novice in a particular portion of the interface, but is held back from transitioning to full power usage by a mismatch between the presented and implied interface design metaphors.
Populating the list using frecency data without allowing the power user to control the content leaves them in a kind of 'power user limbo': the user wants to use saved statuses for the purpose that they were designed for (ease of changing statuses), but they are subtly constrained in a way that contradicts that purpose.
Given how rare original thoughts are ("Failure, Mr. Jones, is hardly original!" - Bloom County), there's probably a much better phrase for this concept. But I like the visual that accompanies power user limbo, so I'm stickin' with it.

You can go back to whatever you were doing now.

Wednesday, January 09, 2008

Fighting the inevitable absorption of double vowels

Though amoeba has clearly lost to ameba, and anaesthesia is in a similar state, and hardly anyone on this side of the Atlantic spells it encyclopaedia anymore ...

They'll have to pry the second 'a' in 'archaeology' out of my cold, dead hands. I've seen people trying it. They should stop. Really.

The day that Archaeology Magazine drops it, I will. Not a moment before.

Sunday, July 29, 2007

Why I Don't Shop at Schuck's

Schucks seems to have some long-term contract with my local newspaper to wrap an ad around the comics. Every single Sunday, before I can get my daily dose of Dilbert, I have to put down my spoon and remove it. It's the analog equivalent of a pop-up ad.


Do not get between me and my comics. Those mad mama moose got nothin' on me.


Update 2007-09-23: If you can't see the image, it's because my innate foresight has enabled me to select a filename that raises the hackles of most ad blockers, because it has the (quite descriptive) string "-ad-" in it. Since I haven't the faintest idea how to change an image's filename yet, please temporarily turn off your ad blocker to see it until I grow some clue.

Tuesday, June 21, 2005

Internet Groping Considered Harmful

I love it when two of my interests collide. In this case, it's three of them: computing, etymology ... and righteous indignation.

Computer word origins fascinate me. Usually, ferreting out the various etymologies is a positive experience -- like when I found out that DEL is ASCII 127 because punching out all of the bit positions is the only "final" state of each character, and therefore the only way to delete. On more than one occasion, I've gotten lost in Eric S. Raymond's Jargon File for hours, digging up the stories behind words like kludge, crash, wedged and GECOS. This particular quest, though, sent me on a side trip that "... has as much sex appeal as a road accident." (with apologies to Douglas Adams)

Multiple experts on networking -- even Cisco -- claim that ping got its name as an acronym for "Packet InterNet Groper." This always sounded fishy to me: a little hard to say, a little cheesy ... just not very appealing.

It turns out that ping was actually named after the sound of sonar by its late author, Michael Muuss. The "Packet InterNet Groper" acronym is actually a backronym that was probably coined by David Mills.

Michael's page about ping is a fascinating read (and you can even get the original source code) ... and the real explanation is sooooo much more elegant than all of this pawing around in the dark. Every time I see some web site touting the Groping Theory, I long for megaphone the size of New Jersey into which I would yell "COME OOONNNN, PEOPLE!!!" (I may be channeling for Lynne Truss just a tad).

So spread the word! Ask your geek friends what "ping" stands for, and gently steer them in the right direction. Michael's own description says it best:

I named it after the sound that a sonar makes, inspired by the whole principle of echo-location. In college I'd done a lot of modeling of sonar and radar systems, so the "Cyberspace" analogy seemed very apt. It's exactly the same paradigm applied to a new problem domain: ping uses timed IP/ICMP ECHO_REQUEST and ECHO_REPLY packets to probe the "distance" to the target machine.

Thank you, Michael! I don't know if you were ever a gamer, but wherever you are now, I hope that you're always guaranteed the lowest ping times at St. Peter's LAN parties.