In this post I will share some methods and scripts that helped me set and analyze the results of password cracking experiments using John the Ripper (john). During my Masters, I built a password guesser that learns the linguistic patterns of passwords used as training data. To prove these patterns pose a significant threat, I set up a number of guessing experiments (offline attacks) comparing the performance of my guesser with that of other guessers. In such experiments, the guesses are piped to john, which encodes them using the same hashing function used to encode the target hashes and checks how many passwords match. I was mainly interested in generating charts depicting the number of passwords guessed as a function of the number of guesses tried. It sounds simple, but john wasn’t really built with this academic use case in mind, so I had to use a few tweaks and hacks to get the job done.
In the examples below, I will be using JtR’s community enhanced version (1.8.0-jumbo-1), which has a few extra valuable features. You’ll find the scripts used here to generate charts and analysis at https://github.com/rafaveguim/jtr-lab-tools.
Charting
Let’s use as target the Battlefield hashes, and as wordlist the RockYou passwords.
If you run john in the conventional way, it prints a status line only when a key
is pressed in your keyboard. The status line contains what you need to generate a chart
of the cracking session: # of guesses tried, # of passwords cracked, time, etc.
To enable automatic output of the status line every n-th guess, you need to invoke
the external module AutoStatus. External modules are custom functions called by john,
and can be defined in john.conf, the global configuration file.
Download my custom john.conf and edit the variable interval
in [List.External:AutoStatus]
:
With this set, call john passing “AutoStatus” as value of the parameter --external
:
./john --wordlist=rockyou.txt \
--rules \
--pot=rockyou_vs_bfield.pot \
--conf=john.conf \
--external=AutoStatus \
--format=Raw-MD5 \
bfield.hash 2> rockyou_vs_bfield.log
You get something like the following in output.log:
Press 'q' or Ctrl-C to abort, almost any other key for status
76260g 0:00:00:00 6.16% (ETA: 16:28:30) 143886g/s 1886Kp/s 1886Kc/s 366946MC/s cr4igh1ll..cprocks
88218g 0:00:00:00 12.71% (ETA: 16:28:30) 114568g/s 2597Kp/s 2597Kc/s 707745MC/s chelline..chellelynn
93495g 0:00:00:00 19.45% (ETA: 16:28:30) 96386g/s 3092Kp/s 3092Kc/s 1039GC/s vader125..vader02
96605g 0:00:00:01 26.67% (ETA: 16:28:33) 82568g/s 3418Kp/s 3418Kc/s 1368GC/s sharper3..sharpeiformulauno
99468g 0:00:00:01 33.84% (ETA: 16:28:32) 73138g/s 3676Kp/s 3676Kc/s 1694GC/s persian81..persian05
102037g 0:00:00:01 40.98% (ETA: 16:28:32) 65830g/s 3870Kp/s 3870Kc/s 2016GC/s mfaeez..mfac1640
104402g 0:00:00:01 48.17% (ETA: 16:28:32) 60001g/s 4022Kp/s 4022Kc/s 2337GC/s koukls..kouklaronaki86
106433g 0:00:00:01 55.18% (ETA: 16:28:31) 55433g/s 4166Kp/s 4166Kc/s 2655GC/s im14yrzold..i@m13inmay
109516g 0:00:00:02 62.45% (ETA: 16:28:33) 51903g/s 4265Kp/s 4265Kc/s 1485GC/s evazarate..evaylilian
112001g 0:00:00:02 69.61% (ETA: 16:28:32) 48696g/s 4347Kp/s 4347Kc/s 1642GC/s choclateice..CHOCLATE8
114602g 0:00:00:02 76.80% (ETA: 16:28:32) 46024g/s 4417Kp/s 4417Kc/s 1797GC/s army2id..army2528
116371g 0:00:00:02 83.67% (ETA: 16:28:32) 43748g/s 4511Kp/s 4511Kc/s 1951GC/s 6969kevin..6969it
118104g 0:00:00:02 90.19% (ETA: 16:28:32) 41585g/s 4577Kp/s 4577Kc/s 2104GC/s 2006slanger..2006SENIOR
119389g 0:00:00:03 97.61% (ETA: 16:28:33) 39796g/s 4666Kp/s 4666Kc/s 1504GC/s 066109573..066100732
119641g 0:00:00:03 DONE (2016-01-06 16:28) 38971g/s 4667Kp/s 4667Kc/s 1537GC/s ^^&\*()(..\`\`\`
Warning: passwords printed above might not be all those cracked
Use the "--show" option to display all of the cracked passwords reliably
Session completed
Other times you have, instead of a wordlist, a guess generator that will output as many guesses as possible; for instance, john’s incremental mode. In addition to printing status lines at regular intervals, in this case you also want to set a cap in the number of guesses of a john session. I use the AutoStatusAbort external:
Let’s test it:
./john --incremental \
--pot=inc_vs_bfield.pot \
--conf=john.conf \
--external=AutoStatusAbort \
--format=Raw-MD5 \
bfield.hash 2> inc_vs_bfield.log
The next step is to produce a nice chart of your attack’s results. Let’s compare the two attacks we just ran:
python chart_logs.py -t 423623 -l 4 rockyou_vs_bfield.log inc_vs_bfield.log
Run python chart_logs --help
to see all parameters.
Automation and Reproducibility
Once the number of guessing experiments starts to grow, it’s challenging to keep a record of experiment variations and results. I keep things organized by encapsulating each experiment in a shell script accompanied by a john.conf file. My shell scripts follow the template below:
Note that, for the sake of reproducibility, it’s important that each experiment have its own john.conf file. In addition, the parameter –pot (only available in the Jumbo version) tells john to use a local, session-exclusive .pot file, which stores the plain-text cracked passwords and is used by john to “remember” which passwords have been cracked. By default, john uses a global .pot file persisted across sessions. If you’re using the main version, which doesn’t have –pot, add a line to remove the global .pot file before each session.
Analyzing Password Composition
Finally, after the performance of each guessing attack has been compared, I
turn my attention to the nature of the passwords cracked by each attack.
For this purpose, I’ve written the script composition.py
. It draws a chart
comparing the number of passwords cracked by each attack that match certain
composition requirements. These requirements are often used by websites to ensure
the creation of strong passwords, such as minimum length, and minimum number of
capital letters, numbers, and special characters. So the following analysis would
answer questions like: “If my website enforced password
creation policy X, how many passwords would be cracked under attack Y?”.
The input for composition.py is a list of .pot files and a list of password policies, each of which describing a minimum requirement. A policy is described as a simplified string pattern; for instance:
- ?????? - matches any password with 6 or more characters
- ??????123 - matches any password with at least 9 characters, at lest 3 of which must be numbers
- aaaaaa - any password with 6 or more lowercase alphabetic character
- ??11%%EE - minimum length 8, with two lowercase, two capital letters and two symbols
So, in the pattern, ? is a wildcard, % matches a symbol, and any digit, uppercase, or lowercase letter matches any digit, uppercase and lowercase letter, respectively. Let’s look into the passwords cracked by both attacks:
python composition.py \
rockyou_vs_bfield.pot inc_vs_bfield.pot \
-p abcdef 123456 aaaaa1 ????1% A?????
In this chart, we can see that, for all four kinds of passwords examined, the (mangled) RockYou guesses were more effective. The difference was a bit tighter for passwords with 6+ digits (e.g., 123456). Note that these are not counts for exact matches, as in a regular expression. A pattern here defines the minimum requirement for a password to be accepted, so “john111111” matches 123456, ??, aaaa, etc.
Final Thoughts
If you need more information, check Matt Weir’s blog and subscribe to john’s users mailing list, the folks there were extremely helpful when I needed. Drop me a line if you think I can help in any way.