Hacking Linux Exposed

Cracking an algorithm bit by bit conclusion.
By Bri Hatch.

Summary: We complete our reverse engineering of a terribly-lame encryption algorithm.

Last week we stripped off the outer layer of our home-grown crypto algorithm through some intuition. The data was made pure ASCII using a modified uuencode system.

(For those of you keeping score, David Nash was the winner of the contest, having written a perl script earily similar to the one at the end of this newsletter. One copy of HLEv2 is coming his way.)

Again, here are the sample strings you had to work with:

  # String 1
  !!@!1P!=P!?P!=P!?`!>`!<0!?0!;0!?0!=@!B`!,`!>@!A0!,P!>@!B@!A`

  # String 2
  !T`!M0!TP!V0!X0!Y0!C@!P0!Y0!W0!UP!Y@!H@

  # String 3
  !8@!>@!E`!EP!H`!GP!I0!GP!60!A@!I`!J@!L@!M@!7P!A0!N0!L@!L@!MP!J@!J@

  # String 4
  !^`!P0![0![0!IP!]0!H@!Z`!Y0!^0!I@!``![0!]0!]@!^@!`P!K0!`0!_0!_P!"`!P`

  # String 5
  !S@!O0!W`!SP!BP!V0!X@!X@!XP!D`!G@!D@!YP!V0![0!Z@!EP!X0![`!F@!W0!X0!\`!\@!K0

Using the following perl script, you can reformat and uudecode the output to it's original binary state:


  prompt$ cat decrypt.pl
  #!/usr/bin/perl
  use strict;

  my $encrypted =
     '!!@!1P!=P!?P!=P!?`!>`!<0!?0!;0!?0!=@!B`!,`!>@!A0!,P!>@!B@!A`';

  # Add the backticks again
  $encrypted =~ s/(...)/\1``/g;

  print unpack( "u", $encrypted );

  prompt$ perl decrypt.pl
  Gww|xq}m}v_0z_3z__b

(Note that, since some letters will be outside the printable range, I'll show them as underscores for the rest of this article.)

Using this first string, we have some printable characters as our output, but other encrypted strings yielded very ugly results, outside the printable ASCII range. So instead, let's see the actual numerical values of those characters:


  prompt$ cat decrypt.pl
  #/usr/bin/perl
  use strict;

  my $encrypted =
     '!!@!1P!=P!?P!=P!?`!>`!<0!?0!;0!?0!=@!B`!,`!>@!A0!,P!>@!B@!A`';

  # Add the backticks again
  $encrypted =~ s/(...)/\1``/g;

  my $uudecoded = unpack( "u", $encrypted);

  # Loop through the letters one by one
  for my $char ( split //, $uudecoded ) {
          print ord($char), " ";
  }
  print "\n";

  prompt$ perl decrypt.pl
  6 71 119 127 119 124 120 113 125 109 125 118 136 48 122 133 51 122 138 132

Hmmn. Now most of those numbers seem to come from a very small range, 109 -> 138, and the difference (29) is pretty close to the size of the English alphabet (26). What if they are all shifted over from the actual letters? Let's see what happens if we ASCII-ify our encrypted string, and try to bring the letters back into a proper range:


  prompt$ cat decrypt.pl
  #!/usr/bin/perl
  ...
  # Loop through the letters one by one
  for ( my $offset=0; $offset < 50; $offset++ ) {
          for my $char ( split //, $uudecoded ) {
                  print chr( ord($char) - $offset ), " ";
          }
          print "\n";
  }

  prompt$ perl decrypt.pl
  _ G w _ w | x q } m } v _ 0 z _ 3 z _ _
  _ F v ~ v { w p | l | u _ / y _ 2 y _ _
  _ E u } u z v o { k { t _ . x _ 1 x _ _
  _ D t | t y u n z j z s _ - w _ 0 w _ _
  _ C s { s x t m y i y r _ , v _ / v _ _
  _ B r z r w s l x h x q _ + u _ . u _
  _ A q y q v r k w g w p _ * t _ _ t _ ~
  ...

Well, this certainly looks more like normal letters here, but it's definitely not the answer. Note how the beginning begins with a capital, which could make sense.

Then I tried to figure out if it was a substitution cipher from this point. The garbage characters toward the end, with two printable characters in the middle may be spaces. If that were the case, then the two letter word might be something like "an", "if", or "is". Unfortunately, my attempts to come up with a direct substitution cipher didn't get anywhere.

When comparing the output of this decryption stage against the other strings I had to work with, I noticed that subsequent characters tended to get larger on the whole. Perhaps there was a bit more manipulation going on here. So I tweaked my loop again, adding a $count variable which would be subtracted from each ASCII value, with $count growing by one each time:

  prompt$ cat decrypt.pl
  #!/usr/bin/perl
  ...
  # Loop through the letters one by one
  for ( my $offset=0; $offset < 50; $offset++ ) {
	  my $count=0;
          for my $char ( split //, $uudecoded ) {
                  print chr( ord($char) - $offset - $count), " ";
		  $count++;
          }
          print "\n";
  }

  prompt$ perl decrypt.pl
  _ F u | s w r j u d s k | # l v # i x q
  _ E t { r v q i t c r j { " k u " h w p
  _ D s z q u p h s b q i z ! j t ! g v o
  _ C r y p t o g r a p h y   i s   f u n
  _ B q x o s n f q ` o g x  h r  e t m
  _ A p w n r m e p _ n f w  g q  d s l
  _ @ o v m q l d o ^ m e v  f p  c r k
  ...

Pay dirt! Look at the fourth line, "_Cryptography is fun" -- seems like we've figured out the encoding. This occurred when $offset was set to 3. So it would seem that the algorithm to encrypt was something like this:

Take the ASCII value of the character to be 'encrypted'.
Add 3 ($offset)
Add the position of this character ($count)
Uuencode
Strip the last two characters of the uuencoded value (``)

I ran the same program, this time with the next encrypted string as input, and got the following:

  # Using string 2 this time
  prompt$ perl decrypt.pl
  Ð ´ Ñ Ö Ý à _ º Ý Ô Í Û _
  Ï ³ Ð Õ Ü ß _ ¹ Ü Ó Ì Ú _
  Î ² Ï Ô Û Þ _ ¸ Û Ò Ë Ù _
  Í ± Î Ó Ú Ý _ · Ú Ñ Ê Ø _
  Ì ° Í Ò Ù Ü _ ¶ Ù Ð É × _
  Ë ¯ Ì Ñ Ø Û _ µ Ø Ï È Ö _
  Ê ® Ë Ð × Ú _ ´ × Î Ç Õ _
  É  Ê Ï Ö Ù _ ³ Ö Í Æ Ô _
  È ¬ É Î Õ Ø _ ² Õ Ì Å Ó _
  Ç « È Í Ô × _ ± Ô Ë Ä Ò _
  ...

Drat. I was hoping to see text on the 4th line, but was just getting garbage. Well, not exactly garbage, there were still unprintable characters in the same place as if they'd be word breaks, while the rest was filled with bytes with the high bit set. I compared the ASCII values of this set against the first set. This second string was offset much more than 3 characters. I tweaked the script to go from an $offset of 0 to 256 to see if anywhere there was a readable string. I also prefixed the $offset value so I could keep better track of things:

  # Using string 2
  prompt$ perl decrypt.pl
  0: Ð ´ Ñ Ö Ý à _ º Ý Ô Í Û _
  1: Ï ³ Ð Õ Ü ß _ ¹ Ü Ó Ì Ú _
  2: Î ² Ï Ô Û Þ _ ¸ Û Ò Ë Ù _
  3: Í ± Î Ó Ú Ý _ · Ú Ñ Ê Ø _
  4: Ì ° Í Ò Ù Ü _ ¶ Ù Ð É × _
  ...
  101: k O l q x { # U x o h v 1
  102: j N k p w z " T w n g u 0
  103: i M j o v y ! S v m f t /
  104: h L i n u x   R u l e s .
  105: g K h m t w  Q t k d r -
  106: f J g l s v  P s j c q ,
  ...

Aha! Look at line #104 - "hLinux Rules.". We're definitely onto something.

So, it seems our algorithm was correct, but the offset kept changing. Running the rest of the encrypted strings also yields readable output with different values for $offset. Although, for some reason, one of them was not quite readable:

  # Using string 4 this time
  prompt$ perl decrypt.pl
  ...
  124: | D o n ' t   e a t   _ e l l o _ _ _ n o _ .
  ...

Hmmn - those underscores (originally unprintable characters) indicated I didn't quite have my algorithm tweaked correctly. Looking at the raw ASCII values, I realized I'd not taken into account the fact that when you take an original ASCII value and add or subtract increasing numbers, eventually you'll go beyond the bounds of an 8 bit number. So, tweaking a bit further I added the following and re-ran it:


  prompt$ cat decrypt.pl
  #!/usr/bin/perl
  ...
          for my $char ( split //, $uudecoded ) {
		  my $ascii = ord($char) - $offset - $count;
		  $ascii += 256 while $ascii < 0;
                  print chr( $ascii ), " ";
		  $count++;
          }
          print "\n";
  }

  prompt$ perl decrypt.pl
  ...
  124: | D o n ' t   e a t   y e l l o w   s n o w .
  ...

Excellent. Now we can decrypt the strings correctly. Unfortunately, there's still one nagging problem: we have no way to find the right $offset value, and need to try many of them and either find the right one using our eyes, or write something to check and see when the results match our expectations.

I tried seeing if the offset was related to the size of the message, any of the other related fields in the database, or the message itself.[1] Then it struck me how blind I was - the offset must be stored in the mystery first character.

Converting the initial character of each string to it's uudecoded ascii value you get:

  String 1: !!@`` == 6
  String 2: !T``` == 208
  String 3: !8@`` == 98
  String 4: !^``` == 248
  String 5: !S@`` == 206

The first thing that jumps at you is that each value is even. The next thing that jumps out is that if you divide them by two, you get the offset that we need. The first message had offset 3, the 4th had offset 124, etc. Problem solved in full.

So what problems were built into this custom crypto?

There was a random offset, but this offset was contained within the encrypted strings barely obfuscated.
The randomization (adding the character number to the ascii value before uuencoding) was extremely week, and pretty easy to guess.
The uuencoding was used to make the encrypted version pure printable text, but made the output regular and led to some correct assumptions about the plaintext and the method of creating the ciphertext.
The random initializer, in order to only occupy one byte, needed to be between 0 and 127, which is only 7 bits of 'security'.[2]

I whipped up a set of perl subroutines to encrypt/decrypt text that was consistent with the algorithm that was in use, and presented them to the client who was relying on this home-grown cryptography to protect their sensitive data. After the panic and four letter words subsided, they quickly petitioned to create a committee to asses the feasibility of having new cryptographic security systems investigated to replace their bad algorithm.[3]

Later they showed me their encryption/decryption C code. It was many pages long, not counting the pages it took to re-implement a one-byte uuencode/decode algorithm. The amount of mathematics and manipulation that went into the algorithm were far more advanced than the simple solution I had come across, involving several independent[4] variables taken together as per-message keys. Their algorithm did require the knowledge of a secret key, only available on their server. However the way in which these values were (mis)used, ended up cancelling out the need for this key, and everything reduced to the much simpler algorithm which I'd reverse engineered.

This, of course, leads us to another tenant of cryptography:

Complexity does not guarantee security

So, thus ends our tale of one bad crypto system. Unfortunately, there are many many more out there, waiting to be broken.[5]

Here are the encryption/decryption routines for this bad crypto. Unfortunately, I can't provide the original C sources that were used for comparison. Feel free to use this as a learning/investigative tool, but for goodness sake don't use them for anything that should be secret.



 #!/usr/bin/perl
 # badcrypto.pl
 # copyright 2003, Bri Hatch, released under the GPL.
 #
 # Reverse-engineered algorithms.  Don't use these - they
 # are excellent examples of cryptography done poorly.
 #
 # Usage:  require "VeryBadCrypto.pl";
 #         $encrypted = VeryBadCrypto::encrypt( $string );
 #         $decrypted = VeryBadCrypto::decrypt( $encrypted );
 
 package VeryBadCrypto;
 
 sub encrypt {
	my($message) = join "", @_;
	my $encrypted;

	# Pick random initializer.  Must be less than 128
	my $count = int rand(127);

	# Break up message into individual characters
	my @characters = (chr($count), split //, $message);

	for my $char ( @characters ) {

		my $ascii = $count + ord($char);

		# convert to a 0-255 number
		$ascii = $ascii % 256;

		my $uuencoded = pack "u", chr($ascii);

		# Strip the trailing `` chars we have from uuencoding
		$uuencoded = substr($uuencoded,0,3);

		$encrypted .= $uuencoded;

		$count++;

		# Avoid potential problems when integers wrap around
		$count %= 256;
	}
	$encrypted
 }

 sub decrypt {
	my($encrypted) = join "", @_;
	my $decrypted;

	# Put back trailing "``" characters
	$encrypted =~ s/(...)/\1``/g;

	my $uudecoded = unpack "u", $encrypted;
	my @characters = split //, $uudecoded;

	my $initializer = shift @characters;

	# Find original initializer value
	$initializer = ord($initializer) / 2;

	# add one to compensate since the first one was used
	# to encode the initializer itself.
	$initializer++;
	
	# Ok, we've got our initializer, time to decrypt the rest.
	for my $char ( @characters ) {
		my $ascii = ord($char);

		$ascii -= $initializer;
		$ascii += 256 while $ascii < 0;

		$decrypted .= chr($ascii);

		$initializer++;
	}
	$decrypted;
 }

NOTES:

[1] Which was pretty foolish, since the message wouldn't be available to the decrypt routine, but I wasn't thinking clearly.

[2] If you could even call this algorithm secure.

[3] Ahh, now if that isn't a definitive step, I don't know what is.

[4] But, by their error, these other values were not independent, which is why they ended up being irrelevant.

[5] And if you live in the US, you'd better hope they're not being used to protect copyrighted works, or your knowledge of rot13 could land you in jail for DMCA violations.

Bri Hatch is Chief Hacker at Onsight, Inc and author of Hacking Linux Exposed and Building Linux VPNs. He developed his first (horribly insecure) cryptographic algorithm when he was six. It was no better than rot13, but took up a lot more space. Bri can be reached at bri@hackinglinuxexposed.com.

This is the February 05, 2003 issue of the Linux Security: Tips, Tricks, and Hackery newsletter. If you wish to subscribe, visit http://lists.onsight.com/ or send email to Linux_Security-request@lists.onsight.com.

previous article
index
next article