How to decode an HTML page that uses hex codes for obfuscation

I was trying to read the source code of an HTML page, but the author had deliberately made it hard to read:

<HtMl> <hEAD> & #x20;& #x20;<MetA chaRSet="utf-8"> & #x20;& #x20;& #x20;<MetA NAmE="v& #x69;ew& #x70;& #x6f;rt" coNTeNt="width=device-width, height=device-height, user-scalable=no,

So I decided to decode it. I got the table that maps ASCII characters to HTML hex codes here:
http://webdesign.about.com/od/localization/l/blhtmlcodes-ascii.htm

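I saved the two columns of that table as two files, ascii.txt and hexa.txt, with one entry per line, so that each line of hexa.txt matches the same line of ascii.txt.
This is the beginning of ascii.txt: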

 
 
 
!
"
#
$
%
&
'

and this is the beginning of hexa.txt:

& #x09;
& #x10;
& #x20;
& #x21;
& #x22;
& #x23;
& #x24;
& #x25;
& #x26;
& #x27;

Then I ran this shell loop in a Linux terminal:

$ while IFS= read -r -u3 html; IFS= read -r -u4 ascii; do esc=$(printf '%s' "$ascii" | sed 's/[&/\]/\\&/g'); sed -i "s/$html/$esc/g" encoded_file.html; done 3<hexa.txt 4<ascii.txt

Note that the hex digits in hexa.txt are upper case, while the page may also use lower case, so we need to convert the file to lower case and run the loop again:

$ sed -i 'y/ABCDEF/abcdef/' hexa.txt
$ while IFS= read -r -u3 html; IFS= read -r -u4 ascii; do esc=$(printf '%s' "$ascii" | sed 's/[&/\]/\\&/g'); sed -i "s/$html/$esc/g" encoded_file.html; done 3<hexa.txt 4<ascii.txt

Great, now encoded_file.html is easy to read! There is probably an easier solution using Python or some other tool, but I didn’t find one. Any suggestions?
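
One possible answer to that question, as a minimal sketch: Python's standard library provides `html.unescape`, which decodes decimal, hexadecimal and named character references in one pass, so no lookup files are needed. The helper names `decode_entities` and `decode_file` and the output file name are hypothetical. The sketch assumes the real page is UTF-8 encoded and contains references without the space shown in this post (for example `&#x69;`, not `& #x69;`).

```python
import html


def decode_entities(text: str) -> str:
    """Replace character references (&#x69;, &#105;, &amp;) with their characters."""
    return html.unescape(text)


def decode_file(src: str, dst: str) -> None:
    """Read src as UTF-8, decode its character references, write the result to dst."""
    with open(src, encoding="utf-8") as f:
        decoded = decode_entities(f.read())
    with open(dst, "w", encoding="utf-8") as f:
        f.write(decoded)


# Small check on a fragment like the one at the top of this post
# (written here without the space after "&").
sample = '<MetA NAmE="v&#x69;ew&#x70;&#x6f;rt">'
print(decode_entities(sample))  # <MetA NAmE="viewport">
```

Calling `decode_file("encoded_file.html", "decoded_file.html")` would write a readable copy and leave the original untouched. Hex digits are accepted in either case, so the lower-case step is not needed here. Keep in mind that `html.unescape` also decodes references such as `&lt;` and `&amp;`, so the output is meant for reading and may not be equivalent markup.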
