1. URI reader

1234567
href='www.google.com/a.html
href= google.com
google.com
www.google.com
  • group 1: optional hyperlink indicator (href=), present only if the URI is an active link on the page
  • group 2: optional opening delimiter (quote)
  • groups 3 - 5: repeating group of text followed by a dot
  • group 6: ending of the URI, which can be a file extension or a top-level domain name
  • group 7: optional closing delimiter (quote)
  • syntactical solution: (href= ?)?['"]?([0-9a-zA-Z_/]+\.)+[0-9a-zA-Z]{2,4}['"]?
  • wget --quiet -O - http://www.csszengarden.com | grep -E "[a-z][a-zA-Z0-9+.-]*:\/\/([a-zA-Z0-9+.-_]+@)*[a-zA-Z0-9+.\-]+(:[0-9]+)*\/([a-zA-Z0-9+_.\-]+\/)*([a-zA-Z0-9+_#?=.\-]+(&|;)*)*" -o
  • result:
    vding@fx160-08:~$ wget --quiet -O - http://www.csszengarden.com | grep -E "[a-z][a-zA-Z0-9+.-]*:\/\/([a-zA-Z0-9+.-_]+@)*[a-zA-Z0-9+.\-]+(:[0-9]+)*\/([a-zA-Z0-9+_.\-]+\/)*([a-zA-Z0-9+_#?=.\-]+(&|;)*)*" -o 
    http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
    http://www.w3.org/1999/xhtml
    http://www.bluerobot.com/web/css/fouc.asp
    http://www.csszengarden.com/favicon.ico
    http://www.csszengarden.com/zengarden.xml
    http://www.mezzoblue.com/zengarden/resources/
    http://www.mezzoblue.com/zengarden/submit/
    http://www.mezzoblue.com/zengarden/resources/
    http://www.mezzoblue.com/zengarden/submit/guidelines/
    http://creativecommons.org/licenses/by-nc-sa/1.0/
    http://www.mediatemple.net/
    http://www.amazon.com/exec/obidos/ASIN/0321303474/mezzoblue-20/
    http://validator.w3.org/check/referer
    http://jigsaw.w3.org/css-validator/check/referer
    http://creativecommons.org/licenses/by-nc-sa/1.0/
    http://mezzoblue.com/zengarden/faq/#s508
    http://www.mezzoblue.com/zengarden/faq/#aaa
    http://www.ericstoltz.com/
    http://skybased.com/
    http://www.kevinaddison.com/
    http://www.pixel-house.com.au/
    http://www.benklemm.de/
    http://www.re-bloom.com/
    http://rpmdesignfactory.com/
    http://users.skynet.be/bk316398/temp.html
    http://www.mezzoblue.com/zengarden/alldesigns/
    http://www.mezzoblue.com/zengarden/resources/
    http://www.mezzoblue.com/zengarden/faq/
    http://www.mezzoblue.com/zengarden/submit/
    http://www.mezzoblue.com/zengarden/translations/

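Explanation: wget fetches the page from www.csszengarden.com quietly (--quiet suppresses its progress messages) and, because of -O -, writes it to standard output, which is piped to grep. The -o option makes grep print only the matched text instead of the whole line. The first group is the scheme: it must start with a lowercase letter and may continue with letters, digits and the characters +, . and -, and it is followed by "://". Next come an optional user name ending in @, the dot-separated host name, an optional port, and a repeating group of slash-separated path segments. The match ends with a final segment that may contain almost any URI character, including #, ? and =.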
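
The pattern can also be tried without network access by feeding grep a few inline lines instead of the wget output. This is only a sketch: the sample HTML and the names sample, pattern and urls are made up for illustration. The slashes are written without backslashes (grep does not need them escaped), and the hyphen is placed last in every bracket expression so that it is not read as a range.

```shell
#!/bin/sh
# Hypothetical sample input standing in for the downloaded page.
sample='<a href="http://www.example.com/a/b.html">link</a>
<link rel="stylesheet" href="https://example.org:8080/css/site.css?v=2" />
<p>no address on this line</p>'

# Same structure as the pattern above: scheme, "://", optional user@,
# host, optional port, slash-separated path segments, last segment.
pattern='[a-z][a-zA-Z0-9+.-]*://([a-zA-Z0-9+._-]+@)*[a-zA-Z0-9+.-]+(:[0-9]+)*/([a-zA-Z0-9+_.-]+/)*([a-zA-Z0-9+_#?=.-]+(&|;)*)*'

# -o prints only the matched part of each line, one match per line.
urls=$(printf '%s\n' "$sample" | grep -E -o "$pattern")
printf '%s\n' "$urls"
```

Only the two addresses are printed; the surrounding markup and the third line are dropped because -o keeps just the matched text.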

2. phone number reader

123456
+31(0)35-12345678
0031 35 12345678

Group 1 is either a + sign or 00.
Group 2 consists of 1 to 3 digits, the first of which is not 0.
Group 3 is an optional "(0)". Sometimes people put brackets around the 0, sometimes they don't.
Group 4 is similar to group 2: 1 to 3 digits, the first of which is not 0.
Group 5 is an optional hyphen (-).
Group 6 is the number string of 7 to 8 digits, the first of which is not 0.

  • solution: (\+|00)[1-9][0-9]{0,2}(\(0\))?[1-9]{1,3}(-)?[1-9][0-9]{6,7}
  • BASH syntax (metacharacters escaped with backslashes): ^\([+]\|[0]\{2\}\)[1-9][0-9]\{0,2\}\([\(][0][\)]\)\{0,1\}[1-9]\{1,3\}[-]\{0,1\}[1-9][0-9]\{6,7\}

My phone number in different forms, saved in the file 'phonenumber', and the corresponding grep -E command:

+31(0)624820296
+31624820296
+31(0)6-24820296
+316-24820296
0031624820296
0031(0)624820296
0031(0)6-24820296
00316-24820296


grep -E ^\([+]\|[0]\{2\}\)[1-9][0-9]\{0,2\}\([\(][0][\)]\)\{0,1\}[1-9]\{1,3\}[-]\{0,1\}[1-9][0-9]\{6,7\} phonenumber
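
The check can be reproduced in one self-contained script. This is a sketch under a few assumptions: the numbers go into a temporary file made with mktemp instead of 'phonenumber', the pattern is single-quoted so that the shell hands it to grep unchanged, a trailing $ is added so that extra digits at the end are rejected, and one extra line without an international prefix is appended as a negative case. The names numbers, pattern and matched are illustrative.

```shell
#!/bin/sh
# Temporary stand-in for the 'phonenumber' file.
numbers=$(mktemp)
cat > "$numbers" <<'EOF'
+31(0)624820296
+31624820296
+31(0)6-24820296
+316-24820296
0031624820296
0031(0)624820296
0031(0)6-24820296
00316-24820296
0624820296
EOF

# ERE form of the pattern: prefix, group 2, optional (0),
# group 4, optional hyphen, number string.
pattern='^(\+|00)[1-9][0-9]{0,2}(\(0\))?[1-9]{1,3}-?[1-9][0-9]{6,7}$'

# -c counts the matching lines; the last line has no + or 00
# prefix and is expected not to match.
matched=$(grep -E -c "$pattern" "$numbers")
echo "matched $matched of $(wc -l < "$numbers" | tr -d ' ') lines"
rm -f "$numbers"
```

Quoting the pattern avoids the layer of backslashes needed for an unquoted argument, and it also keeps the shell from treating [1-9] as a file name pattern.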

improvement

3. bash comment remover with grep

The only comment-only line I can think of is a line that starts with #. So I search for a # at the beginning of the line and match the rest of the line.

  • solution: (^|[^\\])#.*
  • syntax: grep -E '(^|[^\\])#.*' scriptsrc

Every string that contains a # not preceded by an escape character (backslash) is matched.

#!/bin/bash
### MySQL Server Login Info ###
MUSER="root"
MPASS="MYSQL-ROOT-PASSWORD"
MHOST="localhost"
MYSQL="$(which mysql)"
MYSQLDUMP="$(which mysqldump)"
BAK="/backup/mysql"
GZIP="$(which gzip)"
### FTP SERVER Login info ###
FTPU="FTP-SERVER-USER-NAME"
FTPP="FTP-SERVER-PASSWORD"
FTPS="FTP-SERVER-IP-ADDRESS"
NOW=$(date +"%d-%m-%Y")
#tester 2
[ ! -d $BAK ] && mkdir -p $BAK || /bin/rm -f $BAK/* #tester
 
DBS="$($MYSQL -u $MUSER -h $MHOST -p$MPASS -Bse 'show databases')"
for db in $DBS
do
 FILE=$BAK/$db.$NOW-$(date +"%T").gz
 $MYSQLDUMP -u $MUSER -h $MHOST -p$MPASS $db | $GZIP -9 > $FILE
done
 
lftp -u $FTPU,$FTPP -e "mkdir /mysql/$NOW;cd /mysql/$NOW; mput /backup/mysql/*; quit" $FTPS
#grep -E ^[\#]\{1,\} scriptsrc

The last line is the command, commented out. I used it to pick out the comment lines, but I do not know how to replace them with empty lines.
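
One possible way to blank the comment lines is sed's substitute command, since grep can only select or drop whole lines. The sketch below uses a short inline excerpt in place of 'scriptsrc'; the names src, blanked and kept are illustrative. It empties every line that starts with # except the #! line, and it does not touch trailing comments such as the #tester one.

```shell
#!/bin/sh
# Hypothetical excerpt standing in for 'scriptsrc'.
src=$(mktemp)
cat > "$src" <<'EOF'
#!/bin/bash
### MySQL Server Login Info ###
MUSER="root"
#tester 2
NOW=$(date +"%d-%m-%Y")
EOF

# Replace each line that starts with # by an empty line,
# but leave the interpreter line (#!) as it is.
blanked=$(sed '/^#!/!s/^#.*$//' "$src")
echo "--- comment lines blanked:"
printf '%s\n' "$blanked"

# For comparison: grep -v drops the lines that start with #
# (including the #! line) instead of blanking them.
kept=$(grep -v -E '^#' "$src")
echo "--- comment lines dropped:"
printf '%s\n' "$kept"
rm -f "$src"
```

Blanking keeps the line numbers of the remaining commands unchanged, which grep -v does not.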