1. URI reader
1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|
href= | ' | www. | google. | com/a. | html | " |
href= |  | google. |  |  | com |  |
 |  | google. |  |  | com |  |
 |  | www. | google. |  | com |  |
- group 1: optional hyperlink indicator (href=), present only when the URI is an active link on the page
- group 2: optional opening delimiter (' or ")
- groups 3 - 5: repeating group of the pattern "text followed by a dot"
- group 6: ending of the URI, which can be a file extension or a top-level domain name
- group 7: optional closing delimiter (' or ")
- syntactical solution: (href=)?['"]?([0-9a-zA-Z_/]+\.)+[0-9a-zA-Z]{2,4}['"]?
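The group breakdown above can be tried out on the rows of the table. The following is a minimal sketch, assuming the simplified pattern is written in `grep -E` syntax with `+` and `?` as quantifiers; the variable names and the sample lines are made up for illustration.

```shell
# Simplified URI pattern built from the seven groups; [.] is a literal dot.
pattern="(href=)?['\"]?([0-9a-zA-Z_/]+[.])+[0-9a-zA-Z]{2,4}['\"]?"

# Hypothetical sample input: one line per row of the table.
samples="href='www.google.com/a.html'
href=google.com
google.com
www.google.com"

# -E selects extended regular expressions, -o prints only the matched part.
uri_matches=$(printf '%s\n' "$samples" | grep -oE "$pattern")
printf '%s\n' "$uri_matches"
```

Each sample line is matched in full, so the printed matches are identical to the input lines.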
wget --quiet -O - http://www.csszengarden.com | grep -E "[a-z][a-zA-Z0-9+.-]*:\/\/([a-zA-Z0-9+.-_]+@)*[a-zA-Z0-9+.\-]+(:[0-9]+)*\/([a-zA-Z0-9+_.\-]+\/)*([a-zA-Z0-9+_#?=.\-]+(&|;)*)*" -o
- result:
vding@fx160-08:~$ wget --quiet -O - http://www.csszengarden.com | grep -E "[a-z][a-zA-Z0-9+.-]*:\/\/([a-zA-Z0-9+.-_]+@)*[a-zA-Z0-9+.\-]+(:[0-9]+)*\/([a-zA-Z0-9+_.\-]+\/)*([a-zA-Z0-9+_#?=.\-]+(&|;)*)*" -o
http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
http://www.w3.org/1999/xhtml
http://www.bluerobot.com/web/css/fouc.asp
http://www.csszengarden.com/favicon.ico
http://www.csszengarden.com/zengarden.xml
http://www.mezzoblue.com/zengarden/resources/
http://www.mezzoblue.com/zengarden/submit/
http://www.mezzoblue.com/zengarden/resources/
http://www.mezzoblue.com/zengarden/submit/guidelines/
http://creativecommons.org/licenses/by-nc-sa/1.0/
http://www.mediatemple.net/
http://www.amazon.com/exec/obidos/ASIN/0321303474/mezzoblue-20/
http://validator.w3.org/check/referer
http://jigsaw.w3.org/css-validator/check/referer
http://creativecommons.org/licenses/by-nc-sa/1.0/
http://mezzoblue.com/zengarden/faq/#s508
http://www.mezzoblue.com/zengarden/faq/#aaa
http://www.ericstoltz.com/
http://skybased.com/
http://www.kevinaddison.com/
http://www.pixel-house.com.au/
http://www.benklemm.de/
http://www.re-bloom.com/
http://rpmdesignfactory.com/
http://users.skynet.be/bk316398/temp.html
http://www.mezzoblue.com/zengarden/alldesigns/
http://www.mezzoblue.com/zengarden/resources/
http://www.mezzoblue.com/zengarden/faq/
http://www.mezzoblue.com/zengarden/submit/
http://www.mezzoblue.com/zengarden/translations/
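Explanation: wget fetches the page from www.csszengarden.com quietly (--quiet suppresses its progress output, so no garbage ends up in the pipe) and, because of -O -, writes the document to standard output, which is piped to grep. The -o option makes grep print only the matched parts. The first group is the scheme: it must start with a lowercase letter and may then contain letters, digits and the characters +, . and -, and it must be followed by "://". Next comes an optional user part ending in @, then the host as a group of dot-separated strings with an optional port number, then the repeating group of slash-separated path segments. The match ends with a final segment that may contain almost any character, including #, ?, = and the separators & and ;. Note that inside [a-zA-Z0-9+.-_] the sequence .-_ is read as a character range from . to _, so the user part accepts more characters than intended; writing the class as [a-zA-Z0-9+._-] avoids this.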
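To reproduce the behaviour without network access, the same kind of pattern can be run against a few lines of HTML held in a variable. This is only a sketch: the HTML snippet and the variable names are invented for illustration, the unnecessary backslashes are dropped, and the user-part class is written with `-` last so that it is taken literally (an adjustment, not the original text of the command).

```shell
# URI pattern: scheme, "://", optional user part, host, optional port, path segments, final segment.
uri_re='[a-z][a-zA-Z0-9+.-]*://([a-zA-Z0-9+._-]+@)*[a-zA-Z0-9+.-]+(:[0-9]+)*/([a-zA-Z0-9+_.-]+/)*([a-zA-Z0-9+_#?=.-]+(&|;)*)*'

# Hypothetical HTML input standing in for the downloaded page.
html='<a href="http://www.mezzoblue.com/zengarden/faq/#s508">FAQ</a>
<link rel="icon" href="http://www.csszengarden.com/favicon.ico" />
<p>no link on this line</p>'

# -o prints each match on its own line; lines without a URI print nothing.
found=$(printf '%s\n' "$html" | grep -oE "$uri_re")
printf '%s\n' "$found"
```

The match stops at the closing double quote because `"` is not part of any character class in the pattern.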
2. phone number reader
1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|
+ | 31 | (0) | 35 | - | 12345678 |
00 | 31 |  | 35 |  | 12345678 |
Group 1 is either a + sign or 00.
Group 2 is the country code: 1 to 3 digits, the first of which is not 0.
Group 3 is an optional "(0)". Some people write the 0 in brackets, others leave it out.
Group 4 is similar to group 2: 1 to 3 digits, the first of which is not 0.
Group 5 is an optional -.
Group 6 is the subscriber number: 7 to 8 digits, the first of which is not 0.
- solution: (\+|00)[1-9][0-9]{0,2}(\(0\))?[1-9][0-9]{0,2}-?[1-9][0-9]{6,7}
- BASH syntax (single-quoted so the shell passes the pattern to grep unchanged, anchored at the start of the line): '^(\+|00)[1-9][0-9]{0,2}(\(0\))?[1-9][0-9]{0,2}-?[1-9][0-9]{6,7}'
My phone number in different forms, saved in the file 'phonenumber', and the corresponding grep -E command:
+31(0)624820296
+31624820296
+31(0)6-24820296
+316-24820296
0031624820296
0031(0)624820296
0031(0)6-24820296
00316-24820296
grep -E '^(\+|00)[1-9][0-9]{0,2}(\(0\))?[1-9][0-9]{0,2}-?[1-9][0-9]{6,7}' phonenumber
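The six groups can be checked against the eight spellings of the number plus a couple of counterexamples. The sketch below assumes a single-quoted `grep -E` pattern in which the country code and the area code are each written as `[1-9][0-9]{0,2}`; the temporary file, the variable names and the two rejected inputs are illustrative only.

```shell
# Write the test numbers to a temporary file (stand-in for 'phonenumber').
numbers=$(mktemp)
cat > "$numbers" <<'EOF'
+31(0)624820296
+31624820296
+31(0)6-24820296
+316-24820296
0031624820296
0031(0)624820296
0031(0)6-24820296
00316-24820296
0624820296
31624820296
EOF

# Groups: prefix, country code, optional (0), area code, optional -, subscriber number.
phone_re='^(\+|00)[1-9][0-9]{0,2}(\(0\))?[1-9][0-9]{0,2}-?[1-9][0-9]{6,7}'

matched=$(grep -cE "$phone_re" "$numbers")   # number of matching lines
rejected=$(grep -vE "$phone_re" "$numbers")  # lines that do not match
echo "matched: $matched"
printf 'rejected:\n%s\n' "$rejected"
rm -f "$numbers"
```

The last two lines of the file lack the + or 00 prefix, so they are the only ones reported as rejected.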
improvement
3. bash comment remover with grep
The only lines consisting solely of a comment that I can think of are lines that start with #. So I search for a # at the beginning of the line and match the rest of the line.
- solution: (^|[^\])#.*
- syntax: grep -E '(^|[^\])#.*' scriptsrc
Every # that starts a line or is not preceded by an escape character (\) is matched, together with the rest of the line.
#!/bin/bash
### MySQL Server Login Info ###
MUSER="root"
MPASS="MYSQL-ROOT-PASSWORD"
MHOST="localhost"
MYSQL="$(which mysql)"
MYSQLDUMP="$(which mysqldump)"
BAK="/backup/mysql"
GZIP="$(which gzip)"
### FTP SERVER Login info ###
FTPU="FTP-SERVER-USER-NAME"
FTPP="FTP-SERVER-PASSWORD"
FTPS="FTP-SERVER-IP-ADDRESS"
NOW=$(date +"%d-%m-%Y")
#tester 2
[ ! -d $BAK ] && mkdir -p $BAK || /bin/rm -f $BAK/*
#tester
DBS="$($MYSQL -u $MUSER -h $MHOST -p$MPASS -Bse 'show databases')"
for db in $DBS
do
  FILE=$BAK/$db.$NOW-$(date +"%T").gz
  $MYSQLDUMP -u $MUSER -h $MHOST -p$MPASS $db | $GZIP -9 > $FILE
done
lftp -u $FTPU,$FTPP -e "mkdir /mysql/$NOW;cd /mysql/$NOW; mput /backup/mysql/*; quit" $FTPS
#grep -E ^[\#]\{1,\} scriptsrc
The last line is the command I used to filter out the comment lines. I do not know how to replace them with empty lines.
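One possible way to turn the matched comment lines into empty lines is to switch from grep to sed, which can substitute text rather than only select lines. This is a sketch under assumptions, not part of the original exercise: the shortened script, the temporary file and the variable names are made up for illustration.

```shell
# Shortened stand-in for 'scriptsrc'; the quoted 'EOF' keeps $(...) literal.
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/bash
### MySQL Server Login Info ###
MUSER="root"
#tester
NOW=$(date +"%d-%m-%Y")
EOF

# Replace every line that starts with # by an empty line.
blanked=$(sed -E 's/^#.*$//' "$script")

# Same substitution, but skip line 1 so the #!/bin/bash line survives.
kept_shebang=$(sed -E '1!s/^#.*$//' "$script")

printf '%s\n' "$kept_shebang"
rm -f "$script"
```

The first form also blanks the `#!/bin/bash` line, because it starts with # as well; the `1!` address restricts the substitution to every line except the first.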