GoFuckYourself.com - Adult Webmaster Forum

-   -   anyone good with sed? (https://gfy.com/showthread.php?t=1023400)

fris 05-20-2011 06:30 AM

anyone good with sed?
 
I'm trying to grab all the links out of a Chrome bookmark export file.

I want them in the format <a href="link">name</a>

Code:

cat bookmarks_5_20_11.html | sed 's/^.*HREF="//' | sed 's/".*$//'
works just for the links. Anyone have an idea how I can get the full link with title?

Juicy D. Links 05-20-2011 07:02 AM

Quote:

Originally Posted by fris (Post 18153957)
I'm trying to grab all the links out of a Chrome bookmark export file.

I want them in the format <a href="link">name</a>

Code:

cat bookmarks_5_20_11.html | sed 's/^.*HREF="//' | sed 's/".*$//'
works just for the links. Anyone have an idea how I can get the full link with title?

I have no clue

Klen 05-20-2011 07:11 AM

find /location/ -name "*.html" | xargs grep '<a href="link">name</a>'

fris 05-20-2011 09:02 AM

Quote:

Originally Posted by KlenTelaris (Post 18154012)
find /location/ -name "*.html" | xargs grep '<a href="link">name</a>'

im looking to grab all the links from the export .html

Klen 05-20-2011 11:07 AM

Quote:

Originally Posted by fris (Post 18154208)
im looking to grab all the links from the export .html

Replace * with export

fris 05-20-2011 11:47 AM

Quote:

Originally Posted by KlenTelaris (Post 18154491)
Replace * with export

that will only grep for the 1 link though.

Klen 05-20-2011 11:54 AM

Google the xargs command and you should find a proper example. I've seen tons of examples of what you're looking for.

V_RocKs 05-20-2011 02:35 PM

Your problem is one of regex... coming up with a solution now...

marlboroack 05-20-2011 02:36 PM

Fuck if i know, help me help you.

GrouchyAdmin 05-20-2011 02:37 PM

you could probably get exactly what you want in one line by using 'cut'

cut -f(whereverhereitis) -d\" (or whatever the encapsulating format is) file
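For reference, a minimal sketch of what the cut approach might look like. The sample line, URL and title are made up for illustration, and the function names are hypothetical. Splitting on the double quote makes the HREF value field 2; the title takes a second pair of cuts on > and <, which assumes neither character appears inside the URL or title.

```shell
# Hypothetical sample line in the style of a Chrome bookmark export.
line='<DT><A HREF="http://example.com/" ADD_DATE="1305900000">Example</A>'

# extract_url: split on double quotes and keep field 2 (the HREF value).
extract_url() {
    printf '%s\n' "$1" | cut -d'"' -f2
}

# extract_title: field 3 when splitting on ">" is "Example</A",
# then everything before the next "<" is the title.
extract_title() {
    printf '%s\n' "$1" | cut -d'>' -f3 | cut -d'<' -f1
}

# Rebuild the link in the wanted format.
printf '<a href="%s">%s</a>\n' "$(extract_url "$line")" "$(extract_title "$line")"
```

cut only counts delimiters, so this is fragile compared to the sed versions further down the thread: an extra attribute before HREF, for example, would shift the field numbers.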

96ukssob 05-20-2011 02:39 PM

http://foodnetworkhumor.com/wp-conte...h-say-what.jpg

V_RocKs 05-20-2011 03:08 PM

Code:

cat bookmarks_5_20_11.html |grep 'DT><A' | sed 's/\s*//'|sed 's/<DT>//' | sed 's/\s*ADD_DATE="[^"]*"//'| sed 's/\s*ICON="[^"]*"//'

fris 05-24-2011 06:34 PM

Quote:

Originally Posted by V_RocKs (Post 18155151)
Code:

cat bookmarks_5_20_11.html |grep 'DT><A' | sed 's/\s*//'|sed 's/<DT>//' | sed 's/\s*ADD_DATE="[^"]*"//'| sed 's/\s*ICON="[^"]*"//'

thanks, nice one liner ;)

any way to trim the space before the closing > (the " > part)?

so it's just <a href="link">test</a> instead of <a href="link" >test</a>
:thumbsup
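
One way this could be done, as a sketch: a sed substitution that drops any whitespace sitting between a closing quote and the >. The sample line and the function name trim_tag_space are placeholders, not taken from a real export.

```shell
# Hypothetical line with the stray space left behind before ">".
line='<a href="http://example.com/" >test</a>'

# trim_tag_space: remove whitespace between a closing quote and ">".
# Reads lines on stdin, writes the cleaned lines to stdout.
trim_tag_space() {
    sed 's/"[[:space:]]*>/">/'
}

printf '%s\n' "$line" | trim_tag_space
```

It can be tacked onto the end of the existing pipeline as one more stage. Lines that have no space there pass through unchanged.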

Socks 05-24-2011 10:45 PM

Quote:

Originally Posted by fris (Post 18164324)
thanks, nice one liner ;)

any way to trim the space before the closing > (the " > part)?

so it's just <a href="link">test</a> instead of <a href="link" >test</a>
:thumbsup

If your cursor is to the right of the space, you use backspace. If it's to the left, you have to use the delete key.

fris 05-25-2011 08:03 AM

Quote:

Originally Posted by Socks (Post 18164667)
If your cursor is to the right of the space, you use backspace. If it's to the left, you have to use the delete key.

thanks fatfoo ;)

ps got it working

Code:

#!/usr/local/bin/bash

# chrome bookmark cleanup
# converts bookmark export to just links with titles

# our chrome bookmark export file
cat bookmarks.html |

# grep the links
grep 'DT><A' |

# remove the <DT> html tag
sed 's/<DT>//' |

# remove the DATE variable from the link
# ([^"]* stops at the attribute's own closing quote)
sed 's/ ADD_DATE="[^"]*"//' |

# remove the ICON variable from the link
sed 's/ ICON="[^"]*"//' |

# remove leading whitespace
sed 's/^[[:space:]]*//' |

# convert the html link tags to lowercase
# (sed, not tr: tr maps single characters and would also
# lowercase every A, H, R, E, F in the urls and titles)
sed 's/<A HREF/<a href/; s/<\/A>/<\/a>/'
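
A note on lowercasing the tags, as a small sketch: tr translates single characters rather than whole strings, so a call like tr '<A HREF' '<a href' lowercases every A, H, R, E and F on the line, including those in URLs and titles. A sed substitution replaces only the literal tag text. The sample line and function names below are made up for the comparison.

```shell
# Hypothetical line; the URL and title are placeholders.
line='<A HREF="http://example.com/FAQ">Read The FAQ</A>'

# tr maps characters one-for-one: A->a, H->h, R->r, E->e, F->f everywhere.
lower_with_tr() {
    tr '<A HREF' '<a href'
}

# sed replaces the literal strings only, leaving URL and title alone.
lower_with_sed() {
    sed 's/<A HREF/<a href/; s/<\/A>/<\/a>/'
}

printf '%s\n' "$line" | lower_with_tr
printf '%s\n' "$line" | lower_with_sed
```

The tr version prints the link with "faQ" and "read" in place of "FAQ" and "Read", which matters for case-sensitive URL paths.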


fris 06-07-2011 03:41 PM

Quote:

Originally Posted by V_RocKs (Post 18155151)
Code:

cat bookmarks_5_20_11.html |grep 'DT><A' | sed 's/\s*//'|sed 's/<DT>//' | sed 's/\s*ADD_DATE="[^"]*"//'| sed 's/\s*ICON="[^"]*"//'

found a shorter way.

Code:

cat bookmarks.html | sed '/<DT><A\|<DT><H3/!d;s/<DT>//;/Bookmarks bar/d;s/ ADD_DATE=\".*\"//g;s/^[ \t]*//;s/<A HREF/<a href/'
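
A side note on the ADD_DATE pattern, sketched on a made-up line: ".*" is greedy and runs to the last double quote on the line, so it strips every attribute after ADD_DATE in one go (ICON included). That is presumably why no separate ICON rule is needed here, but it would also eat part of a title that contains a literal quote. A bounded [^"]* removes just the one attribute. The function names are hypothetical.

```shell
# Hypothetical export line with two attributes after HREF.
line='<DT><A HREF="http://example.com/" ADD_DATE="1305900000" ICON="data:x">Example</A>'

# Greedy: ".*" extends to the LAST quote on the line, so ICON goes too.
strip_greedy() {
    sed 's/ ADD_DATE=".*"//'
}

# Bounded: [^"]* stops at ADD_DATE's own closing quote.
strip_bounded() {
    sed 's/ ADD_DATE="[^"]*"//'
}

printf '%s\n' "$line" | strip_greedy
printf '%s\n' "$line" | strip_bounded
```

With the bounded form, each unwanted attribute needs its own substitution, which is longer but does not depend on where the last quote happens to fall.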


All times are GMT -7.