strip HREFS
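I was importing vendor-supplied product text into a shopping cart, but the vendor had included links back to all of their own products! I wanted to keep the keywords (the link text) and drop the links themselves. Links to in-page fragments (href starting with #) and anchors without an href are left alone. HTML::TokeParser::Simple did the job: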
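#!/usr/bin/perl
# Replace each <a href="..."> link with its text; keep everything else.
# In-page links (href starting with "#") and anchors without an href are kept.
use strict;
use warnings;
use HTML::Entities qw(encode_entities);
use HTML::TokeParser::Simple;

my $parser = HTML::TokeParser::Simple->new(\*DATA);
while ( my $token = $parser->get_token ) {
    if ( $token->is_start_tag('a') ) {
        my $href = $token->get_attr('href');
        if ( defined $href and $href !~ /^#/ ) {
            # get_trimmed_text decodes entities, so re-escape the link text
            print encode_entities( $parser->get_trimmed_text('/a'), '<>&' );
            $parser->get_token;    # discard </a>
            next;
        }
    }
    print $token->as_is;
}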
__DATA__
paste your html or text with html in it, here
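
If you would rather call this from an import script than paste HTML under __DATA__, the same loop can be wrapped in a function that takes an HTML string and returns the cleaned HTML. This is only a sketch: the name strip_links is made up for illustration, it assumes the string => ... constructor form of HTML::TokeParser::Simple, and it re-escapes the link text with HTML::Entities because get_trimmed_text returns decoded text.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);
use HTML::TokeParser::Simple;

# strip_links is a hypothetical helper name, not part of any module.
# It returns the input HTML with external links replaced by their text.
sub strip_links {
    my ($html) = @_;
    my $parser = HTML::TokeParser::Simple->new( string => $html );
    my $out    = '';
    while ( my $token = $parser->get_token ) {
        if ( $token->is_start_tag('a') ) {
            my $href = $token->get_attr('href');
            if ( defined $href and $href !~ /^#/ ) {
                # Link text comes back entity-decoded, so escape it again
                $out .= encode_entities( $parser->get_trimmed_text('/a'), '<>&' );
                $parser->get_token;    # discard </a>
                next;
            }
        }
        $out .= $token->as_is;         # everything else passes through untouched
    }
    return $out;
}

print strip_links(
    '<p>Buy a <a href="http://example.com/widget">blue widget</a> or go to <a href="#top">top</a>.</p>'
), "\n";
```

For that sample input it should print <p>Buy a blue widget or go to <a href="#top">top</a>.</p>. Note that any markup nested inside a stripped link (bold tags and so on) is dropped along with the link, since only the text is kept.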