Information Technology Grimoire

Version .0.0.1

IT Notes from various projects because I forget, and hopefully they help you too.

strip HREFS

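I was importing vendor-supplied text into a shopping cart, but the vendor had included links back to all of their own products! I wanted the keywords (the link text), but not the links. HTML::TokeParser::Simple did the job. The script replaces each <a href="..."> element with its plain text and passes everything else through unchanged; in-page links (href starting with #) and anchors without an href are left alone: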

#!/usr/bin/perl
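# Replace every <a href="..."> element with its link text; keep the rest.
# In-page links (href starting with "#") and anchors without an href stay as-is.
use strict;
use warnings;
use HTML::TokeParser::Simple;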
my $parser = HTML::TokeParser::Simple->new(\*DATA);
while ( my $token = $parser->get_token ) {
  if ($token->is_start_tag('a')) {
    my $href = $token->get_attr('href');
    if (defined $href and $href !~ /^#/) {
      # emit only the link text: whitespace-trimmed, nested markup dropped
      print $parser->get_trimmed_text('/a');
      $parser->get_token; # discard </a>
      next;
    }
  }
  print $token->as_is;
}
__DATA__
Paste your HTML, or text with HTML in it, here
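
If the HTML is already sitting in a variable instead of being pasted under __DATA__, the same loop can be wrapped in a subroutine that returns the cleaned text. This is only a minimal sketch: it assumes the module's string => constructor form, and strip_links and the example.com URL are names I made up for illustration, not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# strip_links is a hypothetical helper name, not something the module provides.
# It returns the HTML with each <a href="..."> element replaced by its link text.
sub strip_links {
    my ($html) = @_;
    my $parser = HTML::TokeParser::Simple->new(string => $html);
    my $out = '';
    while (my $token = $parser->get_token) {
        if ($token->is_start_tag('a')) {
            my $href = $token->get_attr('href');
            # leave in-page links (#fragment) and anchors without an href alone
            if (defined $href and $href !~ /^#/) {
                $out .= $parser->get_trimmed_text('/a');  # link text only
                $parser->get_token;                       # discard </a>
                next;
            }
        }
        $out .= $token->as_is;  # everything else passes through untouched
    }
    return $out;
}

# example input; the URL is a placeholder
print strip_links('<p>Buy <a href="http://example.com/widget">blue widget</a> today</p>'), "\n";
```

Returning a string rather than printing makes it easier to clean each product description just before it goes into the cart import.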