regular expression to extract html links from a string - CodingForum



  • regular expression to extract html links from a string

    Hello, I have spent hours trying to figure this out. It should be simple, but I have found that it is not.

    I have a string and need to extract the links.


    hello today is a fine day to post a <a href="http://www.linkme.com">link</a> to my favorite website.
    should return

    <a href="http://www.linkme.com">link</a>
    I need to use preg_match_all() to find all occurrences of HTML links in my string. preg_match_all() will put all the occurrences into an array.

    So far I have found about a dozen examples of this on the web, but they must not have been written for PHP because I keep getting errors like "Unknown modifier '['" and others.
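One likely cause of "Unknown modifier '['" is a pattern pasted without delimiters. PHP's preg functions expect the pattern wrapped in delimiters, and a pattern that starts with "<" is read as using the "<" ... ">" pair. A minimal sketch, assuming a hypothetical pattern copied from the web (the patterns the poster actually tried are not shown):

```php
<?php
$string = 'a <a href="http://www.linkme.com">link</a> here';

// No delimiters: PHP treats "<" and the first ">" as the delimiter pair,
// so the pattern ends after the closing quote and "[" is parsed as a modifier.
// "@" only hides the warning for this demo.
$bad = @preg_match_all('<a href="[^"]*">[^<]*</a>', $string, $m);
var_dump($bad); // bool(false), the pattern failed to compile

// Same pattern wrapped in "~" delimiters, which do not occur in the pattern.
$good = preg_match_all('~<a href="[^"]*">[^<]*</a>~i', $string, $m);
print_r($m[0]); // one entry: the full <a ...>link</a> tag
```

Any non-alphanumeric, non-backslash character works as a delimiter; "~" or "#" is convenient for HTML because "/" would have to be escaped in "</a>".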

    my own attempt at the regular expression I would need is

    PHP Code:
    (One occurrence of "<a", then zero or more occurrences of any character (.), then "</a>". By the way, this probably sucks. As you can see, I'm no good at regular expressions.)

    I would sincerely appreciate help because I have been working on this damn problem for so long.
    Last edited by ralph.m; Sep 24, 2006, 01:58 PM. Reason: wanted to make it clear I wasn't crawling (I changed "page" to "string" in the title)
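The pattern described in words above ("<a", then any characters, then "</a>") can be sketched as follows. This is only one way to write it, not the poster's lost code: the "~" delimiters, the \b word boundary, and the i and s modifiers are assumptions added for illustration. The lazy quantifier .*? matters as soon as the string holds more than one link.

```php
<?php
$string = 'hello today is a fine day to post a <a href="http://www.linkme.com">link</a> to my favorite website.';
// Hypothetical second link, appended to show greedy vs. lazy matching.
$two = $string . ' <a href="http://www.linkme.com/2">second</a>';

// Lazy: stop at the first "</a>" after each "<a".
// \b keeps "<a" from matching tags such as "<abbr"; s lets "." cross newlines.
$lazyCount = preg_match_all('~<a\b.*?</a>~is', $two, $lazy);

// Greedy: ".*" runs from the first "<a" to the LAST "</a>" in the string.
$greedyCount = preg_match_all('~<a\b.*</a>~is', $two, $greedy);

print_r($lazy[0]);   // two separate links
print_r($greedy[0]); // one match spanning both links
```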

  • #2
    I found this on one of the other threads...

    PHP Code:
    preg_match_all("/<a.*? href=\"(.*?)\".*?>(.*?)<\/a>/i",$string,$results); 
    but it doesn't work, because when I use preg_replace with the same expression and replace links with '', there are still tons of links left in the string...
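A sketch of why links can survive that preg_replace call. The input below is made up (the poster's real string is not shown), so these are only possible causes: the expression requires a double-quoted href, and "." does not match newlines unless the s modifier is set.

```php
<?php
// Hypothetical input: three links, one per line.
$html = "<a href=\"http://www.linkme.com\">one</a>\n"
      . "<a href='http://www.linkme.com/two'>two</a>\n"          // single quotes
      . "<a href=\"http://www.linkme.com/three\">\nthree\n</a>"; // text spans lines

// The expression from the post, unchanged.
$stripped = preg_replace("/<a.*? href=\"(.*?)\".*?>(.*?)<\/a>/i", '', $html);

// Only the first link is removed; the other two are still in the string.
echo $stripped, "\n";
echo substr_count($stripped, '<a '), " links left\n";
```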


    • #3
      That one you found will work for some anchor tags but not all. It doesn't find tags that use single quotes, for example. But it's a pretty good start.

      This will include single quotes too:
      PHP Code:
      $preg = "/<a.*? href=(\"|')(.*?)(\"|').*?>(.*?)<\/a>/i";
      If you want to post an example of an anchor tag that didn't get picked up by that regex then we can try to modify it further.
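For reference, a short usage sketch of that expression with preg_match_all(). The sample string is the one from the first post plus a made-up single-quoted link; the PREG_SET_ORDER flag and the array keys are choices made for this illustration. With this pattern the URL lands in group 2 and the link text in group 4.

```php
<?php
$preg = "/<a.*? href=(\"|')(.*?)(\"|').*?>(.*?)<\/a>/i";

$string = 'hello today is a fine day to post a <a href="http://www.linkme.com">link</a> '
        . "and <a class='x' href='/somedir/'>another</a>.";

// PREG_SET_ORDER groups the results per match instead of per capture group.
preg_match_all($preg, $string, $results, PREG_SET_ORDER);

$links = [];
foreach ($results as $m) {
    // $m[0] = whole tag, $m[2] = href value, $m[4] = link text
    $links[] = ['tag' => $m[0], 'url' => $m[2], 'text' => $m[4]];
}
print_r($links);
```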


      • #4
        And what about extracting only links of the form /somedir/ or /somedir? Links like /someurl.php or http://www.lalala.com/index.php should be ignored.
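One way to approach this is to extract all href values first and then filter them. A minimal sketch, assuming "directory-style" means a path that starts with "/", has no dot, query string, or fragment in any segment, and may end with a slash. The function name isDirectoryStyleLink is hypothetical.

```php
<?php
// True for "/somedir" or "/somedir/" (also nested, e.g. "/a/b/"),
// false for anything with a file extension or a scheme and host.
function isDirectoryStyleLink(string $url): bool
{
    return preg_match('~^(?:/[^/.?#]+)+/?$~', $url) === 1;
}

$urls = ['/somedir/', '/somedir', '/someurl.php', 'http://www.lalala.com/index.php'];

// array_values() reindexes the filtered list from 0.
$kept = array_values(array_filter($urls, 'isDirectoryStyleLink'));
print_r($kept); // keeps '/somedir/' and '/somedir'
```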