Selecting all links of a document

Task

Source document

<document id="1">
  <content>
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
      </head>
      <body>
        <div>
          <ul>
            <li><a href="http://www.google.de/">Google</a>
              <ul>
                  <li><a href="http://earth.google.de/">Google Earth</a></li>
                  <li><a href="http://picasa.google.de/intl/de/">Picasa</a></li>
              </ul>
            </li> 
            <li><a href="http://www.heise.de/">Heise</a></li>
            <li><a href="http://www.yahoo.de/">Yahoo</a></li>
          </ul>
        </div>
      </body>
    </html>
  </content>
  <teaser>
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
      </head>
      <body>
        <div>
          <ul>
            <li><a href="http://www.google.de/">Google</a></li>
          </ul>
        </div>
      </body>
    </html> 
  </teaser>
</document>

Challenge

Select all links (XHTML a elements, which belong to the namespace http://www.w3.org/1999/xhtml) that are descendants of document/content. Links below document/teaser must not be selected.

For the XPath expression, the prefix x is available for the namespace http://www.w3.org/1999/xhtml.

Desired selection (XML output)

  • <a href="http://www.google.de/" xmlns="http://www.w3.org/1999/xhtml">Google</a>
  • <a href="http://earth.google.de/" xmlns="http://www.w3.org/1999/xhtml">Google Earth</a>
  • <a href="http://picasa.google.de/intl/de/" xmlns="http://www.w3.org/1999/xhtml">Picasa</a>
  • <a href="http://www.heise.de/" xmlns="http://www.w3.org/1999/xhtml">Heise</a>
  • <a href="http://www.yahoo.de/" xmlns="http://www.w3.org/1999/xhtml">Yahoo</a>

Available prefix for the XPath expression

prefix    Namespace
x         http://www.w3.org/1999/xhtml

Exercise

Your input

One solution would be: Solution
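
To try the exercise outside this page, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. It assumes one candidate expression, /document/content//x:a, which is not necessarily the solution the page itself links to. ElementTree supports only a subset of XPath and does not accept absolute paths on an element, so the sketch evaluates the equivalent relative path ./content//x:a from the document root. The sample is a shortened copy of the source document (empty head and wrapping div elements omitted), and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Shortened copy of the source document from the task (assumption:
# the link text is the plain site name, e.g. "Google").
SOURCE = """<document id="1">
  <content>
    <html xmlns="http://www.w3.org/1999/xhtml">
      <body>
        <ul>
          <li><a href="http://www.google.de/">Google</a>
            <ul>
              <li><a href="http://earth.google.de/">Google Earth</a></li>
              <li><a href="http://picasa.google.de/intl/de/">Picasa</a></li>
            </ul>
          </li>
          <li><a href="http://www.heise.de/">Heise</a></li>
          <li><a href="http://www.yahoo.de/">Yahoo</a></li>
        </ul>
      </body>
    </html>
  </content>
  <teaser>
    <html xmlns="http://www.w3.org/1999/xhtml">
      <body>
        <ul>
          <li><a href="http://www.google.de/">Google</a></li>
        </ul>
      </body>
    </html>
  </teaser>
</document>"""

root = ET.fromstring(SOURCE)  # root is the <document> element

# Bind the prefix x to the XHTML namespace, as the exercise does.
namespaces = {"x": XHTML_NS}

# Relative form of /document/content//x:a: every x:a descendant of
# <content>, at any depth; <teaser> is never entered.
links = root.findall("./content//x:a", namespaces)
hrefs = [a.get("href") for a in links]

# Without the prefix, the name test looks for <a> in no namespace
# and therefore matches nothing in this document.
unprefixed = root.findall("./content//a")

for href in hrefs:
    print(href)
```

The unprefixed query is included to show why the exercise provides the prefix x: the a elements inherit the default namespace declared on html, so a name test without a prefix does not match them.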