
I have well-formed XML documents stored in string variables. I want to use preg_replace to add a fixed attribute to every XML tag.

For example, replace:

<tag1>
<tag2> some text </tag2>
</tag1>

with:

<tag1 attr="myAttr">
<tag2 attr="myAttr"> some text </tag2>
</tag1>

So basically I need a regular expression that finds every start tag and adds my attribute, but I'm a complete regex noob.

 Answers

Don't use regular expressions to process XML. XML is not a regular language. Use PHP's XML extensions instead:

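// The SimpleXML extension's class is SimpleXMLElement (there is no SimpleXml class).
// If the document is already in a string, pass that string instead of file_get_contents().
$xml = new SimpleXMLElement(file_get_contents($xmlFile));

// Add attr="myAttr" to this element, then recurse into its child elements.
function process_recursive(SimpleXMLElement $xmlNode) {
    $xmlNode->addAttribute('attr', 'myAttr');
    foreach ($xmlNode->children() as $childNode) {
        process_recursive($childNode);
    }
}

process_recursive($xml);
echo $xml->asXML();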
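For comparison, here is the same parser-based idea sketched with Python's standard xml.etree.ElementTree rather than PHP's SimpleXML. root.iter() walks the root and every descendant, so no explicit recursion is needed; add_attribute_everywhere is a hypothetical helper name, not part of any library.

```python
import xml.etree.ElementTree as ET

def add_attribute_everywhere(xml_string: str, name: str, value: str) -> str:
    """Parse the document, set name="value" on every element, re-serialize it."""
    root = ET.fromstring(xml_string)
    # iter() yields the root element and all of its descendants in document order
    for element in root.iter():
        element.set(name, value)
    return ET.tostring(root, encoding="unicode")

# The example document from the question
doc = """<tag1>
<tag2> some text </tag2>
</tag1>"""

print(add_attribute_everywhere(doc, "attr", "myAttr"))
```

Because the parser builds a tree first, only real elements receive the attribute; text, comments and CDATA content are never mistaken for tags.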

Regular-expression answers will break on well-formed XML such as this, which has tag-like text inside a comment and inside CDATA sections, plus a start tag that spans several lines:

<?xml version="1.0" encoding='UTF-8'?>
<html>
    <head>
        <!-- <meta> ... </meta> -->
        <script>//<![CDATA[
            function load() {document.write('<tt>Test</tt>');}
        //]]></script>
        <title><![CDATA[Fancy <<SiteName>> [with Breadcrumbs] > in > title]]></title>
    </head>
    <body onload="load()">
        <input
            type="submit"
            value="multiline
                   button
                   text"
        />
    </body>
</html>
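To see concretely what goes wrong, here is a small Python sketch (Python's re module standing in for PHP's preg_replace, so treat it as illustrative). It applies a hypothetical "obvious" start-tag regex to a copy of the document above and compares the result with the element list a real parser reports. The pattern is an assumption for demonstration, not one taken from any particular answer.

```python
import re
import xml.etree.ElementTree as ET

# Copy of the tricky document: tag-like text in a comment and in CDATA,
# plus a start tag spread over several lines.
TRICKY = """<html>
  <head>
    <!-- <meta> ... </meta> -->
    <script>//<![CDATA[
      function load() {document.write('<tt>Test</tt>');}
    //]]></script>
    <title><![CDATA[Fancy <<SiteName>> [with Breadcrumbs] > in > title]]></title>
  </head>
  <body onload="load()">
    <input
      type="submit"
    />
  </body>
</html>"""

# Hypothetical naive pattern: insert the attribute after every "<name".
# It cannot tell real markup from text inside comments or CDATA sections.
NAIVE_START_TAG = re.compile(r"<([A-Za-z][\w.-]*)")

regex_result = NAIVE_START_TAG.sub(r'<\1 attr="myAttr"', TRICKY)
print(regex_result)

# The parser reports only real elements; the default parser also skips comments.
element_names = [element.tag for element in ET.fromstring(TRICKY).iter()]
print(element_names)
```

The regex output changes the commented-out `<meta>`, the `<tt>` inside the JavaScript string and the `<<SiteName>>` text inside the title, none of which are elements.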
Friday, October 21, 2022

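You should add the following flags to your regex, as needed:

  • m so that ^ and $ match at the start and end of every line, not only of the whole string (it does not let . match a newline; that is the s flag)
  • u to treat the pattern and subject as UTF-8 (if necessary)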
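PHP's preg_* functions use PCRE. Python's re module is a different engine, but its re.MULTILINE and re.DOTALL flags behave like PCRE's m and s in the cases below, so this sketch shows what each flag does (and does not do) for a start tag split across lines. Python 3 str patterns are Unicode-aware by default, so there is no direct counterpart to u here.

```python
import re

# A start tag split over several lines, like the multi-line <input> example.
tag = '<input\n    type="submit"\n/>'

# "." does not match a newline unless DOTALL (PCRE's "s") is set...
dot_plain = re.search(r"<input.*?/>", tag)
dot_dotall = re.search(r"<input.*?/>", tag, re.DOTALL)

# ...while a negated class such as [^>] matches newlines with no flag at all.
neg_class = re.search(r"<input[^>]*>", tag)

# MULTILINE (PCRE's "m") only changes where ^ and $ can match.
lines = "<a>\n<b>"
starts_plain = re.findall(r"^<\w+>", lines)
starts_multiline = re.findall(r"^<\w+>", lines, re.MULTILINE)

print(dot_plain, dot_dotall is not None, neg_class is not None)
print(starts_plain, starts_multiline)
```

So for matching tags that span lines, a pattern built on `[^>]*` needs neither flag, and one built on `.` needs s rather than m.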
Wednesday, November 2, 2022

"Focusing on the regex, can anyone foresee a situation under which this would fail horribly when used against well-formed markup?"

When run against the XML conformance test suite, how many well-formed XML documents does it reject, and how many ill-formed XML documents does it accept?

Perhaps the biggest objection from those who share the culture of the XML community is that it will parse not only most well-formed XML documents but also most non-XML documents, in the sense that it doesn't tell you they are ill-formed. Now perhaps you think that doesn't matter too much in your environment, but in the end, if you accept ill-formed documents, then people will start sending you ill-formed documents, and before long you are in the same mess as HTML, where you have to accept any old rubbish for legacy reasons.

I don't know enough PHP to judge quickly how well your code will work against well-formed XML. But I question the motivation: why on earth would you want to write a cheap-and-dirty-and-slow XML parser by hand when there are perfectly good-and-correct-and-fast-and-free ones available off the shelf?
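To make the point about ill-formed input concrete: a regex substitution produces output for any input, while a conforming parser refuses documents that are not well-formed. A minimal Python sketch (Python's expat-based ElementTree standing in for whatever parser you use); the three inputs and the names regex_rewrite and parser_accepts are hypothetical illustrations.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical ill-formed inputs: overlapping tags, an unclosed tag,
# and an unquoted attribute value.
ILL_FORMED = ["<a><b></a></b>", "<a><b></a>", "<a x=1></a>"]

START_TAG = re.compile(r"<([A-Za-z][\w.-]*)")

def regex_rewrite(doc: str) -> str:
    # The substitution never reports an error, whatever the input looks like.
    return START_TAG.sub(r'<\1 attr="myAttr"', doc)

def parser_accepts(doc: str) -> bool:
    # ElementTree raises ParseError on anything that is not well-formed XML.
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

for doc in ILL_FORMED:
    print(doc, "->", regex_rewrite(doc), "| parser accepts:", parser_accepts(doc))
```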

Thursday, September 1, 2022

An attribute declaration can be added within xs:complexType after the xs:sequence:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" 
           elementFormDefault="qualified">
  <xs:element name="Address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="StreetAddress" minOccurs="0" type="xs:string"/>
        <xs:element name="OtherDestination" minOccurs="0" type="xs:string"/>
        <xs:element name="City" minOccurs="0" type="xs:string"/>
      </xs:sequence>

      <!-- ===================================== -->
      <!-- This is where to declare attributes:  -->
      <xs:attribute name="id" type="xs:string"/>
      <!-- ===================================== -->

    </xs:complexType>
  </xs:element>
</xs:schema>

The above XSD will validate your XML successfully.
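If you decorate a schema with comment "rules", keep in mind that XML 1.0 forbids the string `--` inside a comment, so a rule made entirely of dashes makes the whole XSD ill-formed and no validator will load it. A quick check, sketched with Python's standard parser (the two sample documents and is_well_formed are hypothetical):

```python
import xml.etree.ElementTree as ET

# XML 1.0 does not allow "--" inside a comment, so a dash-only "rule"
# is rejected by a conforming parser; a rule made of "=" is fine.
DASH_RULE = "<r><!------------------></r>"
SAFE_RULE = "<r><!-- ================ --></r>"

def is_well_formed(doc: str) -> bool:
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

print(is_well_formed(DASH_RULE), is_well_formed(SAFE_RULE))
```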

Tuesday, November 8, 2022

Your XML in a variable:

DECLARE @xml XML=
N'<ContentTemplate>
  <Tab Title="Lesson">
    <Section Title="Lesson Opening" />
    <Section Title="Lesson/Activity" />
  </Tab>
  <Tab Title="Wrap Up and Assessment">
    <Section Title="Lesson Closing" />
    <Section Title="Tracking Progress/Daily Assessment" />
  </Tab>
  <Tab Title="Differentiated Instruction">
    <Section Title="Strategies - Keyword" />
    <Section Title="Strategies – Text" />
    <Section Title="Resources" />
    <Section Title="Acceleration/Enrichment" />
  </Tab>
  <Tab Title="District Resources">
    <Section Title="Related Content Items" />
    <Section Title="Other" />
  </Tab>
</ContentTemplate>';

1) FLWOR

The .modify() method can change only one node per call, so you'd need many calls to change many places. A FLWOR expression lets you rebuild the XML out of itself:

SET @xml=@xml.query(
'<ContentTemplate>
{
for $t in /ContentTemplate/Tab
   return 
   <Tab Title="{$t/@Title}" PortletName="CommunitiesViewer">
   {$t/*}
   </Tab>
}
</ContentTemplate>');

SELECT @xml

2) Rebuild with SELECT ... FOR XML PATH()

You can reach the same result with this approach: again the XML is rebuilt, but this time it is shredded with .nodes() and reassembled with a new SELECT ... FOR XML PATH:

SELECT tb.value('@Title','nvarchar(max)') AS [@Title]
      ,'CommunitiesViewer' AS [@PortletName]
      ,tb.query('*')
FROM @xml.nodes('/ContentTemplate/Tab') AS A(tb)
FOR XML PATH('Tab'),ROOT('ContentTemplate')
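Outside SQL Server, the same transformation (every Tab gets PortletName="CommunitiesViewer" while its children are kept unchanged) can be sketched with Python's ElementTree. This is only a comparison, not a replacement for the T-SQL above; add_portlet_name is a hypothetical helper, and the input is a shortened copy of the ContentTemplate document.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the ContentTemplate document from the question.
CONTENT_TEMPLATE = """<ContentTemplate>
  <Tab Title="Lesson">
    <Section Title="Lesson Opening" />
  </Tab>
  <Tab Title="District Resources">
    <Section Title="Other" />
  </Tab>
</ContentTemplate>"""

def add_portlet_name(xml: str, portlet_name: str = "CommunitiesViewer") -> str:
    root = ET.fromstring(xml)
    # findall("Tab") matches only direct Tab children of the root,
    # mirroring the /ContentTemplate/Tab path used in both queries.
    for tab in root.findall("Tab"):
        tab.set("PortletName", portlet_name)
    return ET.tostring(root, encoding="unicode")

print(add_portlet_name(CONTENT_TEMPLATE))
```

Unlike the FLWOR version, which rebuilds each Tab and copies only its Title, this edits the tree in place, so any other attributes on Tab would be kept.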
Monday, December 5, 2022