I'm trying to verify that my string matches a pattern, that is, that the full string can be written as that pattern.
However, preg_match returns 1 if any substring matches the pattern.
(E.g. preg_match("#[a-z]*#", "333k") returns 1, which I don't want.
In this example I'd rather verify that the whole string contains only lowercase Latin letters.)
Answers
I just tried my XElement.Parse solution. I created an extension method on the string class so I can reuse the code easily:
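public static bool ContainsXHTML(this string input)
{
    try
    {
        // Wrap the input so that plain text still parses as a single element.
        XElement x = XElement.Parse("<wrapper>" + input + "</wrapper>");
        // Anything other than exactly one text node means markup was present.
        return !(x.DescendantNodes().Count() == 1 && x.DescendantNodes().First().NodeType == XmlNodeType.Text);
    }
    catch (XmlException)
    {
        // Input that does not parse is treated as containing HTML.
        return true;
    }
}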
One problem I found was that plain-text ampersand and less-than characters cause an XmlException, so the method reports that the field contains HTML (which is wrong). To fix this, the ampersands and less-than characters in the input string first need to be converted to their equivalent XHTML entities. I wrote another extension method to do that:
public static string ConvertXHTMLEntities(this string input)
{
    // Convert all ampersands to the ampersand entity.
    string output = input;
    output = output.Replace("&amp;", "amp_token");
    output = output.Replace("&", "&amp;");
    output = output.Replace("amp_token", "&amp;");
    // Convert less than to the less than entity (without messing up tags).
    output = output.Replace("< ", "&lt; ");
    return output;
}
Now I can take a user-submitted string and check that it doesn't contain HTML using the following code:
bool ContainsHTML = UserEnteredString.ConvertXHTMLEntities().ContainsXHTML();
I'm not sure if this is bulletproof, but I think it's good enough for my situation.
One option is to use regular expressions:
if (str.match("^Hello")) {
    // do this if str begins with Hello
}
if (str.match("World$")) {
    // do this if str ends in World
}
Use a negated character class: [^A-Za-z-w]
This will only match if the user enters something OTHER than what is in that character class.
if (preg_match('/[^A-Za-z-w]/', $input)) { /* invalid character entered */ }
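Applied to the example in the question, the same idea might look like the sketch below. The helper name hasNonLowercase is hypothetical, and the character class is narrowed to a-z on the assumption that only lowercase Latin letters are allowed.

```php
<?php
// Hypothetical helper (not a PHP built-in): true if $input contains at
// least one character that is not a lowercase Latin letter.
function hasNonLowercase(string $input): bool
{
    // [^a-z] matches a single character outside a-z. An unanchored search
    // is enough here, because one offending character makes the input invalid.
    return preg_match('/[^a-z]/', $input) === 1;
}

var_dump(hasNonLowercase('333k'));  // bool(true)
var_dump(hasNonLowercase('hello')); // bool(false)
```

Note that with this approach an empty string counts as valid, since it contains no offending character.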
You use the start and end markers, ^ and $ respectively, to indicate the beginning and end of the string in your regular expression pattern. That way you can make the expression match only the whole string, not any kind of substring. In your case it would then look like this:
preg_match("#^[a-z]*$#", "333k")
You can also, with just one of these markers, specify that the pattern must match only at the beginning or only at the end of the string.
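A minimal sketch of the anchored check, wrapped in a hypothetical helper called isLowercaseOnly (the name is not from the original answer). One detail worth knowing: in PCRE, $ by default also matches just before a final trailing newline, so the sketch adds the D modifier to require the true end of the string. That modifier is an addition here, on the assumption that a trailing newline should be rejected.

```php
<?php
// Hypothetical helper (not a PHP built-in): true only if the WHOLE string
// consists of lowercase Latin letters. Because of *, the empty string passes.
function isLowercaseOnly(string $input): bool
{
    // ^ anchors the match at the start and $ at the end of the string.
    // The D modifier stops $ from also matching before a trailing newline.
    return preg_match('#^[a-z]*$#D', $input) === 1;
}

var_dump(isLowercaseOnly('333k'));    // bool(false)
var_dump(isLowercaseOnly('hello'));   // bool(true)
var_dump(isLowercaseOnly("hello\n")); // bool(false)
```

If empty input should be rejected as well, replacing * with + in the pattern would require at least one letter.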