Is there a pre-existing function or class for URL normalization in PHP?

Specifically, one that follows the semantics-preserving normalization rules laid out in the Wikipedia article on URL normalization (or whatever standard I should be following):

  • Converting the scheme and host to lower case
  • Capitalizing letters in escape sequences
  • Adding trailing / (to directories, not files)
  • Removing the default port
  • Removing dot-segments

Right now, I'm thinking of just using parse_url() and applying the rules individually, but I'd prefer to avoid reinventing the wheel.

 Answers

2

The PEAR Net_URL2 library looks like it will do at least part of what you want. It removes dot-segments, fixes capitalization, and drops the default port:

include("Net/URL2.php");
$url = new Net_URL2('HTTP://example.com:80/a/../b/c');
print $url->getNormalizedURL();

emits:

http://example.com/b/c

I doubt there's a general-purpose mechanism for adding trailing slashes to directories, because you would need a way to map URLs to directories, which is hard to do generically. But Net_URL2 gets you close.
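
If PEAR is not an option, the parse_url() approach mentioned in the question can be sketched roughly as follows. This is only an illustration under some assumptions: normalize_url() and remove_dot_segments() are hypothetical helper names (not part of PHP or Net_URL2), only http and https default ports are known, user info is dropped, escape sequences are upper-cased in the path only, and PHP 7.1 or later is assumed.

```php
<?php
// Hypothetical helpers for illustration; they are not part of PHP or Net_URL2.

// Resolve "." and ".." in an absolute path, keeping a trailing slash when the
// path ends in a dot-segment (so "/a/b/.." becomes "/a/").
function remove_dot_segments(string $path): string
{
    $out = [];
    $segments = explode('/', $path);
    $last = count($segments) - 1;
    foreach ($segments as $i => $segment) {
        if ($segment === '.' || $segment === '..') {
            // Never pop the leading empty segment that stands for the root "/".
            if ($segment === '..' && count($out) > 1) {
                array_pop($out);
            }
            if ($i === $last) {
                $out[] = ''; // preserve the trailing slash
            }
        } else {
            $out[] = $segment;
        }
    }
    return implode('/', $out);
}

function normalize_url(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $url; // not an absolute URL: leave it untouched
    }

    // Scheme and host are case-insensitive, so lower-case them.
    $scheme = strtolower($parts['scheme']);
    $host   = strtolower($parts['host']);

    // Keep the port only when it differs from the scheme's default.
    $defaultPorts = ['http' => 80, 'https' => 443];
    $port = '';
    if (isset($parts['port']) && ($defaultPorts[$scheme] ?? null) !== $parts['port']) {
        $port = ':' . $parts['port'];
    }

    // An empty path becomes "/"; then drop dot-segments and upper-case escapes.
    $path = remove_dot_segments($parts['path'] ?? '/');
    $path = preg_replace_callback('/%[0-9a-fA-F]{2}/', function (array $m) {
        return strtoupper($m[0]);
    }, $path);

    $query    = isset($parts['query']) ? '?' . $parts['query'] : '';
    $fragment = isset($parts['fragment']) ? '#' . $parts['fragment'] : '';

    return $scheme . '://' . $host . $port . $path . $query . $fragment;
}

echo normalize_url('HTTP://Example.COM:80/a/../b/%7euser/c?x=1'), "\n";
// prints http://example.com/b/%7Euser/c?x=1
```

Like Net_URL2, this sketch does not attempt to add trailing slashes to directories, for the reason given above.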

References:

  • http://pear.php.net/package/Net_URL2
  • http://pear.php.net/package/Net_URL2/docs/latest/Net_URL2/Net_URL2.html
Thursday, August 11, 2022
2

Special characters in values passed in the URL should be escaped using urlencode().


For example, the following code:

echo urlencode('dsf13f3343f23/23=');

would give you:

dsf13f3343f23%2F23%3D

which works fine as a URL parameter.


And if you want to build a query string with several parameters, take a look at the http_build_query() function.

For example:

echo http_build_query(array(
    'id' => 'dsf13f3343f23/23=',
    'a' => 'plop',
    'b' => '$^@test',
));

will give you:

id=dsf13f3343f23%2F23%3D&a=plop&b=%24%5E%40test

This function handles both escaping and concatenating the parameters itself ;-)
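
As a quick sanity check, the encoded query string can be decoded again with parse_str(). The snippet below is a small sketch; the base URL http://example.com/page.php is a made-up placeholder, and it assumes PHP's default argument separator (&).

```php
<?php
$params = [
    'id' => 'dsf13f3343f23/23=',
    'a'  => 'plop',
    'b'  => '$^@test',
];

// Build a full URL; http://example.com/page.php is a placeholder.
$url = 'http://example.com/page.php?' . http_build_query($params);
echo $url, "\n";

// Decode the query string again and compare it with the input.
parse_str(parse_url($url, PHP_URL_QUERY), $decoded);
var_dump($decoded === $params); // bool(true)
```

The round trip returning the original array shows that nothing was lost or mangled by the escaping.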

Friday, October 14, 2022
5

The "domainNameSuffix" is called a top-level domain (TLD for short), and there is no easy way to extract it.

Every country has its own TLD, and some countries have opted to further subdivide theirs (for example, co.uk). Since the number of subdomains (my.own.subdomain.example.com) is also variable, there is no easy one-regexp-fits-all solution.

As mentioned, you need a list. Fortunately for you there are lists publicly available: http://publicsuffix.org/
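
To illustrate how such a list is used, here is a minimal sketch. registrable_domain() is a hypothetical helper, and the three-entry suffix array is a hand-picked sample rather than the real list; a real implementation would load the full list from publicsuffix.org and also honor its wildcard and exception rules, which this sketch ignores.

```php
<?php
// Hypothetical helper: returns the public suffix plus one label to its left,
// or null when the host is itself a suffix or no suffix matches.
function registrable_domain(string $host, array $suffixes): ?string
{
    $host = strtolower($host);
    if (in_array($host, $suffixes, true)) {
        return null; // the host is a bare suffix such as "co.uk"
    }

    $labels = explode('.', $host);
    $count  = count($labels);

    // Try the longest candidate suffix first so "co.uk" wins over "uk".
    for ($i = 1; $i < $count; $i++) {
        $suffix = implode('.', array_slice($labels, $i));
        if (in_array($suffix, $suffixes, true)) {
            return $labels[$i - 1] . '.' . $suffix;
        }
    }
    return null;
}

// Tiny sample list for demonstration only.
$suffixes = ['com', 'uk', 'co.uk'];

echo registrable_domain('my.own.subdomain.example.com', $suffixes), "\n"; // example.com
echo registrable_domain('www.example.co.uk', $suffixes), "\n";            // example.co.uk
```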

Sunday, November 13, 2022
5

You should not!

Parse it as text (e.g. by stripping spaces from the content and splitting it on commas) or, less preferably, turn the content into PHP code and load it with include.

Parsing TXT file as text

You should probably stick to parsing the file as text, and you can use the file() function for this. file() reads the whole file and returns its content as an array of lines (with the line endings still attached). You can then do whatever you need with that data (e.g. strip it of commas and spaces).
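
A minimal sketch of that approach, assuming sites.txt holds comma-separated entries spread over one or more lines; parse_sites() is a hypothetical helper name used only for illustration:

```php
<?php
// Hypothetical helper: turns lines such as "example.com, example.org"
// into a flat list of trimmed, non-empty entries.
function parse_sites(array $lines): array
{
    $sites = [];
    foreach ($lines as $line) {
        foreach (explode(',', $line) as $entry) {
            $entry = trim($entry); // strips spaces and the line ending
            if ($entry !== '') {
                $sites[] = $entry;
            }
        }
    }
    return $sites;
}

// Only read the file when it is actually there.
if (is_readable('sites.txt')) {
    $sites = parse_sites(file('sites.txt', FILE_IGNORE_NEW_LINES));
    print_r($sites);
}
```

Because the file is only ever treated as data, nothing in it can be executed, which is what makes this the safer option.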

PHP code within included file

If you choose the latter (putting PHP code in the TXT file, which is a bad idea if you want to make sites.txt publicly available), then you should try not to pollute the global namespace. You can achieve this by having the included file return its value:

  • in sites.txt:

    <?php
    return 'your string goes here';
    
  • in your script:

    <?php
    $my_string = include('sites.txt');
    

The worst solution - eval()

This is included only for completeness. If your TXT file contains PHP code, you can use eval() to run it as PHP. But remember: eval() is evil, so avoid it at all costs.

Tuesday, September 20, 2022
 
andy_s
 
1

The answer pretty much depends on what you need it for. If you are developing a theme and want to keep values constant across all files, you can place them in functions.php in the theme directory, which is always loaded. Variables defined there should be available everywhere in the theme. This also works if you distribute the theme.

If you want to modify an existing theme for your own installation, you can either put them in wp-config.php, as suggested, or (a cleaner method) create a child theme of the theme you want to change. This keeps your changes separate from the WordPress core and prevents theme updates from overwriting them.

I just tried it using functions.php:

functions.php:

$variable = "value";

header.php:

global $variable;
echo $variable;

This works for me.

Thursday, October 6, 2022
 
skyluc
 