Can someone show me how to get the YouTube ID out of a URL, regardless of what other GET variables are in the URL?
Use this video for example: http://www.youtube.com/watch?v=C4kxS1ksqtw&feature=related
So I need whatever is between v= and the next &.
Brazenly stolen from htmlpurifier's youtube plugin:
preg_match('#<object[^>]+>.+?http://www.youtube.com/v/([A-Za-z0-9\-_]+).+?</object>#s', $markup, $matches);
var_dump($matches[1]);
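The pattern above targets <object> embed markup. For a plain watch URL like the one in the question, a regex sketch might look like the following. The function name youtube_id_via_regex is made up for illustration, and the pattern assumes the ID sits in a v query parameter.

```php
<?php
// Hypothetical helper: returns the value of the "v" parameter, or null if absent.
function youtube_id_via_regex(string $url): ?string
{
    // [?&] ties "v=" to the start of a parameter;
    // [^&#]+ stops at the next "&" or at a "#" fragment.
    if (preg_match('~[?&]v=([^&#]+)~', $url, $matches)) {
        return $matches[1];
    }
    return null;
}

echo youtube_id_via_regex('http://www.youtube.com/watch?v=C4kxS1ksqtw&feature=related'), "\n"; // C4kxS1ksqtw
```

Because the match is anchored on ? or &, the position of v among the other GET variables should not matter.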
Using this pattern with a capturing group should give you the string you want:
d/(\w+)\?rel=\d+"
Example: https://regex101.com/r/kH5kA7/1
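A minimal sketch of applying that pattern with preg_match, with the backslash escapes written out (\w, \?, \d). The $markup string is an assumed sample of embed code, not taken from the original question.

```php
<?php
// Assumed sample input: an iframe embed whose URL ends in ?rel=<digits>.
$markup = '<iframe src="https://www.youtube.com/embed/C4kxS1ksqtw?rel=0"></iframe>';

// "~" is used as the delimiter so the "/" needs no escaping.
// Group 1 captures the word characters between "d/" and "?rel=".
$videoId = null;
if (preg_match('~d/(\w+)\?rel=\d+"~', $markup, $matches)) {
    $videoId = $matches[1];
}

echo $videoId, "\n"; // C4kxS1ksqtw
```

Note that \w does not match a hyphen, so an ID containing "-" would not be captured by this pattern; [\w-]+ would be a broader choice.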
For this PHP regex:
$str = preg_replace('{(.)\1+}', '$1', $str);   // collapse runs of a repeated character
$str = preg_replace('{[ \'\-_()]}', '', $str); // strip spaces, apostrophes, hyphens, underscores and parentheses
the Java equivalent is:
str = str.replaceAll("(.)\\1+", "$1");
str = str.replaceAll("[ '\\-_()]", "");
I suggest you provide your input and expected output; then you will get better answers on how it can be done in PHP and/or Java.
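A small sketch of what the two replacements do on a sample string, assuming the intent is to strip the literal characters listed (the hyphen is escaped here so it is not read as a range). The helper name squeeze_and_strip and the input are illustrative only.

```php
<?php
// Hypothetical helper name; wraps the two replacements.
function squeeze_and_strip(string $str): string
{
    // (.)\1+ matches any character followed by one or more copies of itself;
    // replacing with $1 keeps a single copy.
    $str = preg_replace('{(.)\1+}', '$1', $str);

    // Remove spaces, apostrophes, hyphens, underscores and parentheses.
    return preg_replace('{[ \'\-_()]}', '', $str);
}

echo squeeze_and_strip('aaa-bbb (c_c)'), "\n"; // abcc
```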
It seems that in your environment the PCRE library was compiled without the PCRE_NEWLINE_ANY option, so $ in multiline mode only matches before the LF symbol, and the dot (.) matches any symbol but LF.
You can fix it by using the PCRE (*ANYCRLF) verb:
'~(*ANYCRLF)\S+(?=\*$)~m'
(*ANYCRLF) specifies a newline convention in which any of (*CR), (*LF) or (*CRLF) counts as a line break; it corresponds to the PCRE_NEWLINE_ANYCRLF option, a narrower form of PCRE_NEWLINE_ANY. See the PCRE documentation:
PCRE_NEWLINE_ANY specifies that any Unicode newline sequence should be recognized.
In the end, this PCRE verb makes the dot (.) match any character except CR and LF, and $ will match right before either of these two characters.
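To illustrate, here is a minimal sketch with CRLF line endings. The pattern is written with its backslash escapes (\S, \*), and the sample text is an assumption, not the input from the original question.

```php
<?php
// Assumed sample: lines terminated by CRLF, two of them ending in "*".
$text = "foo*\r\nbar*\r\nbaz";

// (*ANYCRLF) must come first in the pattern. With it, $ in multiline mode
// matches before CR, LF or CRLF, so the lookahead \*$ can succeed before "\r\n".
preg_match_all('~(*ANYCRLF)\S+(?=\*$)~m', $text, $matches);

print_r($matches[0]); // the two matches are "foo" and "bar"
```

Without the verb, the result depends on the newline convention the PCRE library was built with, which is why the behaviour can differ between environments.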
See more about this and other verbs at rexegg.com:
By default, when PCRE is compiled, you tell it what to consider to be a line break when encountering a . (as the dot doesn't match line breaks unless in dotall mode), as well as the ^ and $ anchors' behavior in multiline mode. You can override this default with the following modifiers:
✽ (*CR) Only a carriage return is considered to be a line break
✽ (*LF) Only a line feed is considered to be a line break (as on Unix)
✽ (*CRLF) Only a carriage return followed by a line feed is considered to be a line break (as on Windows)
✽ (*ANYCRLF) Any of the above three is considered to be a line break
✽ (*ANY) Any Unicode newline sequence is considered to be a line break
For instance, (*CR)\w+.\w+ matches Line1\nLine2 because the dot is able to match the \n, which is not considered to be a line break.
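The (*CR) case can be checked with a short sketch in PHP; the variable names are illustrative.

```php
<?php
// Two "lines" separated by a line feed only.
$text = "Line1\nLine2";

// Under (*CR) only "\r" counts as a line break, so the dot may consume "\n".
preg_match('~(*CR)\w+.\w+~', $text, $matches);

var_dump($matches[0] === "Line1\nLine2"); // bool(true)
```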
Use parse_url() and parse_str().
(You can use regexes for just about anything, but it is very easy to make an error in them, so if there are PHP functions specifically for what you are trying to accomplish, use those.)
parse_url takes a string and cuts it up into an array that has a bunch of info. You can work with this array, or you can specify the one item you want as a second argument. In this case we're interested in the query, which is PHP_URL_QUERY.
Now we have the query, which is v=C4kxS1ksqtw&feature=related, but we only want the part after v=. For this we turn to parse_str, which basically works like GET on a string. It takes a string and creates the variables specified in the string. In this case $v and $feature are created. We're only interested in $v. (This one-argument form was deprecated in PHP 7.2 and removed in PHP 8.0.)
To be safe, you don't want to just store all the variables from parse_str in your namespace (see mellowsoon's comment). Instead, store the variables as elements of an array by passing an array as the second argument, so that you have control over what variables you are storing and cannot accidentally overwrite an existing variable.
Putting everything together, we have:
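A minimal sketch of those steps, using the example URL from the question. The function name youtube_id_from_url and the variable names are assumptions for illustration; the ?? operator needs PHP 7 or later.

```php
<?php
// Hypothetical helper: returns the "v" query parameter, or null if it is missing.
function youtube_id_from_url(string $url): ?string
{
    // Only the query part, e.g. "v=C4kxS1ksqtw&feature=related".
    // parse_url gives null or false when there is no usable query;
    // the cast turns both into an empty string.
    $query = (string) parse_url($url, PHP_URL_QUERY);

    // Parse into an array instead of the local scope,
    // so no existing variable can be overwritten.
    $params = [];
    parse_str($query, $params);

    return $params['v'] ?? null;
}

echo youtube_id_from_url('http://www.youtube.com/watch?v=C4kxS1ksqtw&feature=related'), "\n"; // C4kxS1ksqtw
```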
Edit:
hehe - thanks Charles. That made me laugh, I've never seen the Zawinski quote before:
Some people, when confronted with a problem, think ‘I know, I’ll use regular expressions.’ Now they have two problems.
– Jamie Zawinski