
PHP has built-in support for reading EXIF and IPTC metadata, but I can't find any way to read XMP?



XMP data is literally embedded into the image file, so you can extract it from the image file itself with PHP's string functions.

The following demonstrates this procedure (I'm using SimpleXML, but any other XML API, or even simple and careful string parsing, will give you equivalent results):

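$content = file_get_contents($image);
$xmp_data_start = strpos($content, '<x:xmpmeta');
$xmp_data_end   = strpos($content, '</x:xmpmeta>');
// strpos() returns false when a marker is missing, so check before slicing
if ($xmp_data_start !== false && $xmp_data_end !== false) {
    // include the closing tag itself in the extracted packet
    $xmp_length = $xmp_data_end - $xmp_data_start + strlen('</x:xmpmeta>');
    $xmp_data   = substr($content, $xmp_data_start, $xmp_length);
    $xmp        = simplexml_load_string($xmp_data);
}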

Just two remarks:

  • XMP makes heavy use of XML namespaces, so you'll have to keep an eye on that when parsing the XMP data with XML tools.
  • Considering the possible size of image files, you may not be able to use file_get_contents(), as this function loads the whole image into memory. Using fopen() to open a file stream resource and checking chunks of data for the key sequences <x:xmpmeta and </x:xmpmeta> will significantly reduce the memory footprint.
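
The chunked approach from the second remark can be sketched as follows. This is only an illustration, written in Python rather than PHP; the function name extract_xmp and the default chunk size are assumptions, and the same loop maps onto PHP's fopen() and fread(). The one subtle point is that a marker can straddle two chunks, so a short tail of the previous chunk is kept while searching for the opening marker.

```python
START = b"<x:xmpmeta"
END = b"</x:xmpmeta>"


def extract_xmp(path, chunk_size=64 * 1024):
    """Return the XMP packet as bytes, or None if no complete packet is found.

    Hypothetical helper: reads the file in chunks instead of loading it whole.
    """
    buf = b""
    found_start = False
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                return None  # reached end of file without a complete packet
            buf += chunk
            if not found_start:
                pos = buf.find(START)
                if pos == -1:
                    # keep a short tail in case the marker straddles two chunks
                    buf = buf[-(len(START) - 1):]
                    continue
                buf = buf[pos:]  # drop everything before the packet
                found_start = True
            end = buf.find(END)
            if end != -1:
                # include the closing tag in the returned packet
                return buf[:end + len(END)]
```

Only the packet itself is ever held in memory, plus at most one chunk of image data. The returned bytes can then be handed to an XML parser in the same way as $xmp_data above.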
Tuesday, October 25, 2022

About RDF

It appears that Photoshop is reading a valid, well-formed RDF/XML serialization of some data, and then displaying it back to the user in the UI in another valid, well-formed RDF/XML serialization that happens to follow some additional conventions.

RDF is a graph-based data representation. The fundamental piece of knowledge in RDF is the triple, also called a statement. Each triple has a subject, a predicate, and an object. Subjects, predicates, and objects may all be IRI references; subjects and objects can also be blank nodes, and objects may also be literals (e.g., a string). RDF/XML is one particular serialization of RDF. The RDF/XML snippet:

<rdf:Description rdf:about="" xmlns:photoshop="">
  <photoshop:CaptionWriter>OOO </photoshop:CaptionWriter>

contains three triples:

<this-document> <> "OOOInstructions"
<this-document> <> "OOOHeadline"
<this-document> <> "OOO "

where <this-document> is the result of resolving the reference "" (the value of the rdf:about attribute). Page 21 of the XMP documentation says that the value of the rdf:about attribute "may be an empty string …, which means that the XMP is physically local to the resource being described. Applications must rely on knowledge of the file format to correctly associate the XMP with the resource".


<rdf:Description rdf:about=""

<rdf:Description rdf:about=""
  <photoshop:CaptionWriter>OOO </photoshop:CaptionWriter>

is exactly the same as doing

<rdf:Description rdf:about=""
  <photoshop:CaptionWriter>OOO </photoshop:CaptionWriter>

They serialize the same set of triples. Neither is invalid or incorrect. It's just a matter of which you prefer. Other variations are possible as well. For instance, in some cases you can use element attributes to indicate property values. The triple:

<this-document> <> "OOOInstructions"

can be serialized using elements, as described in Section 2.2 Node Elements and Property Elements of the RDF/XML recommendation:

<rdf:Description rdf:about="" xmlns:photoshop="">

or using attributes to indicate the property value, as described in Section 2.5 Property Attributes of the same document:

<rdf:Description rdf:about="" xmlns:photoshop=""
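
To make the equivalence concrete, here is a small sketch in Python (standard library only) that pulls the triples out of flat rdf:Description elements and compares the different serializations. It is not a real RDF parser and only handles plain literal properties. The namespace URI assigned to PS, the photoshop:Headline property name, and the helper names iri, triples and wrap are assumptions made for this illustration; they are not taken from the text above.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PS = "http://ns.adobe.com/photoshop/1.0/"  # assumption: not given in the text


def iri(qname):
    """Turn ElementTree's '{namespace}local' form into a plain IRI string."""
    return qname[1:].replace("}", "", 1) if qname.startswith("{") else qname


def triples(rdf_xml):
    """Collect (subject, predicate, literal) triples from flat rdf:Description nodes."""
    found = set()
    for desc in ET.fromstring(rdf_xml).iter("{%s}Description" % RDF):
        subject = desc.get("{%s}about" % RDF, "")
        # property attributes: every attribute outside the rdf: namespace
        for name, value in desc.attrib.items():
            if not name.startswith("{%s}" % RDF):
                found.add((subject, iri(name), value))
        # property elements: child elements holding literal text
        for child in desc:
            found.add((subject, iri(child.tag), child.text or ""))
    return found


def wrap(body):
    """Wrap a fragment in rdf:RDF with the namespace declarations it needs."""
    return '<rdf:RDF xmlns:rdf="%s" xmlns:photoshop="%s">%s</rdf:RDF>' % (RDF, PS, body)


AS_ELEMENTS = wrap('<rdf:Description rdf:about="">'
                   '<photoshop:CaptionWriter>OOO </photoshop:CaptionWriter>'
                   '</rdf:Description>')
AS_ATTRIBUTES = wrap('<rdf:Description rdf:about="" photoshop:CaptionWriter="OOO "/>')
ONE_BLOCK = wrap('<rdf:Description rdf:about="">'
                 '<photoshop:CaptionWriter>OOO </photoshop:CaptionWriter>'
                 '<photoshop:Headline>OOOHeadline</photoshop:Headline>'
                 '</rdf:Description>')
SPLIT_BLOCKS = wrap('<rdf:Description rdf:about="">'
                    '<photoshop:CaptionWriter>OOO </photoshop:CaptionWriter>'
                    '</rdf:Description>'
                    '<rdf:Description rdf:about="">'
                    '<photoshop:Headline>OOOHeadline</photoshop:Headline>'
                    '</rdf:Description>')

print(triples(AS_ELEMENTS) == triples(AS_ATTRIBUTES))  # True
print(triples(ONE_BLOCK) == triples(SPLIT_BLOCKS))     # True
```

Both comparisons hold because the set of triples is the same; only the XML layout differs. A proper RDF library does this comparison at the graph level and handles the parts of RDF/XML that this sketch ignores.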

So, as to your second question:

Why should I spend the time to format my output to the RDF specs when it works nicely all jumbled together in a single rdf:Description?

If the output is supposed to be RDF, you should make it valid RDF. Whether it follows a particular aesthetically pleasing layout is an entirely different question. It's relatively easy to translate between the two forms. I expect that Photoshop reads the blob of RDF as it should, that is, without depending on any particular structure of the XML serialization, since that structure is not always the same (this is also why you shouldn't try to manipulate RDF with XPath), and then formats the data for the user in a way that it considers nice, namely the convention that you mentioned.

If you're not already, I very strongly suggest that you use an RDF library in PHP to construct the metadata graph, and not try to construct the RDF/XML serialization by hand.

About XMP in RDF

Note: this is an update based on the documentation. According to page 19 of the documentation, XMP only supports a subset of RDF, so it is still a meaningful question whether the RDF above and in the question, though suitable as RDF, is suitable as XMP. However, also from page 19:

The sections below describe the high-level structure of XMP data in an XMP Packet:

  • The outermost element is optionally an x:xmpmeta element
  • It contains a single rdf:RDF element
  • which in turn contains one or more rdf:Description elements
  • each of which contains one or more XMP Properties.

Page 20 contains some elaboration about the rdf:Description elements (emphasis added):

The rdf:RDF element can contain one or more rdf:Description elements. … By convention, all properties from a given schema, and only that schema, are listed within a single rdf:Description element. (This is not a requirement, just a means to improve readability.)

The part with added emphasis, the parenthetical remark that this is a convention and not a requirement, is what we need in order to conclude that both forms we've seen above are acceptable. It's probably easier to just create one big blob, and consider yourself lucky if some other tool splits it into the conventional form for you.
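
If you do want to emit the conventional form yourself, grouping is not much work. The following is a rough sketch in Python, under several assumptions: the function name build_packet and its input shapes are made up for this example, the namespace URIs for x:, photoshop: and dc: are filled in from memory of Adobe's and Dublin Core's published schemas, and only simple text properties are handled. It builds the x:xmpmeta / rdf:RDF / rdf:Description nesting quoted above, with one rdf:Description per schema.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
X = "adobe:ns:meta/"                        # assumed namespace of x:xmpmeta
PS = "http://ns.adobe.com/photoshop/1.0/"   # assumed photoshop: namespace
DC = "http://purl.org/dc/elements/1.1/"     # Dublin Core element set


def build_packet(props, prefixes):
    """Serialize simple properties with one rdf:Description per schema.

    props:    {(namespace_uri, local_name): text_value}
    prefixes: {prefix: namespace_uri}, used only to get readable prefixes
    """
    ET.register_namespace("x", X)
    ET.register_namespace("rdf", RDF)
    for prefix, uri in prefixes.items():
        ET.register_namespace(prefix, uri)

    # group the flat property list by schema (namespace)
    by_schema = defaultdict(dict)
    for (ns, name), value in props.items():
        by_schema[ns][name] = value

    xmpmeta = ET.Element("{%s}xmpmeta" % X)
    rdf = ET.SubElement(xmpmeta, "{%s}RDF" % RDF)
    for ns, fields in by_schema.items():
        desc = ET.SubElement(rdf, "{%s}Description" % RDF,
                             {"{%s}about" % RDF: ""})
        for name, value in fields.items():
            ET.SubElement(desc, "{%s}%s" % (ns, name)).text = value
    return ET.tostring(xmpmeta, encoding="unicode")


packet = build_packet(
    {(PS, "CaptionWriter"): "OOO ",
     (PS, "Headline"): "OOOHeadline",
     (DC, "format"): "image/jpeg"},
    {"photoshop": PS, "dc": DC},
)
print(packet)
```

Dropping the grouping step and writing everything into a single rdf:Description would describe the same properties, which is the point made above: the split is for readability only.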

Tuesday, December 6, 2022

The following seems to work nicely, but if there's something bad about it, I'd appreciate any comments.

    // needs System.IO and System.Windows.Media.Imaging (WPF)
    public string GetDate(FileInfo f)
    {
        using (FileStream fs = new FileStream(f.FullName, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            BitmapSource img = BitmapFrame.Create(fs);
            BitmapMetadata md = (BitmapMetadata)img.Metadata;
            // DateTaken is a string; expect null when the file carries no such tag
            return md.DateTaken;
        }
    }
Thursday, August 25, 2022

You can use textConnection() to pass the character vector to read.table(). An example:

x  <- "first,second\nthird,fourth\n"
x1 <- read.table(textConnection(x), sep = ",")
x1
#      V1     V2
# 1 first second
# 2 third fourth

Answer found in the R mailing list.

2017 EDIT

Seven years later, I'd probably do it like this:

read.table(text = x, sep = ",")
Saturday, September 3, 2022

You're correct; physical destruction is the only good way to do this. (A magnet strong enough for the job is not something most people can get, unless you're on staff at the Large Hadron Collider.) Professional disposal operations generally use an industrial metal shredder. For you, bending the platters with a hammer, sandpapering them, and then running a drill through them at multiple points is sufficient to stop anything but advanced forensic data recovery. If you're really concerned about even that, or you just want style points, you might try thermite, which is usually sufficient to melt the platters entirely.

Wednesday, October 5, 2022