I have a table with 3 columns: id (pk), pageId (fk), and name. I have a PHP script which dumps about 5,000 records into the table, and about half of them are duplicates, with the same pageId and name. The combination of pageId and name should be unique. What is the best way to prevent duplicates from being saved to the table as the script loops through the records in PHP?
Answers
I would say just build the query yourself. You can set it up like this:
$query = "INSERT INTO x (a,b,c) VALUES ";
foreach ($arr as $item) {
$query .= "('".$item[0]."','".$item[1]."','".$item[2]."'),";
}
$query = rtrim($query,",");//remove the extra comma
//execute query
Don't forget to escape quotes where necessary. Also, be careful not to send too much data at once; you may have to execute the insert in chunks rather than all at once.
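For example, here's a minimal sketch of escaping and chunking, assuming a mysqli connection in $db, the rows in $arr, and the table/column names (x, a, b, c) from the example above; the batch size of 500 is arbitrary:
$chunks = array_chunk($arr, 500); // 500 rows per INSERT statement
foreach ($chunks as $chunk) {
    $values = array();
    foreach ($chunk as $item) {
        $values[] = "('" . $db->real_escape_string($item[0]) . "','"
                  . $db->real_escape_string($item[1]) . "','"
                  . $db->real_escape_string($item[2]) . "')";
    }
    $db->query("INSERT INTO x (a,b,c) VALUES " . implode(",", $values));
}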
Yes, your solution with a many-to-many relationship between tables sounds good for your case. You can then easily JOIN the tables to get information out of the IDs.
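A rough sketch of such a JOIN, assuming a junction table; the names (owners, pets, owner_pets) are made up for illustration:
SELECT o.name, p.name AS pet
FROM owners o
JOIN owner_pets op ON op.owner_id = o.id
JOIN pets p ON p.id = op.pet_id;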
The function you're looking for is find_in_set:
select * from ... where find_in_set($word, pets)
For multi-word queries you'll need to test each word and AND (or OR) the tests:
where find_in_set($word1, pets) AND find_in_set($word2, pets) etc
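For instance, a rough sketch of building that WHERE clause in PHP; $words is assumed to be an array of already-escaped search terms, and the table name is a placeholder:
$tests = array();
foreach ($words as $word) {
    $tests[] = "FIND_IN_SET('" . $word . "', pets)";
}
// join with " OR " instead if matching any single word is enough
$sql = "SELECT * FROM your_table WHERE " . implode(" AND ", $tests);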
Every system I know of that stores large numbers of big files stores them externally to the database. You store all of the queryable data for the file (title, artist, length, etc) in the database, along with a partial path to the file. When it's time to retrieve the file, you extract the file's path, prepend some file root (or URL) to it, and return that.
So, you'd have a "location" column, with a partial path in it, like "a/b/c/1000", which you then map to: "http://myserver/files/a/b/c/1000.mp3"
Make sure that you have an easy way to point the media database at a different server/directory, in case you need that for data recovery. Also, you might need a routine that re-syncs the database with the contents of the file archive.
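A minimal sketch of turning a stored partial path into a full URL, assuming the column is called "location" and keeping the root in one configurable place (the names here are hypothetical):
function mediaUrl($location, $root = 'http://myserver/files') {
    // swap $root to point at a different server/directory for data recovery
    return rtrim($root, '/') . '/' . $location . '.mp3';
}
echo mediaUrl('a/b/c/1000'); // http://myserver/files/a/b/c/1000.mp3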
Also, if you're going to have thousands of media files, don't store them all in one giant directory - that's a performance bottleneck on some file systems. Instead, break them up into multiple balanced sub-trees.
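One common way to get balanced sub-trees is to derive the directories from the id itself; a sketch, with the padding and split depth chosen arbitrarily:
function mediaPath($id) {
    $padded = str_pad((string)$id, 6, '0', STR_PAD_LEFT); // 1000 -> "001000"
    $dirs = str_split(substr($padded, 0, 3));             // "001000" -> ["0","0","1"]
    return implode('/', $dirs) . '/' . $id;               // "0/0/1/1000"
}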
The first step would be to set a unique key on the table:
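Something along these lines, where your_table stands in for the actual table name:
ALTER TABLE your_table ADD UNIQUE KEY uq_page_name (pageId, name);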
Then you have to decide what you want to do when there's a duplicate. Should you:
Ignore it?
Overwrite the previously entered record?
Update some counter?
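Each of those options maps onto standard MySQL syntax once the unique key is in place; a sketch, with your_table and the hits counter column as placeholders:
-- ignore it: silently skip rows that violate the unique key
INSERT IGNORE INTO your_table (pageId, name) VALUES (1, 'foo');

-- overwrite the previously entered record
REPLACE INTO your_table (pageId, name) VALUES (1, 'foo');

-- update some counter on the existing row instead
INSERT INTO your_table (pageId, name) VALUES (1, 'foo')
    ON DUPLICATE KEY UPDATE hits = hits + 1;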