
I have a table with three columns: id (PK), pageId (FK), and name. A PHP script dumps about 5,000 records into the table, and roughly half of them are duplicates (same pageId and name). The combination of pageId and name should be unique. What is the best way to prevent duplicates from being saved to the table as I loop through the records in PHP?

Answers

1

The first step is to add a unique index to the table:

ALTER TABLE thetable ADD UNIQUE INDEX(pageid, name);

Then you have to decide what you want to do when there's a duplicate (a PHP sketch of option 2 follows the list). Should you:

  1. Ignore it?

    INSERT IGNORE INTO thetable (pageid, name) VALUES (1, "foo"), (1, "foo");
    
  2. Overwrite the previously entered record?

    INSERT INTO thetable (pageid, name, somefield)
    VALUES (1, "foo", "first")
    ON DUPLICATE KEY UPDATE somefield = 'first';
    
    INSERT INTO thetable (pageid, name, somefield)
    VALUES (1, "foo", "second")
    ON DUPLICATE KEY UPDATE somefield = 'second';
    
  3. Update some counter?

    INSERT INTO thetable (pageid, name, pagecount)
    VALUES (1, "foo", 1), (1, "foo", 1)
    ON DUPLICATE KEY UPDATE pagecount = pagecount + 1;
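
From PHP, option 2 might look like the following: a minimal sketch, assuming a PDO connection $pdo and an array $records from your script; somefield stands in for whatever extra column you carry, as in the examples above.

    $stmt = $pdo->prepare(
        "INSERT INTO thetable (pageid, name, somefield)
         VALUES (?, ?, ?)
         ON DUPLICATE KEY UPDATE somefield = VALUES(somefield)"
    );
    foreach ($records as $r) {
        // On a duplicate (pageid, name) pair the existing row is
        // updated instead of a new one being inserted.
        $stmt->execute([$r['pageid'], $r['name'], $r['somefield']]);
    }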
    
Friday, September 16, 2022
2

I would say just build it yourself. You can set it up like this:

$query = "INSERT INTO x (a,b,c) VALUES ";
foreach ($arr as $item) {
  $query .= "('".$item[0]."','".$item[1]."','".$item[2]."'),";
}
$query = rtrim($query,",");//remove the extra comma
//execute query

Don't forget to escape quotes in the values where necessary.

Also, be careful that you don't send too much data at once; you may have to execute the insert in chunks instead of all in one query.
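
Putting the escaping and chunking together, and leaning on the unique key from the first answer so INSERT IGNORE can skip the duplicates, a sketch might look like this (assuming a mysqli connection $db; the table and columns are the placeholders from above):

    $chunkSize = 500; // rows per query, to keep each statement small
    foreach (array_chunk($arr, $chunkSize) as $chunk) {
        $values = [];
        foreach ($chunk as $item) {
            // Escape each value so quotes can't break the query
            $values[] = "('" . $db->real_escape_string($item[0]) . "','"
                      . $db->real_escape_string($item[1]) . "','"
                      . $db->real_escape_string($item[2]) . "')";
        }
        // INSERT IGNORE silently skips rows that violate the unique key
        $db->query("INSERT IGNORE INTO x (a,b,c) VALUES " . implode(",", $values));
    }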

Saturday, November 5, 2022
2

Yes, your solution with a many-to-many relationship between the tables sounds good for your case. You can then easily JOIN the tables to get information out of the IDs, along the lines of the sketch below.
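
For example, with a hypothetical page_tags junction table (all of the names here are assumed, since the original schema isn't shown):

    // Resolve the tags attached to one page through the junction table.
    // $pdo and all table/column names are assumptions for this sketch.
    $stmt = $pdo->prepare(
        "SELECT t.name
         FROM tags t
         JOIN page_tags pt ON pt.tag_id = t.id
         WHERE pt.page_id = ?"
    );
    $stmt->execute([$pageId]);
    $tags = $stmt->fetchAll();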

Tuesday, August 23, 2022
2

The function you're looking for is FIND_IN_SET:

    SELECT * FROM ... WHERE FIND_IN_SET($word, pets)

For multi-word queries you'll need to test each word and combine the tests with AND (or OR):

    WHERE FIND_IN_SET($word1, pets) AND FIND_IN_SET($word2, pets) ...
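
From PHP, the multi-word version can be built dynamically with placeholders; a sketch, where the owners table name and the $pdo connection are assumptions:

    // One FIND_IN_SET(?, pets) test per word, ANDed together.
    $words = ['cat', 'dog'];
    $tests = implode(' AND ', array_fill(0, count($words), 'FIND_IN_SET(?, pets)'));
    $stmt  = $pdo->prepare("SELECT * FROM owners WHERE $tests");
    $stmt->execute($words);
    $rows  = $stmt->fetchAll();
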
Wednesday, August 17, 2022
1

Every system I know of that stores large numbers of big files stores them externally to the database. You store all of the queryable data for the file (title, artist, length, etc) in the database, along with a partial path to the file. When it's time to retrieve the file, you extract the file's path, prepend some file root (or URL) to it, and return that.

So, you'd have a "location" column, with a partial path in it, like "a/b/c/1000", which you then map to: "http://myserver/files/a/b/c/1000.mp3"
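
As a sketch, the mapping itself can be a small helper (the root URL and the .mp3 extension are just the example values from above):

    // Map the stored partial path to a public URL.
    function mediaUrl(string $location, string $root = 'http://myserver/files'): string
    {
        return $root . '/' . $location . '.mp3';
    }

    echo mediaUrl('a/b/c/1000'); // http://myserver/files/a/b/c/1000.mp3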

Make sure that you have an easy way to point the media database at a different server/directory, in case you need that for data recovery. Also, you might need a routine that re-syncs the database with the contents of the file archive.

Also, if you're going to have thousands of media files, don't store them all in one giant directory - that's a performance bottleneck on some file systems. Instead, break them up into multiple balanced sub-trees, for example as in the sketch below.
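
One common way to keep the sub-trees balanced is to derive the directory levels from a hash of the file's id (a sketch; the two-level split is an arbitrary choice):

    // Derive a balanced sub-tree path from the file id by hashing it;
    // the first two hash-character pairs become the directory levels.
    function shardedPath(int $id): string
    {
        $h = md5((string) $id);
        return substr($h, 0, 2) . '/' . substr($h, 2, 2) . '/' . $id;
    }

    echo shardedPath(1000); // e.g. "a9/b7/1000"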

Friday, November 18, 2022