
I have to query a database of thousands of entries and order them by distance from a specified point.

The issue is that each entry has a latitude and longitude, and I would need to retrieve each entry to calculate its distance. With a large database, I don't want to retrieve every row, as this may take some time.

Is there any way to build this into the MySQL query so that I only need to retrieve the nearest 15 entries?

E.g.

`SELECT events.id, caclDistance($latlng, events.location) AS distance FROM events ORDER BY distance LIMIT 0,15`

    function caclDistance($old, $new){
       //Calculates the distance between $old and $new
    }

 Answers

4

Option 1: Do the calculation on the database by switching to a database with geospatial support (for example, PostgreSQL with PostGIS).

Option 2: Do the calculation on the database using a stored function like this:

    DELIMITER //
    CREATE FUNCTION calcDistance (latA DOUBLE, lonA DOUBLE, latB DOUBLE, lonB DOUBLE)
        RETURNS DOUBLE DETERMINISTIC
    BEGIN
        -- Convert the inputs from degrees to radians
        SET @RlatA = RADIANS(latA);
        SET @RlonA = RADIANS(lonA);
        SET @RlatB = RADIANS(latB);
        SET @RlonB = RADIANS(lonB);
        SET @deltaLat = @RlatA - @RlatB;
        SET @deltaLon = @RlonA - @RlonB;
        -- Haversine term
        SET @d = SIN(@deltaLat/2) * SIN(@deltaLat/2) +
            COS(@RlatA) * COS(@RlatB) * SIN(@deltaLon/2) * SIN(@deltaLon/2);
        -- 6371.01 is the Earth's mean radius in km
        RETURN 2 * ASIN(SQRT(@d)) * 6371.01;
    END//
    DELIMITER ;
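
As a rough sketch of how the question's query might call this function from PHP: the helper name `nearestEventsQuery` is hypothetical, and it assumes the `events` table stores the position in separate numeric `latitude` and `longitude` columns rather than the single `location` column shown in the question.

```php
<?php
// Hypothetical helper: returns the SQL text and the parameters to bind.
// Assumes `events` has numeric `latitude` and `longitude` columns.
function nearestEventsQuery(float $lat, float $lon, int $limit = 15): array
{
    $sql = 'SELECT id, calcDistance(:lat, :lon, latitude, longitude) AS distance'
         . ' FROM events'
         . ' ORDER BY distance'
         . ' LIMIT ' . $limit; // $limit is int-typed, so it is safe to inline
    return [$sql, [':lat' => $lat, ':lon' => $lon]];
}

[$sql, $params] = nearestEventsQuery(51.5074, -0.1278);

// With an existing PDO connection in $pdo (assumed), this would run as:
// $stmt = $pdo->prepare($sql);
// $stmt->execute($params);
// $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
```

Only 15 rows come back to PHP, but on its own this query still makes MySQL compute the distance for every row in the table.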

If you have an index on latitude and longitude in your database, you can reduce the number of distance calculations by working out an initial bounding box in PHP ($minLat, $maxLat, $minLong and $maxLong) and limiting the rows to the subset of entries inside it (WHERE latitude BETWEEN $minLat AND $maxLat AND longitude BETWEEN $minLong AND $maxLong). MySQL then only needs to execute the distance calculation for that subset of rows.

If you're simply using a stored function to calculate the distance, then SQL still has to look through every record in your database and calculate the distance for each one before it can decide whether to return that row or discard it.

Because the calculation is relatively slow to execute, it would be better if you could reduce the set of rows that need to be calculated, eliminating rows that will clearly fall outside of the required distance, so that we're only executing the expensive calculation for a smaller number of rows.

If you consider that what you're doing is basically drawing a circle on a map, centred on your initial point and with a radius equal to your distance, then the formula simply identifies which rows fall within that circle... but it still has to check every single row.

Using a bounding box is like drawing a square on the map first with the left, right, top and bottom edges at the appropriate distance from our centre point. Our circle will then be drawn within that box, with the Northmost, Eastmost, Southmost and Westmost points on the circle touching the borders of the box. Some rows will fall outside that box, so SQL doesn't even bother trying to calculate the distance for those rows. It only calculates the distance for those rows that fall within the bounding box to see if they fall within the circle as well.

Within your PHP (guess you're running PHP from the $ variable name), we can use a very simple calculation that works out the minimum and maximum latitude and longitude based on our distance, then set those values in the WHERE clause of your SQL statement. This is effectively our box, and anything that falls outside of that is automatically discarded without any need to actually calculate its distance.

There's a good explanation of this (with PHP code) on the Movable Type website that should be essential reading for anybody planning to do any GeoPositioning work in PHP.
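
The bounding-box step described above could look roughly like this in PHP. This is a minimal sketch, not the Movable Type code: the helper name `boundingBox`, the default radius and the array keys are assumptions, and it does not handle boxes that cross a pole or the 180° meridian.

```php
<?php
// Hypothetical helper; names and the default radius are assumptions.
function boundingBox(float $lat, float $lon, float $distanceKm, float $radiusKm = 6371.01): array
{
    $angular = $distanceKm / $radiusKm; // search radius as an angle, in radians
    $deltaLat = rad2deg($angular);
    // Lines of longitude converge toward the poles, so the same distance
    // spans more degrees of longitude at higher latitudes.
    $deltaLon = rad2deg(asin(sin($angular) / cos(deg2rad($lat))));

    return [
        'minLat'  => $lat - $deltaLat,
        'maxLat'  => $lat + $deltaLat,
        'minLong' => $lon - $deltaLon,
        'maxLong' => $lon + $deltaLon,
    ];
}

$box = boundingBox(46.98, -110.39, 50.0);
// Values to bind in:
//   WHERE latitude BETWEEN :minLat AND :maxLat
//     AND longitude BETWEEN :minLong AND :maxLong
```

Rows inside the box but outside the circle (the corners) still need the exact distance check, so the box narrows the candidates rather than replacing the distance calculation.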

EDIT The value 6371.01 in the calcDistance stored function is the Earth's mean radius in kilometers, so the returned result is in kilometers. Use the appropriate alternative multiplier if you want the result in miles, nautical miles, meters, or any other unit.
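
To illustrate the multipliers, here is a sketch that derives the radius for a few units from the 6371.01 km figure and applies it in a PHP port of the same haversine formula. The names `EARTH_RADIUS` and `calcDistancePhp` are made up for this example; the conversion factors are the standard definitions of the mile (1.609344 km) and the nautical mile (1.852 km).

```php
<?php
// Earth's mean radius per unit, derived from the 6371.01 km used in calcDistance.
const EARTH_RADIUS_KM = 6371.01;
const EARTH_RADIUS = [
    'km'  => EARTH_RADIUS_KM,
    'm'   => EARTH_RADIUS_KM * 1000,
    'mi'  => EARTH_RADIUS_KM / 1.609344,
    'nmi' => EARTH_RADIUS_KM / 1.852,
];

// Same haversine formula as the stored function, with a selectable unit.
function calcDistancePhp(float $latA, float $lonA, float $latB, float $lonB, string $unit = 'km'): float
{
    $deltaLat = deg2rad($latA - $latB);
    $deltaLon = deg2rad($lonA - $lonB);
    $d = sin($deltaLat / 2) ** 2
       + cos(deg2rad($latA)) * cos(deg2rad($latB)) * sin($deltaLon / 2) ** 2;
    return 2 * asin(sqrt($d)) * EARTH_RADIUS[$unit];
}

// One degree of longitude along the equator, in kilometers and in miles.
$km = calcDistancePhp(0.0, 0.0, 0.0, 1.0);
$mi = calcDistancePhp(0.0, 0.0, 0.0, 1.0, 'mi');
```

In the stored function itself, the equivalent change is simply to replace 6371.01 with the radius in the unit you want.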

Wednesday, September 28, 2022
3

You could use something like

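    $iDistance = 20;
    $iRadius = 6371; // earth radius in km
    // $iRadius = 3958; // earth radius in miles (use this line instead for miles)
    $fLat = 0.0; // placeholder: replace with your position latitude (x.y)
    $fLon = 0.0; // placeholder: replace with your position longitude (x.y)

    // Haversine distance computed per row. The original wrapped pos.lat in abs(),
    // which gives wrong distances for southern latitudes, so it is removed here.
    $strQuery = "
    SELECT
      *,
      $iRadius * 2 * ASIN(SQRT(POWER(SIN(($fLat - pos.lat) * PI() / 180 / 2), 2) +
        COS($fLat * PI() / 180) * COS(pos.lat * PI() / 180) *
        POWER(SIN(($fLon - pos.lon) * PI() / 180 / 2), 2))) AS distance
    FROM user_zip_codes pos
    HAVING distance < $iDistance
    ORDER BY distance";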

You have to obtain your own lat/lon values before building the SQL. This works for me.

Monday, September 5, 2022
 
kat
 
3

Any query involving more than one table requires some form of association to link the results from table "A" to table "B". The traditional (ANSI-89) means of doing this is to:

  1. List the tables involved in a comma separated list in the FROM clause
  2. Write the association between the tables in the WHERE clause

    SELECT *
      FROM TABLE_A a,
           TABLE_B b
     WHERE a.id = b.id
    

Here's the query re-written using ANSI-92 JOIN syntax:

    SELECT *
      FROM TABLE_A a
      JOIN TABLE_B b ON b.id = a.id

From a Performance Perspective:


Where supported (Oracle 9i+, PostgreSQL 7.2+, MySQL 3.23+, SQL Server 2000+), there is no performance benefit to using either syntax over the other. The optimizer sees them as the same query. But more complex queries can benefit from using ANSI-92 syntax:

  • Ability to control JOIN order - the order in which tables are scanned
  • Ability to apply filter criteria on a table prior to joining

From a Maintenance Perspective:


There are numerous reasons to use ANSI-92 JOIN syntax over ANSI-89:

  • More readable, as the JOIN criteria are separate from the WHERE clause
  • Less likely to miss JOIN criteria
  • Consistent syntax support for JOIN types other than INNER, making queries easy to use on other databases
  • WHERE clause only serves as filtration of the cartesian product of the tables joined

From a Design Perspective:


ANSI-92 JOIN syntax is a pattern, not an anti-pattern:

  • The purpose of the query is more obvious; the columns used by the application are clear
  • It follows the modularity rule about using strict typing whenever possible. Explicit is almost universally better.

Conclusion


Short of familiarity and/or comfort, I don't see any benefit to continuing to use the ANSI-89 WHERE clause instead of the ANSI-92 JOIN syntax. Some might complain that ANSI-92 syntax is more verbose, but that's what makes it explicit. The more explicit, the easier it is to understand and maintain.

Monday, September 5, 2022
 
5

In MySQL Levenshtein and Damerau-Levenshtein UDF’s you have several implementations of this algorithm.

Wednesday, November 30, 2022
 
richp10
 
1

Not sure, but something like this:

    $R = 6371; // radius of Earth in km

    $lat = '46.98025235521883'; // latitude of center point
    $lon = '-110.390625'; // longitude of center point
    $distance = 1000; // radius in km of the circle drawn
    $rad = $distance / $R; // angular radius (in radians) for query
    $query = '';

    // rough cut to exclude results that aren't close (bounding box in degrees);
    // $rad is already an angle, so it must not be divided by $R a second time
    $radR = rad2deg($rad);
    $max_lat = $lat + $radR;
    $min_lat = $lat - $radR;
    $radR = rad2deg($rad / cos(deg2rad($lat)));
    $max_lon = $lon + $radR;
    $min_lon = $lon - $radR;
    $query .= '(latitude > ' . $min_lat . ' AND latitude < ' . $max_lat . ')';
    $query .= ' AND (longitude > ' . $min_lon . ' AND longitude < ' . $max_lon . ')';
    // refining query: keep only rows whose angular distance is within $rad
    $query .= ' AND acos(sin('.deg2rad($lat).') * sin(radians(latitude)) + cos('.deg2rad($lat).') * cos(radians(latitude)) *
        cos(radians(longitude) - ('.deg2rad($lon).'))) <= '.$rad;

Friday, October 28, 2022
 
hali
 