Viewed   96 times

I need to implement a Geo proximity search in my application but I'm very confused regarding the correct formula to use. After some searches in the Web and in StackOverflow I found that the solutions are:

  1. Use the Haversine Formula
  2. Use the Great-Circle Distance Formula
  3. Use a Spatial Search Engine in the Database

Option #3 is really not an option for me ATM. Now I'm a little confused since I always though that the Great-Circle Distance Formula and Haversine Formula were synonymous but apparently I was wrong?

The above screen shot was taken from the awesome Geo (proximity) Search with MySQL paper, and uses the following functions:

ASIN, SQRT, POWER, SIN, PI, COS

I've also seen variations from the same formula (Spherical Law of Cosines), like this one:

(3956 * ACOS(COS(RADIANS(o_lat)) * COS(RADIANS(d_lat)) * COS(RADIANS(d_lon) - RADIANS(o_lon)) + SIN(RADIANS(o_lat)) * SIN(RADIANS(d_lat))))

That uses the following functions:

ACOS, COS, RADIANS, SIN

I am not a math expert, but are these formulas the same? I've come across some more variations, and formulas (such as the Spherical Law of Cosines and the Vincenty's formulae - which seems to be the most accurate) and that makes me even more confused...

I need to choose a good general purpose formula to implement in PHP / MySQL. Can anyone explain me the differences between the formulas I mentioned above?

  • Which one is the fastest to compute?
  • Which one provides the most accurate results?
  • Which one is the best in terms of speed / accuracy of results?

I appreciate your insight on these questions.


Based on theonlytheory answer I tested the following Great-Circle Distance Formulas:

  • Vincenty Formula
  • Haversine Formula
  • Spherical Law of Cosines

The Vincenty Formula is dead slow, however it's pretty accurate (down to 0.5 mm).

The Haversine Formula is way faster than the Vincenty Formula, I was able to run 1 million calculations in about 6 seconds which is pretty much acceptable for my needs.

The Spherical Law of Cosines Formula revealed to be almost twice as fast as the Haversine Formula, and the precision difference is neglectfulness for most usage cases.


Here are some test locations:

  • Google HQ (37.422045, -122.084347)
  • San Francisco, CA (37.77493, -122.419416)
  • Eiffel Tower, France (48.8582, 2.294407)
  • Opera House, Sydney (-33.856553, 151.214696)

Google HQ - San Francisco, CA:

  • Vincenty Formula: 49 087.066 meters
  • Haversine Formula: 49 103.006 meters
  • Spherical Law of Cosines: 49 103.006 meters

Google HQ - Eiffel Tower, France:

  • Vincenty Formula: 8 989 724.399 meters
  • Haversine Formula: 8 967 042.917 meters
  • Spherical Law of Cosines: 8 967 042.917 meters

Google HQ - Opera House, Sydney:

  • Vincenty Formula: 11 939 773.640 meters
  • Haversine Formula: 11 952 717.240 meters
  • Spherical Law of Cosines: 11 952 717.240 meters

As you can see there is no noticeable difference between the Haversine Formula and the Spherical Law of Cosines, however both have distance offsets as high as 22 kilometers compared to the Vincenty Formula because it uses an ellipsoidal approximation of the earth instead of a spherical one.

 Answers

5

The Law of Cosines and the Haversine Formula will give identical results assuming a machine with infinite precision. The Haversine formula is more robust to floating point errors. However, today's machines have double precision of the order of 15 significant figures, and the law of cosines may work just fine for you. Both these formulas assume spherical earth, whereas Vicenty's iterative solution (most accurate) assumes ellipsoidal earth (in reality the earth is not even an ellipsoid - it is a geoid). Some references: http://www.movable-type.co.uk/scripts/gis-faq-5.1.html

It gets better: note the latitude to be used in the law of cosines as well as the Haversine is the geocentric latitude, which is different from geodetic latitude. For a sphere, these two are the same.

Which one is fastest to compute?

In order from fastest to slowest are: law of cosines (5 trig. calls) -> haversine (involves sqrt) -> Vicenty (have to solve this iteratively in a for loop)

Which one is most accurate?

Vicenty.

Which one is best when speed and accuracy are both considered?

If your problem domain is such that for the distances you are trying to calculate, the earth can be considered as flat, then you can work out (I am not going to give details) a formula of the form x = kx * difference in longitude, y = ky * difference in latitude. Then distance = sqrt(dxdx + dydy). If your problem domain is such that it can be solved with distance squared, then you won't have to take sqrt, and this formula will be as fast as you get possibly get. It has the added advantage that you can calculate the vector distance - x is distance in east direction, and y is distance in the north direction. Otherwise, experiment with the 3 and choose what works best in your situation.

Wednesday, August 3, 2022
5

Thinking a little laterally I've come up with a 'sort of' solution to the problem of the missing markers. The two equations I posted originally gave the correct results but each missed out either markers close to the target or on the edges of the search radius

It's not very elegant but I figured that running both equations and producing 2 arrays which I then combined (removing any duplicates) would give me all the markers I'm looking for. This does work (obviously a performance hit but it's not a high traffic application) so I'll work with this for the time being but I'm still after a more practical solution if anyone has one!

Friday, August 26, 2022
 
1

You should search for the Haversine formula, but a good start could be:

  • Creating a Store Locator with PHP, MySQL & Google Maps - See Section 'Finding Locations with MySQL'
  • Geo/Spatial Search with MySQL

Citing from the first url:

Here's the SQL statement that will find the closest 20 locations that are within a radius of 25 miles to the 37, -122 coordinate. It calculates the distance based on the latitude/longitude of that row and the target latitude/longitude, and then asks for only rows where the distance value is less than 25, orders the whole query by distance, and limits it to 20 results. To search by kilometers instead of miles, replace 3959 with 6371.

SELECT
    id,
    ( 3959
      * acos( cos( radians(37) )
              * cos(  radians( lat )   )
              * cos(  radians( lng ) - radians(-122) )
            + sin( radians(37) )
              * sin( radians( lat ) )
            )
    )
    AS distance
FROM markers
HAVING distance < 25
ORDER BY distance
LIMIT 0 , 20;
Thursday, November 24, 2022
 
5

Calculating the distance using that function there is pretty computationally expensive, because it involves a whole bunch of transcendental functions. This is going to be problematic when you have a large number of rows to filter on.

Here's an alternative, an approximation that's way less computationally expensive:

Approximate distance in miles:

sqrt(x * x + y * y)

where x = 69.1 * (lat2 - lat1) 
and y = 53.0 * (lon2 - lon1) 

You can improve the accuracy of this approximate distance calculation by adding the cosine math function:

Improved approximate distance in miles:

sqrt(x * x + y * y)

where x = 69.1 * (lat2 - lat1) 
and y = 69.1 * (lon2 - lon1) * cos(lat1/57.3) 

Source: http://www.meridianworlddata.com/Distance-Calculation.asp


I ran a bunch of tests with randomly generated datasets.

  • The difference in accuracy for the 3 algorithms is minimal, especially at short distances
  • The slowest algorithm is, of course, the one with the trig functions (the one on your question). It is 4x slower than the other two.

Definitely not worth it. Just go with an approximation.
Code is here: http://pastebin.org/424186


To use this on MySQL, create a stored procedure that takes coordinate arguments and returns the distance, then you can do something like:

SELECT columns 
  FROM table 
 WHERE DISTANCE(col_x, col_y, target_x, target_y) < 25
Monday, October 3, 2022
 
leftbit
 
2

sure, why not:

=ARRAYFORMULA(COUNTA(IFERROR(QUERY(UNIQUE(TRANSPOSE(SPLIT(CONCATENATE("×"&
 SPLIT(REPT(INDIRECT("B1:B"&COUNTA(B1:B))&"×", 
 NETWORKDAYS(INDIRECT("B1:B"&COUNTA(B1:B)), INDIRECT("C1:C"&COUNTA(B1:B)))), "×")+
 TRANSPOSE(ROW(INDIRECT("A1:A"&MAX(NETWORKDAYS(B1:B, C1:C))))-1)), "×"))), 
 "where Col1>4000", 0))))

Tuesday, September 6, 2022
 
snite
 
Only authorized users can answer the search term. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :