A few words about this blog

I haven’t tried blogging before, and I didn’t think I ever would. But I have found interesting discussions and information on other blogs, so I thought I should give it a try.

About the name: jordogskog is Norwegian. It is three words, jord og skog. The direct translation to English would be soil and forest, but it is used to mean agriculture and forestry. My education is in forestry and my daily work is related to forestry. I have a great interest in how we use our land areas and, in the larger view, our earth. To me, the step from that to GIS and open source is very small, so that is what I spend a lot of my spare time on.

ST_Distance, the faster edition or Birgers Boost

When I was working on the new functions described in the previous post, I found that the distance calculation in general is very heavy and slow. The distance function gets two geometries and finds the shortest distance between them. The approach has been to calculate the distance for all possible combinations of vertex-vertex and vertex-edge between the two geometries. That means that two geometries with 1000 vertices each cause one million iterations, and even if computers are fast, that takes some time.
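
To get a feeling for the numbers, here is a minimal Python sketch of the brute-force idea. It is not the PostGIS C code: it only compares vertex against vertex (the vertex-edge case is left out to keep it short), and the function name brute_force_vertex_distance is made up for this illustration.

```python
import math


def brute_force_vertex_distance(a, b):
    """Return (smallest vertex-to-vertex distance, number of pairs compared).

    a and b are lists of (x, y) tuples. Every vertex of a is compared
    with every vertex of b, so the work grows as len(a) * len(b).
    """
    best = math.inf
    comparisons = 0
    for ax, ay in a:
        for bx, by in b:
            comparisons += 1
            best = min(best, math.hypot(ax - bx, ay - by))
    return best, comparisons
```

With two geometries of 1000 vertices each, the counter ends up at 1000 * 1000 = 1,000,000, which is the million iterations mentioned above.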

The ideas for how to make it faster came to me around the time of the birth of my son. I guess you get some extra boost from something like that. I was home from work for 10 days to help my wife and son, and I did, I promise 🙂 But I also had time to try some ideas for making distance calculations faster. Because of this I call it Birgers Boost, after my son Birger.

The idea was to find a way to avoid doing this distance calculation between each and every pair of vertices. I thought that at least the ones behind the middle of the geometry must be possible to skip. I imagined a wall that I projected against the geometries, so that I could sort the vertices in the order they appear on the other side of the wall as I move it through the geometry. Maybe that doesn’t make sense, but I thought it was a little fun to describe how the idea appeared.

The resulting algorithm uses a line from the middle of the first geometry to the middle of the second geometry. It orders the vertices along that line and calculates the distances in order of how close they are along that line. The big difference from the old function is that the preparation, giving each vertex a value along this line, only happens once per vertex. So in the example of 1000 vertices per geometry it takes only 2000 calculations to get those values. Then, when the vertices are ordered, we can do the distance calculations in the right order. And when the distance between those abstract walls that I imagined is bigger than the smallest distance found so far, we know that the shortest distance has been found. How many distances we have to calculate before we know this will vary depending on how the geometries are related to each other.
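
The ordering idea can be sketched in Python roughly like this. It is a simplified illustration, not the PostGIS implementation: it assumes we only look at vertex-to-vertex distances, it takes the bounding box center as the "middle" of each geometry, and the names center and sorted_projection_distance are hypothetical.

```python
import math


def center(points):
    """Center of the bounding box of a list of (x, y) tuples."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs) + max(xs)) / 2.0, (min(ys) + max(ys)) / 2.0


def sorted_projection_distance(a, b):
    """Return (smallest vertex-to-vertex distance, number of pairs compared)."""
    (cax, cay), (cbx, cby) = center(a), center(b)
    dx, dy = cbx - cax, cby - cay
    length = math.hypot(dx, dy)
    if length == 0.0:
        # Same center: any unit direction still gives a valid lower bound.
        dx, dy, length = 1.0, 0.0, 1.0
    ux, uy = dx / length, dy / length

    # One projection per vertex: its position along the center-to-center line.
    # Vertices of a that lie closest to b come first, and the other way round.
    pa = sorted(((x * ux + y * uy, x, y) for x, y in a), reverse=True)
    pb = sorted((x * ux + y * uy, x, y) for x, y in b)

    best = math.inf
    comparisons = 0
    for ta, ax, ay in pa:
        # Gap between the "walls": if even the nearest vertex of b is too far
        # along the line, every remaining vertex of a is further away still.
        if pb[0][0] - ta > best:
            break
        for tb, bx, by in pb:
            if tb - ta > best:
                break  # the rest of b is even further along the line
            comparisons += 1
            best = min(best, math.hypot(ax - bx, ay - by))
    return best, comparisons
```

The pruning is safe because the gap along the line can never be larger than the real distance between two vertices. For two well separated geometries most pairs are never compared, while for geometries whose projections overlap a lot the sketch falls back towards comparing everything.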

From the testing we have done, it seems to give a quite good increase in speed in general. For larger geometries it is between 10 and 100 times faster than the old algorithm. In some special cases it is not that fast, and in some cases it is even faster.

This way of doing it will not work if the geometries overlap. The easiest way to be sure they don’t overlap is to check for overlapping bounding boxes. So, if the bounding boxes overlap, the calculation is sent to the old hard way of doing it. The same goes if one of the geometries is a point, because then there is nothing to gain. Then it is done the same way as before.
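
The decision described here could look something like the following Python sketch. The helper names bbox, bboxes_overlap and choose_strategy, and the strategy labels it returns, are invented for this illustration and are not taken from the PostGIS source.

```python
def bbox(points):
    """Bounding box (min_x, min_y, max_x, max_y) of a list of (x, y) tuples."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def bboxes_overlap(a, b):
    """True if the bounding boxes of a and b share at least one point."""
    a_min_x, a_min_y, a_max_x, a_max_y = bbox(a)
    b_min_x, b_min_y, b_max_x, b_max_y = bbox(b)
    return (a_min_x <= b_max_x and b_min_x <= a_max_x
            and a_min_y <= b_max_y and b_min_y <= a_max_y)


def choose_strategy(a, b):
    """Pick the old or the new way of calculating, as described in the text."""
    if len(a) == 1 or len(b) == 1:
        return "brute force"  # a single point: nothing to gain from sorting
    if bboxes_overlap(a, b):
        return "brute force"  # the geometries might overlap: play it safe
    return "sorted projection"
```

Non-overlapping bounding boxes guarantee that the geometries themselves do not overlap, which is why this cheap test is enough to decide when the faster way can be used.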

This is a problem but hopefully this will be solved. Paul Ramsey has come up with ideas that might make my way of doing it short-lived, see his blog:
http://blog.cleverelephant.ca/2009/11/is-good-enough-good-enough.html
He is mostly discussing his new geography functions, but it will probably be a good way of doing it for geometry too. So in PostGIS 2.0 the development will continue 🙂

These distance calculation enhancements might be quite important because they make it possible to calculate directly with the geometries in nearest neighbor calculations and things like that, instead of using the centroids. Using points will still be faster, but sometimes it may be useful to be able to run on the whole geometry, and before it was often more or less impossible because the calculations were too heavy.

This will be in PostGIS 1.5. A beta release will hopefully be out soon. For Windows there are experimental builds already available here:
http://postgis.org/download/windows/experimental.php
And of course the source code is available to compile for other platforms.

I have written some lines in the wiki too, to describe this:
http://trac.osgeo.org/postgis/wiki/NewDistCalcGeom2Geom

Shortest line and other new functionality in PostGIS 1.5

One and a half years ago I found PostGIS. I quickly became a fan. Handling spatial data with SQL is a wonderful way of doing it. PostGIS also has a great amount of functionality, and if something is missing, no one is stopped from creating that functionality. When I realized that, I understood that I could no longer complain about functionality I have missed in other GIS systems. I have done some Avenue scripting in ArcView 3.x and solved a lot of tasks that way. But I have missed an easy way to find out between which points the distance function measures that minimum distance.

Let’s say you are working with linestrings of rivers and you want to know how close a linestring that represents a road is to a river. OK, the distance function tells you that the minimum distance is 20 meters. Great, but the next question will be: where? Where is the road only 20 meters away from the river? A couple of times I have wanted that information, and I have always imagined that it has to be somewhere in there, in the function. To find the minimum distance you first have to identify where to measure, was my thought. That was partly right, I found.

That’s the great thing about open source: if you are wondering how it is done, the code is there to read. Since I had never studied C before, I didn’t have very high expectations of understanding anything. But thanks to quite good commenting and a clean structure I succeeded in putting this together:
http://www.jordogskog.no/distance.html
The minimum distance between two geometries has to be between two vertices or between one vertex and one edge. The distance calculation iterates through the vertices and edges defining the input geometries, comparing them one by one. The distance between two vertices is found with the Pythagorean theorem. It is a little bit harder to calculate the distance between a vertex and an edge. Search for “How do I find the distance from a point to a line?” in this link:
http://www.faqs.org/faqs/graphics/algorithms-faq/
There is a description of how to get the distance from the line to the point. That is the way it was done before. But there is also a description of how to identify the point on the edge (line) from which the shortest distance is found. Time for copy and paste. When the overall shortest distance is found, the points defining that distance are returned to the user as a line. I found a line to be the best way of returning the information, because then the user can get both the first and last point from it, and the distance from the length of the line. The use of this functionality will probably, as described in the beginning, be to identify where the minimum distance is found. Let’s say you are sitting on a big island with your laptop, asking yourself from where you should swim to get the shortest way to shore. Now that problem is solved. For convenience the first point of ST_Shortestline can also be found with the function ST_Closestpoint.
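
For one vertex and one edge, the recipe from the FAQ can be sketched in Python as below. This is an illustration of the technique and not a copy of the PostGIS C code; the function names closest_point_on_segment and shortest_line_point_segment are hypothetical. The value r tells how far along the edge the foot of the perpendicular falls, and clamping it to the range 0 to 1 keeps the result on the edge itself.

```python
import math


def closest_point_on_segment(p, a, b):
    """Point on the segment a-b that is closest to p (all are (x, y) tuples)."""
    px, py = p
    ax, ay = a
    bx, by = b
    abx, aby = bx - ax, by - ay
    len2 = abx * abx + aby * aby
    if len2 == 0.0:
        return a  # the edge has no length, so a is the only candidate
    # r is the position of the perpendicular foot along the edge:
    # 0 means at a, 1 means at b.
    r = ((px - ax) * abx + (py - ay) * aby) / len2
    r = max(0.0, min(1.0, r))  # stay on the segment
    return ax + r * abx, ay + r * aby


def shortest_line_point_segment(p, a, b):
    """Return ((start, end), length) of the shortest line from p to a-b."""
    q = closest_point_on_segment(p, a, b)
    return (p, q), math.hypot(p[0] - q[0], p[1] - q[1])
```

Returning both end points together with the length mirrors the idea above: the line says where the distance is measured, and its length is the distance itself.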

From this rewriting I also succeeded in getting maximum distance calculation working, ST_Maxdistance. Then it was natural to also add a longest line function, ST_Longestline, which relates to ST_Maxdistance as ST_Shortestline relates to ST_Distance.
To make the symmetry complete I also added ST_DFullywithin. That function returns true if the maximum distance between two geometries is smaller than or equal to the last parameter passed in. Just like ST_DWithin but with maximum distance instead of minimum distance.
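
The relation between these three can be sketched in a few lines of Python. The functions max_distance, longest_line and dfullywithin below are hypothetical stand-ins working on plain lists of vertices, not the PostGIS functions themselves. Looking only at vertices is enough here, because along a straight edge the distance to a fixed point is always largest at one of the edge's ends.

```python
import math


def _dist(p, q):
    """Euclidean distance between two (x, y) tuples."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def longest_line(a, b):
    """Pair (p, q), p from a and q from b, that lies furthest apart."""
    return max(((p, q) for p in a for q in b), key=lambda pq: _dist(*pq))


def max_distance(a, b):
    """Largest distance between any vertex of a and any vertex of b."""
    return _dist(*longest_line(a, b))


def dfullywithin(a, b, limit):
    """True if even the maximum distance is within the given limit."""
    return max_distance(a, b) <= limit
```

Just as in the text, the longest line carries the "where" and its length is the maximum distance, so the limit check only has to compare that one number.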

So, as a summary:
the old functions working with minimum distance, ST_Distance and ST_DWithin, have now got a new friend, ST_Shortestline, and there are also the corresponding functions for max distance: ST_Maxdistance, ST_Longestline and ST_DFullywithin.

I will get back soon and tell about how I found what may be the fastest distance calculation, included in 1.5.