Re:I fail to see how this could be useful.
I agree that by itself, the average link distance between two pages (19) isn't a very useful number. However, there is definitely useful information in the article itself:
1. We are given a real-world (probabilistic) distribution of link distances between pages (i.e., given two randomly chosen pages, what is the probability that the shortest path of links between them has length X?)
2. From the visualizations, we can see that the web is a graph containing a number of densely connected components that are themselves only loosely connected to one another, and that this structure is fairly scale-independent.
These two tidbits could lead to noticeably better Web crawlers. You could stop following links once you've gone, say, 25 deep. You could try to determine on the fly whether more than one of your crawler processes is working on the same densely connected component of the Web and combine their efforts (or move one of the processes over to a new, uncharted component), effectively covering more of the web. Using similar statistics for the distribution of in-link and out-link counts, you could tune crawler heuristics so that pages whose out-link counts deviate significantly from the mean get more weight for future crawling (a rough sketch along these lines follows below).
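To make that concrete, here is a minimal Python sketch of a depth-bounded crawler frontier built on those two heuristics. Everything in it is assumed for illustration: fetch_links() is a placeholder for whatever download-and-parse step your crawler already has, the depth cutoff of 25 is just the number from above, and since you can't know a page's own out-link count before fetching it, the deviation heuristic uses the parent page's out-link count as a proxy. The component-merging idea would need coordination between processes and isn't shown.

import heapq

def crawl(seed_url, fetch_links, max_pages=10000, max_depth=25):
    seen = {seed_url}
    # Frontier entries: (priority, depth, url); lower priority = crawled sooner.
    frontier = [(0.0, 0, seed_url)]
    link_count_sum = 0.0
    pages_fetched = 0

    while frontier and pages_fetched < max_pages:
        _, depth, url = heapq.heappop(frontier)
        if depth > max_depth:
            continue  # stop following links past the depth cutoff

        out_links = fetch_links(url)  # placeholder: download page, extract links
        pages_fetched += 1
        link_count_sum += len(out_links)
        mean_out = link_count_sum / pages_fetched

        for link in out_links:
            if link in seen:
                continue
            seen.add(link)
            # Links found on an "unusual" page (out-link count far from the
            # running mean) get a lower number, i.e. a higher priority.
            deviation = abs(len(out_links) - mean_out)
            priority = (depth + 1) - deviation / (mean_out + 1.0)
            heapq.heappush(frontier, (priority, depth + 1, link))

    return pages_fetched

The priority formula is only one way to trade crawl depth against out-link deviation; a real crawler would tune or replace it, but the point is that the distance distribution and the component structure both translate directly into simple frontier policies.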
Oh well, just some random thoughts.