A Larger Pool of Data Isn't Necessarily a Better One...
Statisticians often seek a mountain of data to gain as much credibility as possible for their results. One often hears of medical studies conducted with thousands of people. But the numbers aren't everything. And neither is statistics. Or as that famous phrase goes: "There are three kinds of lies: lies, damned lies, and statistics."
If you understand how to manipulate data and recognize the shortcomings of statistics, you can make the numbers say almost anything you want in support of your argument. For example, doomsayers often claim the data show that earthquakes have been increasing alarmingly over the last century, that it is all leading to super huge deadly quakes, and that we are all doomed to eternal damnation, etc. But the data they feed into their simple-minded analysis is everything available, including quakes too small for humans to perceive and quakes of a size that, until recently, weren't even recorded and preserved. The result is an artificial inflation of the counts over time, which makes it appear that activity is increasing when in fact only the level of detail being observed has increased.
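The inflation effect can be illustrated with a small simulation. This is only a sketch under stated assumptions: the true number of quakes per year is held constant, magnitudes follow an exponential tail (a Gutenberg-Richter-like shape with a b-value of 1), and the detection thresholds per year are hypothetical values chosen for illustration, not real catalog figures.

```python
import math
import random

def simulate_catalog(years, quakes_per_year, threshold_by_year, seed=0):
    """Return {year: recorded magnitudes}.

    The true rate is constant every year; only the detection
    threshold (the smallest quake that gets recorded) changes.
    """
    rng = random.Random(seed)
    catalog = {}
    for year in years:
        # Magnitudes of 4.0 and up with an exponential tail
        # (assumed b-value of 1, so the rate parameter is ln 10).
        mags = [4.0 + rng.expovariate(math.log(10)) for _ in range(quakes_per_year)]
        cutoff = threshold_by_year[year]
        catalog[year] = [m for m in mags if m >= cutoff]
    return catalog

years = [1900, 1920, 1940, 1960, 1980, 2000]
# Hypothetical thresholds: instruments improve, so the cutoff drops.
thresholds = {1900: 6.0, 1920: 5.5, 1940: 5.0, 1960: 4.5, 1980: 4.2, 2000: 4.0}
catalog = simulate_catalog(years, 10_000, thresholds)

# Everything recorded versus only the large quakes (magnitude 6.0 and up).
all_counts = [len(catalog[y]) for y in years]
big_counts = [sum(m >= 6.0 for m in catalog[y]) for y in years]

for y, total, big in zip(years, all_counts, big_counts):
    print(y, "recorded:", total, "magnitude 6.0+:", big)
```

In this toy model the total recorded count climbs steeply from year to year, while the count of large quakes only fluctuates around a constant level, because nothing but the recording threshold has changed.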
Similarly, if you want to make a subject appear to have no discernible effect so as to dismiss it, you drown it in a mass of data too wide in scope to give a focused result. Often, a quick test of a statistical study's credibility is to see who paid for the analysis and whether they had anything to gain or a conflict of interest.
Looking at the subject of UFOs, one could argue that there is an increased presence of aliens since monthly reports have more than tripled in the past fifteen years. But, as with the earthquakes, data collection methods are better and more complete than before. Also, there may not be enough manpower to weed out the weak reports, and as the stigma of reporting goes down, reports go up. The founding researchers in the field of ufology were aware of the problem: they could not see any significance in the data because there was too much noise to discern any signal it might contain. They started labeling reports with a strangeness and credibility rating.
The reports with the highest strangeness and credibility rose to the top so that more time and effort could be assigned to them for analysis. The data was like a pyramid with the best sightings at capstone level. That data was also in the minority, comprising no more than 5% of all the information available. If any significance could be found, it would be in that gold mine of data.
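The idea of letting the best-rated reports rise to the top can be sketched as a simple ranking. Everything here is an assumption for illustration: the `Report` class, the 1 to 9 rating scales, the use of the product of the two ratings as a combined score, and the made-up sample data are hypothetical, not the researchers' actual scheme.

```python
from dataclasses import dataclass

@dataclass
class Report:
    report_id: str
    strangeness: int   # hypothetical scale, 1 (mundane) to 9 (very strange)
    credibility: int   # hypothetical scale, 1 (weak) to 9 (very credible)

def score(report):
    """Combined rating; the product is an assumed scoring rule."""
    return report.strangeness * report.credibility

def top_reports(reports, fraction=0.05):
    """Return the highest-scoring fraction of reports (at least one)."""
    ranked = sorted(reports, key=score, reverse=True)
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]

# Made-up sample data: 100 reports with varied ratings.
reports = [
    Report(f"R{i:03d}", strangeness=1 + i % 9, credibility=1 + (i * 7) % 9)
    for i in range(100)
]

best = top_reports(reports)
for r in best:
    print(r.report_id, r.strangeness, r.credibility, score(r))
```

With 100 reports and a 5% cutoff, five reports form the capstone of the pyramid; the rest stay in the pile as background noise.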
Similarly, if any link between earthquakes and an external trigger, or an activity that parallels them, is to be found, it is best to seek out the most significant data. Destructive earthquakes that take a large number of human lives are usually no smaller than magnitude 6.2, and most are 6.7 or higher. That's why I usually look at data from magnitude 6.7 and up, amounting to almost 28 quakes a year on average. If one were to look at earthquakes of magnitude 5.0 and higher, one would be dealing with 1,586 quakes annually on average (or about 4.3 per day). One would also lose the signal in the background noise of randomness and of aftershocks to the largest earthquakes. As an example, see the monthly breakdown for 2011 of quakes from 5.5 to 6.6 magnitude (shown above). Nothing more significant than a flood of aftershocks to the huge Japanese earthquake and tsunami was responsible for the spike in the month of March.
When I look at quakes of magnitude 6.7 or higher, I am interpreting 1.75% of all earthquakes of magnitude 5.0 or larger. But the strongest parallel between earthquakes and sunspot numbers appears when I look at temblors of magnitude 7.5 or higher, which amounts to only 4.5 earthquakes per year on average, or 0.285% of the magnitude 5.0 or larger data set (see the graphic below).
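The percentages can be checked with a few lines of arithmetic. This is a minimal sketch using the rounded annual averages quoted in the text; the constant and function names are my own, and because the inputs are rounded ("almost 28"), the results land close to, but not exactly on, the quoted 1.75% and 0.285%.

```python
# Rounded annual averages quoted in the text.
M50_PLUS_PER_YEAR = 1586   # magnitude 5.0 and higher
M67_PLUS_PER_YEAR = 28     # magnitude 6.7 and higher ("almost 28")
M75_PLUS_PER_YEAR = 4.5    # magnitude 7.5 and higher

def share_percent(subset_per_year, total_per_year):
    """Share of the larger data set, expressed as a percentage."""
    return 100.0 * subset_per_year / total_per_year

share_67 = share_percent(M67_PLUS_PER_YEAR, M50_PLUS_PER_YEAR)
share_75 = share_percent(M75_PLUS_PER_YEAR, M50_PLUS_PER_YEAR)

print(f"6.7+ share of 5.0+ quakes: {share_67:.3f}%")
print(f"7.5+ share of 5.0+ quakes: {share_75:.3f}%")
```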
Here is a breakdown of the number of worldwide earthquakes one can expect annually on average (based on data from USGS/NEIC for the years 1973-2011):
Magnitude Earthquakes per Year
4.5-9.9 ttl= 4,657.90
© 2012 Joseph Ritrovato