Mining for Science
25 June 2012
Tim Shaw has written about an event that occurred 834 years ago, which some have proposed was a meteor collision with the moon. The event was observed by Gervase of Canterbury, who then collected the observations of five monks who also witnessed it. What’s interesting about this is how Gervase followed an early form of scientific research. When you observe an unusual event, one you can’t simply repeat in the lab, your best alternative is to gather more data on the observed event.
This is a particularly common method in astronomy and astrophysics. Usually the most interesting events are those that happen at random. Things like supernovae, meteor collisions and gamma-ray bursts aren’t things you can plan for. (Okay, sometimes you can plan for meteor collisions.) Instead you have to observe wide patches of the sky and hope that one of these events happens while you are looking.
One of the side effects of this approach is that when you collect massive amounts of data you sometimes observe things without knowing it. For example, the Kepler probe is discovering extra-solar planets by watching them eclipse their star. It does this by watching a patch of sky over long periods, which makes for a massive amount of data. Pulling out the right data is done by a process called data mining, where you take all that data and filter it through certain constraints to find what you want. It’s basically an astrophysical version of mining for gold.
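To make the filtering idea concrete, here is a minimal sketch in Python. It is not how the Kepler team actually processes its data. It builds a made-up light curve (a list of brightness measurements with three artificial dips added) and then looks for stretches where the brightness stays below the typical level. The function name `find_dips` and the parameters `depth_threshold` and `min_length` are invented for this illustration.

```python
import math
import statistics


def find_dips(flux, depth_threshold=0.005, min_length=3):
    """Return (start, end) index pairs where the brightness stays more than
    depth_threshold below the median for at least min_length samples.
    The end index is exclusive."""
    baseline = statistics.median(flux)
    dips = []
    start = None
    for i, value in enumerate(flux):
        below = value < baseline - depth_threshold
        if below and start is None:
            start = i  # a dip begins here
        elif not below and start is not None:
            if i - start >= min_length:
                dips.append((start, i))
            start = None
    # Handle a dip that runs to the end of the data.
    if start is not None and len(flux) - start >= min_length:
        dips.append((start, len(flux)))
    return dips


# Synthetic light curve (an assumption, not real Kepler data):
# brightness near 1.0 with a small deterministic wiggle.
flux = [1.0 + 0.001 * math.sin(i) for i in range(300)]

# Add three artificial "transits", each dimming the star by 1%.
for transit_start in (50, 150, 250):
    for i in range(transit_start, transit_start + 10):
        flux[i] -= 0.01

dips = find_dips(flux)
print(dips)  # [(50, 60), (150, 160), (250, 260)]
```

The constraints here are the depth and the duration of the dip. Changing them changes what you dig out of the same data, which is the point of the mining analogy.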
Of course all that data means there are probably other valuable gems you can find. For example, you can also mine the Kepler data to look at the light variation of the stars themselves, which can tell you about things like starspots. Starspots are the equivalent of sunspots on other stars, and they tell us about a star’s activity level, magnetic field strength, and changes over time.
So while Gervase of Canterbury didn’t observe a meteor collision, he did follow an approach that we still use today. When you see something interesting, gather more data.