Google’s Palimpsest project: Open-Source Science Data

June 13th, 2008 by jose

Google will host large scientific datasets at . That is, if you have a dataset that is requested constantly, you can now ‘open-source’ it and let Google take the server load. Wired has this covered.

For those who don’t see the point of open, portable data, this presentation (Making Massive Datasets Universally Accessible and Useful) is a good explanation.

How do you ship a large dataset to Google? Well, they send you hard drives in a suitcase:

(Google people) are providing a 3TB drive array (Linux RAID5). The array is provided in a “suitcase” and shipped to anyone who wants to send their data to Google. Anyone interested gives Google the file tree, and they SLURP the data off the drive. I believe they can extend this to a larger array (my memory says 20TB).

  Victor Says:

    It’s a great idea, yet – to be perfectly honest – I probably wouldn’t have open-sourced the data I gathered during my Ph.D. until I had “secured” the publications I needed for my thesis. Moreover, I’ve talked to many researchers who are reluctant to share the data underlying their publications – perhaps fearing that it could be used to show calculation errors in their work. I know it’s not how academia *should* work, but it’s been my experience.

    I wonder whether this habit of secrecy depends on my academic discipline (consumer research/psychology), and whether other disciplines are more open about their data?

  George Says:

    I can’t wait to see all the new developments from Google: the new browser and operating system.

