
Lies, Damned Lies and Cat Statistics

spopepro writes "While un-captioned cats might be of limited interest to the /. community, I found this column on how a fabricated statistic takes on a life of its own interesting. Starting with the Humane Society of the United States' (HSUS) claim that the unsterilized offspring of a cat will '...result in 420,000 cats in 5 years,' the author looks at other erroneous numbers, where they came from and why they won't go away."
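To see how a headline figure like "420,000 cats in 5 years" can materialize, here is a purely illustrative compounding model. Every parameter (litters per year, kittens per litter, the assumption that half of each litter is female and that all offspring survive and breed) is a hypothetical assumption for the sketch, not a number from the HSUS or the column:

```python
def naive_cat_projection(years=5, litters_per_year=2, kittens_per_litter=4):
    """Naive unchecked-growth model: one unsterilized female and her
    descendants, with no mortality and no limits. All parameters are
    illustrative assumptions, not published figures."""
    females = 1  # start with one unsterilized female
    total = 1
    for _ in range(years):
        newborns = females * litters_per_year * kittens_per_litter
        total += newborns
        females += newborns // 2  # assume half of each litter is female
    return total
```

With these toy parameters the five-year total comes out around 6,000, nowhere near 420,000 — which is exactly the point: the headline number is driven entirely by unstated assumptions about litter size, breeding rate, and survival, and small changes to those inputs swing the result by orders of magnitude.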

Comment Re:Another vote for NoSQL and some experience (Score 1)

I have seen these kinds of situations happen a lot (I'm a statistician who works on computationally-intensive physical science applications), and the best solution I have seen was a BerkeleyDB setup. One group I work with had a very, very large number of ASCII data files (order of 10-100 million) in a directory tree. One of their researchers consolidated them to a BerkeleyDB, which greatly improved data management and access [...] I think the general idea of a key-value store that lets you keep your data in the original structure would work well.

A file system *is* a key-value store.

I suspect those 100,000,000 files were in fact tiny pieces of data which didn't make sense to access using normal tools (from ls to MS Word). That the conversion worked out for *you* doesn't mean that it would be useful to convert *every* set of files into a BerkeleyDB. Especially not sets of (say) 500 files, 10GB each.

I completely agree. If you have a lot of small datasets that break ls and such (as was the case in my situation), BerkeleyDB provides a great solution. If you have a smaller set of very large files, a different approach is needed (perhaps just the file system with some kind of automated indexing).

Comment Another vote for NoSQL and some experience (Score 2, Informative)

I have seen these kinds of situations happen a lot (I'm a statistician who works on computationally-intensive physical science applications), and the best solution I have seen was a BerkeleyDB setup. One group I work with had a very, very large number of ASCII data files (order of 10-100 million) in a directory tree. One of their researchers consolidated them to a BerkeleyDB, which greatly improved data management and access. CouchDB or the like could also work, but I think the general idea of a key-value store that lets you keep your data in the original structure would work well.
