
I've seen this kind of situation a lot (I'm a statistician who works on computationally intensive physical science applications), and the best solution I've come across was a BerkeleyDB setup. One group I work with had a very, very large number of ASCII data files (on the order of 10-100 million) in a directory tree. One of their researchers consolidated them into a BerkeleyDB, which greatly improved data management and access [...] I think the general idea of a key-value store that lets you keep your data in its original structure would work well.
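The shape of that consolidation is roughly the following. This is a minimal sketch, not their actual pipeline: it assumes Python with the bsddb3 BerkeleyDB bindings, and the function and path names (`consolidate`, `lookup`, `root_dir`, `data.bdb`) are made up for illustration. The idea is just to walk the tree and store each small file under its relative path, so the original layout survives as the key space.

```python
import os
from bsddb3 import db  # BerkeleyDB bindings; assumed installed (pip install bsddb3)


def consolidate(root_dir, db_path="data.bdb"):
    """Copy every file under root_dir into one BerkeleyDB B-tree,
    keyed by its path relative to root_dir (preserving the layout)."""
    store = db.DB()
    store.open(db_path, None, db.DB_BTREE, db.DB_CREATE)
    try:
        for dirpath, _dirnames, filenames in os.walk(root_dir):
            for name in filenames:
                full = os.path.join(dirpath, name)
                key = os.path.relpath(full, root_dir).encode("utf-8")
                with open(full, "rb") as fh:
                    store.put(key, fh.read())
    finally:
        store.close()


def lookup(rel_path, db_path="data.bdb"):
    """Fetch one record by its original relative path, or None if absent."""
    store = db.DB()
    store.open(db_path, None, db.DB_BTREE, db.DB_RDONLY)
    try:
        return store.get(rel_path.encode("utf-8"))
    finally:
        store.close()
```

After a one-time `consolidate("datafiles/")`, a lookup like `lookup("run042/sensor_a.txt")` is a single B-tree read instead of a directory traversal over tens of millions of inodes, which is where the data-management win came from.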
A file system *is* a key-value store.
I suspect those 100,000,000 files were in fact tiny pieces of data which didn't make sense to access using normal tools (from ls to MS Word). That the conversion worked out for *you* doesn't mean that it would be useful to convert *every* set of files into a BerkeleyDB. Especially not sets of (say) 500 files, 10GB each.
I completely agree. If you have a lot of small datasets that break ls and such (as was the case in my situation), a BerkeleyDB is a great solution. If you have a smaller set of very large files, a different solution is needed (perhaps just the file system with some kind of automated indexing).
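For that second case, a hedged sketch of what "file system plus automated indexing" might look like, using only the Python standard library: leave the big files where they are and keep a small catalogue of path, size, and mtime in a dbm key-value store, so lookups don't require rescanning the tree. The function names and the `file_index.db` path are illustrative, not from any particular setup.

```python
import dbm
import json
import os


def build_index(root_dir, index_path="file_index.db"):
    """Record path -> {size, mtime} for every file, without moving the data."""
    with dbm.open(index_path, "c") as index:
        for dirpath, _dirnames, filenames in os.walk(root_dir):
            for name in filenames:
                full = os.path.join(dirpath, name)
                stat = os.stat(full)
                meta = {"size": stat.st_size, "mtime": stat.st_mtime}
                index[full.encode("utf-8")] = json.dumps(meta).encode("utf-8")


def file_info(full_path, index_path="file_index.db"):
    """Look up one file's metadata from the index; None if it isn't indexed."""
    with dbm.open(index_path, "r") as index:
        raw = index.get(full_path.encode("utf-8"))
        return json.loads(raw) if raw else None
```

With a few hundred 10 GB files the data itself stays perfectly manageable as plain files; only the metadata needs the key-value treatment.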