22
votes

I'm looking for a service similar to Amazon S3: a simple service to store and retrieve arbitrary data (and metadata), but one that runs locally in your own data center. Strictly speaking, I'm not sure whether you would call this a CDN or a lightweight CMS.

It must be horizontally scalable (both for storage and bandwidth) and fault tolerant. It must also support REST, preferably WS too, with a pluggable authentication and authorization system. Something built with Java EE would be preferable for more convenient integration and extensibility, but this is just a personal preference and not a requirement.
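To make the "store and retrieve arbitrary data and metadata over REST" requirement concrete, here is a minimal single-process sketch in Python. It is only an illustration of the interface shape, not any particular product's API: the `x-meta-` header prefix, the in-memory `STORE`, and the helper names `start_server`, `put_object`, and `get_object` are all assumptions made for this example, and it has none of the scalability, fault tolerance, or authentication asked for above.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# In-memory stand-in for real storage: path -> (body bytes, metadata dict).
STORE = {}
META_PREFIX = "x-meta-"  # hypothetical header prefix for user metadata

# Opener that ignores proxy environment variables, so local requests stay local.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class ObjectHandler(BaseHTTPRequestHandler):
    def do_PUT(self):
        # Store the request body and any metadata headers under the URL path.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        meta = {k.lower(): v for k, v in self.headers.items()
                if k.lower().startswith(META_PREFIX)}
        STORE[self.path] = (body, meta)
        self.send_response(201)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        # Return the stored body, echoing metadata back as response headers.
        if self.path not in STORE:
            self.send_error(404, "No such object")
            return
        body, meta = STORE[self.path]
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        for name, value in meta.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def start_server():
    # Port 0 lets the OS pick a free port; read it from server.server_address.
    server = ThreadingHTTPServer(("127.0.0.1", 0), ObjectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def put_object(base_url, key, data, metadata):
    headers = {META_PREFIX + name: value for name, value in metadata.items()}
    req = urllib.request.Request(base_url + key, data=data,
                                 headers=headers, method="PUT")
    with _opener.open(req) as resp:
        return resp.status


def get_object(base_url, key):
    with _opener.open(base_url + key) as resp:
        meta = {k.lower(): v for k, v in resp.headers.items()
                if k.lower().startswith(META_PREFIX)}
        return resp.read(), meta
```

A client would call `put_object(base, "/bucket/report.txt", b"...", {"owner": "example"})` and later `get_object(base, "/bucket/report.txt")` to get back both the bytes and the metadata. A real candidate would need to offer the same kind of interface with replication and pluggable security behind it.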

Suggestions?

4
If it's inside your data centre, why not just use some sort of SAN? - Toby Hede
The SAN is just the storage part (isn't it?). I'm looking for the storage and the APIs to go along with it: to store and retrieve arbitrary data and meta-data, security, etc. - jnorris

4 Answers

17
votes

Here are a few open source solutions I have come across that deserve further research:

  1. Apache Sling (JCR based CMS (JSR170, JSR283), RESTful interface).
  2. Apache Hadoop (Java based distributed data-store, map reduce functionality).
  3. HBase (built on top of Hadoop, providing Google Bigtable-like capabilities).
  4. CouchDB (Erlang based key/value DB with Map/Reduce functionality, RESTful interface).
  5. Dynomite (Erlang based, Amazon Dynamo clone).
  6. Voldemort (Distributed key-value storage system).
  7. Cassandra (highly scalable, eventually consistent, distributed, structured key-value store).
  8. MongoDB (highly scalable, JSON document based storage).

6
votes

The Walrus project (mostly S3 API compatible):

http://open.eucalyptus.com/wiki/EucalyptusStorage_v1.4

2
votes
-6
votes

In addition to Park Place, the only other big player against S3 right now is Nirvanix. Nirvanix