Steward uses both a directory tree for storage of the binaries and a database for their meta-data.
Steward assigns a unique identifier to each object uploaded into its store, based on the contents of the object. While the file is being uploaded, Steward calculates a one-way SHA-1 hash of the contents. This hash is then base32-encoded to give an identifier that is exactly 32 characters long and consists only of digits and uppercase letters. The probability that two different files will have the same identifier is exceedingly low: for a 50% chance of a single collision you need about 2^80, or 1.2*10^24, objects in a store.
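The identifier scheme described above can be reproduced with the Python standard library. This is a minimal sketch, not Steward's own code; the function name `object_identifier` is made up for the example. A SHA-1 digest is 160 bits and base32 encodes 5 bits per character, which is where the fixed length of 32 comes from.

```python
import base64
import hashlib


def object_identifier(data: bytes) -> str:
    """Return the base32-encoded SHA-1 digest of `data` (hypothetical helper)."""
    digest = hashlib.sha1(data).digest()  # 20 bytes = 160 bits
    # 160 bits / 5 bits per base32 character = 32 characters, so no padding.
    return base64.b32encode(digest).decode("ascii")


ident = object_identifier(b"hello world")
print(ident, len(ident))
```

Because the identifier depends only on the contents, hashing the same bytes twice yields the same identifier.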
Using this identifier has a few interesting properties:
In principle, the files are stored in a directory with sub-directories. Each file is stored under the Id that Steward assigns to it based on its contents. This directory is called “the store”.
Next to the directories where objects are stored, there is also a directory that holds files while they are being uploaded. These files have random names. Once the entire object has been received, the final document Id is calculated and the document is moved to its final location, with the file renamed to this identifier. Lastly, the meta-data is added to the database, making the object available in Steward.
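The upload flow can be sketched roughly as follows. This is an illustration under assumptions, not Steward's implementation: the function name `store_upload`, its parameters, and the sub-directory layout (first two characters of the identifier) are all hypothetical, and the final step of adding the meta-data to the database is left out.

```python
import base64
import hashlib
import os
import tempfile


def store_upload(chunks, store_dir, incoming_dir):
    """Write chunks to a randomly named temp file, then move it into the store."""
    os.makedirs(incoming_dir, exist_ok=True)
    sha = hashlib.sha1()
    # mkstemp picks a random file name inside the upload directory.
    fd, tmp_path = tempfile.mkstemp(dir=incoming_dir)
    with os.fdopen(fd, "wb") as tmp:
        for chunk in chunks:
            sha.update(chunk)  # hash while the data streams in
            tmp.write(chunk)
    ident = base64.b32encode(sha.digest()).decode("ascii")
    # Assumed layout: the first two characters name the sub-directory.
    target_dir = os.path.join(store_dir, ident[:2])
    os.makedirs(target_dir, exist_ok=True)
    final_path = os.path.join(target_dir, ident)
    # Rename the finished upload to its identifier inside the store.
    os.replace(tmp_path, final_path)
    return ident, final_path
```

Hashing while writing means the identifier is known as soon as the last chunk arrives, so the file only has to be renamed, not read again.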
The metadata about the files is stored in the database.
This database can be any of the database systems that SQLAlchemy supports, so you can scale from a maintenance-free SQLite to a full-blown PostgreSQL, MySQL or MSSQL (see http://www.sqlalchemy.org/docs/04/dbengine.html#dbengine_supported for the entire list).
This is the main table containing all the files known in the Steward instance.
| Name | Type | Observations |
|---|---|---|
| sha (PK) | varchar(32) | Identifier of the object in the store. |
| length | Integer | Length in bytes of the object. |
| mimetype | text(64) | Mimetype of the object. |
| date_stored | timestamp | Moment that the object was made available in the store. |
| status | varchar(7) | Current status of the object. Can be ‘active’, ‘deleted’, ‘purged’ |
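
As a rough illustration of this table, the sketch below creates it in an in-memory SQLite database using Python's standard `sqlite3` module rather than SQLAlchemy, to stay dependency-free. The table name `registery` is taken from the foreign-key references elsewhere in this document; the `CHECK` constraint on `status` and the sample row are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE registery (
        sha         VARCHAR(32) PRIMARY KEY,
        length      INTEGER,
        mimetype    TEXT,
        date_stored TIMESTAMP,
        -- Assumed constraint: only the three documented statuses are allowed.
        status      VARCHAR(7) CHECK (status IN ('active', 'deleted', 'purged'))
    )
    """
)
# Sample row with a placeholder identifier, for illustration only.
conn.execute(
    "INSERT INTO registery VALUES (?, ?, ?, CURRENT_TIMESTAMP, ?)",
    ("A" * 32, 11, "text/plain", "active"),
)
row = conn.execute("SELECT sha, length, status FROM registery").fetchone()
print(row)
```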
The statuses have the following meanings:
Each access or change in the Steward database is registered in a simple table called ‘events’.
| Name | Type | Observations |
|---|---|---|
| id (PK) | integer | Internal number, never used outside of the database. |
| date | timestamp | Moment of the event |
| register_id | varchar(32) | The object the event refers to. (FK with registery.sha) |
| action | varchar(6) | Kind of action. This can be one of ‘get’, ‘save’, ‘check’, ‘delete’, ‘purge’. |
| user | varchar(32) | The user that performed the action (if authenticated) |
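
Registering an event might look like the sketch below. It uses the standard `sqlite3` module for brevity; the helper `log_event` and the validation of the action name are assumptions for illustration, not part of Steward's documented API. The column names `action` and `user` are quoted because they are reserved words in some database systems.

```python
import sqlite3

ACTIONS = {"get", "save", "check", "delete", "purge"}


def log_event(conn, register_id, action, user=None):
    """Append one row to events; `user` stays NULL when not authenticated."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    conn.execute(
        'INSERT INTO events (date, register_id, "action", "user") '
        "VALUES (CURRENT_TIMESTAMP, ?, ?, ?)",
        (register_id, action, user),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE events (
        id          INTEGER PRIMARY KEY,
        date        TIMESTAMP,
        register_id VARCHAR(32),
        "action"    VARCHAR(6),
        "user"      VARCHAR(32)
    )
    """
)
log_event(conn, "A" * 32, "save", user="alice")  # authenticated upload
log_event(conn, "A" * 32, "get")                 # anonymous download
```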
Sometimes it is desirable to associate an easy-to-remember name with a file. This can be done via aliases. The following table maintains the aliases.
| Name | Type | Observations |
|---|---|---|
| id (PK) | integer | Internal number, never used outside of the database. |
| name | varchar(255) | The alias itself. |
| register_id | varchar(32) | The object the alias points to. (FK with registery.sha) |
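
Resolving an alias to the identifier it points to could be sketched as follows. The table name `aliases`, the helper `resolve_alias`, and the sample data are hypothetical; only the column layout comes from the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE registery (sha VARCHAR(32) PRIMARY KEY, status VARCHAR(7));
    CREATE TABLE aliases (
        id          INTEGER PRIMARY KEY,
        name        VARCHAR(255),
        register_id VARCHAR(32) REFERENCES registery (sha)
    );
    """
)
# Placeholder identifier and alias name, for illustration only.
conn.execute("INSERT INTO registery VALUES (?, 'active')", ("A" * 32,))
conn.execute(
    "INSERT INTO aliases (name, register_id) VALUES (?, ?)",
    ("logo.png", "A" * 32),
)


def resolve_alias(conn, name):
    """Return the identifier an alias points to, or None if it is unknown."""
    row = conn.execute(
        "SELECT register_id FROM aliases WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


print(resolve_alias(conn, "logo.png"))
```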
Steward tries to detect the version of the database each time the server starts up. It does this by checking the stewardversion table. If it detects a version different from the one it expects, it will fail.
| Name | Type | Observations |
|---|---|---|
| id (PK) | integer | Internal number, never used outside of the database. |
| schema | integer | The number of the schema. Higher is later. |
| version | varchar(255) | The version of the software that installed that version of the schema. |
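
A start-up check of this kind might look like the sketch below. It assumes that the comparison is made against the highest `schema` number in the table; the constant `EXPECTED_SCHEMA`, the helper `check_schema`, and the sample rows are hypothetical and only serve the example.

```python
import sqlite3

EXPECTED_SCHEMA = 3  # hypothetical value, chosen only for this example


def check_schema(conn, expected=EXPECTED_SCHEMA):
    """Raise RuntimeError unless the newest schema number equals `expected`."""
    row = conn.execute('SELECT MAX("schema") FROM stewardversion').fetchone()
    found = row[0]
    if found != expected:
        raise RuntimeError(f"database schema is {found}, expected {expected}")
    return found


conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE stewardversion ('
    'id INTEGER PRIMARY KEY, "schema" INTEGER, version VARCHAR(255))'
)
# Made-up history: two schema versions installed by two software releases.
conn.executemany(
    'INSERT INTO stewardversion ("schema", version) VALUES (?, ?)',
    [(1, "0.1"), (3, "0.3")],
)
check_schema(conn)  # passes silently because the newest schema is 3
```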