Home | Download | Documentation PBMS | Documentation S3 Daemon | Contact | Forum

Documentation - PBMS v0.5.09

Contents

1. What is PBMS?

2. "Streaming-enabled" Engines

3. "Streaming-enabled" Tables
   3.1 BLOB reference columns

4. Building and Installing PBMS

5. Accessing the database using BLOB References
   5.1. A BLOB Repository Reference
   5.2. A BLOB Table Reference
   5.3. A BLOB Database Reference
   5.4. A BLOB Field Reference (Experimental)

6. Storing BLOB References in Tables

7. Retrieving BLOBs from Tables

8. Transactions

9. Backup and Recovery
   9.1. mysqldump method
   9.2. Engine level backup

10. PBMS HTTP Headers
   10.1. Metadata Headers
   10.2. BLOB Alias Header
   10.3. Reply Headers
   10.4. Control Headers

11. The BLOB Repository
   11.1. The Repository File Structure
   11.2. Temporary BLOB Timeout
   11.3 Automatic Compaction

12. The System Tables
   12.1 The pbms_repository Table
   12.2 The pbms_blob Table
   12.3 The pbms_metadata Table
   12.4. The pbms_reference Table
   12.5 The pbms_metadata_header Table
   12.6 The pbms_dump Table
   12.5 The pbms_variable Table

13. PBMS System Variables

14. MySQL Connector/J Streaming Enabled
   14.1. How It Works
   14.2 Using the JDBC Driver
   14.3 Limitations
   14.4. Building and Testing JDBC

15. PBMS C API
  15.1 Function Descriptions
  15.2 PBMS Connection Options
  15.3 Function Definitions

16. PHP Extension

Further reading
   README
   Release notes (ChangeLog)
   Installation (INSTALL)
   TO-DO List


1. What is PBMS?

PBMS stands for PrimeBase Media Streaming. The PBMS engine is a MySQL storage engine which enables the storage and streaming of BLOB data in and out of a MySQL database.

MySQL 5.1 provides a pluggable storage engine API which allows 3rd parties to extend the storage capabilities of the server. Storage engines are built as shared libraries and can be loaded at runtime by the server. Most storage engines provide an alternative way of storing the table data. By doing this they add functionality to the server in terms of transactional support, special indexing methods, data warehousing and archiving facilities or even clustering and high-availability capabilities.

To use a particular engine, you specify the name of the engine when creating a table in MySQL. For example:

CREATE TABLE my_test_tab (
  n_id           INTEGER PRIMARY KEY AUTO_INCREMENT,
  n_text         LONGBLOB,
  n_string       VARCHAR(255)
) ENGINE=PBXT;

This example creates a table using the PBXT storage engine.

The PBMS engine specializes in the storage and retrieval of BLOB data. Unlike other storage engines, it does not actually provide table storage. So you would never actually create tables of type PBMS.

Instead, PBMS works together with a standard storage engine, in order to add functionality to MySQL tables. The most basic additional functionality provided by PBMS is:


2. "Streaming-enabled" Engines

A streaming-enabled storage engine is a MySQL storage engine that supports the PBMS server-side streaming API. PBMS blob references stored in the tables of a streaming-enabled engine will automatically increment and decrement BLOB reference counters when BLOB references are inserted, updated, and deleted. Streaming-enabled engines will also automatically notify the PBMS engine when tables containing blob references are altered or dropped.

3. "Streaming-enabled" Tables

A streaming-enabled table is a MySQL table that belongs to a "Streaming-enabled" engine.

Streaming-enabled tables can store BLOB data in the PBMS BLOB repository where reference counts to the data are maintained so that the BLOB will automatically be deleted ounce the reference count drops to zero.

Currently, the only engine that is streaming-enabled is PBXT, version 0.9.88 later but due to simplification of the PBMS server-side streaming API other engines should soon become available. The MySQL server can also be patched directly so that all engines would be streaming-enabled.

The most recent version of both engines are provided on this web-site. You can download the source code and compile the engines yourself or you can use one of the binaries that are provided. However, binaries are only available for a very limited number of platforms.


3.1 BLOB Reference columns

A BLOB reference column is a column in a streaming-enabled table which contains PBMS BLOB references and notifies PBMS when a reference in the column is inserted, updated or deleted. For tables created by stream-enabled engines, longblob columns are assumed to be BLOB reference columns.

4. Building and Installing PBMS

Please refer to the INSTALL document in the source code distribution for details of how to build and install the PBMS plug-in.


5. Accessing the database using BLOB References

PBMS uses the concept of a BLOB reference. A BLOB reference is, in fact, a Uniform Resource Locator (URL), similar to the Web address of an internet page.

There are currently 4 types of BLOB references:

5.1. A BLOB Repository Reference

This type of URL identifies a BLOB in the PBMS BLOB repository, and has two forms:

http://host-name:pbms-port//~*database-id~repository-id

or

http://host-name:pbms-port//~*database-id_repository-id

For example: http://localhost:8080/~*8176~1-150-660294d3-0*123 or http://localhost:8080/~*8176_1-246-57d672ac-0*123

The prefix '~*' helps to identify a BLOB repository URL. This is used by BLOB enabled engines to identify BLOB URLs when inserted into the database. The prefix is also used by the JDBC driver and other client software to recognize BLOB URLs.

BLOB repository reference with a '~' character before the repository-id are references that are already associated with a database table. BLOB repository reference with a '_' character before the repository-id are references that are not associated with a particular table, and are the result of uploading a BLOB using a BLOB database reference (see below).

The  repository-id is a value generated by the PBMS BLOB repository. It has the following form: table_id-blob_id-access_code-server_id*size for references that are associated with a database table, and repo_id-repo_offset-access_code-server_id*size for references that are not associated with a database table.

The access code (access_code) is a unique 4-byte value generated by PBMS whenever a BLOB is uploaded into the BLOB repository. This code is required to retrieve the BLOB making it very unlikely for anyone to retrieve the BLOB without the correct URL.

The 'size' is the size of the actual BLOB in bytes.

5.2. A BLOB Table Reference

This URL identifies a table in the database, and has the following form:

http://host-name:pbms-port//database-name/table-name

BLOB table references are used to upload data to a particular table in the BLOB repository. For example:

$ curl -d "This is an example text" http://localhost:8080/test/my_test_tab

The result returned by this command will be a partial BLOB repository reference, of the form:

~*database-name/~repository-id

In order to retrieve the value from the repository, add the host name and PBMS port to the URL, for example:

$ curl "http://localhost:8080/~*1568~1-150-660294d3-0*23"

5.3. A BLOB Database Reference

This type of URL identifies a database repository, and has the following form:

http://host-name:pbms-port//database-name

BLOB database references are used to upload data to a BLOB repository without specifying a table. For example:

$ curl -d "This is an example text" http://localhost:8080/test

The result returned by this command will be a partial BLOB repository reference, of the form:

~*database-id_repository-id

In order to retrieve the value from the repository, add the host name and PBMS port to the URL, for example:

$ curl "http://localhost:8080/~*1678_1-246-57d672ac-0*123"

BLOB  table references are more efficient than BLOB database references and should be used when ever the targeted table name is known.

5.4. A BLOB Alias Reference

This URL identifies a particular BLOB in the database by it's alias:

http://host-name:pbms-port//database-name/alias

The alias is set by the user when the BLOB is uploaded to the PBMS engine by including the http header 'PBMS_BLOB_ALIAS' and setting it to the BLOB alias you want to use, for example:

$ curl -H "PBMS_BLOB_ALIAS:MyBLOB" -d "This is an example text" http://localhost:8080/test/my_test_tab

The BLOB can then be retrieved using a BLOB alias reference:

$ curl "http://localhost:8080/test/MyBLOB"

A BLOB alias is a special type of BLOB metadata and must be unique with in the database.

6. Storing BLOB References in Tables

As explained above, BLOB data can be uploaded to the BLOB repository using a BLOB database or table references. The upload is done using any standard HTTP client that supports the HTTP PUT or POST method or the client API mybs_put_data() function.

For example, using curl I can upload BLOB data as follows:

$ curl -d "This is an example text" http://localhost:8080/test

The result will be a partial BLOB repository reference, for example: ~*1786_1-246-57d672ac-0*23.

Now to store this BLOB in a database table, you just insert the BLOB reference. For example:

mysql> INSERT my_test_tab (n_text) VALUES ("~*1786_1-246-57d672ac-0*23");

When the streaming-enable table receives this insert, it will call the PBMS engine in order to "retain" the BLOB repository reference in it's BLOB reference columns. Retain basically means that PBMS increments the reference count for the BLOB. PBMS keeps a list of references to each BLOB so that it knows when a BLOB can be freed in order to reclaim resources.

Note that BLOB references should only be inserted into BLOB reference columns in streaming-enabled tables. Only in this case will PBMS be called to retain the BLOB

The following diagram illustrates the steps that are taken to insert a BLOB into the database.

In step (1) the client software uploads the BLOB to the repository using an HTTP PUT request method. On reply, the repository returns a BLOB repository reference in step (2). Step (3), the reference is placed in the row to be inserted in the database. Using a standard SQL INSERT, the client inserts the row into the database table in step (4). In step (5), the streaming enabled table recognizes that BLOB URL has been inserted and automatically calls the BLOB Streaming engine to retain the BLOB.


7. Retrieving BLOBs from Tables

Once a BLOB reference has been stored in a table, a standard SELECT will return the BLOB reference, not the actual BLOB data.

For example:

mysql> select n_id, n_text from my_test_tab where n_id=5;
+------+-----------------------------------+
| n_id | n_text                      |
+------+-----------------------------------+
|    5 | ~*test/~2-128-57d672ac-0*23 |
+------+-----------------------------------+

The BLOB reference can then be used to retrieve the BLOB data using the HTTP GET method, for example:

$ curl "http://localhost:8080/~*1786~2-128-57d672ac-0*23"
This is an example text

The following diagram illustrates the process of downloading a BLOB from the repository:

In step (1) the row containing a BLOB repository reference is retrieved from the database using a standard SQL SELECT statement. In step (2) the client software recognizes that a valid BLOB URL is in the database field. With this information it creates a complete URL and downloads the BLOB data from the BLOB repository in step (3), using a standard HTTP GET request method. In step (4) the BLOB data is streamed back to the client application.


8. Transactions

The PBMS engine supports transactions and transaction savepoints.

The PBMS system tables are not transactional though.

9. Backup and Recovery

There are 2 methods that you can use to backup an online BLOB repository. You can use mysqldump to do a traditional backup or you can use the PBMS engine level backup procedure. Using the engine level backup is highly recommended since it is far more efficient and allows you to backup the repository even while BLOBs are being added and removed from it.

9.1. mysqldump Method

To backup a repository with mysqldump you must include the PBMS system table pbms_dump in the tables to be dumped. The dump must also be done with the --hex-blob option. Other than that you need to use a locking method to ensure data consistency across tables.

When restoring a repository from a dump you need to set the PBMS system variable Restoring-Dump in the pbms_variable table to "True" before starting the restore operations and then set it back to "False" after the restore has completed. It is important that this variable is set to 'True' during the restore of the pbms_dump table as well as any table containing PBMS BLOB references.

Example:

$ mysqldump --hex-blob aDatabase > dumptest.sql

To restore using the mysql client:

mysql > use aDatabase;

mysql > update pbms_variable set value = "True" where name = "Restoring-Dump";

mysql > source dumptest.sql;

mysql > update pbms_variable set value = "False" where name = "Restoring-Dump";

9.2. Engine Level Backup

To perform an engine level backup you need to first set the Backup-Location variable in the pbms_variable table to the path where the repository backup is to be stored. This path must already exist. When the backup is started a 'pbms' folder will be created in it if it doesn't already exist and then the backed up repository will be created in there as <backupDir>/pbms/<databaseName-id>. It is an error to try and backup the same database repository to the same location twice with out removing the older one first.

To perform a backup do the following:

The PBMS engine level backup is started by setting the Do-Backup variable in the pbms_variable table to "True". Once started PBMS will create a snapshot image of the repository as it appeared at the start of the backup. Once the backup has started the table locks are no longer required. While the repository backup is in progress you can use the database as you like including adding and deleting BLOBs.

To restore the database all you need to do is to restore the database tables in the normal way and then copy/move the backed up repository from the <backups>/pbms folder into the <MySQL data Dir>/pbms folder. If the restore was done to a database of a different name then the 'databaseName' portion of the restored database repository must be changed to the new name but the 'id' portion must remain the same.

Example: $ mv <backups>/pbms/oldName-1234 <MySQL data Dir>/pbms/newName-1234

The restored database is now ready for use.

If the regular tables in the database were restored from a dump then the PBMS system variable Restoring-Dump in the pbms_variable table must be set to "True" during the restore operation the same way it was for the mysqldump restore method.

10. PBMS HTTP Headers

10.1. Metadata Headers

Metadata headers are headers that allow additional information to be stored in the repository with the BLOB. A metadata header can be a user defined or a standard HTTP header such as "Content-Type". The system table "pbms_metadata_headers" contains the metadata headers recognized by the PBMS engine, each database has it's own "pbms_metadata_headers" table. The user can insert into this table headers that they wish to be recognized as metadata headers. When the PBMS HTTP server receives a BLOB any HTTP headers in the 'HTTP Put' request that are also in the "pbms_metadata_headers" table will be stored with the BLOB as metadata.

On retrieval all metadata headers along with their values will be added to the HTTP reply.

Example, assume that the string "About_the_blob" exists in the "pbms_metadata_headers" table. The metadata header, along with the BLOB alias could be sent with the BLOB to the PBMS engine as:

$ curl -H "PBMS_BLOB_ALIAS:MyBLOB" -H "About_the_blob:An example" -d "This is an example text" http://localhost:8080/test

and when fetched back by:

$ curl -D - "http://localhost:8080/test/MyBLOB"

The result would be:

HTTP/1.1 200 OK
PBMS_CHECKSUM: C36B9D679BF48385082118FC61C7E709
PBMS_BLOB_SIZE: 23
PBMS_LAST_ACCESS: 1236370182
PBMS_ACCESS_COUNT: 0
PBMS_CREATION_TIME: 1236370182
PBMS_BLOB_ALIAS: MyBLOB
About_the_blob: An example
Content-Length: 23

This is an example text

10.2. BLOB Alias Header

The "PBMS_BLOB_ALIAS" header is a special metadata header that enables you to associate a string with the BLOB being uploaded which can then be used later to download it again. The alias can be any string up to 1024 bytes long that conforms to the rules for a valid URL and doesn't start with a '/' character.

10.3. Reply Headers

In addition to the metadata headers the following headers are also returned:

10.4. Control Headers

Control headers are headers which are sent to the PBMS engine and have special meaning. At this time there is only one control header "PBMS_RETURN_HEADER_ONLY" which signals the PBMS engine to return the header and the BLOB or the header only. Example:

$ curl -H "PBMS_RETURN_HEADER_ONLY:yes" -D - "http://localhost:8080/test/MyBLOB"

The result would be:

HTTP/1.1 200 OK
PBMS_CHECKSUM: C36B9D679BF48385082118FC61C7E709
PBMS_BLOB_SIZE: 23
PBMS_LAST_ACCESS: 1236370182
PBMS_ACCESS_COUNT: 0
PBMS_CREATION_TIME: 1236370182
PBMS_BLOB_ALIAS: MyBLOB
About_the_blob: An example
Content-Length: 0

In the future other control headers are expected to return general information about the repository or when sent with a BLOB upload to indicate how the BLOB is to be stored.

11. The BLOB Repository

The PBMS BLOB repository contains BLOBs uploaded using the HTTP PUT or GET method, for example:

$ curl -d "This is an example text" http://localhost:8080/test
~*1786_1-246-57d672ac-0*23

This operation returns a reference to the BLOB in the repository. The BLOB can be retrieved from the repository using this URL, and the HTTP GET method. For example:

$ curl "http://localhost:8080/~*1786_1-246-57d672ac-0*23"
This is an example text

11.1. The Repository File Structure

Each database has its own BLOB repository. This means that if a BLOB reference is inserted into a table that belongs to another database, the BLOB will be copied over to the repository of the database containing the table. The repository doesn't reside in the database folder itself but in a folder with in the pbms folder with the same name as the database folder. The pbms folder resides in the MySQL databases folder along side the databases. When doing a 'show databases' it will appear as a database although it is not infact a database.

A repository consists of a number of repository files. The repository files have names of the form: repo-file_id.bs, for example repo-241.bs. file_id is the ID of the repository file.

Each repository file can contain a number of BLOBs. The number of BLOBs in a repository file depends on the size of the BLOBs and the current setting of the "repository threshold". The repository threshold is determined by setting the pbms_repository_threshold system variable. This is a value in bytes unless you specify: Kb, MB or GB after the value (note: the parameter must be set as a string value).

BLOBs are written sequentially to the repository files. Once a file reaches the threshold, a new repository files is created.

PBMS writes multiple repository files simultaneously. Each BLOB that is being uploaded is written to a different repository file in parallel.

Repository files may be a maximum of 256 Terabytes in size, and the maximum number of repository files is 16777215.

This provides a theoretically maximum repository size of 4 Zettabytes (2^72 bytes). The maximum size of a single BLOB is also 256 Terabytes.

11.2. Temporary BLOB Timeout

Once a BLOB has been uploaded, PBMS sets a timer. When the timer expires, PBMS deletes the BLOB automatically, unless the BLOB has been retained in the meantime by the act of inserting it into a streaming-enabled table.

This also applies to BLOBs that are deleted or no longer referenced due to a database UPDATE.

The amount of time waited is determined by the pbms_temp_blob_timeout system parameter. This is a value in seconds. By default the value is set to 10 minutes.

The information about which BLOBs are to be deleted and when, is stored in the temporary log files. These files are stored in the bs-logs sub-directory of each database. The temporary log files have names of the form temp-log_id.bs. log_id is the temporary log file ID.

Once a log is full, a new one is created with the next highest ID. The maximum size of a temporary log file is determined by the pbms_temp_log_threshold system variable, which is a value in bytes, unless one of the following is specified: Kb, MB or GB after the value (note: the parameter must be set as a string value).

The maximum temporary log size is 4GB.

11.3 Automatic Compaction

When BLOBs are deleted from the repository files, the free space is not immediately re-used. Instead, the system records the amount of free space in the repository file. This is also know as the garbage level of the file, and is measured as a percentage of the size of the file.

When the garbage level of a repository file exceeds a certain threshold PBMS begins to compact the repository file. It does this by creating a new repository file and copying all valid data to the new file. After this is done, it deletes the old repository file.

The garbage threshold is determined by the pbms_garbage_threshold system variable. Set this variable to a value between 1 and 100. The default value is 50, which means that a repository file is compacted as soon as it contains 50% garbage or more.

Note that compaction may not begin immediately. This is due to the fact that some repository files may be locked. A repository file is locked when it is being used to receive uploaded BLOB data. In this case, the file will only be unlocked when it has reached the repository threshold (pbms_repository_threshold).


12. The System Tables

PBMS has seven system tables which provide a view of the BLOB repository. The tables are called pbms_repository , pbms_blob, pbms_metadata, pbms_reference, pbms_metadata_header, pbms_dump, and pbms_variable.

The pbms_repository, pbms_blob and pbms_metadata tables can be thought of as views into the PBMS BLOB repository files. A record in the pbms_repository table and it's related records in the pbms_blob and pbms_metadata constitute 1 physical block of data in a BLOB repository file. The pbms_reference table contains a record for each reference to a BLOB including references from temporary log files.

The system tables are created automatically the first time they are referenced but to clarify their structure a create statement will be given for each table.

12.1 The pbms_repository Table

CREATE TABLE pbms_repository (
Blob_alias CHAR(1024) COMMENT "User supplied BLOB identifier",
Repository_id INT COMMENT 'The repository file number',
Repo_blob_offset BIGINT COMMENT 'The offset of the BLOB in the repository file',
Blob_size BIGINT COMMENT 'The size of the BLOB in bytes',
MD5_Checksum CHAR(32) COMMENT "The MD5 Digest of the BLOB data",
Head_size SMALLINT UNSIGNED COMMENT 'The size of the BLOB header - proceeds the BLOB data',
Access_code INT COMMENT 'The 4-byte authorisation code required to access the BLOB - part of the BLOB URL',
Creation_time TIMESTAMP COMMENT 'The time the BLOB was created',
Last_ref_time TIMESTAMP COMMENT 'The last time the BLOB was referenced',
Last_access_time TIMESTAMP COMMENT 'The last time the BLOB was accessed (read)',
Access_count INT COMMENT 'The count of the number of times the BLOB has been read'
) ENGINE=PBMS;

As mentioned above, this table contains one row for every BLOB in the repository. The BLOB is identified by the repository file ID (Repository_id column), and the offset into the file (Repo_blob_offset column). The Blob_alias, when supplied on upload, can also be used as a unique identifier of the BLOB with in the database. This table is read only.

Blob_size is the uncompressed size of the BLOB in bytes. Head_size is the size of the BLOB header which proceeds the BLOB data.

MD5_Checksum is the hex representation of the MD5 digest of the BLOB data.

The Access_code column contains the BLOBs unique 4-byte access code.

Creation_time is the time the BLOB was created. Last_ref_time is updated whenever the BLOB is referenced and Last_access_time is updated each time the BLOB is downloaded.

The Content_type field contains the HTTP content type of the data that was specified when the BLOB was uploaded. It is returned in the HTTP header when the BLOB is downloaded.

12.2. The pbms_blob Table

CREATE TABLE pbms_blob (
Repository_id INT COMMENT 'The repository file number',
Repo_blob_offset BIGINT COMMENT 'The offset of the BLOB in the repository file',
Blob_data LONGBLOB COMMENT 'The data of the BLOB',
) ENGINE=PBMS;

This table contains the actual BLOB data. This is a read only table.

Repository_id and Repo_blob_offset identify the BLOB in the pbms_repository table.

The Blob_data contains the actual BLOB data of the BLOB. This mean that the data can be retrieved from this table using a normal SQL SELECT statement.

However, BLOBs in the PBMS repository may be larger the 4GB, which is the maximum size of a MySQL LONGBLOB value. Note that other limitations, such as computer memory and the MySQL maximum packet size will also affect the maximum size of BLOBs that can be returned by a SELECT statement.

12.3. The pbms_metadata Table

CREATE TABLE pbms_metadata (
Repository_id INT COMMENT 'The repository file number',
Repo_blob_offset BIGINT COMMENT 'The offset of the BLOB in the repository file',
Name CHAR( 32) COMMENT 'The name of the metadata',
Value CHAR( 1024) COMMENT 'The value of the metadata',
) ENGINE=PBMS;

This table contains the metadata associated with a BLOB including the BLOB alias. This table is updatable.

Repository_id and Repo_blob_offset identify the BLOB in the pbms_repository table.

Name is the name of the metadata value and when combined with the Repository_id and Repo_blob_offset provide the primary key to this table.

You can add, modify, or delete metadata associated with a BLOB by performing INSERTS, UPDATES, or DELTES on this table. This includes assigning aliases to BLOBs.

12.4. The pbms_reference Table

CREATE TABLE pbms_reference (
Table_name CHAR(64) COMMENT 'The name of the referencing table',
Blob_id BIGINT COMMENT 'The BLOB reference number - part of the BLOB URL',
Column_ordinal INTCOMMENT 'The column position in the table starting at position 1',
Blob_url VARCHAR(200) COMMENT 'The BLOB URL for HTTP GET access',
Repository_id INT COMMENT 'The repository file number of the BLOB',
Repo_blob_offset BIGINT COMMENT 'The offset in the repository file',
Blob_size BIGINT COMMENT 'The size of the BLOB in bytes',
Deletion_time TIMESTAMP COMMENT 'The time the BLOB was deleted',
Remove_in INT COMMENT 'The number of seconds before the reference/BLOB is removed perminently',
Temp_log_id INT COMMENT 'Temporary log number of the referencing deletion entry',
Temp_log_offset BIGINT COMMENT 'Temporary log offset of the referencing deletion entry'
) ENGINE=PBMS;

This table contains one row for every reference to every BLOB in the database. The reference is identified by the table name (Table_name column) and the BLOB ID (Blob_id column).

The Column_ordinal identifies the table column that references the BLOB.  This column may be NULL if the BLOB is currently not referenced by any row in the table. In the case of an unreferenced blob, the Remove_in will indicate when the BLOB will be removed if there are no references at all.

Blob_url contains the BLOB repository reference required to retrieve the BLOB.

Repository_id and Repo_blob_offset identify the BLOB in the pbms_repository table.

Blob_size is the uncompressed size of the BLOB in bytes.

Deletion_time is the time the BLOB was deleted. This is the time the BLOB temporary timer was set. The Remove_in column contains the remaining time in seconds, before the reference is deleted. If the reference is removed, and there are not other references to the BLOB, the BLOB will be deleted as well.

Temp_log_id and Temp_log_offset identify where the timer is stored in the temporary log files.

12.5. The pbms_metadata_header Table

CREATE TABLE pbms_metadata_header (
Name CHAR( 32) COMMENT 'The name of the metadata',
) ENGINE=PBMS;

This table contains the HTTP headers that will be treated as metadata names when a BLOB is uploaded. Any header sent with the BLOB that contained in this table will be stored, along with it's value, as metadata in the BLOB repository. This table is updatable.


12.6. The pbms_dump Table

CREATE TABLE pbms_dump (
Data LONGBLOB COMMENT 'A BLOB repository record',
) ENGINE=PBMS;

This table is only provided as a means to dump and restore a BLOB repository. The mysqldump can be used to backup the repository by dumping this table. The repository is then restored by inserting the dumped data back into this table. No assumptions should be made about the content of any one record in this table. Failure to restore the entire repository as dumped from this table can result in a damaged repository.


12.7. The pbms_variable Table

CREATE TABLE pbms_variable(
Id INT COMMENT 'The variable ID',
Name CHAR(32) COMMENT 'The variable Name',
Value CHAR(1024) COMMENT 'The variable Value',
Description CHAR(124) COMMENT 'The variable Description',
) ENGINE=PBMS;

This table stores database specific information used by the PBMS engine. By updating the value for specific variables the value will be stored with the database or an action will be taken depending on the variable being updated. The following is a table of system variables and their descriptions.

Name Default Description
Storage-type "Repository" The storage method used for the BLOB data, "Repository" or "Cloud".
S3-Public-Key NULL The public key used for S3 cloud storage.
S3-Private-Key NULL The private key used for S3 cloud storage.
S3-Bucket-Name NULL The S3 bucket name into which the BLOBs are to be stored if using cloud storage.
Restoring-Dump "False" Indicates if the database is being restored from a dump file.
Backup-Location NULL The location for repository level backups.
Do-Backup "False" When set to "True" the repository will be backed up to the backup location. If set to "False" any currently running backup will be terminated. You can determine if a backup is currently underway by selecting this value back.


13. PBMS System Variables

The following system variables are available for controlling the behavior of the PBMS engine.

Name Default Description
pbms_garbage_threshold 50 The percentage of garbage in a repository file before it is compacted.
pbms_port 8080 The port for the PBMS stream-based communications. Setting this value to 0 will disable HTTP communication with the engine.
pbms_repository_threshold 128MB The maximum size of a BLOB repository file. You may use Kb, MB or GB to indicate the unit of the value. A value in bytes is assumed when no unit is specified.
pbms_temp_blob_timeout 600 The timeout, in seconds, for temporary BLOBs. Uploaded BLOB data is removed after this time, unless they are referenced by a record in the database.
pbms_temp_log_threshold 32MB The maximum size of a temporary BLOB log file. You may use Kb, MB or GB to indicate the unit of the value. A value in bytes is assumed when no unit is specified.
pbms_max_keep_alive 10ms The timeout for inactive connection with the keep-alive flag set. After this time the connection will be closed. The time-out is in milliseconds (1/1000).
pbms_metadata_headers "Content-Type" A ':' delimited list of metadata headers to be used to initialize the pbms_metadata_header table when a database is created.

All variables can be set at runtime except pbms_port.

To set variables on the mysqld command line, use the syntax: --variable_name=value, for example:
--pbms_repository_threshold=10GB.

pbms_field_references is a boolean value and can be enabled on the mysqld command line as follows:
--pbms_field_references, and disabled as follows: --skip_pbms_field_references.

On the command line '-' may be used in place of '_', for example: --pbms-garbage-threshold=20.

Variable that can be set at runtime are set using the SET GLOBAL statement, for example:

mysql> set global pbms_temp_log_threshold = "80MB";

Note that pbms_repository_threshold and pbms_temp_log_threshold are string values and so the value assigned must be quoted.


14. MySQL Connector/J Streaming Enabled

The MySQL Connector/J Streaming Enabled is a version of the standard MySQL JDBC driver which has been modified to enable BLOB streaming with the PBMS BLOB streaming engine.

The modified source code is available with the standard PBMS source code distribution. An already built .jar file with the modifications is also available from http://www.blobstreaming.org/download.

14.1. How It Works

The Connector/J SE makes working with the PBMS BLOB repository transparent to the programmer. It hides the fact that a reference is stored in the table field instead of the actual BLOB data.

BLOBs of any length can be stored and retrieved from the database without setting the value of the max_allowed_packet system variable. This is because BLOB data is actually streamed to and from the database. Values inserted using setBinaryStream(), for example, are automatically uploaded to the BLOB repository by the JDBC driver using HTTP PUT, as described in the section: 5. Storing BLOB References in Tables.

The JDBC driver automatically retrieve data stored in the repository when it is selected from the database. Using getBinaryStream(), you can take control of the stream of data returned by the HTTP GET operation as described in section 6. Retrieving BLOBs from Tables.

Note that the BLOB data is only retrieved when it is actually required. Just selecting rows which contains BLOB stored in the repository does not cause the JDBC driver to fetch the data from the repository. This means table rows that contain BLOBs stored in the repository do not influence the speed of the SELECT. Only when the data is accessed, for example using getBytes(), getString(), getBinaryStream(), etc. is the data actually retrieved from the repository.

14.2. Using the JDBC Driver

Enable BLOB streaming by adding the following option to the JDBC URL: enableBlobStreaming=true. For example:

 Class.forName("com.mysql.jdbc.Driver").newInstance();

 Connection conn = null;
 try {
     conn = DriverManager.getConnection("jdbc:mysql://localhost/test"+
                                          "?user=root&enableBlobStreaming=true");
     // Do something with the connection
 }
 catch (Exception e) {
     // Handle the exception
     e.printStackTrace();
 }
 finally {
     if (conn != null)
         conn.close();
 }

When set to true the JDBC driver will upload values set with setBinaryStream(), setAsciiStream() and setBlob() to the PBMS BLOB repository, but only if a valid length is specified. For example, a value inserted using setBinaryStream() with a length of -1 will be not be uploaded to the repository. Instead the value will by inserted directly into the table, as if enableBlobStreaming were set to false.

Here is an example of how to insert a BLOB into a table.

 public void insertBLOB(Connection conn, InputStream in, int len)
                                                 throws SQLException
 {
     PreparedStatement ps = null;

     try {
         ps = conn.prepareStatement("insert my_test_tab(n_text) values(?)");
         ps.setBinaryStream(1, in, len);
         int rows = ps.executeUpdate();
      }
      finally {
          if (ps != null)
              ps.close();
      }
 }

Note that currently only LONGBLOB type columns of PBXT tables may contain streamable BLOB data. This means that when enableBlobStreaming=true you should only use setBinaryStream(), setAsciiStream() and setBlob() with a valid length on columns of type LONGBLOB belonging to a PBXT table.

** WARNING ** The JDBD driver does not check that the database column or table is the correct type . If use setBinaryStream(), setAsciiStream() or setBlob() with a length greater or equal to one on columns that are not of type LONGBLOB, or table type other then PBXT, data will be lost!

To avoid this problem, specify a length of -1 when using setBinaryStream(), setAsciiStream() or setBlob() on columns that are not of type LONGBLOB, or when using a table type other than PBXT.

Selected values of type LONGBLOB are automatically checked for valid BLOB repository references. The reference is then used to automatically download the BLOB data from the BLOB repository. Using a getBinaryStream(), getAsciiStream() or getBlob() you have direct access to the BLOB data stream from the repository.

Here is an example of how to retrieve a BLOB from a table:

 public static InputStream selectBLOB(Connection conn, int id) throws SQLException
 {
     Statement st = null;
     InputStream in = null;
     ResultSet rs;

     try {
         st = conn.createStatement();
         rs = st.executeQuery(
                       "select n_text from pbms_test_tab where n_id = "+id);
         rs.next();
         in = rs.getBinaryStream("n_text");
     }
     finally {
         if (st != null)
             st.close();
     }
     return in;
 }

If you required the length of the BLOB before you begin to retrieve the data, then use getBlob(), instead of rs.getBinaryStream().

14.3. Limitations

The current implementation is based on JDK 1.5.x, which only allows for lengths in setBinaryStream(), setAsciiStream() and setBlob() of type 'int'. This means that the maximum BLOB size that can currently permitted is 2GB.

As mentioned above, streamable BLOBs can only be stored in LONGBLOB type columns in PBXT tables.

14.4. Building and Testing JDBC

This source code distribution of the PBMS storage engine includes the modified source code of the MySQL Connector/J JDBC driver, in a sub-directory called java. This directory also contains a ready build JDBC driver in the form of a .jar file.

For debugging purposes you can include the source code in your project or build the JDBC driver yourself. To do this you will need Apache Ant version 1.6 or newer (available from http://ant.apache.org/). You will also need the Java JDK-1.4.2 or later.

Enter the following command to compile and create the driver:

$ ant dist

After this you can test the BLOB streaming using the 'runtest.sh' shell script:

$ ./runtest.sh

The test connects to a local MySQL server and creates a PBXT table. The test assumes that you have already installed the PBXT and PBMS storage engines.


15. PBMS C API

The C API is provided through the pbmscl library which is installed in the same location as the MySQL libraries when PBMS is installed.

One of the features of the API is that the 'get' and 'put' functions used to transfer the BLOB data between the client and server do so by using callback functions that get called one or more times during the transfer. This enables the client to transfer large BLOBs with out having too buffer the entire BLOB during the transfer.

Example code

An example is included with the PBMS source code under the 'pbmstest' directory. The example demonstrates a couple of different ways in which the API can be used to handle BLOB data.

The test demonstrates inserting and fetching of BLOB data in the form of text entered by the user or files. The insertion and fetching of files is a good example of how BLOB data can be transferred with out having to buffer the entire BLOB. The test also includes examples on how to set and get BLOB metadata including the use of BLOB aliases.

15.1 Function Descriptions

Name
Description
pbms_library_init() Initialize PBMS API. This should be called to initialize the PBMS library before you call any other PBMS library function.
pbms_library_end() Terminate PBMS API use. This should be called to free all PBMS library resources when you are done using the PBMS library.
pbms_errno() Returns the error code for the most recently invoked API function that can succeed or fail.
pbms_error() Returns a null-terminated string containing the error message for the most recently invoked API function that can succeed or fail.
pbms_connect() Establishes a connection to a PBMS BLOB streaming server via a host name and port number.
pbms_mysql_connect() Establishes a connection to a PBMS BLOB streaming server via a MySQL connection handle.
pbms_close() Closes a statement handle or a connection to the PBMS BLOB streaming server.
pbms_set_option() Sets a connection option value for the PBMS connection handle.
pbms_get_option() Gets a connection option value for the PBMS connection handle.
pbms_add_metadata() Adds a metadata header to a PBMS statement.
pbms_clear_metadata() Removes all metadata headers from a statement.
pbms_reset_metadata() Resets the metadata cursor to the start returning the number of metadata values in the statement.
pbms_next_metadata() Gets the next metadata value from the statement's metadata cursor.
pbms_get_metadata_value() Gets a specific metadata value for the BLOB just downloaded on the statement.
pbms_set_md5_digest() Set the MD5 digest for the BLOB about to be uploaded on the statement.
pbms_get_md5_digest()
Get the Md5 digest for the BLOB just downloaded on the statement.
pbms_get_info() Get BLOB headers/metadata only.
pbms_put_data() Sends data to the PBMS BLOB streaming server and receives a BLOB reference after completion.
pbms_get_data() Get data from the PBMS BLOB streaming server.
pbms_is_blob_reference() Checks the format of the BLOB reference and returns true if it looks like a valid reference.
pbms_get_blob_size() Get the size of the BLOB for the specified BLOB reference.


15.2 PBMS Connection Options

Option
Parameter type
Default Description
PBMS_OPTION_
KEEP_ALIVE
unsigned int* false A boolean flag to indicate if the keep-alive flag should be set on HTTP requests.
PBMS_OPTION_
HOST
(const char *)* localhost The connection's Blob Streaming server host IP address. (Read Only)
PBMS_OPTION_
PORT
unsigned int* 8080 The connection's Blob Streaming server host port number. (Read Only)
PBMS_OPTION_
TRANSMITION_TIMEOUT
unsigned int* 0 If greater than zero this sets a limit to how long a blob transfer can take.
PBMS_OPTION_
DATABASE
(const char*)* none The database associated with the connection.


15.3 Function Definitions
pbms_bool pbms_library_init(pbms_bool thread_safe)

Description

Initialize PBMS API. This should be called to initialize the PBMS library before you call any other PBMS library function.

Parameters

Return value


void pbms_library_end()

Description

Terminate PBMS API use. This should be called to free all PBMS library resources when you are done using the PBMS library.


int pbms_errno(PBMS pbms)

Description

For the handle specified by pbms, pbms_errno() returns the error code for the most recently invoked API function that can succeed or fail.

Parameters

Return value


const char *pbms_error(PBMS pbms)

Description

For the handle specified by pbms, pbms_error() returns a null-terminated string containing the error message for the most recently invoked API function that can succeed or fail.

Parameters

  Return value
PBMSC pbms_connect(const char* host, unsigned int port, const char *database)

Description

Establishes a connection to a PBMS BLOB streaming server running on host. pbms_connect() must complete successfully before you can execute any other API functions that require a valid PBMS connection handle structure. 

Internal connections to the PBMS server are only opened as needed and are closed after each access unless the PBMS_OPTION_KEEP_ALIVE is set to true. Even then the connection will only remain open for as long as the server's max_keep_alive setting allows.

Parameters

Return value
PBMS pbms_mysql_connect(struct st_mysql *mysql, const char *database)

Description

Establishes a connection to a PBMS BLOB streaming server associated with the MYSQL connection handle. 

Parameters

Return value
void pbms_close(PBMS pbms)

Description

Closes the connection and frees all of it's resources.  

Parameters


pbms_bool pbms_set_option(PBMS pbms, enum pbms_option option, const void *in_value)

Description

Sets a connection option value for the PBMS handle.

Parameters

Return value
pbms_bool pbms_get_option(PBMS pbms, enum pbms_option option, const void *out_value)

Description

Gets a connection option value for the PBMS handle.

Parameters

Return value
pbms_bool pbms_add_metadata(PBMS pbms, const char *name, const char *value)

Description

Adds the metadata to the connection to be associated with the next BLOB uploaded on the connection.

Parameters

Return value


pbms_bool pbms_clear_metadata(PBMS pbms)

Description

Removes all metadata associated with the connection.

Parameters

Return value


int pbms_reset_metadata(PBMS pbms)

Description

Resets the position of the metadata cursor to the start of the metadata for the downloaded BLOB.

Parameters

Return value


pbms_bool pbms_next_metadata(PBMS pbms, char *name, char *value, size_t *v_size)

Description

Fetches the next metadata record.

Parameters

Return value


pbms_bool pbms_get_metadata_value(PBMS pbms, const char *name, char *value, size_t v_size)

Description

Gets a specific metadata value from the connection's metadata cursor.

Parameters

Return value


pbms_bool pbms_set_md5_digest(PBMS pbms, const char *md5_digest)

Description

Specifies the MD5 digest for the BLOB data about to be sent. This is not required but is a good thing to provide if you are concerned about possible corruption of data in transit.

Parameters

Return value


pbms_bool pbms_get_md5_digest(PBMS pbms, char *md5_digest)

Description

Gets the MD5 digest string of the BLOB just downloaded..

Parameters

Return value


typedef char *(* PBMS_CALLBACK_FUNC) (void *caller_data, char *buffer, size_t *buffer_size, size_t size);

Description

The user defined callback function is called when transferring data. The callback will continue to be called until an error occurs or the callback returns NULL.

Parameters

Return value

Example:

See the examples provided for pbms_put_data() and pbms_get_data().


pbms_bool pbms_put_data(PBMS pbms, const char *table, const char *alias, char *ref, size_t total_data_size, char *buffer, size_t data_size, PBMS_CALLBACK_FUNC cb, void *caller_data)

Description

Sends data to the PBMS BLOB streaming server and receives a BLOB reference after completion.

Parameters

Return value

Example:

#define BUFFER_SIZE 100
char *my_put_file_callback(void *caller_data, char *buffer, size_t *buffer_size, size_t size)
{
    int *fh = (int*) caller_data;
    void *data = NULL;

    if (size) {
        data = buffer;
        *buffer_size = read(fh, data, BUFFER_SIZE);
    }
    return data;
}

main ()
{
    PBMS pbms;
    PBMS_BLOB_REF blob_ref;
    char buffer[BUFFER_SIZE];

    pbms_library_init(false);

    pbms = pbms_connect("localhost", 8080, "test");

    char *data = "A test string.";
    size_t size = strlen(info.data) +1;

    // Send a blob directly with out using the callback:
    pbms_put_data(pbms, NULL, "myBlob1", &blob_ref, size, data, size, NULL, NULL);

    printf("Blob reference = \"%s\"\n", (char *) &blob_ref);

    // Send a file using a callback:
    int fh = open("myblob.jpg", O_RDONLY, 0000777);

    data = NULL;
    size = lseek(fh, 0, SEEK_END);
    lseek(fh, 0, SEEK_SET);
    pbms_put_data(pbms, NULL , "MyBlob2", &blob_ref, size, buffer, 0, my_put_file_callback, &fh);

    printf("Blob reference = \"%s\"\n", (char *) &blob_ref);
    close(fh);

    pbms_close(pbms);
    pbms_library_end();
}


pbms_bool pbms_get_data(PBMS pbms, const char *ref, char *buffer, size_t buffer_size, PBMS_CALLBACK_FUNC cb, void *caller_data)

Description

Get data from the PBMS BLOB streaming server for the specified BLOB reference or alias.

Parameters

Return value

Example:

 #define BUFFER_SIZE 100
char *blob_to_file_callback(void *caller_data, char *buffer, size_t *buffer_size, size_t size)
{
i    nt *fh = (int*) caller_data;

     // Write out the last block of data received.
     write(*fh, buffer, *buffer_size);

     if (!size)
        return NULL; // No more date.

     buffer_size = BUFFER_SIZE;
     return buffer;
}

char *my_get_callback(void *caller_data, char *buffer, size_t *buffer_size, size_t size)
{
     MY_INFO *info = (MY_INFO*) caller_data;

     if (!size)
         return NULL;

    info->size = 0;
    return info->data;
}

main ()
{
     PBMS pbms;
     char buffer[BUFFER_SIZE];
     int fh;

     pbms_library_init(false);

    pbms = pbms_connect("localhost", 8080, "test");

     fh = open("myblob.jpg", O_WRONLY | O_CREAT | O_TRUNC, 0000777);

    pbms_get_data(pbms, "MyBlob1", buffer, BUFFER_SIZE, blob_to_file_callback, &fh);

    printf("Blob size = %d"\n", lseek(fh, 0, SEEK_END));
     close(fh);

     pbms_close(pbms);
     pbms_library_end();
}


pbms_bool pbms_get_info(PBMS pbms, const char *ref )

Description

Get metadata and headers from the PBMS BLOB streaming server for the specified BLOB reference or alias.

Parameters

Return value

Example:

main ()
{
   PBMS pbms;
   char buffer[PBMS_META_VALUE_SIZE];
   time_t secs;

   pbms_library_init(false);

   pbms = pbms_connect("localhost", 8080, "test");

   pbms_get_info(pbms, "MyBlob1");

   printf("BLOB Info:\n");
   pbms_get_metadata_value(pbms, MS_ALIAS_TAG, buffer, NULL);
   printf("ALIAS: %s", buffer);

   pbms_get_metadata_value(pbms, MS_BLOB_SIZE, buffer, NULL);
   printf("SIZE: %s", buffer);

   pbms_get_metadata_value(pbms, MS_CHECKSUM_TAG, buffer, NULL);
   printf("CHECKSUM: %s", buffer);

   pbms_get_metadata_value(pbms, MS_ACCESS_COUNT, buffer, NULL);
   printf("ACCESS_COUNT: %s", buffer);

   pbms_get_metadata_value(pbms, MS_CREATION_TIME, buffer, NULL);
   secs = atol(buffer);
   ctime_r(secs, buffer);
   printf("CREATION_TIME: %s", buffer);

   pbms_get_metadata_value(pbms, MS_LAST_ACCESS, buffer, NULL);
   secs = atol(buffer);
   ctime_r(secs, buffer);
   printf("LAST_ACCESS: %s", buffer);

   pbms_close(pbms);
   pbms_library_end();
}


pbms_bool pbms_is_blob_reference(PBMS pbms, const char *ref)

Description

Checks the format of the BLOB reference and returns true if it looks like a valid reference. This doesn't guarantee that the reference references a valid BLOB on the PBMS BLOB streaming server only that the format of the BLOB reference is correct.  

Parameters

Return value
pbms_bool pbms_get_blob_size(PBMS pbms, const char *ref, size_t *size)

Description

Get the size of the BLOB for the specified BLOB reference.

If the reference is a PBMS BLOB URL then the blob size is imbedded in the URL and so the size can be returned directly. But it the ref is not a URL it is assumed to be a BLOB alias and so a request will be sent to the PBMS server to get the information.

Parameters

Return value _________________________________________________________________________________________________
Copyright 2009 by PrimeBase Technologies GmbH