Home | Download | Documentation PBMS | Documentation S3 Daemon | Contact | Forum

Documentation - PBMS v0.5.08

Contents

1. What is PBMS?

2. "Streaming-enabled" Engines

3. "Streaming-enabled" Tables
   3.1 BLOB reference columns

4. Building and Installing PBMS

5. Accessing the database using BLOB References
   5.1. A BLOB Repository Reference
   5.2. A BLOB Table Reference
   5.3. A BLOB Database Reference
   5.4. A BLOB Field Reference (Experimental)

6. Storing BLOB References in Tables

7. Retrieving BLOBs from Tables

8. PBMS HTTP Headers
   8.1. Metadata Headers
   8.2. BLOB Alias Header
   8.3. Reply Headers
   8.4. Control Headers

9. The BLOB Repository
   9.1. The Repository File Structure
   9.2. Temporary BLOB Timeout
   9.3 Automatic Compaction

10. The System Tables
   10.1 The pbms_repository Table
   10.2 The pbms_blob Table
   10.2 The pbms_metadata Table
   10.4. The pbms_reference Table
   10.2 The pbms_metadata_header Table

11. PBMS System Variables

12. MySQL Connector/J Streaming Enabled
   12.1. How It Works
   12.2 Using the JDBC Driver
   12.3 Limitations
   12.4. Building and Testing JDBC

13. PBMS UDFs
  13.1 UDF Descriptions
  13.2 UDF Definitions

14. PBMS C API
  14.1 Function Descriptions
  14.2 PBMS Connection Options
  14.3 Function Definitions

Further reading
   README
   Release notes (ChangeLog)
   Installation (INSTALL)
   TO-DO List


1. What is PBMS?

PBMS stands for PrimeBase Media Streaming. The PBMS engine is a MySQL storage engine which enables the storage and streaming of BLOB data in and out of a MySQL database.

MySQL 5.1 provides a pluggable storage engine API which allows 3rd parties to extend the storage capabilities of the server. Storage engines are built as shared libraries and can be loaded at runtime by the server. Most storage engines provide an alternative way of storing the table data. By doing this they add functionality to the server in terms of transactional support, special indexing methods, data warehousing and archiving facilities or even clustering and high-availability capabilities.

To use a particular engine, you specify the name of the engine when creating a table in MySQL. For example:

CREATE TABLE my_test_tab (
  n_id           INTEGER PRIMARY KEY AUTO_INCREMENT,
  n_text         LONGBLOB,
  n_string       VARCHAR(255)
) ENGINE=PBXT;

This example creates a table using the PBXT storage engine.

The PBMS engine specializes in the storage and retrieval of BLOB data. Unlike other storage engines, it does not actually provide table storage. So you would never actually create tables of type PBMS.

Instead, PBMS works together with a standard storage engine, in order to add functionality to MySQL tables. The most basic additional functionality provided by PBMS is:


2. "Streaming-enabled" Engines

A streaming-enabled storage engine is a MySQL storage engine that supports the PBMS server-side streaming API. PBMS blob references stored in the tables of a streaming-enabled engine will automatically increment and decrement BLOB reference counters when BLOB references are inserted, updated, and deleted. Streaming-enabled engines will also automatically notify the PBMS engine when tables containing blob references are altered or dropped.

The PBMS engine can be used with non streaming-enabled engines through the use of a set of User Defined Functions (UDFs) provided by the PBMS engine. These UDFs can be used in triggers to notify the PBMS engine when BLOB references are inserted, updated, or deleted in tables. UDFs are also provided that can be called to notify the PBMS engine when tables containing blob references are altered or dropped.


3. "Streaming-enabled" Tables

A streaming-enabled table is a MySQL table that either belongs to a "Streaming-enabled" engine or has triggers defined on blob referencing columns that increment and decrement BLOB reference counters when BLOB references are inserted, updated, and deleted.

The following is an example of how a streaming-enabled table can be defined for a non streaming-enabled engine:

Create table x.foo(c1 integer, c2 longblob) ENGINE = MYISAM;

create trigger x.foo_insert_trig BEFORE INSERT on x.foo for each row BEGIN set NEW.c2 = pbms_insert_blob_trig("x", "foo", 2, NEW.c2); END
create trigger x.foo_update_trig BEFORE UPDATE on x.foo for each row BEGIN set NEW.c2 = pbms_update_blob_trig("x", "foo", 2, OLD.c2, NEW.c2); END
create trigger x.foo_delete_trig BEFORE DELETE on x.foo for each row BEGIN declare dummy integer; set dummy = pbms_delete_blob_trig("x", "foo", 2, OLD.c2); END

Streaming-enabled tables can store BLOB data in the PBMS BLOB repository where reference counts to the data are maintained so that the BLOB will automatically be deleted ounce the reference count drops to zero.

Currently, the only engine that is streaming-enabled is PBXT, version 0.9.88 later.

The most recent version of both engines are provided on this web-site. You can download the source code and compile the engines yourself or you can use one of the binaries that are provided. However, binaries are only available for a very limited number of platforms.


3.1 BLOB Reference columns

A BLOB reference column is a column in a streaming-enabled table which contains PBMS BLOB references and notifies PBMS when a reference in the column is inserted, updated or deleted. For tables created by stream-enabled engines, which columns in a table are treated as BLOB reference columns is engine dependent, currently in PBXT longblob columns are assumed to be BLOB reference columns. For tables created by non stream-enabled engines BLOB reference columns are the columns for which triggers are defined which call the PBMS UDFs to notify PBMS of inserts, updates, and deletes.

The PBMS client API pbms_init_table() and pbms_reset_table_blobs() functions assume that longblob columns are intended to be BLOB reference columns and includes them in the trigger definitions that they create.


4. Building and Installing PBMS

Please refer to the INSTALL document in the source code distribution for details of how to build and install the PBMS plug-in.


5. Accessing the database using BLOB References

PBMS uses the concept of a BLOB reference. A BLOB reference is, in fact, a Uniform Resource Locator (URL), similar to the Web address of an internet page.

There are currently 4 types of BLOB references:

5.1. A BLOB Repository Reference

This type of URL identifies a BLOB in the PBMS BLOB repository, and has two forms:

http://host-name:pbms-port//~*database-name/~repository-id

or

http://host-name:pbms-port//~*database-name/_repository-id

For example: http://localhost:8080/~*test/~1-150-660294d3-0*123 or http://localhost:8080/~*test/_1-246-57d672ac-0*123

The prefix '~*' helps to identify a BLOB repository URL. This is used by BLOB enabled engines to identify BLOB URLs when inserted into the database. The prefix is also used by the JDBC driver and other client software to recognize BLOB URLs.

BLOB repository reference with a '~' character before the repository-id are references that are already associated with a database table. BLOB repository reference with a '_' character before the repository-id are references that are not associated with a particular table, and are the result of uploading a BLOB using a BLOB database reference (see below).

The  repository-id is a value generated by the PBMS BLOB repository. It has the following form: table_id-blob_id-access_code-server_id*size for references that are associated with a database table, and repo_id-repo_offset-access_code-server_id*size for references that are not associated with a database table.

The access code (access_code) is a unique 4-byte value generated by PBMS whenever a BLOB is uploaded into the BLOB repository. This code is required to retrieve the BLOB making it very unlikely for anyone to retrieve the BLOB without the correct URL.

The 'size' is the size of the actual BLOB in bytes.

5.2. A BLOB Table Reference

This URL identifies a table in the database, and has the following form:

http://host-name:pbms-port//database-name/table-name

BLOB table references are used to upload data to a particular table in the BLOB repository. For example:

$ curl -d "This is an example text" http://localhost:8080/test/my_test_tab

The result returned by this command will be a partial BLOB repository reference, of the form:

~*database-name/~repository-id

In order to retrieve the value from the repository, add the host name and PBMS port to the URL, for example:

$ curl "http://localhost:8080/~*test/~1-150-660294d3-0*23"

5.3. A BLOB Database Reference

This type of URL identifies a database repository, and has the following form:

http://host-name:pbms-port//database-name

BLOB database references are used to upload data to a BLOB repository without specifying a table. For example:

$ curl -d "This is an example text" http://localhost:8080/test

The result returned by this command will be a partial BLOB repository reference, of the form:

~*database-name/_repository-id

In order to retrieve the value from the repository, add the host name and PBMS port to the URL, for example:

$ curl "http://localhost:8080/~*test/_1-246-57d672ac-0*123"

BLOB  table references are more efficient than BLOB database references and should be used when ever the targeted table name is known.

5.4. A BLOB Alias Reference

This URL identifies a particular BLOB in the database by it's alias:

http://host-name:pbms-port//database-name/alias

The alias is set by the user when the BLOB is uploaded to the PBMS engine by including the http header 'PBMS_BLOB_ALIAS' and setting it to the BLOB alias you want to use, for example:

$ curl -H "PBMS_BLOB_ALIAS:MyBLOB" -d "This is an example text" http://localhost:8080/test/my_test_tab

The BLOB can then be retreived using a BLOB alias reference:

$ curl "http://localhost:8080/test/MyBLOB"

A BLOB alias is a special type of BLOB metadata and must be unique with in the database.

6. Storing BLOB References in Tables

As explained above, BLOB data can be uploaded to the BLOB repository using a BLOB database or table references. The upload is done using any standard HTTP client that supports the HTTP PUT or POST method or the client API mybs_put_data() function.

For example, using curl I can upload BLOB data as follows:

$ curl -d "This is an example text" http://localhost:8080/test

The result will be a partial BLOB repository reference, for example: ~*test/_1-246-57d672ac-0*23.

Now to store this BLOB in a database table, you just insert the BLOB reference. For example:

mysql> INSERT my_test_tab (n_text) VALUES ("~*test/_1-246-57d672ac-0*23");

When the streaming-enable table receives this insert, it will call the PBMS engine in order to "retain" the BLOB repository reference in it's BLOB reference columns. Retain basically means that PBMS increments the reference count for the BLOB. PBMS keeps a list of references to each BLOB so that it knows when a BLOB can be freed in order to reclaim resources.

Note that BLOB references should only be inserted into BLOB reference columns in streaming-enabled tables. Only in this case will PBMS be called to retain the BLOB

The following diagram illustrates the steps that are taken to insert a BLOB into the database.

In step (1) the client software uploads the BLOB to the repository using an HTTP PUT request method. On reply, the repository returns a BLOB repository reference in step (2). Step (3), the reference is placed in the row to be inserted in the database. Using a standard SQL INSERT, the client inserts the row into the database table in step (4). In step (5), the streaming enabled table recognizes that BLOB URL has been inserted and automatically calls the BLOB Streaming engine to retain the BLOB.


7. Retrieving BLOBs from Tables

Once a BLOB reference has been stored in a table, a standard SELECT will return the BLOB reference, not the actual BLOB data.

For example:

mysql> select n_id, n_text from my_test_tab where n_id=5;
+------+-----------------------------------+
| n_id | n_text                      |
+------+-----------------------------------+
|    5 | ~*test/~2-128-57d672ac-0*23 |
+------+-----------------------------------+

The BLOB reference can then be used to retrieve the BLOB data using the HTTP GET method, for example:

$ curl "http://localhost:8080/~*test/~2-128-57d672ac-0*23"
This is an example text

The following diagram illustrates the process of downloading a BLOB from the repository:

In step (1) the row containing a BLOB repository reference is retrieved from the database using a standard SQL SELECT statement. In step (2) the client software recognizes that a valid BLOB URL is in the database field. With this information it creates a complete URL and downloads the BLOB data from the BLOB repository in step (3), using a standard HTTP GET request method. In step (4) the BLOB data is streamed back to the client application.


8. PBMS HTTP Headers

HTTP headers can be used to send and receive aditional information about the BLOB. They can also be used to control the actions taken by the PBMS engine.

8.1. Metadata Headers

Metadata headers are headers that allow additional information to be stored in the repository with the BLOB. A metadata header can be a user defined or a standard HTTP header such as "Content-Type". The system table "pbms_metadata_headers" contains the metadata headers recognized by the PBMS engine, each database has it's own "pbms_metadata_headers" table. The user can insert into this table headers that they wish to be recognized as metadata headers. When the PBMS HTTP server recieves a BLOB any HTTP headers in the 'HTTP Put' request that are also in the "pbms_metadata_headers" table will be stored with the BLOB as metadata.

On retreival all metadata headers along with their values will be added to the HTTP reply.

Example, assume that the string "About_the_blob" exists in the "pbms_metadata_headers" table. The metadata header, along with the BLOB alias could be sent with the BLOB to the PBMS engine as:

$ curl -H "PBMS_BLOB_ALIAS:MyBLOB" -H "About_the_blob:An example" -d "This is an example text" http://localhost:8080/test

and when fetched back by:

$ curl -D - "http://localhost:8080/test/MyBLOB"

The result would be:

HTTP/1.1 200 OK
PBMS_CHECKSUM: C36B9D679BF48385082118FC61C7E709
PBMS_BLOB_SIZE: 23
PBMS_LAST_ACCESS: 1236370182
PBMS_ACCESS_COUNT: 0
PBMS_CREATION_TIME: 1236370182
PBMS_BLOB_ALIAS: MyBLOB
About_the_blob: An example
Content-Length: 23

This is an example text

8.2. BLOB Alias Header

The "PBMS_BLOB_ALIAS" header is a special metadata header that enables you to associate a string with the BLOB being uploaded which can then be used later to download it again. The alias can be any string upto 1024 bytes long that conforms to the rules for a valid URL and doesn't start with a '/' character.

8.3. Reply Headers

In addition to the metadata headers the following headers are also returned:

8.4. Control Headers

Control headers are headers which are sent to the PBMS engine and have special meaning. At this time there is only one control header "PBMS_RETURN_HEADER_ONLY" which signals the PBMS engine to return the header and the BLOB or the header only. Example:

$ curl -H "PBMS_RETURN_HEADER_ONLY:yes" -D - "http://localhost:8080/test/MyBLOB"

The result would be:

HTTP/1.1 200 OK
PBMS_CHECKSUM: C36B9D679BF48385082118FC61C7E709
PBMS_BLOB_SIZE: 23
PBMS_LAST_ACCESS: 1236370182
PBMS_ACCESS_COUNT: 0
PBMS_CREATION_TIME: 1236370182
PBMS_BLOB_ALIAS: MyBLOB
About_the_blob: An example
Content-Length: 0

In the future other control headers are antipated to return general information about the repository or when sent with a BLOB upload to indicate how the BLOB is to be stored.

9. The BLOB Repository

The PBMS BLOB repository contains BLOBs uploaded using the HTTP PUT or GET method, for example:

$ curl -d "This is an example text" http://localhost:8080/test
~*test/_1-246-57d672ac-0*23

This operation returns a reference to the BLOB in the repository. The BLOB can be retrieved from the repository using this URL, and the HTTP GET method. For example:

$ curl "http://localhost:8080/~*test/_1-246-57d672ac-0*23"
This is an example text

9.1. The Repository File Structure

Each database has its own BLOB repository. This means that if a BLOB reference is inserted into a table that belongs to another database, the BLOB will be copied over to the repository of the database containing the table. The repository doesn't reside in the database folder itself but in a folder with in the pbms folder with the same name as the database folder. The pbms folder resides in the MySQL databases folder along side the databases. When doing a 'show databases' it will apear as a database although it is not infact a database.

A repository consists of a number of repository files. The repository files have names of the form: repo-file_id.bs, for example repo-241.bs. file_id is the ID of the repository file.

Each repository file can contain a number of BLOBs. The number of BLOBs in a repository file depends on the size of the BLOBs and the current setting of the "repository threshold". The repository threshold is determined by setting the pbms_repository_threshold system variable. This is a value in bytes unless you specify: Kb, MB or GB after the value (note: the parameter must be set as a string value).

BLOBs are written sequentially to the repository files. Once a file reaches the threshold, a new repository files is created.

PBMS writes multiple repository files simultaneously. Each BLOB that is being uploaded is written to a different repository file in parallel.

Repository files may be a maximum of 256 Terabytes in size, and the maximum number of repository files is 16777215.

This provides a theoretically maximum repository size of 4 Zettabytes (2^72 bytes). The maximum size of a single BLOB is also 256 Terabytes.

9.2. Temporary BLOB Timeout

Once a BLOB has been uploaded, PBMS sets a timer. When the timer expires, PBMS deletes the BLOB automatically, unless the BLOB has been retained in the meantime by the act of inserting it into a streaming-enabled table.

This also applies to BLOBs that are deleted or no longer referenced due to a database UPDATE.

The amount of time waited is determined by the pbms_temp_blob_timeout system parameter. This is a value in seconds. By default the value is set to 10 minutes.

The information about which BLOBs are to be deleted and when, is stored in the temporary log files. These files are stored in the bs-logs sub-directory of each database. The temporary log files have names of the form temp-log_id.bs. log_id is the temporary log file ID.

Once a log is full, a new one is created with the next highest ID. The maximum size of a temporary log file is determined by the pbms_temp_log_threshold system variable, which is a value in bytes, unless one of the following is specified: Kb, MB or GB after the value (note: the parameter must be set as a string value).

The maximum temporary log size is 4GB.

9.3 Automatic Compaction

When BLOBs are deleted from the repository files, the free space is not immediately re-used. Instead, the system records the amount of free space in the repository file. This is also know as the garbage level of the file, and is measured as a percentage of the size of the file.

When the garbage level of a repository file exceeds a certain threshold PBMS begins to compact the repository file. It does this by creating a new repository file and copying all valid data to the new file. After this is done, it deletes the old repository file.

The garbage threshold is determined by the pbms_garbage_threshold system variable. Set this variable to a value between 1 and 100. The default value is 50, which means that a repository file is compacted as soon as it contains 50% garbage or more.

Note that compaction may not begin immediately. This is due to the fact that some repository files may be locked. A repository file is locked when it is being used to receive uploaded BLOB data. In this case, the file will only be unlocked when it has reached the repository threshold (pbms_repository_threshold).


10. The System Tables

PBMS has five system tables which provide a view of the BLOB repository. The tables are called pbms_repository , pbms_blob, pbms_metadata, pbms_reference, and pbms_metadata_header.

The pbms_repository, pbms_blob and pbms_metadata tables can be thought of as views into the PBMS BLOB repository files. A reord in the pbms_repository table and it's related records in the pbms_blob and pbms_metadata constitute 1 phsical block of data in a BLOB repository file. The pbms_reference table contains a record for each reference to a BLOB including references from temporary log files.

The system tables are created automatically the first time they are referenced but to clarify their structure a create statement will be given for each table.

10.1 The pbms_repository Table

CREATE TABLE pbms_repository (
Blob_alias CHAR(1024) COMMENT "User supplied BLOB identifier",
Repository_id INT COMMENT 'The repository file number',
Repo_blob_offset BIGINT COMMENT 'The offset of the BLOB in the repository file',
Blob_size BIGINT COMMENT 'The size of the BLOB in bytes',
MD5_Checksum CHAR(32) COMMENT "The MD5 Digest of the BLOB data",
Head_size SMALLINT UNSIGNED COMMENT 'The size of the BLOB header - proceeds the BLOB data',
Access_code INT COMMENT 'The 4-byte authorisation code required to access the BLOB - part of the BLOB URL',
Creation_time TIMESTAMP COMMENT 'The time the BLOB was created',
Last_ref_time TIMESTAMP COMMENT 'The last time the BLOB was referenced',
Last_access_time TIMESTAMP COMMENT 'The last time the BLOB was accessed (read)',
Access_count INT COMMENT 'The count of the number of times the BLOB has been read'
) ENGINE=PBMS;

As mentioned above, this table contains one row for every BLOB in the repository. The BLOB is identified by the repository file ID (Repository_id column), and the offset into the file (Repo_blob_offset column). The Blob_alias, when supplied on upload, can also be used as a unique identifier of the BLOB with in the database. This table is read only.

Blob_size is the uncompressed size of the BLOB in bytes. Head_size is the size of the BLOB header which proceeds the BLOB data.

MD5_Checksum is the hex representation of the MD5 digest of the BLOB data.

The Access_code column contains the BLOBs unique 4-byte access code.

Creation_time is the time the BLOB was created. Last_ref_time is updated whenever the BLOB is referenced and Last_access_time is updated each time the BLOB is downloaded.

The Content_type field contains the HTTP content type of the data that was specified when the BLOB was uploaded. It is returned in the HTTP header when the BLOB is downloaded.

10.2. The pbms_blob Table

CREATE TABLE pbms_blob (
Repository_id INT COMMENT 'The repository file number',
Repo_blob_offset BIGINT COMMENT 'The offset of the BLOB in the repository file',
Blob_data LONGBLOB COMMENT 'The data of the BLOB',
) ENGINE=PBMS;

This table contains the actual BLOB data. This is a read only table.

Repository_id and Repo_blob_offset identify the BLOB in the pbms_repository table.

The Blob_data contains the actual BLOB data of the BLOB. This mean that the data can be retrieved from this table using a normal SQL SELECT statement.

However, BLOBs in the PBMS repository may be larger the 4GB, which is the maximum size of a MySQL LONGBLOB value. Note that other limitations, such as computer memory and the MySQL maximum packet size will also affect the maximum size of BLOBs that can be returned by a SELECT statement.

10.3. The pbms_metadata Table

CREATE TABLE pbms_metadata (
Repository_id INT COMMENT 'The repository file number',
Repo_blob_offset BIGINT COMMENT 'The offset of the BLOB in the repository file',
Name CHAR( 32) COMMENT 'The name of the metadata',
Value CHAR( 1024) COMMENT 'The value of the metadata',
) ENGINE=PBMS;

This table contains the metadata associated with a BLOB including the BLOB alias. This table is updatable.

Repository_id and Repo_blob_offset identify the BLOB in the pbms_repository table.

Name is the name of the metadata value and when combined with the Repository_id and Repo_blob_offset provide the primary key to this table.

You can add, modify, or delete metadata associated with a BLOB by performing INSERTS, UPDATES, or DELTES on this table. This includes assigning aliases to BLOBs.

10.4. The pbms_reference Table

CREATE TABLE pbms_reference (
Table_name CHAR(64) COMMENT 'The name of the referencing table',
Blob_id BIGINT COMMENT 'The BLOB reference number - part of the BLOB URL',
Column_ordinal INTCOMMENT 'The column position in the table starting at position 1',
Blob_url VARCHAR(200) COMMENT 'The BLOB URL for HTTP GET access',
Repository_id INT COMMENT 'The repository file number of the BLOB',
Repo_blob_offset BIGINT COMMENT 'The offset in the repository file',
Blob_size BIGINT COMMENT 'The size of the BLOB in bytes',
Deletion_time TIMESTAMP COMMENT 'The time the BLOB was deleted',
Remove_in INT COMMENT 'The number of seconds before the reference/BLOB is removed perminently',
Temp_log_id INT COMMENT 'Temporary log number of the referencing deletion entry',
Temp_log_offset BIGINT COMMENT 'Temporary log offset of the referencing deletion entry'
) ENGINE=PBMS;

This table contains one row for every reference to every BLOB in the database. The reference is identified by the table name (Table_name column) and the BLOB ID (Blob_id column).

The Column_ordinal identifys the table column that references the BLOB.  This column may be NULL if the BLOB is currently not referenced by any row in the table. In the case of an unreferenced blob, the Remove_in will indicate when the BLOB will be removed if there are no references at all.

Blob_url contains the BLOB repository reference required to retrieve the BLOB.

Repository_id and Repo_blob_offset identify the BLOB in the pbms_repository table.

Blob_size is the uncompressed size of the BLOB in bytes.

Deletion_time is the time the BLOB was deleted. This is the time the BLOB temporary timer was set. The Remove_in column contains the remaining time in seconds, before the reference is deleted. If the reference is removed, and there are not other references to the BLOB, the BLOB will be deleted as well.

Temp_log_id and Temp_log_offset identify where the timer is stored in the temporary log files.

10.5. The pbms_metadata_header Table

CREATE TABLE pbms_metadata_header (
Name CHAR( 32) COMMENT 'The name of the metadata',
) ENGINE=PBMS;

This table contains the HTTP headers that will be treated as metadata names when a BLOB is uploaded. Any header sent with the BLOB that contained in this table will be stored, along with it's value, as metadata in the BLOB repository. This table is updatable.


11. PBMS System Variables

The following system variables are available for controlling the behavior of the PBMS engine.

Name Default Description
pbms_garbage_threshold 50 The percentage of garbage in a repository file before it is compacted.
pbms_port 8080 The port for the PBMS stream-based communications. Setting this value to 0 will disable HTTP communication with the engine.
pbms_repository_threshold 128MB The maximum size of a BLOB repository file. You may use Kb, MB or GB to indicate the unit of the value. A value in bytes is assumed when no unit is specified.
pbms_temp_blob_timeout 600 The timeout, in seconds, for temporary BLOBs. Uploaded BLOB data is removed after this time, unless they are referenced by a record in the database.
pbms_temp_log_threshold 32MB The maximum size of a temporary BLOB log file. You may use Kb, MB or GB to indicate the unit of the value. A value in bytes is assumed when no unit is specified.
pbms_max_keep_alive 10ms The timeout for inactive connection with the keep-alive flag set. After this time the connection will be closed. The time-out is in milliseconds (1/1000).
pbms_metadata_headers "Content-Type" A ':' delimited list of metadata headers to be used to initialize the pbms_metadata_header table when a database is created.

All variables can be set at runtime except pbms_port.

To set variables on the mysqld command line, use the syntax: --variable_name=value, for example:
--pbms_repository_threshold=10GB.

pbms_field_references is a boolean value and can be enabled on the mysqld command line as follows:
--pbms_field_references, and disabled as follows: --skip_pbms_field_references.

On the command line '-' may be used in place of '_', for example: --pbms-garbage-threshold=20.

Variable that can be set at runtime are set using the SET GLOBAL statement, for example:

mysql> set global pbms_temp_log_threshold = "80MB";

Note that pbms_repository_threshold and pbms_temp_log_threshold are string values and so the value assigned must be quoted.


12. MySQL Connector/J Streaming Enabled

The MySQL Connector/J Streaming Enabled is a version of the standard MySQL JDBC driver which has been modified to enable BLOB streaming with the PBMS BLOB streaming engine.

The modified source code is available with the standard PBMS source code distribution. An already built .jar file with the modifications is also available from http://www.blobstreaming.org/download.

12.1. How It Works

The Connector/J SE makes working with the PBMS BLOB repository transparent to the programmer. It hides the fact that a reference is stored in the table field instead of the actual BLOB data.

BLOBs of any length can be stored and retrieved from the database without setting the value of the max_allowed_packet system variable. This is because BLOB data is actually streamed to and from the database. Values inserted using setBinaryStream(), for example, are automatically uploaded to the BLOB repository by the JDBC driver using HTTP PUT, as described in the section: 5. Storing BLOB References in Tables.

The JDBC driver automatically retrieve data stored in the repository when it is selected from the database. Using getBinaryStream(), you can take control of the stream of data returned by the HTTP GET operation as described in section 6. Retrieving BLOBs from Tables.

Note that the BLOB data is only retrieved when it is actually required. Just selecting rows which contains BLOB stored in the repository does not cause the JDBC driver to fetch the data from the repository. This means table rows that contain BLOBs stored in the repository do not influence the speed of the SELECT. Only when the data is accessed, for example using getBytes(), getString(), getBinaryStream(), etc. is the data actually retrieved from the repository.

12.2. Using the JDBC Driver

Enable BLOB streaming by adding the following option to the JDBC URL: enableBlobStreaming=true. For example:

 Class.forName("com.mysql.jdbc.Driver").newInstance();

 Connection conn = null;
 try {
     conn = DriverManager.getConnection("jdbc:mysql://localhost/test"+
                                          "?user=root&enableBlobStreaming=true");
     // Do something with the connection
 }
 catch (Exception e) {
     // Handle the exception
     e.printStackTrace();
 }
 finally {
     if (conn != null)
         conn.close();
 }

When set to true the JDBC driver will upload values set with setBinaryStream(), setAsciiStream() and setBlob() to the PBMS BLOB repository, but only if a valid length is specified. For example, a value inserted using setBinaryStream() with a length of -1 will be not be uploaded to the repository. Instead the value will by inserted directly into the table, as if enableBlobStreaming were set to false.

Here is an example of how to insert a BLOB into a table.

 public void insertBLOB(Connection conn, InputStream in, int len)
                                                 throws SQLException
 {
     PreparedStatement ps = null;

     try {
         ps = conn.prepareStatement("insert my_test_tab(n_text) values(?)");
         ps.setBinaryStream(1, in, len);
         int rows = ps.executeUpdate();
      }
      finally {
          if (ps != null)
              ps.close();
      }
 }

Note that currently only LONGBLOB type columns of PBXT tables may contain streamable BLOB data. This means that when enableBlobStreaming=true you should only use setBinaryStream(), setAsciiStream() and setBlob() with a valid length on columns of type LONGBLOB belonging to a PBXT table.

** WARNING ** The JDBD driver does not check that the database column or table is the correct type . If use setBinaryStream(), setAsciiStream() or setBlob() with a length greater or equal to one on columns that are not of type LONGBLOB, or table type other then PBXT, data will be lost!

To avoid this problem, specify a length of -1 when using setBinaryStream(), setAsciiStream() or setBlob() on columns that are not of type LONGBLOB, or when using a table type other than PBXT.

Selected values of type LONGBLOB are automatically checked for valid BLOB repository references. The reference is then used to automatically download the BLOB data from the BLOB repository. Using a getBinaryStream(), getAsciiStream() or getBlob() you have direct access to the BLOB data stream from the repository.

Here is an example of how to retrieve a BLOB from a table:

 public static InputStream selectBLOB(Connection conn, int id) throws SQLException
 {
     Statement st = null;
     InputStream in = null;
     ResultSet rs;

     try {
         st = conn.createStatement();
         rs = st.executeQuery(
                       "select n_text from pbms_test_tab where n_id = "+id);
         rs.next();
         in = rs.getBinaryStream("n_text");
     }
     finally {
         if (st != null)
             st.close();
     }
     return in;
 }

If you required the length of the BLOB before you begin to retrieve the data, then use getBlob(), instead of rs.getBinaryStream().

12.3. Limitations

The current implementation is based on JDK 1.5.x, which only allows for lengths in setBinaryStream(), setAsciiStream() and setBlob() of type 'int'. This means that the maximum BLOB size that can currently permitted is 2GB.

As mentioned above, streamable BLOBs can only be stored in LONGBLOB type columns in PBXT tables.

12.4. Building and Testing JDBC

This source code distribution of the PBMS storage engine includes the modified source code of the MySQL Connector/J JDBC driver, in a sub-directory called java. This directory also contains a ready build JDBC driver in the form of a .jar file.

For debugging purposes you can include the source code in your project or build the JDBC driver yourself. To do this you will need Apache Ant version 1.6 or newer (available from http://ant.apache.org/). You will also need the Java JDK-1.4.2 or later.

Enter the following command to compile and create the driver:

$ ant dist

After this you can test the BLOB streaming using the 'runtest.sh' shell script:

$ ./runtest.sh

The test connects to a local MySQL server and creates a PBXT table. The test assumes that you have already installed the PBXT and PBMS storage engines.

13. PBMS UDFs

PBMS provides a set of UDFs that enable tables from non strreaming-enabled engines to be used with the PBMS engine. They should not be used with tables created by streaming-enabled engines. The UDFs can either be used directly in SQL statements sent to the MySQL server or indirectly through PBMS client API calls.

Before they can be used they must be declared. This only needs to be done once on any given MySQL server. You can do this by calling the client API call pbms_init_udfs() or executing the following MySQL statements:

CREATE FUNCTION pbms_insert_blob_trig RETURNS STRING SONAME "libpbms.so";
CREATE FUNCTION pbms_update_blob_trig RETURNS STRING SONAME "libpbms.so";
CREATE FUNCTION pbms_delete_blob_trig RETURNS INTEGER SONAME "libpbms.so";
CREATE FUNCTION pbms_delete_all_blobs_in_table RETURNS INTEGER SONAME "libpbms.so";
CREATE FUNCTION pbms_rename_table_with_blobs RETURNS INTEGER SONAME "libpbms.so";
CREATE FUNCTION pbms_enabled_engine RETURNS INTEGER SONAME "libpbms.so";

13.1 UDF Descriptions

Name
Description
pbms_insert_blob_trig() Notifies PBMS of data being inserted into a BLOB streaming-enabled column.
pbms_update_blob_trig() Notifies PBMS of data in a BLOB streaming-enabled column being updated.
pbms_delete_blob_trig() Notifies PBMS of data in a BLOB streaming-enabled column being deleted.
pbms_delete_all_blobs_in_table() Notifies PBMS to remove all BLOB references for a table.
pbms_rename_table_with_blobs() Notifies PBMS that a streaming-enabled table has been renamed.
pbms_enabled_engine() Test is a specific engine is streaming-enabled.


13.2 UDF Definitions


STRING pbms_insert_blob_trig(STRING database, STRING table, INTEGER col_position, STRING blob_url)

Description

Notifies PBMS of data being inserted into a BLOB streaming-enabled column.

Parameters

Return value


STRING pbms_update_blob_trig(STRING database, STRING table, INTEGER col_position, STRING old_blob_url, STRING new_blob_url)

Description

Notifies PBMS of data in a BLOB streaming-enabled column being updated update.

Parameters

Return value


INTEGER pbms_delete_blob_trig(STRING database, STRING table, INTEGER col_position)

Description

Notifies PBMS of data in a BLOB streaming-enabled column being deleted.

Parameters

Return value


INTEGER pbms_delete_all_blobs_in_table(STRING database, STRING table)

Description

Notifies PBMS to remove all BLOB references for a table. This function should be called after a streaming-enabled table has been dropped.

Parameters

Return value


INTEGER pbms_rename_table_with_blobs(STRING database, STRING old_table, STRING new_table)

Description

Notifies PBMS that a streaming-enabled table has been renamed This function should be called after a streaming-enabled table has been renamed.

Parameters

Return value


INTEGER pbms_enabled_engine(STRING engine)

Description

Test is a specific engine is streaming-enabled.

Parameters

Return value


14. PBMS C API

The C API is provided through the pbmscl library which is installed in the same location as the MySQL libraries when PBMS is installed.

One of the features of the API is that the 'get' and 'put' functions used to transfer the BLOB data between the client and server do so by using callback functions that get called one or more times during the transfer. This enables the client to transfer large BLOBs with out having too buffer the entire BLOB during the transfer.

Example code

An example is included with the PBMS source code under the 'pbmstest' directory. The example demonstrates a couple of different ways in which the API can be used to handle BLOB data.

The test demonstrates inserting and fetching of BLOB data in the form of text entered by the user or files. The insertion and fetching of files is a good example of how BLOB data can be transferred with out having to buffer the entire BLOB. The test also includes examples on how to set and get BLOB metadata including the use of BLOB aliases.

14.1 Function Descriptions

Name
Description
pbms_library_init() Initialize PBMS API. This should be called to initialize the PBMS library before you call any other PBMS library function.
pbms_library_end() Terminate PBMS API use. This should be called to free all PBMS library resources when you are done using the PBMS library.
pbms_errno() Returns the error code for the most recently invoked API function that can succeed or fail.
pbms_error() Returns a null-terminated string containing the error message for the most recently invoked API function that can succeed or fail.
pbms_connect() Establishes a connection to a PBMS BLOB streaming server via a host name and port number.
pbms_mysql_connect() Establishes a connection to a PBMS BLOB streaming server via a MySQL connection handle.
pbms_close() Closes a statement handle or a connection to the PBMS BLOB streaming server.
pbms_set_option() Sets a connection option value for the PBMS connection handle.
pbms_get_option() Gets a connection option value for the PBMS connection handle.
pbms_add_metadata() Adds a metadata header to a PBMS statement.
pbms_clear_metadata() Removes all metadata headers from a statement.
pbms_reset_metadata() Resets the metadata cursor to the start returning the number of metadata values in the statement.
pbms_next_metadata() Gets the next metadata value from the statement's metadata cursor.
pbms_get_metadata_value() Gets a specific metadata value for the BLOB just downloaded on the statement.
pbms_set_md5_digest() Set the MD5 digest for the BLOB about to be uploaded on the statement.
pbms_get_md5_digest()
Get the Md5 digest for the BLOB just downloaded on the statement.
pbms_get_info() Get BLOB headers/metadata only.
pbms_put_data() Sends data to the PBMS BLOB streaming server and receives a BLOB reference after completion.
pbms_get_data() Get data from the PBMS BLOB streaming server.
pbms_is_blob_reference() Checks the format of the BLOB reference and returns true if it looks like a valid reference.
pbms_get_blob_size() Get the size of the BLOB for the specified BLOB reference.
pbms_init_udfs() Declares the PBMS UDFs on the server if they do not already exist.
pbms_enabled() Test is a specific engine is streaming-enabled.
Functions used with tables created by non streaming-enabled engines
pbms_init_table() Converts any longblob columns in the table into streaming-enabled columns by creating PBMS notification triggers on them.
pbms_reset_table_blobs() Recreates PBMS notification triggers on 'longblob' columns in the table.
pbms_drop_table_blobs() Notifies PBMS to remove all BLOB references for a table.
pbms_table_renamed() Notifies PBMS that a streaming-enabled table has been renamed.


14.2 PBMS Connection Options

Option
Parameter type
Default Description
PBMS_OPTION_
KEEP_ALIVE
unsigned int* false A boolean flag to indicate if the keep-alive flag should be set on HTTP requests.
PBMS_OPTION_
HOST
(const char *)* localhost The connection's Blob Streaming server host IP address. (Read Only)
PBMS_OPTION_
PORT
unsigned int* 8080 The connection's Blob Streaming server host port number. (Read Only)
PBMS_OPTION_
TRANSMITION_TIMEOUT
unsigned int* 0 If greater than zero this sets a limit to how long a blob transfer can take.
PBMS_OPTION_
DATABASE
(const char*)* none The database associated with the connection.


14.3 Function Definitions
pbms_bool pbms_library_init(pbms_bool thread_safe)

Description

Initialize PBMS API. This should be called to initialize the PBMS library before you call any other PBMS library function.

Parameters

Return value


void pbms_library_end()

Description

Terminate PBMS API use. This should be called to free all PBMS library resources when you are done using the PBMS library.


int pbms_errno(PBMS pbms)

Description

For the handle specified by pbms, pbms_errno() returns the error code for the most recently invoked API function that can succeed or fail.

Parameters

Return value


const char *pbms_error(PBMS pbms)

Description

For the handle specified by pbms, pbms_error() returns a null-terminated string containing the error message for the most recently invoked API function that can succeed or fail.

Parameters

  Return value
PBMSC pbms_connect(const char* host, unsigned int port, const char *database)

Description

Establishes a connection to a PBMS BLOB streaming server running on host. pbms_connect() must complete successfully before you can execute any other API functions that require a valid PBMS connection handle structure. 

Internal connections to the PBMS server are only opened as needed and are closed after each access unless the PBMS_OPTION_KEEP_ALIVE is set to true. Even then the connection will only remain open for as long as the server's max_keep_alive setting allows.

Parameters

Return value
PBMS pbms_mysql_connect(struct st_mysql *mysql, const char *database)

Description

Establishes a connection to a PBMS BLOB streaming server associated with the MYSQL connection handle. 

Parameters

Return value
void pbms_close(PBMS pbms)

Description

Closes the connection and frees all of it's resources.  

Parameters


pbms_bool pbms_set_option(PBMS pbms, enum pbms_option option, const void *in_value)

Description

Sets a connection option value for the PBMS handle.

Parameters

Return value
pbms_bool pbms_get_option(PBMS pbms, enum pbms_option option, const void *out_value)

Description

Gets a connection option value for the PBMS handle.

Parameters

Return value
pbms_bool pbms_add_metadata(PBMS pbms, const char *name, const char *value)

Description

Adds the metadata to the connection to be associated with the next BLOB uploaded on the connection.

Parameters

Return value


pbms_bool pbms_clear_metadata(PBMS pbms)

Description

Removes all metadata associated with the connection.

Parameters

Return value


int pbms_reset_metadata(PBMS pbms)

Description

Resets the position of the metadata cursor to the start of the metadata for the downloaded BLOB.

Parameters

Return value


pbms_bool pbms_next_metadata(PBMS pbms, char *name, char *value, size_t *v_size)

Description

Fetches the next metadata record.

Parameters

Return value


pbms_bool pbms_get_metadata_value(PBMS pbms, const char *name, char *value, size_t v_size)

Description

Gets a specific metadata value from the connection's metadata cursor.

Parameters

Return value


pbms_bool pbms_set_md5_digest(PBMS pbms, const char *md5_digest)

Description

Specifies the MD5 digest for the BLOB data about to be sent. This is not required but is a good thing to provide if you are concerned about possible corruption of data in transit.

Parameters

Return value


pbms_bool pbms_get_md5_digest(PBMS pbms, char *md5_digest)

Description

Gets the MD5 digest string of the BLOB just downloaded..

Parameters

Return value


typedef char *(* PBMS_CALLBACK_FUNC) (void *caller_data, char *buffer, size_t *buffer_size, size_t size);

Description

The user defined callback function is called when transferring data. The callback will continue to be called until an error occurs or the callback returns NULL.

Parameters

Return value

Example:

See the examples provided for pbms_put_data() and pbms_get_data().


pbms_bool pbms_put_data(PBMS pbms, const char *table, const char *alias, char *ref, size_t total_data_size, char *buffer, size_t data_size, PBMS_CALLBACK_FUNC cb, void *caller_data)

Description

Sends data to the PBMS BLOB streaming server and receives a BLOB reference after completion.

Parameters

Return value

Example:

#define BUFFER_SIZE 100
char *my_put_file_callback(void *caller_data, char *buffer, size_t *buffer_size, size_t size)
{
    int *fh = (int*) caller_data;
    void *data = NULL;

    if (size) {
        data = buffer;
        *buffer_size = read(fh, data, BUFFER_SIZE);
    }
    return data;
}

main ()
{
    PBMS pbms;
    PBMS_BLOB_REF blob_ref;
    char buffer[BUFFER_SIZE];

    pbms_library_init(false);

    pbms = pbms_connect("localhost", 8080, "test");

    char *data = "A test string.";
    size_t size = strlen(info.data) +1;

    // Send a blob directly with out using the callback:
    pbms_put_data(pbms, NULL, "myBlob1", &blob_ref, size, data, size, NULL, NULL);

    printf("Blob reference = \"%s\"\n", (char *) &blob_ref);

    // Send a file using a callback:
    int fh = open("myblob.jpg", O_RDONLY, 0000777);

    data = NULL;
    size = lseek(fh, 0, SEEK_END);
    lseek(fh, 0, SEEK_SET);
    pbms_put_data(pbms, NULL , "MyBlob2", &blob_ref, size, buffer, 0, my_put_file_callback, &fh);

    printf("Blob reference = \"%s\"\n", (char *) &blob_ref);
    close(fh);

    pbms_close(pbms);
    pbms_library_end();
}


pbms_bool pbms_get_data(PBMS pbms, const char *ref, char *buffer, size_t buffer_size, PBMS_CALLBACK_FUNC cb, void *caller_data)

Description

Get data from the PBMS BLOB streaming server for the specified BLOB reference or alias.

Parameters

Return value

Example:

 #define BUFFER_SIZE 100
char *blob_to_file_callback(void *caller_data, char *buffer, size_t *buffer_size, size_t size)
{
i    nt *fh = (int*) caller_data;

     // Write out the last block of data received.
     write(*fh, buffer, *buffer_size);

     if (!size)
        return NULL; // No more date.

     buffer_size = BUFFER_SIZE;
     return buffer;
}

char *my_get_callback(void *caller_data, char *buffer, size_t *buffer_size, size_t size)
{
     MY_INFO *info = (MY_INFO*) caller_data;

     if (!size)
         return NULL;

    info->size = 0;
    return info->data;
}

main ()
{
     PBMS pbms;
     char buffer[BUFFER_SIZE];
     int fh;

     pbms_library_init(false);

    pbms = pbms_connect("localhost", 8080, "test");

     fh = open("myblob.jpg", O_WRONLY | O_CREAT | O_TRUNC, 0000777);

    pbms_get_data(pbms, "MyBlob1", buffer, BUFFER_SIZE, blob_to_file_callback, &fh);

    printf("Blob size = %d"\n", lseek(fh, 0, SEEK_END));
     close(fh);

     pbms_close(pbms);
     pbms_library_end();
}


pbms_bool pbms_get_info(PBMS pbms, const char *ref )

Description

Get metadata and headers from the PBMS BLOB streaming server for the specified BLOB reference or alias.

Parameters

Return value

Example:

main ()
{
   PBMS pbms;
   char buffer[PBMS_META_VALUE_SIZE];
   time_t secs;

   pbms_library_init(false);

   pbms = pbms_connect("localhost", 8080, "test");

   pbms_get_info(pbms, "MyBlob1");

   printf("BLOB Info:\n");
   pbms_get_metadata_value(pbms, MS_ALIAS_TAG, buffer, NULL);
   printf("ALIAS: %s", buffer);

   pbms_get_metadata_value(pbms, MS_BLOB_SIZE, buffer, NULL);
   printf("SIZE: %s", buffer);

   pbms_get_metadata_value(pbms, MS_CHECKSUM_TAG, buffer, NULL);
   printf("CHECKSUM: %s", buffer);

   pbms_get_metadata_value(pbms, MS_ACCESS_COUNT, buffer, NULL);
   printf("ACCESS_COUNT: %s", buffer);

   pbms_get_metadata_value(pbms, MS_CREATION_TIME, buffer, NULL);
   secs = atol(buffer);
   ctime_r(secs, buffer);
   printf("CREATION_TIME: %s", buffer);

   pbms_get_metadata_value(pbms, MS_LAST_ACCESS, buffer, NULL);
   secs = atol(buffer);
   ctime_r(secs, buffer);
   printf("LAST_ACCESS: %s", buffer);

   pbms_close(pbms);
   pbms_library_end();
}


pbms_bool pbms_is_blob_reference(PBMS pbms, const char *ref)

Description

Checks the format of the BLOB reference and returns true if it looks like a valid reference. This doesn't guarantee that the reference references a valid BLOB on the PBMS BLOB streaming server only that the format of the BLOB reference is correct.  

Parameters

Return value
pbms_bool pbms_get_blob_size(PBMS pbms, const char *ref, size_t *size)

Description

Get the size of the BLOB for the specified BLOB reference.

If the reference is a PBMS BLOB URL then the blob size is imbedded in the URL and so the size can be returned directly. But it the ref is not a URL it is assumed to be a BLOB alias and so a request will be sent to the PBMS server to get the information.

Parameters

Return value
pbms_bool pbms_init_udfs(struct st_mysql *mysql)

Description

Declares the PBMS UDFs on the server if they do not already exist.

Parameters

Return value


pbms_bool pbms_enabled(struct st_mysql *mysql, const char *engine, pbms_bool *enabled)

Description

Tests if an engine engine is streaming-enabled.

Parameters

Return value


pbms_bool pbms_init_table(struct st_mysql *mysql, const char *database, const char *table, pbms_bool no_blobs_ok)

Description

Converts any longblob columns in the table into streaming-enabled columns by creating PBMS notification triggers on them.

Parameters

Return value


pbms_bool pbms_reset_table_blobs(struct st_mysql *mysql, const char *database, const char *table)

Description

Similar to .pbms_init_table() except that it is meant to be called if an alter table has been done that may have dropped or added longblob columns or changed the ordinals of longblob columns.

Parameters

Return value


pbms_bool pbms_drop_table_blobs(struct st_mysql *mysql, const char *database, const char *table)

Description

Notifies PBMS to remove all BLOB references for a table. This should be called after a streaming-enabled table has been dropped.

Parameters

Return value


pbms_bool pbms_table_renamed(struct st_mysql *mysql, const char *database, const char *old_table, const char *new_table )

Description

Notifies PBMS that a streaming-enabled table has been renamed.

Parameters

Return value

_________________________________________________________________________________________________
Copyright 2009 by PrimeBase Technologies GmbH