Documention - MyBS v0.5.04
MyBS stands for BLOB Streaming for MySQL. The MyBS engine is a MySQL storage engine which enables the storage and streaming of BLOB data in and out of a MySQL database.
MySQL 5.1 provides a pluggable storage engine API which allows 3rd parties to extend the storage capabilities of the server. Storage engines are built as shared libraries and can be loaded at runtime by the server. Most storage engines provide an alternative way of storing the table data. By doing this they add functionality to the server in terms of transactional support, special indexing methods, data warehousing and archiving facilities or even clustering and high-availability capabilities.
To use a particular engine, you specify the name of the engine when creating a table in MySQL. For example:
CREATE TABLE my_test_tab (
n_id INTEGER PRIMARY KEY AUTO_INCREMENT,
n_text LONGBLOB,
n_string VARCHAR(255)
) ENGINE=PBXT;
This example creates a table using the PBXT storage engine.
The MyBS engine specializes in the storage and retrieval of BLOB data. Unlike other storage engines, it does not actually provide table storage. So you would never actually create tables of type MyBS (the exception being the MyBS system tables - see below).
Instead, MyBS works together with a standard storage engine, in order to add functionality to MySQL tables. The most basic additional functionality provided by MyBS is:
- Direct HTTP stream-based access to database information. For this purpose, the MyBS engine includes a lightweight HTTP server. The port of the server can by setting the mybs_port system variable on the mysqld command line. BLOB data can be uploaded using the HTTP PUT or POST method, and downloaded using the HTTP GET method.
- A repository for BLOB data. If an engine is not suitable for BLOB storage it can use the BLOB repository provided by MyBS. In this case, the engine simply stores a BLOB reference in the table column, instead of the actual BLOB data. MyBS tracks references to the repository in order to know when to free up resources.
2. "Streaming-enabled" Engines
A streaming-enabled storage engine is a MySQL storage engine that supports the MyBS server-side streaming API. Data stored in the tables of a streaming-enabled engine can be uploaded and downloaded via the MyBS HTTP server capabilities. The tables of a streaming-enabled engine can also store BLOB data in the MyBS BLOB repository.
Currently, the only engine that is streaming-enabled is PBXT, version 0.9.88 later. This means that in order to use MyBS you must install both the MyBS and the PBXT storage engines.
The most recent version of both engines are provided on this web-site. You can download the source code and compile the engines yourself or you can use one of the binaries that are provided. However, binaries are only available for a very limited number of platforms.
3. Building and Installing MyBS
Please refer to the INSTALL document in the source code distribution for details of how to build and install the MyBS plug. NOTE: Building and installing the PBXT storage engine is similar.
4. Accessing the database using BLOB References
MyBS uses the concept of a BLOB reference. A BLOB reference is, in fact, a Uniform Resource Locator (URL), similar to the Web address of an internet page.
There are currently 3 types of BLOB references:
This URL identifies a particular field in the database, and has the form:
http://host-name:mybs-port//database-name/table-name/column-name/row-condition
Where row-condition has the following format: col1=val1&col2=val2&...
BLOB field references can be used to download data from a particular database field using an HTTP GET command. For example, assuming we inserted the following row in the example table from above:
mysql> INSERT my_test_tab (n_id, n_string) VALUES (1, "This is an example string");
To download the 'n_string' field you could do the following:
$ curl http://localhost:8080/test/my_test_tab/n_string/n_id=1
The result returned by this command will be "This is an example string".
Currently, only download of data is supported using BLOB field references. In addition, repository BLOB data cannot yet be downloaded using a BLOB field reference. Further development of this access method will depend on the implementation of adequate security.
NOTE: For security reasons, BLOB field references are not enabled by default. To enable BLOB field reference set the mybs_field_references system variable to TRUE. This can be done on the mysqld command line using --mybs-field-referecnes, or when the server is running as follows:
mysql> set global mybs_field_references=TRUE;
** SECURITY WARNING - MyBS v0.5.02 or EARLIER ** Do not expose the MyBS HTTP port to a non-secure network, since the access to streaming-enabled tables using a BLOB field reference bypasses standard SQL-based security.
4.2. A BLOB Repository Reference
This type of URL identifies a BLOB in the MyBS BLOB repository, and has two forms:
http://host-name:mybs-port//~*database-name/~repository-id
or
http://host-name:mybs-port//~*database-name/_repository-id
For example: http://localhost:8080/~*test/~1-150-660294d3-0 or http://localhost:8080/~*test/_1-246-57d672ac-0
The prefix '~*' helps to identify a BLOB repository URL. This is used by BLOB enabled engines to identify BLOB URLs when inserted into the database. The prefix is also used by the JDBC driver and other client software to recognize BLOB URLs.
BLOB repository reference with a '~' character before the repository-id are references that are already associated with a database table. BLOB repository reference with a '_' character before the repository-id are references that are not associated with a particular table, and are the result of uploading a BLOB using a BLOB database reference (see below).
The repository-id is a value generated by the MyBS BLOB repository. It has the following form: table_id-blob_id-access_code-server_id for references that are associated with a database table, and repo_id-repo_offset-access_code-server_id for references that are not associated with a database table.
The access code (access_code) is a unique 4-byte value generated by MyBS whenever a BLOB is uploaded into the BLOB repository. This code is required to retrieve the BLOB making it very unlikely for anyone to retrieve the BLOB without the correct URL.
This URL identifies a table in the database, and has the following form:
http://host-name:mybs-port//database-name/table-name
BLOB table references are used to upload data to a particular table in the BLOB repository. For example:
$ curl -d "This is an example text" http://localhost:8080/test/my_test_tab
The result returned by this command will be a partial BLOB repository reference, of the form:
~*database-name/~repository-id
In order to retrieve the value from the repository, add the host name and MyBS port to the URL, for example:
$ curl "http://localhost:8080/~*test/~1-150-660294d3-0"
4.4. A BLOB Database Reference
This type of URL identifies a database repository, and has the following form:
http://host-name:mybs-port//database-name
BLOB database references are used to upload data to a BLOB repository without specifying a table. For example:
$ curl -d "This is an example text" http://localhost:8080/test
The result returned by this command will be a partial BLOB repository reference, of the form:
~*database-name/_repository-id
In order to retrieve the value from the repository, add the host name and MyBS port to the URL, for example:
$ curl "http://localhost:8080/~*test/_1-246-57d672ac-0"
5. Storing BLOB References in Tables
As explained above, BLOB data can be uploaded to the BLOB repository using a BLOB database or table references. The upload is done using any standard HTTP client that supports the HTTP PUT or POST method.
For example, using curl I can upload BLOB data as follows:
$ curl -d "This is an example text" http://localhost:8080/test
The result will be a partial BLOB repository reference, for example: ~*test/_1-246-57d672ac-0.
Now to store this BLOB in a database table, you just insert the BLOB reference. For example:
mysql> INSERT my_test_tab (n_text) VALUES ("~*test/_1-246-57d672ac-0");
When the streaming-enable storage engine receives this insert, it will call the MyBS engine in order to "retain" the BLOB repository reference. Retain basically means that MyBS increments the reference count for the BLOB. MyBS keeps a list of references to each BLOB so that it knows when a BLOB can be freed in order to reclaim resources.
Note that BLOB references should currently only be inserted in PBXT table fields of type LONGBLOB. Only in the case of LONGBLOB fields will PBXT call MyBS to retain the BLOB
The following diagram illustrates the steps that are taken to insert a BLOB into the database.
In step (1) the client software uploads the BLOB to the repository using an HTTP PUT request method. On reply, the repository returns a BLOB repository reference in step (2). Steop (3), the reference is placed in the row to be inserted in the database. Using a standard SQL INSERT, the client inserts the row into the database table in step (4). In step (5), the streaming enabled engine recognizes that BLOB URL has been inserted and automatically calls the BLOB Streaming engine to retain the BLOB.
6. Retrieving BLOBs from Tables
Once a BLOB reference has been stored in a table, a standard SELECT will return the BLOB reference, not the actual BLOB data.
For example:
mysql> select n_id, n_text from my_test_tab where n_id=5;
+------+--------------------------+
| n_id | n_text |
+------+--------------------------+
| 5 | ~*test/~2-128-57d672ac-0 |
+------+--------------------------+
The BLOB reference can then be used to retrieve the BLOB data using the HTTP GET method, for example:
$ curl "http://localhost:8080/~*test/~2-128-57d672ac-0"
This is an example text
The following diagram illustrates the process of downloading a BLOB from the repository:

In step (1) the row containing a BLOB repository reference is retrieved from the database using a standard SQL SELECT statement. In step (2) the client software recognizes that a valid BLOB URL is in the database field. With this information it creates a complete URL and downloads the BLOB data from the BLOB repository in step (3), using a standard HTTP GET request method. In step (4) the BLOB data is streamed back to the client application.
The MyBS BLOB repository contains BLOBs uploaded using the HTTP PUT or GET method, for example:
$ curl -d "This is an example text" http://localhost:8080/test
~*test/_1-246-57d672ac-0
This operation returns a reference to the BLOB in the repository. BLOB can be retrieved from the repository using this URL, and the HTTP GET method. For example:
$ curl "http://localhost:8080/~*test/_1-246-57d672ac-0"
This is an example text
7.1. The Repository File Structure
Each database has its own BLOB repository. This means that if a BLOB reference is inserted into a table that belongs to another database, the BLOB will be copied over to the repository of the database containing the table.
A repository consists of a number of repository files. These files are stored in a sub-directory called bs-repository in the MySQL database directory. The repository files have names of the form: repo-file_id.bs, for example repo-241.bs. file_id is the ID of the repository file.
Each repository file can contain a number of BLOBs. The number of BLOBs in a repository file depends on the size of the BLOBs and the current setting of the "repository threshold". The repository threshold is determined by setting the mybs_repository_threshold system variable. This is a value in bytes unless you specify: Kb, MB or GB after the value (note: the parameter must be set as a string value).
BLOBs are written sequentially to the repository files. Once a file reaches the threshold, a new repository files is created.
MyBS writes multiple repository files simultaneously. Each BLOB that is being uploaded is written to a different repository file in parallel.
Repository files may be a maximum of 256 Terabytes in size, and the maximum number of repository files is 16777215.
This provides a theoretically maximum repository size of 4 Zettabytes (2^72 bytes). The maximum size of a single BLOB is also 256 Terabytes.
Once a BLOB has been uploaded, MyBS sets a timer. When the timer expires, MyBS deletes the BLOB automatically, unless the BLOB has been retained in the meantime by a storage engine.
This also applies to BLOBs that are deleted or no longer referenced due to a database UPDATE.
The amount of time waited is determined by the mybs_temp_blob_timeout system parameter. This is a value in seconds. By default the value is set to 10 minutes.
The information about which BLOBs are to be deleted and when, is stored in the temporary log files. These files are stored in the bs-logs sub-directory of each database. The temporary log files have names of the form temp-log_id.bs. log_id is the temporary log file ID.
Once a log is full, a new one is created with the next highest ID. The maximum size of a temporary log file is determined by the mybs_temp_log_threshold system variable, which is a value in bytes, unless one of the following is specified: Kb, MB or GB after the value (note: the parameter must be set as a string value).
The maximum temporary log size is 4GB.
When BLOBs are deleted from the repository files, the free space is not immediately re-used. Instead, the system records the amount of free space in the repository file. This is also know as the garbage level of the file, and is measured as a percentage of the size of the file.
When the garbage level of a repository file exceeds a certain threshold MyBS begins to compact the repository file. It does this by creating a new repository file and copying all valid data to the new file. After this is done, it deletes the old repository file.
The garbage threshold is determined by the mybs_garbage_threshold system variable. Set this variable to a value between 1 and 100. The default value is 50, which means that a repository file is compacted as soon as it contains 50% garbage or more.
Note that compaction may not begin immediately. This is due to the fact that some repository files may be locked. A repository file is locked when it is being used to receive uploaded BLOB data. In this case, the file will only be unlocked when it has reached the repository threshold (mybs_repository_threshold).
MyBS has two system tables which provide a view of the BLOB repository. The tables are called mybs_repository and mybs_reference.

The mybs_repository table contains all the BLOBs in the repository. The mybs_reference table contains a row for each reference to a BLOB. References from the temporary log files are also counted.
8.1. Creating the System Tables
The system tables are currently not created automatically. If you wish to use the system tables, then you must create them in each database that contains a BLOB repository. Create the tables as follows:
CREATE TABLE mybs_repository (
Repository_id INT COMMENT 'The repository file number',
Repo_blob_offset BIGINT COMMENT 'The offset of the BLOB in the repository file',
Blob_size BIGINT COMMENT 'The size of the BLOB in bytes',
Head_size SMALLINT UNSIGNED COMMENT 'The size of the BLOB header - proceeds the BLOB data',
Access_code INT COMMENT 'The 4-byte authorisation code required to access the BLOB - part of the BLOB URL',
Creation_time TIMESTAMP COMMENT 'The time the BLOB was created',
Last_ref_time TIMESTAMP COMMENT 'The last time the BLOB was referenced',
Last_access_time TIMESTAMP COMMENT 'The last time the BLOB was accessed (read)',
Content_type CHAR(128) COMMENT 'The content type of the BLOB - returned by HTTP GET calls',
Blob_data LONGBLOB COMMENT 'The data of this BLOB'
) ENGINE=MyBS;
CREATE TABLE mybs_reference (
Table_name CHAR(64) COMMENT 'The name of the referencing table',
Blob_id BIGINT COMMENT 'The BLOB reference number - part of the BLOB URL',
Column_name CHAR(64) COMMENT 'The column name of the referencing field',
Row_condition VARCHAR(255) COMMENT 'This condition identifies the row in the table',
Blob_url VARCHAR(200) COMMENT 'The BLOB URL for HTTP GET access',
Repository_id INT COMMENT 'The repository file number of the BLOB',
Repo_blob_offset BIGINT COMMENT 'The offset in the repository file',
Blob_size BIGINT COMMENT 'The size of the BLOB in bytes',
Deletion_time TIMESTAMP COMMENT 'The time the BLOB was deleted',
Remove_in INT COMMENT 'The number of seconds before the reference/BLOB is removed perminently',
Temp_log_id INT COMMENT 'Temporary log number of the referencing deletion entry',
Temp_log_offset BIGINT COMMENT 'Temporary log offset of the referencing deletion entry'
) ENGINE=MyBS;
As mentioned above, this table contains one row for every BLOB in the repository. The BLOB is identified by the repository file ID (Repository_id column), and the offset into the file (Repo_blob_offset column).
Blob_size is the uncompressed size of the BLOB in bytes. Head_size is the size of the BLOB header which proceeds the BLOB data.
The Access_code column contains the BLOBs unique 4-byte access code.
Creation_time is the time the BLOB was created. Last_ref_time is updated whenever the BLOB is referenced and Last_access_time is updated each time the BLOB is downloaded.
The Content_type field contains the HTTP content type of the data that was specified when the BLOB was uploaded. It is returned in the HTTP header when the BLOB is downloaded.
The Blob_data contains the actual BLOB data of the BLOB. This mean that the data can be retrieved from this table using a normal SQL SELECT statement.
However, BLOBs in the MyBS repository may be larger the 4GB, which is the maximum size of a MySQL LONGBLOB value. Note that other limitations, such as computer memory and the MySQL maximum packet size will also affect the maximum size of BLOBs that can be returned by a SELECT statement.
This table contains one row for every reference to every BLOB in the database. The reference is identified by the table name (Table_name column) and the BLOB ID (Blob_id column).
The Column_name and Row_condition identify the database field that references the BLOB. Row_condition contains a string of the form col1=val1&col2=val2&...
These columns may be NULL if the BLOB is currently not referenced by any row in the table. In this case, the Remove_in will indicate when the BLOB will be removed if there are no references at all.
Blob_url contains the BLOB repository reference required to retrieve the BLOB.
Repository_id and Repo_blob_offset identify the BLOB in the mybs_repository table.
Blob_size is the uncompressed size of the BLOB in bytes.
Deletion_time is the time the BLOB was deleted. This is the time the BLOB temporary timer was set. The Remove_in column contains the remaining time in seconds, before the reference is deleted. If the reference is removed, and there are not other references to the BLOB, the BLOB will be deleted as well.
Temp_log_id and Temp_log_offset identify where the timer is stored in the temporary log files.
The following system variables are available for controlling the behavior of the MyBS engine.
| Name | Default | Description |
| mybs_field_references | OFF | Enable BLOB field references. Field reference allow table columns to be accessed directly through the BLOB streaming interface. |
| mybs_garbage_threshold | 50 | The percentage of garbage in a repository file before it is compacted. |
| mybs_port | 8080 | The port for the MyBS stream-based communications. |
| mybs_repository_threshold | 128MB | The maximum size of a BLOB repository file. You may use Kb, MB or GB to indicate the unit of the value. A value in bytes is assumed when no unit is specified. |
| mybs_temp_blob_timeout | 600 | The timeout, in seconds, for temporary BLOBs. Uploaded blob data is removed after this time, unless committed to the database. |
| mybs_temp_log_threshold | 32MB | The maximum size of a temorary BLOB log file. You may use Kb, MB or GB to indicate the unit of the value. A value in bytes is assumed when no unit is specified. |
All variables can be set at runtime except mybs_port.
To set variables on the mysqld command line, use the syntax: --variable_name=value, for example:
--mybs_repository_threshold=10GB.
mybs_field_references is a boolean value and can be enabled on the mysqld command line as follows:
--mybs_field_references, and disabled as follows: --skip_mybs_field_references.
On the command line '-' may be used in place of '_', for example: --mybs-garbage-threshold=20.
Variable that can be set at runtime are set using the SET GLOBAL statement, for example:
mysql> set global mybs_temp_log_threshold = "80MB";
Note that mybs_repository_threshold and mybs_temp_log_threshold are string values and so the value assigned must be quoted. To set the value of mybs_field_references which is a boolean value you can use the constant values TRUE, FALSE or ON, OFF. For example:
mysql> set global mybs_field_references = ON;
10. MySQL Connector/J Streaming Enabled
The MySQL Connector/J Streaming Enabled is a version of the standard MySQL JDBC driver which has been modified to enable BLOB streaming with the MyBS BLOB streaming engine.
The modified source code is available with the standard MyBS source code distribition. An already built .jar file with the modifications is also available from http://www.blobstreaming.org/download.
The Connector/J SE makes working with the MyBS BLOB repository transparent to the programmer. It hides the fact that a reference is stored in the table field instead of the actual BLOB data.
BLOBs of any length can be stored and retrieved from the database without setting the value of the max_allowed_packet system variable. This is because BLOB data is actually streamed to and from the database. Values inserted using setBinaryStream(), for example, are automatically uploaded to the BLOB repository by the JDBC driver using HTTP PUT, as described in the section: 5. Storing BLOB References in Tables.
The JDBC driver automatically retrieve data stored in the repository when it is selected from the database. Using getBinaryStream(), you can take control of the stream of data returned by the HTTP GET operation as described in section 6. Retrieving BLOBs from Tables.
Note that the BLOB data is only retrieved when it is actually required. Just selecting rows which contains BLOB stored in the repository does not cause the JDBC driver to fetch the data from the repository. This means table rows that contain BLOBs stored in the repository do not influence the speed of the SELECT. Only when the data is accessed, for example using getBytes(), getString(), getBinaryStream(), etc. is the data actually retrieved from the repository.
Enable BLOB streaming by adding the following option to the JDBC URL: enableBlobStreaming=true. For example:
Class.forName("com.mysql.jdbc.Driver").newInstance();
Connection conn = null;
try {
conn = DriverManager.getConnection("jdbc:mysql://localhost/test"+
"?user=root&enableBlobStreaming=true");
// Do something with the connection
}
catch (Exception e) {
// Handle the exception
e.printStackTrace();
}
finally {
if (conn != null)
conn.close();
}
When set to true the JDBC driver will upload values set with setBinaryStream(), setAsciiStream() and setBlob() to the MyBS BLOB repository, but only if a valid length is specified. For example, a value inserted using setBinaryStream() with a length of -1 will be not be uploaded to the repository. Instead the value will by inserted directly into the table, as if enableBlobStreaming were set to false.
Here is an example of how to insert a BLOB into a table.
public void insertBLOB(Connection conn, InputStream in, int len)
  throws SQLException
{
PreparedStatement ps = null;
try {
ps = conn.prepareStatement("insert my_test_tab(n_text) values(?)");
ps.setBinaryStream(1, in, len);
int rows = ps.executeUpdate();
}
finally {
if (ps != null)
ps.close();
}
}
Note that curently only LONGBLOB type columns of PBXT tables may contain streamable BLOB data. This means that when enableBlobStreaming=true you should only use setBinaryStream(), setAsciiStream() and setBlob() with a valid length on columns of type LONGBLOB belonging to a PBXT table.
** WARNING ** The JDBD driver does not check that the database column or table is the correct type . If use setBinaryStream(), setAsciiStream() or setBlob() with a length greater or equal to one on columns that are not of type LONGBLOB, or table type other then PBXT, data will be lost!
To avoid this problem, specify a length of -1 when using setBinaryStream(), setAsciiStream() or setBlob() on columns that are not of type LONGBLOB, or when using a table type other than PBXT.
Selected values of type LONGBLOB are automatically checked for valid BLOB repository references. The reference is then used to automatically download the BLOB data from the BLOB repository. Using a getBinaryStream(), getAsciiStream() or getBlob() you have direct access to the BLOB data stream from the repository.
Here is an example of how to retrieve a BLOB from a table:
public static InputStream selectBLOB(Connection conn, int id) throws SQLException
{
Statement st = null;
InputStream in = null;
ResultSet rs;
try {
st = conn.createStatement();
rs = st.executeQuery(
"select n_text from mybs_test_tab where n_id = "+id);
rs.next();
in = rs.getBinaryStream("n_text");
}
finally {
if (st != null)
st.close();
}
return in;
}
If you required the length of the BLOB before you begin to retrieve the data, then use getBlob(), instead of rs.getBinaryStream().
The curent implementation is based on JDK 1.5.x, which only allows for lengths in setBinaryStream(), setAsciiStream() and setBlob() of type 'int'. This means that the maximum BLOB size that can currently permitted is 2GB.
As mentioned above, streamable BLOBs can only be stored in LONGBLOB type columns in PBXT tables.
10.4. Building and Testing JDBC
This source code distribution of the MyBS storage engine includes the modified source code of the MySQL Connector/J JDBC driver, in a sub-directory called java. This directory also contains a ready build JDBC driver in the form of a .jar file.
For debugging purposes you can include the source code in your project or build the JDBC driver yourself. To do this you will need Apache Ant version 1.6 or newer (available from http://ant.apache.org/). You will also need the Java JDK-1.4.2 or later.
Enter the following command to compile and create the driver:
$ ant dist
After this you can test the BLOB streaming using the 'runtest.sh' shell script:
$ ./runtest.sh
The test connects to a local MySQL server and creates a PBXT table. The test assumes that you have already installed the PBXT and MyBS storage engines.