4 Running the database

Back to TOC

This chapter explains all aspects to run a cego database environment.

4.1 cego - Database backend

All database backend functionality is provided by the cego program. Cego provides two major run modes for the database backend. The batch mode is used for an initial database load or database testing reasons. To serve client requests in a multithreading environment, cego should be started in daemon mode. Both runtime modes are described in more detail in the following.

4.1.1 User setup mode

User information is stored with encrypted password information in the database XML configuration file. Database users are used to connect to the database db or admin port ( with the cgclt or cgadm program ) and perform database or administration tasks for any tableset. To access the database, first a user has to be created and initialized. To create it, the cego program is called using the adduser mode.
$ cego --mode=adduser --dbxml=cegodb.xml --user=cgadm/cgadm --role=admin
The role identifier describes the permission context of the user. Roles can be defined in a powerful way to control access on table level. For the database administration, the default role admin is used.

4.1.2 Role setup mode

Database roles are used to describe the access to the database. If created, a role can be assigned to any defined user or by creating a new user.
$ cego --mode=addrole --dbxml=cegodb.xml --role=myrole

4.1.3 Role permission setup mode

For a created role, a number of permission entries can be setup. A permission entry describes the following parameters

The tableset parameter identifies the corresponding tableset, the permission should be defined on. The filter is regular expression for the object names, which should be covered. The permission parameter specifies the kind of access, which should be allowed. It is either read, write, modify, exec or all. To identify the permission entry, a permission identifier has to specified To allow read access for all database objects starting with letter a, use the following permission entry
$ cego --mode=addperm --role=myrole --permid=P1 --tableset=TS1 --perm=read --filter="a.*" --dbxml=cegodb.xml
To allow any access to tableset TS1, use the following permission entry
$ cego --mode=addperm --role=myrole --permid=P2 --tableset=TS1 --perm=all --filter=all --dbxml=cegodb.xml

4.1.4 Tableset define mode

New tablesets can be defined using the tableset define mode. Since the command line options are pretty long, it is recommended to create a shell script around the cego definition command.
The script show below illustrates the complete definition of a tableset
$ cat defcgdb
TSROOT=./db
TSTICKET=./db/tsticket.xml
SYSSIZE=100
TMPSIZE=100
LOGFILESIZE=1000000
LOGFILENUM=3
APPFILE=./db/data01.dbf
APPSIZE=5000
SORTAREASIZE=5000000
cego --mode=define -tsdef=tsroot:${TSROOT},tsticket:${TSTICKET},syssize:${SYSSIZE},\
tmpsize:${TMPSIZE},logfilesize:${LOGFILESIZE},logfilenum:${LOGFILENUM},appfile:${APPFILE},\
appsize:${APPSIZE},sortareasize:${SORTAREASIZE} --tableset=TS1 --dbxml=cegodb.xml
After the definition of the tableset has been done, the tableset is just known by the database but still can not be used. It first have to be created using the tableset creation mode.

4.1.5 Tableset creation mode

Any previous defined tableset just can be used, if all required datafiles have been created und initialized. For this, the creation mode can be used.
$ cego --mode=create --tableset=TS1 --dbxml=cegodb.xml
Please note : If the specified tableset already exists, any contained data is lost by the creation procedure

4.1.6 Tableset import/export mode

For backup or database copy reasons, any tableset can be exported in a binary or XML formated file. The file can be saved and imported if required. cego supports the following import and export modes

For machine independent exports, the XML export should be used. The data is stored in a readable form in a XML structure. Any contained blob data are uuenoded to ASCII.
$ cego --mode=xmlexport --tableset=TS1 --dbxml=cegodb.xml --expfile=exp.xml
Tableset TS1 is exported to the export file exp.xml. The data can be imported using the corresponding import mode
$ cego --mode=xmlimport --tableset=TS1 --dbxml=cegodb.xml --impfile=imp.xml --doLogging
The dologging option enabled redo logging and might be ommited, if a checkpoint is forced after the import has been completed. For performance reasons, the binary export can be used. Binary exports and imports are up to 10 times faster than XML exports and are recommended for high volume exports.
$ cego --mode=binexport --tableset=TS1 --dbxml=cegodb.xml --expfile=exp.bin
The binary export can be imported with the following options
$ cego --mode=binimport --tableset=TS1 --dbxml=cegodb.xml --impfile=imp.bin --doLogging
For further speed up, the doLogging option might be ommited. In this case, no online redo logs are written but the database should be synched or restarted after the import has been completed.

4.1.7 Tableset batch mode

For any batch processing reasons, the cego database can be started up in batch mode. The batch command sequence in the given sql file is processed subsequently and the program is terminated after the batch file has processed completely or any error has been occured.

If starting in batch mode, a number of command line arguments can be defined
Option Meaning
--dbxml=<xmlfile> Name fo the database XML configuration file
--batchfile=<batchfile> Name of the SQL batch file to process
--tableset=<tableset> Name of target table set
--poolsize=<poolsize> Number of database pages, allocated for the buffer pool. Depending on the size of the buffer pool, this may have impact on the batch processing performance
--ignore Ignore single statement errors. The batch run is continued anyway
Please note : The -w option may be useful, if any drop table statements should prepare the database for the batch run and it is not clear, if any of the tables still exists.

Sample:
$ cego --mode=batch --batchfile=batch.sql --dbxml=cegodb.xml --tableset=TS1

4.1.8 Daemon mode

To serve client database requests, cego must be started in daemon mode. For this, a minimal valid XML database configuration file must be available. This may be
<?xml  version="1.0" ?>
<!DOCTYPE CEGO_DB_SPEC>
<DATABASE NAME="cegodb" PAGESIZE="16384" ADMINPORT="2000" LOGPORT="3000" DATAPORT="2200" PIDFILE="./pid">
</DATABASE>
Please note, that this minmal configuration file does not contain any tableset or admin user information. To be able to connect to the running damoen later on, you should create an admin user. See section Creating an admin user how to create an admin user.

If the configuration file is prepared, the daemon can be start up. For this, a number of command line arguments can be defined
Option Meaning
--dbxml=<xmlfile> Name fo the database XML configuration file
--numdbthread=<numDbThread> The amount of server threads available for handling client database requests
--numadminthread=<numAdminThread> The amount of admin threads available for handling admin requests. For Version 1.x, this can be ignored
--numlogthread=<numLogThread> The amount of log threads available for handling log requests For Version 1.x, this can be ignored
--poolsize=<poolsize> Number of database pages, allocated for the database bufferpool. Depending on the amount of available physical memory, this parameter should be set up.
--mediator<mediator> Hostname of the mediator host to connect to. For Version 1.x, this can be ignored
--tableset<tableset>,<tableset>... List of the tablesets, that should be started up when the daemon is invoked. This is optional and is just useful, if any valid tablesets already exist.
--cleanup The cleanup option enables a datafile check to verify each file for allocated but unused pages. This may happen, if the allocatePage method was called, but for any reason ( e.g. a system crash ) the page could never be written back to file. Turning on this option delays the startup of the database but ensures consistency of the datafiles.
--forecload To get all defined view and procedure compiled directly after database startup, the forceload option can be used. Otherwise, views and procedures are compiled on demand. The demand compile strategy is the default and reduces database startup time significantly
--nolockstat Statistics for locks are not calculated. This switch can be enabled, if no lock monitoring is needed. Disabling lock statistics can increase the performance behaviour of the database.
--pidfile=<file> Filename of the process id file. This option overwrites the the corresponding PIDFILE entry in the database xml. It may be used for multi instance installations, where all instance specific parameters should be set up on command line level
The following sample starts cego in daemon mode. The database buffer pool is allocated with 1000 pages.
$ cego --mode=daemon --dbxml=cegodb.xml --poolsize=1000
To terminate the database daemon session, it is recommended to bring cego down softly. This can be achieved by sending a SIGINT signal to the daemon process. If the daemon process was not sent to the background, a Ctrl-C key sequence can be used. Otherwise the process should be killed using the unix kill command
$ kill -s SIGINT <cego pid>
If the PIDFILE attribute is set up in the XML configuration file or via command line, the process id for the cego process can be retrieved from this file Assuming the pidfile was set to the file ./pidfile the database can be stopped with
$ kill -s SIGINT `cat ./pidfile`

4.1.9 General options

The following general options are available for all modes
Option Meaning
--logfile=<logfile> Filename to write all log messages
--lockfile=<lockfile> Name of the lock file. The lock file avoids to run several cego processes on a single database, since this may result in data corruption. The default location for the lockfile is /tmp/cego.lock. In case of multiple cego instance running on one host, this should be changed using the command line option

4.2 cgclt - Console database client

If the cego server program was successfully started in daemon mode, a client database connection could be established using the client program cgclt. The program is invoked with valid tablset connection parameters to the cego server host. Tableset connection parameters are <servername> , <portnumber> and <tableset> . Portnumber is the defined DBPORT attribute value in the database XML file. See section General attributes , how to configure the database portnumber. The target tableset must be started up on a sever. For authorization reasons, a valid username and password must be given with the command.

In the following sample, we use the host geek and port number 2200. We connect to the tableset TS1 with user lemke
$ cgclt --server=localhost --port=2200 --tablset=TS1 --user=lemke/lemke
CGCLT >
If the prompt appears, the connection has been established. Now we can input any Cego SQL or database information retrievel command. See section The Cego SQL language for a detailed description of all available SQL commands.

Each command is completed with a semicolon ( ; ). So you are able to put in complex sql commands over several lines. Just if the semicolon is scanned, the command is sent to the cego server for further processing. To finish you client session, put in the quit command followed by a semicolon.
CGCLT > quit;
ok ( 0.000 s )

4.2.1 Batch file processing

Any cego database request can also be triggered from a client via batchprocessing. For this a corresponding batchfile is given as an argument to the client program.
$ cgclt --server=localhost --port=2200 --tablset=TS1 --user=lemke/lemke --batchfile=mybatch.sql
If any error occures during the batch processing, the client program terminates execution and returns an exitcode 1; To ignore any batch processing errors, you can give the ignore option to the command line
$ cgclt --server=localhost --port=2200 --tablset=TS1 --user=lemke/lemke --batchfile=mybatch.sql --ignore

4.2.2 SQL dump file creation

Database objects, which have been build can be dumped out to a SQL file. This might be useful for several purposes. For example a database SQL dump can be edited and reloaded to a different database. The cgclt supports the dump mode with the following command
$ cgclt --server=localhost --port=2200 --tablset=TS1 --user=lemke/lemke --dumpfile=mydump.sql
In addition with the structure option --structure, the tableset is exported without any table content.

4.2.3 Single command processing

Sometimes it might be useful to execute just a single statement. For this, the command processing mode can be used.
$ cgclt --server=localhost --port=2200 --tablset=TS1 --user=lemke/lemke --cmd="select a from t1;"
The given command must be a complete SQL command including a closing semicolon .

4.2.4 Flow control

For large table data retrievel, the flow control can be enabled. If enabled, the row output stops after the the flow limit number and the user is requested to confirm further rows, abort the flow or reset to the beginning. In the following, there is a sample cgclt session using the flowctl feature.
CGCLT > flowctl 5;
Flow control enabled
ok ( 0.000 s )
CGCLT > select * from t1;
+-----------+-----------+-------------------------------+
|      t1_1 |      t1_1 |                          t1_1 |
|         a |         b |                             c |
+-----------+-----------+-------------------------------+
|        10 |        12 |                           AAA |
|        11 |         7 |                           BBB |
|        14 |         7 |                           CCC |
|         3 |         2 |                           CCC |
|        14 |         6 |                           CCC |
Next, Reset, Abort ? ( n, r, a ) n
|        14 |         6 |                           CCC |
|        14 |         6 |                           CCC |
|        14 |         6 |                           CCC |
|        14 |         6 |                           CCC |
|        14 |         6 |                           CCC |
Next, Reset, Abort ? ( n, r, a ) r
|        10 |        12 |                           AAA |
|        11 |         7 |                           BBB |
|        14 |         7 |                           CCC |
|         3 |         2 |                           CCC |
|        14 |         6 |                           CCC |
Next, Reset, Abort ? ( n, r, a ) a
+-----------+-----------+-------------------------------+
ok ( 17.649 s )
CGCLT > flowctl off;
Flow control disabled
ok ( 0.000 s )

4.2.5 Query abortion

In case of any select, update or delete operation, it is possible to cancel a ( long ) running query request. For delete operations, some of the tuples might be lost, if there has not been started any embedding transaction. Since update operations are always embedded in a transaction ( user or forced ), a rollback is performed in any case.
Queries are canceled using the Cntrl-C key sequence.
CGCLT > update t1 set a = 10000;
^CAborting query ..
Error : Update aborted by user
Query failed
ok ( 2.705 s )
CGCLT > select count(*) from t1 where a = 10000;
+---------------------+
|                AGGR |
|            count(*) |
+---------------------+
|                   0 |
+---------------------+
ok ( 0.104 s )

4.2.6 Profiling

Using profiles enables the user to use a predefined database connection record for building up database connections. The connections are defined in a profile called .cgprofile located int the users home directory. The profile can contains several profile entries of the following form
<pid>:<hostname>:<port>:<tableset>:<prompt>:<user>=<pwd>

Here is a sample .cgprofile
TS1:dude.local:2200:TS1:TS1 > :lemke:lemke
TS2:cego.local:2200:TS2:TS2 > :hugo:hugo
To use any of the profile entries, the cego client program has to be invoked with the profile command line parameter.
$ cgclt --profile=TS1

4.3 cgblow - Load simulation client

With the cgblow program, insert, delete and update operations with random data can be done on any user tables. I used this program for any kind of performance tests but also for stability tests of the database. The program connects to the cego database as a normal cego database client program and performs its operations as given in the argument line. The supported command line options for the program are described in the following
Option Meaning
--mode=<insert | delete | update | arbitrary | proc> For insert, update, delete, update or procedure simulation use corresponding option value
--server=<host> Name of the cego database server
--port=<port> Number of the database port
--user=<user/password> User connection string
--tableset=<tableset> Name of the tableset
--table=<table> Name of the table to operate on
--proc=<procedure> Name of the procedure to operate on
--simulate Instead of connecting to a running database, the query commands are printed to stdout
--iset=<format string> Insert format string of the form <type>,<dim>
--pset=<format string> Procedure parameter format string of the form <type>:<dim>,...
--uset=<format string> Update format string of the form <attr>:<type>:<dim>,...
--dcond=<format string> Delete condition format string of the form <attr>:<type>:<dim>,...
--ucond=<format string> Update condition format string of the form <attr>:<type>:<dim>,...
--interval=<report interval>> After each interval, the required execution time is printed
--count=<count>> Number of operations to perform
--append Set insert at last flag. This appends new tuples to the table, instead of skip through the table and search for a free entry
--dotransaction Run as one transaction. All operations are embedded in a transaction.
--version Show version
--help Show help
If server or port arguments are ommited, the query operations are printed to stdout.

The insert format string is a comma separated list of insert column specs for int and string values. A column spec is of the form i:<int> for integer values or s:<int> for string values. The integer argument for integer values specified the range of the value while the integer argument for the string value specifies the length of the string.

We give two samples for using insert and delete mode.

$ cgblow --mode=insert --simulate --table=mytab --iset=i:10,s:10 --count=10 --append
set append on;
insert into mytab values (8,'EZQWNFAONB');
insert into mytab values (2,'HJLPKUIJAG');
insert into mytab values (0,'WATDXQRDRJ');
insert into mytab values (7,'NEVKXMOIZB');
insert into mytab values (8,'LYMPURYSYS');
insert into mytab values (9,'VNKWXUEOTU');
insert into mytab values (9,'FKNCNOOWGF');
insert into mytab values (4,'EXWRMMMVPL');
insert into mytab values (5,'FLZUIKCMDY');
insert into mytab values (0,'SPNQZQDQMI');
This inserts tuples to table mytab with an arbitrary integer column value between 0 and 10 and an arbitray string value of 10 byte.
$ cgblow --mode=delete --simulate --table=mytab --dcond=a:i:10 --count=10
delete from mytab where a=3;
delete from mytab where a=9;
delete from mytab where a=6;
delete from mytab where a=2;
delete from mytab where a=0;
delete from mytab where a=2;
delete from mytab where a=1;
delete from mytab where a=3;
delete from mytab where a=2;
delete from mytab where a=6;
This deletes tuples of table mytab, where the delete condition matches attribute value a as an integer value between 0 and 10.

4.4 cglog - Log reader

With the cglog program, anywritte online and offline redo log file can be read and analyzed. This might be useful for debugging reasons but also to trace the current transaction log for information about the current running transactions.
Option Meaning
[ --log <log> ] The log argument specified the logfile file, which should be read
--version Prints out the current version
--help Prints help for usage
The following sample illustrates the usage of cglog
$ cglog --log=db/TS1redo0.log 
Read len : 25
LSN=1984:TS=02.05.2009 22:11:35:LogAction=SYNC:
--- No log data ---
Read len : 115
LSN=1985:TS=02.05.2009 22:11:35:ObjName=t1:ObjType=1:LogAction=CREATE:
--- Log Data Start ---
ObjectName: t1
ObjectType: table
TableSetId: 2
Table: t1
Schema: 
   TableName: 
   TableAlias: 
   AttrName: a
   Type: int
   Len: 4
   TableName: 
   TableAlias: 
   AttrName: b
   Type: string
   Len: 30
EntrySize = 63
FileId: 32
PageId: 0
LastFileId: 32
LastPageId: 0
EntrySize = 83
--- Log Data End   ---
Read len : 80
LSN=1986:TS=02.05.2009 22:11:35:ObjName=t1:ObjType=1:LogAction=INSERT:
--- Log Data Start ---
1|xxx|
--- Log Data End   ---
Read len : 80
LSN=1987:TS=02.05.2009 22:11:35:ObjName=t1:ObjType=1:LogAction=INSERT:
--- Log Data Start ---
1|ggg|
--- Log Data End   ---
Read len : 80
LSN=1988:TS=02.05.2009 22:11:35:ObjName=t1:ObjType=1:LogAction=INSERT:
--- Log Data Start ---
4|yyy|
--- Log Data End   ---
Read len : 80
LSN=1989:TS=02.05.2009 22:11:35:ObjName=t1:ObjType=1:LogAction=INSERT:
--- Log Data Start ---
3|zzz|
--- Log Data End   ---
Read len : 32
LSN=1990:TS=02.05.2009 22:11:35:ObjName=t1:ObjType=1:LogAction=DROP:
--- No log data ---

4.5 Localization

List all supported languages
$ locale -a
de_DE
de_DE.ISO8859-1
de_DE.ISO8859-15
de_DE.UTF-8
...
Set up the appropriate language
$ export LANG=en_US
now cego or cgclt use the selected locale

4.6 Logging

The module base logging feature enables the database administrator to define a specific logging filter for individual logging requirements. The logging filter is defined in the database xml definition as module subtags of the database tag.
<DATABASE NAME="chkdb" PAGESIZE="16384" ....>
<MODULE NAME="ALL" LEVEL="NOTICE"></MODULE>
<MODULE NAME="CegoMain" LEVEL="NOTICE"></MODULE>
<MODULE NAME="CegoTableManager" LEVEL="DEBUG"></MODULE>
<MODULE NAME="CegoAdminThreadPool" LEVEL="DEBUG"></MODULE>
...
</DATABASE>
Four different logging levels are provided for each module

If the module name is ALL, the logging level is set up for all modules. You also may specify dedicated logging levels for the following registered modules
Module Meaning
CegoMain Main module
CegoDbThread Database thread
CegoLogThread Log thread
CegoAction Parser dispatcher
CegoDbHandler Database network handler
CegoSelect Select query module
CegoXPorter Import and export manager
CegoTableManager Database table manager
CegoObjectManager Database object manager
CegoBufferPool Database buferpool manager
CegoLogManager Database log manager
CegoLogHandler Log network handler
CegoAdminThread Admin thread
CegoObjectCursor Low level object cursor
CegoIndexManager Database index manager
CegoImpInStream Import manager
CegoExpOutStream Export manager
CegoFileHandler Database file manager
CegoDatabaseManager Database manager
CegoTransactionManager Transaction manager
CegoMediatorThread Mediator thread
CegoDistManager Database distribution manager
CegoDbThreadPool Database thread pool manager
CegoLogThreadPool Log thread pool manager
CegoAdminThreadPool Admin thread pool manager
CegoBeatThread Beat thread
CegoDistDbHandler Database distribution manager
CegoDistCursor Database distributed cursor
CegoAdminHandler Admin handler

4.7 Compatibility attributes

To be compliant to other database products, cego provides a set of attributes, which can be set up in the database XML configuration file. The attributes are specified for the whole database in the DATABASE tag.
Attribute Meaning
CSMODE Case sensitive mode, if values is set to on, identifiers and string values are compared case sensitive, else the case is ignored. Most common databases are case insensitive, so this might be useful
QESCMODE Quote escape mode. If the mode is enables ( value set to ON ), the quote character is used as the escape character for string values instead of the backslash character. This is rather oracle style
DTFORMAT Date format string ( e.g. DTFORMAT="%a %b %d %H:%M:%S %Y"). The data format as represented in query results or XML export files .
Sample
<DATABASE NAME="cegodb" ...  CSMODE="ON" QESCMODE="ON" DTFORMAT="%a %b %d %H:%M:%S %Y">

4.8 Tuning attributes

For performance tuning reasons, several database configuration parameters can be set up.
Attribute Meaning
NUMRECSEMA The number of allocated record semaphores
NUMSYSPAGESEMA The number of allocated system page semaphores
NUMDATAPAGESEMA The number of allocated data page semaphores
NUMIDXPAGESEMA The number of allocated index page semaphores
NUMRBPAGESEMA The number of allocated rollback page semaphores
NUMDATAFILESEMA The number of allocated data file semaphores
NUMBUFFERPOOLSEMA The number of allocated buffer pool semaphores
MAXFIXTRIES Range for the bufferpool hash to fix an appropriate buffer page A higher value increases the filling degree of the bufferpool but decreases the bufferpool access time. A common value is 10 to 100