Extending the Service
=====================
All of the stuff in this section is in the "future work" category
though none of it is particularly difficult.

Support for Different DBMSs
---------------------------
NExScI TAP was designed from the start with multiple DBMS support
in mind.  Python PEP 249 defines a generic interface for talking to
databases and it has been implemented for most systems.  We have so
far only fully integrated the Oracle packages (cx_Oracle) and SQLite3
as we use those operationally here but we have investigated several
others (starting with PostgreSQL and MySQL and plan to incorporate
them as time permits.  We are open to suggestion with regard to other
DBMSs and to overall priorities.


Python Database API (DB-API) 2.0 (PEP 249)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The API defines a straightforward interface to databases in general.
The initial connect() method takes a variable set of arguments,
depending on the DBMS.  This returns a "conection", which in turn
can be asked for a "cursor" (connection.cursor()).  Cursors can
be given a query (cursor.executeSQL(sql)) and then asked to step
through the results.

Oracle
~~~~~~
For Oracle,
the initialization takes the form::

   conn = cx_Oracle.connect (dbuser, dbpassword, dbserver)

PostgreSQL
~~~~~~~~~~
The most popular PostgreSQL database driver is psycopg2.  Initialization
is similar to Oracle::
conn = psycopg2.connect(host='localhost',database='exodev', user='exo_dbuser', password='XXXXXXXX')

and the same sort of INI block can be used (see the Configuration
section)::

   [pgdb]
   host=localhost
   database=exodev
   user=exo_dbuser
   password=XXXXXXXX

SQLite3
~~~~~~~
SQLite3 has the simplist of initializations.  Since it works directly
on files with no server/login, it can be initialized with just::

   conn = sqlite3.connect('exodev.db')

where 'example.db' is any SQLite3 file.  This is an intrinsic
part of the SQLite3 Python package.

MySQL
~~~~~
For MySQL,
we use the MySQLdb package::

   conn = MySQLdb.connect('localhost', 'dbuser', 'XXXXXXX', 'exodb')


Table Upload
------------
The TAP specification defines an (optional) UPLOAD capability, which 
nexsciTAP does not currently support.  Uploaded tables are loaded into
a separate database schema (visible to the user as "TAP_UPLOAD") and
tables in it are generally meant to support chaining queries together
across databases, though there is no such constraint on how they are 
used.

There are two situations where uploaded tables could be particularly
useful.  If the uploaded table contains extended information on records
in a archive table record (and the "ID" of that record), then a simple
join of the two would produce an augmented catalog.

Perhaps the most alluring possibility involves positional cross-comparison
as the first step in actual catalog cross-matching.  This is a 
complicated process (the nearest positional match between two tables 
is frequently not the best match).

NExScI TAP does not currently support UPLOAD, though it is a fairly
straightforward thing to add.  To date, none of the projects we support
have expressed a need for the above.


Dealing with Proprietary Data
-----------------------------
NExScI TAP has been extended to support access to proprietary data for
two projects.  In general, this involves two complications.  The first
is the requirement to authenticate the users.  Our projects are using
a simple approach of having a separate login service and temporary 
authorization cookies but there are a wide array of more advanced 
mechanisms.

THe second complication involves modifying the user query before 
submission to the database to include joins with additional tables
which identify which projects a user belongs and which data records
those projects currently have access to.

A simpler security setup (*e.g.* one where you had the luxury of identifying
data records directly with users) could be done in simpler manners.
However, since there is no one way to do this, we can't distribute a
single solution as part of NExScI TAP.  We would be happy to share the
code we do have if your needs are similar.


Dealing with Extended Objects (Images in Particular)
----------------------------------------------------
Tesselation-based spatial indexing works great for point-like data like 
astronomical catalogs and can even be used with small extended objects.
For instance, if you have metadata for images that are all small you
can index the image center coordinates and then pad the queries by the
maximum size of the images.

This falls apart for truly extended objects (image sets where some of
the images are ten degrees across and region specifications like
constellation outlines).  For this kind of data there are more effective 
approaches, most notably R-Trees, which work by creating a hierarchy of 
object clusters.  Building the tree involves reviewing the whole structure
every time a new object is added.  This is very slow but results in an
extremely efficient search framework.

The objects and the tree of clusters are defined by their bounding
boxes and at all levels it is perfectly OK for these boxes to overlap;
we are just trying to get to a state where we can quickly exclude large
subsets from consideration.

Unlike the tesselation approach, where you can leverage the basic 
DBMS B-Tree indexing,  R-Trees require a different internal indexing
structure.  Most DBMSs now support some form of this but often
at an extra cost and pretty much always in a way that requires additional
database software installation and configuration.

If you are interested in just spatial searching of image metadata and
don't need relational constraints (or are willing to apply these as
post-filters) there is a stand-alone tool that comes as part of the
Montage package that contains an optimized R-Tree implementation on
file-based data.

If you truly want to fold R-Tree processing into the DBMS framework,
the best idea is to adopt a DBMS where this has been done well 
(*e.g.* PostgreSQL) and update NExScI TAP to translate ADQL to the
PostgreSQL-based formalism.