Empress Fast Text Data Retrieval

Home

Empress Markets

News & Events

Presentations

Press Releases

Product Profile

Success Stories

Technical News

White Papers

Join Mailing List

For More Info

Empress Technical News – January 2009

Database Text SearchIndex – Fast Text Data Retrieval – Part 1

Introduction

Empress Ultra Embedded 10.20 offers many additional features to application developers. One of those features is a Text Search Index capability.

Empress Text Search Index

Empress Ultra Embedded V10.20 Text Search Index capability empowers application developers to implement an efficient search for database records using keywords/tokens/phrases. The most typical usage would be to associate those keywords with particular character/text based attribute in a database table.

The search index capability is developed as an additional set of C calls that are used in conjunction with Empress C/C++ Kernel Level API – mr Routines. The search index is a user maintained index – an index not maintained via Empress database engine calls.

Application would supply the list (array) of tokens/keywords/phrases on insertion in the Empress database in order to create a text search index.

Lets take an example of storing song titles in an Empress TEXT data type attribute. For the famous Beatles song “I Want To Hold Your Hand” the list of tokens/keywords/phrases could look like as follows:

Want
Hold
Hand
I
Hold Your
Your Hand
Hold Your Hand

In the above example the list contains seven tokens. Tokens can contain a single word or multiple words. Words do not necessarily have to be contained in the original song title. This could be beneficial when, for example, using either token Mc or Mac to get a song title “Old Macdonald”.

Once created an Empress text search index enables very powerful searches using inserted tokens/keywords.

In the presence of hundreds of thousands of song titles a search for the song title with a token Hold could result in the result set containing the following titles:

You Really Got A Hold on Me (Beatles)
If We Hold On (Diana Ross)
I Want To Hold Your Hand (Beatles)
Hold Me Back (Ac/Dc)
Hold Me, Thrill Me, Kill Me (U2)

Retrieval Using Search Index

The following example provides the actual program code (select_text.c) using Empress C/C++ Kernel Level API – mr Routines to show how to retrieve records from the table songs using the search index on the same table. The search index C API calls are in bold. The example performs the retrieval of all the records from the table that contain token “Hold”. When translated into SQL the example does something like:

SELECT id, title FROM songs WHERE title HAS “Hold

#include < mscc.h>
#include < tsi_api.hx>
#ifdef WINCE

global_shared_func (char* format, ...)
#define swprintf
#define printf mssio_printf
#endif
#define DATABASE L" karaokedb"
int msmain (int argc,
char* argv[])
{
void*
void*
void*
void*
void*
void*
wchar_t* id_value
wchar_t* title_value
void*
long*
char

}

= mropen (DATABASE, L" songs" , 'r')
songs_recdesc = mrmkrec (songs_tabdesc)
id_attrdesc = mrngeta (songs_tabdesc, L" id"
title_attrdesc = mrngeta (songs_tabdesc, L" title" )
index_handle= mstsi_open (songs_tabdesc , title_attrdesc, 'r')
id_value = mrspv (id_attrdesc)
title_value = mrspv (title_attrdesc)
printf (" Songs that contain token Hold\n\n"
printf (buf, %-12s %-22s\n" , " Id" Title" )
record_list = mstsi_get (index_handle, L" Hold" , 0)
if (record_list == 0)
{
printf (" No songs with token Hold\n"
return 0
}
qual = mrqlst (songs_tabdesc, record_list)
retrieve_desc = mrgetbegin (qual, songs_recdesc, (void*) 0)

(mrget (retrieve_desc))
{
mrcopyv (songs_recdesc, id_attrdesc, id_value)
mrcopyv (songs_recdesc, title_attrdesc, title_value)
sprintf (buf, " %-12ls %-22ls\n" , id_value, title_value)
printf (" %s" , buf)
}
mrgetend (retrieve_desc)
mrfree (id_value)
mrfree (title_value)
mrfrrec (songs_recdesc)
mstsi_close (index_handle)
mrclose (songs_tabdesc)
printf ( select1_text is done\n" )
return 0
}

Performance Considerations

When executing the above mr program the performance could be several orders of magnitude better than when performing the same search using an SQL query, e.g.:

SELECT id, title FROM songs WHERE title LIKE “%Hold%”

SELECT id, title FROM songs WHERE title SMATCH “*

In another example a search for a keyword " Embedded" was performed against all titles stored in Empress database containing Wikipedia data. The table pages contained 7,649,051 titles and related information such as and author.

Using the text search index, the search completed in 0.077 seconds with 131 titles in the result set.

For the comparison, an SQL query was executed to do the similar search on the TEXT attribute title

SELECT id, title FROM pages WHERE title SMATCH '

It required 35.314 seconds for the SQL query to complete.

The Empress text search index was about 500 times faster than the SQL query!!

EmpressSoftware Inc.
www.empress.com

	Company Information GSA Contracts Careers Privacy Policy Contact US	Products Empress RDBMS Empress Servers Empress API Free Trial Empress iPad Apps	Services Consulting Mobile Apps Development Technical Support Training Courses Empress Extreme	Partners Distributors Business Alliances Partner Programs GSA Program Distributors Program	Media Empress Markets News and Events Press releases Technical News Success Stories
USA: 301-220-1919 Canada & International: 905-513-8888 Copyright © 2014 Empress Software Inc. [email protected] Join Empress Mailing List