Recognizing entity names in text

1 Figa - Cedar
2 Profiling and time comparison
3 Time required to create a dictionary
4 Size

1 Figa - Cedar

Figa - Cedar preserves the interface of figav08 but adds new features. No additional files are required for spellchecking or autocomplete. It also adds the option to create a dictionary from a namelist using the switches -d and -n, and to save the dictionary using the switch -w FILE.

TODO: parameter -q, surround entities in quotation marks

Help:

 rebuilt and updated: FItGAzetter Cedar, Jan 4th, 2016, Andrej Rajcok, xrajco00@stud.fit.vutbr.cz
 updated: FItGAzetter v0.8, July 9th, 2014, Peter Hostacny, xhosta03@stud.fit.vutbr.cz
 updated: FItGAzetter v0.7c, November 16th, 2013, Karel Brezina, xbrezi13@stud.fit.vutbr.cz
 FItGAzetteer v0.35c, September 14th, 2010, Marek Visnovsky, xvisno00@stud.fit.vutbr.cz based on:
 fsa Ver. 0.49, March 18th, 2009, (c) Jan Daciuk,jandac@eti.pg.gda.pl

Usage:

 ./figav08 [options]...

Parameters:

-a - enables autocomplete function
-b - returns offset in bytes instead of characters
-d FILE - specifies the tree file or namelist
-n - the file given in -d is a namelist
-w FILE - writes the tree given in -d to FILE
-f FILE - specifies input file
-h - prints help
-m NUMBER - specifies number of returned entities (default 5) [ONLY AUTOCOMPLETE]
-o - enables entity overlapping
-p - prints out a string
*-s - enables spellchecking and specifies spellchecking automaton [ONLY FIGA]
-x - returns all possible entities [ONLY AUTOCOMPLETE]

Note: *Spellchecking is at least 7 times slower.
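
The switches listed above can be combined, for example to build a dictionary from a namelist (-d, -n, -w) or to process an input file (-d, -f). The following is a minimal Python sketch that only assembles an argument list from those documented switches. The helper name build_figa_command and the file names are hypothetical, and treating -d as always required is an assumption, not something the help text states.

```python
from typing import List, Optional


def build_figa_command(
    dictionary: str,                     # -d FILE (assumed to be always given)
    is_namelist: bool = False,           # -n
    write_to: Optional[str] = None,      # -w FILE
    input_file: Optional[str] = None,    # -f FILE
    autocomplete: bool = False,          # -a
    max_entities: Optional[int] = None,  # -m NUMBER
    byte_offsets: bool = False,          # -b
    overlapping: bool = False,           # -o
    binary: str = "./figav08",
) -> List[str]:
    """Assemble a figav08 argument list from the documented switches."""
    if max_entities is not None and not autocomplete:
        # -m is marked [ONLY AUTOCOMPLETE] in the help text.
        raise ValueError("-m requires -a (autocomplete)")

    cmd = [binary, "-d", dictionary]
    if is_namelist:
        cmd.append("-n")
    if write_to is not None:
        cmd += ["-w", write_to]
    if input_file is not None:
        cmd += ["-f", input_file]
    if autocomplete:
        cmd.append("-a")
    if max_entities is not None:
        cmd += ["-m", str(max_entities)]
    if byte_offsets:
        cmd.append("-b")
    if overlapping:
        cmd.append("-o")
    return cmd


if __name__ == "__main__":
    # Hypothetical file names: build a dictionary from a namelist and save it.
    print(build_figa_command("namelist.txt", is_namelist=True, write_to="dictionary.bin"))
    # Hypothetical file names: process an input file with overlapping entities.
    print(build_figa_command("dictionary.bin", input_file="input.txt", overlapping=True))
```

The returned list can be passed to subprocess.run once the binary and the files actually exist.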

2 Profiling and time comparison

The new system is 50-60% faster. The comparison was made on 750.000 processed entities and measured with time and gprof; the plain time measurement comes from a build without the parameters -g -pg.

Library	Time (measured with time)	Time, built with -pg -g (measured with time)	Time (measured with gprof)
CEDAR	7,0s	10,8s	3s
FIGA	13,9s	29,7s	10s
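
One way to read the "faster" figure is as the relative reduction in run time, (FIGA - CEDAR) / FIGA. The sketch below applies that reading to the three columns of the table above. The interpretation is an assumption, and the function and variable names are illustrative only.

```python
# Run times in seconds, taken from the table above (decimal commas written as points).
TIMINGS = {
    "time": {"CEDAR": 7.0, "FIGA": 13.9},
    "time, built with -pg -g": {"CEDAR": 10.8, "FIGA": 29.7},
    "gprof": {"CEDAR": 3.0, "FIGA": 10.0},
}


def reduction_percent(new: float, old: float) -> float:
    """Share of the old run time that the new system saves, in percent."""
    return (old - new) / old * 100.0


def speedup(new: float, old: float) -> float:
    """How many times faster the new system is than the old one."""
    return old / new


if __name__ == "__main__":
    for label, row in TIMINGS.items():
        saved = reduction_percent(row["CEDAR"], row["FIGA"])
        factor = speedup(row["CEDAR"], row["FIGA"])
        print(f"{label}: {saved:.1f}% less time, {factor:.2f}x speedup")
```

Under this reading the reductions come out at about 49.6%, 63.6% and 70.0% for the three columns.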

Comparison between CEDAR and DARTS-CLONE for 750.000 processed entities.

Library	Time - program time
CEDAR	4,6s
DARTS-CLONE	4,4s

3 Time required to create a dictionary

Library	Number of items	Namelist size	Time	Dictionary size
CEDAR	15 192 879	412 MB	6m 33s	1044 MB
CEDAR	747 215	17 MB	16s	51 MB
DARTS-CLONE	747 215	17 MB	14s	13 MB
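
The rows above can be turned into two derived figures: how large the dictionary is relative to the namelist, and how many items are processed per second during the build. The following sketch does that arithmetic on the table values only. The names BuildRun, size_ratio and items_per_second are illustrative and not part of figa.

```python
from dataclasses import dataclass


@dataclass
class BuildRun:
    library: str
    items: int
    namelist_mb: float
    seconds: int
    dictionary_mb: float

    def size_ratio(self) -> float:
        """Dictionary size divided by namelist size."""
        return self.dictionary_mb / self.namelist_mb

    def items_per_second(self) -> float:
        """Average number of namelist items processed per second."""
        return self.items / self.seconds


# Values copied from the table above (6m 33s = 393 s).
RUNS = [
    BuildRun("CEDAR", 15_192_879, 412, 6 * 60 + 33, 1044),
    BuildRun("CEDAR", 747_215, 17, 16, 51),
    BuildRun("DARTS-CLONE", 747_215, 17, 14, 13),
]

if __name__ == "__main__":
    for run in RUNS:
        print(
            f"{run.library}: dictionary is {run.size_ratio():.2f}x the namelist, "
            f"{run.items_per_second():.0f} items/s"
        )
```

On these numbers the CEDAR dictionary is about 2.5 to 3 times the size of the namelist, while the DARTS-CLONE dictionary is smaller than its namelist (a ratio of about 0.76).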

4 Size

Comparison for 750.000 processed entities.

Library	Dictionary size
DARTS-CLONE	13 MB
CEDAR	50 MB
FIGA	10 MB - may require another test
