Recognizing entity names in text

Figa - Cedar

Preserves the interface of figav08 but adds new features. No additional files are required for spellchecking and autocomplete. A dictionary can be created from a namelist using the switches -d and -n, and saved using the switch -w FILE.

TODO: parameter -q to surround entities in quotation marks

Help:

 rebuilt and updated: FItGAzetter Cedar, Jan 4th, 2016, Andrej Rajcok, xrajco00@stud.fit.vutbr.cz
 updated: FItGAzetter v0.8, July 9th, 2014, Peter Hostacny, xhosta03@stud.fit.vutbr.cz
 updated: FItGAzetter v0.7c, November 16th, 2013, Karel Brezina, xbrezi13@stud.fit.vutbr.cz
 FItGAzetteer v0.35c, September 14th, 2010, Marek Visnovsky, xvisno00@stud.fit.vutbr.cz based on:
 fsa Ver. 0.49, March 18th, 2009, (c) Jan Daciuk,jandac@eti.pg.gda.pl
        

Usage:

 ./figav08 [options]...
        

Parameters:

Note: Spellchecking is at least 7 times slower.


2 Profiling and time comparison

The new system is 50-60% faster. The comparison was made for 750.000 processed entities with time and gprof, but without the parameters -g -pg.

Library | Time (measured by time) | Time with -pg -g at build (measured by time) | Time (measured by gprof)
CEDAR   | 7,0s                    | 10,8s                                        | 3s
FIGA    | 13,9s                   | 29,7s                                        | 10s
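
The 50-60% figure can be checked against the timings in the table above. The following is a minimal sketch, assuming that "faster" means the share of FIGA's run time that CEDAR saves; this reading, and the names `timings` and `time_reduction`, are assumptions made for illustration and are not part of the tool.

```python
# Timings in seconds, copied from the table above (decimal commas written as points).
timings = {
    "time, normal build": {"CEDAR": 7.0, "FIGA": 13.9},
    "time, built with -pg -g": {"CEDAR": 10.8, "FIGA": 29.7},
    "gprof": {"CEDAR": 3.0, "FIGA": 10.0},
}


def time_reduction(old_seconds, new_seconds):
    """Share of the old run time that the new system saves, in percent."""
    return (old_seconds - new_seconds) / old_seconds * 100.0


# Assumption: FIGA is the baseline ("old") and CEDAR is the "new system".
reductions = {
    label: time_reduction(t["FIGA"], t["CEDAR"]) for label, t in timings.items()
}

for label, t in timings.items():
    factor = t["FIGA"] / t["CEDAR"]  # how many times longer FIGA runs
    print(f"{label}: {reductions[label]:.1f}% less time ({factor:.2f}x)")
```

Under this reading the normal build saves about 49.6% of the run time, the -pg -g build about 63.6%, and the gprof measurement 70.0%.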

Comparison between CEDAR and DARTS-CLONE for 750.000 processed entities.

Library     | Time (measured by time)
CEDAR       | 4,6s
DARTS-CLONE | 4,4s

3 Time required to create a dictionary

Library     | Number of items | Namelist size | Time   | Dictionary size
CEDAR       | 15 192 879      | 412 MB        | 6m 33s | 1044 MB
CEDAR       | 747 215         | 17 MB         | 16s    | 51 MB
DARTS-CLONE | 747 215         | 17 MB         | 14s    | 13 MB
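
To compare the rows above independently of namelist size, the figures can be normalised per item. This is a rough sketch that assumes decimal megabytes (1 MB = 10^6 bytes), which the table does not state; the names `Build`, `items_per_second` and `bytes_per_item` are illustrative only.

```python
from collections import namedtuple

Build = namedtuple("Build", "library items seconds size_mb")

# Rows copied from the table above; 6m 33s is converted to seconds.
builds = [
    Build("CEDAR", 15_192_879, 6 * 60 + 33, 1044),
    Build("CEDAR", 747_215, 16, 51),
    Build("DARTS-CLONE", 747_215, 14, 13),
]

BYTES_PER_MB = 10**6  # assumption: the table uses decimal megabytes


def items_per_second(build):
    """Build throughput: namelist items processed per second."""
    return build.items / build.seconds


def bytes_per_item(build):
    """Average dictionary size per namelist item, in bytes."""
    return build.size_mb * BYTES_PER_MB / build.items


for b in builds:
    print(
        f"{b.library:12s} {b.items:>10d} items: "
        f"{items_per_second(b):8.0f} items/s, {bytes_per_item(b):5.1f} bytes/item"
    )
```

Under these assumptions CEDAR needs roughly 68 to 69 bytes per item at both namelist sizes, while DARTS-CLONE needs about 17 bytes per item.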

4 Size

Comparison for 750.000 processed entities.

Library | Dictionary size
DARTS   | 13 MB
CEDAR   | 50 MB
FIGA    | 10 MB (may require another test)