Capability

Sintelix is a server-based application that provides a fully featured, integrated set of tools for extracting information of all types from files and electronic documents.  Sintelix also offers fast entity resolution to create entity networks and linked entity databases.  Its graphical user interface (GUI) allows it to be tested and configured rapidly, and supports users in carrying out sophisticated data investigation tasks.  Sintelix can be integrated into a workflow via its API, which offers both web service and Java integration.  Sintelix is distributed as an installer for Microsoft Windows systems and as a WAR file for Linux/Tomcat.



Operate: Use Sintelix straight out of the box.

Load

As documents are loaded into Sintelix, they are analysed and stored in its database.  Sintelix processes data at great speed (2.8 GB per hour on a standard workstation).  Documents are organised into collections, and user interfaces are provided for managing and reviewing loaded material.  Individual documents and their highlighted entities can be viewed and edited.
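As a rough planning aid, the quoted throughput can be turned into a load-time estimate. The sketch below is only arithmetic on the 2.8 GB per hour figure above; the function name is hypothetical, and it assumes throughput stays constant regardless of corpus size, file type, or hardware, which may not hold in practice.

```python
# Throughput quoted in the text for a standard workstation.
QUOTED_GB_PER_HOUR = 2.8


def estimate_load_hours(corpus_gb: float,
                        gb_per_hour: float = QUOTED_GB_PER_HOUR) -> float:
    """Estimate hours needed to load a corpus, assuming constant throughput.

    This is a hypothetical helper for illustration, not part of Sintelix.
    """
    if corpus_gb < 0:
        raise ValueError("corpus_gb must be non-negative")
    if gb_per_hour <= 0:
        raise ValueError("gb_per_hour must be positive")
    return corpus_gb / gb_per_hour


# Print estimates for a few illustrative corpus sizes.
for size_gb in (1, 50, 500):
    hours = estimate_load_hours(size_gb)
    print(f"{size_gb} GB: about {hours:.1f} hours")
```

Measuring throughput on a sample of your own documents and passing it as `gb_per_hour` would give a more realistic estimate than the quoted figure.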

Normalize

The data loading process normalises documents from a wide variety of file types into HTML.  Structure, image references and links are preserved as they may be useful later for the extraction and presentation of information.

Recognize

Sintelix offers world-beating entity recognition performance for English text.  Besides the standard classes (person, location, etc.), it extracts topics and relationships (e.g. family relationships, employer relationships), and it resolves anaphora (linking pronouns to the nouns they refer to).  Sintelix also extracts entity features, such as the gender and name components of persons, the map references of locations, and many others.  Key-value pairs can be recognized and extracted as well.

Extract

In many tasks, customers need to populate databases with data extracted from reports, e-mails and web pages.  Sintelix can also extract information using the structure of the document.  For example, this approach enables very specific items of information to be extracted from complex tables and combined with others to create multi-field database records.  Sintelix also extracts metadata such as creation date, e-mail fields and document type.

Integrate

Sintelix operates as a server process and offers its capabilities for integration into your existing workflows via web services or a Java API.  The integration system lets you review the results of your API calls in Sintelix’s user interface, which makes for rapid debugging and trouble-free integration.

Optimize: Get the highest possible performance.

Model

Frequently, users adopt a tree of concepts (an ontology) for classifying extracted entities.  Standard concepts include Person, Location, Organization, Date-Time, Money, etc.  However, in many cases where the information extraction task is focused on a specialised domain, new concepts are needed.  Sintelix offers a drag-and-drop concept editor for modelling new domains.
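To make the idea of a concept tree concrete, here is a minimal sketch of an ontology as a tree data structure. It is not Sintelix's data model or API: the `Concept` class, the root name "Entity", and the "Vessel" and "Cargo Ship" concepts are all hypothetical, chosen only to show how a specialised domain might extend the standard concepts.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """A node in a hypothetical concept tree (illustration only)."""
    name: str
    children: list["Concept"] = field(default_factory=list)

    def add(self, name: str) -> "Concept":
        """Create a child concept and return it so calls can be chained."""
        child = Concept(name)
        self.children.append(child)
        return child

    def path_to(self, target: str) -> Optional[list[str]]:
        """Return the names from this node down to `target`, or None."""
        if self.name == target:
            return [self.name]
        for child in self.children:
            sub_path = child.path_to(target)
            if sub_path is not None:
                return [self.name] + sub_path
        return None


# Standard concepts named in the text, under an assumed root node.
root = Concept("Entity")
for standard in ["Person", "Location", "Organization", "Date-Time", "Money"]:
    root.add(standard)

# Hypothetical extension for a specialised (maritime) domain.
vessel = root.add("Vessel")
vessel.add("Cargo Ship")

print(root.path_to("Cargo Ship"))
```

The path from the root to a concept is what lets an extracted "Cargo Ship" entity also be treated as a "Vessel" when results are filtered by concept.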

Learn

Sintelix can learn to recognize the entities you want and highlight them in exactly the way you want.  It learns from example documents and can be tuned to give excellent performance for the types of documents you intend to process.  It also extracts entity features, so that each extracted entity becomes a multi-field database record.  The more documents you train with, the better the results tend to become, and extending the training set with one type of document does not make performance worse for other types.

Evaluate

Evaluation against a gold standard set of documents provides a good indicator of the quality of the system's output.  Sintelix provides a suite of high-productivity tools for evaluation and improvement.  With well-written text, you can expect auto-testing performance of 97% F1 and cross-validation performance of 95% F1.  Cross-validation performance provides a good estimate of the performance you would expect to obtain under operational conditions.
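For readers unfamiliar with the F1 measure quoted above, the sketch below computes precision, recall and F1 by comparing predicted entities with a gold standard. It uses the standard definition (F1 is the harmonic mean of precision and recall) with exact matching on span and concept; how Sintelix itself matches entities is not stated here, so the matching rule, the function name and the sample data are all assumptions.

```python
def f1_score(gold: set, predicted: set) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for exact-match entity comparison.

    Each entity is a hashable item, here (start, end, concept).
    Exact matching is an assumption made for this illustration.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up gold standard annotations: (start offset, end offset, concept).
gold = {
    (0, 5, "Person"),
    (10, 16, "Location"),
    (20, 27, "Organization"),
    (30, 34, "Money"),
}

# Made-up system output: two correct, one wrong concept, one entity missed.
predicted = {
    (0, 5, "Person"),
    (10, 16, "Location"),
    (20, 27, "Person"),
}

precision, recall, f1 = f1_score(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Scoring on the same documents used for training corresponds to the auto-testing figure, while scoring each held-out fold after training on the rest corresponds to the cross-validation figure, which is why the latter is the better guide to operational performance.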

Understand: Find patterns of entities, metadata and keywords.

Search

Sintelix offers an advanced search capability with faceted search across document text, metadata and entities.  Search can be further refined by selecting collections and concept trees.  Results are listed on the left of the screen, documents are shown in the middle, and history and favourites are listed on the right.
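The general idea of faceted search can be sketched over a small in-memory document set: filter by keyword, collection and entity, then count the entity values in the remaining results so the user can refine further. This is an illustration of the technique only; the document layout, function names and sample data are hypothetical and do not reflect Sintelix's search implementation or API.

```python
from collections import Counter

# Made-up documents with text, a collection label and extracted entities.
DOCS = [
    {"id": 1, "collection": "reports",
     "text": "Acme hired Jane Smith in Sydney",
     "entities": {"Person": ["Jane Smith"], "Organization": ["Acme"],
                  "Location": ["Sydney"]}},
    {"id": 2, "collection": "emails",
     "text": "Jane Smith flew from Sydney to Perth",
     "entities": {"Person": ["Jane Smith"],
                  "Location": ["Sydney", "Perth"]}},
    {"id": 3, "collection": "reports",
     "text": "Acme opened an office in Perth",
     "entities": {"Organization": ["Acme"], "Location": ["Perth"]}},
]


def search(docs, keyword=None, collection=None, entity=None):
    """Return documents matching every filter that is supplied.

    `entity` is an optional (concept, value) pair, e.g. ("Location", "Perth").
    """
    results = []
    for doc in docs:
        if keyword and keyword.lower() not in doc["text"].lower():
            continue
        if collection and doc["collection"] != collection:
            continue
        if entity:
            concept, value = entity
            if value not in doc["entities"].get(concept, []):
                continue
        results.append(doc)
    return results


def facet_counts(docs, concept):
    """Count how many documents mention each entity value of a concept."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc["entities"].get(concept, [])))
    return counts


hits = search(DOCS, keyword="acme")
print([doc["id"] for doc in hits])
print(facet_counts(hits, "Location"))
```

Each facet count shows how many of the current results would remain if that value were chosen as the next filter, which is what makes step-by-step refinement practical.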