Integration

Sintelix offers two-way integration.  It can be integrated into your workflow via its web services or via its Java API.  Additionally, your text processing and corporate databases can be linked into Sintelix's internal work flow to enhance its entity extraction and resolution capabilities and to insert links from documents and annotations back to your corporate data.

Two-Way Integration

Integration into External Work Flows

The Sintelix API allows access to all its key capabilities via web services or Java integration.  Its web services are versatile, quick to set up, and naturally allow distributed operation.  Java integration eliminates the (sizable) overheads from HTTP and message passing over a network.  In both approaches, information is passed in the form of XML text, which avoids the complexities of conventional middleware and of integration based on Java objects.

The services Sintelix offers are:

Document Entity Recognition

All optional features, such as topic detection, can be accessed via this service. Variants include:

  • Return a normalized XML document with entities placed in-line in text,
  • Return a normalized XML document with entities placed together after the text, and
  • Store the normalized document and extracted entities within Sintelix’s database, and return a document ID and, optionally, the IDs of the extracted entities.
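
The difference between the first two variants can be illustrated with a short Python sketch. The element and attribute names used here (`document`, `text`, `entities`, `entity`, `type`, `start`, `end`) are invented for illustration and are not Sintelix's actual schema; the real output format should be taken from the product documentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-line variant: entity elements wrap the spans inside the text.
INLINE = (
    '<document><text>'
    '<entity type="Person">Alice Smith</entity> flew to '
    '<entity type="Location">Sydney</entity>.'
    '</text></document>'
)

# Hypothetical stand-off variant: the plain text comes first and the entities
# are listed after it, pointing back into the text by character offset.
STANDOFF = (
    '<document><text>Alice Smith flew to Sydney.</text>'
    '<entities>'
    '<entity type="Person" start="0" end="11"/>'
    '<entity type="Location" start="20" end="26"/>'
    '</entities></document>'
)


def entities_inline(xml_text):
    """Return (type, surface text) pairs from the in-line form."""
    root = ET.fromstring(xml_text)
    return [(e.get("type"), e.text) for e in root.iter("entity")]


def entities_standoff(xml_text):
    """Return (type, surface text) pairs from the stand-off form."""
    root = ET.fromstring(xml_text)
    text = root.findtext("text")
    return [
        (e.get("type"), text[int(e.get("start")):int(e.get("end"))])
        for e in root.iter("entity")
    ]


def plain_text(xml_text):
    """Recover the unannotated text from either form."""
    root = ET.fromstring(xml_text)
    return "".join(root.find("text").itertext())
```

In-line markup is convenient for display, while the stand-off form leaves the text untouched and is usually easier to load into other tools. In this sketch both forms yield the same entities and the same plain text.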

The entity recognition process is configured and controlled from Sintelix’s Recognize IDE accessible from the navigation bar.  Multiple configurations can be made available simultaneously.  Document processing requests can specify the configuration they require.
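
As a rough illustration of a request that names its configuration, the Python sketch below builds an XML request body and posts it over HTTP. Everything specific in it is an assumption made for the example: the `request` and `text` element names, the `configuration` attribute, and the host, port and path passed to `post_xml` are placeholders, not the documented Sintelix web service interface.

```python
import http.client
import xml.etree.ElementTree as ET


def build_request(text, configuration):
    """Serialize a document and the name of the configuration to use as XML.

    The element and attribute names are hypothetical.
    """
    root = ET.Element("request", {"configuration": configuration})
    ET.SubElement(root, "text").text = text
    return ET.tostring(root, encoding="utf-8")  # returns bytes


def post_xml(host, port, path, body):
    """POST an XML body and return (status, response bytes).

    host, port and path are placeholders for a real service address.
    """
    conn = http.client.HTTPConnection(host, port, timeout=30)
    try:
        conn.request(
            "POST",
            path,
            body=body,
            headers={"Content-Type": "application/xml; charset=utf-8"},
        )
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

A caller would pass the result of `build_request("Alice Smith flew to Sydney.", "default")` to `post_xml` together with the address of its own Sintelix installation.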

Generic Document Processing

The document entity recognition service is just one possible document workflow that can be accessed.  Sintelix engineers can create entirely new workflows tailored to your needs.

Data Retrieval from Sintelix’s Database

All the data objects held in Sintelix’s database can be retrieved in serialized XML form. Sintelix's search results can also be retrieved as an XML file, and a report definition language is provided so that you can specify the file's structure.

Information Extraction

Sintelix's full information extraction capability can be accessed by submitting a document and the name of the extraction template to be used. A set of database tables containing the information extracted from the document is returned as an SQL document or as an XML file.
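
One way to work with a result returned as an SQL document is to load it into a scratch database and query it. The Python sketch below does this with an in-memory SQLite database. The table names, columns and rows are made up for the example, and it assumes the returned SQL is plain enough for SQLite to execute, which may not hold for a real extraction template.

```python
import sqlite3

# Hypothetical SQL document standing in for the output of an extraction template.
SQL_DOCUMENT = """
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employment (person_id INTEGER, organisation TEXT);
INSERT INTO person VALUES (1, 'Alice Smith');
INSERT INTO employment VALUES (1, 'Acme Pty Ltd');
"""


def load_sql_document(sql_text):
    """Execute an SQL document against a fresh in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(sql_text)
    return conn


conn = load_sql_document(SQL_DOCUMENT)
rows = conn.execute(
    "SELECT p.name, e.organisation "
    "FROM person p JOIN employment e ON e.person_id = p.id"
).fetchall()
```

The same pattern applies to a production database: run the returned statements in a staging schema first, then merge the extracted rows into corporate tables.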

Protocols & Performance
  • Multiple HTTP modes:

    • Single request per socket
    • Multiple requests per socket
    • Unlimited connections
  • Web service test suite
  • Direct Java API
  • Windows or Linux environments
  • Entity extraction operates at about 2 million words per minute on a 4-core workstation of 2010 vintage.
  • Without optimization, F1 scores in the 90-93% range over a basket of entity types are likely.  Following some optimization, F1 scores above 95% are achievable.
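
For readers who want to measure accuracy on their own data, the F1 score quoted above is conventionally the harmonic mean of precision and recall. The Python sketch below uses that standard definition. How a match is counted (exact span, partial overlap, type agreement) is not stated here and is left to the caller, so the counts are simply taken as inputs.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Standard F1: harmonic mean of precision and recall.

    Returns 0.0 when nothing was predicted or nothing was expected.
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Example: 90 correct entities, 10 spurious, 5 missed.
score = f1_score(90, 10, 5)
```

In this example precision is 0.90 and recall is about 0.947, giving an F1 of about 0.923, which falls inside the unoptimized range given above.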

Software Integrations

Semantic Sciences offers integrations with:

Integrating External Services into Sintelix Work Flows

Sintelix offers the ability to create plug-ins that:

  • enable external services to extend or replace workflows
  • enable GUI components to be created for configuring how Sintelix uses these external services

Web Browsers Supported by Sintelix’s GUI

  • Chrome (version 7+)
  • Firefox (version 7+)
  • Safari (version 5.1+)
  • Internet Explorer (version 9+)

Currently, we find that Chrome gives the best results, copes easily with large documents, and provides low latencies for user actions.  Choosing the right browser matters, because the browser is frequently the cause of slow response.

Server Hardware Requirements

Sintelix has been designed to make the best possible use of the available hardware.  It responds quickly even on a dual-core laptop with 4GB of RAM and an SSD.  In operational applications we recommend that 5GB of RAM be made available to the program.  If processed documents are stored within the system’s database, we recommend budgeting six times the disk space used for the source documents.
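
These rules of thumb can be turned into a quick sizing estimate. The Python sketch below applies the six-times disk budget from this section and the throughput figure of about 2 million words per minute quoted under Protocols & Performance. It is a back-of-envelope calculation only: the throughput was measured on a particular 4-core workstation, and the function names are just for this example.

```python
WORDS_PER_MINUTE = 2_000_000  # quoted entity extraction rate, 4-core workstation
DISK_FACTOR = 6               # recommended disk budget relative to source size
RECOMMENDED_RAM_GB = 5        # recommended RAM for operational use


def disk_budget_gb(source_gb):
    """Disk space to budget when processed documents are stored in the database."""
    return source_gb * DISK_FACTOR


def extraction_minutes(total_words):
    """Approximate entity extraction time at the quoted rate."""
    return total_words / WORDS_PER_MINUTE


# Example: a 50 GB corpus containing roughly 100 million words.
disk_gb = disk_budget_gb(50)
minutes = extraction_minutes(100_000_000)
```

For this example the estimate is 300 GB of disk and about 50 minutes of extraction time, before allowing for other workloads on the server.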