What is document triage

Sunday, March 29, 2020

Document Triage

Document triage is the process of converting a set of digital files into well-defined text documents. It is the first of the two stages of text preprocessing; the second is text segmentation.

The document triage process may involve one or more of the following steps, depending on the origin of the files being processed:

Character encoding identification – For a document to be machine readable, its characters must be stored in a character encoding, that is, a scheme that maps characters to binary data. Several schemes exist, such as ASCII and the Unicode encodings UTF-8 and UTF-16. This step determines which character encoding a text file uses.

Language identification – A document may contain text in a single language or in multiple languages. This step identifies the language(s) used in the document.

Text sectioning – Identifies the actual content within a file and discards undesirable elements such as images, tables, headers, links, and HTML formatting.
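
The three steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a production approach: the function names (identify_encoding, identify_language, section_text, triage) are hypothetical, the encoding check relies on byte-order marks plus trial decoding with a Latin-1 fallback chosen here as an assumption, and the language guess uses tiny hand-picked stopword lists where a real system would typically use a trained statistical detector.

```python
import codecs
from html.parser import HTMLParser

# Byte-order marks, UTF-32 first because its LE mark begins with the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

# Toy stopword lists (an assumption for illustration only).
_STOPWORDS = {
    "english": {"the", "and", "is", "of", "to", "in"},
    "french": {"le", "la", "et", "est", "de", "les"},
    "german": {"der", "die", "und", "ist", "das", "nicht"},
}


def identify_encoding(raw: bytes) -> str:
    """Guess the encoding from a byte-order mark, then by trial decoding."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    for name in ("ascii", "utf-8"):
        try:
            raw.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return "latin-1"  # assumed fallback: decodes any byte sequence


def identify_language(text: str) -> str:
    """Pick the language whose stopwords occur most often in the text."""
    tokens = text.lower().split()
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in _STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


class _TextExtractor(HTMLParser):
    """Collects text that lies outside elements treated as non-content."""

    SKIP = {"script", "style", "table", "header", "footer", "nav"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def section_text(html: str) -> str:
    """Strip markup and skipped elements, keeping the remaining text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def triage(raw: bytes) -> dict:
    """Run the three steps in order on the raw bytes of one file."""
    encoding = identify_encoding(raw)
    text = section_text(raw.decode(encoding))
    return {"encoding": encoding, "language": identify_language(text), "text": text}
```

Calling triage on the raw bytes of an HTML file returns the guessed encoding, the guessed language, and the extracted text. The order matters: the bytes cannot be decoded into text until the encoding is known, and the language guess is more reliable once markup and navigation text have been removed.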
