Named Entity Recognition
Named entity recognition (NER) is a natural language processing task that identifies and classifies named entities in text — such as people, organisations, locations, and dates — into predefined categories.
Named entity recognition (NER), also referred to as entity identification or entity chunking, is a natural language processing (NLP) task concerned with locating named entities mentioned in unstructured text and classifying them into predefined semantic categories. Standard entity categories include persons (PER), organisations (ORG), geographic locations (LOC), dates and times, monetary values, percentages, and miscellaneous named items. For example, in the sentence "Maybank announced a partnership with Microsoft in Kuala Lumpur on Monday", an NER system would identify Maybank and Microsoft as organisations, Kuala Lumpur as a location, and Monday as a date.
NER is typically framed as a sequence labelling task. Each token in a sentence receives a label indicating whether it is part of a named entity and, if so, the entity type. A common labelling scheme is the BIO notation: B marks the beginning of an entity, I marks tokens inside a continuing entity, and O marks tokens that are not part of any entity. The B and I markers are combined with the entity type, so the two tokens of Kuala Lumpur would be labelled B-LOC and I-LOC.
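The BIO scheme can be made concrete with a short sketch that groups tagged tokens back into entity spans. The function name bio_to_spans, the whitespace tokenisation, and the DATE tag name are assumptions made for illustration; real datasets differ in tag inventory and tokenisation.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity, closing any open one.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type extends the open entity.
            current_tokens.append(token)
        else:
            # "O", or a stray I- tag that does not continue the open entity:
            # close the open entity; the stray token itself is ignored.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None

    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


tokens = ["Maybank", "announced", "a", "partnership", "with", "Microsoft",
          "in", "Kuala", "Lumpur", "on", "Monday"]
tags = ["B-ORG", "O", "O", "O", "O", "B-ORG",
        "O", "B-LOC", "I-LOC", "O", "B-DATE"]
entities = bio_to_spans(tokens, tags)
```

Dropping a stray I- tag is only one possible policy; some evaluation scripts instead treat it as the start of a new entity.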
Historical Development
Early NER systems were built using hand-crafted rules and lexical resources such as gazetteers — lists of known named entities such as city names, company registries, and person name dictionaries. These rule-based systems were precise for well-defined domains but required extensive manual effort and did not generalise well across domains or languages.
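A gazetteer lookup of the kind described above might look like the following minimal sketch. The GAZETTEER contents, the greedy longest-match policy, and the function name gazetteer_tag are illustrative assumptions, not a reconstruction of any particular historical system.

```python
from typing import Dict, List, Tuple

# Toy gazetteer: lower-cased token sequences mapped to an entity type.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("kuala", "lumpur"): "LOC",
    ("maybank",): "ORG",
    ("microsoft",): "ORG",
}
MAX_LEN = max(len(key) for key in GAZETTEER)


def gazetteer_tag(tokens: List[str]) -> List[Tuple[str, str]]:
    """Greedy longest-match lookup of token sequences in the gazetteer."""
    found: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate window first, then shrink it.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in GAZETTEER:
                found.append((" ".join(tokens[i:i + n]), GAZETTEER[key]))
                i += n
                break
        else:
            # No entry starts at this token; move on.
            i += 1
    return found


known = gazetteer_tag("Maybank opened an office in Kuala Lumpur".split())
unknown = gazetteer_tag("Acme Corp opened an office".split())
```

The second call returns nothing because the hypothetical name Acme Corp is not listed, which illustrates why such systems generalise poorly beyond the entries they were built with.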
Statistical sequence labelling models, particularly conditional random fields (CRFs), dominated the field through the 2000s and into the 2010s. CRFs model the conditional probability of a label sequence given an input sequence, taking into account both the input features and the dependencies between adjacent labels. They outperformed earlier generative models such as Hidden Markov Models by conditioning on rich, overlapping input features.
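For reference, a linear-chain CRF is commonly written in the form below. The notation is chosen here for illustration: x is the input sequence of length T, y the label sequence, f_k are feature functions with learned weights \lambda_k, and Z(x) is the normalising partition function.

```latex
p(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})}
    \exp\!\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, \mathbf{x}, t) \right),
\qquad
Z(\mathbf{x})
  = \sum_{\mathbf{y}'}
    \exp\!\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, \mathbf{x}, t) \right)
```

Each feature function may inspect the whole input x together with a pair of adjacent labels, which is how the model combines rich, overlapping input features with dependencies between neighbouring labels.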
Bidirectional LSTM-CRF models, which combine a bidirectional LSTM for encoding context with a CRF layer for structured output prediction, became the dominant neural NER architecture from approximately 2016. These models learn character-level and word-level representations, capturing morphological patterns useful for recognising entities in unseen forms.
Transformer-Based NER
The introduction of pre-trained transformer models, particularly BERT (Bidirectional Encoder Representations from Transformers) in 2018, substantially advanced NER performance. Fine-tuning BERT on labelled NER datasets produced results at or near the state of the art on standard benchmarks such as CoNLL-2003 when the model was released. Its bidirectional context representation captures long-range dependencies that LSTM-based models handle less effectively.
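One practical detail when fine-tuning such models is that they operate on sub-word tokens, while NER labels are usually given per word. The sketch below shows one common alignment convention, assuming the tokenizer can report which word each sub-word token came from (a list of word indices, with None for special tokens). The function name align_labels, the label ids, and the example sub-word split are hypothetical.

```python
from typing import List, Optional

# -100 is the default ignore_index of PyTorch's cross-entropy loss, so
# positions carrying this value do not contribute to the training loss.
IGNORE_INDEX = -100


def align_labels(word_labels: List[int],
                 word_ids: List[Optional[int]]) -> List[int]:
    """Expand word-level label ids to sub-word token positions.

    Only the first sub-word of each word keeps the word's label; special
    tokens and continuation sub-words are masked with IGNORE_INDEX.
    """
    aligned: List[int] = []
    previous: Optional[int] = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(IGNORE_INDEX)           # special token
        elif word_id != previous:
            aligned.append(word_labels[word_id])   # first sub-word of a word
        else:
            aligned.append(IGNORE_INDEX)           # continuation sub-word
        previous = word_id
    return aligned


# Words: ["Kuala", "Lumpur", "skyline"] with label ids O=0, B-LOC=1, I-LOC=2.
word_labels = [1, 2, 0]
# Assumed split: [CLS] Kuala Lum ##pur skyline [SEP]
word_ids = [None, 0, 1, 1, 2, None]
aligned = align_labels(word_labels, word_ids)
```

Labelling only the first sub-word is a design choice; an alternative is to copy the label to every sub-word of the word, converting B- to I- on continuations.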
Subsequent models including RoBERTa, ALBERT, and domain-specific variants such as BioBERT (for biomedical text) and FinBERT (for financial text) have been fine-tuned for NER in specialised domains. In multilingual settings, XLM-RoBERTa, which was pre-trained on text in 100 languages, enables NER across those languages from a single model. This is particularly valuable for low-resource languages that lack enough labelled data to train a language-specific model.
Large language models can perform NER through prompting — presenting the text and asking the model to identify and classify entities in its output — but fine-tuned smaller models typically achieve higher precision on well-defined entity taxonomies in production settings.
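Prompt-based NER can be sketched without committing to any particular model API: build an instruction, then validate whatever the model returns. In the sketch below, the prompt wording, the ALLOWED_TYPES set, the JSON output format, and the sample reply are all assumptions for illustration; no real model is called.

```python
import json
from typing import Dict, List

ALLOWED_TYPES = {"PER", "ORG", "LOC", "DATE"}


def build_prompt(text: str) -> str:
    """Compose an instruction asking for entities as a JSON list."""
    types = ", ".join(sorted(ALLOWED_TYPES))
    return (
        "Extract the named entities from the text below.\n"
        f"Allowed types: {types}.\n"
        'Answer with a JSON list of objects with keys "text" and "type".\n\n'
        f"Text: {text}"
    )


def parse_reply(reply: str, source_text: str) -> List[Dict[str, str]]:
    """Keep only well-formed entities that appear verbatim in the source."""
    try:
        items = json.loads(reply)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    kept: List[Dict[str, str]] = []
    for item in items:
        if not isinstance(item, dict):
            continue
        text, etype = item.get("text"), item.get("type")
        if not isinstance(text, str) or not isinstance(etype, str):
            continue
        # Reject types outside the taxonomy and spans not found in the text.
        if text and etype in ALLOWED_TYPES and text in source_text:
            kept.append({"text": text, "type": etype})
    return kept


sentence = "Maybank announced a partnership with Microsoft in Kuala Lumpur on Monday"
prompt = build_prompt(sentence)
# Hand-written stand-in for a model reply, including two invalid items.
sample_reply = (
    '[{"text": "Maybank", "type": "ORG"},'
    ' {"text": "Kuala Lumpur", "type": "LOC"},'
    ' {"text": "Google", "type": "ORG"},'
    ' {"text": "Monday", "type": "WEEKDAY"}]'
)
entities = parse_reply(sample_reply, sentence)
```

The validation step reflects the precision concern noted above: generated output can contain entity types outside the taxonomy or spans that do not occur in the input, so it is filtered before use.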
Entity Linking and Knowledge Graphs
NER is often the first step in a broader information extraction pipeline. Entity linking (EL), also called entity disambiguation, takes the entity spans identified by an NER system and maps them to canonical entries in a knowledge base such as Wikidata, DBpedia, or a domain-specific knowledge graph. This transforms ambiguous surface mentions (Apple could refer to the technology company, the fruit, or a person's surname) into unambiguous entity identifiers.
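A very simple form of disambiguation scores each candidate entry by how many of its associated keywords appear in the mention's context. The sketch below assumes a toy in-memory knowledge base; the identifiers such as kb:apple_company and the keyword sets are hypothetical and do not correspond to real Wikidata or DBpedia entries.

```python
from typing import Dict, Optional, Set

# Toy knowledge base: surface form -> candidate entity id -> context keywords.
KB: Dict[str, Dict[str, Set[str]]] = {
    "apple": {
        "kb:apple_company": {"iphone", "technology", "shares", "company"},
        "kb:apple_fruit": {"fruit", "orchard", "pie", "tree"},
    },
}


def link_entity(mention: str, context: str) -> Optional[str]:
    """Return the candidate id whose keywords overlap most with the context.

    Returns None when the mention has no candidates in the knowledge base.
    Ties, including zero overlap, resolve to the first listed candidate.
    """
    candidates = KB.get(mention.lower())
    if not candidates:
        return None
    context_words = {w.strip(".,").lower() for w in context.split()}
    return max(candidates, key=lambda eid: len(candidates[eid] & context_words))


company = link_entity("Apple", "Apple shares rose after the iPhone launch.")
fruit = link_entity("Apple", "She baked an apple pie with fruit from the orchard.")
missing = link_entity("Maybank", "Maybank announced a partnership.")
```

Production entity linkers typically replace the keyword overlap with learned similarity between the context and entity descriptions, but the candidate-generation and ranking structure is similar.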
The combination of NER and entity linking enables the construction and population of knowledge graphs from unstructured text, where entities become nodes and the relationships between co-occurring entities become edges. Search engines, question answering systems, and document intelligence platforms rely on this pipeline to extract structured information from large text corpora.
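As a rough illustration of how linked entities can populate a graph, the sketch below counts how often pairs of entities appear in the same sentence and treats each pair as a weighted, undirected edge. The entity identifiers and the function name build_cooccurrence_graph are hypothetical, and co-occurrence is only a crude stand-in for a real relation extraction step.

```python
from collections import Counter
from itertools import combinations
from typing import List, Tuple


def build_cooccurrence_graph(sentences: List[List[str]]) -> Counter:
    """Count sentence-level co-occurrences between linked entity ids.

    Each key is a sorted (entity_a, entity_b) pair, so edges are undirected.
    """
    edges: Counter = Counter()
    for entity_ids in sentences:
        # Deduplicate within a sentence, then count every unordered pair.
        for a, b in combinations(sorted(set(entity_ids)), 2):
            edges[(a, b)] += 1
    return edges


# Linked entity ids per sentence, as an upstream NER + linking step might emit.
linked_sentences = [
    ["org:maybank", "org:microsoft", "loc:kuala_lumpur"],
    ["org:maybank", "loc:kuala_lumpur"],
]
graph = build_cooccurrence_graph(linked_sentences)
nodes = {entity for pair in graph for entity in pair}
```

Edge weights of this kind indicate only that two entities are mentioned together; assigning a relation type to an edge requires a separate classification step.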
Applications
Document intelligence and information extraction systems use NER to automatically process and structure large volumes of unstructured documents. Legal contract analysis tools extract parties, dates, obligations, and governing jurisdictions. Financial document processing identifies company names, monetary figures, and reporting periods in earnings filings and analyst reports. Medical record analysis extracts patient information, diagnoses, medications, and dosages from clinical notes.
In regulatory compliance, NER enables automated screening of communications, transactions, and documents for mentions of sanctioned entities, politically exposed persons (PEPs), and geographies subject to trade restrictions. News and media monitoring services use NER to track coverage of specific companies, individuals, and topics across large article corpora.
Cybersecurity applications apply NER to threat intelligence feeds and security reports to extract indicators of compromise (IoCs) such as IP addresses, domain names, and malware family names.
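Indicators with a regular surface form, such as IPv4 addresses and domain names, are often extracted with patterns rather than a learned model. The sketch below is a simplified illustration: the regular expressions and the function name extract_iocs are assumptions, they do not handle defanged indicators such as example[.]com, and malware family names would still need a lexicon or a trained NER model.

```python
import re
from typing import Dict, List

# Four dot-separated octets, each in the range 0-255.
OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IPV4 = re.compile(rf"\b(?:{OCTET}\.){{3}}{OCTET}\b")

# One or more dot-terminated labels followed by an alphabetic top-level domain.
DOMAIN = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}\b",
    re.IGNORECASE,
)


def extract_iocs(text: str) -> Dict[str, List[str]]:
    """Return IPv4 addresses and domain names found in the text."""
    return {
        "ipv4": IPV4.findall(text),
        "domain": DOMAIN.findall(text),
    }


# Uses an address and a domain reserved for documentation purposes.
report = "The dropper contacted 203.0.113.7 and resolved update.example.com before exfiltration."
iocs = extract_iocs(report)
```

Because the top-level domain must be alphabetic, the domain pattern does not also match the IP address, so the two indicator lists stay disjoint in this example.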
References
- Lample, G. et al. (2016). Neural Architectures for Named Entity Recognition. Proceedings of NAACL-HLT 2016.
- Devlin, J. et al. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of NAACL-HLT 2019.
- Conneau, A. et al. (2020). Unsupervised Cross-lingual Representation Learning at Scale. Proceedings of ACL 2020.
- Abdullah, M.T. et al. (2023). A Survey of Named Entity Recognition for Bahasa Malaysia. Proceedings of the International Conference on Asian Language Processing (IALP 2023).
- Finkel, J.R., Grenager, T., and Manning, C. (2005). Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling. Proceedings of ACL 2005.