spaCy: Training a custom Named Entity Recognition (NER) model
Sep 28, 2023
Training a custom Named Entity Recognition (NER) model using spaCy involves several steps: preparing a labeled dataset, training the model, and then evaluating its performance.
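Most problems in the first step come from character offsets that do not match the text. Each entity is given as a (start, end, label) triple of character indices, with end exclusive, so text[start:end] must return exactly the entity. The sketch below is plain Python and does not need spaCy. The names make_example and show_entities are hypothetical helpers, not part of spaCy's API. make_example assumes the first occurrence of the entity text in the sentence is the one you want to label.

```python
def make_example(text, entity_text, label):
    """Build one (text, annotations) training tuple from the entity's surface text.

    Hypothetical helper: finds the first occurrence of entity_text and derives
    the (start, end) character offsets, with end exclusive.
    """
    start = text.find(entity_text)
    if start == -1:
        raise ValueError(f"{entity_text!r} not found in {text!r}")
    end = start + len(entity_text)
    return (text, {"entities": [(start, end, label)]})


def show_entities(train_data):
    """Print every labelled span so misplaced offsets are easy to spot."""
    for text, annotations in train_data:
        for start, end, label in annotations["entities"]:
            print(f"{label}: {text[start:end]!r}")


train_data = [
    make_example("Coordinates: 123.45 N, 67.89 W", "123.45 N, 67.89 W", "GPS_COORDINATES"),
    make_example("Location: 12.34 S, 56.78 E", "12.34 S, 56.78 E", "GPS_COORDINATES"),
]
show_entities(train_data)
# GPS_COORDINATES: '123.45 N, 67.89 W'
# GPS_COORDINATES: '12.34 S, 56.78 E'
print(train_data[0][1])
# {'entities': [(13, 30, 'GPS_COORDINATES')]}
```

Generating offsets from the entity string, rather than counting characters by hand, avoids off-by-one errors. Printing the slices is a quick sanity check before handing the data to spaCy.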
Preparing a Labeled Dataset
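For training, you’ll need a dataset where each entity you want the model to recognize is labeled with its character offsets. spaCy's simple training format is a list of (text, annotations) tuples. Each entity is given as (start_char, end_char, label), and the end offset is exclusive: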
TRAIN_DATA = [
# GPS_COORDINATES
("Coordinates: 123.45 N, 67.89 W", {"entities": [(13, 30, "GPS_COORDINATES")]}),
("Location: 12.34 S, 56.78 E", {"entities": [(10, 26, "GPS_COORDINATES")]}),
("Lat/Long: 98.76 S, 54.32 W", {"entities": [(10, 26, "GPS_COORDINATES")]}),
("Position: 65.43 N, 21.45 E", {"entities": [(10, 26, "GPS_COORDINATES")]}),
("Coords: 45.67 S, 78.90 W", {"entities": [(8, 24, "GPS_COORDINATES")]}),
("Latitude/Longitude: 34.56 N, 78.90 E", {"entities": [(20, 36, "GPS_COORDINATES")]}),
("GPS: 23.45 N, 56.78 W", {"entities": [(5, 21, "GPS_COORDINATES")]}),
("Place: 67.89 S, 12.34 E", {"entities": [(7, 23, "GPS_COORDINATES")]}),
("Geolocation: 45.67 N, 89.01 W", {"entities": [(13, 29, "GPS_COORDINATES")]}),
("Mapping: 78.90 N, 12.34 E", {"entities": [(9, 25, "GPS_COORDINATES")]}),
# BANK_ACCOUNT
("Bank account: 1234-5678-1234-5678", {"entities": [(13, 33…