spaCy: Training a custom Named Entity Recognition (NER) model

FS Ndzomga
3 min read · Sep 28, 2023
Photo by Arthur Mazi on Unsplash

Training a custom Named Entity Recognition (NER) model using spaCy involves several steps: preparing a labeled dataset, training the model, and then evaluating its performance.
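
As a rough end-to-end sketch of those three steps, assuming spaCy v3 is installed and using two toy sentences (the name TOY_DATA, the ten passes, and the blank English pipeline are illustrative choices, not taken from this article):

```python
import random

import spacy
from spacy.training import Example

# Two toy examples in (text, annotations) form. Offsets are character
# positions with an exclusive end, so text[start:end] is the entity.
TOY_DATA = [
    ("GPS: 23.45 N, 56.78 W", {"entities": [(5, 21, "GPS_COORDINATES")]}),
    ("Coords: 45.67 S, 78.90 W", {"entities": [(8, 24, "GPS_COORDINATES")]}),
]

random.seed(0)
nlp = spacy.blank("en")    # empty English pipeline (assumes English text)
ner = nlp.add_pipe("ner")  # add an untrained NER component

# Step 1: turn the labeled tuples into spaCy Example objects.
examples = [Example.from_dict(nlp.make_doc(text), ann) for text, ann in TOY_DATA]

# Step 2: initialize the model weights, then run a few update passes.
optimizer = nlp.initialize(lambda: examples)
for _ in range(10):
    random.shuffle(examples)
    losses = {}
    nlp.update(examples, sgd=optimizer, losses=losses)

# Step 3: score the pipeline. This reuses the training sentences only to
# show the call; real evaluation needs held-out examples.
scores = nlp.evaluate(examples)
print(scores["ents_p"], scores["ents_r"], scores["ents_f"])
```

Scoring on the training sentences only demonstrates the calls involved. A meaningful evaluation needs examples the model has not seen during training.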

Preparing a Labeled Dataset

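For training, you’ll need a dataset in which every entity you want the model to recognize is labeled. A common format for spaCy is a list of (text, annotations) tuples, where each entity is given as (start, end, label): start and end are character offsets into the text, and end is exclusive, so text[start:end] is exactly the entity. The dataset should look like this: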

TRAIN_DATA = [
# GPS_COORDINATES
("Coordinates: 123.45 N, 67.89 W", {"entities": [(13, 30, "GPS_COORDINATES")]}),
("Location: 12.34 S, 56.78 E", {"entities": [(10, 26, "GPS_COORDINATES")]}),
("Lat/Long: 98.76 S, 54.32 W", {"entities": [(10, 26, "GPS_COORDINATES")]}),
("Position: 65.43 N, 21.45 E", {"entities": [(10, 26, "GPS_COORDINATES")]}),
("Coords: 45.67 S, 78.90 W", {"entities": [(8, 24, "GPS_COORDINATES")]}),
("Latitude/Longitude: 34.56 N, 78.90 E", {"entities": [(20, 36, "GPS_COORDINATES")]}),
("GPS: 23.45 N, 56.78 W", {"entities": [(5, 21, "GPS_COORDINATES")]}),
("Place: 67.89 S, 12.34 E", {"entities": [(7, 23, "GPS_COORDINATES")]}),
("Geolocation: 45.67 N, 89.01 W", {"entities": [(13, 29, "GPS_COORDINATES")]}),
("Mapping: 78.90 N, 12.34 E", {"entities": [(9, 25, "GPS_COORDINATES")]}),

# BANK_ACCOUNT
("Bank account: 1234-5678-1234-5678", {"entities": [(14, 33…

