Use this component when you want to determine whether two entities are the same entity or are related in some way.
Featured Packages
-
JedAI (Java gEneric DAta Integration) toolkit
source code: https://github.com/scify/JedAIToolkit
JedAI is a modular, open-source toolkit for highly-scalable end-to-end
Entity Resolution (ER) that is domain-agnostic (no expert knowledge) and
structure-agnostic (works with structured, e.g., relational,
semi-structured, e.g., RDF, and un-structured, e.g., free-text, entity
descriptions). JedAI can be used in three different ways:
-
pydedupe
https://github.com/gpoulter/pydedupe
pydedupe is a deduplication tool written in Python, originally developed as an internal tool for linking a directory database. It first groups records based on some measures, then compares each pair of records within a group and classifies each pair as a match or a non-match. There is also a data matching package in R called RecordLinkage.
-
febrl
https://sourceforge.net/projects/febrl/
febrl (Freely Extensible Biomedical Record Linkage), developed in Python by the Australian National University, matches entities by standardizing and cleaning the data before “fuzzily matching” the records.
-
dedupe
https://github.com/datamade/dedupe
dedupe, developed in Python by datamade, uses machine learning techniques to deduplicate and match entities over structured data.
-
Magellan
https://sites.google.com/site/anhaidgroup/projects/magellan
Magellan is a publicly available entity matching tool in Python (py_entitymatching package) developed by the University of Wisconsin. It enables matching two tables (or one table against itself) using supervised learning techniques. The website provides further documentation on the py_entitymatching package.
Both schema matching and data matching use string matching as a fundamental building block. Magellan also includes a py_stringmatching package for string matching. See their website for a discussion of available open-source string matching packages.
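The packages above share a common pattern: standardize the data, group records into candidate blocks, compare pairs within each block using string similarity, and classify each pair as a match or a non-match. The sketch below illustrates that pattern using only the Python standard library. It is not the API of any package listed here; the record fields, the blocking key (first letter of the surname), the averaging of field scores, and the 0.85 threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def standardize(value):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(value.lower().split())


def block_key(record):
    """Hypothetical blocking key: first letter of the standardized surname."""
    return standardize(record["surname"])[:1]


def similarity(a, b):
    """String similarity in [0, 1] from difflib's matching-subsequence ratio."""
    return SequenceMatcher(None, standardize(a), standardize(b)).ratio()


def find_matches(records, threshold=0.85):
    """Return sorted id pairs whose average field similarity meets the threshold."""
    # Step 1: group records into blocks so only plausible pairs are compared.
    blocks = defaultdict(list)
    for record in records:
        blocks[block_key(record)].append(record)

    # Step 2: compare every pair inside each block and classify it.
    matches = []
    for group in blocks.values():
        for left, right in combinations(group, 2):
            score = (similarity(left["name"], right["name"])
                     + similarity(left["surname"], right["surname"])) / 2
            if score >= threshold:  # assumed cut-off, tune per dataset
                matches.append((left["id"], right["id"]))
    return sorted(matches)


# Illustrative records, invented for this sketch.
records = [
    {"id": 1, "name": "Jonathan", "surname": "Smith"},
    {"id": 2, "name": "Jonathon", "surname": "smith "},
    {"id": 3, "name": "Maria", "surname": "Santos"},
    {"id": 4, "name": "Jon", "surname": "Taylor"},
]
print(find_matches(records))  # [(1, 2)]
```

Blocking keeps the number of comparisons manageable: record 4 is never compared with records 1 to 3 because it falls into a different block. A fixed threshold stands in here for the classification step; tools such as dedupe and Magellan learn that decision from labeled examples instead.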
Registered Packages
-
DeepMatcher
DeepMatcher is a Python package for performing entity and text matching using deep learning. It provides built-in neural networks and utilities that enable you to train and apply state-of-the-art deep learning models for entity matching in less than 10 lines of code.
This is part of the Magellan project at UW-Madison, led by Prof. AnHai Doan.