The 3 Best Open-Source Data Catalog Tools to Consider

The editors at Solutions Review have compiled this list of the best open-source data catalog tools to consider for your next project.

Searching for data cataloging software can be a daunting (and expensive) process, one that requires long hours of research and deep pockets. The most popular enterprise data catalog tools often provide more than what’s necessary for non-enterprise organizations, with advanced functionality relevant to only the most technically savvy users. Thankfully, there is a distinct group of open-source data catalog tools to choose from. Some of these solutions are offered by vendors looking to eventually sell you on their enterprise product, and others are maintained and operated by a community of developers looking to democratize the process.

In this article, we examine the best open-source data catalog tools, with a short overview of each of the options currently available in the space. This is the most complete and up-to-date directory on the web.

Amundsen

Developed by Lyft, Amundsen is an open-source data discovery and metadata engine for discovering data and generating context that shows how it is being used. It can be used by analysts, data scientists, data engineers, and software engineers, depending on the use case. The product features a PageRank-inspired search algorithm that recommends results based on names, descriptions, tags, and querying/viewing activity on the table or dashboard. There’s also automated and curated metadata that describes tables and columns, other frequent users, when the table was last updated, preview data, and more.
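
To make the idea of ranking by both text match and usage more concrete, here is a toy sketch in Python. It is not Amundsen's actual algorithm or API: the field names (name, description, tags, query_count), the weights, and the sample tables are all assumptions chosen for illustration.

```python
import math

# Hypothetical sample metadata; the field names are assumptions for this sketch.
SAMPLE_TABLES = [
    {"name": "rides_raw", "description": "Raw ride events",
     "tags": ["rides"], "query_count": 5},
    {"name": "rides_daily", "description": "Daily ride aggregates",
     "tags": ["rides", "core"], "query_count": 900},
    {"name": "drivers", "description": "Driver profiles",
     "tags": ["core"], "query_count": 5000},
]


def score(table, query):
    """Combine a simple text-match score with a usage boost.

    The weights (3 for name, 2 for tag, 1 for description) are arbitrary
    illustrative choices, not values taken from Amundsen.
    """
    tags = [tag.lower() for tag in table["tags"]]
    text_score = 0.0
    for term in query.lower().split():
        if term in table["name"].lower():
            text_score += 3.0
        if term in tags:
            text_score += 2.0
        if term in table["description"].lower():
            text_score += 1.0
    if text_score == 0:
        return 0.0  # popular but irrelevant tables should not surface
    # Dampen raw usage counts so heavy usage boosts, but does not dominate.
    return text_score * (1 + math.log1p(table["query_count"]))


def rank(tables, query):
    """Return matching tables, highest score first."""
    scored = [(score(table, query), table) for table in tables]
    matches = [pair for pair in scored if pair[0] > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [table for _, table in matches]
```

Calling rank(SAMPLE_TABLES, "rides") places the heavily queried table ahead of the rarely queried one with the same text match, and leaves out the popular table that does not match the query at all.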

CKAN

CKAN is an open-source data management system that makes data accessible by providing tools to streamline publishing, sharing, finding, and using data. The tool helps you manage and publish collections of data. Once data is published, users can use its faceted search capabilities to browse and find the data they need and preview it using maps, graphs, and tables. CKAN is built with Python on the backend and JavaScript on the frontend. It also uses the Pylons web framework and SQLAlchemy as its ORM.
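
As a rough illustration of what faceted search looks like from a client's point of view, the sketch below builds a request URL for CKAN's package_search action and reads facet counts out of a response. The base URL, the facet field names, and the sample response are assumptions made for this example; check the API documentation for the CKAN version you run before relying on the details.

```python
import json
from urllib.parse import urlencode


def build_package_search_url(base_url, query, facet_fields, rows=10):
    """Build a package_search URL that asks CKAN for facet counts."""
    params = {
        "q": query,
        "rows": rows,
        # The facet fields are passed as a JSON-encoded list.
        "facet.field": json.dumps(facet_fields),
    }
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{urlencode(params)}"


def facet_counts(response_body, field):
    """Return {facet value: dataset count} for one facet field.

    Assumes the response carries a result.search_facets mapping whose
    entries hold a list of items with "name" and "count" keys.
    """
    facet = response_body["result"]["search_facets"].get(field, {})
    return {item["name"]: item["count"] for item in facet.get("items", [])}


# Hypothetical site and a hand-written response, used only for illustration.
EXAMPLE_URL = build_package_search_url(
    "https://catalog.example.org/", "transport", ["tags", "organization"], rows=5
)
EXAMPLE_RESPONSE = {
    "success": True,
    "result": {
        "count": 2,
        "search_facets": {
            "tags": {
                "title": "tags",
                "items": [
                    {"name": "bus", "display_name": "bus", "count": 2},
                    {"name": "rail", "display_name": "rail", "count": 1},
                ],
            }
        },
    },
}
```

Here facet_counts(EXAMPLE_RESPONSE, "tags") gives the number of matching datasets per tag, which is the information a faceted search sidebar would display next to each filter.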

Magda

Magda is a federated, open-source data catalog for cataloging, enrichment, searching, tracking, and prioritization. The tool lets users find useful data via data discovery features. Magda also offers metadata enhancement and authoring tools. It can quickly crawl external data sources, track changes, make automatic enhancements, and push notifications when changes occur. Magda touts an open architecture designed as a set of microservices, along with easy setup and upgrades.

If you’re looking for an enterprise data management solution, consult our freshly updated Data Management Buyer’s Guide.

Timothy King

Senior Editor at Solutions Review
Tim is Solutions Review's Editorial Director and leads coverage on big data, business intelligence, and data analytics. A 2017 and 2018 Most Influential Business Journalist and 2021 "Who's Who" in data management and data integration, Tim is a recognized influencer and thought leader in enterprise business software. Reach him via tking at solutionsreview dot com.