Energy — Named Entity Extraction from Energy Industry Documents

This DAG automates named entity extraction from energy-related documents to improve search. It validates extraction accuracy and exposes the results through an API to support decision-making.

Overview

This DAG extracts named entities from energy-sector documents such as production reports and contracts. The pipeline ingests PDF files and Word documents from multiple sources, extracts named entities, normalizes them for consistency, and stores the results in a centralized data warehouse. Quality control checks run throughout the pipeline to validate the extracted data, with specific checks on extraction precision and processing time. Results are exposed through a RESTful API for integration with other systems and applications.

The monitored key performance indicators (KPIs) are the extraction accuracy rate and the overall processing time, which together indicate how efficiently the workflow runs. If processing fails, the DAG restarts automatically after a configurable delay, which limits disruption to data availability. The business value lies in better search across energy documents, so stakeholders can find relevant information quickly and make informed decisions.
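The restart-after-delay behavior can be illustrated with a small retry wrapper. This is a minimal sketch in plain Python, not the DAG's actual configuration: the function name `run_with_restart` and the parameters `max_restarts`, `delay_seconds`, and `sleep` are hypothetical, and a real orchestrator would normally provide this through its own retry settings.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_restart(
    task: Callable[[], T],
    max_restarts: int = 3,          # assumed limit, not taken from the DAG
    delay_seconds: float = 300.0,   # assumed default for the configurable delay
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `task`; on failure, wait `delay_seconds` and run it again.

    The last exception is re-raised once `max_restarts` restarts are used up.
    """
    restarts = 0
    while True:
        try:
            return task()
        except Exception:
            if restarts >= max_restarts:
                raise
            restarts += 1
            # Wait before restarting so a transient fault has time to clear.
            sleep(delay_seconds)
```

Passing `sleep` as a parameter keeps the wrapper testable without real waiting; production code would leave the default in place.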

Part of the Data & Model Catalog solution for the Energy industry.

Use cases

  • Improved search capabilities for energy-related documents
  • Enhanced decision-making through accurate data extraction
  • Streamlined data management and accessibility
  • Reduced manual effort in data processing tasks
  • Increased operational efficiency in the energy sector

Technical Specifications

Inputs

  • Production reports in PDF format
  • Contracts in Word format
  • Compliance documents in PDF format

Outputs

  • Extracted named entities stored in a data warehouse
  • API endpoint for accessing extracted data
  • Quality control reports on extraction accuracy
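A quality control report of the kind listed above could be computed along these lines. This is a minimal sketch under stated assumptions: the hand-labeled reference set, the `QualityReport` fields, and both thresholds (`min_precision`, `max_seconds`) are hypothetical and are not values taken from this DAG. Precision here means the share of extracted entities that also appear in the reference set.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class QualityReport:
    precision: float            # correct extractions / all extractions
    processing_seconds: float   # wall-clock time for the run
    passed: bool                # True when both thresholds are met


def build_quality_report(
    extracted: set[tuple[str, str]],   # (entity text, entity type) pairs
    reference: set[tuple[str, str]],   # assumed hand-labeled ground truth
    processing_seconds: float,
    min_precision: float = 0.9,        # assumed threshold
    max_seconds: float = 60.0,         # assumed threshold
) -> QualityReport:
    true_positives = len(extracted & reference)
    # With nothing extracted, precision is reported as 0.0 by convention here.
    precision = true_positives / len(extracted) if extracted else 0.0
    passed = precision >= min_precision and processing_seconds <= max_seconds
    return QualityReport(precision, processing_seconds, passed)
```

A run that extracts four entities, three of which match the reference set, yields a precision of 0.75 and fails the assumed 0.9 threshold.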

Processing Steps

  1. Ingest documents from various sources
  2. Extract named entities from the ingested documents
  3. Normalize extracted entities for consistency
  4. Store normalized entities in the data warehouse
  5. Perform quality checks on extracted data
  6. Expose results via API for external access
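The extract, normalize, and store steps can be sketched with the Python standard library. Everything here is a stand-in chosen for illustration: a regular expression replaces a real named entity recognition model, an in-memory SQLite database replaces the data warehouse, the documents are assumed to be already converted from PDF or Word to plain text, and all function and table names are hypothetical. The quality check and the API layer are left out.

```python
import re
import sqlite3

# Toy pattern: capitalized words followed by a company suffix.
# A real pipeline would use a trained NER model instead.
ORG_PATTERN = re.compile(r"\b(?:[A-Z][A-Za-z&]+\s+)+(?:Inc|Ltd|LLC|Corp)\b\.?")


def extract_entities(text: str) -> list[str]:
    """Step 2: return raw organization mentions found in the text."""
    return [match.group(0) for match in ORG_PATTERN.finditer(text)]


def normalize_entity(raw: str) -> str:
    """Step 3: collapse whitespace, drop a trailing period, upper-case."""
    collapsed = " ".join(raw.split())
    return collapsed.rstrip(".").upper()


def store_entities(conn: sqlite3.Connection, doc_id: str, entities: list[str]) -> None:
    """Step 4: write entities to the stand-in warehouse, skipping duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entities "
        "(doc_id TEXT, name TEXT, UNIQUE (doc_id, name))"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO entities VALUES (?, ?)",
        [(doc_id, name) for name in entities],
    )
    conn.commit()


def run_pipeline(conn: sqlite3.Connection, documents: dict[str, str]) -> None:
    """Steps 1-4 for documents given as {doc_id: plain text}."""
    for doc_id, text in documents.items():
        normalized = sorted({normalize_entity(e) for e in extract_entities(text)})
        store_entities(conn, doc_id, normalized)


def fetch_entities(conn: sqlite3.Connection, doc_id: str) -> list[str]:
    """Query an API handler might run to serve one document's entities."""
    rows = conn.execute(
        "SELECT name FROM entities WHERE doc_id = ? ORDER BY name", (doc_id,)
    )
    return [name for (name,) in rows]
```

Normalizing before storage means that variants such as "Northwind Energy Ltd." and "Northwind  Energy Ltd" collapse into a single row, and the uniqueness constraint keeps a restarted run from writing duplicates.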

Additional Information

DAG ID

WK-0888

Last Updated

2026-01-06

Downloads

71

Tags