feat: Added reading zone info from csv

2025-11-12 12:02:58 +00:00
parent be2c89bcc2
commit e38d21598f
9 changed files with 150 additions and 108 deletions
+2
@@ -13,4 +13,6 @@ dat_files/*
 asc_files/*
 csv_files/*
 combined_files/*
+zone_inputs/*
 *.tar.gz
+61 -67
@@ -1,90 +1,86 @@
 # UK Met Office Rain Radar NIMROD Data Processor
-This project provides tools for processing UK Met Office Rain Radar NIMROD image files. It allows extraction of raster data from NIMROD format files and conversion to ESRI ASCII (.asc) format with optional bounding box clipping.
+This project provides tools for processing UK Met Office Rain Radar NIMROD image files. It allows extraction of raster data from NIMROD .dat format files and conversion to ESRI ASCII (.asc) format with optional bounding box clipping.
 ## Overview
-The project consists of two main Python modules:
-- `nimrod.py`: Core library for parsing NIMROD files, extracting metadata, and converting to ASCII format
-- `batch_nimrod.py`: Script for batch processing multiple NIMROD files with configurable bounding boxes
+The project consists of a main pipeline workflow that runs multiple modules in sequence:
+- `main.py`: Main pipeline orchestrator that calls the modules as needed
+- `batch_nimrod.py`: Module for batch processing multiple NIMROD files with configurable bounding boxes
+- `generate_timeseries.py`: Module for extracting cropped rain data and creating rainfall timeseries
+- `combine_timeseries.py`: Module for combining grouped timeseries CSVs into consolidated datasets
 ## Features
-### nimrod.py
-- Parse NIMROD format files (v1.7 and v2.6-4)
-- Extract header information and metadata
-- Convert raster data to ESRI ASCII (.asc) format
-- Apply bounding box clipping to extract specific regions
-- Support for command-line usage or import as module
+### main.py
+- Orchestrates the entire workflow pipeline
+- Processes DAT files to ASC format
+- Generates timeseries data for specified locations
+- Combines grouped CSV files into consolidated datasets
 ### batch_nimrod.py
-- Process multiple NIMROD files in batches
-- Apply configurable bounding boxes per area
-- Automatically extract datetime from filenames
+- Process multiple NIMROD dat files
+- Automatically extract datetime from file data
 - Export clipped raster data to ASC format
+### generate_timeseries.py
+- Extract cropped rain data based on specified locations
+- Create rainfall timeseries CSVs for each location
+- Parse datetime from filename and create proper datetime index
+### combine_timeseries.py
+- Combine multiple timeseries CSV files into grouped datasets
+- Group locations by specified output groups
+- Create consolidated CSV files for each group
 ## Usage
-### Command Line (nimrod.py)
-```bash
-python nimrod.py [-h] [-q] [-x] [-bbox XMIN XMAX YMIN YMAX] [infile] [outfile]
-```
-Options:
-- `-h, --help`: Show help message
-- `-q, --query`: Display metadata
-- `-x, --extract`: Extract raster file in ASC format
-- `-bbox XMIN XMAX YMIN YMAX`: Bounding box to clip raster data to
-### Python Module Usage (nimrod.py)
-```python
-from nimrod import Nimrod
-# Open the .dat or NIMROD-compliant file
-a = Nimrod(open('filename.dat'))
-# Show the information about the file
-a.query()
-# Output the .asc file
-a.extract_asc(open('output.asc', 'w'))
-# Clip the file down to a box area
-a.apply_bbox(279906, 285444, 283130, 290440)
-# Show the clipped information about the file
-a.query()
-# Output the clipped .asc file
-a.extract_asc(open('clipped_output.asc', 'w'))
-```
-### Batch Processing (batch_nimrod.py)
 It is recommended to use UV for environment and package handling.
 [Link to uv install](https://docs.astral.sh/uv/getting-started/installation/)
+### Main Pipeline (main.py)
 ```bash
-uv sync
-uv run batch_nimrod.py
+uv run main.py
 ```
+The main pipeline will:
+1. Process DAT files to ASC format if needed
+2. Generate timeseries data for specified locations
+3. Combine grouped CSV files into consolidated datasets
 ## Configuration
-The `config.yaml` file defines bounding box information for different areas. Default configuration includes:
-- BRISCS: (607000, 608000, 217000, 218000)
-- WINTSC: (499000, 500000, 416000, 417000)
+The `config.py` file defines folder paths:
+- DAT_TOP_FOLDER: "./dat_files"
+- ASC_TOP_FOLDER: "./asc_files"
+- CSV_TOP_FOLDER: "./csv_files"
+- COMBINED_FOLDER: "./combined_files"
+The `main.py` script defines locations and their properties:
+- Location name (e.g., "BRICSC")
+- Location ID (e.g., "TM0816")
+- X coordinate (e.g., 608500)
+- Y coordinate (e.g., 216500)
+- Output group (e.g., 1)
 ## Directory Structure
+Inside the dat_files folder, each site short code should be its own folder with the .dat files inside it.
+The site short code folder name must match the config site short code EXACTLY.
+Each dat file should have a datetime in its name in the format yyyymmddhhmm (e.g. 202405260905).
 ```
 dat_files/
-├── BRISCS/
-│   └── *.dat files
-└── WINTSC/
-    └── *.dat files
+└── *.dat files
 asc_files/
-├── BRISCS/
-│   └── YYYYMMDDHHMM_BRISCS.asc files
-└── WINTSC/
-    └── YYYYMMDDHHMM_WINTSC.asc files
+└── *.asc files
+csv_files/
+├── TQ1234_timeseries_data.csv
+├── ...
+└── TQ5678_timeseries_data.csv
+combined_files/
+├── zone_1_timeseries_data.csv
+├── ...
+└── zone_50_timeseries_data.csv
 ```
 ## Requirements
@@ -92,16 +88,14 @@ asc_files/
 - Python 3.12+
 - [UV Installed](https://docs.astral.sh/uv/getting-started/installation/)
-## License
-Copyright (c) 2015 [Richard Thomas](https://github.com/richard-thomas/MetOffice_NIMROD)
-This program is free software: you can redistribute it and/or modify it under the terms of the Artistic License 2.0 as published by the Open Source Initiative.
-This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+## Acknowledgments
+[Richard Thomas - Original Nimrod dat to asc file conversion](https://github.com/richard-thomas/MetOffice_NIMROD)
+[Declan Valters - building the timeseries from the asc files](https://github.com/dvalters/NIMROD-toolbox)
-## Version update 2025-10-31 👻
+## Version update 2025
 Update by Jake Pullen, for the use of Anglian Water.
 Added the batch_nimrod module to convert large amounts of files
 Cleaned up the original code and added docstrings & typehints
+Added main pipeline workflow that calls the modules as needed to take the dat files and create grouped timeseries data CSVs
+2
@@ -3,4 +3,6 @@ class Config:
     ASC_TOP_FOLDER = "./asc_files"
     CSV_TOP_FOLDER = "./csv_files"
     COMBINED_FOLDER = "./combined_files"
+    ZONE_FOLDER = "./zone_inputs"
     AREAS_FILE = "areas.csv"
+    delete_dat_after_processing = False
+26 -17
@@ -1,6 +1,7 @@
 import logging
 import time
 import os
+import csv
 from pathlib import Path
 from config import Config
@@ -14,14 +15,25 @@ if __name__ == "__main__":
     os.makedirs(Path(Config.ASC_TOP_FOLDER), exist_ok=True)
     os.makedirs(Path(Config.CSV_TOP_FOLDER), exist_ok=True)
     os.makedirs(Path(Config.COMBINED_FOLDER), exist_ok=True)
-    dat_file_count = [f for f in os.listdir(Path(Config.DAT_TOP_FOLDER))]
-    asc_file_count = [f for f in os.listdir(Path(Config.ASC_TOP_FOLDER))]
-    locations = [
-        # loc name, loc id, x loc, y loc, output group
-        ["BRICSC", "TM0816", 608500, 216500, 1],
-        ["HEACSC", "TF6842", 568500, 342500, 1],
-    ]
+    locations = []
+    # Load zone inputs here
+    for file in os.listdir(Path(Config.ZONE_FOLDER)):
+        with open(Path(Config.ZONE_FOLDER, file), "r") as csvfile:
+            reader = csv.reader(csvfile)
+            header = next(reader)  # Skip header row
+            for row in reader:
+                # Extract the relevant fields: Ossheet (location ID), Easting, Northing, Zone
+                zone_id = row[1]  # Ossheet column
+                easting = int(row[2])  # Easting column
+                northing = int(row[3])  # Northing column
+                zone = int(row[6])  # ZoneID column
+                locations.append([zone_id, easting, northing, zone])
+    # Testing locations, can be removed.
+    locations.append(["TM0816", 608500, 216500, 1])
+    locations.append(["TF6842", 568500, 342500, 1])
     batch = BatchNimrod(Config)
     timeseries = GenerateTimeseries(Config)
@@ -29,24 +41,21 @@ if __name__ == "__main__":
     start = time.time()
     logging.info("Starting to process DAT to ASC")
-    if len(dat_file_count) != len(asc_file_count):
-        batch.process_nimrod_files()
-        batch_checkpoint = time.time()
-        elapsed_time = batch_checkpoint - start
-        logging.info(f"DAT to ASC completed in {elapsed_time:.2f} seconds")
-    else:
-        logging.info("No need to process DAT files, skipping...")
-        batch_checkpoint = time.time()
-        time.sleep(1)
+    batch.process_nimrod_files()
+    batch_checkpoint = time.time()
+    elapsed_time = batch_checkpoint - start
+    logging.info(f"DAT to ASC completed in {elapsed_time:.2f} seconds")
     for place in locations:
         logging.info(f"{place[0]} started generating timeseries data.")
+        place_start = time.time()
         timeseries.extract_cropped_rain_data(place)
-        place_checkpoint = time.time()
-        since_asc_create = place_checkpoint - batch_checkpoint
-        elapsed_time = place_checkpoint - start
-        logging.info(f"{place[0]} completed in {since_asc_create:.2f} seconds")
-        logging.info(f"total time so far {elapsed_time:.2f} seconds")
+        place_end = time.time()
+        place_create_time = place_end - place_start
+        elapsed_time = place_end - start
+        logging.info(f"{place[0]} completed in {place_create_time:.2f} seconds")
+        logging.info(f"Total time so far {elapsed_time:.2f} seconds")
     logging.info("combining CSVs into groups")
     combiner.combine_csv_files()
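For context outside the diff, the new zone-loading logic can be sketched as a standalone snippet. The column positions (Ossheet at index 1, Easting at 2, Northing at 3, ZoneID at 6) come from the diff's comments; the sample CSV content and the `load_locations` helper name are invented for illustration:

```python
import csv
import io

# Hypothetical sample matching the column layout assumed by the diff:
# index 1 = Ossheet (location ID), 2 = Easting, 3 = Northing, 6 = ZoneID.
SAMPLE = """RowID,Ossheet,Easting,Northing,ColA,ColB,ZoneID
1,TM0816,608500,216500,x,y,1
2,TF6842,568500,342500,x,y,1
"""

def load_locations(csvfile) -> list[list]:
    """Parse one zone-input CSV into [zone_id, easting, northing, zone] records."""
    reader = csv.reader(csvfile)
    next(reader)  # skip the header row
    return [[row[1], int(row[2]), int(row[3]), int(row[6])] for row in reader]

locations = load_locations(io.StringIO(SAMPLE))
print(locations)
```

In `main.py` the same parse runs over every file in `Config.ZONE_FOLDER`, appending all rows into one `locations` list.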
+8 -4
@@ -17,10 +17,10 @@ class BatchNimrod:
         box for each area, and exports clipped raster data to OUT_TOP_FOLDER.
         """
         # Read all file names in the folder
-        files_to_process = [f for f in os.listdir(Path(self.config.DAT_TOP_FOLDER))]
-        logging.info(f"Processing {len(files_to_process)} files...")
+        files_to_process = len(os.listdir(Path(self.config.DAT_TOP_FOLDER)))
+        logging.info(f"Processing {files_to_process} files...")
+        file_counter = 0
         for in_file in os.listdir(Path(self.config.DAT_TOP_FOLDER)):
             in_file_full = Path(self.config.DAT_TOP_FOLDER, in_file)
@@ -33,9 +33,13 @@ class BatchNimrod:
                 with open(out_file_path, "w") as outfile:
                     image.extract_asc(outfile)
-                # delete dat file here
+                if self.config.delete_dat_after_processing:
+                    os.remove(in_file_full)
+                file_counter += 1
                 logging.debug(f"Successfully processed: {in_file_full}")
+                if file_counter % 10 == 0:
+                    logging.info(f"processed {file_counter} out of {files_to_process} files")
             except Nimrod.HeaderReadError as e:
                 logging.error(f"Failed to read file {in_file_full}, is it corrupt?")
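The periodic progress report added here is a simple modulo check. A minimal sketch of just that condition (not the actual `BatchNimrod` class, which also does the file I/O):

```python
def should_report(file_counter: int) -> bool:
    """Mirror of the commit's check: report on every 10th processed file."""
    return file_counter % 10 == 0

# Which of 24 processed files would trigger a progress log line:
progress_points = [n for n in range(1, 25) if should_report(n)]
print(progress_points)
```

Counting from 1 (the counter is incremented before the check), a run of 24 files reports at files 10 and 20.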
+6 -6
@@ -1,4 +1,4 @@
-import pandas as pd
+import polars as pl
 class CombineTimeseries:
@@ -10,7 +10,7 @@ class CombineTimeseries:
     def build_location_groups(self):
         for location in self.locations:
-            group = location[4]  # output group is at index 4
+            group = location[3]  # zone number
             if group not in self.grouped_locations:
                 self.grouped_locations[group] = []
             self.grouped_locations[group].append(location)
@@ -20,12 +20,12 @@ class CombineTimeseries:
         combined_df = None
         for loc in loc_list:
             csv_to_load = f"./csv_files/{loc[0]}_timeseries_data.csv"
-            df = pd.read_csv(csv_to_load, index_col=0)
+            df = pl.read_csv(csv_to_load)
             if combined_df is None:
                 combined_df = df
             else:
-                combined_df = combined_df.join(df, how="inner")
+                combined_df = combined_df.join(df, on="datetime")
         output_file = (
-            f"{self.config.COMBINED_FOLDER}/group_{group}_timeseries_data.csv"
+            f"{self.config.COMBINED_FOLDER}/zone_{group}_timeseries_data.csv"
        )
-        combined_df.to_csv(output_file)
+        combined_df.write_csv(output_file)
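The index change in `build_location_groups` follows from the new 4-element record layout, where the zone moved from index 4 to index 3. A standalone sketch of the grouping step (the sample records are hypothetical, mirroring that layout):

```python
from collections import defaultdict

# Location records in the new layout: [zone_id, easting, northing, zone].
locations = [
    ["TM0816", 608500, 216500, 1],
    ["TF6842", 568500, 342500, 1],
    ["TQ1234", 512000, 134000, 2],  # hypothetical extra record
]

# Group records by the zone number at index 3, as build_location_groups does.
grouped_locations = defaultdict(list)
for location in locations:
    grouped_locations[location[3]].append(location)

print(sorted(grouped_locations))
print([loc[0] for loc in grouped_locations[1]])
```

Each group key then becomes one `zone_{group}_timeseries_data.csv` output file.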
+12 -10
@@ -1,7 +1,7 @@
 from __future__ import division, print_function
 import numpy as np
 from pathlib import Path
-import pandas as pd
+import polars as pl
 from datetime import datetime
 import os
@@ -36,8 +36,8 @@ class GenerateTimeseries:
         y0_radar = radar_header[3]
         x0_radar = radar_header[2]
-        y0_basin = basin_header[3]
-        x0_basin = basin_header[2]
+        y0_basin = basin_header[2]
+        x0_basin = basin_header[1]
         nrows_radar = radar_header[1]
@@ -96,15 +96,17 @@ class GenerateTimeseries:
             datetime_list.append(parsed_date)
-        # Create DataFrame with datetime index
-        df = pd.DataFrame({"rainfall": rainfile}, index=datetime_list)
+        # Create DataFrame with an explicit datetime column (polars has no index)
+        df = pl.DataFrame({"datetime": datetime_list, location[0]: rainfile})
         # Sort the dataframe into date order
-        sorted_df = df.sort_index()
-        sorted_df.to_csv(
+        sorted_df = df.sort("datetime")
+        sorted_df.write_csv(
             f"csv_files/{location[0]}_timeseries_data.csv",
-            sep=",",
-            float_format="%1.4f",
-            header=[location[1]],
-            index_label="datetime",
+            float_precision=4,
         )
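The `datetime_list` that feeds this DataFrame is built from the yyyymmddhhmm stamp the README requires in each file name. A sketch of that parse; the helper name and the exact filename layout (stamp before the first underscore) are assumptions for illustration:

```python
from datetime import datetime

def parse_stamp(filename: str) -> datetime:
    """Parse the yyyymmddhhmm stamp at the start of a data file name.

    Assumes the stamp is the leading token before the first underscore,
    e.g. "202405260905_BRISCS.asc" (a hypothetical name for illustration).
    """
    stamp = filename.split("_")[0]
    return datetime.strptime(stamp, "%Y%m%d%H%M")

parsed = parse_stamp("202405260905_BRISCS.asc")
print(parsed.isoformat())  # 2024-05-26T09:05:00
```

Sorting these parsed datetimes (rather than the raw strings) is what gives `df.sort("datetime")` a correct chronological order.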
+1
@@ -7,6 +7,7 @@ requires-python = ">=3.12"
 dependencies = [
     "numpy>=2.3.4",
     "pandas>=2.3.3",
+    "polars>=1.35.2",
     "pyyaml>=6.0.3",
     "ruff>=0.14.3",
 ]
Generated
+28
@@ -9,6 +9,7 @@ source = { virtual = "." }
 dependencies = [
     { name = "numpy" },
     { name = "pandas" },
+    { name = "polars" },
     { name = "pyyaml" },
     { name = "ruff" },
 ]
@@ -17,6 +18,7 @@ dependencies = [
 requires-dist = [
     { name = "numpy", specifier = ">=2.3.4" },
     { name = "pandas", specifier = ">=2.3.3" },
+    { name = "polars", specifier = ">=1.35.2" },
     { name = "pyyaml", specifier = ">=6.0.3" },
     { name = "ruff", specifier = ">=0.14.3" },
 ]
@@ -131,6 +133,32 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/70/44/5191d2e4026f86a2a109053e194d3ba7a31a2d10a9c2348368c63ed4e85a/pandas-2.3.3-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:3869faf4bd07b3b66a9f462417d0ca3a9df29a9f6abd5d0d0dbab15dac7abe87", size = 13202175, upload-time = "2025-09-29T23:31:59.173Z" },
 ]
+[[package]]
+name = "polars"
+version = "1.35.2"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "polars-runtime-32" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/fa/43/09d4738aa24394751cb7e5d1fc4b5ef461d796efcadd9d00c79578332063/polars-1.35.2.tar.gz", hash = "sha256:ae458b05ca6e7ca2c089342c70793f92f1103c502dc1b14b56f0a04f2cc1d205", size = 694895, upload-time = "2025-11-09T13:20:05.921Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/b4/9a/24e4b890c7ee4358964aa92c4d1865df0e8831f7df6abaa3a39914521724/polars-1.35.2-py3-none-any.whl", hash = "sha256:5e8057c8289ac148c793478323b726faea933d9776bd6b8a554b0ab7c03db87e", size = 783597, upload-time = "2025-11-09T13:18:51.361Z" },
+]
+[[package]]
+name = "polars-runtime-32"
+version = "1.35.2"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/cb/75/ac1256ace28c832a0997b20ba9d10a9d3739bd4d457c1eb1e7d196b6f88b/polars_runtime_32-1.35.2.tar.gz", hash = "sha256:6e6e35733ec52abe54b7d30d245e6586b027d433315d20edfb4a5d162c79fe90", size = 2694387, upload-time = "2025-11-09T13:20:07.624Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/66/de/a532b81e68e636483a5dd764d72e106215543f3ef49a142272b277ada8fe/polars_runtime_32-1.35.2-cp39-abi3-macosx_10_12_x86_64.whl", hash = "sha256:e465d12a29e8df06ea78947e50bd361cdf77535cd904fd562666a8a9374e7e3a", size = 40524507, upload-time = "2025-11-09T13:18:55.727Z" },
+    { url = "https://files.pythonhosted.org/packages/2d/0b/679751ea6aeaa7b3e33a70ba17f9c8150310792583f3ecf9bb1ce15fe15c/polars_runtime_32-1.35.2-cp39-abi3-macosx_11_0_arm64.whl", hash = "sha256:ef2b029b78f64fb53f126654c0bfa654045c7546bd0de3009d08bd52d660e8cc", size = 36700154, upload-time = "2025-11-09T13:18:59.78Z" },
+    { url = "https://files.pythonhosted.org/packages/e2/c8/fd9f48dd6b89ae9cff53d896b51d08579ef9c739e46ea87a647b376c8ca2/polars_runtime_32-1.35.2-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:85dda0994b5dff7f456bb2f4bbd22be9a9e5c5e28670e23fedb13601ec99a46d", size = 41317788, upload-time = "2025-11-09T13:19:03.949Z" },
+    { url = "https://files.pythonhosted.org/packages/67/89/e09d9897a70b607e22a36c9eae85a5b829581108fd1e3d4292e5c0f52939/polars_runtime_32-1.35.2-cp39-abi3-manylinux_2_24_aarch64.whl", hash = "sha256:3b9006902fc51b768ff747c0f74bd4ce04005ee8aeb290ce9c07ce1cbe1b58a9", size = 37850590, upload-time = "2025-11-09T13:19:08.154Z" },
+    { url = "https://files.pythonhosted.org/packages/dc/40/96a808ca5cc8707894e196315227f04a0c82136b7fb25570bc51ea33b88d/polars_runtime_32-1.35.2-cp39-abi3-win_amd64.whl", hash = "sha256:ddc015fac39735592e2e7c834c02193ba4d257bb4c8c7478b9ebe440b0756b84", size = 41290019, upload-time = "2025-11-09T13:19:12.214Z" },
+    { url = "https://files.pythonhosted.org/packages/f4/d1/8d1b28d007da43c750367c8bf5cb0f22758c16b1104b2b73b9acadb2d17a/polars_runtime_32-1.35.2-cp39-abi3-win_arm64.whl", hash = "sha256:6861145aa321a44eda7cc6694fb7751cb7aa0f21026df51b5faa52e64f9dc39b", size = 36955684, upload-time = "2025-11-09T13:19:15.666Z" },
+]
 [[package]]
 name = "python-dateutil"
 version = "2.9.0.post0"