Compare commits

14 Commits

| Author | SHA1 | Date |
|---|---|---|
|  | 7b929b3b84 |  |
|  | 2c4c4a3f4e |  |
|  | ad6b31e644 |  |
|  | d386317957 |  |
|  | 1c6418e044 |  |
|  | 85deee7843 |  |
|  | e4f8c2d502 |  |
|  | 59f459d4d0 |  |
|  | 84ba6c837c |  |
|  | c415b81bc8 |  |
|  | bd0a421bb9 |  |
|  | 4bd32641bd |  |
|  | 83405eb17e |  |
|  | 009c40e08a |  |
**.gitignore** (+6 −1)

```diff
@@ -9,10 +9,15 @@ wheels/
 # Virtual environments
 .venv
 
+dat_other/*
+tar_files/*
+gz_files/*
 dat_files/*
 asc_files/*
 csv_files/*
 combined_files/*
 zone_inputs/*
 
 *.tar.gz
 
+generate_test_data.py
```
**README.md**

````diff
@@ -1,35 +1,43 @@
 # UK Met Office Rain Radar NIMROD Data Processor
 
-This project provides tools for processing UK Met Office Rain Radar NIMROD image files. It allows extraction of raster data from NIMROD .dat format files and conversion to ESRI ASCII (.asc) format. It also allows the creation of timeseries data from the ASC files.
+This project provides tools for processing UK Met Office Rain Radar NIMROD image files. It allows extraction of raster data from NIMROD .dat format files and conversion to ESRI ASCII (.asc) format. It also allows the creation of timeseries data from the ASC files, formatted for InfoWorks ICM.
 
 ## Overview
 
 The project consists of a main pipeline workflow that processes multiple modules in sequence:
 
 - `main.py`: Main pipeline orchestrator that calls on the modules as needed
 - `batch_nimrod.py`: Module for batch processing multiple NIMROD files with configurable bounding boxes
 - `generate_timeseries.py`: Module for extracting cropped rain data and creating rainfall timeseries
-- `combine_timeseries.py`: Module for combining grouped timeseries CSVs into consolidated datasets
+- `extract.py`: Module for extracting the .dat files from the .gz.tar files that are downloaded from source
 
 ## Features
 
 ### main.py
-- Orchestrates the entire workflow pipeline
-- Processes DAT files to ASC format
-- Generates timeseries data for specified locations
-- Combines grouped CSV files into consolidated datasets
+- **Startup Safety Check**: Scans the `COMBINED_FOLDER` at startup and warns the user if existing files are found, offering a chance to abort to prevent accidental data mixing.
+- **Batch Processing**: Processes input tar files in configurable batches to manage resource usage.
+- **End-to-End Processing**: Extracts GZ files, processes DAT/ASC, and appends to CSV in a single thread per file.
+- **Concurrency**: Uses multi-threading to process individual GZ files within a batch concurrently.
+- **Cumulative Data**: Automatically appends new query results to the existing CSV files in `COMBINED_FOLDER` for each batch, ensuring no data is lost and columns are correctly aligned.
+- **Dynamic ETA**: Provides a real-time estimate of completion time.
+
+### extract.py
+
+- Converts each .gz.tar file to 288 .gz files (one day of data)
+- Converts all .gz files to .dat files ready for processing.
 
 ### batch_nimrod.py
 
 - Process multiple NIMROD dat files
 - Automatically extract datetime from file data
 - Export clipped raster data to ASC format
 
 ### generate_timeseries.py
 
 - Extract cropped rain data based on specified locations
 - Create rainfall timeseries CSVs for each location
 - Parse datetime from filename and create proper datetime index
 
-### combine_timeseries.py
-- Combine multiple timeseries CSV files into grouped datasets
 - Group locations by specified output groups
 - Create consolidated CSV files for each group
@@ -44,34 +52,41 @@ It is recommended to use UV for environment and package handling.
 
 1. Ensure all required packages are installed `uv sync`
 1. Adjust the config.py file to match your needs.
-1. Ensure your .dat files are in the DAT_TOP_FOLDER (as per config location)
+1. Ensure your .gz.tar files are in the TAR_TOP_FOLDER (as per config location)
 1. Ensure your zone csv files are in the ZONE_FOLDER (as per config location)
-1. Run Main Pipeline `uv run main.py`
+1. Run Main Pipeline `uv run main.py`. Note that you will have to set the environment variable `PYTHON_GIL=0` first
 1. Find the output in the COMBINED_FOLDER (as per config location)
 
 The main pipeline will:
-1. Process DAT files to ASC format if needed
-2. Generate timeseries data for specified locations
-3. Combine grouped CSV files into consolidated datasets
+1. Uncompress the .gz.tar files ready for processing
+1. Process DAT files to ASC format
+1. Generate timeseries data for specified locations
+1. Combine grouped locations into consolidated datasets
 
 ## Configuration
 
-The `config.py` file defines folder paths:
-- DAT_TOP_FOLDER: "./dat_files"
-- ASC_TOP_FOLDER: "./asc_files"
-- CSV_TOP_FOLDER: "./csv_files"
-- COMBINED_FOLDER: "./combined_files"
+The `config.py` file defines folder paths and file deletion options:
+- TAR_TOP_FOLDER = "./tar_files"
+- GZ_TOP_FOLDER = "./gz_files"
+- DAT_TOP_FOLDER = "./dat_files"
+- ASC_TOP_FOLDER = "./asc_files"
+- COMBINED_FOLDER = "./combined_files"
+- ZONE_FOLDER = "./zone_inputs"
+- BATCH_SIZE = 5 (Number of tar files to process per batch)
 
 Example of how the zone csv files should look:
-```
-filler, zone_name, easting, northing, other_filler, last_filler, zone_number
-aa, TM0816, 608500, 216500, a, a, 1
-aa, TF6842, 568500, 342500, a, a, 1
-```
+```csv
+1K Grid, easting, northing, zone_number
+TM0816, 608500, 216500, 1
+TF6842, 568500, 342500, 1
+```
 
 ## Acknowledgments
 
 Thank you to the following projects for their inspiration and code:
-* [Richard Thomas - Original Nimrod dat to asc file conversion](https://github.com/richard-thomas/MetOffice_NIMROD)
-* [Declan Valters - building the timeseries from the asc files](https://github.com/dvalters/NIMROD-toolbox)
+- [Richard Thomas - Original Nimrod dat to asc file conversion](https://github.com/richard-thomas/MetOffice_NIMROD)
+- [Declan Valters - building the timeseries from the asc files](https://github.com/dvalters/NIMROD-toolbox)
````
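The zone CSV layout shown in the README maps directly onto the row parsing done at startup in `main.py`. A minimal, self-contained sketch of reading one such file (the `load_zones` helper name is hypothetical, not part of the project):

```python
import csv
import io

# Hypothetical helper mirroring the zone-CSV parsing in main.py.
# Columns are: 1K Grid, easting, northing, zone_number.
def load_zones(csvfile):
    reader = csv.reader(csvfile)
    next(reader)  # skip the header row
    locations = []
    for row in reader:
        grid_name = row[0].strip()   # 1K grid name
        easting = int(row[1])        # easting column
        northing = int(row[2])       # northing column
        zone = int(row[3])           # zone number column
        locations.append([grid_name, easting, northing, zone])
    return locations

sample = """1K Grid, easting, northing, zone_number
TM0816, 608500, 216500, 1
TF6842, 568500, 342500, 1
"""
locations = load_zones(io.StringIO(sample))
```

Note that `int()` tolerates the leading spaces left by the `", "` separators in the example file.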
**config.py**

```diff
@@ -1,10 +1,15 @@
 class Config:
+    TAR_TOP_FOLDER = "./tar_files"
+    GZ_TOP_FOLDER = "./gz_files"
     DAT_TOP_FOLDER = "./dat_files"
     ASC_TOP_FOLDER = "./asc_files"
-    CSV_TOP_FOLDER = "./csv_files"
     COMBINED_FOLDER = "./combined_files"
 
     ZONE_FOLDER = "./zone_inputs"
 
-    delete_dat_after_processing = False
+    delete_tar_after_processing = False
+    delete_gz_after_processing = True
+    delete_dat_after_processing = True
     delete_asc_after_processing = True
-    delete_csv_after_combining = True
+
+    BATCH_SIZE = 5
```
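The new `BATCH_SIZE` setting drives the batching loop added to `main.py`: the sorted tar-file list is consumed in slices of `BATCH_SIZE`. A minimal sketch of that chunking, using a stand-in list instead of a real folder listing:

```python
BATCH_SIZE = 5  # mirrors Config.BATCH_SIZE

# Stand-in for os.listdir(Config.TAR_TOP_FOLDER); file names are illustrative.
all_tar_files = sorted(f"day_{n:02d}.tar" for n in range(12))

# Same slicing as the batch loop in main.py: step by BATCH_SIZE,
# take up to BATCH_SIZE files each iteration.
batches = [
    all_tar_files[i : i + BATCH_SIZE]
    for i in range(0, len(all_tar_files), BATCH_SIZE)
]
```

With 12 files and a batch size of 5, the final batch simply holds the 2 leftovers; no padding is needed.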
**main.py**

```diff
@@ -2,59 +2,222 @@ import logging
 import time
 import os
 import csv
+import concurrent.futures
 from pathlib import Path
+import shutil
 
 from config import Config
-from modules import BatchNimrod, GenerateTimeseries, CombineTimeseries
+from modules import BatchNimrod, GenerateTimeseries, Extract
 
 logging.basicConfig(
     level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s"
 )
 
 
+def process_pipeline(gz_file_path):
+    # 1. Extract GZ to DAT
+    gz_path = Path(gz_file_path)
+    # The dat file name is derived from the gz file name (removing .gz or .dat.gz)
+    # gz files are named like 'NAME.dat.gz' often.
+    dat_filename = gz_path.name.replace(".gz", "")
+    dat_path = Path(Config.DAT_TOP_FOLDER, dat_filename)
+
+    # Extract
+    try:
+        extraction.process_single_gz(gz_path, dat_path)
+    except Exception as e:
+        logging.error(f"Failed to extract {gz_path}: {e}")
+        return None
+
+    if not dat_path.exists():
+        logging.error(f"DAT file not found after extraction: {dat_path}")
+        return None
+
+    # 2. Process DAT to ASC
+    # BatchNimrod._process_single_file expects just the filename, not full path
+    asc_file = batch._process_single_file(dat_filename)
+    if not asc_file:
+        # Cleanup failed DAT file if needed (BatchNimrod might have done it or not)
+        if Config.delete_dat_after_processing and dat_path.exists():
+            try:
+                os.remove(dat_path)
+            except OSError:
+                pass
+        return None
+
+    # 3. Extract data from ASC
+    file_results = timeseries.process_asc_file(asc_file, locations)
+
+    return file_results
+
+
+def initialise_folders():
+    folder_list = [
+        Config.ASC_TOP_FOLDER,
+        Config.COMBINED_FOLDER,
+        Config.GZ_TOP_FOLDER,
+        Config.DAT_TOP_FOLDER,
+        Config.TAR_TOP_FOLDER,
+    ]
+    for path in folder_list:
+        Path(path).mkdir(exist_ok=True)
+
+
 if __name__ == "__main__":
-    os.makedirs(Path(Config.ASC_TOP_FOLDER), exist_ok=True)
-    os.makedirs(Path(Config.CSV_TOP_FOLDER), exist_ok=True)
-    os.makedirs(Path(Config.COMBINED_FOLDER), exist_ok=True)
+    initialise_folders()
 
     locations = []
-    #load zone inputs here
+    zones = set()
+    # load zone inputs here
     for file in os.listdir(Path(Config.ZONE_FOLDER)):
-        with open(Path(Config.ZONE_FOLDER,file), 'r') as csvfile:
+        with open(Path(Config.ZONE_FOLDER, file), "r") as csvfile:
             reader = csv.reader(csvfile)
             header = next(reader)  # Skip header row
             for row in reader:
-                # Extract the relevant fields: Ossheet (location ID), Easting, Northing, Zone
-                zone_id = row[1]  # Ossheet column
-                easting = int(row[2])  # Easting column
-                northing = int(row[3])  # Northing column
-                zone = int(row[6])  # ZoneID column
-                locations.append([zone_id, easting, northing, zone])
+                # Extract the relevant fields: 1K Grid, Easting, Northing, Zone
+                grid_name = row[0]  # 1k Grid name
+                easting = int(row[1])  # Easting column
+                northing = int(row[2])  # Northing column
+                zone = int(row[3])  # ZoneID column
+                locations.append([grid_name, easting, northing, zone])
+                zones.add(zone)
+    logging.info(f"Count of 1km Grids: {len(locations)}")
+    logging.info(f"Count of Zones: {len(zones)}")
+
+    # Check for existing combined files
+    existing_combined = os.listdir(Config.COMBINED_FOLDER)
+    if existing_combined:
+        logging.warning("!" * 80)
+        logging.warning(
+            f"Found {len(existing_combined)} files in {Config.COMBINED_FOLDER}"
+        )
+        logging.warning(
+            "You may want to remove these before continuing to avoid duplicates or messy data."
+        )
+        logging.warning("!" * 80)
+        response = input("Continue? (Y/N): ").strip().lower()
+        if response != "y":
+            logging.info("Aborting...")
+            exit(0)
 
+    extraction = Extract(Config)
     batch = BatchNimrod(Config)
-    timeseries = GenerateTimeseries(Config)
-    combiner = CombineTimeseries(Config, locations)
+    timeseries = GenerateTimeseries(Config, locations)
 
     start = time.time()
-    logging.info("Starting to process DAT to ASC")
+    logging.info(
+        "Starting interleaved processing of GZ files -> DAT -> ASC -> Timeseries"
+    )
 
-    batch.process_nimrod_files()
-    batch_checkpoint = time.time()
-    elapsed_time = batch_checkpoint - start
-    logging.info(f"DAT to ASC completed in {elapsed_time:.2f} seconds")
+    # Get list of all tar files
+    all_tar_files = [f for f in os.listdir(Config.TAR_TOP_FOLDER) if f.endswith(".tar")]
+    all_tar_files.sort()
+    total_tars = len(all_tar_files)
+    files_per_tar = 288
+    estimated_total_files = total_tars * files_per_tar
+    logging.info(f"Found {total_tars} tar files to process")
 
-    logging.info("Starting generating timeseries data for all locations.")
-    place_start = time.time()
-    timeseries.extract_data_for_all_locations(locations)
-    place_end = time.time()
-    place_create_time = place_end - place_start
-    elapsed_time = place_end - start
-    logging.info(f"Timeseries generation completed in {place_create_time:.2f} seconds")
-    logging.info(f"Total time so far {elapsed_time:.2f} seconds")
-
-    logging.info("combining CSVs into groups")
-    combiner.combine_csv_files()
-    logging.info("CSVs combined!")
+    # Process in batches
+    for i in range(0, total_tars, Config.BATCH_SIZE):
+        batch_files = all_tar_files[i : i + Config.BATCH_SIZE]
+        logging.info(
+            f"Processing batch {i // Config.BATCH_SIZE + 1}: {len(batch_files)} tar files"
+        )
+
+        # Initialize results structure for this batch
+        results = {loc[0]: {"dates": [], "values": []} for loc in locations}
+
+        # 1. Extract batch (TAR -> GZ)
+        logging.info("Extracting tar files for batch")
+        extraction.extract_tar_batch(batch_files)
+        # Note: We do NOT run extract_gz_batch anymore. We will find GZ files and process them.
+
+        # Get list of GZ files (recursively or flat?)
+        # extract_tar_batch puts them in GZ_TOP_FOLDER/tar_name_without_ext
+        # So we need to look there.
+        # Ideally we know where we put them.
+
+        gz_files_to_process = []
+        for tar_file in batch_files:
+            extract_folder = Path(Config.GZ_TOP_FOLDER, tar_file.replace(".tar", ""))
+            if extract_folder.exists():
+                for root, _, files in os.walk(extract_folder):
+                    for file in files:
+                        if file.endswith(".gz"):
+                            gz_files_to_process.append(Path(root, file))
+
+        total_files = len(gz_files_to_process)
+        logging.info(f"Found {total_files} GZ files to process concurrently...")
+
+        with concurrent.futures.ThreadPoolExecutor() as executor:
+            future_to_file = {
+                executor.submit(process_pipeline, gz_file): gz_file
+                for gz_file in gz_files_to_process
+            }
+
+            completed_count = 0
+            try:
+                for future in concurrent.futures.as_completed(future_to_file):
+                    file_results = future.result()
+                    if file_results:
+                        for res in file_results:
+                            zone_id = res["zone_id"]
+                            results[zone_id]["dates"].append(res["date"])
+                            results[zone_id]["values"].append(res["value"])
+
+                    completed_count += 1
+                    if completed_count % 100 == 0:
+                        files_processed_previous = i * files_per_tar
+                        files_processed_so_far = (
+                            files_processed_previous + completed_count
+                        )
+
+                        elapsed_time = time.time() - start
+                        rate_per_second = files_processed_so_far / elapsed_time
+
+                        remaining_files = estimated_total_files - files_processed_so_far
+
+                        if rate_per_second > 0:
+                            eta_seconds = remaining_files / rate_per_second
+
+                            if eta_seconds < 60:
+                                eta_str = f"{int(eta_seconds)}s"
+                            elif eta_seconds < 3600:
+                                eta_str = f"{int(eta_seconds // 60)}m {int(eta_seconds % 60)}s"
+                            else:
+                                eta_str = f"{int(eta_seconds // 3600)}h {int((eta_seconds % 3600) // 60)}m"
+                        else:
+                            eta_str = "Unknown"
+
+                        logging.info(f"""Progress: {files_processed_so_far}/{estimated_total_files} files ({files_processed_so_far / estimated_total_files * 100:.1f}%)
+Speed: {rate_per_second * 60:.2f} files/min. ETA: {eta_str}""")
+            except KeyboardInterrupt:
+                logging.warning(
+                    "KeyboardInterrupt received. Cancelling pending tasks..."
+                )
+                executor.shutdown(wait=False, cancel_futures=True)
+                raise
+
+        logging.info("Appending batch results to CSV files...")
+        timeseries.append_results_to_csv(results, locations)
+
+        # Cleanup GZ folders for this batch
+        # We loop through batch_files again to delete the folders we created
+        for tar_file in batch_files:
+            extract_folder = Path(Config.GZ_TOP_FOLDER, tar_file.replace(".tar", ""))
+            if extract_folder.exists():
+                try:
+                    shutil.rmtree(extract_folder)
+                except OSError as e:
+                    logging.warning(f"Failed to remove GZ folder {extract_folder}: {e}")
 
     end = time.time()
     elapsed_time = end - start
 
-    logging.info(f"All Complete total time {elapsed_time:.2f} seconds")
+    if elapsed_time < 60:
+        elapsed_time_str = f"{int(elapsed_time)}s"
+    elif elapsed_time < 3600:
+        elapsed_time_str = f"{int(elapsed_time // 60)}m {int(elapsed_time % 60)}s"
+    else:
+        elapsed_time_str = f"{int(elapsed_time // 3600)}h {int((elapsed_time % 3600) // 60)}m"
+
+    logging.info(f"All Complete total time {elapsed_time_str}")
```
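The ETA string and the final elapsed-time string in the `main.py` changes use the same seconds-to-`h`/`m`/`s` branching in two places; it could be factored into a small helper (`format_duration` is a hypothetical name, not part of the diff):

```python
def format_duration(seconds: float) -> str:
    # Same branching as the ETA / elapsed-time formatting in main.py:
    # under a minute -> "Ns", under an hour -> "Nm Ns", otherwise "Nh Nm".
    seconds = int(seconds)
    if seconds < 60:
        return f"{seconds}s"
    if seconds < 3600:
        return f"{seconds // 60}m {seconds % 60}s"
    return f"{seconds // 3600}h {(seconds % 3600) // 60}m"
```

Both the per-batch progress log and the final summary could then call the one function, keeping the two displays consistent.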
**modules/__init__.py** (+2 −7)

```diff
@@ -1,11 +1,6 @@
 from .nimrod import Nimrod
 from .batch_nimrod import BatchNimrod
 from .generate_timeseries import GenerateTimeseries
-from .combine_timeseries import CombineTimeseries
+from .extract import Extract
 
-__all__ = [
-    "Nimrod",
-    "BatchNimrod",
-    "GenerateTimeseries",
-    "CombineTimeseries"
-]
+__all__ = ["Nimrod", "BatchNimrod", "GenerateTimeseries", "Extract"]
```
**batch_nimrod.py** (+26 −15)

```diff
@@ -5,27 +5,26 @@ import logging
 import concurrent.futures
 
-
 class BatchNimrod:
     def __init__(self, config) -> None:
         self.config = config
 
     def _process_single_file(self, in_file):
         """Process a single Nimrod DAT file.
 
         Args:
             in_file (str): Filename of the DAT file.
 
         Returns:
             bool: True if successful, False otherwise.
         """
         in_file_full = Path(self.config.DAT_TOP_FOLDER, in_file)
 
         try:
             # We need to open the file here, inside the thread
             with open(in_file_full, "rb") as f:
                 image = Nimrod(f)
 
             out_file_name = f"{image.get_validity_time()}.asc"
             out_file_path = Path(self.config.ASC_TOP_FOLDER, out_file_name)
@@ -36,7 +35,7 @@ class BatchNimrod:
                 os.remove(in_file_full)
 
             logging.debug(f"Successfully processed: {in_file_full}")
-            return True
+            return out_file_name
 
         except Nimrod.HeaderReadError as e:
             logging.error(f"Failed to read file {in_file_full}, is it corrupt?")
@@ -59,21 +58,33 @@ class BatchNimrod:
         box for each area, and exports clipped raster data to OUT_TOP_FOLDER.
         """
         # Read all file names in the folder
-        files_to_process = [f for f in os.listdir(Path(self.config.DAT_TOP_FOLDER)) if not f.startswith('.')]
+        files_to_process = [
+            f
+            for f in os.listdir(Path(self.config.DAT_TOP_FOLDER))
+            if not f.startswith(".")
+        ]
         total_files = len(files_to_process)
 
         logging.info(f"Processing {total_files} files concurrently...")
 
         with concurrent.futures.ThreadPoolExecutor() as executor:
             # Submit all tasks
             future_to_file = {
                 executor.submit(self._process_single_file, in_file): in_file
                 for in_file in files_to_process
             }
 
-            completed_count = 0
-            for future in concurrent.futures.as_completed(future_to_file):
-                completed_count += 1
-                if completed_count % 10 == 0:
-                    logging.info(f'processed {completed_count} out of {total_files} files')
+            completed_count = 0
+            try:
+                for future in concurrent.futures.as_completed(future_to_file):
+                    completed_count += 1
+                    if completed_count % 10 == 0:
+                        logging.info(
+                            f"processed {completed_count} out of {total_files} files"
+                        )
+            except KeyboardInterrupt:
+                logging.warning(
+                    "KeyboardInterrupt received. Cancelling pending tasks..."
+                )
+                executor.shutdown(wait=False, cancel_futures=True)
+                raise
```
**combine_timeseries.py** (deleted)

```diff
@@ -1,37 +0,0 @@
-import polars as pd
-import os
-
-
-class CombineTimeseries:
-    def __init__(self, config, locations):
-        self.config = config
-        self.locations = locations
-        self.grouped_locations = {}
-        self.build_location_groups()
-
-    def build_location_groups(self):
-        for location in self.locations:
-            group = location[3]  # zone number
-            if group not in self.grouped_locations:
-                self.grouped_locations[group] = []
-            self.grouped_locations[group].append(location)
-
-    def combine_csv_files(self):
-        for group, loc_list in self.grouped_locations.items():
-            combined_df = None
-            for loc in loc_list:
-                csv_to_load = f"./csv_files/{loc[0]}_timeseries_data.csv"
-                df = pd.read_csv(csv_to_load)
-                if combined_df is None:
-                    combined_df = df
-                else:
-                    combined_df = combined_df.join(df, on='datetime')
-
-                if self.config.delete_csv_after_combining:
-                    os.remove(csv_to_load)
-
-            output_file = (
-                f"{self.config.COMBINED_FOLDER}/zone_{group}_timeseries_data.csv"
-            )
-            sorted_df = combined_df.sort('datetime')
-            sorted_df.write_csv(output_file)
```
**extract.py** (new file, executable, +72)

```diff
@@ -0,0 +1,72 @@
+import tarfile
+import gzip
+import shutil
+import os
+from pathlib import Path
+import concurrent.futures
+
+
+class Extract:
+    # Directory containing .tar files
+    def __init__(self, Config):
+        self.config = Config
+
+    def extract_tar_batch(self, tar_files):
+        for tar_file in tar_files:
+            tar_path = Path(self.config.TAR_TOP_FOLDER, tar_file)
+
+            # Create a folder for extracted tar contents
+            extract_folder = Path(
+                self.config.GZ_TOP_FOLDER, tar_file.replace(".tar", "")
+            )
+            Path(extract_folder).mkdir(exist_ok=True)
+
+            # Extract .tar file
+            with tarfile.open(tar_path, "r") as tar:
+                tar.extractall(path=extract_folder)
+
+            if self.config.delete_tar_after_processing:
+                os.remove(tar_path)
+
+    def process_single_gz(self, gz_path, dat_path):
+        try:
+            with gzip.open(gz_path, "rb") as f_in:
+                with open(dat_path, "wb") as f_out:
+                    shutil.copyfileobj(f_in, f_out)
+
+            if self.config.delete_gz_after_processing:
+                os.remove(gz_path)
+        except Exception as e:
+            print(f"Error extracting {gz_path}: {e}")
+
+    def extract_gz_batch(self):
+        gz_tasks = []
+        for root, _, files in os.walk(self.config.GZ_TOP_FOLDER):
+            for file in files:
+                # only handle .gz files
+                if not file.endswith(".dat.gz"):
+                    continue
+
+                gz_path = Path(root, file)
+                dat_path = Path(self.config.DAT_TOP_FOLDER, file.replace(".gz", ""))
+                gz_tasks.append((gz_path, dat_path))
+
+        print(f"Extracting {len(gz_tasks)} gz files concurrently...")
+
+        with concurrent.futures.ThreadPoolExecutor() as executor:
+            futures = [
+                executor.submit(self.process_single_gz, gz_path, dat_path)
+                for gz_path, dat_path in gz_tasks
+            ]
+            concurrent.futures.wait(futures)
+
+        try:
+            shutil.rmtree(self.config.GZ_TOP_FOLDER)
+            # Recreate the folder for the next batch
+            Path(self.config.GZ_TOP_FOLDER).mkdir(exist_ok=True)
+            print("processing complete and GZ files deleted")
+        except Exception as e:
+            print(str(e))
+            print(
+                f"processing complete but GZ folder delete failed. Please delete manually ({self.config.GZ_TOP_FOLDER})"
+            )
```
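The two-stage unpacking in `extract.py` (`tarfile` to recover the `.gz` members, then `gzip` plus `shutil.copyfileobj` to recover each `.dat`) can be exercised end-to-end with the standard library alone. A self-contained round-trip sketch under a temporary directory (all file names here are illustrative):

```python
import gzip
import shutil
import tarfile
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    tmp = Path(tmpdir)
    payload = b"nimrod raster bytes"

    # Build a .dat.gz member, then wrap it in a tar (mimicking the source archives).
    gz_member = tmp / "sample.dat.gz"
    with gzip.open(gz_member, "wb") as f:
        f.write(payload)
    tar_path = tmp / "day.tar"
    with tarfile.open(tar_path, "w") as tar:
        tar.add(gz_member, arcname="sample.dat.gz")

    # Stage 1: tar -> folder of .gz files (as extract_tar_batch does).
    extract_folder = tmp / "day"
    extract_folder.mkdir(exist_ok=True)
    with tarfile.open(tar_path, "r") as tar:
        tar.extractall(path=extract_folder)

    # Stage 2: .gz -> .dat (as process_single_gz does).
    dat_path = tmp / "sample.dat"
    with gzip.open(extract_folder / "sample.dat.gz", "rb") as f_in:
        with open(dat_path, "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)

    recovered = dat_path.read_bytes()
```

The recovered bytes match the original payload, which is the invariant the real pipeline relies on before handing `.dat` files to `BatchNimrod`.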
+116
-51
@@ -5,12 +5,13 @@ import polars as pd
|
|||||||
from datetime import datetime
|
from datetime import datetime
|
||||||
import os
|
import os
|
||||||
import concurrent.futures
|
import concurrent.futures
|
||||||
|
import logging
|
||||||
|
|
||||||
|
|
||||||
class GenerateTimeseries:
|
class GenerateTimeseries:
|
||||||
def __init__(self, config):
|
def __init__(self, config, locations):
|
||||||
self.config = config
|
self.config = config
|
||||||
|
self.locations = locations
|
||||||
|
|
||||||
def _read_ascii_header(self, ascii_raster_file: str) -> list:
|
def _read_ascii_header(self, ascii_raster_file: str) -> list:
|
||||||
"""Reads header information from an ASCII DEM
|
"""Reads header information from an ASCII DEM
|
||||||
@@ -63,19 +64,19 @@ class GenerateTimeseries:
|
|||||||
|
|
||||||
return int(start_col), int(start_row), int(end_col), int(end_row)
|
return int(start_col), int(start_row), int(end_col), int(end_row)
|
||||||
|
|
||||||
def _process_single_file(self, file_name, locations):
|
def process_asc_file(self, file_name, locations):
|
||||||
"""Process a single ASC file and extract data for all locations.
|
"""Process a single ASC file and extract data for all locations.
|
||||||
|
|
||||||
Args:
|
Args:
|
||||||
file_name (str): Name of the ASC file.
|
file_name (str): Name of the ASC file.
|
||||||
locations (list): List of locations.
|
locations (list): List of locations.
|
||||||
|
|
||||||
Returns:
|
Returns:
|
||||||
list: A list of dictionaries containing extracted data for each location,
|
list: A list of dictionaries containing extracted data for each location,
|
||||||
or None if processing fails.
|
or None if processing fails.
|
||||||
Format: [{'zone_id': id, 'date': datetime, 'value': float}, ...]
|
Format: [{'zone_id': id, 'date': datetime, 'value': float}, ...]
|
||||||
"""
|
"""
|
||||||
if not file_name.endswith('.asc'):
|
if not file_name.endswith(".asc"):
|
||||||
return None
|
return None
|
||||||
|
|
||||||
file_path = Path(self.config.ASC_TOP_FOLDER, file_name)
|
file_path = Path(self.config.ASC_TOP_FOLDER, file_name)
|
||||||
@@ -83,7 +84,7 @@ class GenerateTimeseries:
         try:
             radar_header = self._read_ascii_header(str(file_path))

             # Read grid once
             cur_rawgrid = np.loadtxt(file_path, skiprows=6, dtype=float, delimiter=None)

@@ -96,14 +97,14 @@ class GenerateTimeseries:
             # Extract data for each location
             for location in locations:
                 zone_id = location[0]

                 # Calculate crop coordinates
                 start_col, start_row, end_col, end_row = self._calculate_crop_coords(
                     location, radar_header
                 )

                 cur_croppedrain = cur_rawgrid[start_row:end_row, start_col:end_col]

                 if cur_croppedrain.size > 2:
                     val = cur_croppedrain.flatten()[2] / 32
                 else:
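The `/ 32` in the hunk above reflects how NIMROD rain-rate payloads are scaled: the grid holds integers that (per the Met Office NIMROD format) are typically in units of 1/32 mm/h, so dividing by 32 recovers mm/h. A minimal sketch with hypothetical grid values, not project data:

```python
import numpy as np

# Hypothetical NIMROD-style scaled integers; dividing by 32 is assumed
# to convert to mm/h, matching the `val = ... / 32` line in the diff.
raw = np.array([[64, 96], [32, 0]], dtype=np.int16)

rain_mm_per_h = raw / 32  # element-wise float division

print(rain_mm_per_h.tolist())  # [[2.0, 3.0], [1.0, 0.0]]
```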
@@ -111,17 +112,12 @@ class GenerateTimeseries:
                     # print(f"Warning: Crop too small for {zone_id} in {file_name}")
                     val = 0.0

-                results.append({
-                    'zone_id': zone_id,
-                    'date': parsed_date,
-                    'value': val
-                })
+                results.append({"zone_id": zone_id, "date": parsed_date, "value": val})

             if self.config.delete_asc_after_processing:
                 os.remove(file_path)

             return results

         except Exception as e:
             print(f"Error processing file {file_name}: {e}")
@@ -134,7 +130,7 @@ class GenerateTimeseries:
            locations (list): List of location data [zone_id, easting, northing, zone]
        """
        # Initialize data structure to hold results: {zone_id: {'dates': [], 'values': []}}
-       results = {loc[0]: {'dates': [], 'values': []} for loc in locations}
+       results = {loc[0]: {"dates": [], "values": []} for loc in locations}

        # Get list of ASC files
        asc_files = sorted(os.listdir(Path(self.config.ASC_TOP_FOLDER)))
@@ -143,50 +139,119 @@ class GenerateTimeseries:
         # Use ThreadPoolExecutor for concurrent processing
         # Since we are using Python 3.14t (free-threaded), this should scale well even for CPU work
-        # mixed with I/O.
         with concurrent.futures.ThreadPoolExecutor() as executor:
             # Submit all tasks
             future_to_file = {
-                executor.submit(self._process_single_file, file_name, locations): file_name
+                executor.submit(self.process_asc_file, file_name, locations): file_name
                 for file_name in asc_files
             }

             completed_count = 0
-            for future in concurrent.futures.as_completed(future_to_file):
-                file_results = future.result()
-                if file_results:
-                    for res in file_results:
-                        zone_id = res['zone_id']
-                        results[zone_id]['dates'].append(res['date'])
-                        results[zone_id]['values'].append(res['value'])
-
-                completed_count += 1
-                if completed_count % 100 == 0:
-                    print(f"Processed {completed_count}/{total_files} files")
-
-        # Write CSVs for each location
-        print("Writing CSV files...")
-        for location in locations:
-            zone_id = location[0]
-            data = results[zone_id]
-
-            if not data['dates']:
-                print(f"No data found for {zone_id}")
-                continue
-
-            df = pd.DataFrame({"datetime": data['dates'], zone_id: data['values']})
-
-            # Sort the dataframe into date order
-            sorted_df = df.sort("datetime")
-
-            # Format datetime column
-            sorted_df = sorted_df.with_columns(
+            try:
+                for future in concurrent.futures.as_completed(future_to_file):
+                    file_results = future.result()
+                    if file_results:
+                        for res in file_results:
+                            zone_id = res["zone_id"]
+                            results[zone_id]["dates"].append(res["date"])
+                            results[zone_id]["values"].append(res["value"])
+
+                    completed_count += 1
+                    if completed_count % 100 == 0:
+                        print(f"Processed {completed_count}/{total_files} files")
+            except KeyboardInterrupt:
+                print("KeyboardInterrupt received. Cancelling pending tasks...")
+                executor.shutdown(wait=False, cancel_futures=True)
+                raise
+
+    def append_results_to_csv(self, results, locations):
+        """Append extracted data to CSV files for each zone.
+
+        Args:
+            results (dict): Aggregated results {zone_id: {'dates': [], 'values': []}}
+            locations (list): List of location data [zone_id, easting, northing, zone]
+        """
+        # Group results by zone and collect all unique dates
+        zone_data = {}
+        for loc in locations:
+            zone_id = loc[0]
+            zone_name = loc[3]
+
+            if zone_name not in zone_data:
+                zone_data[zone_name] = {"dates": [], "values": {}}
+
+            zone_data[zone_name]["values"][zone_id] = results[zone_id]["values"]
+            zone_data[zone_name]["dates"].extend(results[zone_id]["dates"])
+
+        # Get unique sorted dates across all zones
+        for zone_name, data in zone_data.items():
+            data["dates"] = sorted(set(data["dates"]))
+
+        # Now write one CSV per zone with aligned timestamps
+        for zone_name, data in zone_data.items():
+            dates = data["dates"]
+            values_dict = data["values"]
+
+            # Create aligned DataFrame
+            df_dict = {"datetime": dates}
+            for grid_square, values in values_dict.items():
+                # Align values to the common dates
+                aligned_values = []
+                value_iter = iter(values)
+                date_iter = iter(dates)
+
+                current_date = next(date_iter, None)
+                current_value = next(value_iter, None)
+
+                for expected_date in dates:
+                    if current_date == expected_date:
+                        aligned_values.append(current_value)
+                        try:
+                            current_date = next(date_iter)
+                            current_value = next(value_iter)
+                        except StopIteration:
+                            current_date = None
+                            current_value = None
+                    else:
+                        aligned_values.append(None)  # Missing value
+
+                df_dict[grid_square] = aligned_values
+
+            new_df = pd.DataFrame(df_dict)
+
+            # Format datetime column in new_df
+            new_df = new_df.with_columns(
                 pd.col("datetime").dt.strftime("%Y-%m-%d %H:%M:%S")
             )

-            output_path = Path(self.config.CSV_TOP_FOLDER) / f"{zone_id}_timeseries_data.csv"
-            sorted_df.write_csv(
-                output_path,
-                float_precision=4
-            )
-        print("All CSV files written.")
+            output_path = (
+                Path(self.config.COMBINED_FOLDER) / f"{zone_name}_timeseries_data.csv"
+            )
+
+            if output_path.exists():
+                # Load existing CSV
+                existing_df = pd.read_csv(output_path)
+
+                # Reorder new_df to match existing_df
+                new_df = new_df.select(existing_df.columns)
+
+                # Concatenate
+                combined_df = pd.concat([existing_df, new_df])
+                # Sort by datetime
+                combined_df = combined_df.sort("datetime")
+                # Write back
+                combined_df.write_csv(output_path, float_precision=4)
+            else:
+                # Write new CSV
+                # Sort columns to ensure deterministic order (datetime first)
+                cols = new_df.columns
+                cols.remove("datetime")
+                cols.sort()
+                sorted_cols = ["datetime"] + cols
+                new_df = new_df.select(sorted_cols)
+
+                # Sort by datetime (already sorted but good practice)
+                sorted_df = new_df.sort("datetime")
+                sorted_df.write_csv(output_path, float_precision=4)
+
+        logging.info("All CSV files updated.")
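The new `try/except KeyboardInterrupt` wrapper above relies on `Executor.shutdown(wait=False, cancel_futures=True)` (available since Python 3.9) to drop queued work on Ctrl-C before re-raising. A standalone sketch of the pattern, with a hypothetical `square` task standing in for the per-file work:

```python
import concurrent.futures

def square(n):
    # Stand-in for the per-file work submitted to the pool
    return n * n

results = []
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    # Map each future back to its input, mirroring future_to_file in the diff
    future_to_input = {executor.submit(square, n): n for n in range(10)}
    try:
        for future in concurrent.futures.as_completed(future_to_input):
            results.append(future.result())
    except KeyboardInterrupt:
        # Cancel anything still queued, then re-raise so the caller sees it
        executor.shutdown(wait=False, cancel_futures=True)
        raise

print(sorted(results))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```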
+7 -5
@@ -258,12 +258,12 @@ class Nimrod:
             # Read data as big-endian 16-bit integers
             # numpy.frombuffer is efficient for reading from bytes
             data_bytes = infile.read(array_size * 2)
-            self.data = np.frombuffer(data_bytes, dtype='>h').astype(np.int16)
+            self.data = np.frombuffer(data_bytes, dtype=">h").astype(np.int16)

             # Reshape to (nrows, ncols) for easier 2D manipulation
             # Note: NIMROD data is row-major (C-style), starting from top-left
             self.data = self.data.reshape((self.nrows, self.ncols))

         except Exception:
             infile.close()
             raise Nimrod.PayloadReadError
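The `dtype=">h"` string above tells NumPy to interpret the payload as big-endian signed 16-bit integers, which is what the NIMROD payload reader expects. A self-contained illustration using hand-built bytes rather than real NIMROD data:

```python
import numpy as np

# Two big-endian int16 values: 0x0001 = 1 and 0xFFFF = -1
data_bytes = b"\x00\x01\xff\xff"

# ">h" = big-endian signed 16-bit; .astype(np.int16) normalises byte order
values = np.frombuffer(data_bytes, dtype=">h").astype(np.int16)

print(values.tolist())  # [1, -1]
```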
@@ -392,7 +392,9 @@ class Nimrod:
         # Use numpy slicing to extract the sub-array
         # Note: y indices correspond to rows, x indices to columns
         # Slicing is [start:end], so we need +1 for the end index
-        self.data = self.data[yMinPixelId : yMaxPixelId + 1, xMinPixelId : xMaxPixelId + 1]
+        self.data = self.data[
+            yMinPixelId : yMaxPixelId + 1, xMinPixelId : xMaxPixelId + 1
+        ]

         # Update object where necessary
         self.x_right = self.x_left + xMaxPixelId * self.x_pixel_size
@@ -435,7 +437,7 @@ class Nimrod:

         # Write raster data to output file using numpy.savetxt
         # This is significantly faster than iterating in Python
-        np.savetxt(outfile, self.data, fmt='%d', delimiter=' ')
+        np.savetxt(outfile, self.data, fmt="%d", delimiter=" ")
         outfile.close()
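The `np.savetxt` call above writes only the grid body of the ESRI ASCII raster; an `.asc` file also carries a six-line header, which matches the `skiprows=6` used when the timeseries code reads these files back. A minimal sketch of the overall shape, assuming a tiny grid and illustrative header values:

```python
import io
import numpy as np

# Hypothetical 2x3 grid of scaled integer values
grid = np.array([[0, 32, 64], [96, 0, 0]], dtype=np.int16)

out = io.StringIO()
# Standard six-line ESRI ASCII header (these values are illustrative only)
out.write("ncols 3\n")
out.write("nrows 2\n")
out.write("xllcorner 0\n")
out.write("yllcorner 0\n")
out.write("cellsize 1000\n")
out.write("NODATA_value -1\n")
# Grid body, one row per line, as in the diff above
np.savetxt(out, grid, fmt="%d", delimiter=" ")

print(out.getvalue().splitlines()[-1])  # "96 0 0"
```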
+1 -1
@@ -1,6 +1,6 @@
 [project]
 name = "met-office"
-version = "1.0.0"
+version = "1.3.1"
 description = "Convert .dat nimrod files to .asc files"
 readme = "README.md"
 requires-python = ">=3.14"
@@ -4,7 +4,7 @@ requires-python = ">=3.14"
 [[package]]
 name = "met-office"
-version = "1.0.0"
+version = "1.3.1"
 source = { virtual = "." }
 dependencies = [
     { name = "numpy" },