Commit 9bf68beb authored by Carlos H. Brandt

Init script to compute datasets sizes

parent 6628bae9
This script calculates the total size of PDS* datasets.
More generally, it is meant to read the sizes of files on a given file
server. It was initially designed for HTTP file servers; FTP may be
included at some point if necessary.
PDS data archives are typically provided through HTTP file servers, structured
in a predictable way, mostly following the PDS-3 standard.
The script uses Requests and Pandas to read the index of files from a given
URL. From the table listing the files of interest (see the 'match' argument),
information such as file size is kept. Size values are given in Megabytes.
* PDS stands for Planetary Data System
# Sizes are used in units of Megabytes
_factor = {
    'K': 0.001,  # If value in Kilobytes, divide by 1000
    'M': 1,      # If value in Megabytes, do nothing
    'G': 1000,   # If value in Gigabytes, multiply by 1000
}
def url_table2df(url, match='.+'):
    """Return a DataFrame from the first table of 'url' with files matching 'match'."""
    import pandas as pd
    # 'match' must be a string or compiled regex; '.+' (the pandas default) accepts any table
    tabs = pd.read_html(url, match=match)
    tab = tabs[0]
    # Remove columns and rows with invalid values
    tab = tab.dropna(axis=1, how='all').dropna()
    # Transform file sizes to Megabytes: numeric part times the factor of the unit suffix
    tab['Size'] = tab['Size'].apply(lambda s: float(s[:-1]) * _factor[s[-1].upper()])
    return tab