Improve storage aggregation #882

Merged
merged 2 commits into from
Apr 2, 2019
2 changes: 1 addition & 1 deletion bin/xdmod-ingestor
@@ -297,7 +297,7 @@ function main()
}

if ($realmToAggregate == 'storage' || $realmToAggregate === false) {
-$dwi->aggregateStorageData();
+$dwi->aggregateStorageData($lastModifiedStartDate);
}
} catch (Exception $e) {
$logger->crit(array(
11 changes: 9 additions & 2 deletions classes/OpenXdmod/DataWarehouseInitializer.php
@@ -299,16 +299,23 @@ public function aggregateCloudData($lastModifiedStartDate)
* Aggregate storage data.
*
* If the storage realm is not enabled then do nothing.
+*
+* @param string $lastModifiedStartDate Aggregate data ingested on or after
+* this date.
*/
-public function aggregateStorageData()
+public function aggregateStorageData($lastModifiedStartDate)
{
if (!$this->isRealmEnabled('Storage')) {
$this->logger->notice('Storage realm not enabled, not aggregating');
return;
}

$this->logger->notice('Aggregating storage data');
-Utilities::runEtlPipeline(['xdw-aggregate-storage'], $this->logger);
+Utilities::runEtlPipeline(
+    ['xdw-aggregate-storage'],
+    $this->logger,
+    ['last-modified-start-date' => $lastModifiedStartDate]
+);
$filterListBuilder = new FilterListBuilder();
$filterListBuilder->setLogger($this->logger);
$filterListBuilder->buildRealmLists('Storage');
@@ -6,6 +6,10 @@
"conversions": {
"start_day_id": "DATE_FORMAT(dt, '%Y00%j')",
"end_day_id": "DATE_FORMAT(dt, '%Y00%j')"
},
"overseer_restrictions": {
"last_modified_start_date": "last_modified >= ${VALUE}",
"last_modified_end_date": "last_modified <= ${VALUE}"
}
},
"source_query": {
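
The two restrictions added above are SQL fragments containing a `${VALUE}` placeholder. The following is a minimal sketch of how such a template could be expanded, assuming the ETL framework substitutes a quoted option value for the placeholder. The `apply_restriction` helper is hypothetical and is not part of Open XDMoD, and the single-quote wrapping is a simplification rather than the framework's actual escaping.

```shell
# Hypothetical helper, not part of Open XDMoD: expand a restriction template
# by replacing the first ${VALUE} placeholder with a single-quoted value.
# Assumption: the value contains no single quotes (no SQL escaping is done).
apply_restriction() {
    local template="$1" value="$2" placeholder before after
    # Single quotes are intentional: the placeholder is a literal string.
    placeholder='${VALUE}'
    case "$template" in
        *"$placeholder"*)
            # Split the template around the placeholder and rejoin it.
            before=${template%%"$placeholder"*}
            after=${template#*"$placeholder"}
            printf "%s'%s'%s\n" "$before" "$value" "$after"
            ;;
        *)
            # No placeholder: return the template unchanged.
            printf '%s\n' "$template"
            ;;
    esac
}

# prints: last_modified >= '2019-04-02 00:00:00'
apply_restriction 'last_modified >= ${VALUE}' '2019-04-02 00:00:00'
```

With a start date supplied, the aggregation source query would then only consider rows whose `last_modified` value is on or after that date.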
17 changes: 17 additions & 0 deletions docs/ingestor.md
@@ -96,6 +96,23 @@ Aggregate:

$ xdmod-ingestor --aggregate=cloud --last-modified-start-date "$last_modified_start_date"

**Storage:**

If you do not have jobs data, or you want to ingest and aggregate storage data
separately from other data, use the following steps.

Set timestamp:

$ last_modified_start_date=$(date +'%F %T')

Ingest storage logs:

$ xdmod-ingestor --datatype=storage

Aggregate:

$ xdmod-ingestor --aggregate=storage --last-modified-start-date "$last_modified_start_date"
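
The three steps above can be combined into one script. This is a minimal sketch, assuming a Bash-compatible shell. The `ingest_and_aggregate_storage` function and the `XDMOD_INGESTOR` variable are hypothetical names introduced here so that the command can be swapped out when testing; only the `xdmod-ingestor` options shown above are used.

```shell
# Hypothetical override, not an Open XDMoD setting: the command to run.
XDMOD_INGESTOR="${XDMOD_INGESTOR:-xdmod-ingestor}"

# Hypothetical wrapper around the documented storage steps.
ingest_and_aggregate_storage() {
    local last_modified_start_date
    # Record the timestamp before ingesting so that the rows ingested below
    # are modified on or after it.
    last_modified_start_date=$(date +'%F %T')

    # Ingest storage data; stop here if ingestion fails.
    "$XDMOD_INGESTOR" --datatype=storage || return 1

    # Aggregate only the data ingested on or after the recorded timestamp.
    "$XDMOD_INGESTOR" --aggregate=storage \
        --last-modified-start-date "$last_modified_start_date"
}
```

Calling `ingest_and_aggregate_storage` after shredding runs both steps in order and skips aggregation if ingestion returns a non-zero status.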

Help
----

21 changes: 15 additions & 6 deletions docs/shredder.md
@@ -15,7 +15,7 @@ In order to make data available to the Open XDMoD portal you will need
to use the shredder utility. If you followed the install guide, you will
have already used the shredder to populate your database. In addition to
the install process, this program is typically used once a day to add
-jobs from the the previous day to the database.
+jobs from the previous day to the database.

Help
----
@@ -50,9 +50,10 @@ cluster name.
Log Format
----------

-You must specify the format of the log files to be shredded. For HPC job accounting data, the
-format depends upon the resource manager; for Cloud data the format should match that of
-the event logs.
+You must specify the format of the log files to be shredded. For HPC job
+accounting data, the format depends upon the resource manager. For cloud data
+the format should match that of the event logs. There is only one supported
+format for storage data.

**Jobs:**

@@ -80,11 +81,19 @@ The convention for shredding cloud files is identical to job data:
$ xdmod-shredder -f genericcloud ...
$ xdmod-shredder -f openstack ...

+**Storage:**
+
+The shredder accepts one format for storage data. See the [Storage
+Metrics](storage.md) documentation for an example. The convention for
+shredding storage files is identical to job data:
+
+$ xdmod-shredder -f storage ...
+
Input Source
------------

-Files may be shredded one at a time by running the following command.
-Please note that this is **not** currently supported for Cloud files:
+Files may be shredded one at a time by running the following command. Please
+note that this is **not** currently supported for cloud and storage files:

$ xdmod-shredder -i file ...

9 changes: 7 additions & 2 deletions docs/storage.md
@@ -190,6 +190,10 @@ $ acl-config && acl-import

## Data Ingestion

+Storage data is shredded and ingested using the [`xdmod-shredder`](shredder.md)
+and [`xdmod-ingestor`](ingestor.md) commands. Please see their respective
+guides for further information.
+
All of the following commands must be executed in the order specified below to
fully ingest storage data into the data warehouse.

@@ -205,6 +209,7 @@ directory even if they have already been ingested.
Ingest and aggregate data:

```
-$ xdmod-ingestor --ingest --datatype storage
-$ xdmod-ingestor --aggregate=storage
+$ last_modified_start_date=$(date +'%F %T')
+$ xdmod-ingestor --datatype storage
+$ xdmod-ingestor --aggregate=storage --last-modified-start-date "$last_modified_start_date"
```
10 changes: 6 additions & 4 deletions open_xdmod/modules/xdmod/integration_tests/scripts/bootstrap.sh
@@ -34,8 +34,9 @@ then
for storage_dir in $REF_DIR/storage/*; do
sudo -u xdmod xdmod-shredder -f storage -r $(basename $storage_dir) -d $storage_dir
done
-sudo -u xdmod xdmod-ingestor --ingest --datatype storage
-sudo -u xdmod xdmod-ingestor --aggregate=storage
+last_modified_start_date=$(date +'%F %T')
+sudo -u xdmod xdmod-ingestor --datatype storage
+sudo -u xdmod xdmod-ingestor --aggregate=storage --last-modified-start-date "$last_modified_start_date"
sudo -u xdmod xdmod-import-csv -t names -i $REF_DIR/names.csv
sudo -u xdmod xdmod-ingestor
php /root/bin/createusers.php
@@ -56,8 +57,9 @@ then
for storage_dir in $REF_DIR/storage/*; do
sudo -u xdmod xdmod-shredder -f storage -r $(basename $storage_dir) -d $storage_dir
done
-sudo -u xdmod xdmod-ingestor --ingest --datatype storage
-sudo -u xdmod xdmod-ingestor --aggregate=storage
+last_modified_start_date=$(date +'%F %T')
+sudo -u xdmod xdmod-ingestor --datatype storage
+sudo -u xdmod xdmod-ingestor --aggregate=storage --last-modified-start-date "$last_modified_start_date"

sudo -u xdmod xdmod-shredder -r openstack -d $REF_DIR/openstack -f openstack
sudo -u xdmod xdmod-ingestor
16 changes: 16 additions & 0 deletions open_xdmod/modules/xdmod/regression_tests/post_ingest_test.sh
@@ -34,3 +34,19 @@ then
exit 1

fi

# Shred, ingest and aggregate storage data for a single day and check to make
# sure that only one period is aggregated for each unit.
for storage_dir in $REF_DIR/storage_upgrade/*; do
sudo -u xdmod xdmod-shredder -f storage -r $(basename $storage_dir) -d $storage_dir
done
last_modified_start_date=$(date +'%F %T')
sudo -u xdmod xdmod-ingestor --datatype storage
agg_output=$(mktemp --tmpdir storage-aggregation-XXXXXXXX)
sudo -u xdmod xdmod-ingestor --aggregate=storage --last-modified-start-date "$last_modified_start_date" | tee $agg_output
for unit in day month quarter year; do
if ! grep -q "unit: $unit, periods: 1," $agg_output; then
echo Did not aggregate 1 period of storage data for unit $unit
exit 1
fi
done
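
The check above can be exercised without a running Open XDMoD instance. The following sketch wraps the same `grep` test in a function. `check_single_period` is a hypothetical name, and only the `unit: ..., periods: ...,` fragment is taken from the script above; the rest of each log line is not assumed. The trailing comma in the pattern keeps `periods: 1,` from matching larger counts such as `periods: 12,`.

```shell
# Hypothetical helper: succeed only if the log file reports exactly one
# aggregated period for each aggregation unit.
check_single_period() {
    local log_file="$1" unit
    for unit in day month quarter year; do
        # The comma after "1" prevents a match on "periods: 12," and similar.
        if ! grep -q "unit: $unit, periods: 1," "$log_file"; then
            echo "Did not aggregate 1 period of storage data for unit $unit" >&2
            return 1
        fi
    done
}
```

In the regression test this could be called as `check_single_period "$agg_output" || exit 1`.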
@@ -0,0 +1,112 @@
[
{
"dt": "2019-01-06T10:01:19Z",
"file_count": 798776,
"hard_threshold": 25000001536000,
"logical_usage": 496090112000,
"mountpoint": "/data",
"pi": "ibech",
"resource": "recex",
"soft_threshold": 25000001536000,
"user": "ibech"
},
{
"dt": "2019-01-06T10:01:19Z",
"file_count": 105193763,
"hard_threshold": 20000002048000,
"logical_usage": 17666970880000,
"mountpoint": "/data",
"pi": "magwa",
"resource": "recex",
"soft_threshold": 20000002048000,
"user": "magwa"
},
{
"dt": "2019-01-06T10:01:19Z",
"file_count": 32673,
"hard_threshold": 10005000192000,
"logical_usage": 9928160640000,
"mountpoint": "/data",
"pi": "sanma",
"resource": "recex",
"soft_threshold": 10000003072000,
"user": "sanma"
},
{
"dt": "2019-01-06T10:01:19Z",
"file_count": 471,
"hard_threshold": 1010008064000,
"logical_usage": 124288000,
"mountpoint": "/data",
"pi": "fulma",
"resource": "recex",
"soft_threshold": 1000005632000,
"user": "shttr"
},
{
"dt": "2019-01-06T10:01:19Z",
"file_count": 10439,
"hard_threshold": 10005000192000,
"logical_usage": 9219064064000,
"mountpoint": "/data",
"pi": "norpa",
"resource": "recex",
"soft_threshold": 10000003072000,
"user": "norpa"
},
{
"dt": "2019-01-06T10:01:19Z",
"file_count": 943358,
"hard_threshold": 10005000192000,
"logical_usage": 3707438080000,
"mountpoint": "/data",
"pi": "aytinis",
"resource": "recex",
"soft_threshold": 10000003072000,
"user": "aytinis"
},
{
"dt": "2019-01-06T10:01:19Z",
"file_count": 3,
"hard_threshold": 2000003072000,
"logical_usage": 35556608000,
"mountpoint": "/data",
"pi": "fulma",
"resource": "recex",
"soft_threshold": 1000001536000,
"user": "setusca"
},
{
"dt": "2019-01-06T10:01:19Z",
"file_count": 1221435,
"hard_threshold": 30000001024000,
"logical_usage": 29183970432000,
"mountpoint": "/data",
"pi": "chaff",
"resource": "recex",
"soft_threshold": 30000001024000,
"user": "chaff"
},
{
"dt": "2019-01-06T10:01:19Z",
"file_count": 2151114,
"hard_threshold": 10005000192000,
"logical_usage": 8970264832000,
"mountpoint": "/data",
"pi": "leske",
"resource": "recex",
"soft_threshold": 10000003072000,
"user": "leske"
},
{
"dt": "2019-01-06T10:01:19Z",
"file_count": 1,
"hard_threshold": 10005000192000,
"logical_usage": 0,
"mountpoint": "/data",
"pi": "fulma",
"resource": "recex",
"soft_threshold": 10000003072000,
"user": "camwa"
}
]