Ngageoint Scale Versions

Processing framework for containerized algorithms

4.4.0

7 years ago

Dependencies

DC/OS 1.8
Docker 1.10
ElasticSearch 2.4
PostgreSQL 9.3+, PostGIS 2.0+
Vault 0.6.2+ (only if not using DC/OS Enterprise for secrets storage)

Deprecated

The following v4 REST API calls are now deprecated:
1. The /v4/sources/{id}/ API has been replaced by a v5 version. The new v5 version removes the "ingests" and "products" fields, which were expensive to include. To get the same information, users should now use /v4/sources/{id}/ingests/, /v4/sources/{id}/jobs/, and /v4/sources/{id}/products/. The new /v5/sources/{id}/ API also removes the ability to use the file name in the URL.
The v3 version of the REST API is now deprecated. We recommend that you migrate all use of the Scale REST API to use the new v4 REST API. There are only a few changes between the v3 and v4 versions:
1. The old log API /job-executions/{id}/logs/ is NOT available in v4 (returns 404).
2. For the job details API, v4 does not include the deprecated "input_files" and "products" fields while v3 still does.
3. For the recipe details API, v4 does not include the deprecated "input_files" field while v3 still does.
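As a rough illustration of the /v4/sources/{id}/ deprecation above, the following sketch builds the URLs a client would call instead. Only the paths come from these notes; the base URL is a placeholder and the helper name source_urls is hypothetical.

```python
def source_urls(base_url, source_id):
    """Return the endpoints that replace the deprecated /v4/sources/{id}/ call."""
    # The v5 details API no longer accepts a file name in the URL,
    # so this sketch only accepts a numeric ID.
    if not isinstance(source_id, int):
        raise TypeError("source_id must be an integer ID, not a file name")
    base = base_url.rstrip("/")
    return {
        "details": f"{base}/v5/sources/{source_id}/",
        "ingests": f"{base}/v4/sources/{source_id}/ingests/",
        "jobs": f"{base}/v4/sources/{source_id}/jobs/",
        "products": f"{base}/v4/sources/{source_id}/products/",
    }


# Placeholder host, following the URL style used elsewhere in these notes.
urls = source_urls("http://host.domain/api", 42)
```

A client that previously read the "ingests" and "products" fields from the details response would fetch them from the separate endpoints instead.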

Known Issues

Scale's logging does not work correctly with any Docker newer than 1.10. This is due to a breaking change made to Docker's command line arguments. We recommend that you use Docker 1.10 for this version of Scale.

New Features

New capability for a batch to create recipes from source files using an existing recipe trigger rule or a custom rule (Issue 585)
New REST API to retrieve the jobs associated with a source file (Issue 740)
New capability to scan a workspace to ingest existing files (Issue 589, Issue 742)
New page for creating and managing batches (Issue 739, Issue 769, Issue 791)
New capability to define settings as being 'secret', so that Scale retrieves them from a secured Vault instance rather than storing them insecurely (Issue 622, Issue 777)
New v5 REST API for viewing the details for a source file (better performance) (Issue 797)
New capability to set a job type's requirement for shared memory. The shared memory requirement will be passed to the job container as a Docker parameter, removing the previous need for algorithms using larger shared memory amounts to run in privileged mode. (Issue 579)
New capability that allows a job type interface to define mounts that the jobs require in their containers. This can be used to mount needed reference data. (Issue 762)
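To make the shared memory item above more concrete, here is a minimal sketch of how a shared memory requirement might be turned into a Docker parameter. Docker's --shm-size option is real, but the unit of the value (MiB) and the function name shm_docker_params are assumptions; Scale's actual implementation may differ.

```python
def shm_docker_params(shared_mem_required_mib):
    """Build Docker run parameters for a shared memory requirement.

    Assumption: the requirement is expressed in MiB.
    """
    if shared_mem_required_mib <= 0:
        # No requirement: leave Docker's default /dev/shm size in place.
        return []
    # Docker accepts a size with a unit suffix, e.g. --shm-size=256m.
    return [f"--shm-size={int(shared_mem_required_mib)}m"]


params = shm_docker_params(256)
```

Passing the size explicitly is what removes the need to run the container in privileged mode just to get a larger /dev/shm.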

Enhancements/Updates

Improved performance of the various product REST APIs (Issue 754)
Scale web UI now allows access to navigation links while a page is loading (Issue 746)
Updated Scale documentation (Issue 223, Issue 338, Issue 620)
Added pan and zoom capabilities to ingest details UI page (Issue 763)
Scheduler now uses LIFO (last-in, first-out) instead of FIFO (first-in, first-out) as a secondary field for ordering; the priority field is still the primary field (Issue 758)
Improved the scheduler's handling of nodes (Issue 775)
Improved file name search on ingest UI page (Issue 767)
Added capability to switch the scheduler between LIFO and FIFO scheduling modes using a database field (Issue 794)
Improved scheduler's ability to handle scheduling job executions with configuration errors (Issue 774)
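The scheduler ordering described above (priority first, then LIFO or FIFO as the secondary field) can be sketched as follows. This is an illustration only: the QueuedJob type, the mode strings "LIFO" and "FIFO", and the convention that a lower priority number is scheduled first are all assumptions, not taken from Scale's code.

```python
from collections import namedtuple

# Hypothetical queue entry: queued_at is any sortable timestamp.
QueuedJob = namedtuple("QueuedJob", ["name", "priority", "queued_at"])


def order_queue(jobs, queue_mode="LIFO"):
    """Order jobs by priority, breaking ties by queue time."""
    newest_first = queue_mode == "LIFO"
    # Sort by the secondary field first; Python's sort is stable, so the
    # second sort by priority keeps this order among equal priorities.
    by_time = sorted(jobs, key=lambda j: j.queued_at, reverse=newest_first)
    # Assumption: a lower priority number means "schedule sooner".
    return sorted(by_time, key=lambda j: j.priority)


jobs = [
    QueuedJob("a", priority=1, queued_at=1),
    QueuedJob("b", priority=1, queued_at=2),
    QueuedJob("c", priority=0, queued_at=3),
]
lifo_names = [j.name for j in order_queue(jobs, "LIFO")]
fifo_names = [j.name for j in order_queue(jobs, "FIFO")]
```

In both modes job "c" runs first because of its priority; the mode only changes the order of "a" and "b", which share a priority.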

Bug Fixes

Fixed issue where the logstash and low disk space health check failures would cause a node to permanently remain in a degraded state (Issue 749)
Fixed rare issue that would cause a scheduler crash (Issue 748)
Fixed incorrect workspace URL on an ingest page (Issue 730)
Fixed bug where total on metrics UI page was being cut off (Issue 757)
Fixed issue where the API for creating a Strike would return a 500 error if the description field was not provided (Issue 765)
Fixed bug where daily metrics were broken by scan ingests (Issue 793)

JSON Schema Changes

For the new 'secret' settings feature, the job type interface schema has been updated to version 1.3 where a new field allows the user to define a setting as requiring protection as a secret.
For the new mount feature, the job type interface schema has been updated to version 1.4 where a new field allows the user to define mounts used by the job type.
For the new mount feature, the job configuration schema (renamed from the job type configuration schema) has been updated to a new version to allow configuring how Scale should provide the needed mounts to the job containers.

Database Migrations

For the new scan capability, a new table called scan is created. A new nullable field called scan has also been added to the ingest table, and the ingest.strike field is now nullable.
The description field in the strike table has been changed to a nullable text field.
The scheduler table has a new field called queue_mode which allows control of whether the scheduler uses LIFO or FIFO.
The job type table has a new field called shared_mem_required, indicating the shared memory needed to run the job type.

4.3.0

7 years ago

Deprecated

The v3 version of the REST API is now deprecated. We recommend that you migrate all use of the Scale REST API to use the new v4 REST API. There are only a few changes between the v3 and v4 versions:
1. The old log API /job-executions/{id}/logs/ is NOT available in v4 (returns 404).
2. For the job details API, v4 does not include the deprecated "input_files" and "products" fields while v3 still does.
3. For the recipe details API, v4 does not include the deprecated "input_files" field while v3 still does.

New Features

Scale will now submit tasks to pull the Scale Docker image on each node after the node has completed initial cleanup. This ensures that the Docker pull occurs within a task, allowing for more control and preventing initial pre-tasks and system jobs from failing due to Docker pull issues. (Issue 614)
The timeout system has been modified so that the job type timeout applies specifically to the main algorithm task, not the entire job execution. In addition, appropriate timeouts and reconciliation are performed for each task's staging and running duration. This will improve Scale's resiliency, as well as convey the actual problem when a timeout occurs. (Issue 596)
Nodes now run health check tasks periodically that monitor various health conditions of the node. If a node is unhealthy, it goes into a DEGRADED state that prevents the running of new jobs. This also removes the old automatic node pausing capability. (Issue 713, Issue 723)

Enhancements/Updates

System job timeouts are now system errors instead of algorithm errors (Issue 632)
Improved build system to optionally install/not install epel-release (Issue 676)
Created system errors for missing and deleted job input files, as well as added new field to indicate which errors should be automatically retried (Issue 700)
The scale_file, product_file and source_file tables were refactored to improve database performance when querying for files (Issue 708)
New REST API for returning the products from a given source file (Issue 729)
Added ability to filter by data time to source file REST API (Issue 669)

Bug Fixes

Fixed bug with extra inputs in a job's data JSON that are not defined within the job type interface (Issue 666)
Fixed bug with S3 bucket validation (Issue 667)
Fixed bug where timed out job executions were failing to be cleaned up (Issue 704)
Fixed bug when job type default settings are defined with non-string values (Issue 722)
Fixed bug where new jobs created by re-processing were not correctly marked as BLOCKED (Issue 711)

Database Migrations

A new field, should_be_retried, is added to the error table to indicate which errors should be automatically retried.
There are significant database changes as a result of the scale_file refactor. The ingest and file_ancestry_link tables have their source_file and product_file foreign keys changed to be based on scale_file.id. The fields in source_file and product_file are added to the scale_file table. Temporary copies of the source_file and product_file tables are made, those two tables are dropped, and the copied source and product data is merged into the scale_file table. In addition, the scale_file table has a new field called file_type that indicates if a file is a 'SOURCE' or 'PRODUCT'. New indexes are set up for the new scale_file fields. This migration is very expensive and can take a considerable amount of time depending on the size of the scale_file table. As one reference point, the migration took approximately 3 hours to perform with a scale_file table containing approximately 4 million rows.
A new field, batch_id, is added to the file_ancestry_link table and is populated with the correct, historical batch IDs. This migration takes time proportional to the number of recipes that have been created through batches.
The scheduler table fields max_node_errors and node_error_period are removed.

4.2.0

7 years ago

Deprecated

The v3 version of the REST API is now deprecated. We recommend that you migrate all use of the Scale REST API to use the new v4 REST API. There are only a few changes between the v3 and v4 versions:
1. The old log API /job-executions/{id}/logs/ is NOT available in v4 (returns 404).
2. For the job details API, v4 does not include the deprecated "input_files" and "products" fields while v3 still does.
3. For the recipe details API, v4 does not include the deprecated "input_files" field while v3 still does.

New Features

New capability to define configurable settings for job types. Configurable settings are key/value pairs that can be passed into a job's interface. They should reflect environmental concerns (e.g. database hostname) and should not affect the actual products produced by an algorithm. The setting names accepted/required by a job type can be defined in the latest version of the Job Interface schema. The default setting values for a job type can be defined within a new Job Type Configuration schema stored within the job_type table. Unlike a job's data (inputs), which are static, settings can be edited at any time and apply immediately to any running job execution of that job type. (Issue 573, Issue 665)
The Job Interface schema now allows the use of environment variables for passing information to an algorithm in addition to command line arguments. The environment variables defined in the interface may make use of the same input data/setting substitution that the command line arguments use. (Issue 573)
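The substitution of input data and settings into environment variables, as described above, might look roughly like the sketch below. The ${name} placeholder syntax, the variable names, and the resolve_env_vars helper are assumptions for illustration; consult the Job Interface schema documentation for the actual format.

```python
from string import Template


def resolve_env_vars(env_templates, values):
    """Substitute ${name} placeholders in environment variable templates.

    Unknown placeholders are left untouched rather than raising an error.
    """
    return {
        name: Template(template).safe_substitute(values)
        for name, template in env_templates.items()
    }


# Hypothetical interface definition and resolved inputs/settings.
env = resolve_env_vars(
    {"DB_HOST": "${db_host}", "INPUT_PATH": "${input_file}"},
    {"db_host": "db.example.internal", "input_file": "/scale/input/image.tif"},
)
```

The same values mapping could feed command line arguments, which is why the notes say both mechanisms share one substitution scheme.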

Enhancements/Updates

Update to detect errors where a Docker container terminates unexpectedly (Issue 625)
Update to Scale container cleanup to improve efficiency and possibly eliminate a failure condition (Issue 678)

Bug Fixes

Fixed critical bug where a node that changes agent ID would stop receiving new tasks from the scheduler (Issue 660)
Fixed UI bug where log polling did not stop after closing the window (Issue 658)
Fixed the query for unpublishing products to be much more efficient (Issue 653)
Improved efficiency for the ingest status REST API to prevent timeouts on the data feed page (Issue 637, Issue 663)
Property inputs now affect created product UUIDs (Issue 685)

JSON Schema Changes

The Job Interface schema has been updated to version 1.2 to add the following new capabilities:
1. Defining environment variables that get passed to the job when it runs
2. Defining configurable settings that can be set in Scale and get passed to the job as environment variables or command line arguments.
The Job Configuration schema has been updated to version 1.1 to contain the values of the configurable settings that should be passed to the job execution.

Database Migrations

A new configuration field is added to the job_type table to contain the default setting values that should be passed.
An index was added to the data_ended field in the scale_file table.
Two new fields, data_started and data_ended, were added to the ingest table, and an index was added to the ingest_ended field. The data time metrics displayed on the data feed page (from the ingest status API) are now based on the ingest table's fields (data_started, data_ended) instead of scale_file. Going forward whenever a source file is parsed, its data time is copied into both the scale_file and ingest tables. Users may wish to backfill data by copying the data_started and data_ended fields from the scale_file table to the same fields in the linked rows in the ingest table (those fields will be null for ingest rows created prior to this migration). Such a copy may be very expensive for large numbers of ingested files, so users are left to do as much backfilling as they desire. If backfilling is not done, the history on the data feed page (when based on data time) will be blank as old ingests will have null fields.
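The optional backfill described above could be done in batches so that each transaction stays small. The sketch below demonstrates the idea against an in-memory SQLite database purely so that it is self-contained; Scale itself uses PostgreSQL, and the source_file_id column name linking ingest rows to scale_file rows is an assumption that should be checked against the real schema.

```python
import sqlite3


def backfill_ingest_data_times(conn, batch_size=1000):
    """Copy data_started/data_ended from scale_file into ingest rows that lack them."""
    max_id = conn.execute("SELECT COALESCE(MAX(id), 0) FROM ingest").fetchone()[0]
    updated = 0
    # Walk the ingest table in ID ranges, committing after each batch.
    for start in range(0, max_id + 1, batch_size):
        cur = conn.execute(
            """
            UPDATE ingest
               SET data_started = (SELECT f.data_started FROM scale_file f
                                    WHERE f.id = ingest.source_file_id),
                   data_ended = (SELECT f.data_ended FROM scale_file f
                                  WHERE f.id = ingest.source_file_id)
             WHERE id >= ? AND id < ?
               AND data_started IS NULL AND data_ended IS NULL
               AND source_file_id IS NOT NULL
            """,
            (start, start + batch_size),
        )
        conn.commit()
        updated += cur.rowcount
    return updated


# Tiny demonstration schema (simplified; not the real Scale tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scale_file (id INTEGER PRIMARY KEY, data_started TEXT, data_ended TEXT);
    CREATE TABLE ingest (id INTEGER PRIMARY KEY, source_file_id INTEGER,
                         data_started TEXT, data_ended TEXT);
    INSERT INTO scale_file VALUES (1, '2017-01-01T00:00:00Z', '2017-01-01T01:00:00Z');
    INSERT INTO ingest VALUES (1, 1, NULL, NULL);
    INSERT INTO ingest VALUES (2, NULL, NULL, NULL);
""")
count = backfill_ingest_data_times(conn, batch_size=1)
```

Because the work is split by ID range, the backfill can be stopped and resumed, or limited to recent ingests, which matches the note that users may backfill as much or as little as they want.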

4.1.1

7 years ago

Bug Fixes

Fixed inefficient unpublish product query to support reprocess. (Issue 653)

4.1.0

7 years ago

Deprecated

The v3 version of the REST API is now deprecated. We recommend that you migrate all use of the Scale REST API to use the new v4 REST API. There are only a few changes between the v3 and v4 versions:
1. The old log API /job-executions/{id}/logs/ is NOT available in v4 (returns 404).
2. For the job details API, v4 does not include the deprecated "input_files" and "products" fields while v3 still does.
3. For the recipe details API, v4 does not include the deprecated "input_files" field while v3 still does.

New Features

Scale now performs clean up of any Docker containers and volumes it creates. The Scale Cleanup system job has been removed. As Scale job executions finish, the scheduler tracks them and periodically submits clean up tasks that delete the containers and volumes associated with the finished executions. Also when a node is first registered with the scheduler, it goes into a state where a clean up task removes all non-running containers and dangling volumes before the node begins running Scale jobs. (Issue 219)

Enhancements/Updates

Batches can now be run on recipes based on input file data time (Issue 580)
Scale webserver logs are now captured (Issue 584)
Improved display and unit conversion of metric page total (Issue 563)
Improved table sorting on failure rates page (Issue 567)
Inactive jobs can now be filtered out of the REST API (Issue 604)
The ingest table can now be filtered by Strike (Issue 323)
Added link to job details on Strike detail page (Issue 571)
Added meta-data labels to the Scale Docker image (Issue 630)
Improved Scale's handling of lost Mesos tasks and ability to reconcile long-running tasks (Issue 629)

Bug Fixes

Fixed DC/OS health check for the Scale webserver (Issue 568)
Fixed REST API to work within the context of the DC/OS /service URL (Issue 569)
Fixed S3 Strike monitor to periodically reload configuration (Issue 617)
Fixed issue in batch creator job (Issue 603)
Fixed job type filter on failure rates page (Issue 566)
Fixed bug with querying ElasticSearch for job execution logs (Issue 633)
Fixed bug in handling empty S3 credential fields (Issue 615)
Fixed deployment bug where sometimes the Scale webserver or logstash apps would not destroy/re-create correctly (Issue 594)

Database Migrations

A new table called recipe_file is created to support Issue 580. It will be populated with the input files for each existing recipe, possibly taking a while to complete based upon how large your recipe table is.
Fields related to the old Scale clean up job are removed from the job_exe and job_type tables.
The maximum character length for fields within the task_update table is increased to 250.

4.0.0

7 years ago

Breaking Changes

The new logging system requires Scale to have access to a logstash Docker image and access to an ElasticSearch cluster. New environment variables are required to connect Scale to logstash (env SCALE_LOGGING_ADDRESS) and ElasticSearch (env SCALE_ELASTICSEARCH_URL). See the documentation for more info.
Use of the REST API without a version in the URL is now removed. All REST API calls now require an explicit version specified in the URL (ex. http://host.domain/api/v3/status/). Right now both v3 and v4 versions are supported. See the REST API documentation for more details.
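Clients that still call unversioned URLs need to be updated. As a small sketch of that change, the helper below inserts a version segment into a URL that lacks one. The /api/ prefix follows the example URL above; the function name add_api_version and the choice of v4 as the default are assumptions.

```python
import re


def add_api_version(url, version="v4"):
    """Insert a version segment after /api/ when the URL has none."""
    if re.search(r"/api/v\d+/", url):
        return url  # already versioned, leave untouched
    # Only rewrite the first /api/ occurrence.
    return re.sub(r"/api/", f"/api/{version}/", url, count=1)


migrated = add_api_version("http://host.domain/api/status/")
unchanged = add_api_version("http://host.domain/api/v3/status/")
```

Note that unversioned calls previously defaulted to v3, so moving them to v4 also means accounting for the v3/v4 differences listed under Deprecated.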

Deprecated

The v3 version of the REST API is now deprecated. We recommend that you migrate all use of the Scale REST API to use the new v4 REST API. There are only a few changes between the v3 and v4 versions:
1. The old log API /job-executions/{id}/logs/ is NOT available in v4 (returns 404).
2. For the job details API, v4 does not include the deprecated "input_files" and "products" fields while v3 still does.
3. For the recipe details API, v4 does not include the deprecated "input_files" field while v3 still does.

Known Issues

Over time Docker volumes created by Scale will accumulate on the nodes. These volumes will need to be deleted on a regular basis (recommended at least hourly). A script can be set up to run under cron or as a DC/OS task that periodically deletes any dangling Docker volumes.
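A cleanup script of the kind suggested above might look like the following sketch. It relies on the standard docker volume ls and docker volume rm commands; the function name and the injectable run parameter (used so the logic can be exercised without Docker) are illustrative choices. Be aware that it removes every dangling volume on the node, not only those created by Scale.

```python
import subprocess


def remove_dangling_volumes(run=subprocess.run):
    """Delete dangling Docker volumes and return the names that were removed."""
    # List only the names (-q) of volumes not referenced by any container.
    listing = run(
        ["docker", "volume", "ls", "-q", "-f", "dangling=true"],
        capture_output=True, text=True, check=True,
    )
    names = listing.stdout.split()
    if names:
        # Remove all dangling volumes in a single docker invocation.
        run(["docker", "volume", "rm", *names], check=True)
    return names
```

Calling remove_dangling_volumes() from a script scheduled under cron, or from a periodic DC/OS task, would cover the hourly cleanup recommended here.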

New Features

New logging capability utilizing logstash and ElasticSearch (Issue 15, also many other issues). The new logging capability greatly improves scheduler performance as the scheduler no longer has to collect logs after a job completes. Also the logging display in the Scale UI now joins messages from stdout and stderr into a single stream that highlights the stderr messages in red. With log messages stored in ElasticSearch, users now have the option of using additional tools on top of ElasticSearch (such as Kibana) for searching/analyzing Scale logs.
New env var CONFIG_URI allows Scale to pass a Docker credentials file to Mesos for running Docker tasks (Issue 370)
New UI page for creating and managing Strikes (Issue 287)
New S3 mounting feature to improve performance for jobs that read only a small part of a large file from an S3 bucket (Issue 389)
New REST API allowing the user to re-process a recipe (Issue 457)
New capability to create "batches" that re-process a large collection of recipes through the REST API (Issue 54, Issue 63, Issue 64, Issue 65)
v3 REST API is deprecated in favor of the new v4 REST API; also the deprecated ability to omit a version in the URL (which defaulted to v3) has been removed (Issue 536)

Enhancements/Updates

Associated jobs and products are now displayed on the UI ingest detail page (Issue 380)
Minor improvement to handling of Scale Docker image tagging (Issue 419)
Added documentation for Job Configuration, a JSON document schema used internally by Scale (Issue 218)
UI now uses the REST API version prefix (Issue 411)
Metrics page now shows total sum/count (Issue 433)
System job types can now be paused (Issue 393)
Improved scheduler response to Mesos status updates; also the database now stores all Mesos task status updates (Issue 412)
Added file name filtering on the ingests page (Issue 60)
Added additional task/execution information to the job detail page (Issue 430)
Errors with a results manifest now result in the job being tagged with algorithm errors instead of system errors (Issue 129)
The Scale scheduler and jobs now reconnect to logstash if it goes down and comes back up (Issue 478)
The Scale web server and logstash now reconnect to ElasticSearch if one or more executors go down (Issue 442)
On Scale scheduler restart, the Scale logstash container is also restarted automatically (Issue 441)
Scale UI displays information on superseded recipes (Issue 345)
Scale UI displays information on superseded jobs (Issue 344)
Updated job REST API to allow filter control for superseded jobs (Issue 497)
Updated recipe REST API to allow filter control for superseded recipes (Issue 495)
Updated sources REST API to include superseded fields (Issue 483)
Scale now also recognizes the ".nitf" extension for NITF files (Issue 479)
Scale UI now tails job logs using the new logging system (Issue 472)
Improvements to the Scale DC/OS package (Issue 507, Issue 546, Issue 547, Issue 557)

Bug Fixes

Fixes made to the UI nodes page (Issue 410, Issue 408)
Unit test correction involving the mesos.interface Python library (Issue 391)
Fixed job re-queue button on job detail page (Issue 427)
Fixed recipe trigger page not handling data_types field (Issue 428)
Fixed pausing/resuming a node on the nodes page (Issue 426)
Fixed various issues with nodes REST API (Issue 418)
Fixed issue where finished tasks were not getting exit codes set to 0 (Issue 359)
Fixed bug where unset CONFIG_URI environment variables caused the scheduler to fail (Issue 456)
Fixed error display on the log page (Issue 473)
Fixed bug preventing the saving of data types in recipe triggers (Issue 474)
Fixed bug where incorrect use of the HTTP IF_MODIFIED_SINCE header was causing only partial logs to be visible within Scale (Issue 434)
Fixed bug where not all nodes would appear on the nodes page (Issue 466)
Fixed bug with creating a new Strike on the Strike page (Issue 460)
Fixed bug where HTTP 500 was returned when queuing a new job via the REST API (Issue 508)
Improved performance for post-task database query (Issue 44)
Fixed bug where database integer overflows caused failures in the Scale daily metrics generation (Issue 525)
Bug fix for the Re-queue Job REST API (Issue 532)
Fixed bug causing job failures for optional inputs (Issue 485)
Bug fix so that invalid geometries in results manifests result in an "Invalid Results Manifest" error (Issue 509)
Removed unnecessary stderr logging from pre and post tasks (Issue 446)
Bug fix allowing the UI recipe editor to accept not filling in a media type (Issue 511)

Database Migrations

The job_exe current_stdout_url and current_stderr_url fields are removed as part of the new logging capability.
A new table named task_update is created to hold all task status updates from Mesos.
New tables are created for the new batch feature.
Some metrics table fields have been converted from Integer to BigInteger to handle large metric values.

3.1.0

7 years ago

Scale release 3.1.0

Known Issues

Over time Docker volumes created by Scale will accumulate on the nodes. These volumes will need to be deleted on a regular basis (recommended at least hourly). A script can be set up to run under cron or as a DC/OS task that periodically deletes any dangling Docker volumes.

New Features

Refactored ingest process to allow Strike monitors; one monitor allows the previous capability of monitoring a file system directory and a new monitor allows the polling of an SQS queue for file creation notifications for an S3 bucket. This requires mounting each Strike's mount field location on each node (i.e. a Strike with mount "host:/my/path" must have that NFS remote path mounted on each node as "/my/path"). This is needed for an automatically created host workspace for each Strike that represents the old mount location. (Issue 19)
New REST API for retrieving a list of currently PENDING jobs, similar to the list of RUNNING jobs (Issue 283)
New REST API to retrieve source file details (Issue 349)
Implemented product superseding (Issue 53)
Implemented recipe superseding (Issue 52)
New UI page to create/configure workspaces (Issue 254)
New workspace broker that handles AWS S3 buckets (Issue 18)
Created future standard for executable algorithm containers (Issue 324): Seed

Enhancements/Updates

Improvement to UI Nodes page (Issue 331)
Minor documentation updates (Issue 268)
Updated ingest REST API to allow filtering on Strike (Issue 321)
Update to allow multiple Scale frameworks in Mesos without volume name collisions (Issue 313)
REST API updates to support superseding (Issue 58, Issue 59, Issue 61, Issue 62)
Improved datagrid scrolling (Issue 362)
Updated S3 broker to handle EC2 role-based authentication (Issue 360)
Re-worked Job Load page in the UI (Issue 284)
Mesos now performs a force pull before running each Docker task (Issue 263)

Bug Fixes

Fixed issue where the scheduler could become stuck in a bad state without properly shutting down all threads (Issue 192)
Fixed issue where failed system jobs were marked with algorithm errors instead of system errors (Issue 188)
Fixed issue where job table would disappear (Issue 376)
Fixed issue creating new workspaces (Issue 363)
Fixed ordering on job type failure rate page (Issue 351)
Fixed detection of task launch errors (Issue 355)
Fixed date/time formatting in the UI (Issue 353)
Fixed hanging issue in UI when re-queuing jobs (Issue 329)
The UI now displays metrics in useful human-readable units (Issue 341)
The UI no longer creates a hyperlink for files with no valid link (Issue 340)
The UI now correctly handles more than 100 job types (Issue 330, Issue 316)
Fixes for recipe editor (Issue 295)

JSON Schema Changes

Strike configuration JSON schema updated from version 1.0 to 2.0 to support the new monitor system. Old 1.0 Strike configurations will be automatically converted by Scale, including automatically creating a new host mount workspace for each Strike that represents the mount field in the 1.0 version of the configuration. This requires mounting the 1.0 mount field on each node, see the New Features section.
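To illustrate the mount requirement above, the sketch below derives the node-local path that must exist for a 1.0 Strike mount field such as "host:/my/path". The function name local_mount_path is hypothetical, and this is not Scale's conversion code; it only shows the mapping between the two forms.

```python
def local_mount_path(strike_mount):
    """Return the node-local path for a 1.0 Strike mount like "host:/my/path"."""
    host, sep, path = strike_mount.partition(":")
    if not sep or not host or not path.startswith("/"):
        raise ValueError(f"expected 'host:/path', got {strike_mount!r}")
    # The remote path is expected to be mounted at the same path on each node.
    return path


node_path = local_mount_path("host:/my/path")
```

An operator could run this over existing Strike configurations before upgrading to list the paths that need to be mounted on every node.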

Database Migrations

The ingest table is updated to support the new ingest process. This will cause ingest jobs that were created prior to this update to always fail. In order to re-queue old ingest jobs, you must cancel the job and copy the applicable source data file back into the monitored Strike directory to re-ingest.
The recipe_job table will be entirely re-written in order to support recipe superseding. This migration may take a very long time to complete.

Deprecations

3.0.0

8 years ago

This is the first official release of Scale.

If you are moving to 3.0.0 from a previous version, note that Scale itself is now fully Dockerized. This may require you to update your previous workspace and Strike configurations. Please consult the documentation for details.

The database migration needed to update your Scale to 3.0.0 may take a very long time to complete, due to re-writing the job table and adding fields to the job_exe table.

The Scale REST API now includes URL versioning. Currently the old URLs without the version specified will default to v3, but this is deprecated and will be removed in a future version. Please consult the documentation for details.

SAMPLE_DATA

8 years ago

This is a "release" containing some sample landsat data for use with the vagrant quickstart. The code state at this tag should be suitable for initial strike processing of the ingest.