SnappyData Versions

Project SnappyData: a memory-optimized analytics database, based on Apache Spark™ and Apache Geode™. Stream, Transact, Analyze, Predict in one cluster.

v0.9

6 years ago

The SnappyData team continues to march toward a 1.0 GA, and we are pleased to announce the availability of version 0.9 of the platform today. This release contains significant new functionality, several performance enhancements, design changes that make the platform scale better, and a redesigned console that strengthens the enterprise readiness of the SnappyData cluster.

In-memory but disk persistent, by default: Until this release, all tables were memory-resident only by default and required explicit configuration for disk persistence (e.g., using the 'persistent' clause in 'create table'). Starting with this release, all tables persist to disk by default. You can explicitly turn this OFF for pure memory-only tables.
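For illustration, the DDL below sketches both behaviors. The table and column names are hypothetical, and the exact persistence option names for your release should be checked against the product docs:

```sql
-- From 0.9 onward, this table is disk persistent by default,
-- with no explicit 'persistent' clause required.
CREATE TABLE orders (order_id INT, amount DECIMAL(10,2))
  USING column;

-- Sketch of opting out for a pure in-memory table; the exact
-- option name and value may differ across releases (see the docs).
CREATE TABLE scratch (k INT, v STRING)
  USING column OPTIONS (PERSISTENCE 'none');
```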

Memory Management:

  • Improved “Unified Memory Manager” with more accurate accounting of memory. The previous release could prematurely spill to disk or cause GC pauses (SNAP-1235).
  • Support for Off-Heap storage in the column store (SNAP-1454). The previous release required users to over-allocate Java heap memory to avoid GC pauses, or exposed applications to an increased risk of stop-the-world GC pauses. In addition to performance benefits, off-heap storage contributes to predictable system performance and behavior and is strongly recommended for all production deployments.

Performance Enhancements: Version 0.9 includes several product enhancements that contribute to improved performance. These include:

  • The disk storage design for column tables has been optimized. Previously, the logical disk storage unit was still a set of rows; now it is a set of column values, making queries that need to fault in data from disk significantly more efficient (SNAP-990).
  • The query engine now caches the physical plan as well as the generated code for queries. Spark, and likewise SnappyData, dynamically generates JVM bytecode for a query, then compiles and caches this generated code so that any subsequent execution of the same query is much faster. But queries are often similar rather than identical; for instance, the bound constants change (common in WHERE clauses), which meant the cached compiled plans were not all that useful. Now the generated code tokenizes literals and constants so that subsequent similar queries with different bound values execute much faster. (SNAP-1346)
  • Prior to 0.9, SnappyData did not optimize PreparedStatements (JDBC) when the query was routed to the Spark Catalyst engine. Now it does. (SNAP-1323)
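As an illustration of the literal tokenization described above (the table and column names here are hypothetical), the two queries below differ only in a bound constant, so they can now share one cached plan and one piece of generated code:

```sql
-- Both queries normalize to the same tokenized form, roughly:
--   SELECT count(*) FROM orders WHERE status = ?
-- so the second execution reuses the plan and code cached for the first.
SELECT count(*) FROM orders WHERE status = 'SHIPPED';
SELECT count(*) FROM orders WHERE status = 'PENDING';
```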

Scaling Improvements:

  • SnappyData offers a Smart Connector mode, which allows Spark applications running in a remote cluster to access data stored in a SnappyData cluster intelligently and efficiently (with a very high degree of parallelism). Version 0.9 offers a redesigned Smart Connector that acts as a client to the SnappyData cluster, offering much higher levels of scaling for the client while also improving the cluster's ability to handle such connections without impacting its own ability to scale. (Previous versions of the connector had to join the cluster as a peer member, limiting the scalability of the cluster.) (SNAP-1286)

Enterprise Readiness:

  • Consistency improvements: This release introduces snapshot isolation semantics, by default, for query processing using an MVCC algorithm, so queries are guaranteed to access a stable view of the database (SNAP-1304).
  • Pulse Console: SnappyData Pulse has been redesigned to provide both developers and operations personnel with useful insights into the running SnappyData cluster. Improvements include:
    • A redesigned member view that displays a detailed member description, heap and off-heap usage, and the snappy storage and execution splits
    • Cluster-level aggregate memory and CPU usage
    • A SQL tab that shows the SQL statements executed within the system, with the ability to view their query plans

Select bug fixes and performance related fixes:

  • Starting with version 0.9, row tables support the Boolean data type.
  • Support for slash ('/') and special characters in column names (SNAP-1705).
  • Scans and ingest through code generation could fail if the generated code of a single method exceeded 64KB (SNAP-1384).

For the complete list of tickets that were fixed in this release, see ReleaseNotes.txt.

Description of download artifacts:

| Artifact Name | Description |
| --- | --- |
| snappydata-0.9-bin.tar.gz | Full product binary (includes Hadoop 2.7) |
| snappydata-0.9-bin.zip | Full product binary (includes Hadoop 2.7) |
| snappydata-0.9-without-hadoop-bin.tar.gz | Product without the Hadoop dependency JARs |
| snappydata-0.9-without-hadoop-bin.zip | Product without the Hadoop dependency JARs |
| snappydata-client-1.5.5.jar | Client (JDBC) JAR |
| snappydata-core_2.11-0.9.jar | The only dependency needed to connect to SnappyStore from an Apache Spark 2.0.x cluster (Smart Connector mode) |
| snappydata-0.9-odbc32.zip | 32-bit ODBC driver for 32-bit Windows. Extract and run the MSI. |
| snappydata-0.9-odbc64.zip | 64-bit ODBC driver for 64-bit Windows. Extract and run the MSI. |
| snappydata-0.9-odbc32_64.zip | 32-bit ODBC driver for 64-bit Windows. Extract and run the MSI. |
| ODBC-and-Tableau-Setup.pdf | Installation instructions for the ODBC driver, including Tableau setup |
| odbc-snappydata.tdc | TDC file for Tableau setup (see the setup guide for details) |
| snappydata-zeppelin-0.7.1.jar | The Zeppelin interpreter JAR for SnappyData, compatible with Apache Zeppelin 0.7 |

v0.8

7 years ago

SnappyData 0.8 Release with the following major changes:

New features/Fixes

  • ODBC Driver and Installer. You can now connect to the SnappyData cluster using the SnappyData ODBC driver and execute SQL queries. (SNAP-1357)
  • Multiple Language Binding using Thrift Protocol. SnappyData now supports the Apache Thrift protocol, enabling users to access the cluster from languages that SnappyData does not support directly. (SNAP-1313)
  • Insert Performance Optimizations - Inserts into tables are now significantly faster. A new insert plan has been introduced which uses code generation and a new encoding format. (SNAP-490)
  • Backward compatibility with Spark 2.0.0 fixed. The 0.8 SnappyData release is based on Spark 2.0.2; the SnappyData Smart Connector is now backward compatible with Spark 2.0.0 and Spark 2.0.1.
  • SnappyData JDBC now uses a new, more optimized Thrift-based driver to communicate with the data servers.
  • Column table bloat issue - Bouncing of data servers (due to failure) could result in data accumulating in the delta row buffers of column tables instead of being aged into the expected compressed columnar format, resulting in bloat and inferior query performance. This has been addressed. (SNAP-1146)
  • SnappyData now supports "persistent UDFs". UDFs, once registered, are persisted in the catalog and hence remain usable across restarts. (SNAP-982)
  • For other bug-fixes, see release notes for more details.
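A minimal sketch of registering a persistent UDF, per the feature above. The function name, implementing class, and JAR path here are hypothetical, and the exact CREATE FUNCTION syntax for your release should be verified against the SnappyData docs:

```sql
-- Once registered, the UDF is recorded in the catalog and
-- remains usable after a cluster restart.
CREATE FUNCTION my_strlen AS com.example.udf.StringLength
  RETURNS INTEGER USING JAR '/path/to/udf.jar';
```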

Known issues

  • Inserting into or querying a column table with a wide schema may fail with a StackOverflowException due to a JVM limitation. (SNAP-1384)

SnappyData Synopses Data Engine:

  • Sample selection logic enhanced. It can now select the best-suited sample table even if SQL functions are used on QCS columns while creating sample tables.
  • The Poisson multiplicity generator logic for bootstrap has been improved. Error estimates using bootstrap are now more accurate.
  • Improved performance of closed-form and bootstrap error estimations.

Description of download artifacts

| Artifact Name | Description |
| --- | --- |
| snappydata-0.8-bin.tar.gz | Full product binary (includes Hadoop 2.7) |
| snappydata-0.8-bin.zip | Full product binary (includes Hadoop 2.7) |
| snappydata-0.8-without-hadoop-bin.tar.gz | Product without the Hadoop dependency JARs |
| snappydata-0.8-without-hadoop-bin.zip | Product without the Hadoop dependency JARs |
| snappydata-client-1.5.4.jar | Client (JDBC) JAR |
| snappydata-core_2.11-0.8.jar | The only dependency needed to connect to SnappyStore from an Apache Spark 2.0.x cluster (Smart Connector mode) |
| snappydata-0.8.0.1-odbc32.zip | 32-bit ODBC driver for 32-bit Windows. Extract and run the MSI. |
| snappydata-0.8.0.1-odbc64.zip | 64-bit ODBC driver for 64-bit Windows. Extract and run the MSI. |
| snappydata-0.8.0.1-odbc32_64.zip | 32-bit ODBC driver for 64-bit Windows. Extract and run the MSI. |
| ODBC-and-Tableau-Setup.pdf | Installation instructions for the ODBC driver, including Tableau setup |
| odbc-snappydata.tdc | TDC file for Tableau setup (see the setup guide for details) |
| pulse.war | Needed only in RowStore mode. Classes for the Pulse UI. |

Get the Zeppelin interpreter jar, compatible with Apache Zeppelin 0.7, for SnappyData.

v0.7

7 years ago

SnappyData 0.7 Release with the following major changes.

  • In sync and fully compatible with Apache Spark 2.0.2.
  • Try SnappyData without any download as a Spark dependency
  • 20X faster than Spark in-memory caching. Try the simple perf example on your laptop. Some of the individual optimizations are listed below.
  • Performance optimizations:
    • New GROUP BY and HASH JOIN operators used with SnappyData storage tables that are 5-10X faster than the ones in Spark. (SNAP-1067)
    • Support for plan caching to reuse SparkPlan, RDD and PlanInfo (SNAP-1191)
    • Optimizations for single dictionary column with SnappyData's GROUP BY and JOIN operator that improve the performance further by 2-3X. (SNAP-1194)
    • Pooled version of Kryo serializer including for closures. Spark updated to allow for pluggable closure serializer. (SNAP-1136)
    • Column batch level statistics to allow query predicates to skip entire batches when possible. (SNAP-1087)
  • Reduce serialization overheads of biggest contributors in queries. (SNAP-1202)
  • Plan optimizations to minimize data shuffle and combine aggregates when possible. (SNAP-1260)
  • New SnappyData Dashboard as an extension to the Spark UI. Explore your SnappyData cluster and Spark artifacts in the same UI.
  • HowTos: Working code snippets of various features for developers to get started. Check out the docs for more details.
  • Amazon Web Services AMI and Docker image with SnappyData 0.7 now available. Refer to docs for more details.
  • Support for map, flatMap, filter, glom, mapPartition and transform APIs to SchemaDStream (SNAP-1182)
  • Use ConfigEntry mechanism for SnappyData properties (SNAP-1180)
  • INSTALL JAR utility to load application jars that are available to all the jobs submitted to SnappyData. This is in addition to the existing way of providing application jars using --jars in spark-submit.
  • EC2 scripts are now moved to a new repository with enhancements and fixes.
  • Several other bug-fixes and optimizations. See release notes for more details.
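To illustrate the column batch level statistics mentioned above (SNAP-1087), consider a hypothetical trades table: with per-batch min/max statistics, a selective range predicate lets the scan skip whole batches whose value range cannot match.

```sql
-- If a column batch's [min, max] for trade_date lies entirely
-- before 2016-12-01, that batch can be skipped without decoding it.
SELECT avg(amount) FROM trades
  WHERE trade_date >= '2016-12-01';
```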

SnappyData Synopses Data Engine:

  • Row count for sample tables is now displayed in SnappyData Dashboard.
  • Enabled HA semantics and redundancy for sample tables.
  • Other bug-fixes and performance improvements.

Description of download artifacts

| Artifact Name | Description |
| --- | --- |
| snappydata-0.7-bin.tar.gz | Full product binary (includes Hadoop 2.7) |
| snappydata-0.7-bin.zip | Full product binary (includes Hadoop 2.7) |
| snappydata-0.7-without-hadoop-bin.tar.gz | Product without the Hadoop dependency JARs |
| snappydata-0.7-without-hadoop-bin.zip | Product without the Hadoop dependency JARs |
| snappydata-client-1.5.3.jar | Client (JDBC) JAR |
| snappydata-core_2.11-0.7.jar | The only dependency needed to connect to SnappyStore from an Apache Spark cluster (Smart Connector mode) |
| snappydata-ec2-0.7.tar.gz | Script to launch EC2 instances on AWS |

Get the Zeppelin interpreter jar, compatible with Apache Zeppelin 0.6.1, for SnappyData.

v0.6.1

7 years ago

SnappyData 0.6.1 (Row Store 1.5.2) Release with the following major changes over the previous release.

  • A failure in IMPORT caused the system to close region and network interfaces; threads are therefore no longer interrupted. (SNAP-1138)
  • Added a service to publish store table size that is used for query plan generation. These stats are also published on Snappy store UI tab. (SNAP-1075)
  • Fixes for Streaming related issues after Spark 2.0 merge. (SNAP-1060, SNAP-1141, SNAP-1115)
  • Other bug-fixes. (SNAP-1083, SNAP-1113)

v0.6

7 years ago

SnappyData 0.6 Release with the following major changes over the previous release.

  • Spark 2.0 based - we merged with Apache Spark 2.0 and remain fully compatible with Spark.
  • 20X gains in performance - While the "whole-stage code generation" (vectorization) improvements in Spark give good gains, we extended code generation to several critical areas and into access of the SnappyStore, enabling Snappy to attain 20X better performance than Spark cached DataFrames for scan/aggregation queries.
  • Cloud service - this is our first release that bundles support for launching SnappyData on AWS. As part of this service, we also added deep integration with Apache Zeppelin, so you can visualize results using the Snappy cluster both as a Spark cluster and as a database cluster for analytics. (SNAP-864, SNAP-978)
    • Download and extract snappydata-ec2-0.6.tar.gz to start using it. Refer to docs/snappyOnAWS.md.
  • Support for describe table and show table using SnappyContext. (SNAP-1044)
  • Support for multiple Hadoop versions. (SNAP-981)
  • Single install/replace jar utility across SnappyData cluster. (SNAP-293)
  • Support for CUBE/ROLLUP/GROUPING SETS through SQL. (SNAP-824)
  • Support for window clauses and partition/distribute by clauses.
  • SnappyData interpreter for Apache Zeppelin. (SNAP-861)
  • Support for EXISTS in SQL. (SNAP-734)
  • Fix column table row count in Spark UI. (SNAP-1047)
  • Support for VARCHAR with size, and processing of STRING as VARCHAR(32762) by default. (SNAP-735)
  • Moved spark-jobserver to 0.6.2.
  • Several other bug-fixes and performance improvements.
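The CUBE/ROLLUP/GROUPING SETS support above can be exercised with standard SQL; the sales table and its columns in this sketch are hypothetical:

```sql
-- ROLLUP produces per-(region, product) rows, per-region subtotals,
-- and a grand total; CUBE would add per-product subtotals as well.
SELECT region, product, sum(revenue)
  FROM sales
  GROUP BY ROLLUP (region, product);
```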

SnappyData Synopses Data Engine (AQP):

  • Better accuracy, error estimates, and high-level accuracy contracts - we added many improvements in this area.
  • Support for functions in sample creation. (AQP-214)
  • Support float datatype for sample created on row table. (AQP-216)
  • Several bug-fixes and optimizations.

Download artifacts description

  • snappydata-0.6-bin.tar.gz ---> Full product binary
  • snappydata-0.6-bin.zip ---> Full product binary
  • snappydata-0.6-without-hadoop-bin.tar.gz ---> Product without the Hadoop dependency JARs
  • snappydata-0.6-without-hadoop-bin.zip ---> Product without the Hadoop dependency JARs
  • snappydata-client-1.5.1.jar ---> Client (JDBC) JAR
  • snappydata-core_2.11-0.6.jar ---> Only dependency to connect to SnappyStore from Apache Spark cluster (Split mode)
  • snappydata-ec2-0.6.tar.gz ---> Script to Launch on Ec2
  • snappydata-zeppelin-0.6.jar ---> Apache Zeppelin interpreter

v0.5

7 years ago

SnappyData 0.5 Release with the following major changes over the previous release.

  • Two tools, VSD and Pulse, are now packaged into the SnappyData distribution.
  • Added new fields on the Snappy Store tab in Spark UI (SNAP-852).
  • A new tool to collect debug artifacts such as logs, stats files, and stack dumps automatically, and output them as a tar-zipped file. Time-range-based collection is also provided.
  • SnappyData AQP:
    • Optimizations of bootstrap for sort based aggregate.
    • Minimize the query plan size for bootstrap.
    • Optimized the Declarative aggregate function.
  • SnappyData RowStore:
    • SnappyData RowStore 1.5 is now GA, which offers GemFireXD users the bits to upgrade to a much more robust and stable version of the product.
  • Several other bug fixes and test additions.

v0.4-preview

7 years ago

SnappyData 0.4 Preview Release with the following changes over the previous release.

  • New Java APIs for JobServer interfaces. (SNAP-760)
  • Python API for Snappy StreamingContext
  • Added quickstart example with Python API for SnappyData (SNAP-741)
  • Support for "spark.snappydata" properties (SNAP-606)
  • Snappy's extension of UnifiedMemoryManager (SNAP-810)
  • Enabled code generation for Column table scan (SNAP-623)
  • Several other bug fixes and new tests.

v0.3-preview

8 years ago

SnappyData 0.3 Preview Release with the following changes over the previous 0.2.1 Preview release.

  • Updated code to Apache Spark version 1.6.1, spark-jobserver to 0.6.1
  • Ability to run snappydata core against stock Apache Spark 1.6.1 in split-cluster mode.
  • Support for complex types: ARRAY (ArrayType), MAP (MapType), STRUCT (StructType), for column tables.
  • New Java and Python APIs for SnappyData additions to Spark and jobserver.
  • AQP additions:
    • New closed form error estimate implementations that give vastly improved results with filters in queries
    • Addition of closed form error estimate for COUNT
    • Bootstrap based error estimates
    • Updated implementation of AQP for Spark 1.6.x compatibility
  • Index implementation and API for column tables. These are distributed partitioned indexes that are stored like regular column tables (TBD: automatic selection of best index in plan generation)
  • Unified partitioning schema for Spark and store layers. This allows minimizing shuffle for both queries and inserts when the number of partitions in shuffle and store match.
  • New optimized SQL parser implementation that is orders of magnitude faster and more flexible than the Spark SQL parser.
  • Added a script to collect logs, statistics, stack dumps for all data store nodes in the system (with optional time range)
  • Addition of a pure "rowstore" startup mode that inhibits the Spark layer and lead nodes.
  • Column and row tables now return proper sizeInBytes in plan generation to let Spark determine the best join order.
  • Fix for issues related to Row, InternalRow usage and conversions in streaming API.
  • Fix for row tables: the INSERT and PUT operations now behave correctly, with the former throwing a constraint violation where appropriate.
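The unified partitioning scheme noted above can be seen in table DDL; the table names are hypothetical, and the PARTITION_BY/BUCKETS option names should be checked against the docs for your release:

```sql
-- Co-partitioning both tables on the join key lets a join between them
-- avoid a shuffle when the shuffle and store partition counts match.
CREATE TABLE customers (cust_id INT, name STRING)
  USING column OPTIONS (PARTITION_BY 'cust_id', BUCKETS '32');
CREATE TABLE orders (order_id INT, cust_id INT)
  USING column OPTIONS (PARTITION_BY 'cust_id', BUCKETS '32');
```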

v0.2.1-preview

8 years ago

SnappyData 0.2.1 Preview Release with the following changes over the previous 0.2 Preview release.

  • Update docs for snappy-store HDFS feature and include hbase jar in distribution for users that need the HDFS feature (issue #194)
  • Many more fixes for snappy-store test failures and updated precheckin target for combined report generation.
  • Fixed a mismatch of message in an unsupported-operation exception in snappy-store (SQLState=0A000.S.29)
  • Support for custom key class (property: K), value class (V), key decoder class (KD), and value decoder class (VD) for the direct Kafka DataSource of CREATE STREAM

v0.2-preview

8 years ago

SnappyData 0.2 Preview Release with the following changes over the previous 0.1 Preview release.

  • Fixes for issues reported by users and related seen in testing:
    • Exception from snappy-shell/JDBC client when number of columns selected is > 8
    • Hang in operations after exceptions like above (pool has no available connections)
    • Issues including hang in CREATE TABLE as SELECT with limit clause
  • Corrected published POMs to remove dependencies on Pentaho-related repos and jars
  • Fixes for issues when building using snappydata jars published on maven central for subprojects like snappy-spark (rather than building all from source)
  • Addition of a "precheckin" target in snappy-store that runs all available GemFireXD junit and distributed unit tests with fixes for failures seen in precheckin
  • Fixes/additions to docs and links that went in after the previous release
  • Allow builds using Java 8 (some issues still reported by users on Mac platforms)
  • Fix for the print-stacks command-line option of snappy-shell to dump proper stacks from all nodes