Qri Versions Save

you're invited to a data party!

v0.9.4

4 years ago

This patch release fixes a number of FSI (file system integration) issues and infrastructure changes that will improve Desktop. These include the restoration of the validate command, handling certain changes to the file system done outside of qri, improved logging, and Windows bug fixes.

See the CHANGELOG.md for the full description of changes.

v0.9.3

4 years ago

This patch release includes bug fixes and improvements around the working directory, in particular doing better with removing datasets, and generating automatic commit messages. There are also some small feature changes that are mostly laying the groundwork for future features.

See the CHANGELOG.md for the full description of changes.

v0.9.2

4 years ago

In this patch release we're fixing a bunch of tiny bugs centered around removing datasets, and adding methods for column statistics.

📊 Get to know your data with stats

This release adds support for stats calculation. The easiest way to see stats is to get 'em:

$ qri get stats me/dataset

# getting stats also works in an FSI-linked directory
# so you can drop the dataset reference & just type:
$ cd /path/to/linked/dataset/directory
$ qri get stats

In both cases you'll get a JSON document of stats, with one stat aggregating each column in your dataset. The type of stat created depends on the data type being aggregated. Here's the table of stats calculated so far:

column data type   stat type   notes
string             string      calculates a term frequency map; frequencies that occur only once aren't listed in the map and instead increment a "unique" count (when there are fewer than 10,000 unique values)
number             numeric     calculates a 10-bucket histogram, as well as min, max, mean, and median
boolean            boolean     calculates a true / false / other count
null               null        counts the number of null values
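As a rough illustration of the table above, here's how the string and numeric stats might be computed. This is a sketch only, in Python rather than Qri's Go internals, and the function names are ours, not Qri's:

```python
from collections import Counter

def string_stats(values):
    # Sketch only -- not Qri's actual implementation.
    # Values that occur once are folded into a "unique" count
    # instead of being listed in the frequency map.
    counts = Counter(values)
    frequencies = {v: n for v, n in counts.items() if n > 1}
    unique = sum(1 for n in counts.values() if n == 1)
    return {"type": "string", "frequencies": frequencies, "unique": unique}

def numeric_stats(values):
    # min / max / mean / median for a numeric column; Qri also
    # computes a 10-bucket histogram, omitted here for brevity.
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    return {"type": "numeric", "min": ordered[0], "max": ordered[-1],
            "mean": sum(ordered) / n, "median": median}

string_stats(["nyc", "la", "nyc", "sf"])
# frequencies: {"nyc": 2}, unique: 2
numeric_stats([1, 2, 3, 4])
# min: 1, max: 4, mean: 2.5, median: 2.5
```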

v0.9.1

4 years ago

This release brings first-class support for Readmes, adds a bunch of stability, and sets the table for exciting collaboration features in the future.

📄 Qri now supports readme!

This release brings support for a new dataset component: readmes! Following in a long tradition of readmes in the world of software, readmes are markdown documents for explaining your dataset in human terms.

The easiest way to create a readme is by creating a file called readme.md in an FSI-linked directory. Qri will pick up on the file & add it to your dataset. You can see what the rendered HTML version looks like by running qri render in an FSI-linked directory.

In the future, we're excited to build out the feature set readmes offer, and think they're a better long-term fit for us than the generic notion of our existing viz component. Readmes differ from viz by not allowing generic script inclusion, which allows us to present them in a safer, sandbox-like environment. This fits well with our story around transform scripts and the general expectation that scripts Qri interacts with will come with safer execution properties.

With this release, support for readmes in qri.cloud and desktop is right around the corner.

Happy note taking!

📘 Introducing Logbook


Until now qri has used stored datasets as its source of history. Qri keeps commit information in the dataset itself, and creates a log of datasets by having each dataset reference the one before it. Keeping commits in the dataset has a number of advantages:

  • all datasets are attributed to the user that made them
  • all datasets have an accurate creation timestamp
  • all datasets include any notes the author made at the time
  • all of these details are part of the dataset, and move with it.

We've gone a long way with this simple approach, but using datasets as the only source of history has one major limitation: the history of a dataset is tied to the data itself. This means you can't uncover the full history of a dataset unless you have all of its versions stored locally. Logbook fixes that problem.

Logbook is a coordination tool for talking about who did what, without having to move around the data itself. This means Qri can tell you meaningful things about dataset versions you don't have. This will make syncing faster, and forms the basis for collaboration.
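To make the idea concrete, here's a toy model of a logbook: an append-only list of operations that reference dataset versions by hash only, so the full history can be described and shared without moving the data. This structure is purely illustrative and is not Qri's actual log format:

```python
from dataclasses import dataclass, field

@dataclass
class LogEntry:
    """One operation in a dataset's log: who did what, referencing a
    version by content hash only -- no dataset body required."""
    author: str
    op: str        # e.g. "init", "commit", "delete"
    version: str   # content hash of the dataset version
    note: str = ""

@dataclass
class Logbook:
    entries: list = field(default_factory=list)

    def record(self, author, op, version, note=""):
        self.entries.append(LogEntry(author, op, version, note))

    def history(self):
        # The full history is known even when the versions
        # themselves aren't stored locally.
        return [(e.op, e.version) for e in self.entries]

book = Logbook()
book.record("b5", "init", "QmAAA")
book.record("b5", "commit", "QmBBB", note="added rows")
# book.history() -> [("init", "QmAAA"), ("commit", "QmBBB")]
```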

To make use of logbook, all you have to do is... nothing! Logbook is a transparent service that overlays onto traditional Qri commands. You'll see some new details in commands like qri log and a few new plumbing commands like qri fetch and qri logbook, but this feature adds no new requirements to the Qri workflow.

We're most excited about what logbook allows us to do (collaboration!), and can't wait to ship features that will show the benefit of logbook. More fun soon!

🏗 Stability Improvements

As always, we're working on stabilizing & improving the way Qri works. This release we've focused on bringing stability to three major areas:

  • filesystem integration (FSI)
  • remotes
  • diff

Note: Turns out this commit a60187f changed 0.9.1-alpha into 0.9.2-dev. Presumably we were using 0.9.1-alpha because we were intending to cut 0.9.1 a few weeks ago, but that never happened, and this commit mistakenly skipped it. So we're doing this release as 0.9.1.

v0.9.0

4 years ago

0.9.0 makes Qri work like Git!

:open_file_folder: File System Integration (RFC0025)

This release brings a few new commands into qri. If you're a git user, these will look familiar:

init        initialize a dataset directory
checkout    create a linked directory and write dataset files to that directory
status      show the status of the working directory
restore     return part or all of a dataset to a previous state

You can now interact with a versioned dataset in a similar way to a git repository. Creating new versions is as simple as cd-ing to a linked directory and typing qri save.

After a lot of thought & research, we've come to believe that using the filesystem as an interface is a great way to interact with versioned data. Git has been doing this for some time, and we've put thought & care into bringing over the aspects of git that work well in this context.

Running the new qri init command will create an FSI-linked directory. A new dataset will be created in your qri repo, and a hidden file called .qri-ref will be created in the folder you've initialized within. When you're in a linked directory, you no longer need to type the name of a dataset to interact with it: qri get body peername/dataset_name is just qri get body when you're in an FSI-linked directory. You can see which datasets are linked when you run qri list; it'll show the folder each dataset is linked to.

Unlike git, qri doesn't track all files in a linked folder. Instead it only looks for specific filenames to map to dataset components:

component   possible filenames
body        body.csv, body.json, body.xlsx, body.cbor
meta        meta.json, meta.yaml
schema      schema.json, schema.yaml

We'll be following up with support for transform and viz components shortly. It's still possible to create datasets that don't have a link to the filesystem, and indeed this is still the better way to go for large datasets.
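The component mapping above amounts to a simple filename check. Here's a sketch of that logic (the filenames come from the table; the function itself is illustrative and not part of qri):

```python
# Filenames qri maps to dataset components, per the table above.
COMPONENT_FILES = {
    "body": ["body.csv", "body.json", "body.xlsx", "body.cbor"],
    "meta": ["meta.json", "meta.yaml"],
    "schema": ["schema.json", "schema.yaml"],
}

def detect_components(filenames):
    """Return {component: filename} for recognized files.
    Everything else in the directory is ignored, unlike git."""
    found = {}
    for component, names in COMPONENT_FILES.items():
        for name in names:
            if name in filenames:
                found[component] = name
                break
    return found

detect_components(["body.csv", "meta.json", "notes.txt"])
# {"body": "body.csv", "meta": "meta.json"}
```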

File system integration opens up a whole bunch of opportunities for integration with other tools by dropping back to a common interface: files. Now you can use whatever software you'd like to edit dataset files, and by writing back to that folder with one of these names you're ready to version from the get-go. Commands like qri status make it easy to keep track of where you are in your work, and qri restore makes it easy to "reset to head".

:desktop_computer: Qri Desktop

This is the first qri release that will be bundled into Qri Desktop, our brand new project for working with datasets. Qri Desktop puts a face on qri. We'll be cutting a release of Qri Desktop shortly. Check it out!

:cloud: qri.cloud as a new default registry

This release also puts a bunch of work into the registry. We've made the job of a registry smaller, moving much of the behaviour of dataset syncing into remotes, which any peer can now become. At the same time, we're hard at work building qri.cloud, our new hosted service for dataset management and collaboration. If you're coming from a prior version of qri, run the following to switch to the new registry:

qri config set registry.location https://registry.qri.cloud

Now when you qri publish, it'll go to qri.cloud. Lots of exciting things coming for qri cloud in the next few months.

v0.9.0-alpha

4 years ago

Preparing for 0.9.0

We're not quite ready to put the seal-of-approval on 0.9.0, but it's been more than a few months since we cut a release. This alpha edition splits the difference while we prepare for a full & proper 0.9.0. The forthcoming big-ticket item will be File System Integration (RFC0025), which dramatically simplifies the story around integrating with a version-controlled dataset.

So while this isn't a proper release, the changelog gives a feel for just how much work is included this go-round. More soon!

v0.8.2

4 years ago

This patch release polishes up a couple of UI issues around stdout usage, webapp fetching, and other misc bug fixes. Full description of changes is in the CHANGELOG.md file.

v0.8.1

5 years ago

This patch release fixes a small-but-critical bug that prevented qri setup from working. A few other fixes & bumps made it in, but the main goal was restoring qri setup so folks can, you know, set qri up.

v0.8.0

5 years ago

Version 0.8.0 is our best-effort to close out the first set of public features.

Automatic Updates (RFC0024)

Qri can now keep your data up to date for you. 0.8.0 overhauls qri update into a service that schedules & runs updates in the background on your computer. Qri runs dataset updates on a schedule and maintains a log of changes.

schedule shell scripts

Scheduling datasets that have starlark transforms is the ideal workflow in terms of portability, but a new set of use cases opens up by adding the capacity to schedule & execute shell scripts within the same cron environment.

Starlark changes

We've made two major changes, and one small API-breaking change. Bad news first:

ds.set_body has different optional arguments

ds.set_body(csv_string, raw=True, data_format="csv") is now ds.set_body(csv_string, parse_as="csv"). We think this makes more sense, and that the previous API was confusing enough that we needed to deprecate it completely. Any prior transform scripts that used the raw or data_format arguments will need to update.

new beautiful soup-like HTML package

Our html package is difficult to use, and we plan to deprecate it in a future release. In its place we've introduced bsoup, a new package that implements parts of the Beautiful Soup 4 API. It's much easier to use, and will be familiar to anyone coming from the world of python.

the "ds" passed to a transform is now the previous dataset version

The ds that's passed to transform is now the existing dataset, awaiting transformation. For technical reasons, ds used to be a blank dataset. In this version we've addressed those issues, which makes examining the current state of a dataset possible without any extra load_dataset work. This makes things like append-only datasets a few lines of code:

def transform(ds, ctx):
  # append returns None in Starlark, so mutate the body, then set it
  body = ds.get_body()
  body.append(["new row"])
  ds.set_body(body)

CLI uses '$PAGER' on POSIX systems

Lots of Qri output is, well, long, so we now check for the presence of the $PAGER environment variable and use it to show "scrolling" data where appropriate. While we're at it we've cleaned up output to make things a little more readable. Windows should be unaffected by this change. If you ever want to avoid pagination, I find the easiest way to do so is by piping to cat. For example:

$ qri ls | cat

Happy paging!

Switch to go modules

Our project has now switched entirely to using go modules. In the process we've deprecated gx, the distributed package manager we formerly used to fetch qri dependencies. This should dramatically simplify the process of building Qri from source by bringing dependency management into alignment with idiomatic go practices.

Dataset Strict mode

dataset.structure has a new boolean field: strict. If strict is true, a dataset must pass validation against the specified schema in order to save. When a dataset is in strict mode, Qri can assume that all data in the body is valid. Being able to make this assumption will allow us to provide additional functionality and performance speedups in the future. If your dataset has no errors, be sure to set strict to true.
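To illustrate what strict mode guarantees, here's a minimal hand-rolled type check over tabular rows. Qri actually validates against a full JSON Schema; this sketch, including the function name and the simplified column-type notation, is ours:

```python
def validate_rows(rows, column_types):
    """Return a list of (row_index, col_index) type errors.
    In strict mode, a save would be rejected unless this is empty."""
    type_map = {"string": str, "number": (int, float), "boolean": bool}
    errors = []
    for i, row in enumerate(rows):
        for j, (value, t) in enumerate(zip(row, column_types)):
            if not isinstance(value, type_map[t]):
                errors.append((i, j))
    return errors

rows = [["alice", 30], ["bob", "unknown"]]
validate_rows(rows, ["string", "number"])
# [(1, 1)] -- "unknown" isn't a number, so a strict save would fail
```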

Full description of changes are in CHANGELOG.md

v0.7.3

5 years ago

This release is all about 3 Rs:

  • Rendering
  • Remotes
  • load_dataset

This release we've focused on improving dataset visualization, setting the stage with better defaults and a cleaner API for creating custom viz. We think expressing dataset visualizations as self-contained html makes Qri datasets an order of magnitude more useful, and we can't wait for you to try it.

Along with the usual bug fixes, a few nice bonuses have landed, like supplying multiple --file args to qri save to combine dataset input files, and qri get rendered to show rendered viz. Anyway, on to the big stuff:

Default Rendering (RFC0011)

Whenever you create a new dataset version, Qri will now create a default viz component if you don't provide one. Unless run with --no-render, Qri will now execute that template, and store the result in a file called index.html in your dataset. This makes your dataset much more fun when viewed directly on the d.web, which is outside of Qri entirely.

This is because IPFS HTTP gateways are sensitive to index.html. When you use qri to make a dataset, your dataset comes with a self-contained visualization that others can see without downloading Qri at all.

We think this dramatically increases the usefulness of a dataset, and increases the chances that others will want to share & disseminate your work by making your dataset a more-complete offering in the data value chain. These embedded default visualizations drop the time it takes to create a readable dataset to one step.

That being said, we've intentionally made the default visualization rather bland. The reason for this is twofold. First, to keep the file size of the index.html small (less than 1KB). Second, we want you to customize it. We'll refine the default template over time, but we hope you'll use viz to tell a story with your data.

Users may understandably want to disable default visualizations. To achieve this, qri save and qri update have a new flag: --no-render. The flag prevents the execution of any viz template. This will save ~1KB per version, at the cost of usability.

Overhauled HTML Template API (RFC0011)

Keeping with the theme of better viz, we've also taken time to overhaul our template API. Given that this is a public API, we took some time to think about what it would mean to try to render Qri templates outside of our go implementation. While no code does this today, we wanted to make sure it would be easier in the future, so we took steps to define an API that generally avoids use of the go templating dot (.), instead presenting a ds object with json-case accessors, turning templates like this:

<h1>{{ .Meta.Title }}</h1>

to this:

<h1>{{ ds.meta.title }}</h1>

This change brings the template syntax closer to the way we work with datasets in other places (eg: in dataset.yaml files and starlark transform scripts), which should help cut down on the mental overhead of working with a dataset in all these locations. We think this up-front work on our template API will make it easier to start writing custom templates. We don't have docs up yet, but the RFC reference section outlines the API in detail.
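To show the json-case accessor idea outside of go templates, here's a toy renderer over a nested dict. This is purely a sketch for intuition; qri's templates are real go templates, not this:

```python
import re

def render(template, ds):
    """Replace {{ ds.path.to.field }} with values from a nested dict."""
    def lookup(match):
        value = ds
        for key in match.group(1).split(".")[1:]:  # skip leading "ds"
            value = value[key]
        return str(value)
    return re.sub(r"\{\{\s*(ds(?:\.\w+)+)\s*\}\}", lookup, template)

ds = {"meta": {"title": "NYC For-Hire Vehicles"}}
render("<h1>{{ ds.meta.title }}</h1>", ds)
# "<h1>NYC For-Hire Vehicles</h1>"
```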

Experimental Remote Mode (RFC0022)

The registry is nice and all, but we need more ways to push data around. In this release we're launching a new experiment called "remotes" that starts this work. Remotes act as a way for any user of Qri to set up their own server that keeps datasets alive, providing availability and ownership over data within a set of nodes that they control.

Currently we consider this feature "advanced only" as it comes with a number of warnings and some special setup configuration. For more info, check the RFC, and if you're interested in running a remote, hop on discord and say "hey I want to run a remote".

Starlark load_dataset (RFC0023)

We've made a breaking API change in Starlark that deprecates qri.load_dataset_body and introduces a new global function: load_dataset. This new API makes it clear that load_dataset both loads the dataset and declares it as a dependency of this script. This is an important step toward making datasets a first-class citizen in the qri ecosystem. Here's an example of the new syntax:

load("http.star", "http")

# load a dataset into a variable named "fhv"
fhv = load_dataset("b5/nyc_for_hire_vehicles")

def download(ctx):
  # use the fhv dataset to inform an http request
  vins = ["%s,%s" % (entry['vin'], entry['model_year']) for entry in fhv.body()]

  res = http.post("https://vpic.nhtsa.dot.gov/api/vehicles/DecodeVINValuesBatch/", form_body={
    'format': 'JSON', 
    'DATA': ";".join(vins)
  })

  return res.json()

def transform(ds, ctx):
  ds.set_body(ctx.download)

Users who were previously using qri.load_dataset_body will need to update their scripts to use the new syntax. The easiest way to do that is by adding a new version to your dataset history with the updated script:

$ qri get transform.script me/dataset > transform.star
# make updates to transform.star file & save
$ qri save --file transform.star me/dataset

Three easy steps, and your dataset log tells the story of the upgrade.

Full notes in the changelog