Sample Sheet Versions

Parse Illumina sample sheets with Python

v0.6.0

6 years ago

You can now do:

content = SampleSheet(filename).to_json()

And on the CLI:

❯ sample-sheet to_json paired-end-single-index.csv | jq
{
  "Header": {
    "IEM1FileVersion": "4",
    "Investigator Name": "jdoe",
    "Experiment Name": "exp001",
    "Date": "11/16/2017",
    "Workflow": "SureSelectXT",
    "Application": "NextSeq FASTQ Only",
    "Assay": "SureSelectXT",
    "Description": "A description of this flow cell",
    "Chemistry": "Default"
  },
  "Reads": [
    151,
    151
  ],
  "Settings": {
    "CreateFastqForIndexReads": "1",
    "BarcodeMismatches": "2"
  },
  "Data": [
    {
      "Sample_Project": "exp001",
      "Description": "0.5x treatment",
      "Reference_Name": "mm10",
      "Sample_Name": "1823A-tissue",
      "index": "GAATCTGA",
      "Library_ID": "2017-01-20",
      "Read_Structure": "151T8B151T",
      "Sample_ID": "1823A",
      "Target_Set": "Intervals-001"
    },
    ...
  ]
}
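Because the output is plain JSON, it can be consumed with the standard library alone. The following is a minimal sketch, assuming a payload shaped like the one above; the literal is a trimmed, hand-copied version of that output rather than the result of a real run.

```python
import json

# Trimmed copy of the JSON shown above. In practice this string would
# come from SampleSheet(filename).to_json() or from the CLI.
content = """
{
  "Header": {"Experiment Name": "exp001", "Date": "11/16/2017"},
  "Reads": [151, 151],
  "Settings": {"BarcodeMismatches": "2"},
  "Data": [
    {"Sample_ID": "1823A", "Sample_Name": "1823A-tissue", "index": "GAATCTGA"}
  ]
}
"""

sheet = json.loads(content)

# Section names are top-level keys; "Data" is a list of per-sample objects.
sample_ids = [sample["Sample_ID"] for sample in sheet["Data"]]

# "Reads" is a list of integers, so it can be summed directly.
template_cycles = sum(sheet["Reads"])

print(sample_ids)
print(template_cycles)
```

Note that values in "Header" and "Settings" are strings in the output above, so a setting such as BarcodeMismatches needs an explicit int() conversion before arithmetic.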

v0.5.0

6 years ago

We now adhere to the entire Illumina specification for sample sheets and support many short-read analysis platform sample sheet variants, including NextSeq, TruSeq, and NovaSeq.

Example of adding a user-defined section thanks to help from @slagelwa:

from sample_sheet import SampleSheet

sample_sheet = SampleSheet()
sample_sheet.add_section('Manifests')

# Add a key value pair!
sample_sheet.Manifests.Key1 = "value1"

The .write() method writes user-defined sections, in the order they were defined, between the [Reads] and [Settings] sections.
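The placement rule can be pictured with a small standalone sketch. It does not call the library: section_write_order is a hypothetical helper, and it assumes the built-in sections keep the conventional Header, Reads, Settings, Data order.

```python
def section_write_order(user_sections):
    """Return section names in the order described above.

    Assumption: built-in sections are written as Header, Reads,
    Settings, Data, with user-defined sections slotted in between
    Reads and Settings in the order they were defined.
    """
    return ["Header", "Reads", *user_sections, "Settings", "Data"]


order = section_write_order(["Manifests"])
print(" ".join(f"[{name}]" for name in order))
```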

v0.4.0

6 years ago

The validation criteria for sample collisions within a sample sheet have been adjusted: the same Sample_ID, Library_ID, index, and index2 may now appear more than once in a sample sheet, as long as each occurrence is in a different Lane.

As requested in #33 by @reisingerf.
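One way to picture the adjusted rule is a check that treats Lane as part of a sample's identity. This is a sketch of the idea only, not the library's validation code; find_collisions and KEY_FIELDS are hypothetical names, and samples are modelled as plain dicts.

```python
from collections import defaultdict

# Fields that, together with Lane, identify a sample (per the note above).
KEY_FIELDS = ("Sample_ID", "Library_ID", "index", "index2")


def find_collisions(samples):
    """Return groups of samples that share the key fields and the same Lane."""
    seen = defaultdict(list)
    for sample in samples:
        key = tuple(sample.get(field) for field in KEY_FIELDS)
        seen[(key, sample.get("Lane"))].append(sample)
    return [group for group in seen.values() if len(group) > 1]


base = {
    "Sample_ID": "1823A",
    "Library_ID": "2017-01-20",
    "index": "GAATCTGA",
    "index2": None,
}
different_lanes = [{**base, "Lane": "1"}, {**base, "Lane": "2"}]
same_lane = [{**base, "Lane": "1"}, {**base, "Lane": "1"}]

print(len(find_collisions(different_lanes)))
print(len(find_collisions(same_lane)))
```

The first list passes because the two otherwise identical samples sit in different lanes; the second is flagged because they share a lane.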

v0.3.0

6 years ago

Round-trip reading, modifying, and writing!

Sample sheets are always written in a deterministic manner, which helps when hashing files to detect changes.

You can perform a round-trip read, modify, and write (example below), or create a sample sheet de novo by instantiating the SampleSheet and Sample classes and modifying them directly, as shown in the README.

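from sample_sheet import SampleSheet

# Read the sample sheet from a URL, then write it back out to a local file.
infile = 'https://raw.githubusercontent.com/clintval/sample-sheet/master/tests/resources/paired-end-single-index.csv'
sample_sheet = SampleSheet(infile)

with open('test.csv', 'w') as handle:
    sample_sheet.write(handle)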

❯ head <( curl -s https://raw.githubusercontent.com/clintval/sample-sheet/master/tests/resources/paired-end-single-index.csv )
[Header],,,,,,,,
IEM1FileVersion,4,,,,,,,
Investigator Name,jdoe,,,,,,,
Experiment Name,exp001,,,,,,,
Date,11/16/2017,,,,,,,
Workflow,SureSelectXT,,,,,,,
Application,NextSeq FASTQ Only,,,,,,,
Assay,SureSelectXT,,,,,,,
Description,A description of this flow cell,,,,,,,
Chemistry,Default,,,,,,,

❯ head test.csv
[Header],,,,,,,,
IEM1FileVersion,4,,,,,,,
Investigator Name,jdoe,,,,,,,
Experiment Name,exp001,,,,,,,
Date,11/16/2017,,,,,,,
Workflow,SureSelectXT,,,,,,,
Application,NextSeq FASTQ Only,,,,,,,
Assay,SureSelectXT,,,,,,,
Description,A description of this flow cell,,,,,,,
Chemistry,Default,,,,,,,
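Deterministic output is what makes a content hash meaningful: an unchanged sheet always produces the same digest, and any edit produces a different one. A minimal sketch using only the standard library follows; content_digest is a hypothetical helper, and the strings are short stand-ins for real file contents.

```python
import hashlib


def content_digest(text):
    """Return the SHA-256 hex digest of sample sheet text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Stand-ins for the contents of the input file and the rewritten file.
original = "[Header],,\nIEM1FileVersion,4,\nInvestigator Name,jdoe,\n"
rewritten = "[Header],,\nIEM1FileVersion,4,\nInvestigator Name,jdoe,\n"

# A stand-in for a sheet whose investigator was edited before writing.
modified = original.replace("jdoe", "asmith")

unchanged = content_digest(original) == content_digest(rewritten)
changed = content_digest(original) != content_digest(modified)

print(unchanged)
print(changed)
```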

Test Coverage CI

Code test coverage is now calculated on all branches and PRs.

The goal for this project is to sustain at least 95% coverage, with a target of 100%.

https://codecov.io/gh/clintval/sample-sheet

v0.2.0

6 years ago

v0.1.0

6 years ago

100% test coverage and CI integration:

❯ ./sample-sheet/run-tests
Name                            Stmts   Miss  Cover
---------------------------------------------------
sample_sheet/__init__.py            1      0   100%
sample_sheet/_sample_sheet.py     280      0   100%
---------------------------------------------------
TOTAL                             281      0   100%

OK!  58 tests, 0 failures, 0 errors in 0.0s

Print ASCII or HTML (IPython auto-detected) summaries of samples:

>>> sample_sheet.experimental_design
"""
| sample_id   | sample_name   | library_id   | description      |
|:------------|:--------------|:-------------|:-----------------|
| 1823A       | 1823A-tissue  | 2017-01-20   | 0.5x treatment   |
| 1823B       | 1823B-tissue  | 2017-01-20   | 0.5x treatment   |
"""

Get rich unicode CLI summaries of sample sheets:

❯ sample-sheet-summary paired-end-single-index.csv
┌Header─────────────┬─────────────────────────────────┐
│ iem1_file_version │ 4                               │
│ investigator_name │ jdoe                            │
│ experiment_name   │ exp001                          │
│ date              │ 11/16/2017                      │
│ workflow          │ SureSelectXT                    │
│ application       │ NextSeq FASTQ Only              │
│ assay             │ SureSelectXT                    │
│ description       │ A description of this flow cell │
│ chemistry         │ Default                         │
└───────────────────┴─────────────────────────────────┘
...

v0.0.2

6 years ago

README is now in reStructuredText format; PyPI still garbles it.
extras_require now works with:

$ pip install '.[test]'

v0.0.1

6 years ago

Features:

Supports [Header], [Settings], [Reads], and [Data] sections of Illumina sample sheets
Uses smart_open to open a file on S3, HDFS, WebHDFS, HTTP as well as local (compressed or not)
If a Read_Structure column can be inferred, then the structure is promoted to class ReadStructure.
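To illustrate what a Read_Structure value encodes, the sketch below splits a string such as 151T8B151T (the value used in the example sheet) into length and operator pairs. It is a standalone approximation, not the library's ReadStructure class; parse_read_structure is a hypothetical helper, and the reading of T as template bases and B as sample barcode bases is an assumption based on common read-structure notation.

```python
import re

# One token is a run of digits followed by a single operator letter.
TOKEN = re.compile(r"(\d+)([A-Z])")


def parse_read_structure(structure):
    """Split a read structure such as '151T8B151T' into (length, operator) pairs."""
    tokens = TOKEN.findall(structure)
    # Reject empty input and any string the token pattern did not fully consume.
    if not tokens or "".join(length + op for length, op in tokens) != structure:
        raise ValueError(f"Not a valid read structure: {structure!r}")
    return [(int(length), op) for length, op in tokens]


tokens = parse_read_structure("151T8B151T")
total_cycles = sum(length for length, _ in tokens)

print(tokens)
print(total_cycles)
```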

Known bugs:

Script to output sample-sheet in terminal doesn't quite work yet.
experimental_design() is irreparably broken.
The interfaces for the barcode and library parameter generating methods were left behind.