Awesome Comma-Separated Values (CSV) - What's Next? - Frequently Asked Questions (F.A.Q.s) - Libraries & Tools
A collection about the comma-separated values (CSV) world for rich structured data in (plain) text
Formats, Formats, Formats
CSV RFC 4180 "Strict" • CSV v1.0 "The Right Way" • CSV <3 Numerics • CSV <3 JSON • CSV <3 YAML • CSV v1.1 "Modern"
ID,Name,Code,Area,Pop
ca,Canada,CAN,9984670,34278406
us,United States,USA,9629091,314167157
mx,México [Mexico],MEX,1972550,112322757
...
People and (simplistic) CSV parser writers (and fanatics) use it to claim that it is the ultimate (and only) CSV format and use it to end all discussions if the code breaks when you add a space before a quote or mixed quotes and so on. It's way too simplistic (no spaces, no comments, no blank lines, no semicolon for separator, no modern two-byte characters, and so on).
Next time someone bring ups:
Have you read the [strict] RFC 4180 [CSV format memo]? The quoting rules are in there.
Why not ask back: Have you read it? :-) Let's start at the beginning (together):
This memo provides information for the internet community. It does not specify an internet standard of any kind. It does not specify an internet standard of any kind. It does not specify an internet standard of any kind.
A: No. No. No. In the "simplistic" CSV RFC 4180 "Strict" format you CANNOT use blank lines. Why? Blank lines are "ambiguous". Might be a blank record or a blank line.
Yes. Yes. Yes. See CSV v1.0 or CSV v1.1 for "modern" practical common sense versions.
A: No. No. No. In the "simplistic" CSV RFC 4180 "Strict" format you CANNOT use comments. Why? The original CSV format was intended just for machine reading and not for human mere mortals.
Yes. Yes. Yes. See CSV v1.0 or CSV v1.1 for "modern" human versions.
48°51'24"N
?A: No. No. No.
In the "simplistic" CSV RFC 4180 "Strict" format you CANNOT use "literal"
double quotes ("
) e.g. 48°51'24"N
- you MUST double quote
the geo coordinates and double up the double quote (""
)
e.g. "48°51'24""N"
. Example:
New York City,"40°42'46""N","74°00'21""W"
Paris,"48°51'24""N","2°21'03""E"
Yes. Yes. Yes. See CSV v1.0 or CSV v1.1 for "modern" human versions. Example:
New York City, 40°42'46"N, 74°00'21"W
Paris, 48°51'24"N, 2°21'03"E
\"
for ""
)?A: No. No. No.
In the "simplistic" CSV RFC 4180 "Strict" format you MUST double up
the double quote (""
) inside double quotes. Period. Example:
1,"Hamlet says, ""Seems,"" madam! Nay it is; I know not ""seems."""
Yes. Yes. Yes. See CSV v1.0 or CSV v1.1 for "modern" human versions. Use as you like it. Example:
1, "Hamlet says, \"Seems,\" madam! Nay it is; I know not \"seems.\""
'...'
or double quotes "..."
)?A: No. No. No.
In the "simplistic" CSV RFC 4180 "Strict" format you MUST always use
double quotes (""
) and double up the double quote inside double quotes. Period.
Yes. Yes. Yes. See CSV v1.0 or CSV v1.1 for "modern" human versions. Use as you like it. Example:
1, "Hamlet says, 'Seems,' madam! Nay it is; I know not 'seems.'"
2, 'Hamlet says, "Seems," madam! Nay it is; I know not "seems."'
Q: What's CSV the right way? What best practices can I use?
Use best practices out-of-the-box with zero-configuration.
Do you know how to skip blank lines or how to add #
single-line comments?
Or how to trim leading and trailing spaces? No worries. It's turned on by default.
Yes, you can. Use
#######
# try with some comments
# and blank lines even before header (first row)
Brewery,City,Name,Abv
Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
Augustiner Bräu München,München,Edelstoff,5.6%
Bayerische Staatsbrauerei Weihenstephan, Freising, Hefe Weissbier, 5.4%
Brauerei Spezial, Bamberg, Rauchbier Märzen, 5.1%
Hacker-Pschorr Bräu, München, Münchner Dunkel, 5.0%
Staatliches Hofbräuhaus München, München, Hofbräu Oktoberfestbier, 6.3%
instead of strict "classic" (no blank lines, no comments, no leading and trailing spaces, etc.):
Brewery,City,Name,Abv
Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
Augustiner Bräu München,München,Edelstoff,5.6%
Bayerische Staatsbrauerei Weihenstephan,Freising,Hefe Weissbier,5.4%
Brauerei Spezial,Bamberg,Rauchbier Märzen,5.1%
Hacker-Pschorr Bräu,München,Münchner Dunkel,5.0%
Staatliches Hofbräuhaus München,München,Hofbräu Oktoberfestbier,6.3%
Or use the ARFF (attribute-relation file format)-like alternative style
with %
for comments and @
-directives
for "meta data" in the header (before any records):
%%%%%%%%%%%%%%%%%%
% try with some comments
% and blank lines even before @-directives in header
@RELATION Beer
@ATTRIBUTE Brewery
@ATTRIBUTE City
@ATTRIBUTE Name
@ATTRIBUTE Abv
@DATA
Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
Augustiner Bräu München,München,Edelstoff,5.6%
Bayerische Staatsbrauerei Weihenstephan, Freising, Hefe Weissbier, 5.4%
Brauerei Spezial, Bamberg, Rauchbier Märzen, 5.1%
Hacker-Pschorr Bräu, München, Münchner Dunkel, 5.0%
Staatliches Hofbräuhaus München, München, Hofbräu Oktoberfestbier, 6.3%
Or use the ARFF (attribute-relation file format)-like alternative style with @
-directives
inside comments (for easier backwards compatibility with old readers)
for "meta data" in the header (before any records):
##########################
# try with some comments
# and blank lines even before @-directives in header
#
# @RELATION Beer
#
# @ATTRIBUTE Brewery
# @ATTRIBUTE City
# @ATTRIBUTE Name
# @ATTRIBUTE Abv
Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
Augustiner Bräu München,München,Edelstoff,5.6%
Bayerische Staatsbrauerei Weihenstephan, Freising, Hefe Weissbier, 5.4%
Brauerei Spezial, Bamberg, Rauchbier Märzen, 5.1%
Hacker-Pschorr Bräu, München, Münchner Dunkel, 5.0%
Staatliches Hofbräuhaus München, München, Hofbräu Oktoberfestbier, 6.3%
Or use the CSV meta data in CSV style:
##########################
# try with some comments
# and blank lines even header
Col, Name
1, Brewery
2, City
3, Name
4, Abv
Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
Augustiner Bräu München,München,Edelstoff,5.6%
Bayerische Staatsbrauerei Weihenstephan, Freising, Hefe Weissbier, 5.4%
Brauerei Spezial, Bamberg, Rauchbier Märzen, 5.1%
Hacker-Pschorr Bräu, München, Münchner Dunkel, 5.0%
Staatliches Hofbräuhaus München, München, Hofbräu Oktoberfestbier, 6.3%
CSV ❤ Numerics Format - Comma-Separated Values (CSV) Line-by-Line Records with Auto-Converted Numerics (Float Numbers) Encoding Rules - A Modern (Simple) Tabular Data Format incl. Numbers, Comments and More
CSV ❤ JSON Format - Comma-Separated Values (CSV) Line-by-Line Records with JSON Encoding Rules - A Modern (Simple) Tabular Data Format incl. Arrays, Numbers, Booleans, Nulls, Nested Structures, Comments and More
Examples:
# "Vanilla" CSV <3 JSON
1,"John","12 Totem Rd. Aspen",true
2,"Bob",null,false
3,"Sue","Bigsby, 345 Carnival, WA 23009",false
or
# "Vanilla" CSV <3 JSON (Pretty Printed)
1, "John", "12 Totem Rd. Aspen", true
2, "Bob", null, false
3, "Sue", "Bigsby, 345 Carnival, WA 23009", false
CSV ❤ YAML Format - Comma-Separated Values (CSV) Line-by-Line Records with YAML Encoding Rules - A Modern (Simple) Tabular Data Format incl. Arrays, Numbers, Booleans, Nulls, Nested Structures, Comments and More
Examples:
# "Vanilla" CSV <3 YAML
1,John,12 Totem Rd. Aspen,true
2,Bob,null,false
3,Sue,"Bigsby, 345 Carnival, WA 23009",false
or
# "Vanilla" CSV <3 YAML (Pretty Printed)
1, John, 12 Totem Rd. Aspen, true
2, Bob, null, false
3, Sue, "Bigsby, 345 Carnival, WA 23009", false
#####################
# North America
# area (in sq km), pop(ulation)
ca, Canada, CAN, 9 984 670 km², 34 278 406
us, United States, USA, 9 629 091 km², 314 167 157
mx, México [Mexico], MEX, 1 972 550 km², 112 322 757
...
A: Yes, of course. A blank line is just a blank line. Use freely to format (beautify) your data.
A: Yes, of course. Use #
for comments. See the example above.
Why use commas, commas, commas?
Space • Tab • Field Separator (FS) • Other
Did you know? In the English (or German) language the most popular word delimiter / separator is - surprise, surprise - space. You're looking at spaces in action right now and right here ;-)
Why not use spaces?
United States
one value or two? See?"United States"
.By using commas you do NOT need to quote spaces in values, that is, use
us, United States, USA
instead of us "United States" USA
.
In theory the tab (\t
) separator is perfect. Values never use tabs, don't they? So why hasn't the tab separator taken off?
In practice tab separators are invisible or look like spaces and often you cannot tell if a space is a tab or not.
Thus, tab works great only and only (like space) if your values do NOT use spaces and you treat a tab like a space.
Again in theory the untypeable (unprintable) field separator (ASCII Code 31) is perfect. Values never use ASCII field separators.
In practice if the field separator is invisible and unprintable how do you type it on your keyboard?
Remember: The point of comma-separated values (CSV) is an easy-to-write and easy-to-read format for humans first (not for machines).
|
);
):
)·
)Wikipedia
More
Frictionless Data • CSV on the Web • CSV 1.1 / CSV Next
lightweight standards and tooling to make it effortless to get, share, and validate data
by Open Knowledge Foundation (OKFN)
web: frictionlessdata.io, github: frictionlessdata
by World Wide Web Consortium (W3C)
github: w3c/csvw
CSV Reader Library - (Source) - modern alternative to the broken ruby csv standard library
Honey Format Library / Tool - (Source), (Doc) by Jacob Burenstam -- Makes working with CSVs as smooth as honey. Proper objects for CSV headers and rows, convert column values, filter columns and rows, small(-ish) perfomance overhead, no dependencies other than Ruby standard library.
csv_string = <<~CSV
email,name,born,country
[email protected],John,2000-03-03,SE
[email protected],Jane,1970-03-03,SE
[email protected],Chris,1980-03-03,DK
CSV
# Print all rows where born is before 1990 and country code is 'SE'
csv = HoneyFormat::CSV.new(csv_string, type_map: { born: :date })
csv_string = csv.to_csv(columns: %i[born country]) do |row|
row.country == 'SE' && row.born < Date.new(1990, 1, 1)
end
puts csv_string
A conference for data makers everywhere
web: csvconf.com, github: csvconf, twitter: CSVConference
A curated list of awesome lists.
License
The awesome list is dedicated to the public domain. Use it as you please with no restrictions whatsoever.
Questions? Comments?
Post them to the wwwmake forum. Thanks!