Swift Url Versions

A new URL type for Swift

0.4.1

1 year ago

TSan Workaround

This release includes a workaround for a bug in TSan. PR #168 Issue #166 (thanks to @shadowfacts for the report)

TSan's internal bookkeeping seems to be corrupted if you pass around an empty struct as an inout parameter. This pattern is sometimes used by generic algorithms (for example, the standard library's SystemRandomNumberGenerator is an empty struct), and is used internally by WebURL. There is no actual data race, but the corruption of TSan's bookkeeping data can lead to spurious reports of data races or even null-pointer dereferences within the TSan runtime.

To work around this, we add an unused field to these empty structs in debug builds.
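As a rough illustration of the idea (the type and field names below are hypothetical and are not WebURL's internals), an empty struct occupies zero bytes, and a dummy stored property that is only compiled into debug builds gives it a non-zero size. The sketch assumes the DEBUG compilation condition is defined for debug builds, as it typically is under SwiftPM and Xcode.

```swift
// Hypothetical stand-in for an empty struct that generic code passes around as inout.
struct EmptyCallback {
  #if DEBUG
  // Unused field: it exists only so the struct is not zero-sized in debug builds.
  var _tsanWorkaround: UInt8 = 0
  #endif
}

// For comparison: a struct that is always empty, and one that is always padded.
struct AlwaysEmpty {}
struct AlwaysPadded { var _unused: UInt8 = 0 }

// Generic code receiving the value as an inout parameter, which is the pattern described above.
func visit<Callback>(_ callback: inout Callback) {}

var callback = EmptyCallback()
visit(&callback)

print(MemoryLayout<AlwaysEmpty>.size)   // 0
print(MemoryLayout<AlwaysPadded>.size)  // 1
```

Because the field is conditionally compiled, release builds keep the zero-sized layout.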

Related bug reports: https://github.com/apple/swift/issues/61073 https://github.com/apple/swift/issues/61244 and https://github.com/apple/swift/issues/56405

Improvements to Testing

Additionally, some tests for the Foundation extensions have been refactored, and the "Swifter" HTTP server dependency that was used by some tests has been dropped.

0.4.0

1 year ago

This is a big one!

  • WebURL now supports Internationalised Domain Names (IDNs).
  • The URL host parser is now exposed as API, so you can parse hostnames like URLs do.
  • There is a new Domain type, which supports rich processing of domains/IDNs.

IDN support was the missing piece. Now that it is done, we can say:

🎉 WebURL fully conforms to the WHATWG URL Standard 🎉

Now, let's briefly go over each of those points:

🌐 Internationalised Domain Names

WebURL now supports Internationalised Domain Names (IDNs):

import WebURL

WebURL("http://中国移动.中国")
// ✅ "http://xn--fiq02ib9d179b.xn--fiqs8s/"

WebURL("https://🛍.example.com/")
// ✅ "https://xn--878h.example.com/"

This may look strange if you are unfamiliar with IDNs. In order to be compatible with existing internet infrastructure, Unicode text in domains needs special compatibility processing, resulting in an encoded string with the distinctive "xn--" prefix. This processing is called IDNA. If somebody wants to register the domain "中国移动.中国", they instead register "xn--fiq02ib9d179b.xn--fiqs8s", and behind the scenes, everything works just like it always did with plain, non-Unicode domains -- importantly, we don't need internet routing infrastructure or applications to process hostnames differently to how they normally would. This encoded version is not very helpful to humans, but browsers and applications can detect these domains and present them in Unicode (we have APIs for that; more info below).
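To make the "xn--" form less mysterious, here is a minimal sketch of the Punycode encoding step (RFC 3492) which produces the part after the prefix. It is not WebURL's implementation, and it is only one piece of IDNA: real IDNA processing also maps, normalises and validates the text before encoding. The function names are made up for this example.

```swift
// Punycode parameters from RFC 3492.
let base = 36, tMin = 1, tMax = 26, skew = 38, damp = 700
let initialBias = 72, initialN = 128

// Bias adaptation function from RFC 3492.
func adapt(delta: Int, numPoints: Int, firstTime: Bool) -> Int {
  var delta = firstTime ? delta / damp : delta / 2
  delta += delta / numPoints
  var k = 0
  while delta > ((base - tMin) * tMax) / 2 {
    delta /= (base - tMin)
    k += base
  }
  return k + ((base - tMin + 1) * delta) / (delta + skew)
}

// Maps a digit value 0...35 to "a"..."z" followed by "0"..."9".
func digitCharacter(_ digit: Int) -> Character {
  let ascii = digit < 26 ? 97 + digit : 48 + (digit - 26)
  return Character(Unicode.Scalar(UInt8(ascii)))
}

// Encodes one label with Punycode. Does not add the "xn--" prefix.
func punycodeEncode(_ label: String) -> String {
  let codePoints = label.unicodeScalars.map { Int($0.value) }
  var output = ""
  var basicCount = 0
  // Basic (ASCII) code points are copied to the output first.
  for scalar in label.unicodeScalars where scalar.isASCII {
    output.unicodeScalars.append(scalar)
    basicCount += 1
  }
  if basicCount > 0 { output.append("-") }

  var n = initialN, delta = 0, bias = initialBias
  var handled = basicCount
  while handled < codePoints.count {
    // Smallest code point which has not been encoded yet.
    let m = codePoints.filter { $0 >= n }.min()!
    delta += (m - n) * (handled + 1)
    n = m
    for c in codePoints {
      if c < n { delta += 1 }
      if c == n {
        // Write delta as a variable-length integer.
        var q = delta
        var k = base
        while true {
          let t = k <= bias ? tMin : (k >= bias + tMax ? tMax : k - bias)
          if q < t { break }
          output.append(digitCharacter(t + (q - t) % (base - t)))
          q = (q - t) / (base - t)
          k += base
        }
        output.append(digitCharacter(q))
        bias = adapt(delta: delta, numPoints: handled + 1, firstTime: handled == basicCount)
        delta = 0
        handled += 1
      }
    }
    delta += 1
    n += 1
  }
  return output
}

// ASCII labels are left alone; others get the "xn--" prefix.
func asciiLabel(_ label: String) -> String {
  label.unicodeScalars.allSatisfy { $0.isASCII } ? label : "xn--" + punycodeEncode(label)
}

print(asciiLabel("caf\u{E9}"))         // xn--caf-dma
print(asciiLabel("\u{4E2D}\u{56FD}"))  // xn--fiqs8s
```

The second label is the "中国" part of the domain above, which is why the registered name ends in "xn--fiqs8s".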

For more information about IDNs see IDN World Report.

Browsers are making an increased effort this year to align their own IDNA implementations (Safari/WebKit already conforms), and it has been announced that Apple's next major operating system releases will include support in Foundation URL. WebURL now also implements this part of the URL Standard; it is available today, and it fully backwards-deploys. It's important that URLs work consistently for everybody, and WebURL can help with that.

What's more - since this processing happens in the URL type, it works with our existing Foundation interop:

import WebURL
import Foundation
import WebURLFoundationExtras

let (data, _) = try await URLSession.shared.data(for: WebURL("http://全国温泉ガイド.jp")!)
// ✅ Works

let convertedToURL = URL(WebURL("http://全国温泉ガイド.jp")!)!
// ... continue processing 'convertedToURL' as you normally would

Developers have been asking for better IDN support across the industry for years - at this stage of adoption, most IDNs are in China, so Chinese developers in particular have been wanting to work with these kinds of URLs. I'm especially pleased that WebURL is now able to offer it to any Swift application.

📖 Host Parsing API

IDN support as the standard requires is great and all, but it isn't enough.

URLs are designed to be universal - infinitely customisable. There are some "special" schemes which the standard knows about, such as http:, and while their hosts have semantic meaning (they are network addresses, hence we should use IDNA, detect IPv4 addresses, etc), generally, for other schemes, the host is just an opaque string and is not interpreted.

That's the correct model, but frequently we are processing URLs which are very HTTP-like, and we would like to support the same network addresses, in the same way, as an HTTP URL. For instance, suppose we were writing an application to handle ssh: URLs - the standard would only parse IPv6 addresses out for us, and everything else would just be an opaque string.

WebURL("ssh://karl@somehost/")!.host
// 😐 .opaque, "somehost"

WebURL("ssh://karl@abc.Ψ£Ω‡Ω„Ψ§.com/")!.host
// πŸ˜• .opaque, "abc.%D8%A3%D9%87%D9%84%D8%A7.com"

WebURL("ssh://karl@192.168.0.1/")!.host
// 🤨 .opaque, "192.168.0.1"
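The percent-encoded result for the Unicode hostname is simply its UTF-8 bytes with everything outside printable ASCII escaped; no IDNA processing takes place. A minimal sketch of that transformation follows. The function name is hypothetical, and the real parser additionally rejects certain forbidden code points, which this sketch ignores.

```swift
// Percent-encodes every byte outside the printable ASCII range (0x20...0x7E).
// Simplified: no validation of forbidden host code points is performed.
func percentEncodeOpaqueHost(_ host: String) -> String {
  let hexDigits = Array("0123456789ABCDEF".utf8)
  var result = [UInt8]()
  for byte in host.utf8 {
    if byte >= 0x20 && byte <= 0x7E {
      result.append(byte)
    } else {
      result.append(UInt8(ascii: "%"))
      result.append(hexDigits[Int(byte >> 4)])
      result.append(hexDigits[Int(byte & 0x0F)])
    }
  }
  return String(decoding: result, as: UTF8.self)
}

// "abc.أهلا.com", written with escapes so the bytes are unambiguous.
print(percentEncodeOpaqueHost("abc.\u{0623}\u{0647}\u{0644}\u{0627}.com"))
// abc.%D8%A3%D9%87%D9%84%D8%A7.com
```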

Request libraries generally need to write their own parsers to handle this, but it is difficult to match the host parser for HTTP URLs exactly... unless, of course, you are the URL host parser 🤔...

So with 0.4.0, WebURL's Host type exposes the URL host parser directly to your applications. Not only is this great for processing URLs of any scheme, it's also useful for hostnames provided via command-line interfaces or configuration files. Being able to guarantee the host is interpreted the same way as it would be in an http: URL is a very useful property, just by itself.

WebURL.Host("EXAMPLE.com", scheme: "http")
// 😍 .domain, Domain { "example.com" }

WebURL.Host("abc.Ψ£Ω‡Ω„Ψ§.com", scheme: "http")
// 🀩 .domain, Domain { "abc.xn--igbi0gl.com" }

WebURL.Host("192.168.0.1", scheme: "http")
// πŸ₯³ .ipv4Address, IPv4Address { 192.168.0.1 }

🦆 Domain API

Exposing the host parser is great and all, but it also isn't enough.

Previously, we only had types for IPv4 and IPv6 addresses, and domains were represented as Strings. Now, domains have their own type - WebURL.Domain, which is guaranteed to contain a validated, normalised domain from the URL host parser, and can be a useful place to house APIs which operate on domains.

WebURL.Domain("example.com")  // ✅ "example.com"
WebURL.Domain("localhost")    // ✅ "localhost"
WebURL.Domain("api.أهلا.com")  // ✅ "api.xn--igbi0gl.com"
WebURL.Domain("xn--caf-dma")  // ✅ "xn--caf-dma" ("café")

WebURL.Domain("in valid")     // ✅ nil (spaces are not allowed)
WebURL.Domain("xn--cafe-yvc") // ✅ nil (invalid IDN)
WebURL.Domain("192.168.0.1")  // ✅ nil (not a domain)

The most important API right now is render, which builds a result using an encapsulated algorithm. There is opportunity for renderers to produce any kind of result - for example, they might perform spoof-checking to guard against confusable text, or they might use a database to shorten domains to their most important section, or they might have special formatting for particular domains. You can create a renderer by conforming to the WebURL.Domain.Renderer protocol.

WebURL comes with an uncheckedUnicodeString renderer, so you can recover the Unicode form of a domain. This renderer does not perform any spoof-checking, so is not recommended for use in UI.

let domain = WebURL.Domain("xn--fiq02ib9d179b.xn--fiqs8s")!
domain.render(.uncheckedUnicodeString)
// ✅ "中国移动.中国"

And with that, I'm happy with WebURL's host story. It provides rich, detailed information about the hosts defined in the URL Standard and gives you the means to easily and robustly process them. Please try it out and leave feedback!

🎁 Bonus: Spoof-checked renderer prototype

It is important that applications use spoof checking when displaying domains in Unicode form. We have a proof-of-concept renderer which ports much of Chromium's IDN spoof-checking logic. It works on my Mac, but deploying it can be a pain because it depends on the ICU library for its implementation of UAX39.

// Non-IDNs.
WebURL.Domain("paypal.com")?.render(.checkedUnicodeString) // ✅ "paypal.com"
WebURL.Domain("apple.com")?.render(.checkedUnicodeString)  // ✅ "apple.com"

// IDNs.
WebURL.Domain("a.أهلا.com")?.render(.checkedUnicodeString)   // ✅ "a.أهلا.com"
WebURL.Domain("你好你好")?.render(.checkedUnicodeString)     // ✅ "你好你好"

// Spoofs.
WebURL.Domain("раγpal.com")?.render(.checkedUnicodeString) // ✅ "xn--pal-vxc83d5c.com"
WebURL.Domain("аpple.com")?.render(.checkedUnicodeString)  // ✅ "xn--pple-43d.com"
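To give a flavour of what a spoof check looks for, here is a deliberately tiny sketch which flags labels mixing Latin, Greek and Cyrillic letters. It is not the prototype's logic and is far weaker than UAX39 or Chromium's checks (for instance, it cannot catch a label written entirely in look-alike Cyrillic letters). The script ranges are coarse approximations and all names are hypothetical.

```swift
enum Script { case latin, greek, cyrillic, other }

// Very coarse script classification based on Unicode block ranges.
// Returns nil for digits and hyphens, which are common to all scripts.
func script(of scalar: Unicode.Scalar) -> Script? {
  switch scalar.value {
  case 0x30...0x39, 0x2D: return nil
  case 0x41...0x5A, 0x61...0x7A, 0xC0...0x24F: return .latin
  case 0x370...0x3FF: return .greek
  case 0x400...0x4FF: return .cyrillic
  default: return .other
  }
}

// A label is treated as suspicious if it contains letters from more than one script.
func mixesScripts(_ label: String) -> Bool {
  var seen = Set<Script>()
  for scalar in label.unicodeScalars {
    if let s = script(of: scalar) { seen.insert(s) }
  }
  return seen.count > 1
}

print(mixesScripts("paypal"))                        // false
print(mixesScripts("\u{0440}\u{0430}\u{03B3}pal"))   // true (Cyrillic + Greek + Latin)
print(mixesScripts("\u{0430}pple"))                  // true (Cyrillic "а" + Latin)
```

A renderer built on a check like this would fall back to the ASCII "xn--" form whenever the label looks suspicious, as in the spoof examples above.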

It would be great to turn this into a maintained, easily-deployable package. I'm too busy right now, so it remains a prototype, but maybe one day? Or if anybody else would like to get involved, they can use it as a starting point.

Bugfixes

  • Fixed a crash when appending an empty array of form params (#140). Thanks to @adam-fowler for the report. Sorry it took so long to get into a release.

0.3.1

2 years ago

What's Changed

🔗 Foundation Integration

0.3.0 brought Foundation-to-WebURL conversion, and this release adds conversion in the opposite direction (WebURL-to-Foundation). This is a particularly important feature for developers on Apple platforms, as it means you can now use WebURL to make requests using URLSession! We now have full, bidirectional interop with Foundation's URL, which is a huge milestone and a big step towards v1.0. 🥳

WebURLFoundationExtras now adds a number of extensions to types such as URLRequest and URLSession to make that super easy:

import Foundation
import WebURL
import WebURLFoundationExtras

// ℹ️ Make URLSession requests using WebURL.
func makeRequest(to url: WebURL) -> URLSessionDataTask {
  return URLSession.shared.dataTask(with: url) {
    data, response, error in
    // ...
  }
}

// ℹ️ Also supports Swift concurrency.
func processData(from url: WebURL) async throws {
  let (data, _) = try await URLSession.shared.data(from: url)
  // ...
}

// ℹ️ For libraries: move to WebURL without breaking
// compatibility with clients using Foundation's URL.
public func processURL(_ url: Foundation.URL) throws {
  guard let webURL = WebURL(url) else {
    throw InvalidURLError()
  }
  // Internal code uses WebURL...
}

When you make a request using WebURL, you will benefit from its modern, web-compatible parser, which matches modern browsers and libraries in other languages:

// Using WebURL: Sends a request to "example.com".
// Chrome, Safari, Firefox, Go, Python, NodeJS, Rust agree. ✅
print( try String(contentsOf: WebURL("http://foo@evil.com:80@example.com/")!) )

// Using Foundation.URL: Sends a request to "evil.com"! 😵
print( try String(contentsOf: URL(string: "http://foo@evil.com:80@example.com/")!) )
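The disagreement comes down to which "@" ends the credentials. Under the WHATWG URL Standard, only the last "@" in the authority does so, and any earlier "@" is treated as part of the credentials. The sketch below illustrates just that rule; the helper and the sample authority string are hypothetical, and this is not how WebURL's parser is structured.

```swift
// Splits an authority such as "user:pass@host:port" into credentials and host-and-port.
// Only the LAST "@" ends the credentials, which is the rule web browsers follow.
func splitAuthority(_ authority: String) -> (credentials: String?, hostAndPort: String) {
  guard let at = authority.lastIndex(of: "@") else {
    return (nil, authority)
  }
  let credentials = String(authority[..<at])
  let hostAndPort = String(authority[authority.index(after: at)...])
  return (credentials, hostAndPort)
}

let parts = splitAuthority("foo@evil.com:80@example.com")
print(parts.hostAndPort)          // example.com
print(parts.credentials ?? "")    // foo@evil.com:80
```

A parser that stops at the first "@" instead would read "evil.com" as the host and 80 as the port.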

Note that this only applies to the initial request; HTTP redirects continue to be processed by URLSession (it is not possible to override it universally), and so are not always web-compatible. As an alternative on non-Apple platforms, our fork of async-http-client uses WebURL for all of its internal URL processing, so it also provides web-compatible redirect handling.

For more information about why WebURL is a great choice even for applications and libraries using Foundation, and a discussion about how to safely work with multiple URL standards, we highly recommend reading: Using WebURL with Foundation.

URLSession extensions are only available on Apple platforms right now, due to a bug in swift-corelibs-foundation. I opened a PR to fix it, and once merged, we'll be able to make these extensions available to all platforms.

⚡️ Performance improvements

I say it every time, and it's true every time 😅. For this release, I noticed that, due to a quirk with how ManagedBuffer is implemented in the standard library, every access to the URL's header data required dynamic exclusivity enforcement. But that shouldn't be necessary - the URL storage uses COW to enforce non-local exclusivity, and local exclusivity can be enforced by the compiler if we wrap the ManagedBuffer in a struct with reference semantics. So that's what I did.
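For readers unfamiliar with the pattern, here is a minimal sketch of a ManagedBuffer wrapped in a struct with copy-on-write. It shows only the general shape of the technique; the header layout and all names are hypothetical, and it does not reproduce WebURL's storage types or demonstrate the removal of the exclusivity checks.

```swift
// Hypothetical header; WebURL's real header layout is not described in these notes.
struct Header {
  var count: Int
  var schemeLength: Int
}

struct URLStorage {
  private var buffer: ManagedBuffer<Header, UInt8>

  init(utf8 bytes: [UInt8], schemeLength: Int) {
    buffer = ManagedBuffer<Header, UInt8>.create(minimumCapacity: bytes.count) { _ in
      Header(count: bytes.count, schemeLength: schemeLength)
    }
    buffer.withUnsafeMutablePointerToElements {
      $0.initialize(from: bytes, count: bytes.count)
    }
  }

  // All header access goes through the struct, so callers never touch the class directly.
  var header: Header {
    get { buffer.header }
    set {
      makeUnique()
      buffer.header = newValue
    }
  }

  // Copy-on-write: clone the buffer if another value shares it.
  private mutating func makeUnique() {
    guard !isKnownUniquelyReferenced(&buffer) else { return }
    let old = buffer
    let count = old.header.count
    let new = ManagedBuffer<Header, UInt8>.create(minimumCapacity: count) { _ in old.header }
    old.withUnsafeMutablePointerToElements { source in
      new.withUnsafeMutablePointerToElements { $0.initialize(from: source, count: count) }
    }
    buffer = new
  }
}

let original = URLStorage(utf8: Array("http://example.com/".utf8), schemeLength: 4)
var copy = original
copy.header.schemeLength = 5
print(original.header.schemeLength, copy.header.schemeLength)  // 4 5
```

Mutating the copy leaves the original untouched, which is the non-local exclusivity guarantee that COW provides.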

The result is ~5% faster parsing and 10-20% better performance when getting/setting URL components. For collection views like pathComponents, these enforcement checks affect basically every operation and amount to a consistent overhead that we're now able to eliminate.

benchmark                                          column     results/0_3_0 results/0_3_1      %
------------------------------------------------------------------------------------------------
Constructor.HTTP.AverageURLs                       time            23909.00      22665.00   5.20
Constructor.HTTP.AverageURLs.filtered              time            37826.50      36066.00   4.65
Constructor.HTTP.IPv4                              time            12205.00      11627.00   4.74
Constructor.HTTP.IPv4.filtered                     time            19164.00      17819.00   7.02
Constructor.HTTP.IPv6                              time            13677.00      13086.00   4.32
Constructor.HTTP.IPv6.filtered                     time            17614.00      16577.00   5.89
...
ComponentSetters.Unique.Username                   time              418.00        365.00  12.68
ComponentSetters.Unique.Username.PercentEncoding   time              767.00        632.00  17.60
ComponentSetters.Unique.Username.Long              time              636.00        527.00  17.14
...
ComponentSetters.Unique.Path.Simple                time             2525.00       2247.00  11.01
...
PathComponents.Iteration.Small.Forwards            time              705.00        602.00  14.61
PathComponents.Iteration.Small.Reverse             time              718.00        619.00  13.79
PathComponents.Iteration.Long.Reverse              time             3137.00       2752.00  12.27
PathComponents.Append.Single                       time             1362.00       1242.00   8.81

🌍 Standard Update

This release also implements a recent change to the WHATWG URL Standard, which forbids C0 Control characters and U+007F delete from appearing in domains. https://github.com/whatwg/url/pull/685
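As a small sketch of what the change means in practice, the check below covers only the code points named here (C0 controls and U+007F). The standard's full list of forbidden code points is longer, and the function name is made up for this example.

```swift
// Checks only the code points named in this change:
// C0 controls (U+0000...U+001F) and U+007F DELETE.
func containsNewlyForbiddenDomainCodePoint(_ domain: String) -> Bool {
  domain.unicodeScalars.contains { $0.value <= 0x1F || $0.value == 0x7F }
}

print(containsNewlyForbiddenDomainCodePoint("example.com"))        // false
print(containsNewlyForbiddenDomainCodePoint("exa\u{7F}mple.com"))  // true
```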

Full Changelog: https://github.com/karwa/swift-url/compare/0.3.0...0.3.1

0.3.0

2 years ago

What's Changed

📚 DocC-based Documentation!

All of the documentation has been rewritten and reorganised to take advantage of the new DocC documentation engine. It's a really huge improvement, so do please check it out. And if you find anything which you think could be improved, don't hesitate to file an issue or even submit a PR 🙂

🔗 Foundation Integration

WebURL 0.3.0 includes the WebURLFoundationExtras module, which comes with a way to convert Foundation URL objects to WebURLs. That means your libraries can use WebURL for their internal processing, while continuing to support clients who provide data using Foundation's types.

The async-http-client port is an example of this. Even though it uses WebURL for its internal processing, it is still possible to create requests using Foundation.URL using an extension. This means it gets to benefit from modern, web-compatible URL parsing (for example, when resolving HTTP redirects), and WebURL's simpler, more efficient API, without breaking compatibility.

New with this release, the async-http-client port offers a build configuration which omits all Foundation dependencies. By doing so, we've measured binary size improvements of up to 16% on a statically-linked & stripped executable, while keeping the full functionality of AHC such as streaming, compression, and HTTP/2. We expect that size improvement could improve even further with Swift 5.6, as the standard library will no longer need to link all of ICU's Unicode data.

⚡️+📏 Performance and Code Size Improvements

WebURL keeps getting faster, and leaner, but not meaner 😇. Compared to 0.2.0, WebURL 0.3.0 offers some incredible performance enhancements. URL parsing time has been reduced by almost 1/3, our fantastic in-place component setters can be almost 40% faster, and common operations like iterating path components can now be performed in just half the time.

And that's not even the best part. All of these improvements come in a package which is 20% smaller!

Title                              Section             Old             New  Percent
WebURLBenchmark                     __text:        1715105         1376177   -19.8%

(Measured on an Intel MBP)

😔 System.framework Integration Disabled on iOS For Now

And with all that great news, there had to be one... less great thing. Unfortunately, the last few releases of Xcode have shipped with a broken version of Apple's System.framework for iOS, which broke the build on that platform. Strangely, it is only iOS - macOS, tvOS, and even watchOS all work fine. We've disabled that integration on iOS for now, but we'll keep an eye on things and re-enable it once the issue is fixed (FB9832953).

In the meantime, you can still use swift-system, the open-source distribution of System.framework, on all platforms, including iOS.

Full Changelog: https://github.com/karwa/swift-url/compare/0.2.0...0.3.0

0.2.0

2 years ago

What's Changed

In addition to the changes listed below, the guide has been entirely rewritten, and now does a better job of explaining the WebURL API and object model, and the benefits it can bring to your application/library. The goal is to help you become as comfortable using WebURL as you are using Foundation's URL. It took a lot of work, and I'd really recommend giving it a read. Even if you're not using WebURL yet, the chances are that you'll learn a thing or two about how Foundation's URL actually works.

URL standard

API

  • Support for creating file URLs from file paths, and file paths from file URLs.
  • Added WebURLSystemExtras module which integrates with both swift-system and Apple's System.framework.
  • LazilyPercentDecoded<Collection> is now bidirectional when its source collection is.
  • WebURL.cannotBeABase has been renamed to WebURL.hasOpaquePath, following an update in the standard. (https://github.com/whatwg/url/pull/655) (reported by us)
  • Percent-encoding and -decoding APIs have been reworked to take advantage of static member syntax (SE-0299). A source-compatible fallback is in place for pre-5.5 compilers. This is the reason for duplicate functions appearing in the docs, one with an EncodeSet._Member argument. We're looking at moving to Swift-DocC, which will hopefully fix this. The previous API has been back-ported with deprecation notices wherever possible, so the compiler should guide you when it comes to updating your applications.
  • It is now possible to percent-decode a string as an array of bytes using the .percentDecodedBytesArray() function. This is useful for dealing with binary data and non-UTF8 strings.
  • The .pathComponents view now assumes inserted data is not percent-encoded, which preserves values exactly if they happen to contain strings which coincidentally look like percent-encoding.
  • A .pathComponents[raw: Index] subscript has been added, which returns a path component exactly as it appears in the URL string, including its percent-encoding.
  • A .pathComponents.replaceSubrange(_:withPercentEncodedComponents:) function has been added for inserting pre-encoded path components.
  • Added Sendable conformance to WebURL, WebURL.Host, IP addresses, origins, and the various wrapper views.
  • The .serialized property has been combined with .serializedExcludingFragment into a single function: .serialized(excludingFragment: Bool = false).
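To show why decoding to bytes rather than to a String matters, here is a standalone sketch of percent-decoding into a byte array. It is not WebURL's implementation, and the function names are hypothetical.

```swift
// Converts an ASCII hex digit to its value, or nil if the byte is not a hex digit.
func hexValue(_ byte: UInt8) -> UInt8? {
  switch byte {
  case UInt8(ascii: "0")...UInt8(ascii: "9"): return byte - UInt8(ascii: "0")
  case UInt8(ascii: "A")...UInt8(ascii: "F"): return byte - UInt8(ascii: "A") + 10
  case UInt8(ascii: "a")...UInt8(ascii: "f"): return byte - UInt8(ascii: "a") + 10
  default: return nil
  }
}

// Decodes "%XX" sequences to raw bytes. Malformed sequences are passed through unchanged.
func percentDecodedBytes(_ string: String) -> [UInt8] {
  let input = Array(string.utf8)
  var output = [UInt8]()
  var i = 0
  while i < input.count {
    if input[i] == UInt8(ascii: "%"), i + 2 < input.count,
       let high = hexValue(input[i + 1]), let low = hexValue(input[i + 2]) {
      output.append((high << 4) | low)
      i += 3
    } else {
      output.append(input[i])
      i += 1
    }
  }
  return output
}

let bytes = percentDecodedBytes("%FF%00abc")
print(bytes)  // [255, 0, 97, 98, 99]

// Decoding the same bytes as a String loses the invalid UTF-8 byte 0xFF,
// because it is repaired to U+FFFD REPLACEMENT CHARACTER.
let asString = String(decoding: bytes, as: UTF8.self)
print(asString.unicodeScalars.first == "\u{FFFD}")  // true
```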

Implementation

  • Better performance, especially for component setters
  • Component setters are now benchmarked
  • Support for fuzzing the parser
  • Added UnsafeBoundsCheckedBufferPointer which allows us to keep bounds-checking without sacrificing performance
  • Simplified internal storage types, reducing code size
  • Better percent-encoding performance

Full Changelog: https://github.com/karwa/swift-url/compare/0.1.0...0.2.0

What's coming next

The major goal for 0.3.0 is compatibility with Foundation's URL. At the very least, that is going to include support for creating a URL from a WebURL and vice versa, but we may need additional APIs for a truly great developer experience.

0.1.0

3 years ago

This is the initial release of WebURL! 🎉

Take a look at the Getting Started guide in the repo, and the full documentation available here!