Tikaondotnet Versions

Use the Java Tika text extraction library on the .NET platform

v1.17.1

6 years ago

  • Add new overloads to TextExtractor.Extract that let users provide their own extraction result assemblers. Example:
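using System.Collections.Generic;
using System.Linq;
using FluentAssertions;
using NUnit.Framework;
using org.apache.tika.metadata;
using TikaOnDotNet.TextExtraction;

// Custom result type that the assembler below fills in.
public class CustomResult
{
    public string Text { get; set; }
    public IDictionary<string, string[]> Metadata { get; set; }
}

// Assembler passed to Extract: receives the extracted text and the Tika metadata.
public static CustomResult CreateCustomResult(string text, Metadata metadata)
{
    // Map each metadata name to all of its values (a field such as the author can have several).
    var metaDataDictionary = metadata.names().ToDictionary(name => name, metadata.getValues);

    return new CustomResult
    {
        Metadata = metaDataDictionary,
        Text = text,
    };
}

[Test]
public void should_extract_author_list_from_pdf()
{
    var textExtractionResult = new TextExtractor().Extract("file_with_authors.pdf", CreateCustomResult);

    textExtractionResult.Metadata["meta:author"].Should().ContainInOrder("Fred Jones, M. D.", "Donald Evans D. M.");
}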

v1.17

6 years ago

v1.16.0

6 years ago
  • Tika updated to 1.16. Please see the official Tika site for what's changed.

v1.15.0

6 years ago
  • Tika updated to 1.15. Please see the official Tika site for what's changed.

v1.14.2

7 years ago
  • Fix TextExtractor.Extract(string url). Closes #84.
  • Fix TextExtractor NuGet dependency on TikaOnDotNet. Should be future-proof now. Closes #86.

v1.14.1

7 years ago
  • Fix IKVM NuGet dependency.
  • Added StreamTextExtractor to support streams directly without in-memory buffering. Existing TextExtractor now uses this under the hood.

v1.14

7 years ago
  • Tika updated to 1.14. Please see the official Tika site for what's changed.
  • Please note that TikaOnDotnet assemblies are now signed. Thank you @Sicos1977 for the PR.

v1.13.1

7 years ago

v1.13.0

7 years ago

Updated to the latest Tika (1.13). See the Tika 1.13 release notes for what's changed.

v1.12.2

8 years ago