EPUB 3 support improvements #21

Merged
merged 8 commits into from
Apr 8, 2019
53 changes: 37 additions & 16 deletions README.md
@@ -1,10 +1,16 @@
# EpubReader
.NET library for reading EPUB files.

Supports .NET Framework >= 4.5, .NET Core >= 1.0, and .NET Standard >= 1.3.
Supports .NET Framework >= 4.6, .NET Core >= 1.0, and .NET Standard >= 1.3.

Supports EPUB 2 (2.0, 2.0.1) and EPUB 3 (3.0, 3.0.1, 3.1).

[Download](#download-latest-stable-release) | [WPF & .NET Core demo apps](#demo-apps)

## Migration from 2.x

[How to migrate from 2.x to 3.x](https://github.com/vers-one/EpubReader/wiki/Migrating-from-2.x-to-3.x)

## Example
```csharp
// Opens a book and reads all of its content into memory
@@ -32,19 +38,25 @@ if (coverImageContent != null)
}
}

// CHAPTERS
// TABLE OF CONTENTS

// Enumerating chapters
foreach (EpubChapter chapter in epubBook.Chapters)
foreach (EpubNavigationItem chapter in epubBook.Navigation)
{
// Title of chapter
string chapterTitle = chapter.Title;

// HTML content of current chapter
string chapterHtmlContent = chapter.HtmlContent;

// Nested chapters
List<EpubChapter> subChapters = chapter.SubChapters;
List<EpubNavigationItem> subChapters = chapter.NestedItems;
}

// READING ORDER

// Enumerating the whole text content of the book in the order of reading
foreach (EpubTextContentFile textContentFile in epubBook.ReadingOrder)
{
// HTML of current text content file
string htmlContent = textContentFile.Content;
}


@@ -116,32 +128,41 @@ foreach (EpubMetadataContributor contributor in package.Metadata.Contributors)
string contributorRole = contributor.Role;
}

// EPUB NCX data
EpubNavigation navigation = epubBook.Schema.Navigation;
// EPUB 2 NCX data
Epub2Ncx epub2Ncx = epubBook.Schema.Epub2Ncx;

// Enumerating NCX metadata
foreach (EpubNavigationHeadMeta meta in navigation.Head)
// Enumerating EPUB 2 NCX metadata
foreach (Epub2NcxHeadMeta meta in epub2Ncx.Head)
{
string metadataItemName = meta.Name;
string metadataItemContent = meta.Content;
}

// EPUB 3 navigation
Epub3NavDocument epub3NavDocument = epubBook.Schema.Epub3NavDocument;

// Accessing structural semantics data of the head item
StructuralSemanticsProperty? ssp = epub3NavDocument.Navs.First().Type;
```

## More examples
[How to extract plain text from all chapters.](https://github.com/vers-one/EpubReader/tree/master/Source/VersOne.Epub.NetCoreDemo/ExtractPlainText.cs)

1. [How to extract the plain text of the whole book.](https://github.com/vers-one/EpubReader/tree/master/Source/VersOne.Epub.NetCoreDemo/ExtractPlainText.cs)
2. [How to extract the table of contents.](https://github.com/vers-one/EpubReader/tree/master/Source/VersOne.Epub.NetCoreDemo/PrintNavigation.cs)
3. [How to iterate over all EPUB files in a directory and collect some statistics.](https://github.com/vers-one/EpubReader/tree/master/Source/VersOne.Epub.NetCoreDemo/TestDirectory.cs)

## Download latest stable release
[Via NuGet package from nuget.org](https://www.nuget.org/packages/VersOne.Epub)

DLL file from GitHub: [for .NET Framework](https://github.com/vers-one/EpubReader/releases/download/v2.0.5/VersOne.Epub.Net45.zip) (26.9 KB) / [for .NET Core](https://github.com/vers-one/EpubReader/releases/download/v2.0.5/VersOne.Epub.NetCore.zip) (27.0 KB) / [for .NET Standard](https://github.com/vers-one/EpubReader/releases/download/v2.0.5/VersOne.Epub.NetStandard.zip) (27.0 KB)
DLL file from GitHub: [for .NET Framework](https://github.com/vers-one/EpubReader/releases/download/v3.0.0/VersOne.Epub.Net46.zip) (38.3 KB) / [for .NET Core](https://github.com/vers-one/EpubReader/releases/download/v3.0.0/VersOne.Epub.NetCore.zip) (38.4 KB) / [for .NET Standard](https://github.com/vers-one/EpubReader/releases/download/v3.0.0/VersOne.Epub.NetStandard.zip) (38.4 KB)

## Demo apps
[Download WPF demo app ](https://github.com/vers-one/EpubReader/releases/download/v2.0.5/WpfDemo.zip) (WpfDemo.zip, 409 KB)
[Download WPF demo app](https://github.com/vers-one/EpubReader/releases/download/v3.0.0/WpfDemo.zip) (WpfDemo.zip, 479 KB)

This .NET Framework application demonstrates how to open EPUB books and extract their content using the library.

HTML renderer used in this demo app may be a little bit slow for some books.
The HTML renderer used in this demo app may have difficulty rendering some books whose HTML structure is too complicated.

[Download .NET Core console demo app](https://github.com/vers-one/EpubReader/releases/download/v2.0.5/NetCoreDemo.zip) (NetCoreDemo.zip, 17.6 MB)
[Download .NET Core console demo app](https://github.com/vers-one/EpubReader/releases/download/v3.0.0/NetCoreDemo.zip) (NetCoreDemo.zip, 17.6 MB)

This .NET Core console application demonstrates how to open EPUB books and retrieve their text content.
2 changes: 1 addition & 1 deletion Source/EpubReader.sln
@@ -7,7 +7,7 @@ Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "VersOne.Epub", "VersOne.Epu
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "VersOne.Epub.WpfDemo", "VersOne.Epub.WpfDemo\VersOne.Epub.WpfDemo.csproj", "{2C48D6FB-EC93-4B79-8E52-79B579B3C324}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "VersOne.Epub.NetCoreDemo", "VersOne.Epub.NetCoreDemo\VersOne.Epub.NetCoreDemo.csproj", "{A6ED4735-3D37-4E44-BEE4-218C6BBAC1BD}"
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "VersOne.Epub.NetCoreDemo", "VersOne.Epub.NetCoreDemo\VersOne.Epub.NetCoreDemo.csproj", "{A6ED4735-3D37-4E44-BEE4-218C6BBAC1BD}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
18 changes: 6 additions & 12 deletions Source/VersOne.Epub.NetCoreDemo/ExtractPlainText.cs
@@ -9,30 +9,24 @@ internal static class ExtractPlainText
public static void Run(string filePath)
{
EpubBook book = EpubReader.ReadBook(filePath);
foreach (EpubChapter chapter in book.Chapters)
foreach (EpubTextContentFile textContentFile in book.ReadingOrder)
{
PrintChapter(chapter);
PrintTextContentFile(textContentFile);
}
}

private static void PrintChapter(EpubChapter chapter)
private static void PrintTextContentFile(EpubTextContentFile textContentFile)
{
HtmlDocument htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(chapter.HtmlContent);
htmlDocument.LoadHtml(textContentFile.Content);
StringBuilder sb = new StringBuilder();
foreach (HtmlNode node in htmlDocument.DocumentNode.SelectNodes("//text()"))
{
sb.AppendLine(node.InnerText.Trim());
}
string chapterTitle = chapter.Title;
string chapterText = sb.ToString();
Console.WriteLine("------------ ", chapterTitle, "------------ ");
Console.WriteLine(chapterText);
string contentText = sb.ToString();
Console.WriteLine(contentText);
Console.WriteLine();
foreach (EpubChapter subChapter in chapter.SubChapters)
{
PrintChapter(subChapter);
}
}
}
}
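
The loop in `PrintTextContentFile` appends one line per text node, so whitespace-only nodes become blank lines, and HTML entities such as `&amp;` may stay encoded in the output. Below is a minimal sketch of a cleanup step, assuming the text fragments have already been collected as plain strings. `PlainTextCleanup` and `JoinTextFragments` are hypothetical names used for illustration only; they are not part of this PR or of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical helper, not part of VersOne.Epub or of this PR.
public static class PlainTextCleanup
{
    // Joins raw text fragments into plain text: decodes HTML entities,
    // trims each fragment, and skips fragments that are empty after trimming.
    public static string JoinTextFragments(IEnumerable<string> fragments)
    {
        StringBuilder sb = new StringBuilder();
        foreach (string fragment in fragments)
        {
            if (fragment == null)
            {
                continue;
            }
            string text = WebUtility.HtmlDecode(fragment).Trim();
            if (text.Length > 0)
            {
                sb.Append(text).Append('\n');
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Sample fragments standing in for the InnerText of HTML text nodes.
        string[] fragments = { "  Chapter 1 ", "\n   ", "Tom &amp; Jerry" };
        Console.Write(JoinTextFragments(fragments));
    }
}
```

In the demo, the fragments would come from the `InnerText` of each selected node. Whether the extra pass is worth it depends on how clean the book's markup is.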
26 changes: 0 additions & 26 deletions Source/VersOne.Epub.NetCoreDemo/ListChapters.cs

This file was deleted.

30 changes: 30 additions & 0 deletions Source/VersOne.Epub.NetCoreDemo/PrintNavigation.cs
@@ -0,0 +1,30 @@
using System;

namespace VersOne.Epub.NetCoreDemo
{
internal static class PrintNavigation
{
public static void Run(string filePath)
{
using (EpubBookRef bookRef = EpubReader.OpenBook(filePath))
{
Console.WriteLine("Navigation:");
foreach (EpubNavigationItemRef navigationItemRef in bookRef.GetNavigation())
{
PrintNavigationItem(navigationItemRef, 0);
}
}
Console.WriteLine();
}

private static void PrintNavigationItem(EpubNavigationItemRef navigationItemRef, int indentLevel)
{
Console.Write(new string(' ', indentLevel * 2));
Console.WriteLine(navigationItemRef.Title);
foreach (EpubNavigationItemRef nestedNavigationItemRef in navigationItemRef.NestedItems)
{
PrintNavigationItem(nestedNavigationItemRef, indentLevel + 1);
}
}
}
}
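
The recursion in `PrintNavigationItem` is a depth-first walk over the navigation tree. The sketch below shows the same traversal against a stand-in type, so it can run without an EPUB file. `NavNode` and `NavigationOutline` are hypothetical and only mirror the `Title` and `NestedItems` members used above; they are not library types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a navigation item; it only mirrors the
// Title and NestedItems members used in PrintNavigation.cs.
public sealed class NavNode
{
    public string Title { get; }
    public List<NavNode> NestedItems { get; }

    public NavNode(string title, params NavNode[] nestedItems)
    {
        Title = title;
        NestedItems = new List<NavNode>(nestedItems);
    }
}

public static class NavigationOutline
{
    // Returns one indented line per item, each parent before its children.
    public static List<string> Flatten(IEnumerable<NavNode> items, int indentLevel = 0)
    {
        List<string> lines = new List<string>();
        foreach (NavNode item in items)
        {
            lines.Add(new string(' ', indentLevel * 2) + item.Title);
            lines.AddRange(Flatten(item.NestedItems, indentLevel + 1));
        }
        return lines;
    }

    public static void Main()
    {
        NavNode[] toc =
        {
            new NavNode("Part I", new NavNode("Chapter 1"), new NavNode("Chapter 2")),
            new NavNode("Part II")
        };
        foreach (string line in Flatten(toc))
        {
            Console.WriteLine(line);
        }
    }
}
```

Returning the lines instead of writing them to the console keeps the traversal testable; the demo's direct `Console.WriteLine` calls are simpler when the output is only meant to be read.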
46 changes: 39 additions & 7 deletions Source/VersOne.Epub.NetCoreDemo/Program.cs
@@ -11,49 +11,81 @@ static void Main(string[] args)
while (input != 'Q')
{
Console.WriteLine("Select example:");
Console.WriteLine("1. List all chapters");
Console.WriteLine("2. Extract plain text from all chapters");
Console.WriteLine("1. Print book navigation tree (table of contents)");
Console.WriteLine("2. Extract plain text from the whole book");
Console.WriteLine("3. Test the library by reading all EPUB files from a directory");
Console.WriteLine("Q. Exit");
input = Char.ToUpper(Console.ReadKey(true).KeyChar);
Console.WriteLine();
switch (input)
{
case '1':
RunExample(ListChapters.Run);
RunFileExample(PrintNavigation.Run);
break;
case '2':
RunExample(ExtractPlainText.Run);
RunFileExample(ExtractPlainText.Run);
break;
case '3':
RunDirectoryExample(TestDirectory.Run);
break;
case 'Q':
break;
default:
Console.WriteLine("Input is not recognized. Please try again.");
Console.WriteLine();
break;
}
}
}

static void RunExample(Action<string> example)
private static void RunFileExample(Action<string> example)
{
Console.Write("Enter the path to the EPUB file: ");
string filePath = Console.ReadLine();
Console.WriteLine();
if (File.Exists(filePath) && Path.GetExtension(filePath).ToLower() == ".epub")
{
try
{
example(filePath);
Console.WriteLine();
}
catch (Exception ex)
{
Console.WriteLine("Exception was thrown:");
Console.WriteLine(ex.ToString());
Console.WriteLine();
}
}
else
{
Console.WriteLine("File doesn't exist or is not an EPUB file.");
Console.WriteLine();
}
}
}

private static void RunDirectoryExample(Action<string> example)
{
Console.Write("Enter the path to the directory with EPUB files: ");
string directoryPath = Console.ReadLine();
Console.WriteLine();
if (Directory.Exists(directoryPath))
{
try
{
example(directoryPath);
}
catch (Exception ex)
{
Console.WriteLine("Exception was thrown:");
Console.WriteLine(ex.ToString());
Console.WriteLine();
}
}
else
{
Console.WriteLine("Directory doesn't exist.");
Console.WriteLine();
}
}
}
}
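
`RunFileExample` accepts a path only when the file exists and its extension, lowercased with `ToLower()`, equals `.epub`. The sketch below shows the same extension check with an ordinal, case-insensitive comparison that does not depend on the current culture. `EpubPathCheck` and `HasEpubExtension` are hypothetical names and not part of the demo app.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the demo app in this PR.
public static class EpubPathCheck
{
    // Returns true when the path ends with the .epub extension, ignoring case.
    // Only the extension is inspected; the file itself is not accessed.
    // Note: on some runtimes Path.GetExtension throws ArgumentException
    // for paths that contain invalid characters.
    public static bool HasEpubExtension(string path)
    {
        if (string.IsNullOrWhiteSpace(path))
        {
            return false;
        }
        return string.Equals(Path.GetExtension(path), ".epub", StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        Console.WriteLine(HasEpubExtension("Book.EPUB"));
        Console.WriteLine(HasEpubExtension("archive.epub.zip"));
    }
}
```

The helper could be combined with `File.Exists` in the same way the demo combines its two conditions.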