5 Commits

Author SHA1 Message Date
T.v.Dein
b50c6acff0 fix XML parsing (#2)
- Use antchfx/xmlquery for easier XML parsing. No more regexp wrangling and the result is much more reliable over a variety of ebooks. Much good.
- fix chapter selection, look for `<?xml[...]` which is much more reliable
- add option `-x` to dump the XML ebook source for debugging
2025-10-16 18:57:05 +02:00
90d30cb3e1 enhanced wording 2025-10-16 13:03:42 +02:00
f942378b47 fixed copy/paste errors 2025-10-16 13:02:35 +02:00
3d3bbe3266 clarified purpose of epuppy 2025-10-16 12:50:37 +02:00
84307c42eb added contrib and minimal technical coc 2025-10-16 12:50:22 +02:00
16 changed files with 262 additions and 71 deletions

Binary file not shown.

Before

Width:  |  Height:  |  Size: 57 KiB

After

Width:  |  Height:  |  Size: 35 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 79 KiB

After

Width:  |  Height:  |  Size: 40 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 74 KiB

After

Width:  |  Height:  |  Size: 37 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 55 KiB

After

Width:  |  Height:  |  Size: 32 KiB

18
CODE_OF_CONDUCT.md Normal file
View File

@@ -0,0 +1,18 @@
# CODE_OF_CONDUCT.md (compact model)
Purpose: Maintain a collaborative, harassment-free environment focused
on shipping quality software.
Standards: Be respectful; assume good intent; no harassment; no
discrimination; keep critiques technical.
Scope: All project spaces (issues, PRs, forums, events).
Reporting: Email tom AT vondein DOT org. Acknowledge within 72 hours.
Enforcement: Two maintainers review, one recuses on
conflict. Sanctions range from warning to removal. Summary posted (no
personal details).
Escalation: If you believe maintainers handled a report in bad faith,
escalate to our foundation committee (link) for independent review.

94
CONTRIBUTING.md Normal file
View File

@@ -0,0 +1,94 @@
## Project Goals
The idea behind this project is to build a small TUI tool to be able
to just take a look into some epub file without the need to leave the
shell. It has to be fast enough to just peak into an ebook and should
be small and easy to understand.
There will be no GUI, no web interface, no public API of some sort, no
builtin interpreter. It is not intended to build a full blown ebook
reader.
The programming language used for this project will always be
[GOLANG](https://go.dev/) with the exception of the documentation
([Perl POD](https://perldoc.perl.org/perlpod)) and the Makefile.
# Contributing
You can contribute to this project in various ways:
## Open an issue
If you encounter a problem or don't understand how the program works
or if you think the documentation is unclear, please don't hesitate to
open an issue.
Please add as much information about the case as possible, such as:
- Your environment (operating system etc)
- program version
- Input data. Please replace sensitive information with mock data!
- Actual program output.
- Expected program output.
- Error message - if any.
Be aware that I am working on this (and some other) project in my
spare time which is scarce. Therefore please don't expect me to
respond to your query within hours or even days. Be patient, but I
WILL respond.
## Pull Requests
Code and documentation help is always much appreciated! Please follow
thes guidelines to successfully contribute:
- Every pull request shall be based on latest `development`
branch. `main` is only used for releases.
- Execute the unit tests before committing: `make test`. There shall
be no errors.
- Strive to be backwards compatible so that users who are already
using the program don't have to change their habits - unless it is
really neccessary.
- Try to add a unit test for your addition.
- Don't ever change existing unit tests!
- Add a meaningful and comprehensive rationale about your contribution:
- Why do you think it might be useful for others?
- What did you actually change or add?
- Is there an open issue which this PR fixes and if so, please link
to that issue.
- [Re-]format your code with `gofmt -s`.
- Avoid unneccesary dependencies, especially for very small functions.
- **If** a new dependency is being added, it must be compatible with
our [license agreement](LICENSE).
- You need to accept that the code or documentation you contribute
will be redistributed under the terms of said license agreement. If
your contribution is considerably large or if you contribute
regularly, then feel free to add your name and if you want your
email address to the *AUTHORS* section of the
manual page.
- Adhere to the above mentioned project goals.
- If you are unsure if your addition or change will be accepted,
better ask before starting coding. Open an issue about your proposal
and let's discuss it! That way we avoid doing unnessesary work on
both sides.
Each pull request will be carefully reviewed and if it is a useful
addition it will be accepted. However, please be prepared that
sometimes a PR will be rejected. The reasons may vary and will be
documented. Perhaps the above guidelines are not matched, or the
addition seems to be not so useful from my perspective, maybe there
are too much changes or there might be changes I don't even
understand.
But whatever happens: your contribution is always welcome!

View File

@@ -6,6 +6,13 @@ may not work for all EPUB files yet. It uses a modified version of the
unmaintained but the best I could find to parse EPUBs. Find it in the
`pkg/epub/` directory.
The idea behind this tool is to be able to just take a look into some
epub file without the need to leave the shell. And it had to be fast
enough to just peak into an ebook. However, it is possible to actually
read epub ebooks with epuppy but I'd encourage you to buy a hardware
ebook reader with an e-ink display. It's better for your eyes in the
long run.
## Screenshots
- Viewing an ebook in dark mode

View File

@@ -16,7 +16,7 @@ import (
)
const (
Version string = `v0.0.2`
Version string = `v0.0.3`
Usage string = `Usage epuppy [options] <epub file>
Options:
@@ -25,6 +25,7 @@ Options:
-n --line-numbers add line numbers
-c --config <file> use config <file>
-t --txt dump readable content to STDOUT
-x --xml dump source xml to STDOUT
-d --debug enable debugging
-h --help show help message
-v --version show program version`
@@ -37,6 +38,7 @@ type Config struct {
Darkmode bool `koanf:"dark"` // -D
LineNumbers bool `koanf:"line-numbers"` // -n
Dump bool `koanf:"txt"` // -t
XML bool `koanf:"xml"` // -x
Config string `koanf:"config"` // -c
ColorDark ColorSetting `koanf:"colordark"` // comes from config file only
ColorLight ColorSetting `koanf:"colorlight"` // comes from config file only
@@ -66,6 +68,7 @@ func InitConfig(output io.Writer) (*Config, error) {
flagset.BoolP("store-progress", "s", false, "store reading progress")
flagset.BoolP("line-numbers", "n", false, "add line numbers")
flagset.BoolP("txt", "t", false, "dump readable content to STDOUT")
flagset.BoolP("xml", "x", false, "dump xml to STDOUT")
flagset.StringP("config", "c", "", "read config from file")
if err := flagset.Parse(os.Args[1:]); err != nil {

View File

@@ -2,7 +2,6 @@ package cmd
import (
"fmt"
"log"
"os"
"path/filepath"
"strings"
@@ -33,17 +32,11 @@ func ViewText(conf *Config) (int, error) {
}
func ViewEpub(conf *Config) (int, error) {
book, err := epub.Open(conf.Document)
book, err := epub.Open(conf.Document, conf.XML)
if err != nil {
return 0, err
}
defer func() {
if err := book.Close(); err != nil {
log.Fatal(err)
}
}()
buf := strings.Builder{}
head := strings.Builder{}
@@ -59,9 +52,20 @@ func ViewEpub(conf *Config) (int, error) {
head.WriteString(" ")
}
fetchByContent(conf, &buf, book)
if conf.Dump {
return fmt.Println(buf.String())
}
return Pager(conf, head.String(), buf.String())
}
// FIXME: since the switch to book.Files() in epub.Open() this
// returns invalid chapter numbering
func fetchByContent(conf *Config, buf *strings.Builder, book *epub.Book) bool {
chapter := 1
var gotbody bool
for _, content := range book.Content {
if len(content.Body) > 0 {
@@ -81,13 +85,10 @@ func ViewEpub(conf *Config) (int, error) {
buf.WriteString("\r\n\r\n\r\n\r\n")
chapter++
gotbody = true
}
}
}
if conf.Dump {
return fmt.Println(buf.String())
}
return Pager(conf, head.String(), buf.String())
return gotbody
}

6
go.mod
View File

@@ -6,6 +6,7 @@ toolchain go1.24.9
require (
github.com/alecthomas/repr v0.5.2
github.com/antchfx/xmlquery v1.5.0
github.com/knadh/koanf/parsers/toml v0.1.0
github.com/knadh/koanf/providers/file v1.2.0
github.com/knadh/koanf/providers/posflag v1.0.1
@@ -35,14 +36,17 @@ require (
github.com/rivo/uniseg v0.4.7 // indirect
github.com/xo/terminfo v0.0.0-20220910002029-abceb7e1c41e // indirect
golang.org/x/sys v0.37.0 // indirect
golang.org/x/text v0.3.8 // indirect
golang.org/x/text v0.30.0 // indirect
)
require (
github.com/antchfx/xpath v1.3.5 // indirect
github.com/fsnotify/fsnotify v1.9.0 // indirect
github.com/go-viper/mapstructure/v2 v2.4.0 // indirect
github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da // indirect
github.com/knadh/koanf/maps v0.1.2 // indirect
github.com/mitchellh/copystructure v1.2.0 // indirect
github.com/mitchellh/reflectwalk v1.0.2 // indirect
github.com/pelletier/go-toml v1.9.5 // indirect
golang.org/x/net v0.33.0 // indirect
)

76
go.sum
View File

@@ -1,5 +1,9 @@
github.com/alecthomas/repr v0.5.2 h1:SU73FTI9D1P5UNtvseffFSGmdNci/O6RsqzeXJtP0Qs=
github.com/alecthomas/repr v0.5.2/go.mod h1:Fr0507jx4eOXV7AlPV6AVZLYrLIuIeSOWtW57eE/O/4=
github.com/antchfx/xmlquery v1.5.0 h1:uAi+mO40ZWfyU6mlUBxRVvL6uBNZ6LMU4M3+mQIBV4c=
github.com/antchfx/xmlquery v1.5.0/go.mod h1:lJfWRXzYMK1ss32zm1GQV3gMIW/HFey3xDZmkP1SuNc=
github.com/antchfx/xpath v1.3.5 h1:PqbXLC3TkfeZyakF5eeh3NTWEbYl4VHNVeufANzDbKQ=
github.com/antchfx/xpath v1.3.5/go.mod h1:i54GszH55fYfBmoZXapTHN8T8tkcHfRgLyVwwqzXNcs=
github.com/aymanbagabas/go-osc52/v2 v2.0.1 h1:HwpRHbFMcZLEVr42D4p7XBqjyuxQH5SMiErDT4WkJ2k=
github.com/aymanbagabas/go-osc52/v2 v2.0.1/go.mod h1:uYgXzlJ7ZpABp8OJ+exZzJJhRNQ2ASbcXHWsFqH8hp8=
github.com/aymanbagabas/go-udiff v0.2.0 h1:TK0fH4MteXUDspT88n8CKzvK0X9O2xu9yQjWpi6yML8=
@@ -28,6 +32,9 @@ github.com/fsnotify/fsnotify v1.9.0 h1:2Ml+OJNzbYCTzsxtv8vKSFD9PbJjmhYF14k/jKC7S
github.com/fsnotify/fsnotify v1.9.0/go.mod h1:8jBTzvmWwFyi3Pb8djgCCO5IBqzKJ/Jwo8TRcHyHii0=
github.com/go-viper/mapstructure/v2 v2.4.0 h1:EBsztssimR/CONLSZZ04E8qAkxNYq4Qp9LvH92wZUgs=
github.com/go-viper/mapstructure/v2 v2.4.0/go.mod h1:oJDH3BJKyqBA2TXFhDsKDGDTlndYOZ6rGS0BRZIxGhM=
github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da h1:oI5xCqsCo564l8iNU+DwB5epxmsaqB+rhGL0m5jtYqE=
github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da/go.mod h1:cIg4eruTrX1D+g88fzRXU5OdNfaM+9IcxsU14FzY7Hc=
github.com/google/go-cmp v0.6.0/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY=
github.com/knadh/koanf/maps v0.1.2 h1:RBfmAW5CnZT+PJ1CVc1QSJKf4Xu9kxfQgYVQSu8hpbo=
github.com/knadh/koanf/maps v0.1.2/go.mod h1:npD/QZY3V6ghQDdcQzl1W4ICNVTkohC8E73eI2xW4yI=
github.com/knadh/koanf/parsers/toml v0.1.0 h1:S2hLqS4TgWZYj4/7mI5m1CQQcWurxUz6ODgOub/6LCI=
@@ -73,15 +80,80 @@ github.com/stretchr/testify v1.8.1 h1:w7B6lhMri9wdJUVmEZPGGhZzrYTPvgJArz7wNPgYKs
github.com/stretchr/testify v1.8.1/go.mod h1:w2LPCIKwWwSfY2zedu0+kehJoqGctiVI29o6fzry7u4=
github.com/xo/terminfo v0.0.0-20220910002029-abceb7e1c41e h1:JVG44RsyaB9T2KIHavMF/ppJZNG9ZpyihvCd0w101no=
github.com/xo/terminfo v0.0.0-20220910002029-abceb7e1c41e/go.mod h1:RbqR21r5mrJuqunuUZ/Dhy/avygyECGrLceyNeo4LiM=
github.com/yuin/goldmark v1.4.13/go.mod h1:6yULJ656Px+3vBD8DxQVa3kxgyrAnzto9xy5taEt/CY=
golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
golang.org/x/crypto v0.0.0-20210921155107-089bfa567519/go.mod h1:GvvjBRRGRdwPK5ydBHafDWAxML/pGHZbMvKqRZ5+Abc=
golang.org/x/crypto v0.13.0/go.mod h1:y6Z2r+Rw4iayiXXAIxJIDAJ1zMW4yaTpebo8fPOliYc=
golang.org/x/crypto v0.19.0/go.mod h1:Iy9bg/ha4yyC70EfRS8jz+B6ybOBKMaSxLj6P6oBDfU=
golang.org/x/crypto v0.23.0/go.mod h1:CKFgDieR+mRhux2Lsu27y0fO304Db0wZe70UKqHu0v8=
golang.org/x/crypto v0.31.0/go.mod h1:kDsLvtWBEx7MV9tJOj9bnXsPbxwJQ6csT/x4KIN4Ssk=
golang.org/x/exp v0.0.0-20220909182711-5c715a9e8561 h1:MDc5xs78ZrZr3HMQugiXOAkSZtfTpbJLDr/lwfgO53E=
golang.org/x/exp v0.0.0-20220909182711-5c715a9e8561/go.mod h1:cyybsKvd6eL0RnXn6p/Grxp8F5bW7iYuBgsNCOHpMYE=
golang.org/x/mod v0.6.0-dev.0.20220419223038-86c51ed26bb4/go.mod h1:jJ57K6gSWd91VN4djpZkiMVwK6gcyfeH4XE8wZrZaV4=
golang.org/x/mod v0.8.0/go.mod h1:iBbtSCu2XBx23ZKBPSOrRkjjQPZFPuis4dIYUhu/chs=
golang.org/x/mod v0.12.0/go.mod h1:iBbtSCu2XBx23ZKBPSOrRkjjQPZFPuis4dIYUhu/chs=
golang.org/x/mod v0.15.0/go.mod h1:hTbmBsO62+eylJbnUtE2MGJUyE7QWk4xUqPFrRgJ+7c=
golang.org/x/mod v0.17.0/go.mod h1:hTbmBsO62+eylJbnUtE2MGJUyE7QWk4xUqPFrRgJ+7c=
golang.org/x/net v0.0.0-20190620200207-3b0461eec859/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s=
golang.org/x/net v0.0.0-20210226172049-e18ecbb05110/go.mod h1:m0MpNAwzfU5UDzcl9v0D8zg8gWTRqZa9RBIspLL5mdg=
golang.org/x/net v0.0.0-20220722155237-a158d28d115b/go.mod h1:XRhObCWvk6IyKnWLug+ECip1KBveYUHfp+8e9klMJ9c=
golang.org/x/net v0.6.0/go.mod h1:2Tu9+aMcznHK/AK1HMvgo6xiTLG5rD5rZLDS+rp2Bjs=
golang.org/x/net v0.10.0/go.mod h1:0qNGK6F8kojg2nk9dLZ2mShWaEBan6FAoqfSigmmuDg=
golang.org/x/net v0.15.0/go.mod h1:idbUs1IY1+zTqbi8yxTbhexhEEk5ur9LInksu6HrEpk=
golang.org/x/net v0.21.0/go.mod h1:bIjVDfnllIU7BJ2DNgfnXvpSvtn8VRwhlsaeUTyUS44=
golang.org/x/net v0.25.0/go.mod h1:JkAGAh7GEvH74S6FOH42FLoXpXbE/aqXSrIQjXgsiwM=
golang.org/x/net v0.33.0 h1:74SYHlV8BIgHIFC/LrYkOGIwL19eTYXQ5wc6TBuO36I=
golang.org/x/net v0.33.0/go.mod h1:HXLR5J+9DxmrqMwG9qjGCxZ+zKXxBru04zlTvWlWuN4=
golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20220722155255-886fb9371eb4/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.1.0/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.3.0/go.mod h1:FU7BRWz2tNW+3quACPkgCx/L+uEAv1htQ0V83Z9Rj+Y=
golang.org/x/sync v0.6.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk=
golang.org/x/sync v0.7.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk=
golang.org/x/sync v0.10.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk=
golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20201119102817-f84b799fce68/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20210615035016-665e8c7367d1/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.0.0-20210809222454-d867a43fc93e/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.0.0-20220520151302-bc2c85ada10a/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.0.0-20220722155257-8c9f86f7a55f/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.5.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.8.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.12.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.17.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
golang.org/x/sys v0.20.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
golang.org/x/sys v0.28.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
golang.org/x/sys v0.37.0 h1:fdNQudmxPjkdUTPnLn5mdQv7Zwvbvpaxqs831goi9kQ=
golang.org/x/sys v0.37.0/go.mod h1:OgkHotnGiDImocRcuBABYBEXf8A9a87e/uXjp9XT3ks=
golang.org/x/telemetry v0.0.0-20240228155512-f48c80bd79b2/go.mod h1:TeRTkGYfJXctD9OcfyVLyj2J3IxLnKwHJR8f4D8a3YE=
golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo=
golang.org/x/term v0.0.0-20210927222741-03fcf44c2211/go.mod h1:jbD1KX2456YbFQfuXm/mYQcufACuNUgVhRMnK/tPxf8=
golang.org/x/term v0.5.0/go.mod h1:jMB1sMXY+tzblOD4FWmEbocvup2/aLOaQEp7JmGp78k=
golang.org/x/term v0.8.0/go.mod h1:xPskH00ivmX89bAKVGSKKtLOWNx2+17Eiy94tnKShWo=
golang.org/x/term v0.12.0/go.mod h1:owVbMEjm3cBLCHdkQu9b1opXd4ETQWc3BhuQGKgXgvU=
golang.org/x/term v0.17.0/go.mod h1:lLRBjIVuehSbZlaOtGMbcMncT+aqLLLmKrsjNrUguwk=
golang.org/x/term v0.20.0/go.mod h1:8UkIAJTvZgivsXaD6/pH6U9ecQzZ45awqEOzuCvwpFY=
golang.org/x/term v0.27.0/go.mod h1:iMsnZpn0cago0GOrHO2+Y7u7JPn5AylBrcoWkElMTSM=
golang.org/x/term v0.36.0 h1:zMPR+aF8gfksFprF/Nc/rd1wRS1EI6nDBGyWAvDzx2Q=
golang.org/x/term v0.36.0/go.mod h1:Qu394IJq6V6dCBRgwqshf3mPF85AqzYEzofzRdZkWss=
golang.org/x/text v0.3.8 h1:nAL+RVCQ9uMn3vJZbV+MRnydTJFPf8qqY42YiA6MrqY=
golang.org/x/text v0.3.8/go.mod h1:E6s5w1FMmriuDzIBO73fBruAKo1PCIq6d2Q6DHfQ8WQ=
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
golang.org/x/text v0.3.7/go.mod h1:u+2+/6zg+i71rQMx5EYifcz6MCKuco9NR6JIITiCfzQ=
golang.org/x/text v0.7.0/go.mod h1:mrYo+phRRbMaCq/xk9113O4dZlRixOauAjOtrjsXDZ8=
golang.org/x/text v0.9.0/go.mod h1:e1OnstbJyHTd6l/uOt8jFFHp6TRDWZR/bV3emEE/zU8=
golang.org/x/text v0.13.0/go.mod h1:TvPlkZtksWOMsz7fbANvkp4WM8x/WCo/om8BMLbz+aE=
golang.org/x/text v0.14.0/go.mod h1:18ZOQIKpY8NJVqYksKHtTdi31H5itFRjB5/qKTNYzSU=
golang.org/x/text v0.15.0/go.mod h1:18ZOQIKpY8NJVqYksKHtTdi31H5itFRjB5/qKTNYzSU=
golang.org/x/text v0.21.0/go.mod h1:4IBbMaMmOPCJ8SecivzSH54+73PCFmPWxNTLm+vZkEQ=
golang.org/x/text v0.30.0 h1:yznKA/E9zq54KzlzBEAWn1NXSQ8DIp/NYMy88xJjl4k=
golang.org/x/text v0.30.0/go.mod h1:yDdHFIX9t+tORqspjENWgzaCVXgk0yYnYuSZ8UzzBVM=
golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
golang.org/x/tools v0.0.0-20191119224855-298f0cb1881e/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo=
golang.org/x/tools v0.1.12/go.mod h1:hNGJHUnrk76NpqgfD5Aqm5Crs+Hm0VOH/i9J2+nxYbc=
golang.org/x/tools v0.6.0/go.mod h1:Xwgl3UAJ/d3gWutnCtw505GrjyAbvKui8lOU390QaIU=
golang.org/x/tools v0.13.0/go.mod h1:HvlwmtVNQAhOuCjW7xxvovg8wbNq7LwfXh/k7wXUl58=
golang.org/x/tools v0.21.1-0.20240508182429-e35e4ccd0d2d/go.mod h1:aiJjzUbINMkxbQROHiO6hDPo2LHcIPhhQsa9DLh0yGk=
golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=

View File

@@ -16,7 +16,6 @@ type Book struct {
Container Container `json:"-"`
Mimetype string `json:"-"`
Content []Content
fd *zip.ReadCloser
}
@@ -34,11 +33,6 @@ func (p *Book) Files() []string {
return fns
}
// Close close file reader
func (p *Book) Close() error {
return p.fd.Close()
}
// -----------------------------------------------------------------------------
func (p *Book) filename(n string) string {
return path.Join(path.Dir(p.Container.Rootfile.Path), n)

View File

@@ -1,19 +1,16 @@
package epub
import (
"encoding/xml"
"fmt"
"regexp"
"strings"
"github.com/antchfx/xmlquery"
)
var (
cleantitle = regexp.MustCompile(`(?s)<head>.*</head>`)
cleanmarkup = regexp.MustCompile(`<[^<>]+>`)
cleanentities = regexp.MustCompile(`&.+;`)
cleancomments = regexp.MustCompile(`/*.*/`)
cleanspace = regexp.MustCompile(`^\s*`)
cleanh1 = regexp.MustCompile(`<h[1-6].*</h[1-6]>`)
cleanentitles = regexp.MustCompile(`&.+;`)
empty = regexp.MustCompile(`(?s)^[\s ]*$`)
newlines = regexp.MustCompile(`[\r\n]+`)
)
// Content nav-point content
@@ -26,25 +23,30 @@ type Content struct {
}
func (c *Content) String(content []byte) error {
title := Title{}
err := xml.Unmarshal(content, &title)
// parse XML, look for title and <p>.*</p> stuff
doc, err := xmlquery.Parse(
strings.NewReader(
cleanentitles.ReplaceAllString(string(content), " ")))
if err != nil {
if !strings.HasPrefix(err.Error(), "XML syntax error") {
return fmt.Errorf("XML parser error %w", err)
panic(err)
}
// extract the title
for _, item := range xmlquery.Find(doc, "//title") {
c.Title = strings.TrimSpace(item.InnerText())
}
// extract all paragraphs, ignore any formatting and re-fill the
// paragraph, that is, we replaces all newlines inside with one
// space.
txt := strings.Builder{}
for _, item := range xmlquery.Find(doc, "//p") {
if !empty.MatchString(item.InnerText()) {
txt.WriteString(newlines.ReplaceAllString(item.InnerText(), " ") + "\n\n")
}
}
c.Title = strings.TrimSpace(title.Content)
txt := cleantitle.ReplaceAllString(string(content), "")
txt = cleanh1.ReplaceAllString(txt, "")
txt = cleanmarkup.ReplaceAllString(txt, "")
txt = cleanentities.ReplaceAllString(txt, " ")
txt = cleancomments.ReplaceAllString(txt, "")
txt = strings.TrimSpace(txt)
c.Body = cleanspace.ReplaceAllString(txt, "")
c.Body = strings.TrimSpace(txt.String())
c.XML = content
if len(c.Body) == 0 {

View File

@@ -1,36 +1,22 @@
package epub
import (
"log"
"testing"
)
func TestEpub(t *testing.T) {
bk, err := open(t, "test.epub")
_, err := open(t, "test.epub")
if err != nil {
t.Fatal(err)
}
defer func() {
if err := bk.Close(); err != nil {
log.Fatal(err)
}
}()
}
func open(t *testing.T, f string) (*Book, error) {
bk, err := Open(f)
bk, err := Open(f, false)
if err != nil {
return nil, err
}
defer func() {
if err := bk.Close(); err != nil {
log.Fatal(err)
}
}()
t.Logf("files: %+v", bk.Files())
t.Logf("book: %+v", bk)

View File

@@ -8,6 +8,7 @@ type Ncx struct {
// NavPoint nav point
type NavPoint struct {
Text string `xml:"navLabel>text" json:"text"`
Content Content `xml:"content" json:"content"`
Points []NavPoint `xml:"navPoint" json:"points"`
}

View File

@@ -2,12 +2,14 @@ package epub
import (
"archive/zip"
"fmt"
"log"
"os"
"strings"
)
// Open open a epub file
func Open(fn string) (*Book, error) {
func Open(fn string, dumpxml bool) (*Book, error) {
fd, err := zip.OpenReader(fn)
if err != nil {
return nil, err
@@ -55,14 +57,21 @@ func Open(fn string) (*Book, error) {
}
ct := Content{Src: file}
if strings.Contains(string(content), "DOCTYPE") {
if strings.Contains(string(content), "<?xml") {
if err := ct.String(content); err != nil {
return &bk, err
}
}
bk.Content = append(bk.Content, ct)
if dumpxml {
fmt.Println(string(ct.XML))
}
}
if dumpxml {
os.Exit(0)
}
return &bk, nil