Mirror of https://codeberg.org/scip/kleingebaeck.git (synced 2025-12-17 12:31:03 +01:00)
Compare commits: v0.3.6...fix/linter (3 commits)

| Author | SHA1 | Date |
|---|---|---|
| | 39269d3790 | |
| | bebcd15ada | |
| | 20e6299ebd | |
1
.github/ISSUE_TEMPLATE/note_to_self.md
vendored
@@ -5,4 +5,3 @@ title: "[bug-report]"
|
|||||||
labels: bug
|
labels: bug
|
||||||
assignees: TLINDEN
|
assignees: TLINDEN
|
||||||
|
|
||||||
---
|
|
||||||
|
|||||||
1
Makefile
@@ -63,7 +63,6 @@ lint:
|
|||||||
|
|
||||||
lint-full:
|
lint-full:
|
||||||
golangci-lint run --enable-all --exclude-use-default --disable exhaustivestruct,exhaustruct,depguard,interfacer,deadcode,golint,structcheck,scopelint,varcheck,ifshort,maligned,nosnakecase,godot,funlen,gofumpt,cyclop,noctx,gochecknoglobals,paralleltest
|
golangci-lint run --enable-all --exclude-use-default --disable exhaustivestruct,exhaustruct,depguard,interfacer,deadcode,golint,structcheck,scopelint,varcheck,ifshort,maligned,nosnakecase,godot,funlen,gofumpt,cyclop,noctx,gochecknoglobals,paralleltest
|
||||||
gocritic check -enableAll *.go
|
|
||||||
|
|
||||||
testfuzzy: clean
|
testfuzzy: clean
|
||||||
go test -fuzz ./... $(ARGS)
|
go test -fuzz ./... $(ARGS)
|
||||||
|
|||||||
43
README-de.md
@@ -222,49 +222,6 @@ Sowie alle Bilder.
|
|||||||
Das Format kann man mit der Variable `template` in der Konfiguration
|
Das Format kann man mit der Variable `template` in der Konfiguration
|
||||||
ändern. Die `example.conf` enthält ein Beispiel für das Standard Template.
|
ändern. Die `example.conf` enthält ein Beispiel für das Standard Template.
|
||||||
|
|
||||||
## Verhalten des Tools
|
|
||||||
|
|
||||||
Es gibt einige Dinge über das Verhalten von kleingebäck, über die Du
|
|
||||||
Bescheid wissen solltest:
|
|
||||||
|
|
||||||
- alle HTML Seiten und Bilder werden immer heruntergeladen
|
|
||||||
- es wird ein (konfigurierbarer) Useragent verwendet
|
|
||||||
- HTTP Cookies werden beachtet
|
|
||||||
- bei Fehlern wird dreimal mit unterschiedlichem Abstand erneut
|
|
||||||
versucht
|
|
||||||
- Bilder Downloads laufen parallelisiert mit leicht unterschiedlichen
|
|
||||||
zeitlichen Abständen ab
|
|
||||||
- Gleich aussehende Bilder werden nicht überschrieben
|
|
||||||
|
|
||||||
Der letzte Punkt muss genauer erläutert werden:
|
|
||||||
|
|
||||||
Wenn man bei Kleinanzeigen.de eine Anzeige einstellt und Bilder
|
|
||||||
postet, werden diese dort in ihrer Grösse reduziert (durch Kompression
|
|
||||||
und Verkleinerung der Bilder usw.). Diese reduzierten Bilder werden
|
|
||||||
dann von kleingebäck heruntergeladen. Falls Du Deine original Bilder
|
|
||||||
behalten hast, kannst Du diese danach in das Backupverzeichnis
|
|
||||||
kopieren. Bei einem erneuten kleingebäck-Lauf werden diese Bilder dann
|
|
||||||
nicht überschrieben.
|
|
||||||
|
|
||||||
Wir verwenden dafür einen Algorythmus namens [distance
|
|
||||||
hashing](https://github.com/corona10/goimagehash). Dieser Algorithmus
|
|
||||||
prüft die Ähnlichkeit von Bildern. Diese können in ihrer Auflösung,
|
|
||||||
Kompression, Farbtiefe und vielem mehr manipuliert worden sein und
|
|
||||||
trotzdem als das "gleiche Bild" erkannt werden (wohlgemerkt nicht "das
|
|
||||||
selbe": die Dateien sind durchaus unterschiedlich!). Bis zu einer
|
|
||||||
Distance von 5 überschreiben wir keine Bilder, weil wir dann davon
|
|
||||||
ausgehen, dass das lokal Vorhandene das Original ist.
|
|
||||||
|
|
||||||
Bitte beachte aber, dass dies KEIN Cachingmechanismus ist: die Bilder
|
|
||||||
werden trotzdem immer alle heruntergeladen. Das muss so sein, da wir
|
|
||||||
uns nicht die Dateinamen anschauen können, da kleinanzeigen.de diese
|
|
||||||
nämlich zu Zahlen umbenennt. Und die Dateinamen können sich auch
|
|
||||||
ändern, wenn der User in der Anzeige die Bilder umarrangiert hat.
|
|
||||||
|
|
||||||
Du kannst dieses Verhalten mit der Option **--force** ausschalten. Du
|
|
||||||
kannst ausserdem mit der Option **--ignoreerrors** auch alle Fehler
|
|
||||||
ignorieren, die beim Bilderdownload auftreten könnten.
|
|
||||||
|
|
||||||
## Documentation
|
## Documentation
|
||||||
|
|
||||||
Die Dokumentation kann man
|
Die Dokumentation kann man
|
||||||
|
|||||||
42
README.md
@@ -207,48 +207,6 @@ variable. The supplied sample config contains the default template.
|
|||||||
|
|
||||||
All images will be stored in the same directory.
|
All images will be stored in the same directory.
|
||||||
|
|
||||||
## Tool Behavior
|
|
||||||
|
|
||||||
There are a bunch of things you might want to know about the behavior
|
|
||||||
of the kleingebäck tool:
|
|
||||||
|
|
||||||
- all HTML pages and images are always downloaded
|
|
||||||
- we use a (customizable) user agent
|
|
||||||
- we respect HTTP cookies
|
|
||||||
- in case of an error, the tool retries 3 times; the time it waits
|
|
||||||
between tries is longer for each retry
|
|
||||||
- image download is parallelized using small time differences to look
|
|
||||||
more natural
|
|
||||||
- images that look the same are not overwritten on subsequent downloads
|
|
||||||
|
|
||||||
|
|
||||||
The last point needs some elaboration:
|
|
||||||
|
|
||||||
If you publish an ad on kleinanzeigen.de and post images, those images
|
|
||||||
will be reduced in size by the site (by compressing and downsizing
|
|
||||||
them). These reduced images will be downloaded by kleingebäck. However,
|
|
||||||
you may still own the original images and may want to put them into
|
|
||||||
that backup directory so that you have all things for one ad together.
|
|
||||||
|
|
||||||
You can easily do that, because kleingebäck won't overwrite those
|
|
||||||
original images. It uses something called a distance hash using
|
|
||||||
[goimagehash](https://github.com/corona10/goimagehash). This
|
|
||||||
algorithm checks the similarity of images. If an image has been
|
|
||||||
resized it is still very similar to the original one. We accept a
|
|
||||||
maximum distance of 5; anything above that leads to an overwrite.
|
|
||||||
|
|
||||||
This works with resized, cropped and otherwise manipulated images as
|
|
||||||
long as the image still shows the original contents well enough.
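
The distance check described above can be sketched as a plain Hamming distance between two 64-bit perceptual hashes. This is only an illustration under stated assumptions: the helper names `hammingDistance` and `keepLocal` and the sample hash values are made up, the threshold of 5 is taken from the text above (the `image.go` hunk header in this comparison shows `const MaxDistance = 3`), and the tool itself relies on goimagehash rather than on code like this.

```go
package main

import (
	"fmt"
	"math/bits"
)

// maxDistance follows the value of 5 quoted in the README text (assumption:
// the real constant in the code base may differ).
const maxDistance = 5

// hammingDistance counts the bit positions in which two 64-bit hashes differ.
func hammingDistance(a, b uint64) int {
	return bits.OnesCount64(a ^ b)
}

// keepLocal reports whether the local image is similar enough to the freshly
// downloaded one that it should NOT be overwritten.
func keepLocal(localHash, remoteHash uint64) bool {
	return hammingDistance(localHash, remoteHash) <= maxDistance
}

func main() {
	// Hypothetical hash values, chosen only to show both outcomes.
	local := uint64(0xF0F0F0F0F0F0F0F0)
	similar := local ^ 0b111    // differs in 3 bits
	different := local ^ 0xFFFF // differs in 16 bits

	fmt.Println(keepLocal(local, similar))   // true: keep the local file
	fmt.Println(keepLocal(local, different)) // false: overwrite
}
```

A small distance means the two files most likely show the same picture even though their bytes differ, which is why a locally stored original survives a later backup run.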
|
|
||||||
|
|
||||||
Also note that this is NOT a caching mechanism: the images will be
|
|
||||||
downloaded anyway during each run. We also can't look at the file
|
|
||||||
names because kleinanzeigen.de renames all images to numbers. And
|
|
||||||
those might even change if the user re-arranges the images.
|
|
||||||
|
|
||||||
You can override this behavior using the **--force** option. Another
|
|
||||||
option, **--ignoreerrors**, can be used to ignore all kinds of image
|
|
||||||
errors.
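
The retry behavior listed at the top of this section (three retries, with a longer wait before each one) might look roughly like the following sketch. The function name `retry`, the sentinel `errTemporary`, and the linear growth of the wait time are assumptions made for illustration; the text only says that the wait gets longer, not how.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTemporary is a made-up error used to simulate a failing download.
var errTemporary = errors.New("temporary failure")

// retry calls fn once and, while it keeps failing, up to maxRetries more
// times. Before retry number n it sleeps n*base, so each wait is longer
// than the previous one (linear growth is an assumption).
func retry(maxRetries int, base time.Duration, fn func() error) error {
	var err error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		if attempt > 0 {
			time.Sleep(time.Duration(attempt) * base)
		}
		if err = fn(); err == nil {
			return nil
		}
	}
	return fmt.Errorf("giving up after %d retries: %w", maxRetries, err)
}

func main() {
	calls := 0
	err := retry(3, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errTemporary // fail on the first two calls
		}
		return nil
	})
	fmt.Println(calls, err) // 3 <nil>
}
```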
|
|
||||||
|
|
||||||
## Documentation
|
## Documentation
|
||||||
|
|
||||||
You can read the documentation [online](https://github.com/TLINDEN/kleingebaeck/blob/main/kleingebaeck.pod) or locally once you have installed kleingebaeck with: `kleingebaeck --manual`.
|
You can read the documentation [online](https://github.com/TLINDEN/kleingebaeck/blob/main/kleingebaeck.pod) or locally once you have installed kleingebaeck with: `kleingebaeck --manual`.
|
||||||
|
|||||||
17
SECURITY.md
@@ -1,17 +0,0 @@
|
|||||||
# Security Policy
|
|
||||||
|
|
||||||
## Supported Versions
|
|
||||||
|
|
||||||
Only the latest release is supported. If you find an issue (any
|
|
||||||
issue!), please check with the latest release first.
|
|
||||||
|
|
||||||
## Reporting a Vulnerability
|
|
||||||
|
|
||||||
I don't agree with the "responsible disclosure" process most projects
|
|
||||||
(and companies) follow these days.
|
|
||||||
|
|
||||||
So, if you find a vulnerability of any kind, please just open an
|
|
||||||
[issue](https://github.com/TLINDEN/kleingebaeck/issues). Please add
|
|
||||||
all details required to reproduce the vulnerability. You won't be chased.
|
|
||||||
|
|
||||||
That's all there is to it.
|
|
||||||
4
ad.go
@@ -1,5 +1,5 @@
|
|||||||
/*
|
/*
|
||||||
Copyright © 2023-2024 Thomas von Dein
|
Copyright © 2023 Thomas von Dein
|
||||||
|
|
||||||
This program is free software: you can redistribute it and/or modify
|
This program is free software: you can redistribute it and/or modify
|
||||||
it under the terms of the GNU General Public License as published by
|
it under the terms of the GNU General Public License as published by
|
||||||
@@ -73,7 +73,7 @@ func (ad *Ad) Incomplete() bool {
|
|||||||
}
|
}
|
||||||
|
|
||||||
func (ad *Ad) CalculateExpire() {
|
func (ad *Ad) CalculateExpire() {
|
||||||
if ad.Created != "" {
|
if len(ad.Created) > 0 {
|
||||||
ts, err := time.Parse("02.01.2006", ad.Created)
|
ts, err := time.Parse("02.01.2006", ad.Created)
|
||||||
if err == nil {
|
if err == nil {
|
||||||
ad.Expire = ts.AddDate(0, ExpireMonths, ExpireDays).Format("02.01.2006")
|
ad.Expire = ts.AddDate(0, ExpireMonths, ExpireDays).Format("02.01.2006")
|
||||||
|
|||||||
11
config.go
@@ -34,7 +34,7 @@ import (
|
|||||||
)
|
)
|
||||||
|
|
||||||
const (
|
const (
|
||||||
VERSION string = "0.3.6"
|
VERSION string = "0.3.2"
|
||||||
Baseuri string = "https://www.kleinanzeigen.de"
|
Baseuri string = "https://www.kleinanzeigen.de"
|
||||||
Listuri string = "/s-bestandsliste.html"
|
Listuri string = "/s-bestandsliste.html"
|
||||||
Defaultdir string = "."
|
Defaultdir string = "."
|
||||||
@@ -52,8 +52,6 @@ const (
|
|||||||
|
|
||||||
DefaultAdNameTemplate string = "{{.Slug}}"
|
DefaultAdNameTemplate string = "{{.Slug}}"
|
||||||
|
|
||||||
DefaultOutdirTemplate string = "."
|
|
||||||
|
|
||||||
// for image download throttling
|
// for image download throttling
|
||||||
MinThrottle int = 2
|
MinThrottle int = 2
|
||||||
MaxThrottle int = 20
|
MaxThrottle int = 20
|
||||||
@@ -67,8 +65,6 @@ const (
|
|||||||
WIN string = "windows"
|
WIN string = "windows"
|
||||||
)
|
)
|
||||||
|
|
||||||
var DirsVisited map[string]int
|
|
||||||
|
|
||||||
const Usage string = `This is kleingebaeck, the kleinanzeigen.de backup tool.
|
const Usage string = `This is kleingebaeck, the kleinanzeigen.de backup tool.
|
||||||
|
|
||||||
Usage: kleingebaeck [-dvVhmoclu] [<ad-listing-url>,...]
|
Usage: kleingebaeck [-dvVhmoclu] [<ad-listing-url>,...]
|
||||||
@@ -81,7 +77,7 @@ Options:
|
|||||||
-l --limit <num> Limit the ads to download to <num>, default: load all.
|
-l --limit <num> Limit the ads to download to <num>, default: load all.
|
||||||
-c --config <file> Use config file <file> (default: ~/.kleingebaeck).
|
-c --config <file> Use config file <file> (default: ~/.kleingebaeck).
|
||||||
--ignoreerrors Ignore HTTP errors, may lead to incomplete ad backup.
|
--ignoreerrors Ignore HTTP errors, may lead to incomplete ad backup.
|
||||||
-f --force Overwrite images and ads even if they already exist.
|
-f --force Download images even if they already exist.
|
||||||
-m --manual Show manual.
|
-m --manual Show manual.
|
||||||
-h --help Show usage.
|
-h --help Show usage.
|
||||||
-V --version Show program version.
|
-V --version Show program version.
|
||||||
@@ -130,7 +126,7 @@ func InitConfig(output io.Writer) (*Config, error) {
|
|||||||
// Load default values using the confmap provider.
|
// Load default values using the confmap provider.
|
||||||
if err := kloader.Load(confmap.Provider(map[string]interface{}{
|
if err := kloader.Load(confmap.Provider(map[string]interface{}{
|
||||||
"template": template,
|
"template": template,
|
||||||
"outdir": DefaultOutdirTemplate,
|
"outdir": ".",
|
||||||
"loglevel": "notice",
|
"loglevel": "notice",
|
||||||
"userid": 0,
|
"userid": 0,
|
||||||
"adnametemplate": DefaultAdNameTemplate,
|
"adnametemplate": DefaultAdNameTemplate,
|
||||||
@@ -157,7 +153,6 @@ func InitConfig(output io.Writer) (*Config, error) {
|
|||||||
flagset.BoolP("help", "h", false, "show usage")
|
flagset.BoolP("help", "h", false, "show usage")
|
||||||
flagset.BoolP("manual", "m", false, "show manual")
|
flagset.BoolP("manual", "m", false, "show manual")
|
||||||
flagset.BoolP("force", "f", false, "force")
|
flagset.BoolP("force", "f", false, "force")
|
||||||
flagset.BoolP("ignoreerrors", "", false, "ignore image download HTTP errors")
|
|
||||||
|
|
||||||
if err := flagset.Parse(os.Args[1:]); err != nil {
|
if err := flagset.Parse(os.Args[1:]); err != nil {
|
||||||
return nil, fmt.Errorf("failed to parse program arguments: %w", err)
|
return nil, fmt.Errorf("failed to parse program arguments: %w", err)
|
||||||
|
|||||||
2
fetch.go
@@ -52,7 +52,7 @@ func NewFetcher(conf *Config) (*Fetcher, error) {
|
|||||||
}
|
}
|
||||||
|
|
||||||
func (f *Fetcher) Get(uri string) (io.ReadCloser, error) {
|
func (f *Fetcher) Get(uri string) (io.ReadCloser, error) {
|
||||||
req, err := http.NewRequest(http.MethodGet, uri, http.NoBody)
|
req, err := http.NewRequest(http.MethodGet, uri, nil)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return nil, fmt.Errorf("failed to create a new HTTP request obj: %w", err)
|
return nil, fmt.Errorf("failed to create a new HTTP request obj: %w", err)
|
||||||
}
|
}
|
||||||
|
|||||||
5
go.mod
@@ -14,7 +14,7 @@ require (
|
|||||||
github.com/lmittmann/tint v1.0.4
|
github.com/lmittmann/tint v1.0.4
|
||||||
github.com/mattn/go-isatty v0.0.20
|
github.com/mattn/go-isatty v0.0.20
|
||||||
github.com/spf13/pflag v1.0.5
|
github.com/spf13/pflag v1.0.5
|
||||||
github.com/tlinden/yadu v0.1.2
|
github.com/tlinden/yadu v0.1.1
|
||||||
golang.org/x/sync v0.5.0
|
golang.org/x/sync v0.5.0
|
||||||
)
|
)
|
||||||
|
|
||||||
@@ -24,7 +24,6 @@ require (
|
|||||||
github.com/corona10/goimagehash v1.1.0 // indirect
|
github.com/corona10/goimagehash v1.1.0 // indirect
|
||||||
github.com/fatih/color v1.16.0 // indirect
|
github.com/fatih/color v1.16.0 // indirect
|
||||||
github.com/fsnotify/fsnotify v1.6.0 // indirect
|
github.com/fsnotify/fsnotify v1.6.0 // indirect
|
||||||
github.com/inconshreveable/mousetrap v1.1.0 // indirect
|
|
||||||
github.com/knadh/koanf/maps v0.1.1 // indirect
|
github.com/knadh/koanf/maps v0.1.1 // indirect
|
||||||
github.com/mattn/go-colorable v0.1.13 // indirect
|
github.com/mattn/go-colorable v0.1.13 // indirect
|
||||||
github.com/mitchellh/copystructure v1.2.0 // indirect
|
github.com/mitchellh/copystructure v1.2.0 // indirect
|
||||||
@@ -34,7 +33,7 @@ require (
|
|||||||
github.com/pelletier/go-toml v1.9.5 // indirect
|
github.com/pelletier/go-toml v1.9.5 // indirect
|
||||||
github.com/pkg/errors v0.9.1 // indirect
|
github.com/pkg/errors v0.9.1 // indirect
|
||||||
golang.org/x/net v0.0.0-20220722155237-a158d28d115b // indirect
|
golang.org/x/net v0.0.0-20220722155237-a158d28d115b // indirect
|
||||||
golang.org/x/sys v0.17.0 // indirect
|
golang.org/x/sys v0.14.0 // indirect
|
||||||
gopkg.in/yaml.v3 v3.0.1 // indirect
|
gopkg.in/yaml.v3 v3.0.1 // indirect
|
||||||
|
|
||||||
)
|
)
|
||||||
|
|||||||
6
go.sum
@@ -15,8 +15,6 @@ github.com/fatih/color v1.16.0 h1:zmkK9Ngbjj+K0yRhTVONQh1p/HknKYSlNT+vZCzyokM=
|
|||||||
github.com/fatih/color v1.16.0/go.mod h1:fL2Sau1YI5c0pdGEVCbKQbLXB6edEj1ZgiY4NijnWvE=
|
github.com/fatih/color v1.16.0/go.mod h1:fL2Sau1YI5c0pdGEVCbKQbLXB6edEj1ZgiY4NijnWvE=
|
||||||
github.com/fsnotify/fsnotify v1.6.0 h1:n+5WquG0fcWoWp6xPWfHdbskMCQaFnG6PfBrh1Ky4HY=
|
github.com/fsnotify/fsnotify v1.6.0 h1:n+5WquG0fcWoWp6xPWfHdbskMCQaFnG6PfBrh1Ky4HY=
|
||||||
github.com/fsnotify/fsnotify v1.6.0/go.mod h1:sl3t1tCWJFWoRz9R8WJCbQihKKwmorjAbSClcnxKAGw=
|
github.com/fsnotify/fsnotify v1.6.0/go.mod h1:sl3t1tCWJFWoRz9R8WJCbQihKKwmorjAbSClcnxKAGw=
|
||||||
github.com/inconshreveable/mousetrap v1.1.0 h1:wN+x4NVGpMsO7ErUn/mUI3vEoE6Jt13X2s0bqwp9tc8=
|
|
||||||
github.com/inconshreveable/mousetrap v1.1.0/go.mod h1:vpF70FUmC8bwa3OWnCshd2FqLfsEA9PFc4w1p2J65bw=
|
|
||||||
github.com/jarcoal/httpmock v1.3.1 h1:iUx3whfZWVf3jT01hQTO/Eo5sAYtB2/rqaUuOtpInww=
|
github.com/jarcoal/httpmock v1.3.1 h1:iUx3whfZWVf3jT01hQTO/Eo5sAYtB2/rqaUuOtpInww=
|
||||||
github.com/jarcoal/httpmock v1.3.1/go.mod h1:3yb8rc4BI7TCBhFY8ng0gjuLKJNquuDNiPaZjnENuYg=
|
github.com/jarcoal/httpmock v1.3.1/go.mod h1:3yb8rc4BI7TCBhFY8ng0gjuLKJNquuDNiPaZjnENuYg=
|
||||||
github.com/knadh/koanf/maps v0.1.1 h1:G5TjmUh2D7G2YWf5SQQqSiHRJEjaicvU0KpypqB3NIs=
|
github.com/knadh/koanf/maps v0.1.1 h1:G5TjmUh2D7G2YWf5SQQqSiHRJEjaicvU0KpypqB3NIs=
|
||||||
@@ -68,8 +66,6 @@ github.com/tlinden/yadu v0.1.0 h1:qtCi1jxg392qVRLFyrJ2LYu6/PiKSp1LT02EX+mNLME=
|
|||||||
github.com/tlinden/yadu v0.1.0/go.mod h1:l3bRmHKL9zGAR6pnBHY2HRPxBecf7L74BoBgOOpTcUA=
|
github.com/tlinden/yadu v0.1.0/go.mod h1:l3bRmHKL9zGAR6pnBHY2HRPxBecf7L74BoBgOOpTcUA=
|
||||||
github.com/tlinden/yadu v0.1.1 h1:116oEUy9b4PcMF5wLL2dCFA/sn/praYutOnao07MROw=
|
github.com/tlinden/yadu v0.1.1 h1:116oEUy9b4PcMF5wLL2dCFA/sn/praYutOnao07MROw=
|
||||||
github.com/tlinden/yadu v0.1.1/go.mod h1:l3bRmHKL9zGAR6pnBHY2HRPxBecf7L74BoBgOOpTcUA=
|
github.com/tlinden/yadu v0.1.1/go.mod h1:l3bRmHKL9zGAR6pnBHY2HRPxBecf7L74BoBgOOpTcUA=
|
||||||
github.com/tlinden/yadu v0.1.2 h1:TYYVnUJwziRJ9YPbIbRf9ikmDw0Q8Ifixm+J/kBQFh8=
|
|
||||||
github.com/tlinden/yadu v0.1.2/go.mod h1:l3bRmHKL9zGAR6pnBHY2HRPxBecf7L74BoBgOOpTcUA=
|
|
||||||
golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
|
golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
|
||||||
golang.org/x/net v0.0.0-20180218175443-cbe0f9307d01/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
|
golang.org/x/net v0.0.0-20180218175443-cbe0f9307d01/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
|
||||||
golang.org/x/net v0.0.0-20181114220301-adae6a3d119a/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
|
golang.org/x/net v0.0.0-20181114220301-adae6a3d119a/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
|
||||||
@@ -85,8 +81,6 @@ golang.org/x/sys v0.0.0-20220908164124-27713097b956/go.mod h1:oPkhp1MJrh7nUepCBc
|
|||||||
golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
|
golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
|
||||||
golang.org/x/sys v0.14.0 h1:Vz7Qs629MkJkGyHxUlRHizWJRG2j8fbQKjELVSNhy7Q=
|
golang.org/x/sys v0.14.0 h1:Vz7Qs629MkJkGyHxUlRHizWJRG2j8fbQKjELVSNhy7Q=
|
||||||
golang.org/x/sys v0.14.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
|
golang.org/x/sys v0.14.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
|
||||||
golang.org/x/sys v0.17.0 h1:25cE3gD+tdBA7lp7QfhuV+rJiE9YXTcS3VG1SqssI/Y=
|
|
||||||
golang.org/x/sys v0.17.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
|
|
||||||
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
|
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
|
||||||
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM=
|
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM=
|
||||||
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
|
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
|
||||||
|
|||||||
10
image.go
@@ -33,7 +33,7 @@ const MaxDistance = 3
|
|||||||
type Image struct {
|
type Image struct {
|
||||||
Filename string
|
Filename string
|
||||||
Hash *goimagehash.ImageHash
|
Hash *goimagehash.ImageHash
|
||||||
Data *bytes.Reader
|
Data *bytes.Buffer
|
||||||
URI string
|
URI string
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -49,7 +49,7 @@ func (img *Image) LogValue() slog.Value {
|
|||||||
// holds all images of an ad
|
// holds all images of an ad
|
||||||
type Cache []*goimagehash.ImageHash
|
type Cache []*goimagehash.ImageHash
|
||||||
|
|
||||||
func NewImage(buf *bytes.Reader, filename, uri string) *Image {
|
func NewImage(buf *bytes.Buffer, filename string, uri string) *Image {
|
||||||
img := &Image{
|
img := &Image{
|
||||||
Filename: filename,
|
Filename: filename,
|
||||||
URI: uri,
|
URI: uri,
|
||||||
@@ -131,10 +131,8 @@ func ReadImages(addir string, dont bool) (Cache, error) {
|
|||||||
return nil, err
|
return nil, err
|
||||||
}
|
}
|
||||||
|
|
||||||
reader := bytes.NewReader(data.Bytes())
|
img := NewImage(data, filename, "")
|
||||||
|
if err = img.CalcHash(); err != nil {
|
||||||
img := NewImage(reader, filename, "")
|
|
||||||
if err := img.CalcHash(); err != nil {
|
|
||||||
return nil, err
|
return nil, err
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@@ -133,7 +133,7 @@
|
|||||||
.\" ========================================================================
|
.\" ========================================================================
|
||||||
.\"
|
.\"
|
||||||
.IX Title "KLEINGEBAECK 1"
|
.IX Title "KLEINGEBAECK 1"
|
||||||
.TH KLEINGEBAECK 1 "2024-02-10" "1" "User Commands"
|
.TH KLEINGEBAECK 1 "2024-01-25" "1" "User Commands"
|
||||||
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
|
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
|
||||||
.\" way too many mistakes in technical documents.
|
.\" way too many mistakes in technical documents.
|
||||||
.if n .ad l
|
.if n .ad l
|
||||||
@@ -152,7 +152,7 @@ kleingebaeck \- kleinanzeigen.de backup tool
|
|||||||
\& \-l \-\-limit <num> Limit the ads to download to <num>, default: load all.
|
\& \-l \-\-limit <num> Limit the ads to download to <num>, default: load all.
|
||||||
\& \-c \-\-config <file> Use config file <file> (default: ~/.kleingebaeck).
|
\& \-c \-\-config <file> Use config file <file> (default: ~/.kleingebaeck).
|
||||||
\& \-\-ignoreerrors Ignore HTTP errors, may lead to incomplete ad backup.
|
\& \-\-ignoreerrors Ignore HTTP errors, may lead to incomplete ad backup.
|
||||||
\& \-f \-\-force Overwrite images and ads even if they already exist.
|
\& \-f \-\-force Download images even if they already exist.
|
||||||
\& \-m \-\-manual Show manual.
|
\& \-m \-\-manual Show manual.
|
||||||
\& \-h \-\-help Show usage.
|
\& \-h \-\-help Show usage.
|
||||||
\& \-V \-\-version Show program version.
|
\& \-V \-\-version Show program version.
|
||||||
@@ -195,7 +195,7 @@ Be careful if you want to change the template. The variable is a
|
|||||||
multiline string surrounded by three double quotes. You can leave out
|
multiline string surrounded by three double quotes. You can leave out
|
||||||
certain fields and use any formatting you like. Refer to
|
certain fields and use any formatting you like. Refer to
|
||||||
<https://pkg.go.dev/text/template> for details how to write a
|
<https://pkg.go.dev/text/template> for details how to write a
|
||||||
template. Also read the \s-1TEMPLATES\s0 section below.
|
template.
|
||||||
.PP
|
.PP
|
||||||
If you're on windows and want to customize the output directory, put
|
If you're on windows and want to customize the output directory, put
|
||||||
it into single quotes to avoid the backslashes interpreted as escape
|
it into single quotes to avoid the backslashes interpreted as escape
|
||||||
@@ -204,94 +204,6 @@ chars like this:
|
|||||||
.Vb 1
|
.Vb 1
|
||||||
\& outdir = \*(AqC:\eData\eAds\*(Aq
|
\& outdir = \*(AqC:\eData\eAds\*(Aq
|
||||||
.Ve
|
.Ve
|
||||||
.SH "TEMPLATES"
|
|
||||||
.IX Header "TEMPLATES"
|
|
||||||
Various parts of the configuration can be modified using templates:
|
|
||||||
the output directory, the ad directory and the ad listing itself.
|
|
||||||
.SS "\s-1OUTPUT DIR TEMPLATE\s0"
|
|
||||||
.IX Subsection "OUTPUT DIR TEMPLATE"
|
|
||||||
The config variable \f(CW\*(C`outdir\*(C'\fR or the command line parameter \f(CW\*(C`\-o\*(C'\fR takes a
|
|
||||||
template which may contain:
|
|
||||||
.ie n .IP """{{.Year}}""" 4
|
|
||||||
.el .IP "\f(CW{{.Year}}\fR" 4
|
|
||||||
.IX Item "{{.Year}}"
|
|
||||||
.PD 0
|
|
||||||
.ie n .IP """{{.Month}}""" 4
|
|
||||||
.el .IP "\f(CW{{.Month}}\fR" 4
|
|
||||||
.IX Item "{{.Month}}"
|
|
||||||
.ie n .IP """{{.Day}}""" 4
|
|
||||||
.el .IP "\f(CW{{.Day}}\fR" 4
|
|
||||||
.IX Item "{{.Day}}"
|
|
||||||
.PD
|
|
||||||
.PP
|
|
||||||
That way you can create a new output directory for every backup
|
|
||||||
run. For example:
|
|
||||||
.PP
|
|
||||||
.Vb 1
|
|
||||||
\& outdir = "/home/backups/ads\-{{.Year}}\-{{.Month}}\-{{.Day}}"
|
|
||||||
.Ve
|
|
||||||
.PP
|
|
||||||
Or using the command line flag:
|
|
||||||
.PP
|
|
||||||
.Vb 1
|
|
||||||
\& \-o "/home/backups/ads\-{{.Year}}\-{{.Month}}\-{{.Day}}"
|
|
||||||
.Ve
|
|
||||||
.PP
|
|
||||||
The default value is \f(CW\*(C`.\*(C'\fR \- the current directory.
|
|
||||||
.SS "\s-1AD DIRECTORY TEMPLATE\s0"
|
|
||||||
.IX Subsection "AD DIRECTORY TEMPLATE"
|
|
||||||
The ad directory name can be modified using the following ad values:
|
|
||||||
.IP "{{.Price}}" 4
|
|
||||||
.IX Item "{{.Price}}"
|
|
||||||
.PD 0
|
|
||||||
.IP "{{.ID}}" 4
|
|
||||||
.IX Item "{{.ID}}"
|
|
||||||
.IP "{{.Category}}" 4
|
|
||||||
.IX Item "{{.Category}}"
|
|
||||||
.IP "{{.Condition}}" 4
|
|
||||||
.IX Item "{{.Condition}}"
|
|
||||||
.IP "{{.Created}}" 4
|
|
||||||
.IX Item "{{.Created}}"
|
|
||||||
.IP "{{.Slug}}" 4
|
|
||||||
.IX Item "{{.Slug}}"
|
|
||||||
.IP "{{.Text}}" 4
|
|
||||||
.IX Item "{{.Text}}"
|
|
||||||
.PD
|
|
||||||
.PP
|
|
||||||
It can only be configured in the config file. By default only
|
|
||||||
\&\f(CW\*(C`{{.Slug}}\*(C'\fR is being used, this is the title of the ad in url format.
|
|
||||||
.SS "\s-1AD TEMPLATE\s0"
|
|
||||||
.IX Subsection "AD TEMPLATE"
|
|
||||||
The ad listing itself can be modified as well, using the same
|
|
||||||
variables as the ad name template above.
|
|
||||||
.PP
|
|
||||||
This is the default template:
|
|
||||||
.PP
|
|
||||||
.Vb 7
|
|
||||||
\& Title: {{.Title}}
|
|
||||||
\& Price: {{.Price}}
|
|
||||||
\& Id: {{.ID}}
|
|
||||||
\& Category: {{.Category}}
|
|
||||||
\& Condition: {{.Condition}}
|
|
||||||
\& Created: {{.Created}}
|
|
||||||
\& Expire: {{.Expire}}
|
|
||||||
\&
|
|
||||||
\& {{.Text}}
|
|
||||||
.Ve
|
|
||||||
.PP
|
|
||||||
The config parameter to modify is \f(CW\*(C`template\*(C'\fR. See example.conf in the
|
|
||||||
source repository. Please take care, since this is a multiline
|
|
||||||
string. This is how it shall look if you modify it:
|
|
||||||
.PP
|
|
||||||
.Vb 2
|
|
||||||
\& template="""
|
|
||||||
\& Title: {{.Title}}
|
|
||||||
\&
|
|
||||||
\& {{.Text}}
|
|
||||||
\& """
|
|
||||||
.Ve
|
|
||||||
.PP
|
|
||||||
That is, the content between the two \f(CW"""\fR chars is the template.
|
|
||||||
.SH "SETUP"
|
.SH "SETUP"
|
||||||
.IX Header "SETUP"
|
.IX Header "SETUP"
|
||||||
To setup the tool, you need to lookup your userid on
|
To setup the tool, you need to lookup your userid on
|
||||||
|
|||||||
@@ -14,7 +14,7 @@ SYNOPSYS
|
|||||||
-l --limit <num> Limit the ads to download to <num>, default: load all.
|
-l --limit <num> Limit the ads to download to <num>, default: load all.
|
||||||
-c --config <file> Use config file <file> (default: ~/.kleingebaeck).
|
-c --config <file> Use config file <file> (default: ~/.kleingebaeck).
|
||||||
--ignoreerrors Ignore HTTP errors, may lead to incomplete ad backup.
|
--ignoreerrors Ignore HTTP errors, may lead to incomplete ad backup.
|
||||||
-f --force Overwrite images and ads even if they already exist.
|
-f --force Download images even if they already exist.
|
||||||
-m --manual Show manual.
|
-m --manual Show manual.
|
||||||
-h --help Show usage.
|
-h --help Show usage.
|
||||||
-V --version Show program version.
|
-V --version Show program version.
|
||||||
@@ -55,7 +55,6 @@ CONFIGURATION
|
|||||||
multiline string surrounded by three double quotes. You can leave out
|
multiline string surrounded by three double quotes. You can leave out
|
||||||
certain fields and use any formatting you like. Refer to
|
certain fields and use any formatting you like. Refer to
|
||||||
<https://pkg.go.dev/text/template> for details how to write a template.
|
<https://pkg.go.dev/text/template> for details how to write a template.
|
||||||
Also read the TEMPLATES section below.
|
|
||||||
|
|
||||||
If you're on windows and want to customize the output directory, put it
|
If you're on windows and want to customize the output directory, put it
|
||||||
into single quotes to avoid the backslashes interpreted as escape chars
|
into single quotes to avoid the backslashes interpreted as escape chars
|
||||||
@@ -63,71 +62,6 @@ CONFIGURATION
|
|||||||
|
|
||||||
outdir = 'C:\Data\Ads'
|
outdir = 'C:\Data\Ads'
|
||||||
|
|
||||||
TEMPLATES
|
|
||||||
Various parts of the configuration can be modified using templates: the
|
|
||||||
output directory, the ad directory and the ad listing itself.
|
|
||||||
|
|
||||||
OUTPUT DIR TEMPLATE
|
|
||||||
The config variable "outdir" or the command line parameter "-o" takes a
|
|
||||||
template which may contain:
|
|
||||||
|
|
||||||
"{{.Year}}"
|
|
||||||
"{{.Month}}"
|
|
||||||
"{{.Day}}"
|
|
||||||
|
|
||||||
That way you can create a new output directory for every backup run. For
|
|
||||||
example:
|
|
||||||
|
|
||||||
outdir = "/home/backups/ads-{{.Year}}-{{.Month}}-{{.Day}}"
|
|
||||||
|
|
||||||
Or using the command line flag:
|
|
||||||
|
|
||||||
-o "/home/backups/ads-{{.Year}}-{{.Month}}-{{.Day}}"
|
|
||||||
|
|
||||||
The default value is "." - the current directory.
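
How such an output directory template could be expanded is sketched below with Go's `text/template`, the package the manual points to for template syntax. The type `outdirVars`, the helpers `varsFor` and `expandOutdir`, and the zero-padded string values for month and day are assumptions made for this example; the manual only names the three placeholders.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
	"time"
)

// outdirVars holds the three placeholders the manual lists for outdir.
// Using zero-padded strings is an assumption, not documented behavior.
type outdirVars struct {
	Year, Month, Day string
}

// varsFor derives the placeholder values from a point in time.
func varsFor(t time.Time) outdirVars {
	return outdirVars{
		Year:  t.Format("2006"),
		Month: t.Format("01"),
		Day:   t.Format("02"),
	}
}

// expandOutdir renders an outdir template with the given values.
func expandOutdir(tpl string, vars outdirVars) (string, error) {
	t, err := template.New("outdir").Parse(tpl)
	if err != nil {
		return "", fmt.Errorf("failed to parse outdir template: %w", err)
	}
	var out strings.Builder
	if err := t.Execute(&out, vars); err != nil {
		return "", fmt.Errorf("failed to render outdir template: %w", err)
	}
	return out.String(), nil
}

func main() {
	day := time.Date(2024, time.February, 10, 0, 0, 0, 0, time.UTC)
	dir, err := expandOutdir("/home/backups/ads-{{.Year}}-{{.Month}}-{{.Day}}", varsFor(day))
	if err != nil {
		panic(err)
	}
	fmt.Println(dir) // /home/backups/ads-2024-02-10
}
```

Because the values come from the time of the run, each backup run on a new day would end up in its own directory.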
|
|
||||||
|
|
||||||
AD DIRECTORY TEMPLATE
|
|
||||||
The ad directory name can be modified using the following ad values:
|
|
||||||
|
|
||||||
{{.Price}}
|
|
||||||
{{.ID}}
|
|
||||||
{{.Category}}
|
|
||||||
{{.Condition}}
|
|
||||||
{{.Created}}
|
|
||||||
{{.Slug}}
|
|
||||||
{{.Text}}
|
|
||||||
|
|
||||||
It can only be configured in the config file. By default only
|
|
||||||
"{{.Slug}}" is being used, this is the title of the ad in url format.
|
|
||||||
|
|
||||||
AD TEMPLATE
|
|
||||||
The ad listing itself can be modified as well, using the same variables
|
|
||||||
as the ad name template above.
|
|
||||||
|
|
||||||
This is the default template:
|
|
||||||
|
|
||||||
Title: {{.Title}}
|
|
||||||
Price: {{.Price}}
|
|
||||||
Id: {{.ID}}
|
|
||||||
Category: {{.Category}}
|
|
||||||
Condition: {{.Condition}}
|
|
||||||
Created: {{.Created}}
|
|
||||||
Expire: {{.Expire}}
|
|
||||||
|
|
||||||
{{.Text}}
|
|
||||||
|
|
||||||
The config parameter to modify is "template". See example.conf in the
|
|
||||||
source repository. Please take care, since this is a multiline string.
|
|
||||||
This is how it shall look if you modify it:
|
|
||||||
|
|
||||||
template="""
|
|
||||||
Title: {{.Title}}
|
|
||||||
|
|
||||||
{{.Text}}
|
|
||||||
"""
|
|
||||||
|
|
||||||
That is, the content between the two """ chars is the template.
|
|
||||||
|
|
||||||
SETUP
|
SETUP
|
||||||
To setup the tool, you need to lookup your userid on kleinanzeigen.de.
|
To setup the tool, you need to lookup your userid on kleinanzeigen.de.
|
||||||
Go to your ad overview page while NOT being logged in:
|
Go to your ad overview page while NOT being logged in:
|
||||||
|
|||||||
@@ -13,7 +13,7 @@ kleingebaeck - kleinanzeigen.de backup tool
|
|||||||
-l --limit <num> Limit the ads to download to <num>, default: load all.
|
-l --limit <num> Limit the ads to download to <num>, default: load all.
|
||||||
-c --config <file> Use config file <file> (default: ~/.kleingebaeck).
|
-c --config <file> Use config file <file> (default: ~/.kleingebaeck).
|
||||||
--ignoreerrors Ignore HTTP errors, may lead to incomplete ad backup.
|
--ignoreerrors Ignore HTTP errors, may lead to incomplete ad backup.
|
||||||
-f --force Overwrite images and ads even if they already exist.
|
-f --force Download images even if they already exist.
|
||||||
-m --manual Show manual.
|
-m --manual Show manual.
|
||||||
-h --help Show usage.
|
-h --help Show usage.
|
||||||
-V --version Show program version.
|
-V --version Show program version.
|
||||||
@@ -55,7 +55,7 @@ Be careful if you want to change the template. The variable is a
|
|||||||
multiline string surrounded by three double quotes. You can left out
|
multiline string surrounded by three double quotes. You can left out
|
||||||
certain fields and use any formatting you like. Refer to
|
certain fields and use any formatting you like. Refer to
|
||||||
L<https://pkg.go.dev/text/template> for details how to write a
|
L<https://pkg.go.dev/text/template> for details how to write a
|
||||||
template. Also read the TEMPLATES section below.
|
template.
|
||||||
|
|
||||||
If you're on windows and want to customize the output directory, put
|
If you're on windows and want to customize the output directory, put
|
||||||
it into single quotes to avoid the backslashes interpreted as escape
|
it into single quotes to avoid the backslashes interpreted as escape
|
||||||
@@ -63,91 +63,6 @@ chars like this:
|
|||||||
|
|
||||||
outdir = 'C:\Data\Ads'
|
outdir = 'C:\Data\Ads'
|
||||||
|
|
||||||
=head1 TEMPLATES
|
|
||||||
|
|
||||||
Various parts of the configuration can be modified using templates:
|
|
||||||
the output directory, the ad directory and the ad listing itself.
|
|
||||||
|
|
||||||
=head2 OUTPUT DIR TEMPLATE
|
|
||||||
|
|
||||||
The config varialbe C<outdir> or the command line parameter C<-o> take a
|
|
||||||
template which may contain:
|
|
||||||
|
|
||||||
=over
|
|
||||||
|
|
||||||
=item C<{{.Year}}>
|
|
||||||
|
|
||||||
=item C<{{.Month}}>
|
|
||||||
|
|
||||||
=item C<{{.Day}}>
|
|
||||||
|
|
||||||
=back
|
|
||||||
|
|
||||||
That way you can create a new output directory for every backup
|
|
||||||
run. For example:
|
|
||||||
|
|
||||||
outdir = "/home/backups/ads-{{.Year}}-{{.Month}}-{{.Day}}"
|
|
||||||
|
|
||||||
Or using the command line flag:
|
|
||||||
|
|
||||||
-o "/home/backups/ads-{{.Year}}-{{.Month}}-{{.Day}}"
|
|
||||||
|
|
||||||
The default value is C<.> - the current directory.
|
|
||||||
|
|
||||||
=head2 AD DIRECTORY TEMPLATE
|
|
||||||
|
|
||||||
The ad directory name can be modified using the following ad values:
|
|
||||||
|
|
||||||
=over
|
|
||||||
|
|
||||||
=item {{.Price}}
|
|
||||||
|
|
||||||
=item {{.ID}}
|
|
||||||
|
|
||||||
=item {{.Category}}
|
|
||||||
|
|
||||||
=item {{.Condition}}
|
|
||||||
|
|
||||||
=item {{.Created}}
|
|
||||||
|
|
||||||
=item {{.Slug}}
|
|
||||||
|
|
||||||
=item {{.Text}}
|
|
||||||
|
|
||||||
=back
|
|
||||||
|
|
||||||
It can only be configured in the config file. By default only
|
|
||||||
C<{{.Slug}}> is being used, this is the title of the ad in url format.
|
|
||||||
|
|
||||||
=head2 AD TEMPLATE
|
|
||||||
|
|
||||||
The ad listing itself can be modified as well, using the same
|
|
||||||
variables as the ad name template above.
|
|
||||||
|
|
||||||
This is the default template:
|
|
||||||
|
|
||||||
Title: {{.Title}}
|
|
||||||
Price: {{.Price}}
|
|
||||||
Id: {{.ID}}
|
|
||||||
Category: {{.Category}}
|
|
||||||
Condition: {{.Condition}}
|
|
||||||
Created: {{.Created}}
|
|
||||||
Expire: {{.Expire}}
|
|
||||||
|
|
||||||
{{.Text}}
|
|
||||||
|
|
||||||
The config parameter to modify is C<template>. See example.conf in the
|
|
||||||
source repository. Please take care, since this is a multiline
|
|
||||||
string. This is how it shall look if you modify it:
|
|
||||||
|
|
||||||
template="""
|
|
||||||
Title: {{.Title}}
|
|
||||||
|
|
||||||
{{.Text}}
|
|
||||||
"""
|
|
||||||
|
|
||||||
That is, the content between the two C<"""> chars is the template.
|
|
||||||
|
|
||||||
=head1 SETUP
|
=head1 SETUP
|
||||||
|
|
||||||
To setup the tool, you need to lookup your userid on
|
To setup the tool, you need to lookup your userid on
|
||||||
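The TEMPLATES section in the hunk above describes the ad listing as a Go text/template string. As a minimal sketch of how such a template might be rendered: the `Ad` type below is a trimmed-down, hypothetical stand-in for the project's real ad struct, and `renderAd` is an illustrative helper, not a function from the repository.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Ad is a hypothetical, reduced stand-in for the project's ad type.
// Only a few of the documented fields are included.
type Ad struct {
	Title, Price, ID, Text string
}

// customTemplate matches the shortened example from the manual:
// only the title and the ad text are written.
const customTemplate = "Title: {{.Title}}\n\n{{.Text}}\n"

// renderAd parses the given template text and executes it against ad.
func renderAd(pattern string, ad Ad) (string, error) {
	tmpl, err := template.New("adlisting").Parse(pattern)
	if err != nil {
		return "", fmt.Errorf("failed to parse adlisting template: %w", err)
	}

	var out strings.Builder
	if err := tmpl.Execute(&out, ad); err != nil {
		return "", fmt.Errorf("failed to execute adlisting template: %w", err)
	}

	return out.String(), nil
}

func main() {
	ad := Ad{Title: "Bike", Price: "50 EUR", ID: "1", Text: "Good condition"}

	listing, err := renderAd(customTemplate, ad)
	if err != nil {
		panic(err)
	}

	fmt.Print(listing)
}
```

With a struct as data, text/template reports an error at execution time when the template refers to a field the struct does not have, so a typo in a field name should surface as an error rather than as silent empty output.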
|
|||||||
28
main.go
@@ -18,16 +18,13 @@ along with this program. If not, see <http://www.gnu.org/licenses/>.
|
|||||||
package main
|
package main
|
||||||
|
|
||||||
import (
|
import (
|
||||||
"bufio"
|
|
||||||
"errors"
|
"errors"
|
||||||
"fmt"
|
"fmt"
|
||||||
"io"
|
"io"
|
||||||
"log/slog"
|
"log/slog"
|
||||||
"os"
|
"os"
|
||||||
"runtime"
|
|
||||||
"runtime/debug"
|
"runtime/debug"
|
||||||
|
|
||||||
"github.com/inconshreveable/mousetrap"
|
|
||||||
"github.com/lmittmann/tint"
|
"github.com/lmittmann/tint"
|
||||||
"github.com/tlinden/yadu"
|
"github.com/tlinden/yadu"
|
||||||
)
|
)
|
||||||
@@ -38,25 +35,6 @@ func main() {
|
|||||||
os.Exit(Main(os.Stdout))
|
os.Exit(Main(os.Stdout))
|
||||||
}
|
}
|
||||||
|
|
||||||
func init() {
|
|
||||||
// if we're running on Windows AND if the user double clicked the
|
|
||||||
// exe file from explorer, we tell them and then wait until any
|
|
||||||
// key has been hit, which will make the cmd window disappear and
|
|
||||||
// thus give the user time to read it.
|
|
||||||
if runtime.GOOS == "windows" {
|
|
||||||
if mousetrap.StartedByExplorer() {
|
|
||||||
fmt.Println("Do no double click kleingebaeck.exe!")
|
|
||||||
fmt.Println("Please open a command shell and run it from there.")
|
|
||||||
fmt.Println()
|
|
||||||
fmt.Print("Press any key to quit: ")
|
|
||||||
_, err := bufio.NewReader(os.Stdin).ReadString('\n')
|
|
||||||
if err != nil {
|
|
||||||
panic(err)
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
func Main(output io.Writer) int {
|
func Main(output io.Writer) int {
|
||||||
logLevel := &slog.LevelVar{}
|
logLevel := &slog.LevelVar{}
|
||||||
opts := &tint.Options{
|
opts := &tint.Options{
|
||||||
@@ -134,11 +112,10 @@ func Main(output io.Writer) int {
|
|||||||
slog.Debug("config", "conf", conf)
|
slog.Debug("config", "conf", conf)
|
||||||
|
|
||||||
// prepare output dir
|
// prepare output dir
|
||||||
outdir, err := OutDirName(conf)
|
err = Mkdir(conf.Outdir)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return Die(err)
|
return Die(err)
|
||||||
}
|
}
|
||||||
conf.Outdir = outdir
|
|
||||||
|
|
||||||
// used for all HTTP requests
|
// used for all HTTP requests
|
||||||
fetch, err := NewFetcher(conf)
|
fetch, err := NewFetcher(conf)
|
||||||
@@ -146,9 +123,6 @@ func Main(output io.Writer) int {
|
|||||||
return Die(err)
|
return Die(err)
|
||||||
}
|
}
|
||||||
|
|
||||||
// setup ad dir registry, needed to check for duplicates
|
|
||||||
DirsVisited = make(map[string]int)
|
|
||||||
|
|
||||||
switch {
|
switch {
|
||||||
case len(conf.Adlinks) >= 1:
|
case len(conf.Adlinks) >= 1:
|
||||||
// directly backup ad listing[s]
|
// directly backup ad listing[s]
|
||||||
|
|||||||
26
main_test.go
@@ -334,14 +334,14 @@ type Adsource struct {
|
|||||||
}
|
}
|
||||||
|
|
||||||
// Render a HTML template for an adlisting or an ad
|
// Render a HTML template for an adlisting or an ad
|
||||||
func GetTemplate(adconfigs []AdConfig, adconfig *AdConfig, htmltemplate string) string {
|
func GetTemplate(adconfigs []AdConfig, adconfig AdConfig, htmltemplate string) string {
|
||||||
tmpl, err := tpl.New("template").Parse(htmltemplate)
|
tmpl, err := tpl.New("template").Parse(htmltemplate)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
panic(err)
|
panic(err)
|
||||||
}
|
}
|
||||||
|
|
||||||
var out bytes.Buffer
|
var out bytes.Buffer
|
||||||
if adconfig.ID == "" {
|
if len(adconfig.ID) == 0 {
|
||||||
err = tmpl.Execute(&out, adconfigs)
|
err = tmpl.Execute(&out, adconfigs)
|
||||||
} else {
|
} else {
|
||||||
err = tmpl.Execute(&out, adconfig)
|
err = tmpl.Execute(&out, adconfig)
|
||||||
@@ -376,15 +376,15 @@ func InitValidSources() []Adsource {
|
|||||||
ads := []Adsource{
|
ads := []Adsource{
|
||||||
{
|
{
|
||||||
uri: fmt.Sprintf("%s%s?userId=1", Baseuri, Listuri),
|
uri: fmt.Sprintf("%s%s?userId=1", Baseuri, Listuri),
|
||||||
content: GetTemplate(list1, &empty, LISTTPL),
|
content: GetTemplate(list1, empty, LISTTPL),
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
uri: fmt.Sprintf("%s%s?userId=1&pageNum=2", Baseuri, Listuri),
|
uri: fmt.Sprintf("%s%s?userId=1&pageNum=2", Baseuri, Listuri),
|
||||||
content: GetTemplate(list2, &empty, LISTTPL),
|
content: GetTemplate(list2, empty, LISTTPL),
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
uri: fmt.Sprintf("%s%s?userId=1&pageNum=3", Baseuri, Listuri),
|
uri: fmt.Sprintf("%s%s?userId=1&pageNum=3", Baseuri, Listuri),
|
||||||
content: GetTemplate(list3, &empty, LISTTPL),
|
content: GetTemplate(list3, empty, LISTTPL),
|
||||||
},
|
},
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -392,7 +392,7 @@ func InitValidSources() []Adsource {
|
|||||||
for _, ad := range adsrc {
|
for _, ad := range adsrc {
|
||||||
ads = append(ads, Adsource{
|
ads = append(ads, Adsource{
|
||||||
uri: fmt.Sprintf("%s/s-anzeige/%s/%s", Baseuri, ad.Slug, ad.ID),
|
uri: fmt.Sprintf("%s/s-anzeige/%s/%s", Baseuri, ad.Slug, ad.ID),
|
||||||
content: GetTemplate(nil, &ad, ADTPL),
|
content: GetTemplate(nil, ad, ADTPL),
|
||||||
})
|
})
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -405,28 +405,28 @@ func InitInvalidSources() []Adsource {
|
|||||||
{
|
{
|
||||||
// valid ad page but without content
|
// valid ad page but without content
|
||||||
uri: fmt.Sprintf("%s/s-anzeige/empty/1", Baseuri),
|
uri: fmt.Sprintf("%s/s-anzeige/empty/1", Baseuri),
|
||||||
content: GetTemplate(nil, &empty, EMPTYPAGE),
|
content: GetTemplate(nil, empty, EMPTYPAGE),
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
// some random foreign webpage
|
// some random foreign webpage
|
||||||
uri: INVALIDURI,
|
uri: INVALIDURI,
|
||||||
content: GetTemplate(nil, &empty, "<html>foo</html>"),
|
content: GetTemplate(nil, empty, "<html>foo</html>"),
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
// some invalid page path
|
// some invalid page path
|
||||||
uri: fmt.Sprintf("%s/anzeige/name/1", Baseuri),
|
uri: fmt.Sprintf("%s/anzeige/name/1", Baseuri),
|
||||||
content: GetTemplate(nil, &empty, "<html></html>"),
|
content: GetTemplate(nil, empty, "<html></html>"),
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
// some none-ad page
|
// some none-ad page
|
||||||
uri: fmt.Sprintf("%s/anzeige/name/1/foo/bar", Baseuri),
|
uri: fmt.Sprintf("%s/anzeige/name/1/foo/bar", Baseuri),
|
||||||
content: GetTemplate(nil, &empty, "<html>HTTP 404: /eine-anzeige/ does not exist!</html>"),
|
content: GetTemplate(nil, empty, "<html>HTTP 404: /eine-anzeige/ does not exist!</html>"),
|
||||||
status: 404,
|
status: 404,
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
// valid ad page but 503
|
// valid ad page but 503
|
||||||
uri: fmt.Sprintf("%s/s-anzeige/503/1", Baseuri),
|
uri: fmt.Sprintf("%s/s-anzeige/503/1", Baseuri),
|
||||||
content: GetTemplate(nil, &empty, "<html>HTTP 503: service unavailable</html>"),
|
content: GetTemplate(nil, empty, "<html>HTTP 503: service unavailable</html>"),
|
||||||
status: 503,
|
status: 503,
|
||||||
},
|
},
|
||||||
}
|
}
|
||||||
@@ -465,7 +465,7 @@ func SetIntercept(ads []Adsource) {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
func VerifyAd(advertisement *AdConfig) error {
|
func VerifyAd(advertisement AdConfig) error {
|
||||||
body := advertisement.Title + advertisement.Price + advertisement.ID + "Kleinanzeigen => " +
|
body := advertisement.Title + advertisement.Price + advertisement.ID + "Kleinanzeigen => " +
|
||||||
advertisement.Category + advertisement.Condition + advertisement.Created
|
advertisement.Category + advertisement.Condition + advertisement.Created
|
||||||
|
|
||||||
@@ -525,7 +525,7 @@ func TestMain(t *testing.T) {
|
|||||||
|
|
||||||
// verify if downloaded ads match
|
// verify if downloaded ads match
|
||||||
for _, ad := range adsrc {
|
for _, ad := range adsrc {
|
||||||
if err := VerifyAd(&ad); err != nil {
|
if err := VerifyAd(ad); err != nil {
|
||||||
t.Errorf(err.Error())
|
t.Errorf(err.Error())
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
7
mkrel.sh
@@ -22,12 +22,7 @@ freebsd/amd64
|
|||||||
linux/amd64
|
linux/amd64
|
||||||
netbsd/amd64
|
netbsd/amd64
|
||||||
openbsd/amd64
|
openbsd/amd64
|
||||||
windows/amd64
|
windows/amd64"
|
||||||
freebsd/arm64
|
|
||||||
linux/arm64
|
|
||||||
netbsd/arm64
|
|
||||||
openbsd/arm64
|
|
||||||
windows/arm64"
|
|
||||||
|
|
||||||
tool="$1"
|
tool="$1"
|
||||||
version="$2"
|
version="$2"
|
||||||
|
|||||||
32
scrape.go
@@ -126,32 +126,16 @@ func ScrapeAd(fetch *Fetcher, uri string) error {
|
|||||||
|
|
||||||
advertisement.CalculateExpire()
|
advertisement.CalculateExpire()
|
||||||
|
|
||||||
// prepare ad dir name
|
|
||||||
addir, err := AdDirName(fetch.Config, advertisement)
|
|
||||||
if err != nil {
|
|
||||||
return err
|
|
||||||
}
|
|
||||||
|
|
||||||
proceed := CheckAdVisited(fetch.Config, addir)
|
|
||||||
if !proceed {
|
|
||||||
return nil
|
|
||||||
}
|
|
||||||
|
|
||||||
// write listing
|
// write listing
|
||||||
err = WriteAd(fetch.Config, advertisement, addir)
|
addir, err := WriteAd(fetch.Config, advertisement)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return err
|
return err
|
||||||
}
|
}
|
||||||
|
|
||||||
// tell the user
|
|
||||||
slog.Debug("extracted ad listing", "ad", advertisement)
|
slog.Debug("extracted ad listing", "ad", advertisement)
|
||||||
|
|
||||||
// stats
|
|
||||||
fetch.Config.IncrAds()
|
fetch.Config.IncrAds()
|
||||||
|
|
||||||
// register for later checks
|
|
||||||
DirsVisited[addir] = 1
|
|
||||||
|
|
||||||
return ScrapeImages(fetch, advertisement, addir)
|
return ScrapeImages(fetch, advertisement, addir)
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -184,15 +168,14 @@ func ScrapeImages(fetch *Fetcher, advertisement *Ad, addir string) error {
|
|||||||
}
|
}
|
||||||
|
|
||||||
buf := new(bytes.Buffer)
|
buf := new(bytes.Buffer)
|
||||||
|
|
||||||
_, err = buf.ReadFrom(body)
|
_, err = buf.ReadFrom(body)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return fmt.Errorf("failed to read from image buffer: %w", err)
|
return fmt.Errorf("failed to read from image buffer: %w", err)
|
||||||
}
|
}
|
||||||
|
|
||||||
reader := bytes.NewReader(buf.Bytes())
|
buf2 := buf.Bytes() // needed for image writing
|
||||||
|
|
||||||
image := NewImage(reader, file, imguri)
|
image := NewImage(buf, file, imguri)
|
||||||
err = image.CalcHash()
|
err = image.CalcHash()
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return err
|
return err
|
||||||
@@ -206,17 +189,12 @@ func ScrapeImages(fetch *Fetcher, advertisement *Ad, addir string) error {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
_, err = reader.Seek(0, 0)
|
err = WriteImage(file, buf2)
|
||||||
if err != nil {
|
|
||||||
return fmt.Errorf("failed to seek(0) on image reader: %w", err)
|
|
||||||
}
|
|
||||||
|
|
||||||
err = WriteImage(file, reader)
|
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return err
|
return err
|
||||||
}
|
}
|
||||||
|
|
||||||
slog.Debug("wrote image", "image", image, "size", buf.Len(), "throttle", throttle)
|
slog.Debug("wrote image", "image", image, "size", len(buf2), "throttle", throttle)
|
||||||
|
|
||||||
return nil
|
return nil
|
||||||
})
|
})
|
||||||
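One side of the ScrapeImages hunks above reads the image through a `*bytes.Reader`, hashes it, rewinds with `Seek(0, 0)` and then writes it; the other side keeps a separate byte slice for writing. The sketch below illustrates why the rewind is needed when a single reader is consumed twice. SHA-256 is only a stand-in for whatever `CalcHash` computes, and `hashThenCopy` and `roundTrip` are hypothetical helpers.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
)

// hashThenCopy reads the whole reader to compute a checksum, rewinds it
// and then copies the same bytes to dst. Without the Seek call the second
// pass would start at the end of the data and copy nothing.
func hashThenCopy(reader *bytes.Reader, dst io.Writer) (string, int64, error) {
	hasher := sha256.New()
	if _, err := io.Copy(hasher, reader); err != nil {
		return "", 0, fmt.Errorf("failed to hash image data: %w", err)
	}

	if _, err := reader.Seek(0, io.SeekStart); err != nil {
		return "", 0, fmt.Errorf("failed to seek(0) on image reader: %w", err)
	}

	written, err := reader.WriteTo(dst)
	if err != nil {
		return "", 0, fmt.Errorf("failed to write image data: %w", err)
	}

	return hex.EncodeToString(hasher.Sum(nil)), written, nil
}

// roundTrip runs hashThenCopy against an in-memory destination and
// returns the bytes that arrived there.
func roundTrip(data []byte) ([]byte, error) {
	var out bytes.Buffer
	if _, _, err := hashThenCopy(bytes.NewReader(data), &out); err != nil {
		return nil, err
	}

	return out.Bytes(), nil
}

func main() {
	copied, err := roundTrip([]byte{1, 2, 3, 4, 5, 6, 7, 8})
	if err != nil {
		panic(err)
	}

	fmt.Println("bytes copied after hashing:", len(copied))
}
```

Keeping a plain `[]byte` around, as the other side of the diff does, avoids the rewind at the cost of a second variable that refers to the same data.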
|
|||||||
70
store.go
@@ -26,36 +26,8 @@ import (
|
|||||||
"runtime"
|
"runtime"
|
||||||
"strings"
|
"strings"
|
||||||
tpl "text/template"
|
tpl "text/template"
|
||||||
"time"
|
|
||||||
)
|
)
|
||||||
|
|
||||||
type OutdirData struct {
|
|
||||||
Year, Day, Month string
|
|
||||||
}
|
|
||||||
|
|
||||||
func OutDirName(conf *Config) (string, error) {
|
|
||||||
tmpl, err := tpl.New("outdir").Parse(conf.Outdir)
|
|
||||||
if err != nil {
|
|
||||||
return "", fmt.Errorf("failed to parse outdir template: %w", err)
|
|
||||||
}
|
|
||||||
|
|
||||||
buf := bytes.Buffer{}
|
|
||||||
|
|
||||||
now := time.Now()
|
|
||||||
data := OutdirData{
|
|
||||||
Year: now.Format("2006"),
|
|
||||||
Month: now.Format("02"),
|
|
||||||
Day: now.Format("01"),
|
|
||||||
}
|
|
||||||
|
|
||||||
err = tmpl.Execute(&buf, data)
|
|
||||||
if err != nil {
|
|
||||||
return "", fmt.Errorf("failed to execute outdir template: %w", err)
|
|
||||||
}
|
|
||||||
|
|
||||||
return buf.String(), nil
|
|
||||||
}
|
|
||||||
|
|
||||||
func AdDirName(conf *Config, advertisement *Ad) (string, error) {
|
func AdDirName(conf *Config, advertisement *Ad) (string, error) {
|
||||||
tmpl, err := tpl.New("adname").Parse(conf.Adnametemplate)
|
tmpl, err := tpl.New("adname").Parse(conf.Adnametemplate)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
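A note on the `OutDirName` function shown in the hunk above: it fills `Month` with the layout `"02"` and `Day` with `"01"`. In Go's reference time, `"01"` stands for the month and `"02"` for the day of the month, so the two values appear to be swapped there. The sketch below shows the mapping with a fixed date; `renderOutdir` and `sampleDate` are hypothetical helpers written for this illustration, and passing the time in as a parameter is an assumption made to keep the example deterministic.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
	"time"
)

// OutdirData mirrors the struct from the hunk above.
type OutdirData struct {
	Year, Month, Day string
}

// sampleDate returns a fixed date so the example output is stable.
func sampleDate() time.Time {
	return time.Date(2024, time.January, 31, 0, 0, 0, 0, time.UTC)
}

// renderOutdir expands an output directory template for the given time.
func renderOutdir(pattern string, now time.Time) (string, error) {
	tmpl, err := template.New("outdir").Parse(pattern)
	if err != nil {
		return "", fmt.Errorf("failed to parse outdir template: %w", err)
	}

	data := OutdirData{
		Year:  now.Format("2006"),
		Month: now.Format("01"), // "01" is the month in Go's reference time
		Day:   now.Format("02"), // "02" is the day of the month
	}

	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, data); err != nil {
		return "", fmt.Errorf("failed to execute outdir template: %w", err)
	}

	return buf.String(), nil
}

func main() {
	dir, err := renderOutdir("/home/backups/ads-{{.Year}}-{{.Month}}-{{.Day}}", sampleDate())
	if err != nil {
		panic(err)
	}

	fmt.Println(dir) // prints /home/backups/ads-2024-01-31
}
```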
@@ -72,13 +44,19 @@ func AdDirName(conf *Config, advertisement *Ad) (string, error) {
|
|||||||
return buf.String(), nil
|
return buf.String(), nil
|
||||||
}
|
}
|
||||||
|
|
||||||
func WriteAd(conf *Config, advertisement *Ad, addir string) error {
|
func WriteAd(conf *Config, advertisement *Ad) (string, error) {
|
||||||
|
// prepare ad dir name
|
||||||
|
addir, err := AdDirName(conf, advertisement)
|
||||||
|
if err != nil {
|
||||||
|
return "", err
|
||||||
|
}
|
||||||
|
|
||||||
// prepare output dir
|
// prepare output dir
|
||||||
dir := filepath.Join(conf.Outdir, addir)
|
dir := filepath.Join(conf.Outdir, addir)
|
||||||
|
|
||||||
err := Mkdir(dir)
|
err = Mkdir(dir)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return err
|
return "", err
|
||||||
}
|
}
|
||||||
|
|
||||||
// write ad file
|
// write ad file
|
||||||
@@ -86,7 +64,7 @@ func WriteAd(conf *Config, advertisement *Ad, addir string) error {
|
|||||||
|
|
||||||
listingfd, err := os.Create(listingfile)
|
listingfd, err := os.Create(listingfile)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return fmt.Errorf("failed to create Adlisting.txt: %w", err)
|
return "", fmt.Errorf("failed to create Adlisting.txt: %w", err)
|
||||||
}
|
}
|
||||||
defer listingfd.Close()
|
defer listingfd.Close()
|
||||||
|
|
||||||
@@ -98,27 +76,27 @@ func WriteAd(conf *Config, advertisement *Ad, addir string) error {
|
|||||||
|
|
||||||
tmpl, err := tpl.New("adlisting").Parse(conf.Template)
|
tmpl, err := tpl.New("adlisting").Parse(conf.Template)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return fmt.Errorf("failed to parse adlisting template: %w", err)
|
return "", fmt.Errorf("failed to parse adlisting template: %w", err)
|
||||||
}
|
}
|
||||||
|
|
||||||
err = tmpl.Execute(listingfd, advertisement)
|
err = tmpl.Execute(listingfd, advertisement)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return fmt.Errorf("failed to execute adlisting template: %w", err)
|
return "", fmt.Errorf("failed to execute adlisting template: %w", err)
|
||||||
}
|
}
|
||||||
|
|
||||||
slog.Info("wrote ad listing", "listingfile", listingfile)
|
slog.Info("wrote ad listing", "listingfile", listingfile)
|
||||||
|
|
||||||
return nil
|
return addir, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
func WriteImage(filename string, reader *bytes.Reader) error {
|
func WriteImage(filename string, buf []byte) error {
|
||||||
file, err := os.Create(filename)
|
file, err := os.Create(filename)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return fmt.Errorf("failed to open image file: %w", err)
|
return fmt.Errorf("failed to open image file: %w", err)
|
||||||
}
|
}
|
||||||
defer file.Close()
|
defer file.Close()
|
||||||
|
|
||||||
_, err = reader.WriteTo(file)
|
_, err = file.Write(buf)
|
||||||
|
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return fmt.Errorf("failed to write to image file: %w", err)
|
return fmt.Errorf("failed to write to image file: %w", err)
|
||||||
@@ -155,21 +133,3 @@ func fileExists(filename string) bool {
|
|||||||
|
|
||||||
return !info.IsDir()
|
return !info.IsDir()
|
||||||
}
|
}
|
||||||
|
|
||||||
// check if an addir has already been processed by current run and
|
|
||||||
// decide what to do
|
|
||||||
func CheckAdVisited(conf *Config, adname string) bool {
|
|
||||||
if Exists(DirsVisited, adname) {
|
|
||||||
if conf.ForceDownload {
|
|
||||||
slog.Warn("an ad with the same name has already been downloaded, overwriting", "addir", adname)
|
|
||||||
return true
|
|
||||||
}
|
|
||||||
|
|
||||||
// don't overwrite
|
|
||||||
slog.Warn("an ad with the same name has already been downloaded, skipping (use -f to overwrite)", "addir", adname)
|
|
||||||
return false
|
|
||||||
}
|
|
||||||
|
|
||||||
// overwrite
|
|
||||||
return true
|
|
||||||
}
|
|
||||||
|
|||||||
@@ -18,7 +18,6 @@ along with this program. If not, see <http://www.gnu.org/licenses/>.
|
|||||||
package main
|
package main
|
||||||
|
|
||||||
import (
|
import (
|
||||||
"bytes"
|
|
||||||
"testing"
|
"testing"
|
||||||
)
|
)
|
||||||
|
|
||||||
@@ -29,10 +28,10 @@ import (
|
|||||||
func TestWriteImage(t *testing.T) {
|
func TestWriteImage(t *testing.T) {
|
||||||
t.Parallel()
|
t.Parallel()
|
||||||
|
|
||||||
reader := bytes.NewReader([]byte{1, 2, 3, 4, 5, 6, 7, 8})
|
buf := []byte{1, 2, 3, 4, 5, 6, 7, 8}
|
||||||
file := "t/out/t.jpg"
|
file := "t/out/t.jpg"
|
||||||
|
|
||||||
err := WriteImage(file, reader)
|
err := WriteImage(file, buf)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
t.Errorf("Could not write mock image to %s: %s", file, err.Error())
|
t.Errorf("Could not write mock image to %s: %s", file, err.Error())
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -1,7 +1,5 @@
|
|||||||
#!/bin/sh -x
|
#!/bin/sh -x
|
||||||
base="../kleinanzeigen"
|
base="../kleinanzeigen"
|
||||||
|
|
||||||
rm -rf $base
|
|
||||||
mkdir -p $base
|
mkdir -p $base
|
||||||
|
|
||||||
echo "Generating /s-bestandsliste.html"
|
echo "Generating /s-bestandsliste.html"
|
||||||
|
|||||||
13
util.go
@@ -1,5 +1,5 @@
|
|||||||
/*
|
/*
|
||||||
Copyright © 2023-2024 Thomas von Dein
|
Copyright © 2023 Thomas von Dein
|
||||||
|
|
||||||
This program is free software: you can redistribute it and/or modify
|
This program is free software: you can redistribute it and/or modify
|
||||||
it under the terms of the GNU General Public License as published by
|
it under the terms of the GNU General Public License as published by
|
||||||
@@ -32,7 +32,7 @@ import (
|
|||||||
|
|
||||||
func Mkdir(dir string) error {
|
func Mkdir(dir string) error {
|
||||||
if _, err := os.Stat(dir); errors.Is(err, os.ErrNotExist) {
|
if _, err := os.Stat(dir); errors.Is(err, os.ErrNotExist) {
|
||||||
err := os.MkdirAll(dir, os.ModePerm)
|
err := os.Mkdir(dir, os.ModePerm)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return fmt.Errorf("failed to create directory %s: %w", dir, err)
|
return fmt.Errorf("failed to create directory %s: %w", dir, err)
|
||||||
}
|
}
|
||||||
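The hunk above differs only in `os.MkdirAll` versus `os.Mkdir`. The difference matters once the output directory has more than one missing level, for instance a date-based path: `os.Mkdir` creates a single directory and fails if its parent is absent, while `os.MkdirAll` creates the parents too. A minimal sketch, using a scratch directory and a hypothetical helper `compareMkdir`:

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// compareMkdir tries to create <scratch>/2024/ads, first with os.Mkdir and
// then with os.MkdirAll. It reports whether os.Mkdir failed because the
// parent directory is missing; the returned error covers the scratch
// directory setup and the os.MkdirAll call.
func compareMkdir() (bool, error) {
	scratch, err := os.MkdirTemp("", "mkdir-demo")
	if err != nil {
		return false, fmt.Errorf("failed to create scratch dir: %w", err)
	}
	defer os.RemoveAll(scratch)

	nested := filepath.Join(scratch, "2024", "ads")

	// os.Mkdir creates one level only; the parent "2024" does not exist yet.
	mkdirErr := os.Mkdir(nested, os.ModePerm)
	missingParent := errors.Is(mkdirErr, os.ErrNotExist)

	// os.MkdirAll creates every missing parent as well.
	if err := os.MkdirAll(nested, os.ModePerm); err != nil {
		return missingParent, fmt.Errorf("failed to create directory %s: %w", nested, err)
	}

	return missingParent, nil
}

func main() {
	missingParent, err := compareMkdir()
	if err != nil {
		panic(err)
	}

	fmt.Println("os.Mkdir failed on missing parent:", missingParent)
}
```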
@@ -74,12 +74,3 @@ func IsNoTty() bool {
|
|||||||
func GetThrottleTime() time.Duration {
|
func GetThrottleTime() time.Duration {
|
||||||
return time.Duration(rand.Intn(MaxThrottle-MinThrottle+1)+MinThrottle) * time.Millisecond
|
return time.Duration(rand.Intn(MaxThrottle-MinThrottle+1)+MinThrottle) * time.Millisecond
|
||||||
}
|
}
|
||||||
|
|
||||||
// look if a key in a map exists, generic variant
|
|
||||||
func Exists[K comparable, V any](m map[K]V, v K) bool {
|
|
||||||
if _, ok := m[v]; ok {
|
|
||||||
return true
|
|
||||||
}
|
|
||||||
|
|
||||||
return false
|
|
||||||
}
|
|
||||||
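The generic `Exists` helper above is what `CheckAdVisited` in store.go relies on to detect ad directories that were already written during the current run. The decision it makes can be sketched as follows; `shouldProceed` is a hypothetical, simplified version that leaves out the config struct and the log messages, and the directory names are made up.

```go
package main

import "fmt"

// Exists reports whether key is present in m, for any map type.
func Exists[K comparable, V any](m map[K]V, key K) bool {
	_, ok := m[key]

	return ok
}

// shouldProceed decides whether an ad directory may be written:
// unseen directories are always written, already seen ones only
// when force is set.
func shouldProceed(visited map[string]int, addir string, force bool) bool {
	if Exists(visited, addir) {
		return force
	}

	return true
}

func main() {
	visited := map[string]int{"old-bike": 1}

	fmt.Println(shouldProceed(visited, "old-bike", false)) // prints false
	fmt.Println(shouldProceed(visited, "old-bike", true))  // prints true
	fmt.Println(shouldProceed(visited, "new-lamp", false)) // prints true
}
```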
|
|||||||