mirror of
https://codeberg.org/scip/kleingebaeck.git
synced 2025-12-17 12:31:03 +01:00
Compare commits
3 Commits
doc/add-di
...
fix/window
| Author | SHA1 | Date |
|---|---|---|
| | 7337464112 | |
| | dbb64dcae1 | |
| | 74db3f534e | |
43
README-de.md
@@ -222,49 +222,6 @@ Sowie alle Bilder.
 Das Format kann man mit der Variable `template` in der Konfiguration
 ändern. Die `example.conf` enthält ein Beispiel für das Standard Template.
 
-## Verhalten des Tools
-
-Es gibt einige Dinge über das Verhalten von kleingebäck, über die Du
-Bescheid wissen solltest:
-
-- alle HTML Seiten und Bilder werden immer heruntergeladen
-- es wird ein (konfigurierbarer) Useragent verwendet
-- HTTP Cookies werden beachtet
-- bei Fehlern wird dreimal mit unterschiedlichem Abstand erneut
-versucht
-- Bilder Downloads laufen parallelisiert mit leicht unterschiedlichen
-zeitlichen Abständen ab
-- Gleich aussehende Bilder werden nicht überschrieben
-
-Der letzte Punkt muss genauer erläutert werden:
-
-Wenn man bei Kleinanzeigen.de eine Anzeige einstellt und Bilder
-postet, werden diese dort in ihrer Grösse reduziert (durch Kompression
-und Verkleinerung der Bilder usw.). Diese reduzierten Bilder werden
-dann von kleingebäck heruntergeladen. Falls Du Deine original Bilder
-behalten hast, kannst Du diese danach in das Backupverzeichnis
-kopieren. Bei einem erneuten kleingebäck-Lauf werden diese Bilder dann
-nicht überschrieben.
-
-Wir verwenden dafür einen Algorythmus namens [distance
-hashing](https://github.com/corona10/goimagehash). Dieser Algorithmus
-prüft die Ähnlichkeit von Bildern. Diese können in ihrer Auflösung,
-Kompression, Farbtiefe und vielem mehr manipuliert worden sein und
-trotzdem als das "gleiche Bild" erkannt werden (wohlgemerkt nicht "das
-selbe": die Dateien sind durchaus unterschiedlich!). Bis zu einer
-Distance von 5 überschreiben wir keine Bilder, weil wir dann davon
-ausgehen, dass das lokal Vorhandene das Original ist.
-
-Bitte beachte aber, dass dies KEIN Cachingmechanismus ist: die Bilder
-werden trotzdem immer alle heruntergeladen. Das muss so sein, da wir
-uns nicht die Dateinamen anschauen können, da kleinanzeigen.de diese
-nämlich zu Zahlen umbenennt. Und die Dateinamen können sich auch
-ändern, wenn der User in der Anzeige die Bilder umarrangiert hat.
-
-Du kannst dieses Verhalten mit der Option **--force** ausschalten. Du
-kannst ausserdem mit der Option **--ignoreerrors** auch alle Fehler
-ignorieren, die beim Bilderdownload auftreten könnten.
-
 ## Documentation
 
 Die Dokumentation kann man
42
README.md
@@ -207,48 +207,6 @@ variable. The supplied sample config contains the default template.
 
 All images will be stored in the same directory.
 
-## Tool Behavior
-
-There are a bunch of things you might want to know about the behavior
-of the kleingebäck tool:
-
-- all HTML pages and IMAGEs are always being downloaded
-- we use a (customizable) user agent
-- we respect HTTP cookies
-- in the case of an error, the tool does 3 retries, the time it waits
-between tries is longer for each retry
-- image download is parallized using small time differences to look
-more natural
-- same images are not being overwritten on subsequent download
-
-
-The latter needs to be elaborated a bit more:
-
-If you publish an ad on kleinanzeigen.de and post images, those images
-will be reduced in size by the site (by compressing and down sizing
-them). This reduced images will be downloaded by kleingebäck. However,
-you may still own the original images and may want to put them into
-that backup directory so that you have all things for one ad together.
-
-You can easily do that, because kleingebäck won't overwrite those
-original images. It uses something called a distance hash using
-[goimagehash](https://github.com/corona10/goimagehash). This
-algorithmus checks the similarity of images. If an image has been
-resized it is still very similar to the original one. We accept a
-maximum of a distance of 5, everything above leads to overwrite.
-
-This works with resizes, cropped and otherwise manipulated images as
-long as the image still shows the original contents good enough.
-
-Also note, that this is NOT a caching mechanism: the images will be
-downloaded anyway during each run. We also can't look at the file
-names because kleinanzeigen.de renames all images to numbers. And
-those might even change if the user re-arranges the images.
-
-You can override this behavior using the **--force** option. Another
-option, **--ignoreerrors**, can be used to ignore all kinds of image
-errors.
-
 ## Documentation
 
 You can read the documentation [online](https://github.com/TLINDEN/kleingebaeck/blob/main/kleingebaeck.pod) or locally once you have installed kleingebaeck with: `kleingebaeck --manual`.
@@ -34,7 +34,7 @@ import (
 )
 
 const (
-	VERSION    string = "0.3.5"
+	VERSION    string = "0.3.6"
 	Baseuri    string = "https://www.kleinanzeigen.de"
 	Listuri    string = "/s-bestandsliste.html"
 	Defaultdir string = "."
1
go.mod
@@ -24,6 +24,7 @@ require (
 	github.com/corona10/goimagehash v1.1.0 // indirect
 	github.com/fatih/color v1.16.0 // indirect
 	github.com/fsnotify/fsnotify v1.6.0 // indirect
+	github.com/inconshreveable/mousetrap v1.1.0 // indirect
 	github.com/knadh/koanf/maps v0.1.1 // indirect
 	github.com/mattn/go-colorable v0.1.13 // indirect
 	github.com/mitchellh/copystructure v1.2.0 // indirect
2
go.sum
@@ -15,6 +15,8 @@ github.com/fatih/color v1.16.0 h1:zmkK9Ngbjj+K0yRhTVONQh1p/HknKYSlNT+vZCzyokM=
 github.com/fatih/color v1.16.0/go.mod h1:fL2Sau1YI5c0pdGEVCbKQbLXB6edEj1ZgiY4NijnWvE=
 github.com/fsnotify/fsnotify v1.6.0 h1:n+5WquG0fcWoWp6xPWfHdbskMCQaFnG6PfBrh1Ky4HY=
 github.com/fsnotify/fsnotify v1.6.0/go.mod h1:sl3t1tCWJFWoRz9R8WJCbQihKKwmorjAbSClcnxKAGw=
+github.com/inconshreveable/mousetrap v1.1.0 h1:wN+x4NVGpMsO7ErUn/mUI3vEoE6Jt13X2s0bqwp9tc8=
+github.com/inconshreveable/mousetrap v1.1.0/go.mod h1:vpF70FUmC8bwa3OWnCshd2FqLfsEA9PFc4w1p2J65bw=
 github.com/jarcoal/httpmock v1.3.1 h1:iUx3whfZWVf3jT01hQTO/Eo5sAYtB2/rqaUuOtpInww=
 github.com/jarcoal/httpmock v1.3.1/go.mod h1:3yb8rc4BI7TCBhFY8ng0gjuLKJNquuDNiPaZjnENuYg=
 github.com/knadh/koanf/maps v0.1.1 h1:G5TjmUh2D7G2YWf5SQQqSiHRJEjaicvU0KpypqB3NIs=
22
main.go
@@ -18,13 +18,16 @@ along with this program. If not, see <http://www.gnu.org/licenses/>.
 package main
 
 import (
+	"bufio"
 	"errors"
 	"fmt"
 	"io"
 	"log/slog"
 	"os"
+	"runtime"
 	"runtime/debug"
 
+	"github.com/inconshreveable/mousetrap"
 	"github.com/lmittmann/tint"
 	"github.com/tlinden/yadu"
 )
@@ -35,6 +38,25 @@ func main() {
 	os.Exit(Main(os.Stdout))
 }
 
+func init() {
+	// if we're running on Windows AND if the user double clicked the
+	// exe file from explorer, we tell them and then wait until any
+	// key has been hit, which will make the cmd window disappear and
+	// thus give the user time to read it.
+	if runtime.GOOS == "windows" {
+		if mousetrap.StartedByExplorer() {
+			fmt.Println("Do no double click kleingebaeck.exe!")
+			fmt.Println("Please open a command shell and run it from there.")
+			fmt.Println()
+			fmt.Print("Press any key to quit: ")
+			_, err := bufio.NewReader(os.Stdin).ReadString('\n')
+			if err != nil {
+				panic(err)
+			}
+		}
+	}
+}
+
 func Main(output io.Writer) int {
 	logLevel := &slog.LevelVar{}
 	opts := &tint.Options{