Compare commits

...

13 Commits

Author SHA1 Message Date
bd9d8fdb2c fix version finding 2023-12-17 17:53:01 +01:00
T.v.Dein
1ee886c504 Merge pull request #2 from TLINDEN/dev
re-orgainzied code a little, using go templates instead format string
2023-12-17 17:49:27 +01:00
f932d7be83 re-orgainzied code a little, using go templates instead format string 2023-12-17 17:32:05 +01:00
T.v.Dein
d7b13e8a9a Merge pull request #1 from TLINDEN/dev
added custom template support, added more ad data, use concurrency
2023-12-16 20:35:18 +01:00
e904ed6687 added custom template support, added more ad data, use concurrency 2023-12-16 20:32:10 +01:00
df6baadc85 better sample config 2023-12-16 00:01:29 +01:00
314315a1c6 fix pod entities => markdown 2023-12-15 18:29:42 +01:00
2e83e68f20 fix logo 2023-12-15 18:02:34 +01:00
b5e51b43c9 add logo 2023-12-15 18:00:41 +01:00
1b55d887bc enhancements:
- english README (german version will be put to the homepage)
- better commandline options
- enhanced logging capabilities and error handling
- config file support
- support to backup one or more singular ads
- add id to adlisting
- added manual page
- fixed config file reading
- fixed typo
2023-12-15 17:19:44 +01:00
c2f378be05 fix link 2023-12-14 19:06:40 +01:00
5b47128d0d added name 2023-12-14 19:03:18 +01:00
5bd49db9ba scrape all ads 2023-12-14 19:02:32 +01:00
16 changed files with 811 additions and 98 deletions

Binary file not shown.

After

Width:  |  Height:  |  Size: 14 KiB

BIN
.github/assets/kleingebaecklogo.png vendored Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 76 KiB

View File

@@ -17,7 +17,7 @@
# #
# no need to modify anything below # no need to modify anything below
tool = kleingebaeck tool = kleingebaeck
VERSION = $(shell grep VERSION main.go | head -1 | cut -d '"' -f2) VERSION = $(shell grep VERSION config.go | head -1 | cut -d '"' -f2)
archs = darwin freebsd linux windows archs = darwin freebsd linux windows
PREFIX = /usr/local PREFIX = /usr/local
UID = root UID = root

120
README.md
View File

@@ -1,24 +1,116 @@
## kleinanzeigen.de Backup ## Kleingebäck - kleinanzeigen.de Backup
![Kleingebaeck Logo](https://github.com/TLINDEN/kleingebaeck/blob/main/.github/assets/kleingebaecklogo-small.png)
[![License](https://img.shields.io/badge/license-GPL-blue.svg)](https://github.com/tlinden/kleingebaeck/blob/master/LICENSE) [![License](https://img.shields.io/badge/license-GPL-blue.svg)](https://github.com/tlinden/kleingebaeck/blob/master/LICENSE)
[![Go Report Card](https://goreportcard.com/badge/github.com/tlinden/kleingebaeck)](https://goreportcard.com/report/github.com/tlinden/kleingebaeck) [![Go Report Card](https://goreportcard.com/badge/github.com/tlinden/kleingebaeck)](https://goreportcard.com/report/github.com/tlinden/kleingebaeck)
![GitHub License](https://img.shields.io/github/license/tlinden/kleingebaeck)
[![GitHub release](https://img.shields.io/github/v/release/tlinden/kleingebaeck?color=%2300a719)](https://github.com/TLINDEN/kleingebaeck/releases/latest)
Mit diesem kleinen aber feinen Tool kann man seine
[https://kleinanzeigen.de](Anzeigen bei kleinanzeigen.de) sichern. Das
Problem ist ja bekanntlich, dass Kleinanzeigen nach einer Weile (2
Monate?) automatisch gelöscht werden. Wenn man keine Sicherung hat,
wird es schwierig, die erneut einzustellen. Mit dem Tool braucht man
sich keine Texte zu merken. Man kann auch einfach Änderungen
(z.B. Preis runter) durchführen oder den Text anpassen und dann ein
neues Backup anfertigen.
Es wird pro Anzeige ein Verzeichnis erstellt. In der Datei This tool can be used to backup ads on the german ad page https://kleinanzeigen.de
`Anzeige.txt` wird der Titel, die Beschreibung sowie der Preis
eingetragen. Ausserdem werden alle Bilder heruntergeladen.
## Copyright und Lizenz It downloads all (or only the specified ones) ads of one user into a
directory, each ad into its own subdirectory. The backup will contain
a textfile `Adlisting.txt` which contains the ad contents as the
title, body, price etc. All images will be downloaded as well.
Lizensiert unter der GNU GENERAL PUBLIC LICENSE version 3. The tool doesn't need authentication and doesn't have any
dependencies. Just download the binary for your platform from the
releases page and you're good to go.
The releases also include a handy tarball which you can use to install
the tool system-wide including the manual page. Just extract it and
type: `make install`.
## Commandline options:
```
Usage: kleingebaeck [-dvVhmoc] [<ad-listing-url>,...]
Options:
--user,-u <uid> Backup ads from user with uid <uid>.
--debug, -d Enable debug output.
--verbose,-v Enable verbose output.
--output-dir,-o <dir> Set output dir (default: current directory)
--manual,-m Show manual.
--config,-c <file> Use config file <file> (default: ~/.kleingebaeck).
If one or more <ad-listing-url>'s are specified, only backup those,
otherwise backup all ads of the given user.
```
## Configfile
You can create a config file to save typing. By default
`~/.kleingebaeck.hcl` is being used but you can specify one with
`-c` as well.
Format is simple:
```
user = 1010101
verbose = true
outdir = "test"
template = ""
```
## Usage
To setup the tool, you need to lookup your userid on
kleinanzeigen.de. Go to your ad overview page while NOT being logged
in:
https://www.kleinanzeigen.de/s-bestandsliste.html?userId=XXXXXX
The `XXXXX` part is your userid.
Put it into the configfile as outlined above. Also specify an output
directory. Then just execute `kleingebaeck`.
Inside the output directory you'll find a new subdirectory for each
ad. Every directory contains a file `Adlisting.txt`, which will look
somewhat like this:
```default
Title: A book I sell
Price: 99 € VB
Id: 1919191919
Category: Sachbücher
Condition: Sehr Gut
Created: 10.12.2023
This is the description text.
Pay with paypal.
```
You can change the formatting using the `template` config
variable. The supplied sample config contains the default template.
All images will be stored in the same directory.
## Kleingebäck?
The name is derived from "kleinanzeigen backup": "klein" (german for
small) and "back". In german "bäck" is spelled the same as the english
"back" so "kleinbäck" was short enough, but it's not a valid german
word. "Kleingebäck" however is: it means "Cookies" in english :)
## Getting help
Although I'm happy to hear from kleingebaeck users in private email,
that's the best way for me to forget to do something.
In order to report a bug, unexpected behavior, feature requests or to
submit a patch, please open an issue on github:
https://github.com/TLINDEN/kleingebaeck/issues.
Please repeat the failing command with debugging enabled `-d` and
include the output in the issue.
## Copyright und License
Licensed under the GNU GENERAL PUBLIC LICENSE version 3.
## Author ## Author

64
config.go Normal file
View File

@@ -0,0 +1,64 @@
/*
Copyright © 2023 Thomas von Dein
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
package main
import (
"os"
"github.com/hashicorp/hcl/v2/hclsimple"
)
const (
VERSION string = "0.0.4"
Baseuri string = "https://www.kleinanzeigen.de"
Listuri string = "/s-bestandsliste.html"
Defaultdir string = "."
DefaultTemplate string = "Title: {{.Title}}\nPrice: {{.Price}}\nId: {{.Id}}\n" +
"Category: {{.Category}}\nCondition: {{.Condition}}\nCreated: {{.Created}}\n\n{{.Text}}\n"
Useragent string = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) " +
"AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)
type Config struct {
Verbose bool `hcl:"verbose"`
User int `hcl:"user"`
Outdir string `hcl:"outdir"`
Template string `hcl:"template"`
}
func ParseConfigfile(file string) (*Config, error) {
c := Config{}
if path, err := os.Stat(file); !os.IsNotExist(err) {
if !path.IsDir() {
configstring, err := os.ReadFile(file)
if err != nil {
return nil, err
}
err = hclsimple.Decode(
path.Name(), configstring,
nil, &c,
)
if err != nil {
return nil, err
}
}
}
return &c, nil
}

19
example.hcl Normal file
View File

@@ -0,0 +1,19 @@
#
# kleingebaeck sample configuration file.
# put this to ~/.kleingebaeck.hcl.
#
# Comments start with the '#' character.
# kleinanzeigen.de user-id. must be an unquoted number
user = 00000000
# enable verbose output (same as -v), may be true or false.
verbose = true
# directory where to store downloaded ads. kleingebaeck will try to
# create it. must be a quoted string.
outdir = "test"
# template. leave empty to use the default one, which is:
# "Title: {{.Title}}\nPrice: {{.Price}}\nId: {{.Id}}\nCategory: {{.Category}}\nCondition: {{.Condition}}\nCreated: {{.Created}}\n\n{{.Text}}\n"
template = ""

11
go.mod
View File

@@ -1,12 +1,21 @@
module kleingebaeck module kleingebaeck
go 1.20 go 1.21
require ( require (
astuart.co/goq v1.0.0 // indirect astuart.co/goq v1.0.0 // indirect
github.com/PuerkitoBio/goquery v1.5.0 // indirect github.com/PuerkitoBio/goquery v1.5.0 // indirect
github.com/agext/levenshtein v1.2.1 // indirect
github.com/andybalholm/cascadia v1.0.0 // indirect github.com/andybalholm/cascadia v1.0.0 // indirect
github.com/apparentlymart/go-textseg/v13 v13.0.0 // indirect
github.com/apparentlymart/go-textseg/v15 v15.0.0 // indirect
github.com/google/go-cmp v0.3.1 // indirect
github.com/hashicorp/hcl/v2 v2.19.1 // indirect
github.com/lmittmann/tint v1.0.3 // indirect
github.com/mitchellh/go-wordwrap v0.0.0-20150314170334-ad45545899c7 // indirect
github.com/spf13/pflag v1.0.5 // indirect github.com/spf13/pflag v1.0.5 // indirect
github.com/zclconf/go-cty v1.13.0 // indirect
golang.org/x/net v0.0.0-20190606173856-1492cefac77f // indirect golang.org/x/net v0.0.0-20190606173856-1492cefac77f // indirect
golang.org/x/text v0.11.0 // indirect
) )

18
go.sum
View File

@@ -2,14 +2,30 @@ astuart.co/goq v1.0.0 h1:nnYIhu/Z/j0VaX9Dp+pmh2Uh7ldEz6XfgSg+bAY5Yrw=
astuart.co/goq v1.0.0/go.mod h1:+fokcnFrO8Pw2fj8drdStJvzoMFebJH69rw8IC21rno= astuart.co/goq v1.0.0/go.mod h1:+fokcnFrO8Pw2fj8drdStJvzoMFebJH69rw8IC21rno=
github.com/PuerkitoBio/goquery v1.5.0 h1:uGvmFXOA73IKluu/F84Xd1tt/z07GYm8X49XKHP7EJk= github.com/PuerkitoBio/goquery v1.5.0 h1:uGvmFXOA73IKluu/F84Xd1tt/z07GYm8X49XKHP7EJk=
github.com/PuerkitoBio/goquery v1.5.0/go.mod h1:qD2PgZ9lccMbQlc7eEOjaeRlFQON7xY8kdmcsrnKqMg= github.com/PuerkitoBio/goquery v1.5.0/go.mod h1:qD2PgZ9lccMbQlc7eEOjaeRlFQON7xY8kdmcsrnKqMg=
github.com/agext/levenshtein v1.2.1 h1:QmvMAjj2aEICytGiWzmxoE0x2KZvE0fvmqMOfy2tjT8=
github.com/agext/levenshtein v1.2.1/go.mod h1:JEDfjyjHDjOF/1e4FlBE/PkbqA9OfWu2ki2W0IB5558=
github.com/andybalholm/cascadia v1.0.0 h1:hOCXnnZ5A+3eVDX8pvgl4kofXv2ELss0bKcqRySc45o= github.com/andybalholm/cascadia v1.0.0 h1:hOCXnnZ5A+3eVDX8pvgl4kofXv2ELss0bKcqRySc45o=
github.com/andybalholm/cascadia v1.0.0/go.mod h1:GsXiBklL0woXo1j/WYWtSYYC4ouU9PqHO0sqidkEA4Y= github.com/andybalholm/cascadia v1.0.0/go.mod h1:GsXiBklL0woXo1j/WYWtSYYC4ouU9PqHO0sqidkEA4Y=
github.com/apparentlymart/go-textseg/v13 v13.0.0 h1:Y+KvPE1NYz0xl601PVImeQfFyEy6iT90AvPUL1NNfNw=
github.com/apparentlymart/go-textseg/v13 v13.0.0/go.mod h1:ZK2fH7c4NqDTLtiYLvIkEghdlcqw7yxLeM89kiTRPUo=
github.com/apparentlymart/go-textseg/v15 v15.0.0 h1:uYvfpb3DyLSCGWnctWKGj857c6ew1u1fNQOlOtuGxQY=
github.com/apparentlymart/go-textseg/v15 v15.0.0/go.mod h1:K8XmNZdhEBkdlyDdvbmmsvpAG721bKi0joRfFdHIWJ4=
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/google/go-cmp v0.3.1 h1:Xye71clBPdm5HgqGwUkwhbynsUJZhDbS20FvLhQ2izg=
github.com/google/go-cmp v0.3.1/go.mod h1:8QqcDgzrUqlUb/G2PQTWiueGozuR1884gddMywk6iLU=
github.com/hashicorp/hcl/v2 v2.19.1 h1://i05Jqznmb2EXqa39Nsvyan2o5XyMowW5fnCKW5RPI=
github.com/hashicorp/hcl/v2 v2.19.1/go.mod h1:ThLC89FV4p9MPW804KVbe/cEXoQ8NZEh+JtMeeGErHE=
github.com/lmittmann/tint v1.0.3 h1:W5PHeA2D8bBJVvabNfQD/XW9HPLZK1XoPZH0cq8NouQ=
github.com/lmittmann/tint v1.0.3/go.mod h1:HIS3gSy7qNwGCj+5oRjAutErFBl4BzdQP6cJZ0NfMwE=
github.com/mitchellh/go-wordwrap v0.0.0-20150314170334-ad45545899c7 h1:DpOJ2HYzCv8LZP15IdmG+YdwD2luVPHITV96TkirNBM=
github.com/mitchellh/go-wordwrap v0.0.0-20150314170334-ad45545899c7/go.mod h1:ZXFpozHsX6DPmq2I0TCekCxypsnAUbP2oI0UX1GXzOo=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/spf13/pflag v1.0.5 h1:iy+VFUOCP1a+8yFto/drg2CJ5u0yRoB7fZw3DKv/JXA= github.com/spf13/pflag v1.0.5 h1:iy+VFUOCP1a+8yFto/drg2CJ5u0yRoB7fZw3DKv/JXA=
github.com/spf13/pflag v1.0.5/go.mod h1:McXfInJRrz4CZXVZOBLb0bTZqETkiAhM9Iw0y3An2Bg= github.com/spf13/pflag v1.0.5/go.mod h1:McXfInJRrz4CZXVZOBLb0bTZqETkiAhM9Iw0y3An2Bg=
github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME= github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UVUgZn+9EI= github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UVUgZn+9EI=
github.com/zclconf/go-cty v1.13.0 h1:It5dfKTTZHe9aeppbNOda3mN7Ag7sg6QkBNm6TkyFa0=
github.com/zclconf/go-cty v1.13.0/go.mod h1:YKQzy/7pZ7iq2jNFzy5go57xdxdWoLLpaEp4u238AE0=
golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w= golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
golang.org/x/net v0.0.0-20180218175443-cbe0f9307d01/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4= golang.org/x/net v0.0.0-20180218175443-cbe0f9307d01/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
golang.org/x/net v0.0.0-20181114220301-adae6a3d119a/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4= golang.org/x/net v0.0.0-20181114220301-adae6a3d119a/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
@@ -17,3 +33,5 @@ golang.org/x/net v0.0.0-20190606173856-1492cefac77f h1:IWHgpgFqnL5AhBUBZSgBdjl2v
golang.org/x/net v0.0.0-20190606173856-1492cefac77f/go.mod h1:HSz+uSET+XFnRR8LxR5pz3Of3rY3CfYBVs4xY44aLks= golang.org/x/net v0.0.0-20190606173856-1492cefac77f/go.mod h1:HSz+uSET+XFnRR8LxR5pz3Of3rY3CfYBVs4xY44aLks=
golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ= golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/text v0.11.0 h1:LAntKIrcmeSKERyiOh0XMV39LXS8IE9UL2yP7+f5ij4=
golang.org/x/text v0.11.0/go.mod h1:TvPlkZtksWOMsz7fbANvkp4WM8x/WCo/om8BMLbz+aE=

View File

@@ -133,11 +133,93 @@
.\" ======================================================================== .\" ========================================================================
.\" .\"
.IX Title "KLEINGEBAECK 1" .IX Title "KLEINGEBAECK 1"
.TH KLEINGEBAECK 1 "2023-12-14" "1" "User Commands" .TH KLEINGEBAECK 1 "2023-12-17" "1" "User Commands"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents. .\" way too many mistakes in technical documents.
.if n .ad l .if n .ad l
.nh .nh
.SS "kleingebaeck" .SH "NAME"
.IX Subsection "kleingebaeck" kleingebaeck \- kleinanzeigen.de backup tool
Backup of kleinanzeigen.de .SH "SYNOPSYS"
.IX Header "SYNOPSYS"
.Vb 9
\& This is kleingebaeck, the kleinanzeigen.de backup tool.
\& Usage: kleingebaeck [\-dvVhmoc] [<ad\-listing\-url>,...]
\& Options:
\& \-\-user,\-u <uid> Backup ads from user with uid <uid>.
\& \-\-debug, \-d Enable debug output.
\& \-\-verbose,\-v Enable verbose output.
\& \-\-output\-dir,\-o <dir> Set output dir (default: current directory)
\& \-\-manual,\-m Show manual.
\& \-\-config,\-c <file> Use config file <file> (default: ~/.kleingebaeck).
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
This tool can be used to backup ads on the german ad page <https://kleinanzeigen.de>.
.PP
It downloads all (or only the specified ones) ads of one user into a
directory, each ad into its own subdirectory. The backup will contain
a textfile \fBAdlisting.txt\fR which contains the ad contents such as
title, body, price etc. All images will be downloaded as well.
.SH "CONFIGURATION"
.IX Header "CONFIGURATION"
You can create a config file to save typing. By default
\&\f(CW\*(C`~/.kleingebaeck.hcl\*(C'\fR is being used but you can specify one with
\&\f(CW\*(C`\-c\*(C'\fR as well.
.PP
Format is simple:
.PP
.Vb 4
\& user = 1010101
\& verbose = true
\& outdir = "test"
\& template = ""
.Ve
.PP
Be carefull if you want to change the template. The default one looks like this:
.PP
.Vb 1
\& Title: {{.Title}}\enPrice: {{.Price}}\enId: {{.Id}}\enCategory: {{.Category}}\enCondition: {{.Condition}}\enCreated: {{.Created}}\en\en{{.Text}}\en
.Ve
.PP
You can left out certain fields and use any formatting you like. Refer
to <https://pkg.go.dev/text/template> for details how to write a template.
.SH "SETUP"
.IX Header "SETUP"
To setup the tool, you need to lookup your userid on
kleinanzeigen.de. Go to your ad overview page while \s-1NOT\s0 being logged
in:
.PP
.Vb 1
\& https://www.kleinanzeigen.de/s\-bestandsliste.html?userId=XXXXXX
.Ve
.PP
The \fB\s-1XXXXX\s0\fR part is your userid.
.PP
Put it into the configfile as outlined above. Also specify an output
directory. Then just execute \f(CW\*(C`kleingebaeck\*(C'\fR.
.PP
You can use the \fB\-v\fR option to get verbose output or \fB\-d\fR to enable
debugging.
.SH "BUGS"
.IX Header "BUGS"
In order to report a bug, unexpected behavior, feature requests
or to submit a patch, please open an issue on github:
<https://github.com/TLINDEN/kleingebaeck/issues>.
.PP
Please repeat the failing command with debugging enabled \f(CW\*(C`\-d\*(C'\fR and
include the output in the issue.
.SH "LIMITATIONS"
.IX Header "LIMITATIONS"
The \f(CW\*(C`kleingebaeck\*(C'\fR doesn't currently check if it has downloaded a
file already, so it downloads everything again every time you execute
it. Be aware of it. This will change in the future.
.PP
Also there's currently no parallelization implemented. This will
change in the future.
.SH "LICENSE"
.IX Header "LICENSE"
Licensed under the \s-1GNU GENERAL PUBLIC LICENSE\s0 version 3.
.SH "Author"
.IX Header "Author"
T.v.Dein <tom \s-1AT\s0 vondein \s-1DOT\s0 org>

View File

@@ -1,7 +1,84 @@
package main package main
var manpage = ` var manpage = `
kleingebaeck NAME
Backup of kleinanzeigen.de kleingebaeck - kleinanzeigen.de backup tool
SYNOPSYS
This is kleingebaeck, the kleinanzeigen.de backup tool.
Usage: kleingebaeck [-dvVhmoc] [<ad-listing-url>,...]
Options:
--user,-u <uid> Backup ads from user with uid <uid>.
--debug, -d Enable debug output.
--verbose,-v Enable verbose output.
--output-dir,-o <dir> Set output dir (default: current directory)
--manual,-m Show manual.
--config,-c <file> Use config file <file> (default: ~/.kleingebaeck).
DESCRIPTION
This tool can be used to backup ads on the german ad page
<https://kleinanzeigen.de>.
It downloads all (or only the specified ones) ads of one user into a
directory, each ad into its own subdirectory. The backup will contain a
textfile Adlisting.txt which contains the ad contents such as title,
body, price etc. All images will be downloaded as well.
CONFIGURATION
You can create a config file to save typing. By default
"~/.kleingebaeck.hcl" is being used but you can specify one with "-c" as
well.
Format is simple:
user = 1010101
verbose = true
outdir = "test"
template = ""
Be carefull if you want to change the template. The default one looks
like this:
Title: {{.Title}}\nPrice: {{.Price}}\nId: {{.Id}}\nCategory: {{.Category}}\nCondition: {{.Condition}}\nCreated: {{.Created}}\n\n{{.Text}}\n
You can left out certain fields and use any formatting you like. Refer
to <https://pkg.go.dev/text/template> for details how to write a
template.
SETUP
To setup the tool, you need to lookup your userid on kleinanzeigen.de.
Go to your ad overview page while NOT being logged in:
https://www.kleinanzeigen.de/s-bestandsliste.html?userId=XXXXXX
The XXXXX part is your userid.
Put it into the configfile as outlined above. Also specify an output
directory. Then just execute "kleingebaeck".
You can use the -v option to get verbose output or -d to enable
debugging.
BUGS
In order to report a bug, unexpected behavior, feature requests or to
submit a patch, please open an issue on github:
<https://github.com/TLINDEN/kleingebaeck/issues>.
Please repeat the failing command with debugging enabled "-d" and
include the output in the issue.
LIMITATIONS
The "kleingebaeck" doesn't currently check if it has downloaded a file
already, so it downloads everything again every time you execute it. Be
aware of it. This will change in the future.
Also there's currently no parallelization implemented. This will change
in the future.
LICENSE
Licensed under the GNU GENERAL PUBLIC LICENSE version 3.
Author
T.v.Dein <tom AT vondein DOT org>
` `

View File

@@ -1,5 +1,90 @@
=head2 kleingebaeck =head1 NAME
kleingebaeck - kleinanzeigen.de backup tool
=head1 SYNOPSYS
This is kleingebaeck, the kleinanzeigen.de backup tool.
Usage: kleingebaeck [-dvVhmoc] [<ad-listing-url>,...]
Options:
--user,-u <uid> Backup ads from user with uid <uid>.
--debug, -d Enable debug output.
--verbose,-v Enable verbose output.
--output-dir,-o <dir> Set output dir (default: current directory)
--manual,-m Show manual.
--config,-c <file> Use config file <file> (default: ~/.kleingebaeck).
=head1 DESCRIPTION
This tool can be used to backup ads on the german ad page L<https://kleinanzeigen.de>.
It downloads all (or only the specified ones) ads of one user into a
directory, each ad into its own subdirectory. The backup will contain
a textfile B<Adlisting.txt> which contains the ad contents such as
title, body, price etc. All images will be downloaded as well.
=head1 CONFIGURATION
You can create a config file to save typing. By default
C<~/.kleingebaeck.hcl> is being used but you can specify one with
C<-c> as well.
Format is simple:
user = 1010101
verbose = true
outdir = "test"
template = ""
Be carefull if you want to change the template. The default one looks like this:
Title: {{.Title}}\nPrice: {{.Price}}\nId: {{.Id}}\nCategory: {{.Category}}\nCondition: {{.Condition}}\nCreated: {{.Created}}\n\n{{.Text}}\n
You can left out certain fields and use any formatting you like. Refer
to L<https://pkg.go.dev/text/template> for details how to write a template.
=head1 SETUP
To setup the tool, you need to lookup your userid on
kleinanzeigen.de. Go to your ad overview page while NOT being logged
in:
https://www.kleinanzeigen.de/s-bestandsliste.html?userId=XXXXXX
The B<XXXXX> part is your userid.
Put it into the configfile as outlined above. Also specify an output
directory. Then just execute C<kleingebaeck>.
You can use the B<-v> option to get verbose output or B<-d> to enable
debugging.
=head1 BUGS
In order to report a bug, unexpected behavior, feature requests
or to submit a patch, please open an issue on github:
L<https://github.com/TLINDEN/kleingebaeck/issues>.
Please repeat the failing command with debugging enabled C<-d> and
include the output in the issue.
=head1 LIMITATIONS
The C<kleingebaeck> doesn't currently check if it has downloaded a
file already, so it downloads everything again every time you execute
it. Be aware of it. This will change in the future.
Also there's currently no parallelization implemented. This will
change in the future.
=head1 LICENSE
Licensed under the GNU GENERAL PUBLIC LICENSE version 3.
=head1 Author
T.v.Dein <tom AT vondein DOT org>
Backup of kleinanzeigen.de
=cut =cut

139
main.go
View File

@@ -20,36 +20,69 @@ package main
import ( import (
"errors" "errors"
"fmt" "fmt"
"log/slog"
"os" "os"
"runtime/debug"
"github.com/lmittmann/tint"
flag "github.com/spf13/pflag" flag "github.com/spf13/pflag"
) )
const VERSION string = "0.0.1" const Usage string = `This is kleingebaeck, the kleinanzeigen.de backup tool.
const Useragent string = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" Usage: kleingebaeck [-dvVhmoc] [<ad-listing-url>,...]
const Baseuri string = "https://www.kleinanzeigen.de" Options:
const Listuri string = "/s-bestandsliste.html" --user,-u <uid> Backup ads from user with uid <uid>.
const Defaultdir string = "." --debug, -d Enable debug output.
--verbose,-v Enable verbose output.
--output-dir,-o <dir> Set output dir (default: current directory)
--manual,-m Show manual.
--config,-c <file> Use config file <file> (default: ~/.kleingebaeck).
If one or more <ad-listing-url>'s are specified, only backup those,
otherwise backup all ads of the given user.`
const LevelNotice = slog.Level(2)
func main() { func main() {
os.Exit(Main()) os.Exit(Main())
} }
func Main() int { func Main() int {
logLevel := &slog.LevelVar{}
opts := &tint.Options{
Level: logLevel,
AddSource: false,
ReplaceAttr: func(groups []string, a slog.Attr) slog.Attr {
// Remove time from the output
if a.Key == slog.TimeKey {
return slog.Attr{}
}
return a
},
}
logLevel.Set(LevelNotice)
var handler slog.Handler = tint.NewHandler(os.Stdout, opts)
logger := slog.New(handler)
slog.SetDefault(logger)
showversion := false showversion := false
showhelp := false showhelp := false
showmanual := false showmanual := false
enabledebug := false enabledebug := false
configfile := "" enableverbose := false
dir := Defaultdir uid := 0
configfile := os.Getenv("HOME") + "/.kleingebaeck.hcl"
dir := ""
flag.BoolVarP(&enabledebug, "debug", "d", false, "debug mode") flag.BoolVarP(&enabledebug, "debug", "d", false, "debug mode")
flag.BoolVarP(&showversion, "version", "v", false, "show version") flag.BoolVarP(&enableverbose, "verbose", "v", false, "be verbose")
flag.BoolVarP(&showversion, "version", "V", false, "show version")
flag.BoolVarP(&showhelp, "help", "h", false, "show usage") flag.BoolVarP(&showhelp, "help", "h", false, "show usage")
flag.BoolVarP(&showmanual, "manual", "m", false, "show manual") flag.BoolVarP(&showmanual, "manual", "m", false, "show manual")
flag.IntVarP(&uid, "user", "u", uid, "user id")
flag.StringVarP(&dir, "output-dir", "o", dir, "where to store ads") flag.StringVarP(&dir, "output-dir", "o", dir, "where to store ads")
flag.StringVarP(&configfile, "config", "c", flag.StringVarP(&configfile, "config", "c", configfile, "config file")
os.Getenv("HOME")+"/.kleingebaeck", "config file")
flag.Parse() flag.Parse()
@@ -58,39 +91,99 @@ func Main() int {
return 0 return 0
} }
/*
if showhelp { if showhelp {
fmt.Println(Usage) fmt.Println(Usage)
return 0 return 0
} }
if enabledebug {
calc.ToggleDebug()
}
if showmanual { if showmanual {
man() err := man()
if err != nil {
return Die(err)
}
return 0 return 0
} }
*/ conf, err := ParseConfigfile(configfile)
if err != nil {
return Die(err)
}
if _, err := os.Stat(dir); errors.Is(err, os.ErrNotExist) { if enableverbose || conf.Verbose {
err := os.Mkdir(dir, os.ModePerm) logLevel.Set(slog.LevelInfo)
}
if enabledebug {
// we're using a more verbose logger in debug mode
buildInfo, _ := debug.ReadBuildInfo()
opts := &tint.Options{
Level: logLevel,
AddSource: true,
}
logLevel.Set(slog.LevelDebug)
var handler slog.Handler = tint.NewHandler(os.Stdout, opts)
debuglogger := slog.New(handler).With(
slog.Group("program_info",
slog.Int("pid", os.Getpid()),
slog.String("go_version", buildInfo.GoVersion),
),
)
slog.SetDefault(debuglogger)
}
slog.Debug("config", "conf", conf)
if len(dir) == 0 {
if len(conf.Outdir) > 0 {
dir = conf.Outdir
} else {
dir = Defaultdir
}
}
// prepare output dir
err = Mkdir(dir)
if err != nil {
return Die(err)
}
// which template to use
template := DefaultTemplate
if len(conf.Template) > 0 {
template = conf.Template
}
// directly backup ad listing[s]
if len(flag.Args()) >= 1 {
for _, uri := range flag.Args() {
err := Scrape(uri, dir, template)
if err != nil { if err != nil {
return Die(err) return Die(err)
} }
} }
if len(flag.Args()) == 1 { return 0
Start(flag.Args()[0], dir) }
// backup all ads of the given user (via config or cmdline)
if uid == 0 && conf.User > 0 {
uid = conf.User
}
if uid > 0 {
err := Start(fmt.Sprintf("%d", uid), dir, template)
if err != nil {
return Die(err)
}
} else {
return Die(errors.New("invalid or no user id specified"))
} }
return 0 return 0
} }
func Die(err error) int { func Die(err error) int {
fmt.Println(err) slog.Error("Failure", "error", err.Error())
return 1 return 1
} }

View File

@@ -46,7 +46,7 @@ for D in $DIST; do
GOOS=${os} GOARCH=${arch} go build -tags osusergo,netgo -ldflags "-extldflags=-static" -o ${binfile} GOOS=${os} GOARCH=${arch} go build -tags osusergo,netgo -ldflags "-extldflags=-static" -o ${binfile}
mkdir -p ${tardir} mkdir -p ${tardir}
cp ${binfile} README.md LICENSE ${tardir}/ cp ${binfile} README.md LICENSE ${tardir}/
echo 'tool = rpn echo 'tool = kleingebaeck
PREFIX = /usr/local PREFIX = /usr/local
UID = root UID = root
GID = 0 GID = 0

121
scrape.go
View File

@@ -21,10 +21,10 @@ import (
"errors" "errors"
"fmt" "fmt"
"io" "io"
"os" "log/slog"
"strings"
"net/http" "net/http"
"strings"
"sync"
"astuart.co/goq" "astuart.co/goq"
) )
@@ -33,6 +33,29 @@ type Index struct {
Links []string `goquery:".text-module-begin a,[href]"` Links []string `goquery:".text-module-begin a,[href]"`
} }
type Ad struct {
Title string `goquery:"h1"`
Slug string
Id string
Condition string
Category string
Price string `goquery:"h2#viewad-price"`
Created string `goquery:"#viewad-extra-info,text"`
Text string `goquery:"p#viewad-description-text,html"`
Images []string `goquery:".galleryimage-element img,[src]"`
Meta []string `goquery:".addetailslist--detail--value,text"`
}
func (ad *Ad) LogValue() slog.Value {
return slog.GroupValue(
slog.String("title", ad.Title),
slog.String("price", ad.Price),
slog.String("id", ad.Id),
slog.Int("imagecount", len(ad.Images)),
slog.Int("bodysize", len(ad.Text)),
)
}
// fetch some web page content // fetch some web page content
func Get(uri string, client *http.Client) (io.ReadCloser, error) { func Get(uri string, client *http.Client) (io.ReadCloser, error) {
req, err := http.NewRequest("GET", uri, nil) req, err := http.NewRequest("GET", uri, nil)
@@ -42,28 +65,32 @@ func Get(uri string, client *http.Client) (io.ReadCloser, error) {
req.Header.Set("User-Agent", Useragent) req.Header.Set("User-Agent", Useragent)
// fmt.Println(uri)
res, err := client.Do(req) res, err := client.Do(req)
if err != nil { if err != nil {
return nil, err return nil, err
} }
slog.Debug("response", "code", res.StatusCode, "status",
res.Status, "size", res.ContentLength)
return res.Body, nil return res.Body, nil
} }
// extract links from all ad listing pages (that is: use pagination) // extract links from all ad listing pages (that is: use pagination)
// and scrape every page // and scrape every page
func Start(uid string, dir string) error { func Start(uid string, dir string, template string) error {
client := &http.Client{} client := &http.Client{}
ads := []string{} adlinks := []string{}
baseuri := Baseuri + Listuri + "?userId=" + uid baseuri := Baseuri + Listuri + "?userId=" + uid
page := 1 page := 1
uri := baseuri uri := baseuri
slog.Info("fetching ad pages", "user", uid)
for { for {
var index Index var index Index
slog.Debug("fetching page", "uri", uri)
body, err := Get(uri, client) body, err := Get(uri, client)
if err != nil { if err != nil {
return err return err
@@ -79,70 +106,95 @@ func Start(uid string, dir string) error {
break break
} }
slog.Debug("extracted ad links", "count", len(index.Links))
for _, href := range index.Links { for _, href := range index.Links {
ads = append(ads, href) adlinks = append(adlinks, href)
fmt.Println(href) slog.Debug("ad link", "href", href)
} }
page++ page++
uri = baseuri + "&pageNum=" + fmt.Sprintf("%d", page) uri = baseuri + "&pageNum=" + fmt.Sprintf("%d", page)
} }
for _, ad := range ads { for _, adlink := range adlinks {
err := Scrape(ad, dir) err := Scrape(Baseuri+adlink, dir, template)
if err != nil { if err != nil {
return err return err
} }
return nil
} }
return nil return nil
} }
type Ad struct { // scrape an ad. uri is the full uri of the ad, dir is the basedir
Title string `goquery:"h1"` func Scrape(uri string, dir string, template string) error {
Text string `goquery:"p#viewad-description-text,html"`
Images []string `goquery:".galleryimage-element img,[src]"`
Price string `goquery:"h2#viewad-price"`
}
func Scrape(link string, dir string) error {
client := &http.Client{} client := &http.Client{}
uri := Baseuri + link ad := &Ad{}
slurp := strings.Split(uri, "/")[1]
var ad Ad // extract slug and id from uri
uriparts := strings.Split(uri, "/")
if len(uriparts) < 6 {
return errors.New("invalid uri")
}
ad.Slug = uriparts[4]
ad.Id = uriparts[5]
// get the ad
slog.Debug("fetching ad page", "uri", uri)
body, err := Get(uri, client) body, err := Get(uri, client)
if err != nil { if err != nil {
return err return err
} }
defer body.Close() defer body.Close()
// extract ad contents with goquery/goq
err = goq.NewDecoder(body).Decode(&ad) err = goq.NewDecoder(body).Decode(&ad)
if err != nil { if err != nil {
return err return err
} }
if len(ad.Meta) == 2 {
ad.Category = ad.Meta[0]
ad.Condition = ad.Meta[1]
}
slog.Debug("extracted ad listing", "ad", ad)
f, err := os.Create(strings.Join([]string{dir, slurp, "Anzeige.txt"}, "/")) // write listing
err = WriteAd(dir, ad, template)
if err != nil { if err != nil {
return err return err
} }
ad.Text = strings.ReplaceAll(ad.Text, "<br/>", "\n") return ScrapeImages(dir, ad)
_, err = fmt.Fprintf(f, "Title: %s\nPrice: %s\n\n%s", ad.Title, ad.Price, ad.Text)
if err != nil {
return err
} }
func ScrapeImages(dir string, ad *Ad) error {
// fetch images
img := 1 img := 1
var wg sync.WaitGroup
wg.Add(len(ad.Images))
failure := make(chan string)
for _, imguri := range ad.Images { for _, imguri := range ad.Images {
file := fmt.Sprintf("%s/%d.jpg", dir, img) file := fmt.Sprintf("%s/%d.jpg", dir, img)
go func() {
defer wg.Done()
err := Getimage(imguri, file) err := Getimage(imguri, file)
if err != nil { if err != nil {
return err failure <- err.Error()
return
}
slog.Info("wrote ad image", "image", file)
}()
img++
} }
img++ close(failure)
wg.Wait()
goterr := <-failure
if goterr != "" {
return errors.New(goterr)
} }
return nil return nil
@@ -150,6 +202,7 @@ func Scrape(link string, dir string) error {
// fetch an image // fetch an image
func Getimage(uri, fileName string) error { func Getimage(uri, fileName string) error {
slog.Debug("fetching ad image", "uri", uri)
response, err := http.Get(uri) response, err := http.Get(uri)
if err != nil { if err != nil {
return err return err
@@ -160,13 +213,7 @@ func Getimage(uri, fileName string) error {
return errors.New("received non 200 response code") return errors.New("received non 200 response code")
} }
file, err := os.Create(fileName) err = WriteImage(fileName, response.Body)
if err != nil {
return err
}
defer file.Close()
_, err = io.Copy(file, response.Body)
if err != nil { if err != nil {
return err return err
} }

72
store.go Normal file
View File

@@ -0,0 +1,72 @@
/*
Copyright © 2023 Thomas von Dein
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
package main
import (
"io"
"log/slog"
"os"
"strings"
tpl "text/template"
)
func WriteAd(dir string, ad *Ad, template string) error {
// prepare output dir
dir = dir + "/" + ad.Slug
err := Mkdir(dir)
if err != nil {
return err
}
// write ad file
listingfile := strings.Join([]string{dir, "Adlisting.txt"}, "/")
f, err := os.Create(listingfile)
if err != nil {
return err
}
ad.Text = strings.ReplaceAll(ad.Text, "<br/>", "\n")
tmpl, err := tpl.New("adlisting").Parse(template)
if err != nil {
return err
}
err = tmpl.Execute(f, ad)
if err != nil {
return err
}
slog.Info("wrote ad listing", "listingfile", listingfile)
return nil
}
func WriteImage(filename string, reader io.ReadCloser) error {
file, err := os.Create(filename)
if err != nil {
return err
}
defer file.Close()
_, err = io.Copy(file, reader)
if err != nil {
return err
}
return nil
}

55
util.go Normal file
View File

@@ -0,0 +1,55 @@
/*
Copyright © 2023 Thomas von Dein
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
package main
import (
"bytes"
"errors"
"os"
"os/exec"
)
func Mkdir(dir string) error {
if _, err := os.Stat(dir); errors.Is(err, os.ErrNotExist) {
err := os.Mkdir(dir, os.ModePerm)
if err != nil {
return err
}
}
return nil
}
func man() error {
man := exec.Command("less", "-")
var b bytes.Buffer
b.Write([]byte(manpage))
man.Stdout = os.Stdout
man.Stdin = &b
man.Stderr = os.Stderr
err := man.Run()
if err != nil {
return err
}
return nil
}