Compare commits

11 Commits

Author SHA1 Message Date
T.v.Dein
2c62f9eb17 fix invalid mod load (#14)
Co-authored-by: Thomas von Dein <tom@vondein.org>
2023-12-19 18:27:20 +01:00
T.v.Dein
bff0ae553e Bugfixes (#13)
* several fixes:

- fix #9 + #10: switched to koanf module and dropped support for HCL
- fix #11: disabling colors on windows
- fix #12: fixed race condition in goroutine call inside for loop,
  images had been downloaded multiple times
- remove hcl support and use toml format (same thing, better parser)
- update documentation and example config on TOML format of config file
- use Config as arg instead of singular args
- use x/errgroup instead of sync.WaitGroup inside image download loop

---------

Co-authored-by: Thomas von Dein <tom@vondein.org>
2023-12-19 18:23:41 +01:00
T.v.Dein
450d44d129 Dev (#8)
* fixed conf parsing: variables can now be omitted from the config
* fix newlines: use CRLF on windows
* bump version

---------

Co-authored-by: Thomas von Dein <tom@vondein.org>
2023-12-18 20:18:37 +01:00
T.v.Dein
18f7e0fe49 added proper install instructions (#7)
Co-authored-by: Thomas von Dein <tom@vondein.org>
2023-12-18 09:48:00 +01:00
T.v.Dein
def063afe9 Merge pull request #6 from TLINDEN/dev 2023-12-18 09:23:55 +01:00
f1908f02cb bump version 2023-12-18 09:23:18 +01:00
4a528ad9d1 fix #5: add exe extension to built windows binaries 2023-12-18 09:22:08 +01:00
5c1161f227 fix #4, use filepath.Join to create portable paths 2023-12-18 09:21:26 +01:00
bd9d8fdb2c fix version finding 2023-12-17 17:53:01 +01:00
T.v.Dein
1ee886c504 Merge pull request #2 from TLINDEN/dev
reorganized code a little, using Go templates instead of format strings
2023-12-17 17:49:27 +01:00
f932d7be83 reorganized code a little, using Go templates instead of format strings 2023-12-17 17:32:05 +01:00
14 changed files with 554 additions and 251 deletions
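
The symptom described for #12 (images downloaded multiple times from goroutines started inside a `for` loop) is typical of loop variable capture in Go versions before 1.22, where all iterations share one variable. The following is a minimal sketch of that kind of fix, not the project's actual code. It uses only the standard library (`sync.WaitGroup` plus a mutex) instead of `x/errgroup` to stay self-contained; `downloadAll` and `fetch` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// downloadAll starts one goroutine per URL and returns the URLs that were
// processed, sorted, together with the first error seen (if any).
// fetch is a hypothetical stand-in for the real image download.
func downloadAll(urls []string, fetch func(string) error) ([]string, error) {
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		seen     []string
		firstErr error
	)

	for _, url := range urls {
		wg.Add(1)
		// Pass url as an argument. Before Go 1.22 the loop variable is
		// shared across iterations, so a closure capturing it directly
		// may observe a later value and handle the same URL repeatedly.
		go func(u string) {
			defer wg.Done()
			err := fetch(u)

			mu.Lock()
			defer mu.Unlock()
			if err != nil && firstErr == nil {
				firstErr = err
			}
			seen = append(seen, u)
		}(url)
	}

	wg.Wait()
	sort.Strings(seen)
	return seen, firstErr
}

func main() {
	urls := []string{"img-1.jpg", "img-2.jpg", "img-3.jpg"}
	seen, err := downloadAll(urls, func(string) error { return nil })
	fmt.Println(seen, err)
}
```

`errgroup.Group` from `golang.org/x/sync` bundles the wait and first-error bookkeeping shown here, which is presumably why the commit switched to it.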

View File

@@ -17,7 +17,7 @@
# #
# no need to modify anything below # no need to modify anything below
tool = kleingebaeck tool = kleingebaeck
VERSION = $(shell grep VERSION main.go | head -1 | cut -d '"' -f2) VERSION = $(shell grep VERSION config.go | head -1 | cut -d '"' -f2)
archs = darwin freebsd linux windows archs = darwin freebsd linux windows
PREFIX = /usr/local PREFIX = /usr/local
UID = root UID = root
@@ -86,3 +86,6 @@ show-versions: buildlocal
@echo @echo
@echo "### go version used for building:" @echo "### go version used for building:"
@grep -m 1 go go.mod @grep -m 1 go go.mod
lint:
golangci-lint run -p bugs -p unused

View File

@@ -15,25 +15,65 @@ directory, each ad into its own subdirectory. The backup will contain
a textfile `Adlisting.txt` which contains the ad contents as the a textfile `Adlisting.txt` which contains the ad contents as the
title, body, price etc. All images will be downloaded as well. title, body, price etc. All images will be downloaded as well.
## Installation
The tool doesn't need authentication and doesn't have any The tool doesn't need authentication and doesn't have any
dependencies. Just download the binary for your platform from the dependencies. Just download the binary for your platform from the
releases page and you're good to go. releases page and you're good to go.
The releases also include a handy tarball which you can use to install ### Installation using a pre-compiled binary
the tool system-wide including the manual page. Just extract it and
type: `make install`. Go to the [latest release
page](https://github.com/TLINDEN/kleingebaeck/releases/latest) and
look for your OS and platform. There are two options to install the binary:
1. Directly download the binary for your platform,
e.g. `kleingebaeck-linux-amd64-0.0.5`, rename it to `kleingebaeck`
(or whatever you like more!) and put it into your bin dir
(e.g. `$HOME/bin` or as root to `/usr/local/bin`).
Be sure to verify the checksum of the binary file. For this, also download the matching `kleingebaeck-linux-amd64-0.0.5.sha256` file and run:
```shell
cat kleingebaeck-linux-amd64-0.0.5.sha256 && sha256sum kleingebaeck-linux-amd64-0.0.5
```
You should see the same SHA256 hash.
2. You may also download a binary tarball for your platform,
e.g. `kleingebaeck-linux-amd64-0.0.5.tar.gz`, unpack and install
it. GNU Make is required for this:
```shell
tar xvfz kleingebaeck-linux-amd64-0.0.5.tar.gz
cd kleingebaeck-linux-amd64-0.0.5
sudo make install
```
### Installation from source
You will need the Go toolchain in order to build from source. GNU
Make will also help but is not strictly necessary.
If you want to compile the tool yourself, use `git clone` to clone the
repository. Then execute `go mod tidy` to install all
dependencies. Then just enter `go build` or - if you have GNU Make
installed - `make`.
To install after building either copy the binary or execute `sudo make install`.
## Commandline options: ## Commandline options:
``` ```
Usage: kleingebaeck [-dvVhmoc] [<ad-listing-url>,...] Usage: kleingebaeck [-dvVhmoc] [<ad-listing-url>,...]
Options: Options:
--user,-u <uid> Backup ads from user with uid <uid>. --user -u <uid> Backup ads from user with uid <uid>.
--debug, -d Enable debug output. --debug -d Enable debug output.
--verbose,-v Enable verbose output. --verbose -v Enable verbose output.
--output-dir,-o <dir> Set output dir (default: current directory) --outdir -o <dir> Set output dir (default: current directory)
--manual,-m Show manual. --limit -l <num> Limit the ads to download to <num>, default: load all.
--config,-c <file> Use config file <file> (default: ~/.kleingebaeck). --config -c <file> Use config file <file> (default: ~/.kleingebaeck).
--manual -m Show manual.
--help -h Show usage.
If one or more <ad-listing-url>'s are specified, only backup those, If one or more <ad-listing-url>'s are specified, only backup those,
otherwise backup all ads of the given user. otherwise backup all ads of the given user.
@@ -42,16 +82,15 @@ otherwise backup all ads of the given user.
## Configfile ## Configfile
You can create a config file to save typing. By default You can create a config file to save typing. By default
`~/.kleingebaeck.hcl` is being used but you can specify one with `~/.kleingebaeck` is being used but you can specify one with
`-c` as well. `-c` as well.
Format is simple: Format is simple:
``` ```
user = 1010101 user = 1010101
verbose = true loglevel = "verbose"
outdir = "test" outdir = "test"
template = ""
``` ```
## Usage ## Usage
@@ -67,6 +106,27 @@ The `XXXXX` part is your userid.
Put it into the configfile as outlined above. Also specify an output Put it into the configfile as outlined above. Also specify an output
directory. Then just execute `kleingebaeck`. directory. Then just execute `kleingebaeck`.
Inside the output directory you'll find a new subdirectory for each
ad. Every directory contains a file `Adlisting.txt`, which will look
somewhat like this:
```default
Title: A book I sell
Price: 99 € VB
Id: 1919191919
Category: Sachbücher
Condition: Sehr Gut
Created: 10.12.2023
This is the description text.
Pay with paypal.
```
You can change the formatting using the `template` config
variable. The supplied sample config contains the default template.
All images will be stored in the same directory.
## Kleingebäck? ## Kleingebäck?
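
As a rough illustration of how the `template` config variable shapes `Adlisting.txt`, the sketch below renders one ad with Go's `text/template`. The `Ad` struct and `renderAd` function are assumptions made for this example; only the field names (`Title`, `Price`, `Id`, `Category`, `Condition`, `Created`, `Text`) are taken from the default template shown in this diff.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// Ad is a hypothetical record type; the real tool may use other field types.
type Ad struct {
	Title, Price, Id, Category, Condition, Created, Text string
}

// defaultTemplate mirrors the default template from the diff.
const defaultTemplate = "Title: {{.Title}}\nPrice: {{.Price}}\nId: {{.Id}}\n" +
	"Category: {{.Category}}\nCondition: {{.Condition}}\nCreated: {{.Created}}\n\n{{.Text}}\n"

// renderAd parses tmpl and executes it against ad.
func renderAd(tmpl string, ad Ad) (string, error) {
	t, err := template.New("adlisting").Parse(tmpl)
	if err != nil {
		return "", err
	}

	var sb strings.Builder
	if err := t.Execute(&sb, ad); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	ad := Ad{
		Title:     "A book I sell",
		Price:     "99 € VB",
		Id:        "1919191919",
		Category:  "Sachbücher",
		Condition: "Sehr Gut",
		Created:   "10.12.2023",
		Text:      "This is the description text.",
	}

	out, err := renderAd(defaultTemplate, ad)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```

A custom template may omit fields or reorder them; referencing a field that does not exist on the struct makes `Execute` return an error.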

config.go
View File

@@ -17,37 +17,162 @@ along with this program. If not, see <http://www.gnu.org/licenses/>.
package main package main
import ( import (
"errors"
"fmt"
"os" "os"
"path/filepath"
"runtime"
"github.com/hashicorp/hcl/v2/hclsimple" "github.com/knadh/koanf/parsers/toml"
"github.com/knadh/koanf/providers/confmap"
"github.com/knadh/koanf/providers/file"
"github.com/knadh/koanf/providers/posflag"
"github.com/knadh/koanf/v2"
flag "github.com/spf13/pflag"
) )
const (
VERSION string = "0.1.0"
Baseuri string = "https://www.kleinanzeigen.de"
Listuri string = "/s-bestandsliste.html"
Defaultdir string = "."
DefaultTemplate string = "Title: {{.Title}}\nPrice: {{.Price}}\nId: {{.Id}}\n" +
"Category: {{.Category}}\nCondition: {{.Condition}}\nCreated: {{.Created}}\n\n{{.Text}}\n"
DefaultTemplateWin string = "Title: {{.Title}}\r\nPrice: {{.Price}}\r\nId: {{.Id}}\r\n" +
"Category: {{.Category}}\r\nCondition: {{.Condition}}\r\nCreated: {{.Created}}\r\n\r\n{{.Text}}\r\n"
Useragent string = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) " +
"AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)
const Usage string = `This is kleingebaeck, the kleinanzeigen.de backup tool.
Usage: kleingebaeck [-dvVhmoclu] [<ad-listing-url>,...]
Options:
--user -u <uid> Backup ads from user with uid <uid>.
--debug -d Enable debug output.
--verbose -v Enable verbose output.
--outdir -o <dir> Set output dir (default: current directory)
--limit -l <num> Limit the ads to download to <num>, default: load all.
--config -c <file> Use config file <file> (default: ~/.kleingebaeck).
--manual -m Show manual.
--help -h Show usage.
--version -V Show program version.
If one or more ad listing url's are specified, only backup those,
otherwise backup all ads of the given user.`
type Config struct { type Config struct {
Verbose bool `hcl:"verbose"` Verbose bool `koanf:"verbose"` // loglevel=info
User int `hcl:"user"` Debug bool `koanf:"debug"` // loglevel=debug
Outdir string `hcl:"outdir"` Showversion bool `koanf:"version"` // -V
Template string `hcl:"template"` Showhelp bool `koanf:"help"` // -h
Showmanual bool `koanf:"manual"` // -m
User int `koanf:"user"`
Outdir string `koanf:"outdir"`
Template string `koanf:"template"`
Loglevel string `koanf:"loglevel"`
Limit int `koanf:"limit"`
Adlinks []string
StatsCountAds int
StatsCountImages int
} }
func ParseConfigfile(file string) (*Config, error) { func (c *Config) IncrAds() {
c := Config{} c.StatsCountAds++
if path, err := os.Stat(file); !os.IsNotExist(err) { }
func (c *Config) IncrImgs(num int) {
c.StatsCountImages += num
}
// load commandline flags and config file
func InitConfig() (*Config, error) {
var k = koanf.New(".")
// determine template based on os
template := DefaultTemplate
if runtime.GOOS == "windows" {
template = DefaultTemplateWin
}
// Load default values using the confmap provider.
k.Load(confmap.Provider(map[string]interface{}{
"template": template,
"outdir": ".",
"loglevel": "notice",
"user": 0,
}, "."), nil)
// setup custom usage
f := flag.NewFlagSet("config", flag.ContinueOnError)
f.Usage = func() {
fmt.Println(Usage)
os.Exit(0)
}
// parse commandline flags
f.StringP("config", "c", "", "config file")
f.StringP("outdir", "o", "", "directory where to store ads")
f.IntP("user", "u", 0, "user id")
f.IntP("limit", "l", 0, "limit ads to be downloaded (default 0, unlimited)")
f.BoolP("verbose", "v", false, "be verbose")
f.BoolP("debug", "d", false, "enable debug log")
f.BoolP("version", "V", false, "show program version")
f.BoolP("help", "h", false, "show usage")
f.BoolP("manual", "m", false, "show manual")
f.Parse(os.Args[1:])
// generate a list of config files to try to load, including the
// one provided via -c, if any
var configfiles []string
configfile, _ := f.GetString("config")
home, _ := os.UserHomeDir()
if configfile != "" {
configfiles = []string{configfile}
} else {
configfiles = []string{
"/etc/kleingebaeck.conf", "/usr/local/etc/kleingebaeck.conf", // unix variants
filepath.Join(home, ".config", "kleingebaeck", "config"),
filepath.Join(home, ".kleingebaeck"),
"kleingebaeck.conf",
}
}
// Load the config file[s]
for _, cfgfile := range configfiles {
if path, err := os.Stat(cfgfile); !os.IsNotExist(err) {
if !path.IsDir() { if !path.IsDir() {
configstring, err := os.ReadFile(file) if err := k.Load(file.Provider(cfgfile), toml.Parser()); err != nil {
if err != nil { return nil, errors.New("error loading config file: " + err.Error())
return nil, err }
}
}
// else: we ignore the file if it doesn't exist
} }
err = hclsimple.Decode( // command line overrides config file
path.Name(), configstring, if err := k.Load(posflag.Provider(f, ".", k), nil); err != nil {
nil, &c, return nil, errors.New("error loading flags: " + err.Error())
)
if err != nil {
return nil, err
}
}
} }
return &c, nil // fetch values
conf := &Config{}
if err := k.Unmarshal("", &conf); err != nil {
return nil, errors.New("error unmarshalling: " + err.Error())
}
// adjust loglevel
switch conf.Loglevel {
case "verbose":
conf.Verbose = true
case "debug":
conf.Debug = true
}
// are there any args left on commandline? if so treat them as adlinks
conf.Adlinks = f.Args()
return conf, nil
} }
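
The new `InitConfig` loads settings in three layers: built-in defaults, then each config file found, then command line flags, with later layers overriding earlier ones. The sketch below shows that precedence with plain maps and the standard library only; it does not use koanf, and `layer` and `resolve` are hypothetical helpers. It assumes the `flags` map holds only options given explicitly on the command line.

```go
package main

import "fmt"

// layer copies every key of src into dst, overwriting existing keys.
func layer(dst, src map[string]any) {
	for k, v := range src {
		dst[k] = v
	}
}

// resolve merges the three sources in order of increasing precedence:
// defaults, then config file values, then explicitly set flags.
func resolve(defaults, file, flags map[string]any) map[string]any {
	merged := map[string]any{}
	layer(merged, defaults)
	layer(merged, file)
	layer(merged, flags)
	return merged
}

func main() {
	defaults := map[string]any{"outdir": ".", "loglevel": "notice"}
	file := map[string]any{"outdir": "test", "user": 1010101}
	flags := map[string]any{"loglevel": "debug"}

	merged := resolve(defaults, file, flags)
	fmt.Println(merged["outdir"], merged["loglevel"], merged["user"])
}
```

Keeping the layers separate until the final merge makes it easy to see which source a value came from when debugging a configuration.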

View File

@@ -1,6 +1,6 @@
# #
# kleingebaeck sample configuration file. # kleingebaeck sample configuration file.
# put this to ~/.kleingebaeck.hcl. # put this to ~/.kleingebaeck.
# #
# Comments start with the '#' character. # Comments start with the '#' character.
@@ -8,13 +8,23 @@
user = 00000000 user = 00000000
# enable verbose output (same as -v), may be true or false. # enable verbose output (same as -v), may be true or false.
verbose = true # other values: notice or debug
loglevel = "verbose"
# directory where to store downloaded ads. kleingebaeck will try to # directory where to store downloaded ads. kleingebaeck will try to
# create it. must be a quoted string. # create it. must be a quoted string.
outdir = "test" outdir = "test"
# template. leave empty to use the default one, which is: # template for stored adlistings. To enable it, remove the comment
# Title: %s\nPrice: %s\nId: %s\nCategory: %s\nCondition: %s\nCreated: %s\nBody:\n\n%s\n # chars up until the last #"""
# take care to include exactly 7 times '%s'! #template="""
template = "" #Title: {{.Title}}
#Price: {{.Price}}
#Id: {{.Id}}
#Category: {{.Category}}
#Condition: {{.Condition}}
#Created: {{.Created}}
#{{.Text}}
# """

go.mod
View File

@@ -3,19 +3,28 @@ module kleingebaeck
go 1.21 go 1.21
require ( require (
astuart.co/goq v1.0.0 // indirect astuart.co/goq v1.0.0
github.com/knadh/koanf/parsers/toml v0.1.0
github.com/knadh/koanf/providers/confmap v0.1.0
github.com/knadh/koanf/providers/file v0.1.0
github.com/knadh/koanf/providers/posflag v0.1.0
github.com/knadh/koanf/v2 v2.0.1
github.com/lmittmann/tint v1.0.3
github.com/mattn/go-isatty v0.0.20
github.com/spf13/pflag v1.0.5
)
require (
github.com/PuerkitoBio/goquery v1.5.0 // indirect github.com/PuerkitoBio/goquery v1.5.0 // indirect
github.com/agext/levenshtein v1.2.1 // indirect
github.com/andybalholm/cascadia v1.0.0 // indirect github.com/andybalholm/cascadia v1.0.0 // indirect
github.com/apparentlymart/go-textseg/v13 v13.0.0 // indirect github.com/fsnotify/fsnotify v1.6.0 // indirect
github.com/apparentlymart/go-textseg/v15 v15.0.0 // indirect github.com/knadh/koanf/maps v0.1.1 // indirect
github.com/google/go-cmp v0.3.1 // indirect github.com/mitchellh/copystructure v1.2.0 // indirect
github.com/hashicorp/hcl/v2 v2.19.1 // indirect github.com/mitchellh/mapstructure v1.5.0 // indirect
github.com/lmittmann/tint v1.0.3 // indirect github.com/mitchellh/reflectwalk v1.0.2 // indirect
github.com/mitchellh/go-wordwrap v0.0.0-20150314170334-ad45545899c7 // indirect github.com/pelletier/go-toml v1.9.5 // indirect
github.com/spf13/pflag v1.0.5 // indirect
github.com/zclconf/go-cty v1.13.0 // indirect
golang.org/x/net v0.0.0-20190606173856-1492cefac77f // indirect golang.org/x/net v0.0.0-20190606173856-1492cefac77f // indirect
golang.org/x/text v0.11.0 // indirect golang.org/x/sync v0.5.0 // indirect
golang.org/x/sys v0.6.0 // indirect
) )

go.sum
View File

@@ -2,36 +2,56 @@ astuart.co/goq v1.0.0 h1:nnYIhu/Z/j0VaX9Dp+pmh2Uh7ldEz6XfgSg+bAY5Yrw=
astuart.co/goq v1.0.0/go.mod h1:+fokcnFrO8Pw2fj8drdStJvzoMFebJH69rw8IC21rno= astuart.co/goq v1.0.0/go.mod h1:+fokcnFrO8Pw2fj8drdStJvzoMFebJH69rw8IC21rno=
github.com/PuerkitoBio/goquery v1.5.0 h1:uGvmFXOA73IKluu/F84Xd1tt/z07GYm8X49XKHP7EJk= github.com/PuerkitoBio/goquery v1.5.0 h1:uGvmFXOA73IKluu/F84Xd1tt/z07GYm8X49XKHP7EJk=
github.com/PuerkitoBio/goquery v1.5.0/go.mod h1:qD2PgZ9lccMbQlc7eEOjaeRlFQON7xY8kdmcsrnKqMg= github.com/PuerkitoBio/goquery v1.5.0/go.mod h1:qD2PgZ9lccMbQlc7eEOjaeRlFQON7xY8kdmcsrnKqMg=
github.com/agext/levenshtein v1.2.1 h1:QmvMAjj2aEICytGiWzmxoE0x2KZvE0fvmqMOfy2tjT8=
github.com/agext/levenshtein v1.2.1/go.mod h1:JEDfjyjHDjOF/1e4FlBE/PkbqA9OfWu2ki2W0IB5558=
github.com/andybalholm/cascadia v1.0.0 h1:hOCXnnZ5A+3eVDX8pvgl4kofXv2ELss0bKcqRySc45o= github.com/andybalholm/cascadia v1.0.0 h1:hOCXnnZ5A+3eVDX8pvgl4kofXv2ELss0bKcqRySc45o=
github.com/andybalholm/cascadia v1.0.0/go.mod h1:GsXiBklL0woXo1j/WYWtSYYC4ouU9PqHO0sqidkEA4Y= github.com/andybalholm/cascadia v1.0.0/go.mod h1:GsXiBklL0woXo1j/WYWtSYYC4ouU9PqHO0sqidkEA4Y=
github.com/apparentlymart/go-textseg/v13 v13.0.0 h1:Y+KvPE1NYz0xl601PVImeQfFyEy6iT90AvPUL1NNfNw=
github.com/apparentlymart/go-textseg/v13 v13.0.0/go.mod h1:ZK2fH7c4NqDTLtiYLvIkEghdlcqw7yxLeM89kiTRPUo=
github.com/apparentlymart/go-textseg/v15 v15.0.0 h1:uYvfpb3DyLSCGWnctWKGj857c6ew1u1fNQOlOtuGxQY=
github.com/apparentlymart/go-textseg/v15 v15.0.0/go.mod h1:K8XmNZdhEBkdlyDdvbmmsvpAG721bKi0joRfFdHIWJ4=
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/google/go-cmp v0.3.1 h1:Xye71clBPdm5HgqGwUkwhbynsUJZhDbS20FvLhQ2izg= github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
github.com/google/go-cmp v0.3.1/go.mod h1:8QqcDgzrUqlUb/G2PQTWiueGozuR1884gddMywk6iLU= github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/hashicorp/hcl/v2 v2.19.1 h1://i05Jqznmb2EXqa39Nsvyan2o5XyMowW5fnCKW5RPI= github.com/fsnotify/fsnotify v1.6.0 h1:n+5WquG0fcWoWp6xPWfHdbskMCQaFnG6PfBrh1Ky4HY=
github.com/hashicorp/hcl/v2 v2.19.1/go.mod h1:ThLC89FV4p9MPW804KVbe/cEXoQ8NZEh+JtMeeGErHE= github.com/fsnotify/fsnotify v1.6.0/go.mod h1:sl3t1tCWJFWoRz9R8WJCbQihKKwmorjAbSClcnxKAGw=
github.com/knadh/koanf/maps v0.1.1 h1:G5TjmUh2D7G2YWf5SQQqSiHRJEjaicvU0KpypqB3NIs=
github.com/knadh/koanf/maps v0.1.1/go.mod h1:npD/QZY3V6ghQDdcQzl1W4ICNVTkohC8E73eI2xW4yI=
github.com/knadh/koanf/parsers/toml v0.1.0 h1:S2hLqS4TgWZYj4/7mI5m1CQQcWurxUz6ODgOub/6LCI=
github.com/knadh/koanf/parsers/toml v0.1.0/go.mod h1:yUprhq6eo3GbyVXFFMdbfZSo928ksS+uo0FFqNMnO18=
github.com/knadh/koanf/providers/confmap v0.1.0 h1:gOkxhHkemwG4LezxxN8DMOFopOPghxRVp7JbIvdvqzU=
github.com/knadh/koanf/providers/confmap v0.1.0/go.mod h1:2uLhxQzJnyHKfxG927awZC7+fyHFdQkd697K4MdLnIU=
github.com/knadh/koanf/providers/file v0.1.0 h1:fs6U7nrV58d3CFAFh8VTde8TM262ObYf3ODrc//Lp+c=
github.com/knadh/koanf/providers/file v0.1.0/go.mod h1:rjJ/nHQl64iYCtAW2QQnF0eSmDEX/YZ/eNFj5yR6BvA=
github.com/knadh/koanf/providers/posflag v0.1.0 h1:mKJlLrKPcAP7Ootf4pBZWJ6J+4wHYujwipe7Ie3qW6U=
github.com/knadh/koanf/providers/posflag v0.1.0/go.mod h1:SYg03v/t8ISBNrMBRMlojH8OsKowbkXV7giIbBVgbz0=
github.com/knadh/koanf/v2 v2.0.1 h1:1dYGITt1I23x8cfx8ZnldtezdyaZtfAuRtIFOiRzK7g=
github.com/knadh/koanf/v2 v2.0.1/go.mod h1:ZeiIlIDXTE7w1lMT6UVcNiRAS2/rCeLn/GdLNvY1Dus=
github.com/lmittmann/tint v1.0.3 h1:W5PHeA2D8bBJVvabNfQD/XW9HPLZK1XoPZH0cq8NouQ= github.com/lmittmann/tint v1.0.3 h1:W5PHeA2D8bBJVvabNfQD/XW9HPLZK1XoPZH0cq8NouQ=
github.com/lmittmann/tint v1.0.3/go.mod h1:HIS3gSy7qNwGCj+5oRjAutErFBl4BzdQP6cJZ0NfMwE= github.com/lmittmann/tint v1.0.3/go.mod h1:HIS3gSy7qNwGCj+5oRjAutErFBl4BzdQP6cJZ0NfMwE=
github.com/mitchellh/go-wordwrap v0.0.0-20150314170334-ad45545899c7 h1:DpOJ2HYzCv8LZP15IdmG+YdwD2luVPHITV96TkirNBM= github.com/mattn/go-isatty v0.0.20 h1:xfD0iDuEKnDkl03q4limB+vH+GxLEtL/jb4xVJSWWEY=
github.com/mitchellh/go-wordwrap v0.0.0-20150314170334-ad45545899c7/go.mod h1:ZXFpozHsX6DPmq2I0TCekCxypsnAUbP2oI0UX1GXzOo= github.com/mattn/go-isatty v0.0.20/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D7dTCTo3Y=
github.com/mitchellh/copystructure v1.2.0 h1:vpKXTN4ewci03Vljg/q9QvCGUDttBOGBIa15WveJJGw=
github.com/mitchellh/copystructure v1.2.0/go.mod h1:qLl+cE2AmVv+CoeAwDPye/v+N2HKCj9FbZEVFJRxO9s=
github.com/mitchellh/mapstructure v1.5.0 h1:jeMsZIYE/09sWLaz43PL7Gy6RuMjD2eJVyuac5Z2hdY=
github.com/mitchellh/mapstructure v1.5.0/go.mod h1:bFUtVrKA4DC2yAKiSyO/QUcy7e+RRV2QTWOzhPopBRo=
github.com/mitchellh/reflectwalk v1.0.2 h1:G2LzWKi524PWgd3mLHV8Y5k7s6XUvT0Gef6zxSIeXaQ=
github.com/mitchellh/reflectwalk v1.0.2/go.mod h1:mSTlrgnPZtwu0c4WaC2kGObEpuNDbx0jmZXqmk4esnw=
github.com/pelletier/go-toml v1.9.5 h1:4yBQzkHv+7BHq2PQUZF3Mx0IYxG7LsP222s7Agd3ve8=
github.com/pelletier/go-toml v1.9.5/go.mod h1:u1nR/EPcESfeI/szUZKdtJ0xRNbUoANCkoOuaOx1Y+c=
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/spf13/pflag v1.0.5 h1:iy+VFUOCP1a+8yFto/drg2CJ5u0yRoB7fZw3DKv/JXA= github.com/spf13/pflag v1.0.5 h1:iy+VFUOCP1a+8yFto/drg2CJ5u0yRoB7fZw3DKv/JXA=
github.com/spf13/pflag v1.0.5/go.mod h1:McXfInJRrz4CZXVZOBLb0bTZqETkiAhM9Iw0y3An2Bg= github.com/spf13/pflag v1.0.5/go.mod h1:McXfInJRrz4CZXVZOBLb0bTZqETkiAhM9Iw0y3An2Bg=
github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME= github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UVUgZn+9EI= github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UVUgZn+9EI=
github.com/zclconf/go-cty v1.13.0 h1:It5dfKTTZHe9aeppbNOda3mN7Ag7sg6QkBNm6TkyFa0= github.com/stretchr/testify v1.8.1 h1:w7B6lhMri9wdJUVmEZPGGhZzrYTPvgJArz7wNPgYKsk=
github.com/zclconf/go-cty v1.13.0/go.mod h1:YKQzy/7pZ7iq2jNFzy5go57xdxdWoLLpaEp4u238AE0= github.com/stretchr/testify v1.8.1/go.mod h1:w2LPCIKwWwSfY2zedu0+kehJoqGctiVI29o6fzry7u4=
golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w= golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
golang.org/x/net v0.0.0-20180218175443-cbe0f9307d01/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4= golang.org/x/net v0.0.0-20180218175443-cbe0f9307d01/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
golang.org/x/net v0.0.0-20181114220301-adae6a3d119a/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4= golang.org/x/net v0.0.0-20181114220301-adae6a3d119a/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
golang.org/x/net v0.0.0-20190606173856-1492cefac77f h1:IWHgpgFqnL5AhBUBZSgBdjl2vkQUEzcY+JNKWfcgAU0= golang.org/x/net v0.0.0-20190606173856-1492cefac77f h1:IWHgpgFqnL5AhBUBZSgBdjl2vkQUEzcY+JNKWfcgAU0=
golang.org/x/net v0.0.0-20190606173856-1492cefac77f/go.mod h1:HSz+uSET+XFnRR8LxR5pz3Of3rY3CfYBVs4xY44aLks= golang.org/x/net v0.0.0-20190606173856-1492cefac77f/go.mod h1:HSz+uSET+XFnRR8LxR5pz3Of3rY3CfYBVs4xY44aLks=
golang.org/x/sync v0.5.0 h1:60k92dhOjHxJkrqnwsfl8KuaHbn/5dl0lUPUklKo3qE=
golang.org/x/sync v0.5.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk=
golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20220908164124-27713097b956/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.6.0 h1:MVltZSvRTcU2ljQOhs94SXPftV6DCNnZViHeQps87pQ=
golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ= golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/text v0.11.0 h1:LAntKIrcmeSKERyiOh0XMV39LXS8IE9UL2yP7+f5ij4= gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
golang.org/x/text v0.11.0/go.mod h1:TvPlkZtksWOMsz7fbANvkp4WM8x/WCo/om8BMLbz+aE= gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=

View File

@@ -133,7 +133,7 @@
.\" ======================================================================== .\" ========================================================================
.\" .\"
.IX Title "KLEINGEBAECK 1" .IX Title "KLEINGEBAECK 1"
.TH KLEINGEBAECK 1 "2023-12-16" "1" "User Commands" .TH KLEINGEBAECK 1 "2023-12-19" "1" "User Commands"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents. .\" way too many mistakes in technical documents.
.if n .ad l .if n .ad l
@@ -142,16 +142,17 @@
kleingebaeck \- kleinanzeigen.de backup tool kleingebaeck \- kleinanzeigen.de backup tool
.SH "SYNOPSIS" .SH "SYNOPSIS"
.IX Header "SYNOPSIS" .IX Header "SYNOPSIS"
.Vb 9 .Vb 10
\& This is kleingebaeck, the kleinanzeigen.de backup tool.
\& Usage: kleingebaeck [\-dvVhmoc] [<ad\-listing\-url>,...] \& Usage: kleingebaeck [\-dvVhmoc] [<ad\-listing\-url>,...]
\& Options: \& Options:
\& \-\-user,\-u <uid> Backup ads from user with uid <uid>. \& \-\-user \-u <uid> Backup ads from user with uid <uid>.
\& \-\-debug, \-d Enable debug output. \& \-\-debug \-d Enable debug output.
\& \-\-verbose,\-v Enable verbose output. \& \-\-verbose \-v Enable verbose output.
\& \-\-output\-dir,\-o <dir> Set output dir (default: current directory) \& \-\-outdir \-o <dir> Set output dir (default: current directory)
\& \-\-manual,\-m Show manual. \& \-\-limit \-l <num> Limit the ads to download to <num>, default: load all.
\& \-\-config,\-c <file> Use config file <file> (default: ~/.kleingebaeck). \& \-\-config \-c <file> Use config file <file> (default: ~/.kleingebaeck).
\& \-\-manual \-m Show manual.
\& \-\-help \-h Show usage.
.Ve .Ve
.SH "DESCRIPTION" .SH "DESCRIPTION"
.IX Header "DESCRIPTION" .IX Header "DESCRIPTION"
@@ -164,25 +165,41 @@ title, body, price etc. All images will be downloaded as well.
.SH "CONFIGURATION" .SH "CONFIGURATION"
.IX Header "CONFIGURATION" .IX Header "CONFIGURATION"
You can create a config file to save typing. By default You can create a config file to save typing. By default
\&\f(CW\*(C`~/.kleingebaeck.hcl\*(C'\fR is being used but you can specify one with \&\f(CW\*(C`~/.kleingebaeck\*(C'\fR is being used but you can specify one with \f(CW\*(C`\-c\*(C'\fR as
\&\f(CW\*(C`\-c\*(C'\fR as well. well. We use \s-1TOML\s0 as our configuration language. See
<https://toml.io/en/>.
.PP .PP
Format is simple: Format is pretty simple:
.PP .PP
.Vb 4 .Vb 10
\& user = 1010101 \& user = 1010101
\& verbose = true \& loglevel = "verbose"
\& outdir = "test" \& outdir = "test"
\& template = "" \& template = """
\& Title: {{.Title}}
\& Price: {{.Price}}
\& Id: {{.Id}}
\& Category: {{.Category}}
\& Condition: {{.Condition}}
\& Created: {{.Created}}
\&
\& {{.Text}}
\& """
.Ve .Ve
.PP .PP
Be careful if you want to change the template. The default one looks like this: Be careful if you want to change the template. The variable is a
multiline string surrounded by three double quotes. You can leave out
certain fields and use any formatting you like. Refer to
<https://pkg.go.dev/text/template> for details on how to write a
template.
.PP
If you're on Windows and want to customize the output directory, put
it into single quotes so that backslashes are not interpreted as escape
characters, like this:
.PP .PP
.Vb 1 .Vb 1
\& Title: %s\enPrice: %s\enId: %s\enCategory: %s\enCondition: %s\enCreated: %s\enBody:\en\en%s\en \& outdir = \*(AqC:\eData\eAds\*(Aq
.Ve .Ve
.PP
If you change it, include 7 times the '%s' format tag.
.SH "SETUP" .SH "SETUP"
.IX Header "SETUP" .IX Header "SETUP"
To setup the tool, you need to lookup your userid on To setup the tool, you need to lookup your userid on

View File

@@ -5,15 +5,16 @@ NAME
kleingebaeck - kleinanzeigen.de backup tool kleingebaeck - kleinanzeigen.de backup tool
SYNOPSIS SYNOPSIS
This is kleingebaeck, the kleinanzeigen.de backup tool.
Usage: kleingebaeck [-dvVhmoc] [<ad-listing-url>,...] Usage: kleingebaeck [-dvVhmoc] [<ad-listing-url>,...]
Options: Options:
--user,-u <uid> Backup ads from user with uid <uid>. --user -u <uid> Backup ads from user with uid <uid>.
--debug, -d Enable debug output. --debug -d Enable debug output.
--verbose,-v Enable verbose output. --verbose -v Enable verbose output.
--output-dir,-o <dir> Set output dir (default: current directory) --outdir -o <dir> Set output dir (default: current directory)
--manual,-m Show manual. --limit -l <num> Limit the ads to download to <num>, default: load all.
--config,-c <file> Use config file <file> (default: ~/.kleingebaeck). --config -c <file> Use config file <file> (default: ~/.kleingebaeck).
--manual -m Show manual.
--help -h Show usage.
DESCRIPTION DESCRIPTION
This tool can be used to backup ads on the german ad page This tool can be used to backup ads on the german ad page
@@ -26,22 +27,36 @@ DESCRIPTION
CONFIGURATION CONFIGURATION
You can create a config file to save typing. By default You can create a config file to save typing. By default
"~/.kleingebaeck.hcl" is being used but you can specify one with "-c" as "~/.kleingebaeck" is being used but you can specify one with "-c" as
well. well. We use TOML as our configuration language. See
<https://toml.io/en/>.
Format is simple: Format is pretty simple:
user = 1010101 user = 1010101
verbose = true loglevel = "verbose"
outdir = "test" outdir = "test"
template = "" template = """
Title: {{.Title}}
Price: {{.Price}}
Id: {{.Id}}
Category: {{.Category}}
Condition: {{.Condition}}
Created: {{.Created}}
Be careful if you want to change the template. The default one looks {{.Text}}
"""
Be careful if you want to change the template. The variable is a
multiline string surrounded by three double quotes. You can leave out
certain fields and use any formatting you like. Refer to
<https://pkg.go.dev/text/template> for details on how to write a template.
If you're on Windows and want to customize the output directory, put it
into single quotes so that backslashes are not interpreted as escape characters,
like this: like this:
Title: %s\nPrice: %s\nId: %s\nCategory: %s\nCondition: %s\nCreated: %s\nBody:\n\n%s\n outdir = 'C:\Data\Ads'
If you change it, include 7 times the '%s' format tag.
SETUP SETUP
To setup the tool, you need to lookup your userid on kleinanzeigen.de. To setup the tool, you need to lookup your userid on kleinanzeigen.de.

View File

@@ -4,15 +4,17 @@ kleingebaeck - kleinanzeigen.de backup tool
=head1 SYNOPSIS =head1 SYNOPSIS
This is kleingebaeck, the kleinanzeigen.de backup tool.
Usage: kleingebaeck [-dvVhmoc] [<ad-listing-url>,...] Usage: kleingebaeck [-dvVhmoc] [<ad-listing-url>,...]
Options: Options:
--user,-u <uid> Backup ads from user with uid <uid>. --user -u <uid> Backup ads from user with uid <uid>.
--debug, -d Enable debug output. --debug -d Enable debug output.
--verbose,-v Enable verbose output. --verbose -v Enable verbose output.
--output-dir,-o <dir> Set output dir (default: current directory) --outdir -o <dir> Set output dir (default: current directory)
--manual,-m Show manual. --limit -l <num> Limit the ads to download to <num>, default: load all.
--config,-c <file> Use config file <file> (default: ~/.kleingebaeck). --config -c <file> Use config file <file> (default: ~/.kleingebaeck).
--manual -m Show manual.
--help -h Show usage.
--version -V Show program version.
=head1 DESCRIPTION =head1 DESCRIPTION
@@ -26,21 +28,37 @@ title, body, price etc. All images will be downloaded as well.
=head1 CONFIGURATION =head1 CONFIGURATION
You can create a config file to save typing. By default You can create a config file to save typing. By default
C<~/.kleingebaeck.hcl> is being used but you can specify one with C<~/.kleingebaeck> is being used but you can specify one with C<-c> as
C<-c> as well. well. We use TOML as our configuration language. See
L<https://toml.io/en/>.
Format is simple: Format is pretty simple:
user = 1010101 user = 1010101
verbose = true loglevel = "verbose"
outdir = "test" outdir = "test"
template = "" template = """
Title: {{.Title}}
Price: {{.Price}}
Id: {{.Id}}
Category: {{.Category}}
Condition: {{.Condition}}
Created: {{.Created}}
Be careful if you want to change the template. The default one looks like this: {{.Text}}
"""
Title: %s\nPrice: %s\nId: %s\nCategory: %s\nCondition: %s\nCreated: %s\nBody:\n\n%s\n Be careful if you want to change the template. The variable is a
multiline string surrounded by three double quotes. You can leave out
certain fields and use any formatting you like. Refer to
L<https://pkg.go.dev/text/template> for details on how to write a
template.
If you change it, include 7 times the '%s' format tag. If you're on windows and want to customize the output directory, put
it into single quotes to avoid the backslashes interpreted as escape
chars like this:
outdir = 'C:\Data\Ads'
=head1 SETUP =head1 SETUP
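
The template format documented in the CONFIGURATION section can be tried out in isolation. The following is only a minimal sketch: the `Ad` struct is an assumed stand-in containing the string fields named in the documentation (the real type in the repository may differ), and `render` is a hypothetical helper, not a function from this change. It uses only Go's standard `text/template` package.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Ad is an assumed stand-in for the tool's ad type; it only carries the
// fields referenced by the documented template.
type Ad struct {
	Title, Price, Id, Category, Condition, Created, Text string
}

// render is a hypothetical helper: it parses tmpl and executes it with ad,
// returning the rendered listing as a string.
func render(tmpl string, ad Ad) (string, error) {
	t, err := template.New("adlisting").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, ad); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Fields can be left out of the template, as the documentation notes.
	out, err := render("Title: {{.Title}}\nPrice: {{.Price}}\n\n{{.Text}}\n",
		Ad{Title: "Bike", Price: "50 €", Text: "Good condition."})
	if err != nil {
		fmt.Println("template error:", err)
		return
	}
	fmt.Print(out)
}
```

A template that references a field the struct does not have fails when it is executed, so a typo in a custom template surfaces as an error rather than as silently missing output.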

main.go

@@ -25,30 +25,8 @@ import (
 "runtime/debug"
 "github.com/lmittmann/tint"
-flag "github.com/spf13/pflag"
 )
-const VERSION string = "0.0.3"
-const Useragent string = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) " +
-"AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
-const Baseuri string = "https://www.kleinanzeigen.de"
-const Listuri string = "/s-bestandsliste.html"
-const Defaultdir string = "."
-const DefaultTemplate string = "Title: %s\nPrice: %s\nId: %s\nCategory: %s\nCondition: %s\nCreated: %s\nBody:\n\n%s\n"
-const Usage string = `This is kleingebaeck, the kleinanzeigen.de backup tool.
-Usage: kleingebaeck [-dvVhmoc] [<ad-listing-url>,...]
-Options:
---user,-u <uid> Backup ads from user with uid <uid>.
---debug, -d Enable debug output.
---verbose,-v Enable verbose output.
---output-dir,-o <dir> Set output dir (default: current directory)
---manual,-m Show manual.
---config,-c <file> Use config file <file> (default: ~/.kleingebaeck).
-If one or more <ad-listing-url>'s are specified, only backup those,
-otherwise backup all ads of the given user.`
 const LevelNotice = slog.Level(2)
 func main() {
@@ -67,6 +45,7 @@ func Main() int {
 }
 return a
 },
+NoColor: IsNoTty(),
 }
 logLevel.Set(LevelNotice)
@@ -74,37 +53,22 @@ func Main() int {
 logger := slog.New(handler)
 slog.SetDefault(logger)
-showversion := false
-showhelp := false
-showmanual := false
-enabledebug := false
-enableverbose := false
-uid := 0
-configfile := os.Getenv("HOME") + "/.kleingebaeck.hcl"
-dir := ""
-flag.BoolVarP(&enabledebug, "debug", "d", false, "debug mode")
-flag.BoolVarP(&enableverbose, "verbose", "v", false, "be verbose")
-flag.BoolVarP(&showversion, "version", "V", false, "show version")
-flag.BoolVarP(&showhelp, "help", "h", false, "show usage")
-flag.BoolVarP(&showmanual, "manual", "m", false, "show manual")
-flag.IntVarP(&uid, "user", "u", uid, "user id")
-flag.StringVarP(&dir, "output-dir", "o", dir, "where to store ads")
-flag.StringVarP(&configfile, "config", "c", configfile, "config file")
-flag.Parse()
-if showversion {
+conf, err := InitConfig()
+if err != nil {
+return Die(err)
+}
+if conf.Showversion {
 fmt.Printf("This is kleingebaeck version %s\n", VERSION)
 return 0
 }
-if showhelp {
+if conf.Showhelp {
 fmt.Println(Usage)
 return 0
 }
-if showmanual {
+if conf.Showmanual {
 err := man()
 if err != nil {
 return Die(err)
@@ -112,21 +76,17 @@ func Main() int {
 return 0
 }
-conf, err := ParseConfigfile(configfile)
-if err != nil {
-return Die(err)
-}
-if enableverbose || conf.Verbose {
+if conf.Verbose {
 logLevel.Set(slog.LevelInfo)
 }
-if enabledebug {
+if conf.Debug {
 // we're using a more verbose logger in debug mode
 buildInfo, _ := debug.ReadBuildInfo()
 opts := &tint.Options{
 Level: logLevel,
 AddSource: true,
+NoColor: IsNoTty(),
 }
 logLevel.Set(slog.LevelDebug)
@@ -142,50 +102,39 @@ func Main() int {
 slog.Debug("config", "conf", conf)
-if len(dir) == 0 {
-if len(conf.Outdir) > 0 {
-dir = conf.Outdir
-} else {
-dir = Defaultdir
-}
-}
 // prepare output dir
-err = Mkdir(dir)
+err = Mkdir(conf.Outdir)
 if err != nil {
 return Die(err)
 }
-// which template to use
-template := DefaultTemplate
-if len(conf.Template) > 0 {
-template = conf.Template
-}
+if len(conf.Adlinks) >= 1 {
 // directly backup ad listing[s]
-if len(flag.Args()) >= 1 {
-for _, uri := range flag.Args() {
-err := Scrape(uri, dir, template)
+for _, uri := range conf.Adlinks {
+err := Scrape(conf, uri)
 if err != nil {
 return Die(err)
 }
 }
-return 0
-}
+} else if conf.User > 0 {
 // backup all ads of the given user (via config or cmdline)
-if uid == 0 && conf.User > 0 {
-uid = conf.User
-}
-if uid > 0 {
-err := Start(fmt.Sprintf("%d", uid), dir, template)
+err := Start(conf)
 if err != nil {
 return Die(err)
 }
 } else {
-return Die(errors.New("invalid or no user id specified"))
+return Die(errors.New("invalid or no user id or no ad link specified"))
+}
+if conf.StatsCountAds > 0 {
+adstr := "ads"
+if conf.StatsCountAds == 1 {
+adstr = "ad"
+}
+fmt.Printf("Successfully downloaded %d %s with %d images to %s.\n",
+conf.StatsCountAds, adstr, conf.StatsCountImages, conf.Outdir)
+} else {
+fmt.Printf("No ads found.")
 }
 return 0


@@ -40,6 +40,11 @@ for D in $DIST; do
 os=${D/\/*/}
 arch=${D/*\//}
 binfile="releases/${tool}-${os}-${arch}-${version}"
+if test "$os" = "windows"; then
+binfile="${binfile}.exe"
+fi
 tardir="${tool}-${os}-${arch}-${version}"
 tarfile="releases/${tool}-${os}-${arch}-${version}.tar.gz"
 set -x


@@ -23,11 +23,11 @@ import (
 "io"
 "log/slog"
 "net/http"
-"os"
+"path/filepath"
 "strings"
-"sync"
 "astuart.co/goq"
+"golang.org/x/sync/errgroup"
 )
 type Index struct {
@@ -79,15 +79,15 @@ func Get(uri string, client *http.Client) (io.ReadCloser, error) {
 // extract links from all ad listing pages (that is: use pagination)
 // and scrape every page
-func Start(uid string, dir string, template string) error {
+func Start(conf *Config) error {
 client := &http.Client{}
 adlinks := []string{}
-baseuri := Baseuri + Listuri + "?userId=" + uid
+baseuri := fmt.Sprintf("%s%s?userId=%d", Baseuri, Listuri, conf.User)
 page := 1
 uri := baseuri
-slog.Info("fetching ad pages", "user", uid)
+slog.Info("fetching ad pages", "user", conf.User)
 for {
 var index Index
@@ -118,18 +118,22 @@ func Start(uid string, dir string, template string) error {
 uri = baseuri + "&pageNum=" + fmt.Sprintf("%d", page)
 }
-for _, adlink := range adlinks {
-err := Scrape(Baseuri+adlink, dir, template)
+for i, adlink := range adlinks {
+err := Scrape(conf, Baseuri+adlink)
 if err != nil {
 return err
 }
+if conf.Limit > 0 && i == conf.Limit-1 {
+break
+}
 }
 return nil
 }
 // scrape an ad. uri is the full uri of the ad, dir is the basedir
-func Scrape(uri string, dir string, template string) error {
+func Scrape(c *Config, uri string) error {
 client := &http.Client{}
 ad := &Ad{}
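
The limit check added to the loop in Start can be looked at on its own. This is a minimal sketch under assumptions: `processLimited` and its `handle` callback are hypothetical names introduced for illustration and do not exist in the repository; only the `limit > 0 && i == limit-1` condition is taken from the diff.

```go
package main

import "fmt"

// processLimited is a hypothetical helper mirroring the loop in Start:
// it calls handle for every link and stops once limit links have been
// handled. A limit of 0 (or less) means "no limit".
func processLimited(links []string, limit int, handle func(string) error) (int, error) {
	done := 0
	for i, link := range links {
		if err := handle(link); err != nil {
			return done, err
		}
		done++
		// Same condition as in the diff: break after the limit-th item.
		if limit > 0 && i == limit-1 {
			break
		}
	}
	return done, nil
}

func main() {
	links := []string{"/ad/1", "/ad/2", "/ad/3"}
	n, err := processLimited(links, 2, func(link string) error {
		fmt.Println("scraping", link)
		return nil
	})
	fmt.Println(n, err)
}
```

Because the check runs after the item has been handled, a limit larger than the number of links simply never triggers and all links are processed.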
@@ -160,60 +164,43 @@ func Scrape(uri string, dir string, template string) error {
 }
 slog.Debug("extracted ad listing", "ad", ad)
-// prepare output dir
-dir = dir + "/" + ad.Slug
-err = Mkdir(dir)
+// write listing
+err = WriteAd(c.Outdir, ad, c.Template)
 if err != nil {
 return err
 }
-// write ad file
-listingfile := strings.Join([]string{dir, "Adlisting.txt"}, "/")
-f, err := os.Create(listingfile)
-if err != nil {
-return err
-}
-ad.Text = strings.ReplaceAll(ad.Text, "<br/>", "\n")
-_, err = fmt.Fprintf(f, template,
-ad.Title, ad.Price, ad.Id, ad.Category, ad.Condition, ad.Created, ad.Text)
-if err != nil {
-return err
-}
-slog.Info("wrote ad listing", "listingfile", listingfile)
-return ScrapeImages(dir, ad)
+c.IncrAds()
+return ScrapeImages(c, ad)
 }
-func ScrapeImages(dir string, ad *Ad) error {
+func ScrapeImages(c *Config, ad *Ad) error {
 // fetch images
 img := 1
-var wg sync.WaitGroup
-wg.Add(len(ad.Images))
-failure := make(chan string)
+g := new(errgroup.Group)
 for _, imguri := range ad.Images {
-file := fmt.Sprintf("%s/%d.jpg", dir, img)
-go func() {
-defer wg.Done()
+imguri := imguri
+file := filepath.Join(c.Outdir, ad.Slug, fmt.Sprintf("%d.jpg", img))
+g.Go(func() error {
 err := Getimage(imguri, file)
 if err != nil {
-failure <- err.Error()
-return
+return err
 }
 slog.Info("wrote ad image", "image", file)
-}()
+return nil
+})
 img++
 }
-close(failure)
-wg.Wait()
-goterr := <-failure
-if goterr != "" {
-return errors.New(goterr)
+if err := g.Wait(); err != nil {
+return err
 }
+c.IncrImgs(len(ad.Images))
 return nil
 }
@@ -230,13 +217,7 @@ func Getimage(uri, fileName string) error {
 return errors.New("received non 200 response code")
 }
-file, err := os.Create(fileName)
-if err != nil {
-return err
-}
-defer file.Close()
-_, err = io.Copy(file, response.Body)
+err = WriteImage(fileName, response.Body)
 if err != nil {
 return err
 }

store.go

@@ -0,0 +1,78 @@
/*
Copyright © 2023 Thomas von Dein
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
package main
import (
"io"
"log/slog"
"os"
"path/filepath"
"runtime"
"strings"
tpl "text/template"
)
func WriteAd(dir string, ad *Ad, template string) error {
// prepare output dir
dir = filepath.Join(dir, ad.Slug)
err := Mkdir(dir)
if err != nil {
return err
}
// write ad file
listingfile := filepath.Join(dir, "Adlisting.txt")
f, err := os.Create(listingfile)
if err != nil {
return err
}
if runtime.GOOS == "windows" {
ad.Text = strings.ReplaceAll(ad.Text, "<br/>", "\r\n")
} else {
ad.Text = strings.ReplaceAll(ad.Text, "<br/>", "\n")
}
tmpl, err := tpl.New("adlisting").Parse(template)
if err != nil {
return err
}
err = tmpl.Execute(f, ad)
if err != nil {
return err
}
slog.Info("wrote ad listing", "listingfile", listingfile)
return nil
}
func WriteImage(filename string, reader io.ReadCloser) error {
file, err := os.Create(filename)
if err != nil {
return err
}
defer file.Close()
_, err = io.Copy(file, reader)
if err != nil {
return err
}
return nil
}
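
WriteAd picks the line ending by operating system and builds paths with filepath.Join. The following sketch isolates those two ideas so they can be exercised on any platform; `lineBreak` and `convertBreaks` are hypothetical helpers that take the OS name as a parameter, whereas the code above reads `runtime.GOOS` directly.

```go
package main

import (
	"fmt"
	"path/filepath"
	"runtime"
	"strings"
)

// lineBreak is a hypothetical helper returning the newline sequence used
// for the given GOOS value: CRLF on windows, LF everywhere else.
func lineBreak(goos string) string {
	if goos == "windows" {
		return "\r\n"
	}
	return "\n"
}

// convertBreaks replaces the "<br/>" tags found in the ad text with the
// platform's line ending, as WriteAd does.
func convertBreaks(text, goos string) string {
	return strings.ReplaceAll(text, "<br/>", lineBreak(goos))
}

func main() {
	fmt.Printf("%q\n", convertBreaks("line one<br/>line two", runtime.GOOS))
	// filepath.Join inserts the separator of the platform it runs on.
	fmt.Println(filepath.Join("backup", "my-ad-slug", "Adlisting.txt"))
}
```

Passing the OS name in as an argument is purely a convenience for the sketch; it makes both branches checkable without running on Windows.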

util.go

@@ -22,6 +22,9 @@ import (
 "errors"
 "os"
 "os/exec"
+"runtime"
+"github.com/mattn/go-isatty"
 )
 func Mkdir(dir string) error {
@@ -53,3 +56,13 @@ func man() error {
 return nil
 }
+// returns TRUE if stdout is NOT a tty or windows
+func IsNoTty() bool {
+if runtime.GOOS == "windows" || !isatty.IsTerminal(os.Stdout.Fd()) {
+return true
+}
+// it is a tty
+return false
+}