Discussion:
Chinese downloads overloading my website
legg
2024-03-07 17:49:30 UTC
Permalink
Got a note from an ISP today indicating that my website
was suspended due to data transfer over-use for the month. (>50G)
It's only the 7th day of the month and this hadn't been a
problem in the 6 years they'd hosted the service.

Turns out that three Chinese sources had downloaded the same
set of files, each 262 times. That would do it.

So anyone else looking to update bipolar semiconductor,
packaging or SPICE parameter spreadsheets, to look at K.A. Pullen's
'Conductance Design Curve Manual', or to grab any of the other bits
stored at ve3ute.ca is out of luck for the rest of the month.

Seems strange that the same three addresses downloaded the
same files, the same number of times. Is this a denial of
service attack?

RL
John R Walliker
2024-03-07 19:35:46 UTC
Permalink
Post by legg
Got a note from an ISP today indicating that my website
was suspended due to data transfer over-use for the month. (>50G)
It's only the 7th day of the month and this hadn't been a
problem in the 6 years they'd hosted the service.
Turns out that three chinese sources had downloaded the same
set of files, each 262 times. That would do it.
So, anyone else looking to update bipolar semiconductor,
packaging or spice parameter spreadsheets; look at K.A.Pullen's
'Conductance Design Curve Manual' or any of the other bits
stored at ve3ute.ca are out of luck, for the rest of the month .
Seems strange that the same three addresses downloaded the
same files, the same number of times. Is this a denial of
service attack?
RL
I have seen DNS servers in China poisoned in such a way that lookups
of sites that are deemed to be inappropriate are responded to with the
address of some random but genuine site. This happened to a company in
the UK and resulted in a huge amount of traffic.
Why such a DNS poisoning would lead to lots of downloads is less obvious.
John
Don Y
2024-03-07 19:40:32 UTC
Permalink
Post by legg
Got a note from an ISP today indicating that my website
was suspended due to data transfer over-use for the month. (>50G)
It's only the 7th day of the month and this hadn't been a
problem in the 6 years they'd hosted the service.
Turns out that three chinese sources had downloaded the same
set of files, each 262 times. That would do it.
So, anyone else looking to update bipolar semiconductor,
packaging or spice parameter spreadsheets; look at K.A.Pullen's
'Conductance Design Curve Manual' or any of the other bits
stored at ve3ute.ca are out of luck, for the rest of the month .
Seems strange that the same three addresses downloaded the
same files, the same number of times. Is this a denial of
service attack?
Of sorts.

You might look at the *times* to see if it looks "mechanical"
or "human initiated".

You could change your "service" to one that delivers *requested*
content; email driven so you can insert your own metering function
in that loop. Or, a combination of the two -- hide the content
and return a one-time, unique, time-limited URL as the result of
an *approved* email request...

[Or, you can *hide* your site and only make it available by
invitation]
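
A rough sketch of that approved-request, one-time-URL idea in PHP (assuming the host runs PHP; the domain, file names and paths here are made up for illustration): the approval step mints a token, and the fetch script honours it once, within a day.

<?php
// issue_link.php -- run once an emailed request has been approved (sketch only)
$file  = 'ccdm_scans.zip';                        // hypothetical file name
$token = bin2hex(random_bytes(16));               // unguessable one-time token
file_put_contents('tokens/' . $token, json_encode(['file' => $file, 'issued' => time()]));
echo "https://example.org/fetch.php?t=$token\n";  // mail this URL back to the requester
?>

<?php
// fetch.php -- serves the file once, within 24 hours of issue (sketch only)
$t = $_GET['t'] ?? '';
if (!preg_match('/^[0-9a-f]{32}$/', $t) || !is_file("tokens/$t")) { http_response_code(404); exit; }
$info = json_decode(file_get_contents("tokens/$t"), true);
unlink("tokens/$t");                              // one-time: the token is gone after first use
if (time() - $info['issued'] > 86400) { http_response_code(410); exit; }
$real = '/home/site/private/' . basename($info['file']);   // files kept outside the web tree
header('Content-Type: application/octet-stream');
header('Content-Length: ' . filesize($real));
readfile($real);
?>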
legg
2024-03-07 22:12:27 UTC
Permalink
A quick response from the ISP says they're blocking
the three hosts and 'monitoring the situation'.

All the downloading was occurring between certain
hours of the day, in sequence - first one host
between 11 and 12 pm, one day's rest, then the
second host at the same time on the third day,
then the third host on the fourth day.

Same files, 262 times each, 17GB each.

Not normal web activity, as I know it.

RL
Jan Panteltje
2024-03-08 06:43:49 UTC
Permalink
On a sunny day (Thu, 07 Mar 2024 17:12:27 -0500) it happened legg
Post by legg
A quick response from the ISP says they're blocking
the three hosts and 'monitoring the situatio'.
All the downloading was occuring between certain
hours of the day in sequence - first one host
between 11 and 12pm. one days rest, then the
second host at the same timeon the third day,
then the third host on the fourth day.
Same files 262 times each, 17Gb each.
Not normal web activity, as I know it.
RL
Many sites have an 'I'm not a bot' sort of thing you have to go through to get access.
legg
2024-03-10 01:59:19 UTC
Permalink
Post by Jan Panteltje
On a sunny day (Thu, 07 Mar 2024 17:12:27 -0500) it happened legg
Post by legg
A quick response from the ISP says they're blocking
the three hosts and 'monitoring the situatio'.
All the downloading was occuring between certain
hours of the day in sequence - first one host
between 11 and 12pm. one days rest, then the
second host at the same timeon the third day,
then the third host on the fourth day.
Same files 262 times each, 17Gb each.
Not normal web activity, as I know it.
RL
Many sites have a 'I m not a bot' sort of thing you have to go through to get access.
Any idea what's involved - preferably anything that doesn't
owe to Google?

ISP bumped up the limit for this month as a courtesy, after blocking the
first three hosts, but a fourth host just gobbled that up.

3rd March
1.82.160.27
Chinanet Shaanxi, China telecom #56 Gaoxin St Beijing 100032

5th March
183.197.52.166
China Mobile Communications

6th March
42.184.167.97
Chinanet Heilongjiang, Heilongjiang Telecom,#178 Zhongshan Rd Haerbin
150040

8th March
106.46.35.206
Chinanet Henan, Henan Telecom

I'd like to limit traffic data volume by any host to <500M,
or <50M in 24hrs. It's all ftp.

Have access to Plesk, but am unfamiliar with its capabilities
and clued out on how to do much of anything save file transfer.

RL
Jan Panteltje
2024-03-10 06:08:15 UTC
Permalink
On a sunny day (Sat, 09 Mar 2024 20:59:19 -0500) it happened legg
Post by legg
Post by Jan Panteltje
On a sunny day (Thu, 07 Mar 2024 17:12:27 -0500) it happened legg
Post by legg
A quick response from the ISP says they're blocking
the three hosts and 'monitoring the situatio'.
All the downloading was occuring between certain
hours of the day in sequence - first one host
between 11 and 12pm. one days rest, then the
second host at the same timeon the third day,
then the third host on the fourth day.
Same files 262 times each, 17Gb each.
Not normal web activity, as I know it.
RL
Many sites have a 'I m not a bot' sort of thing you have to go through to get access.
Any idea what's involved - preferably anything that doesn't
owe to Google?
...
I'd like to limit traffic data volume by any host to <500M,
or <50M in 24hrs. It's all ftp.
I no longer run an ftp server (haven't for many years now);
the old one here needed a password.
Some parts of my website used to be password protected.
When I ask google for "how to add a captcha to your website"
I see many solutions, for example this:
https://www.oodlestechnologies.com/blogs/create-a-captcha-validation-in-html-and-javascript/

Maybe some HTML guru here knows?
Liz Tuddenham
2024-03-10 09:28:12 UTC
Permalink
Post by Jan Panteltje
On a sunny day (Sat, 09 Mar 2024 20:59:19 -0500) it happened legg
Post by Jan Panteltje
Post by Jan Panteltje
On a sunny day (Thu, 07 Mar 2024 17:12:27 -0500) it happened legg
Post by legg
A quick response from the ISP says they're blocking
the three hosts and 'monitoring the situatio'.
All the downloading was occuring between certain
hours of the day in sequence - first one host
between 11 and 12pm. one days rest, then the
second host at the same timeon the third day,
then the third host on the fourth day.
Same files 262 times each, 17Gb each.
Not normal web activity, as I know it.
RL
Many sites have a 'I m not a bot' sort of thing you have to go through
to get access.
Any idea what's involved - preferably anything that doesn't
owe to Google?
...
I'd like to limit traffic data volume by any host to <500M,
or <50M in 24hrs. It's all ftp.
I no longer run an ftp server (for many years now),
the old one here needed a password.
Some parts of my website used to be password protected.
When I ask google for "how to add a captcha to your website"
https://www.oodlestechnologies.com/blogs/create-a-captcha-validation-in-html-and-javascript/
Maybe some html guru here nows?
If you can password-protect the pages, why not do that but include the
password in the text so that any human can see it and copy it? i.e.

~~~~~~~~
To prove you are human you must type in the password, the password is
ABC
Password: ___

~~~~~~~~
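
A minimal sketch of such a gate (assuming the host can run PHP and the page gets a .php extension; the password 'ABC' is just the visible placeholder above):

<?php
// A visible-password gate: any human can read the password and type it (sketch only)
session_start();
if (($_POST['pw'] ?? '') === 'ABC') {      // 'ABC' is shown in plain text below
    $_SESSION['human'] = true;
}
if (empty($_SESSION['human'])) {
    echo 'To prove you are human you must type in the password. The password is ABC.';
    echo '<form method="post">Password: <input name="pw"> <input type="submit"></form>';
    exit;
}
// ...the protected page content (the download links) goes below this point...
?>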

I don't think there is an easy way of writing anything automatic in the
HTML Body text but you might be able to add a script to the Head that
checks the IP address and blocks the ones you don't want.

If you can write PHP, you could easily write your own version of Captcha
or write a script that limits the number of repeat visits from the same
IP address in a given time. Mixing PHP into HTML pages is easy but you
have to change the file extension of each page from .htm to .php

Servers generally have facilities for PHP already built-in and the W3
Schools tutorials can get you started.
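
For the repeat-visit limit mentioned above, a crude per-IP, per-day byte counter might look like this (a sketch only; it assumes downloads are routed through a PHP script rather than fetched as bare files, and the 50 MB figure and paths are just examples):

<?php
// dl.php?f=name -- refuse an IP once it has pulled ~50 MB in the current day (sketch only)
$limit = 50 * 1024 * 1024;                                  // example daily cap per IP
$file  = basename($_GET['f'] ?? '');                        // basename() keeps requests in the folder
$path  = '/home/site/private/' . $file;                     // hypothetical directory outside the web root
if ($file === '' || !is_file($path)) { http_response_code(404); exit; }

$counter = 'counters/' . md5($_SERVER['REMOTE_ADDR'] . date('Y-m-d'));  // one counter per IP per day
$used    = is_file($counter) ? (int)file_get_contents($counter) : 0;
if ($used + filesize($path) > $limit) { http_response_code(429); exit('Daily quota reached.'); }
file_put_contents($counter, $used + filesize($path));

header('Content-Type: application/octet-stream');
header('Content-Length: ' . filesize($path));
readfile($path);
?>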
--
~ Liz Tuddenham ~
(Remove the ".invalid"s and add ".co.uk" to reply)
www.poppyrecords.co.uk
Jeff Liebermann
2024-03-10 19:29:43 UTC
Permalink
Post by Liz Tuddenham
If you can password-protect the pages, why not do that but include the
password in the text so that any human can see it and copy it? i.e.
~~~~~~~~
To prove you are human you must type in the password, the password is
ABC
Password: ___
~~~~~~~~
That doesn't work if humans are doing the work in human Captcha
solving services:

"I Was a Human CAPTCHA Solver"
<https://www.f5.com/labs/articles/cisotociso/i-was-a-human-captcha-solver>

More of the same:
<https://www.google.com/search?q=captcha+solving+services>
--
Jeff Liebermann ***@cruzio.com
PO Box 272 http://www.LearnByDestroying.com
Ben Lomond CA 95005-0272
Skype: JeffLiebermann AE6KS 831-336-2558
legg
2024-03-10 17:47:48 UTC
Permalink
Post by Jan Panteltje
On a sunny day (Sat, 09 Mar 2024 20:59:19 -0500) it happened legg
Post by legg
Post by Jan Panteltje
On a sunny day (Thu, 07 Mar 2024 17:12:27 -0500) it happened legg
Post by legg
A quick response from the ISP says they're blocking
the three hosts and 'monitoring the situatio'.
All the downloading was occuring between certain
hours of the day in sequence - first one host
between 11 and 12pm. one days rest, then the
second host at the same timeon the third day,
then the third host on the fourth day.
Same files 262 times each, 17Gb each.
Not normal web activity, as I know it.
RL
Many sites have a 'I m not a bot' sort of thing you have to go through to get access.
Any idea what's involved - preferably anything that doesn't
owe to Google?
...
I'd like to limit traffic data volume by any host to <500M,
or <50M in 24hrs. It's all ftp.
I no longer run an ftp server (for many years now),
the old one here needed a password.
Some parts of my website used to be password protected.
When I ask google for "how to add a captcha to your website"
https://www.oodlestechnologies.com/blogs/create-a-captcha-validation-in-html-and-javascript/
Maybe some html guru here nows?
That looks like it's good for controlling access to an HTML page.
So far the Chinese are accessing the top-level index, where
files are offered for download at a click.

Ideally, if they can't access the top level, direct-address
access to the files might be prevented?

The website's down after a fifth excursion pushed volumes above
85G on a 70G temporary extension. What's the bet it was 17G
accumulated in 262 'visits'.

Can't ID that final host's IP address while I'm locked out.

Luckily (~) for users, you can still access most of the useful
files, updated in January 2024, through the Wayback Machine.

https://web.archive.org/web/20240000000000*/http://www.ve3ute.ca/

Probably the best place for it, in some people's opinion, anyways.

YOU can make stuff available to others, in the future, by 'suggesting'
relevant site addresses to the Internet Archive, if they're not
already being covered.

Once a 'captcha' or other security device is added, you can kiss
Wayback updates goodbye, as most bots will get the message.
I don't mind bots - they can do good work.

Pity you can't just put stuff up in the public domain without
this kind of bullshit.

RL
Don Y
2024-03-10 20:48:54 UTC
Permalink
Post by legg
So far the chinese are accessing the top level index, where
files are offered for download at a click.
Ideally, if they can't access the top level, a direct address
access to the files might be prevented?
Many file sharing services deliberately do NOT offer access
to a "folder index" for similar reasons. This allows the
owner of the file(s) to publish specific links to individual files
while keeping the folder, itself, hidden.

This is done by creating unique URLs for each file.
I.e., instead of ..../foldername/filename you publish
.../foldername/pseudorandomappearingstring/filename
where "foldername" is some bogus sequence of characters
and pseudorandomappearingstring varies from file to file!
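
One way to make such strings without keeping a lookup table is to derive them from a secret, for instance (a sketch; the secret, domain and file name are placeholders):

<?php
// Derive a stable but unguessable path component for each published file (sketch only)
$secret = 'a-long-random-string-known-only-to-the-owner';
$file   = 'bipolar_spice_params.xls';                       // hypothetical file name
$slug   = substr(hash_hmac('sha256', $file, $secret), 0, 16);
// Publish https://example.org/f/<slug>/<file> and have the serving script
// recompute the HMAC from <file>; a wrong or guessed slug just gets a 404.
echo "/f/$slug/$file\n";
?>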
Post by legg
The website's down after a fifth excursion pushed volumes above
85g on a 70G temporary extension. What's the bet it was 17G
accumulated in 262 'visits'.
Can't ID that final hosts IP address while I'm locked out.
Luckily (~) for users, you can still access most of the usefull
files, updated in January 2024, through the Wayback Machine.
https://web.archive.org/web/20240000000000*/http://www.ve3ute.ca/
Probably the best place for it, in some people's opinion, anyways.
There's no guarantee that the *files* will be accessible via those
links. I have often gone looking for something that has disappeared
from its original home and been able to find the *pages* that reference
them but not the actual *payloads*. (This happened as recently as
yesterday.)

Pages take up far less space than payloads, typically, so it is
understandable that they would capture the page but not the
files referenced from it.
Post by legg
YOU can make stuff available to others, in the future, by 'suggesting'
relevent site addresses to the Internet Archive, if they're not
already being covered.
Once a 'captcha' or other security device is added, you can kiss
Wayback updates goodbye, as most bots will get the message.
I don't mind bots - thay can do good work.
Pity you can't just put stuff up in the public domain without
this kind of bullshit.
Making it accessible to *all* means you have to expect *all* to
access it. Hard to blame your ISP for wanting to put a limit on the
traffic to the site (my AUP forbids me from operating a public
server so I have to use more clandestine means of "publishing")

If demand is low enough (you can determine that by looking at past
"legitimate" traffic), you can insert yourself in the process by
requesting a form completion: "These are the things that I have
available. Type the name of the item into the box provided"

This eliminates LINKS on the page and requires someone who can
read the text to identify the item(s) of interest. This allows
you to intervene even if the "user" is not a 'bot but a poorly
paid urchin trying to harvest content.
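
A sketch of that 'type the name of the item' gate in PHP (file names and paths are illustrative): the page carries no links, and only an exact, typed name is served.

<?php
// request.php -- no download links; the visitor must read the list and type a name (sketch only)
$available = ['ccdm_curves.zip', 'bipolar_xref.xls', 'package_data.xls'];   // hypothetical list
$want = basename(trim($_POST['item'] ?? ''));
if (in_array($want, $available, true)) {
    header('Content-Type: application/octet-stream');
    readfile('/home/site/private/' . $want);               // files kept outside the web tree
    exit;
}
echo 'These are the things that I have available: ' . implode(', ', $available) . '<br>';
echo '<form method="post">Type the name of the item: <input name="item"> <input type="submit"></form>';
?>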
Jan Panteltje
2024-03-11 06:05:26 UTC
Permalink
On a sunny day (Sun, 10 Mar 2024 13:47:48 -0400) it happened legg
Post by legg
Post by Jan Panteltje
On a sunny day (Sat, 09 Mar 2024 20:59:19 -0500) it happened legg
Post by legg
Post by Jan Panteltje
On a sunny day (Thu, 07 Mar 2024 17:12:27 -0500) it happened legg
Post by legg
A quick response from the ISP says they're blocking
the three hosts and 'monitoring the situatio'.
All the downloading was occuring between certain
hours of the day in sequence - first one host
between 11 and 12pm. one days rest, then the
second host at the same timeon the third day,
then the third host on the fourth day.
Same files 262 times each, 17Gb each.
Not normal web activity, as I know it.
RL
Many sites have a 'I m not a bot' sort of thing you have to go through to get access.
Any idea what's involved - preferably anything that doesn't
owe to Google?
...
I'd like to limit traffic data volume by any host to <500M,
or <50M in 24hrs. It's all ftp.
I no longer run an ftp server (for many years now),
the old one here needed a password.
Some parts of my website used to be password protected.
When I ask google for "how to add a captcha to your website"
https://www.oodlestechnologies.com/blogs/create-a-captcha-validation-in-html-and-javascript/
Maybe some html guru here nows?
That looks like it's good for accessing an html page.
So far the chinese are accessing the top level index, where
files are offered for download at a click.
Ideally, if they can't access the top level, a direct address
access to the files might be prevented?
What I am doing now is using an https://mywebsite/pub/ directory
with lots of files in it that I want to publish in, for example, this newsgroup;
I then just post a direct link to the file.
So it has no index file and no links to it from the main site.
It has many subdirectories too.
https://panteltje.nl/pub/GPS_to_USB_module_component_site_IXIMG_1360.JPG
https://panteltje.nl/pub/pwfax-0.1/README

So you need the exact link to access anything -
fine for publishing here...
Maybe Usenet conversations are saved somewhere? Google still holds the archive?
I have most postings saved here on the Raspberry Pi 4 8GB I am using for web browsing and Usenet,
for what I found interesting back to 2006; older ones, back to maybe 1998, are on the old PC upstairs.

raspberrypi: ~/.NewsFleX # l
total 692
-rw-r--r-- 1 root root 21971 Jan 9 2006 NewsFleX.xpm
-rw-r--r-- 1 root root 2576 Jul 30 2006 newsservers.dat.bak
drwxr-xr-x 5 root root 4096 Apr 1 2008 news.isu.edu.tw/
drwxr-xr-x 5 root root 4096 Apr 1 2008 textnews.news.cambrium.nl/
-rw-r--r-- 1 root root 1 Mar 5 2009 global_custom_head
drwx------ 4 root root 4096 Dec 6 2009 http/
-rw-r--r-- 1 root root 99 Apr 4 2010 signature.org
-rw-r--r-- 1 root root 8531 Apr 4 2010 signature~
-rw-r--r-- 1 root root 8531 Apr 4 2010 signature
-rw-r--r-- 1 root root 816 Nov 9 2011 filters.dat.OK
drwxr-xr-x 3 root root 4096 Jul 5 2012 nntp.ioe.org/
drwxr-xr-x 2 root root 4096 Mar 30 2015 news.altopia.com/
drwxr-xr-x 25 root root 4096 Mar 1 2020 news2.datemas.de/
drwxr-xr-x 109 root root 4096 Jun 1 2020 news.albasani.net/
drwxr-xr-x 2 root root 4096 Nov 28 2020 setup/
drwxr-xr-x 10 root root 4096 Mar 1 2021 news.ziggo.nl/
drwxr-xr-x 6 root root 4096 Jun 1 2021 news.chello.nl/
drwxr-xr-x 2 root root 4096 Aug 19 2021 news.neodome.net/
drwxr-xr-x 6 root root 4096 Sep 1 2022 news.tornevall.net/
drwxr-xr-x 156 root root 4096 Nov 1 2022 news.datemas.de/
drwxr-xr-x 23 root root 4096 Jan 1 2023 news.aioe.cjb.net/
drwxr-xr-x 4 root root 4096 Jan 1 2023 news.cambrium.nl/
drwxr-xr-x 52 root root 4096 Jan 1 2023 news.netfront.net/
drwxr-xr-x 60 root root 4096 Feb 1 2023 freenews.netfront.net/
-rw-r--r-- 1 root root 1651 Feb 1 2023 urls.dat~
drwxr-xr-x 49 root root 4096 Apr 2 2023 freetext.usenetserver.com/
-rw-r--r-- 1 root root 1698 Apr 18 2023 urls.dat
drwxr-xr-x 15 root root 4096 Aug 2 2023 localhost/
drwxr-xr-x 11 root root 4096 Dec 15 06:57 194.177.96.78/
drwxr-xr-x 190 root root 4096 Dec 15 06:58 nntp.aioe.org/
-rw-r--r-- 1 root root 1106 Feb 23 06:43 error_log.txt
-rw-r--r-- 1 root root 966 Feb 23 13:33 filters.dat~
-rw-r--r-- 1 root root 973 Mar 2 06:28 filters.dat
drwxr-xr-x 57 root root 4096 Mar 3 11:42 news.eternal-september.org/
drwxr-xr-x 14 root root 4096 Mar 3 11:42 news.solani.org/
drwxr-xr-x 197 root root 4096 Mar 3 11:42 postings/
-rw-r--r-- 1 root root 184263 Mar 6 04:45 newsservers.dat~
-rw-r--r-- 1 root root 2407 Mar 6 04:45 posting_periods.dat~
-rw-r--r-- 1 root root 0 Mar 6 06:27 lockfile
-rw-r--r-- 1 root root 87 Mar 6 06:27 kernel_version
-rw-r--r-- 1 root root 107930 Mar 6 06:27 fontlist.txt
-rw-r--r-- 1 root root 184263 Mar 6 06:27 newsservers.dat
-rw-r--r-- 1 root root 2407 Mar 6 06:27 posting_periods.dat
....
lots of newsservers came and went over time...

I have backups of my website on harddisk, optical and of course my hosting provider.
jim whitby
2024-03-11 06:43:34 UTC
Permalink
Post by Jan Panteltje
On a sunny day (Sun, 10 Mar 2024 13:47:48 -0400) it happened legg
Post by legg
Post by Jan Panteltje
On a sunny day (Sat, 09 Mar 2024 20:59:19 -0500) it happened legg
Post by Jan Panteltje
On a sunny day (Thu, 07 Mar 2024 17:12:27 -0500) it happened legg
A quick response from the ISP says they're blocking the three hosts
and 'monitoring the situatio'.
All the downloading was occuring between certain hours of the day in
sequence - first one host between 11 and 12pm. one days rest, then
the second host at the same timeon the third day,
then the third host on the fourth day.
Same files 262 times each, 17Gb each.
Not normal web activity, as I know it.
RL
Many sites have a 'I m not a bot' sort of thing you have to go through to get access.
Any idea what's involved - preferably anything that doesn't owe to
Google?
...
I'd like to limit traffic data volume by any host to <500M,
or <50M in 24hrs. It's all ftp.
I no longer run an ftp server (for many years now),
the old one here needed a password.
Some parts of my website used to be password protected.
When I ask google for "how to add a captcha to your website"
https://www.oodlestechnologies.com/blogs/create-a-captcha-validation-in-html-and-javascript/
Post by Jan Panteltje
Post by legg
Post by Jan Panteltje
Maybe some html guru here nows?
That looks like it's good for accessing an html page.
So far the chinese are accessing the top level index, where files are
offered for download at a click.
Ideally, if they can't access the top level, a direct address access to
the files might be prevented?
What I am doing now is using a html://mywebsite/pub/ directory with lots
of files in it that I want to publish in for example this newsgroup,
I then just post a direct link to that file.
So it has no index file and no links to it from the main site.
It has many sub directories too.
https://panteltje.nl/pub/
GPS_to_USB_module_component_site_IXIMG_1360.JPG
Post by Jan Panteltje
https://panteltje.nl/pub/pwfax-0.1/README
So you need the exact link to access anything fine for publishing
here...
Maybe Usenet conversations are saved somewhere ? google still holds the archive?
I have most postings saved here on the Raspberry Pi4 8GB I am using for
web browsing and Usenet for what I found interesting back to 2006, older
to back 1998 maybe on the old PC upstairs
<snip quoted directory listing>
lots of newsservers came and went over time...
I have backups of my website on harddisk, optical and of course my hosting provider.
You may find the file:

/etc/hosts.deny

useful in this case; you can block by name(s) or IP(s).
See 'man hosts.deny'
for more info.
--
Jim Whitby


Famous, adj.:
Conspicuously miserable.
-- Ambrose Bierce, "The Devil's Dictionary"
----------------------
Mageia release 9 (Official) for x86_64
6.6.18-server-1.mga9 unknown
----------------------
Don Y
2024-03-11 09:02:47 UTC
Permalink
Post by jim whitby
/etc/hosts.deny
useful in this case, you can block by name(s) or ip(s).
Man hosts,deny
for more info
My read is not that *he* is having traffic throttled to a
server that *he* operates but, rather, that traffic to
a (virtual) server that his ISP operates on his behalf
is being throttled. I.e., his subscription allows 50GB/month
(and some amount of storage space) and that is being exceeded
by "unfriendly" clients.

As he has no direct control over traffic, he is at the mercy of
unknown (in this case, chinese) users to limit THEIR accesses
to his (virtual) site. I.e., his *provider* needs to restrict
unwanted accesses.

Sort of like complaining to your cellular provider that you are
getting too many text messages from people that you don't want
to hear from and these are eating into your monthly quota...
Jan Panteltje
2024-03-11 09:53:44 UTC
Permalink
On a sunny day (Mon, 11 Mar 2024 06:43:34 -0000 (UTC)) it happened jim whitby
Post by jim whitby
/etc/hosts.deny
useful in this case, you can block by name(s) or ip(s).
Man hosts,deny
for more info
I wrote a small script years ago using Linux iptables to reject bad IP addresses.

raspberrypi: ~ # cat /usr/local/sbin_pi_95/ireject
# this is called to add an input deny for an IP address to iptables,
# and save the configuration.

if [ "$1" = "" ]
then
echo "Usage: reject IP_address"
exit 1
fi

# OLD ipchains
##ipchains -A input -s $1 -l -j REJECT
#ipchains -L
##ipchains-save > /root/firewall
##echo "reject: ipchains configuration written to /root/firewall"

#iptables -A INPUT -s $1 -p all -j REJECT
#iptables -A INPUT -s $1 -p all -j DROP

echo "executing iptables -A INPUT -s $1 -p all -j DROP"
iptables -A INPUT -s $1 -p all -j DROP

echo "executing iptables -A OUTPUT -s $1 -p all -j REJECT"
iptables -A OUTPUT -s $1 -p all -j REJECT

iptables-save > /root/firewall2

exit 0

There is another one, 'load-firewall', somewhere.
raspberrypi: ~ # cat /usr/local/sbin_pi_95/load-firewall
iptables -F
#/sbin/ipchains-restore < /root/firewall
/sbin/iptables-restore < /root/firewall2



There were many, many entries in /root/firewall back then; it was daily work to keep track of attacks.
Now I am on a dynamic IP address and the website is handled by a company,
which saves a lot of time.

Things evolve all the time; iptables also sets up this Raspberry Pi with 8 GB of memory as a router.
It runs with a Huawei 4G USB stick at IP 192.168.8.100 for the net connection, anywhere in Europe I think.
Another script:

raspberrypi: # cat /usr/local/sbin/start_4g_router
#!/usr/bin/bash

iptables -F

route add -net 192.168.0.0/16 dev eth0

echo 1 >/proc/sys/net/ipv4/ip_forward

iptables -t nat -A POSTROUTING ! -d 192.168.0.0/16 -o eth1 -j SNAT --to-source 192.168.8.100
sleep 1

ifconfig eth0 down
sleep 1

ifconfig eth0 192.168.178.1 up
sleep 1

vnstat -i eth1 -s
sleep 1

# default is set to 192.168.8.1, using 8.8.8.8 and 8.8.4.4 google name server lookup
cp /etc/resolv.conf.GOOGLE /etc/resolv.conf
sleep 1

# reduce swapping
sysctl vm.swappiness=5

echo "ready"


There is more, but then again, things change over time too.
legg
2024-03-11 14:40:16 UTC
Permalink
Post by Jan Panteltje
On a sunny day (Mon, 11 Mar 2024 06:43:34 -0000 (UTC)) it happened jim whitby
Post by jim whitby
/etc/hosts.deny
useful in this case, you can block by name(s) or ip(s).
Man hosts,deny
for more info
I wrote a small script years ago using Linux iptables to reject bad IP adresses.
raspberrypi: ~ # cat /usr/local/sbin_pi_95/ireject
# this is called to add a input deny for an IP addres to ipchains,
# and save the configuration.
if [ "$1" = "" ]
then
echo "Usage: reject IP_address"
exit 1
fi
# OLD ipchains
##ipchains -A input -s $1 -l -j REJECT
#ipchains -L
##ipchains-save > /root/firewall
##echo "reject: ipchains configuration written to /root/firewall"
#iptables -A INPUT -s $1 -p all -j REJECT
#iptables -A INPUT -s $1 -p all -j DROP
echo "executing iptables -A INPUT -s $1 -p all -j DROP"
iptables -A INPUT -s $1 -p all -j DROP
echo "executing iptables -A OUTPUT -s $1 -p all -j REJECT"
iptables -A OUTPUT -s $1 -p all -j REJECT
iptables-save > /root/firewall2
exit 0
Therr is an other one 'load_firewall somewhere.
raspberrypi: ~ # cat /usr/local/sbin_pi_95/load-firewall
iptables -F
#/sbin/ipchains-restore < /root/firewall
/sbin/iptables-restore < /root/firewall2
There were many many entries in /root/firewall back then, daily work to keep track of attacks.
Now I am on a dynamic IP address and the website is handled by a company,
saves a lot of time.
Things evolve all the time, iptables sets this Raspberry Pi with 8 GB memory as router too,
runs with a Huawei 4G USB stick with IP 192.168.8.100 for net connection, anywhere in Europe I think,
raspberrypi: # cat /usr/local/sbin/start_4g_router
#!/usr//bin/bash
iptables -F
route add -net 192.168.0.0/16 dev eth0
echo 1 >/proc/sys/net/ipv4/ip_forward
iptables -t nat -A POSTROUTING ! -d 192.168.0.0/16 -o eth1 -j SNAT --to-source 192.168.8.100
sleep 1
ifconfig eth0 down
sleep 1
ifconfig eth0 192.168.178.1 up
sleep 1
vnstat -i eth1 -s
sleep 1
# default is set to 192.168.8.1, using 8.8.8.8 and 8.8.4.4 google name server lookup
cp /etc/resolv.conf.GOOGLE /etc/resolv.conf
sleep 1
# reduce swapping
sysctl vm.swappiness=5
echo "ready"
There is more, but then again, things change over time too.
Blocking a single IP hasn't worked for my ISP.

Each identical 17G download block (262 visits) was by a new IP
in a completely different location/region.

Beijing, Harbin, Henan, a mobile and a fifth, so far untraced
due to suspension of my site.

RL
Don Y
2024-03-11 14:48:04 UTC
Permalink
Post by legg
Blocking a single IP hasn't worked for my ISP.
It won't. Even novice users can move to a different IP using readily
available mechanisms.

Whitelisting can work (which is the approach that I use) but
it assumes you know who you *want* to access your site.

(It's a lot harder to guess a permitted IP than it is to avoid
an obviously BLOCKED one!)
Post by legg
Each identical 17G download block (262 visits)was by a new IP
in a completely different location/region.
Beijing, Hearbin, Henan, a mobile and a fifth, so far untraced
due to suspension of my site.
There's a reason things like "captcha" exist.

Note that this still doesn't prevent the *page(s)* from being repeatedly
accessed. But, presumably, their size is considerably smaller than
that of the payloads you want to protect.

OTOH, if someone wants to shut down your account due to an exceeded
quota, they can keep reloading those pages until they've eaten up your
traffic quota. And, "they" can be an automated process!

[Operating a server in stealth mode can avoid this. But, then
you're not "open to the public"! :> ]
legg
2024-03-11 16:57:20 UTC
Permalink
On Mon, 11 Mar 2024 07:48:04 -0700, Don Y
Post by Don Y
Post by legg
Blocking a single IP hasn't worked for my ISP.
It won't. Even novice users can move to a different IP using reeadily
available mechanisms.
Whitelisting can work (which is the approach that I use) but
it assumes you know who you *want* to access your site.
(It's a lot harder to guess a permitted IP than it is to avoid
an obviously BLOCKED one!)
Post by legg
Each identical 17G download block (262 visits)was by a new IP
in a completely different location/region.
Beijing, Hearbin, Henan, a mobile and a fifth, so far untraced
due to suspension of my site.
There's a reason things like "captcha" exist.
Note that this still doesn't prevent the *page(s)* from being repeatedly
accessed. But, presumably, their size is considerably smaller than
that of the payloads you want to protect.
OTOH, if someone wants to shut down your account due to an exceeded
quota, they can keep reloading those pages until they've eaten up your
traffic quota. And, "they" can be an automated process!
[Operating a server in stealth mode can avoid this. But, then
you're not "open to the public"! :> ]
Doing some simple experiments by temporarily renaming/replacing
some of the larger files being targeted, just to see how the bot
reacts to the new environment. If they find renamed files, it
means something. If visits to get the same 17G change, it means
something else.

This all at the expense and patience of my ISP. Thumbs up there.

RL
Don Y
2024-03-12 05:19:06 UTC
Permalink
Post by legg
Doing some simple experiments by temporarily renaming/replacing
some of the larger files being tageted, just to see how the bot
reacts to the new environment. If they find renamed files it
means something. If visits to get the same 17G alter it means
something else.
That's probably a good, inexpensive strategy to see how "active"
your "clients" are. Repeated hits on stale URLs would let you
know they are likely just reprobing from previously stored
results vs. actively *exploring* your site.

[Gotta wonder if they aren't a google/archive wannabe and not
smart enough to just *look* at the site.]
Post by legg
This all at the expense and patience of my ISP. Thumbs up there.
Be grateful. Many larger corporate providers would just cite
the AUP and your subscription terms and that would be the
end of THAT "discussion".

I run a thin pipe to the house -- my provider would love to
upsell me. But, it's saturated 95% of the time; a fatter
pipe would be idle while I'm away/asleep. As *latency* isn't
an issue, AVERAGE bandwidth remains the same. (as I download
another terabyte of rainbow tables...)
Martin Brown
2024-03-12 09:41:00 UTC
Permalink
Post by legg
On Mon, 11 Mar 2024 07:48:04 -0700, Don Y
Post by Don Y
Post by legg
Blocking a single IP hasn't worked for my ISP.
It won't. Even novice users can move to a different IP using reeadily
available mechanisms.
Whitelisting can work (which is the approach that I use) but
it assumes you know who you *want* to access your site.
(It's a lot harder to guess a permitted IP than it is to avoid
an obviously BLOCKED one!)
Post by legg
Each identical 17G download block (262 visits)was by a new IP
in a completely different location/region.
Beijing, Hearbin, Henan, a mobile and a fifth, so far untraced
due to suspension of my site.
There's a reason things like "captcha" exist.
Note that this still doesn't prevent the *page(s)* from being repeatedly
accessed. But, presumably, their size is considerably smaller than
that of the payloads you want to protect.
OTOH, if someone wants to shut down your account due to an exceeded
quota, they can keep reloading those pages until they've eaten up your
traffic quota. And, "they" can be an automated process!
[Operating a server in stealth mode can avoid this. But, then
you're not "open to the public"! :> ]
Doing some simple experiments by temporarily renaming/replacing
some of the larger files being tageted, just to see how the bot
reacts to the new environment. If they find renamed files it
means something. If visits to get the same 17G alter it means
something else.
This all at the expense and patience of my ISP. Thumbs up there.
Why don't you block entire blocks of Chinese IP addresses that contain
the ones that have attacked you until the problem ceases?
eg. add a few banned IP destinations to your .htaccess file

https://htaccessbook.com/block-ip-address/

1.80.*.* thru 1.95.*.*
101.16.*.* thru 101.16.*.*
101.144.*.* thru 101.159.*.*

If you block just a few big chunks it should make some difference.
You might have to inflict a bit of collateral damage in the 101.* range.
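
If editing .htaccess from Plesk turns out to be awkward, roughly the same block can be sketched in PHP at the top of the index page (this only guards pages that go through PHP, not bare file URLs; the CIDR ranges below are just the ranges above rewritten, so treat them as examples):

<?php
// Turn away visitors whose address falls in a blocked CIDR range (sketch only)
function ip_in_cidr(string $ip, string $cidr): bool {
    [$net, $bits] = explode('/', $cidr);
    $mask = -1 << (32 - (int)$bits);
    return (ip2long($ip) & $mask) === (ip2long($net) & $mask);
}
$blocked = ['1.80.0.0/12', '101.16.0.0/16', '101.144.0.0/12'];   // the ranges listed above, as CIDR
foreach ($blocked as $cidr) {
    if (ip_in_cidr($_SERVER['REMOTE_ADDR'], $cidr)) {
        http_response_code(403);
        exit('Access denied.');
    }
}
?>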

Otherwise you are stuck with adding some Captcha type thing to prevent
malicious bots hammering your site. I'm a bit surprised that your ISP
doesn't offer or have site wide countermeasures for such DOS attacks.
--
Martin Brown
legg
2024-03-12 13:50:50 UTC
Permalink
On Tue, 12 Mar 2024 09:41:00 +0000, Martin Brown
Post by Martin Brown
Post by legg
On Mon, 11 Mar 2024 07:48:04 -0700, Don Y
Post by Don Y
Post by legg
Blocking a single IP hasn't worked for my ISP.
It won't. Even novice users can move to a different IP using reeadily
available mechanisms.
Whitelisting can work (which is the approach that I use) but
it assumes you know who you *want* to access your site.
(It's a lot harder to guess a permitted IP than it is to avoid
an obviously BLOCKED one!)
Post by legg
Each identical 17G download block (262 visits)was by a new IP
in a completely different location/region.
Beijing, Hearbin, Henan, a mobile and a fifth, so far untraced
due to suspension of my site.
There's a reason things like "captcha" exist.
Note that this still doesn't prevent the *page(s)* from being repeatedly
accessed. But, presumably, their size is considerably smaller than
that of the payloads you want to protect.
OTOH, if someone wants to shut down your account due to an exceeded
quota, they can keep reloading those pages until they've eaten up your
traffic quota. And, "they" can be an automated process!
[Operating a server in stealth mode can avoid this. But, then
you're not "open to the public"! :> ]
Doing some simple experiments by temporarily renaming/replacing
some of the larger files being tageted, just to see how the bot
reacts to the new environment. If they find renamed files it
means something. If visits to get the same 17G alter it means
something else.
This all at the expense and patience of my ISP. Thumbs up there.
Why don't you block entire blocks of Chinese IP addresses that contain
the ones that have attacked you until the problem ceases?
eg. add a few banned IP destinations to your .htaccess file
https://htaccessbook.com/block-ip-address/
1.80.*.* thru 1.95.*.*
101.16.*.* thru 101.16.*.*
101.144.*.* thru 101.159.*.*
If you block just a few big chunks it should make some difference.
You might have to inflict a bit of collateral damage in the 101.* range.
Otherwise you are stuck with adding some Captcha type thing to prevent
malicious bots hammering your site. I'm a bit surprised that your ISP
doesn't offer or have site wide countermeasures for such DOS attacks.
My ISP has blocked all China IP addresses from accessing the
site.

Maybe that's what the bots want; who knows?

Haven't had access to the site to find out what the practical result
was yet, or what the final probing looked like. Whatever it was, it
didn't result in another 17G block download before the automated
account suspension reasserted itself, which was the last case
examined (it went 14G over the limit for the full 17G load).

RL
Peter
2024-03-12 13:58:04 UTC
Permalink
Post by legg
My ISP has blocked all China IP addresses from accessing the
site.
That will work; the bots can get around it by using a VPN, but more or
less all VPN services that will handle heavy data cost money. So VPNs
are used for hacking but not for a DoS attack.
legg
2024-03-11 16:48:57 UTC
Permalink
Post by Jan Panteltje
On a sunny day (Sun, 10 Mar 2024 13:47:48 -0400) it happened legg
Post by legg
Post by Jan Panteltje
On a sunny day (Sat, 09 Mar 2024 20:59:19 -0500) it happened legg
<snip>
Post by Jan Panteltje
Post by legg
Post by Jan Panteltje
When I ask google for "how to add a captcha to your website"
https://www.oodlestechnologies.com/blogs/create-a-captcha-validation-in-html-and-javascript/
Maybe some html guru here nows?
That looks like it's good for accessing an html page.
So far the chinese are accessing the top level index, where
files are offered for download at a click.
Ideally, if they can't access the top level, a direct address
access to the files might be prevented?
Using barebones (Netscape) SeaMonkey Composer, the Oodlestech
script generates a web page with a 4-figure manually-entered
human test.

How do I get a correct response to open the protected web page?
Post by Jan Panteltje
What I am doing now is using a html://mywebsite/pub/ directory
with lots of files in it that I want to publish in for example this newsgroup,
I then just post a direct link to that file.
So it has no index file and no links to it from the main site.
It has many sub directories too.
https://panteltje.nl/pub/GPS_to_USB_module_component_site_IXIMG_1360.JPG
https://panteltje.nl/pub/pwfax-0.1/README
So you need the exact link to access anything
fine for publishing here...
<snip>

The top (~index) web page of my site has lists of direct links
to subdirectories, for double-click download by user.

It also has links to other web pages that, in turn, offer links or
downloads to on-site and off-site locations. A great number of
off-site links are invalid, after ~10-20 years of neglect. They'll
probably stay that way until something or somebody convinces me
that it's all not just a waste of time.

At present, I only maintain data links or electronic publications
that need it. This may not be necessary, as the files are generally
small enough for the Wayback Machine to have scooped up most of the
databases and spreadsheets. They're also showing up in other places,
with my blessing. Hell - Wayback even has tube curve pages from the
'Conductance Curve Design Manual' - they've got to be buried 4 folders
deep - and each is a hefty image.

Somebody, please tell me that the 'Internet Archive' is NOT owned
by Google?

Some off-site links for large image-bound mfr-logo-ident web pages
(c/o ***@scorpiorising) seem already to have introduced a
captcha-type routine. Wouldn't need many bot hits to bump that
location into a data limit. Those pages take a long time
simply to load.

Anyway - how to get the Oodlestech script to open the appropriate
page, after vetting the user as being human?

RL
Don Y
2024-03-12 22:05:00 UTC
Permalink
Post by legg
Post by Jan Panteltje
Post by legg
Post by Jan Panteltje
When I ask google for "how to add a captcha to your website"
https://www.oodlestechnologies.com/blogs/create-a-captcha-validation-in-html-and-javascript/
Maybe some html guru here nows?
That looks like it's good for accessing an html page.
So far the chinese are accessing the top level index, where
files are offered for download at a click.
Ideally, if they can't access the top level, a direct address
access to the files might be prevented?
Using barebones (Netscape) Seamonkey Compser, the Oodlestech
script generates a web page with a 4-figure manually-entered
human test.
How do I get a correct response to open the protected web page?
Why not visit a page that uses it and inspect the source?
Post by legg
Post by Jan Panteltje
What I am doing now is using a html://mywebsite/pub/ directory
with lots of files in it that I want to publish in for example this newsgroup,
I then just post a direct link to that file.
So it has no index file and no links to it from the main site.
It has many sub directories too.
https://panteltje.nl/pub/GPS_to_USB_module_component_site_IXIMG_1360.JPG
https://panteltje.nl/pub/pwfax-0.1/README
So you need the exact link to access anything
fine for publishing here...
<snip>
The top (~index) web page of my site has lists of direct links
to subdirectories, for double-click download by user.
You could omit the actual links and just leave the TEXT for a link
present (i.e., highlight text, copy, paste into address bar) to
see if the "clients" are exploring all of your *links* or are
actually parsing the *text*.
Post by legg
It also has limks to other web pages that, in turn, offer links or
downloads to on-site and off-site locations. A great number of
Whether or not you choose to "protect" those assets is a separate
issue that only you can resolve (what's your "obligation" to a site that
you've referenced on YOUR page?)
Post by legg
off-site links are invalid, after ~10-20years of neglect. They'll
probably stay that way until something or somebody convinces me
that it's all not just a waste of time.
At present, I only maintain data links or electronic publications
that need it. This may not be neccessary, as the files are generally
small enough for the Wayback machine to have scooped up most of the
databases and spreadsheets. They're also showing up in other places,
with my blessing. Hell - Wayback even has tube curve pages from the
'Conductance Curve Design Manual' - they've got to be buried 4 folders
deep - and each is a hefty image.
You can see if bitsavers has an interest in preserving them in a
more "categorical" framework.
Post by legg
Somebody, please tell me the the 'Internet Archive' is NOT owned
by Google?
Some off-site links for large image-bound mfr-logo-ident web pages
captcha-type routine. Wouldn't need many bot hits to bump that
location into a data limit. Those pages take a long time
simply to load.
There is an art to designing all forms of documentation
(web pages just being one). Too abridged and folks spend forever
chasing links (even if it's as easy as "NEXT"). Too verbose and
the page takes a long time to load.

OTOH, when I'm looking to scrape documentation for <whatever>,
I will always take the "one large document" option, if offered.
It's just too damn difficult to rebuild a site's structure,
off-line, in (e.g.) a PDF. And, load times for large LOCAL documents
are insignificant.
Post by legg
Anyway - how to get the Oodlestech script to open the appropriate
page, after vetting the user as being human?
No examples, there?
legg
2024-03-13 00:08:47 UTC
Permalink
On Tue, 12 Mar 2024 15:05:00 -0700, Don Y
Post by Don Y
Post by legg
Post by Jan Panteltje
Post by legg
Post by Jan Panteltje
When I ask google for "how to add a captcha to your website"
https://www.oodlestechnologies.com/blogs/create-a-captcha-validation-in-html-and-javascript/
Maybe some html guru here nows?
That looks like it's good for accessing an html page.
So far the chinese are accessing the top level index, where
files are offered for download at a click.
Ideally, if they can't access the top level, a direct address
access to the files might be prevented?
Using barebones (Netscape) Seamonkey Compser, the Oodlestech
script generates a web page with a 4-figure manually-entered
human test.
How do I get a correct response to open the protected web page?
Why not visit a page that uses it and inspect the source?
I'm afraid to find out. If it's a Google product . . . .
Post by Don Y
Post by legg
Post by Jan Panteltje
What I am doing now is using a html://mywebsite/pub/ directory
with lots of files in it that I want to publish in for example this newsgroup,
I then just post a direct link to that file.
So it has no index file and no links to it from the main site.
It has many sub directories too.
https://panteltje.nl/pub/GPS_to_USB_module_component_site_IXIMG_1360.JPG
https://panteltje.nl/pub/pwfax-0.1/README
So you need the exact link to access anything
fine for publishing here...
<snip>
The top (~index) web page of my site has lists of direct links
to subdirectories, for double-click download by user.
You could omit the actual links and just leave the TEXT for a link
present (i.e., highlight text, copy, paste into address bar) to
see if the "clients" are exploring all of your *links* or are
actually parsing the *text*.
After the Chinese IPs were blocked, there was not much more
I could learn by fiddling about. My ISP had to reset the auto
suspension and up the limit with each (failed) iteration.
The current block is considered a dusting of the hands.
Case closed.
Post by Don Y
Post by legg
It also has limks to other web pages that, in turn, offer links or
downloads to on-site and off-site locations. A great number of
Whether or not you choose to "protect" those assets is a separate
issue that only you can resolve (what's your "obligation" to a site that
you've referenced on YOUR page?)
Post by legg
off-site links are invalid, after ~10-20years of neglect. They'll
probably stay that way until something or somebody convinces me
that it's all not just a waste of time.
At present, I only maintain data links or electronic publications
that need it. This may not be neccessary, as the files are generally
small enough for the Wayback machine to have scooped up most of the
databases and spreadsheets. They're also showing up in other places,
with my blessing. Hell - Wayback even has tube curve pages from the
'Conductance Curve Design Manual' - they've got to be buried 4 folders
deep - and each is a hefty image.
You can see if bitsavers has an interest in preserving them in a
more "categorical" framework.
The PDF version of the complete CCDM is already out there on a couple
of free doc sites. Chart images in that PDF might have sample envy.
Post by Don Y
Post by legg
Somebody, please tell me the the 'Internet Archive' is NOT owned
by Google?
Some off-site links for large image-bound mfr-logo-ident web pages
captcha-type routine. Wouldn't need many bot hits to bump that
location into a data limit. Those pages take a long time
simply to load.
There is an art to designing all forms of documentation
(web pages just being one). Too abridged and folks spend forever
chasing links (even if it's as easy as "NEXT"). Too verbose and
the page takes a long time to load.
The problem with mfr logo ident is the raw volume of tiny images.
Don't recall if an epub version was made - I think, if anything,
that attempt just made a bigger file . . . .
Slow as it is - it's already split up alphanumerically into six
sections . . . .
Post by Don Y
OTOH, when I'm looking to scrape documentation for <whatever>,
I will always take the "one large document" option, if offered.
It's just too damn difficult to rebuild a site's structure,
off-line, in (e.g.) a PDF. And, load times for large LOCAL documents
is insignificant.
Post by legg
Anyway - how to get the Oodlestech script to open the appropriate
page, after vetting the user as being human?
No examples, there?
RL
Don Y
2024-03-13 23:43:22 UTC
Permalink
Post by legg
Post by Don Y
Post by legg
Post by Jan Panteltje
Post by legg
Ideally, if they can't access the top level, a direct address
access to the files might be prevented?
Using barebones (Netscape) Seamonkey Compser, the Oodlestech
script generates a web page with a 4-figure manually-entered
human test.
How do I get a correct response to open the protected web page?
Why not visit a page that uses it and inspect the source?
I'm afraid to find out. If it's google product . . . .
I think there are a variety of "similar" mechanisms offered.
You can also "roll your own" just by adding a stumbling
block that ties access to something beyond just having the served
page (e.g., delay the activation of links for a short period
of time after the page is served so the "client" has to
delay clicking on them)

Or, generating a pseudo-random number and requiring the
client to enter it -- or combinations thereof:
"Please enter this numeric value: six four three"
as a bot likely won't know that you have made such a request
of the client.
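
A sketch of that home-made check in PHP, which also covers the earlier question of how to open the protected page once the visitor passes (the page name is a placeholder): the expected answer lives in the session, and a match redirects to the real index.

<?php
// challenge.php -- 'Please enter this numeric value: six four three' (sketch only)
session_start();
$words = ['zero','one','two','three','four','five','six','seven','eight','nine'];

if (isset($_POST['answer'], $_SESSION['expect'])
        && trim($_POST['answer']) === $_SESSION['expect']) {
    $_SESSION['human'] = true;
    header('Location: real_index.php');   // hypothetical protected page (it should itself
    exit;                                 // check $_SESSION['human'] at the top)
}

$digits = sprintf('%03d', random_int(0, 999));    // e.g. 643
$_SESSION['expect'] = $digits;
$spelled = '';
foreach (str_split($digits) as $d) {              // spell the digits out: '643' -> 'six four three'
    $spelled .= $words[$d] . ' ';
}
echo 'Please enter this numeric value: ' . trim($spelled);
echo '<form method="post"><input name="answer"> <input type="submit"></form>';
?>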
Post by legg
Post by Don Y
Post by legg
Post by Jan Panteltje
What I am doing now is using a html://mywebsite/pub/ directory
with lots of files in it that I want to publish in for example this newsgroup,
I then just post a direct link to that file.
So it has no index file and no links to it from the main site.
It has many sub directories too.
https://panteltje.nl/pub/GPS_to_USB_module_component_site_IXIMG_1360.JPG
https://panteltje.nl/pub/pwfax-0.1/README
So you need the exact link to access anything
fine for publishing here...
<snip>
The top (~index) web page of my site has lists of direct links
to subdirectories, for double-click download by user.
You could omit the actual links and just leave the TEXT for a link
present (i.e., highlight text, copy, paste into address bar) to
see if the "clients" are exploring all of your *links* or are
actually parsing the *text*.
After the chinese IPs were blocked, there was not much more
I could learn by fiddling about. My ISP had to reset the auto
suspension and up the limit with each (failed) iteration.
The current block is considered as dusting of the hands.
Case closed.
Well, you should be thankful they were at least THAT cooperative.
Post by legg
Post by Don Y
Post by legg
Somebody, please tell me the the 'Internet Archive' is NOT owned
by Google?
Some off-site links for large image-bound mfr-logo-ident web pages
captcha-type routine. Wouldn't need many bot hits to bump that
location into a data limit. Those pages take a long time
simply to load.
There is an art to designing all forms of documentation
(web pages just being one). Too abridged and folks spend forever
chasing links (even if it's as easy as "NEXT"). Too verbose and
the page takes a long time to load.
The problem with mfr logo ident is the raw volume of tiny images.
Don't recall if an epub version was made - I think, if anything,
that attempt just made a bigger file . . . .
Slow as it is - it's already split up alpha numerically into six
sections . . . .
(Without having seen them...) Can you create a PNG of a group
of them arranged in a matrix? Then, a map that allows clicking
on any *part* of the composite image to provide a more detailed
"popup" to inspect?

I.e., each individual image is a trip back to the server to
fetch that image. A single composite could reduce that to
one fetch with other actions conditional on whether or not
the user wants "more/finer detail"
Peter
2024-03-14 16:26:40 UTC
Permalink
Post by Don Y
(Without having seen them...) Can you create a PNG of a group
of them arranged in a matrix. Then, a map that allows clicking
on any *part* of the composite image to provide a more detailed
"popup" to inspect?
I.e., each individual image is a trip back to the server to
fetch that image. A single composite could reduce that to
one fetch with other actions conditional on whether or not
the user wants "more/finer detail"
All of this "graphical captcha" stuff is easy to hack if somebody is
out to trash *your* site.

For example I run some sites and paid someone 1k or so to develop a
graphical captcha. It displayed two numbers as graphic images and you
had to enter their product e.g. 12 x 3 = 36.

A friend who is an expert at unix spent just a few mins on a script
which used standard unix utilities to do OCR on the page, and you can
guess the rest.
Don Y
2024-03-14 22:38:00 UTC
Permalink
Post by Peter
Post by Don Y
(Without having seen them...) Can you create a PNG of a group
of them arranged in a matrix. Then, a map that allows clicking
on any *part* of the composite image to provide a more detailed
"popup" to inspect?
I.e., each individual image is a trip back to the server to
fetch that image. A single composite could reduce that to
one fetch with other actions conditional on whether or not
the user wants "more/finer detail"
All of this "graphical captcha" stuff is easy to hack if somebody is
out to trash *your* site.
If you are *targeted*, then all bets are off. At the end of the
day, your adversary could put a REAL HUMAN to the task of hammering
away at it.
Post by Peter
For example I run some sites and paid someone 1k or so to develop a
graphical captcha. It displayed two numbers as graphic images and you
had to enter their product e.g. 12 x 3 = 36.
A friend who is an expert at unix spent just a few mins on a script
which used standard unix utilities to do OCR on the page, and you can
guess the rest.
But a *bot* wouldn't know that this was an effective attack.
It would move on to the next site in its "list" to scrape.

If you use a canned/standard(ized) captcha, then a bot can
reap rewards learning how to defeat it -- because those
efforts will apply to other sites, as well.

[Some university did a study of the effectiveness of
captchas on human vs. automated clients and found the
machines could solve them better/faster than humans]

If you want to make something publicly accessible, then
you have to assume it will be publicly accessed!

I operate a server in stealth mode; it won't show up on
network probes so robots/adversaries just skip over the
IP and move on to others. Folks who *should* be able to
access it know how to "get its attention".

Prior to this "enhancement", I delivered content via email
request -- ask for something, verify YOU were the entity that
issued the request, then I would email it to you.

This was replaced with "then I would email a unique LINK
to it to you".
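The "unique link" part is just a signed, expiring token. A minimal
sketch with the Python standard library (not my actual code; SECRET
and the URL shape are invented):

    import hashlib, hmac, time

    SECRET = b"replace-with-a-long-random-secret"

    def make_link(filename, ttl=3600):
        expires = int(time.time()) + ttl
        msg = f"{filename}:{expires}".encode()
        sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
        return f"/fetch?f={filename}&e={expires}&s={sig}"

    def check_link(filename, expires, sig):
        msg = f"{filename}:{int(expires)}".encode()
        good = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
        return hmac.compare_digest(good, sig) and time.time() < int(expires)

    # The server only honours a /fetch URL whose signature verifies and
    # whose expiry hasn't passed; anything else gets silence.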
Liz Tuddenham
2024-03-15 10:41:07 UTC
Permalink
Post by Don Y
Post by Peter
Post by Don Y
(Without having seen them...) Can you create a PNG of a group
of them arranged in a matrix. Then, a map that allows clicking
on any *part* of the composite image to provide a more detailed
"popup" to inspect?
I.e., each individual image is a trip back to the server to
fetch that image. A single composite could reduce that to
one fetch with other actions conditional on whether or not
the user wants "more/finer detail"
All of this "graphical captcha" stuff is easy to hack if somebody is
out to trash *your* site.
If you are *targeted*, then all bets are off. At the end of the
day, your adversary could put a REAL HUMAN to the task of hammering
away at it.
You could always have a question which involved correcting the English
grammar of a sentence, but that might eliminate far more of your
visitors than you intended.
--
~ Liz Tuddenham ~
(Remove the ".invalid"s and add ".co.uk" to reply)
www.poppyrecords.co.uk
Don Y
2024-03-15 11:08:55 UTC
Permalink
Post by Liz Tuddenham
Post by Don Y
Post by Peter
Post by Don Y
(Without having seen them...) Can you create a PNG of a group
of them arranged in a matrix. Then, a map that allows clicking
on any *part* of the composite image to provide a more detailed
"popup" to inspect?
I.e., each individual image is a trip back to the server to
fetch that image. A single composite could reduce that to
one fetch with other actions conditional on whether or not
the user wants "more/finer detail"
All of this "graphical captcha" stuff is easy to hack if somebody is
out to trash *your* site.
If you are *targeted*, then all bets are off. At the end of the
day, your adversary could put a REAL HUMAN to the task of hammering
away at it.
You could always have a question which involved correcting the English
grammar of a sentence, but that might eliminate far more of your
visitors than you intended.
You have to define your goal with any such mechanism.

If you want to protect content, then encrypt the content;
any downloads just waste the client's bandwidth (but, yours,
as well).
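E.g., a minimal sketch with the third-party Python "cryptography"
package (file names invented):

    from cryptography.fernet import Fernet

    key = Fernet.generate_key()               # hand this out out-of-band
    with open("pullen_curves.zip", "rb") as f:
        blob = Fernet(key).encrypt(f.read())
    with open("pullen_curves.zip.enc", "wb") as f:
        f.write(blob)
    # Anyone can scrape the .enc file; without the key it is just noise,
    # so bulk downloads cost the scraper bandwidth for nothing.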

If you want to protect access, then you need a mechanism
that exceeds the abilities of the "current connection"
(e.g., robot, blind scrape, human, etc.) to navigate.

Every mechanism has a cost -- a portion of which you, also, bear.

Remember, a client can always hammer away at the basic page
(ignoring the cached flag) even if he never gets past your
"mechanism(s)" intended to deter him.

[A telemarketer can keep dialing your phone number even
if you NEVER answer his calls!]

Publishing any sort of contact information (email, phone, www,
etc.) INVITES contact.
Peter
2024-03-15 11:34:22 UTC
Permalink
Post by Liz Tuddenham
You could always have a question which involved correcting the English
grammar of a sentence, but that might eliminate far more of your
visitors than you intended.
Yeah; like 95% ;)
Liz Tuddenham
2024-03-15 12:24:30 UTC
Permalink
Post by Peter
Post by Liz Tuddenham
You could always have a question which involved correcting the English
grammar of a sentence, but that might eliminate far more of your
visitors than you intended.
Yeah; like 95% ;)
[Said in best posh English accent]
Did you mean: "Yes; for instance 95%"? :-)
--
~ Liz Tuddenham ~
(Remove the ".invalid"s and add ".co.uk" to reply)
www.poppyrecords.co.uk
Peter
2024-03-15 15:54:19 UTC
Permalink
Post by Liz Tuddenham
Post by Peter
Post by Liz Tuddenham
You could always have a question which involved correcting the English
grammar of a sentence, but that might eliminate far more of your
visitors than you intended.
Yeah; like 95% ;)
[Said in best posh English accent]
Did you mean: "Yes; for instance 95%"? :-)
Indeed.

Anybody starting a sentence with "indeed" is posh!
Don Y
2024-03-15 12:35:41 UTC
Permalink
Post by Liz Tuddenham
Post by Don Y
Post by Peter
Post by Don Y
(Without having seen them...) Can you create a PNG of a group
of them arranged in a matrix. Then, a map that allows clicking
on any *part* of the composite image to provide a more detailed
"popup" to inspect?
I.e., each individual image is a trip back to the server to
fetch that image. A single composite could reduce that to
one fetch with other actions conditional on whether or not
the user wants "more/finer detail"
All of this "graphical captcha" stuff is easy to hack if somebody is
out to trash *your* site.
If you are *targeted*, then all bets are off. At the end of the
day, your adversary could put a REAL HUMAN to the task of hammering
away at it.
You could always have a question which involved correcting the English
grammar of a sentence, but that might eliminate far more of your
visitors than you intended.
Require visitors to insert correct punctuation:

John had had had had had had had a better effect
Liz Tuddenham
2024-03-15 13:00:08 UTC
Permalink
Post by Don Y
Post by Liz Tuddenham
Post by Don Y
Post by Peter
Post by Don Y
(Without having seen them...) Can you create a PNG of a group
of them arranged in a matrix. Then, a map that allows clicking
on any *part* of the composite image to provide a more detailed
"popup" to inspect?
I.e., each individual image is a trip back to the server to
fetch that image. A single composite could reduce that to
one fetch with other actions conditional on whether or not
the user wants "more/finer detail"
All of this "graphical captcha" stuff is easy to hack if somebody is
out to trash *your* site.
If you are *targeted*, then all bets are off. At the end of the
day, your adversary could put a REAL HUMAN to the task of hammering
away at it.
You could always have a question which involved correcting the English
grammar of a sentence, but that might eliminate far more of your
visitors than you intended.
John had had had had had had had a better effect
"He helped his Uncle Jack off a horse."
--
~ Liz Tuddenham ~
(Remove the ".invalid"s and add ".co.uk" to reply)
www.poppyrecords.co.uk
Don Y
2024-03-15 13:06:45 UTC
Permalink
Post by Liz Tuddenham
Post by Don Y
Post by Liz Tuddenham
Post by Don Y
Post by Peter
Post by Don Y
(Without having seen them...) Can you create a PNG of a group
of them arranged in a matrix. Then, a map that allows clicking
on any *part* of the composite image to provide a more detailed
"popup" to inspect?
I.e., each individual image is a trip back to the server to
fetch that image. A single composite could reduce that to
one fetch with other actions conditional on whether or not
the user wants "more/finer detail"
All of this "graphical captcha" stuff is easy to hack if somebody is
out to trash *your* site.
If you are *targeted*, then all bets are off. At the end of the
day, your adversary could put a REAL HUMAN to the task of hammering
away at it.
You could always have a question which involved correcting the English
grammar of a sentence, but that might eliminate far more of your
visitors than you intended.
John had had had had had had had a better effect
"He helped his Uncle Jack off a horse."
ROTFL!
Carlos E.R.
2024-03-15 13:08:10 UTC
Permalink
Post by Liz Tuddenham
Post by Don Y
Post by Liz Tuddenham
Post by Don Y
Post by Peter
Post by Don Y
(Without having seen them...) Can you create a PNG of a group
of them arranged in a matrix. Then, a map that allows clicking
on any *part* of the composite image to provide a more detailed
"popup" to inspect?
I.e., each individual image is a trip back to the server to
fetch that image. A single composite could reduce that to
one fetch with other actions conditional on whether or not
the user wants "more/finer detail"
All of this "graphical captcha" stuff is easy to hack if somebody is
out to trash *your* site.
If you are *targeted*, then all bets are off. At the end of the
day, your adversary could put a REAL HUMAN to the task of hammering
away at it.
You could always have a question which involved correcting the English
grammar of a sentence, but that might eliminate far more of your
visitors than you intended.
John had had had had had had had a better effect
"He helped his Uncle Jack off a horse."
Those things would kill most people for whom English is a second language.
--
Cheers, Carlos.
Peter
2024-03-15 11:33:30 UTC
Permalink
Post by Don Y
I operate a server in stealth mode; it won't show up on
network probes so robots/adversaries just skip over the
IP and move on to others. Folks who *should* be able to
access it know how to "get its attention".
Port knocking ;)
Carlos E.R.
2024-03-15 12:34:56 UTC
Permalink
Post by Peter
Post by Don Y
I operate a server in stealth mode; it won't show up on
network probes so robots/adversaries just skip over the
IP and move on to others. Folks who *should* be able to
access it know how to "get its attention".
What is "stealth mode"? What do you do?
Post by Peter
Port knocking ;)
I was thinking of using a high port. I do that.
--
Cheers, Carlos.
Don Y
2024-03-15 13:00:23 UTC
Permalink
Post by Carlos E.R.
Post by Peter
Post by Don Y
I operate a server in stealth mode; it won't show up on
network probes so robots/adversaries just skip over the
IP and move on to others.  Folks who *should* be able to
access it know how to "get its attention".
What is "stealth mode", what do you do?
It's what you *don't* do that is important.

When you receive a packet, you extract all of the
information indicating sender, intended destination
port, payload, etc.

Then, DON'T acknowledge the packet. Pretend the network
cable is terminated in dead air.

The *determined* "caller" sends another packet, some time later
(with limits on how soon/late this can be).

Again, you extract the information in the packet -- and
ignore it.

Repeat this some number of times for a variety of
different ports, payloads -- all traced back to the
same sender.

Then, on the *important* packet that arrives, subsequently,
acknowledge it with the service that is desired.

If the sequence is botched at any time -- like a sender doing
a sequential port scan -- then you reset the DFA that is
tracking THAT sender's progress through the automaton.

Note that you can handle multiple clients attempting to
connect simultaneously -- "hiding" from each of them
until and unless they complete their required sequences.

Anyone with a packet sniffer can be thwarted by ensuring
that the sequence is related to source IP, time of day,
service desired, etc. (though security by obscurity)

Because you don't react to most (all?) packets, a systematic
probe of your IP will not turn up a "live machine" at your
end.

Once you actually acknowledge a packet, all of the
regular authentication/encryption/etc. mechanisms come
into play. You just don't want to reveal your presence
unless you are reasonably sure the client is someone
that you *want* to have access...
Post by Carlos E.R.
Post by Peter
Port knocking ;)
I was thinking of using a high port. I do that.
But a port scanner can stumble on that. Or, it can be leaked
by a malevolent user.

The "knock sequence" can be customized per sender IP address,
per client identity, per service, etc. So, it's less vulnerable
than something (anything!) static.
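Stripped of the networking, the per-sender tracking is just a small
state machine. Illustrative Python only (the sequence and the timing
windows here are invented):

    import time

    # required knock: (destination port, max seconds since previous step)
    SEQUENCE = [(4411, 0), (27007, 30), (9123, 30), (443, 30)]

    progress = {}   # sender IP -> (steps completed, time of last good step)

    def observe(src_ip, dst_port, now=None):
        """Feed in every packet seen; True means src_ip may now be served."""
        now = now or time.time()
        step, last = progress.get(src_ip, (0, now))
        want_port, window = SEQUENCE[step]
        if dst_port == want_port and (step == 0 or now - last <= window):
            step += 1
            if step == len(SEQUENCE):
                progress.pop(src_ip, None)
                return True              # final packet: acknowledge/serve
            progress[src_ip] = (step, now)
        else:
            progress.pop(src_ip, None)   # botched (e.g. a sequential scan)
        return False                     # stay silent either way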
Peter
2024-03-15 15:56:54 UTC
Permalink
Post by Don Y
Then, DON'T acknowledge the packet. Pretend the network
cable is terminated in dead air.
Can you actually do that, with a standard server? Normally every
TCP/IP packet is acked. This is deep in the system.

UDP isn't, which is why port knocking works so well.
Don Y
2024-03-15 20:05:45 UTC
Permalink
Post by Peter
Post by Don Y
Then, DON'T acknowledge the packet. Pretend the network
cable is terminated in dead air.
Can you actually do that, with a standard server? Normally every
TCP/IP packet is acked. This is deep in the system.
You have to rewrite your stack. *You* have to handle raw
packets instead of letting services (or the "super server")
handle them for you.

[And, you can't have an active proxy upstream that blindly
intercepts them]

The server effectively does a passive open and waits for
packets ON *ANY* PORT. You obviously have to hide ALL
ports as a potential client could poke ANY port, notice a
response, then assume you are *deliberately* hiding OTHER ports
that don't reply! If you reply ANYWHERE, then the "adversary"
knows that you aren't just a "dangling wire"!

Think of an old-fashioned RxD/TxD serial port (no handshaking lines
that you can examine as "active"). You can listen to the incoming
character stream without ever responding to it -- even allowing
your driver to lose characters to overrun/parity/framing/etc. errors.

Only when you see something that you recognize do you "react".

[This is the easy way to hide an "internal" 3-pin serial port
(that you likely have for diagnostics in a product) from folks
who like looking for shells, etc. on such things!]

Of course, if something (adversary or sniffer) sees that reaction,
then the secret is out. So, you don't want to abuse this access
mechanism.

It's like tunneling under some existing protocol; it works
only as long as folks don't *notice* it!
Post by Peter
UDP isn't, which is why port knocking works so well.
Anything that can be routed can be used. You can knock
on UDP/x, then UDP/y, then... before trying to open a
particular UDP/TCP connection. The point is to just LOOK
at incoming packets and not blindly act on them -- even
if that action is to block the connection.
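On Linux the "just LOOK" part can be approximated with a raw socket
that reads everything and answers nothing. Rough sketch only: needs
root, the interface name is assumed, and the kernel's own stack still
has to be firewalled into silence (DROP, not REJECT) or it will answer
for you:

    import socket, struct

    ETH_P_ALL = 0x0003
    s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW,
                      socket.ntohs(ETH_P_ALL))
    s.bind(("eth0", 0))                        # assumed interface

    while True:
        frame, _ = s.recvfrom(65535)
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue                           # not IPv4
        ip = frame[14:]
        ihl = (ip[0] & 0x0F) * 4
        if ip[9] not in (6, 17) or len(ip) < ihl + 4:
            continue                           # only TCP/UDP, sane length
        src = socket.inet_ntoa(ip[12:16])
        dport = struct.unpack("!H", ip[ihl + 2:ihl + 4])[0]
        print(src, "->", dport)                # feed a knock tracker here
        # ...and deliberately send nothing back.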
Peter
2024-03-20 11:43:58 UTC
Permalink
Post by Don Y
Post by Peter
Can you actually do that, with a standard server? Normally every
TCP/IP packet is acked. This is deep in the system.
You have to rewrite your stack. *You* have to handle raw
packets instead of letting services (or the "super server")
handle them for you.
OK, so this is very rare.
Don Y
2024-03-20 14:52:59 UTC
Permalink
Post by Peter
Post by Don Y
Post by Peter
Can you actually do that, with a standard server? Normally every
TCP/IP packet is acked. This is deep in the system.
You have to rewrite your stack. *You* have to handle raw
packets instead of letting services (or the "super server")
handle them for you.
OK, so this is very rare.
Yes. So, sysadms aren't really looking for it or trying to
defend against it.

It's not a trivial solution as you need the skillset (as well
as access to the specific server!) to be able to, essentially,
rewrite the stack.

The easiest way to do this is to build a shim service to sit
above the NIC's IRQ as an agent; intercepting network
packets and only passing "select" ones up to the underside
of the *real*/original stack. You would then track the
"state" of each client's "knocking" sequence so you would know
who to BLOCK and who to PASSTHRU at any given time.

And, you can apply it to all ports/protocols (an essential
requirement as you don't want ANYTHING to be visible to a probe).
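On a stock Linux box the nearest off-the-shelf approximation of that
shim is NFQUEUE: divert inbound packets to userspace and decide there.
Rough sketch, assuming the third-party "netfilterqueue" package and an
iptables rule along the lines of
"iptables -I INPUT -j NFQUEUE --queue-num 1" (the knock itself is
invented):

    import socket, struct
    from netfilterqueue import NetfilterQueue

    SEQUENCE = [4411, 27007, 443]     # invented; 443 is the real service
    state = {}                        # src IP -> next index into SEQUENCE
    allowed = set()                   # srcs that completed the knock

    def knock_ok(src, dport):
        if src in allowed:
            return True               # deciding when to UN-allow is the
                                      # hard part, as noted below
        i = state.get(src, 0)
        if dport == SEQUENCE[i]:
            if i == len(SEQUENCE) - 1:
                allowed.add(src)
                return True
            state[src] = i + 1
        else:
            state.pop(src, None)      # wrong port: reset, stay dark
        return False

    def handle(pkt):
        ip = pkt.get_payload()        # raw IP packet as bytes
        if ip[0] >> 4 != 4 or ip[9] not in (6, 17):
            pkt.drop()                # sketch: IPv4 TCP/UDP only
            return
        ihl = (ip[0] & 0x0F) * 4
        src = socket.inet_ntoa(ip[12:16])
        dport = struct.unpack("!H", ip[ihl + 2:ihl + 4])[0]
        if knock_ok(src, dport):
            pkt.accept()              # pass up to the real stack
        else:
            pkt.drop()                # stay invisible

    nfq = NetfilterQueue()
    nfq.bind(1, handle)
    nfq.run()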

The problem with this approach lies in knowing when to
"stop" passing packets from a particular client as you
don't have an easy way of knowing that the "real"
"service" has been terminated. This is a consequence of the
monolithic nature of most kernels.

[My new OS uses an entirely different approach to the stack
so its relatively easy for me to deal with "transactions"]

The *advantage* is that you can use it to effectively tunnel
under HTTP without worrying about sysadms blocking your
specific traffic: "Why is Bob, in accounting, trying to
send datagrams to port XYZ at DonsHouseOfMagic?"

[Very few protocols are *reliably* allowed through firewalls
without some form of caching, rescheduling, rewriting, etc.
E.g., tunneling under DNS is easily "broken" by a caching
server between the client and external agency. And, most
can't deliver large payloads without raising suspicions!
And, remember, you can't "sort of" process the protocol
without indicating that you exist!]

OTOH, a TCP connection (HTTP on port 80) to DonsHouseOfMagic
likely wouldn't arouse any suspicion. Nor would the payload
merit examination. Great for slipping firmware updates through
a firewall, usage data, etc.

[HTTP/3 adds some challenges but is no worse than any other
UDP service]
Peter
2024-03-15 15:55:12 UTC
Permalink
Post by Carlos E.R.
Post by Peter
Port knocking ;)
I was thinking of using a high port. I do that.
The sniffer will find any port # in a few more seconds...
Don Y
2024-03-15 20:08:51 UTC
Permalink
Post by Peter
Post by Carlos E.R.
Post by Peter
Port knocking ;)
I was thinking of using a high port. I do that.
The sniffer will find any port # in a few more seconds...
Point a nessus daemon at yourself and see what it finds.

GRC.com offers some (less exhaustive) on-line tools...
Carlos E.R.
2024-03-20 15:03:55 UTC
Permalink
Post by Peter
Post by Carlos E.R.
Post by Peter
Port knocking ;)
I was thinking of using a high port. I do that.
The sniffer will find any port # in a few more seconds...
Actually it takes longer than that. So far, no hits; and I would notice
when someone tries to log in over ssh.

Of course, one can defend the fort from casual attackers, not from
determined attackers; those will eventually find a way.
--
Cheers, Carlos.
Don Y
2024-03-20 16:52:32 UTC
Permalink
Post by Peter
Post by Carlos E.R.
Post by Peter
Port knocking ;)
I was thinking of using a high port. I do that.
The sniffer will find any port # in a few more seconds...
Actually it takes longer than that. So far, no hits; and I would notice when
someone tries to login on ssh.
Why would an attacker try to breach a secure protocol -- hoping
you have enabled it without any protections??

A port scanner just needs to see if it gets a response from
a particular port, not whether or not it can invoke a particular
protocol on that port. Even "refusing the connection" tells the
scanner that there is a host at that IP.

Simple exercise: go to another host and just TRY to open a
connection to port 22 (sshd) or 23 (telnetd). Don't try to
login. What do you see on the server concerning this
activity?

You can learn a lot about the host, OS, etc. just from watching how
it reacts to connections and connection attempts (e.g., how it
assigns sequence numbers, which ports are open "by default", etc.)
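That exercise is literally a few lines of Python (point it only at a
host you own):

    import errno, socket

    def probe(host, port, timeout=3):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(timeout)
        try:
            rc = s.connect_ex((host, port))
        except socket.timeout:
            rc = errno.ETIMEDOUT
        finally:
            s.close()
        if rc == 0:
            return "open -- something answered the SYN"
        if rc == errno.ECONNREFUSED:
            return "closed -- but the RST itself proves a host is there"
        return "no response -- filtered, or nothing at that IP"

    for p in (22, 23, 80):
        print(p, probe("192.0.2.10", p))   # RFC 5737 test address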
Of course, one can defend the fort from casual attackers, not from determined
attackers; those will eventually find a way.
Only if they sense potential value beyond what they can get
for less effort elsewhere. With all of the casual hosts out there
(especially those folks who don't realize their security risks),
it's silly to waste resources trying to get to one that poses any
sort of obstacle.

And, if you don't KNOW that there is a machine at that IP, then
what's your attack strategy? Just push packets down a black hole
and *hope* there is something there, listening (but ignoring)?

What do you do if I just hammer away at your IP even KNOWING that
you've got all your ports closed? Any *legitimate* traffic
can't get through (including replies to your outbound requests)
because I am saturating your pipe. What can you do to *stop* me
from doing this?

[The same sort of logic applies to "hidden" diagnostic ports
in devices. If I keep pushing bytes into a "debug" UART, I
consume system resources at a rate that *I* control. Was your
firmware designed to handle this possibility? Or, did you
assume only "authorized technicians" would use said port and
only in benevolent ways?]

Don Y
2024-03-15 12:42:50 UTC
Permalink
Post by Peter
Post by Don Y
I operate a server in stealth mode; it won't show up on
network probes so robots/adversaries just skip over the
IP and move on to others. Folks who *should* be able to
access it know how to "get its attention".
Port knocking ;)
Effectively, yes. It's a bit tedious to use -- and the server-side
code is far from "standard" -- but it is great at stealth. I'm
not sure how it would work in situations with lots of *intended*
traffic, though...

[I've been making little boxes with a NIC on one end, stack
in the middle, and some form of communications I/O on the
other (serial port, USB, GPIB, CAN, DMX, etc.). The stealth
feature was one of the most requested capabilities (as it lets
an interface be deployed and routed -- without fear of some
hacker/script-kiddie stumbling onto it and dicking with the
attached device).]
Jasen Betts
2024-03-11 05:25:08 UTC
Permalink
Post by legg
Post by Jan Panteltje
On a sunny day (Thu, 07 Mar 2024 17:12:27 -0500) it happened legg
Post by legg
A quick response from the ISP says they're blocking
the three hosts and 'monitoring the situatio'.
All the downloading was occuring between certain
hours of the day in sequence - first one host
between 11 and 12pm. one days rest, then the
second host at the same timeon the third day,
then the third host on the fourth day.
Same files 262 times each, 17Gb each.
Not normal web activity, as I know it.
RL
Many sites have a 'I m not a bot' sort of thing you have to go through to get access.
Any idea what's involved - preferably anything that doesn't
owe to Google?
I'd like to limit traffic data volume by any host to <500M,
or <50M in 24hrs. It's all ftp.
FTP makes it harder; you'll probably need to process the FTP logs and
put in a firewall rule once an IP address has exceeded its quota.
It may be possible to configure fail2ban to do this, or you might have
to write your own script.
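The log half can be a short script. Untested sketch, assuming a
wu-ftpd/proftpd-style xferlog where field 7 is the remote host and
field 8 the byte count -- check your own log format first:

    from collections import defaultdict

    LIMIT = 500 * 1024 * 1024                # the ~500M per host mentioned above
    totals = defaultdict(int)

    with open("/var/log/xferlog") as log:    # path varies per system
        for line in log:
            parts = line.split()
            if len(parts) < 8 or not parts[7].isdigit():
                continue
            totals[parts[6]] += int(parts[7])

    for host, used in sorted(totals.items(), key=lambda kv: -kv[1]):
        if used > LIMIT:
            print(host, used)                # candidates for a firewall/fail2ban rule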
Post by legg
Have access to Pldesk, but am unfamiliar with capabilities
and clued out how to do much of anything save file transfer.
You'll probably need a root shell to do this setup.
--
Jasen.
🇺🇦 Слава Україні (Glory to Ukraine)
Martin Brown
2024-03-08 11:16:46 UTC
Permalink
Post by legg
Got a note from an ISP today indicating that my website
was suspended due to data transfer over-use for the month. (>50G)
It's only the 7th day of the month and this hadn't been a
problem in the 6 years they'd hosted the service.
Turns out that three chinese sources had downloaded the same
set of files, each 262 times. That would do it.
Much as I *hate* Captcha, this is the sort of DoS attack that it helps to
prevent. The other option is to add a script to tarpit or completely
block second or third requests for the same large files arriving from
the same IP address within the hour.
Post by legg
So, anyone else looking to update bipolar semiconductor,
packaging or spice parameter spreadsheets; look at K.A.Pullen's
'Conductance Design Curve Manual' or any of the other bits
stored at ve3ute.ca are out of luck, for the rest of the month .
Seems strange that the same three addresses downloaded the
same files, the same number of times. Is this a denial of
service attack?
Quite likely. Your ISP should be able to help you with this if they are
any good. Most have at least some defences against ridiculous numbers of
downloads or other traffic coming from the same bad actor source.

Provided that you don't have too many customers in mainland China,
blacklist the main zones of their IP address range:

https://lite.ip2location.com/china-ip-address-ranges?lang=en_US
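If you end up doing the blocking yourself rather than in the hosting
panel, the lookup side is easy with Python's ipaddress module (the
CIDR file name is just an example; fill it from the list above):

    import ipaddress

    with open("cn_ranges.txt") as f:         # one CIDR per line
        ranges = [ipaddress.ip_network(l.strip()) for l in f if l.strip()]

    def is_blocked(addr):
        ip = ipaddress.ip_address(addr)
        return any(ip in net for net in ranges)

    print(is_blocked("203.0.113.7"))         # RFC 5737 test address -> False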

One rogue hammering your site is just run of the mill bad luck but three
of them doing it in quick succession looks very suspicious to me.
--
Martin Brown
legg
2024-03-08 17:17:32 UTC
Permalink
On Fri, 8 Mar 2024 11:16:46 +0000, Martin Brown
Post by Martin Brown
Post by legg
Got a note from an ISP today indicating that my website
was suspended due to data transfer over-use for the month. (>50G)
It's only the 7th day of the month and this hadn't been a
problem in the 6 years they'd hosted the service.
Turns out that three chinese sources had downloaded the same
set of files, each 262 times. That would do it.
Much as I *hate* Captcha this is the sort of DOS attack that it helps to
prevent. The other option is to add a script to tarpit or block
completely second or third requests for the same large files coming from
the same IP address occurring within the hour.
Post by legg
So, anyone else looking to update bipolar semiconductor,
packaging or spice parameter spreadsheets; look at K.A.Pullen's
'Conductance Design Curve Manual' or any of the other bits
stored at ve3ute.ca are out of luck, for the rest of the month .
Seems strange that the same three addresses downloaded the
same files, the same number of times. Is this a denial of
service attack?
Quite likely. Your ISP should be able to help you with this if they are
any good. Most have at least some defences against ridiculous numbers of
downloads or other traffic coming from the same bad actor source.
Provided that you don't have too many customers in mainland china
https://lite.ip2location.com/china-ip-address-ranges?lang=en_US
One rogue hammering your site is just run of the mill bad luck but three
of them doing it in quick succession looks very suspicious to me.
Beijing, Harbin and roaming.

Yeah. You gotta ask yourself; what's the friggin' point?

RL
legg
2024-03-11 14:42:18 UTC
Permalink
Post by legg
Got a note from an ISP today indicating that my website
was suspended due to data transfer over-use for the month. (>50G)
It's only the 7th day of the month and this hadn't been a
problem in the 6 years they'd hosted the service.
Turns out that three chinese sources had downloaded the same
set of files, each 262 times. That would do it.
So, anyone else looking to update bipolar semiconductor,
packaging or spice parameter spreadsheets; look at K.A.Pullen's
'Conductance Design Curve Manual' or any of the other bits
stored at ve3ute.ca are out of luck, for the rest of the month .
Seems strange that the same three addresses downloaded the
same files, the same number of times. Is this a denial of
service attack?
RL
You can still access most of the useful files, updated in
January 2024, through the Wayback Machine.

https://web.archive.org/web/20240000000000*/http://www.ve3ute.ca/

Probably the best place for it, in some people's opinion, anyways.

RL
Peter
2024-03-12 12:54:06 UTC
Permalink
IME, the hidden Google reCAPTCHA works brilliantly against bots.
Presumably by examining the timing. Set the threshold to 0.6 and off
you go. I run a fairly busy tech forum.

Another approach is to put your site behind Cloudflare. For hobby /
noncommercial sites this is free. And you get handy stuff like

- HTTPS certificate is done for you
- you can block up to 5 countries (I blocked Russia, China and India)

Ideally you should firewall your server to accept web traffic only
from the set of CF IPs, but in practice this is not necessary unless
somebody is out to get you (there are websites which carry IP history
for a given domain, believe it or not!!!)
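For the reCAPTCHA route, the server side is just one POST to Google's
documented siteverify endpoint and a score check; roughly (Python
sketch; the secret is your own, and 0.6 is the threshold I mentioned):

    import json
    import urllib.parse
    import urllib.request

    VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"

    def human_enough(token, secret, threshold=0.6):
        data = urllib.parse.urlencode({"secret": secret,
                                       "response": token}).encode()
        with urllib.request.urlopen(VERIFY_URL, data=data) as resp:
            result = json.load(resp)
        # reCAPTCHA v3 returns a 0..1 score; bots tend to land near 0
        return result.get("success") and result.get("score", 0) >= threshold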
legg
2024-03-12 13:41:24 UTC
Permalink
On Tue, 12 Mar 2024 12:54:06 +0000, Peter
Post by Peter
IME, the hidden google re-captcha works brilliantly against bots.
Presumably by examining the timing. Set the threshold to 0.6 and off
you go. I run a fairly busy tech forum.
Another approach is to put your site behind Cloudflare. For hobby /
noncommercial sites this is free. And you get handy stuff like
- https certificate is done for you
- you can block up to 5 countries (I blocked Russia China and India)
Ideally you should firewall your server to accept web traffic only
from the set of CF IPs, but in practice this is not necessary unless
somebody is out to get you (there are websites which carry IP history
for a given domain, believe it or not!!!)
My ISP has finally blocked all China IP addresses from accessing the
site.

Maybe that's what the bots want; who knows.

Haven't had access to the site to find out what the practical result
is, yet.

RL
bitrex
2024-03-14 21:37:13 UTC
Permalink
Post by legg
On Tue, 12 Mar 2024 12:54:06 +0000, Peter
Post by Peter
IME, the hidden google re-captcha works brilliantly against bots.
Presumably by examining the timing. Set the threshold to 0.6 and off
you go. I run a fairly busy tech forum.
Another approach is to put your site behind Cloudflare. For hobby /
noncommercial sites this is free. And you get handy stuff like
- https certificate is done for you
- you can block up to 5 countries (I blocked Russia China and India)
Ideally you should firewall your server to accept web traffic only
from the set of CF IPs, but in practice this is not necessary unless
somebody is out to get you (there are websites which carry IP history
for a given domain, believe it or not!!!)
My ISP has finally blocked all China IP addresses from accessing the
site.
Maybe that's what the bots want; who knows.
Haven't had access to the site to find out what the practical result
is, yet.
RL
Maybe consider hosting the web server yourself, using a virtual
machine/Proxmox as the host and a Cloudflare tunnel for security:


Don Y
2024-03-15 01:26:16 UTC
Permalink
Maybe consider hosting the web server yourself, using a virtual machine/Promox
The advantage is that you can institute whatever policies you want.
The DISadvantage is that YOU have to implement those policies!

And, nothing prevents your site from being targeted for a [D]DoS
attack, etc. Or, any other behavior that increases the cost to
you (in terms of your effort or servicing/hosting fees from
provider(s)).

It's often easier (less hassle) to just avail yourself of some
free service to host the content and let THEM worry about
these issues. (unless you enjoy dicking with this sort of thing)
bitrex
2024-03-15 16:05:33 UTC
Permalink
Post by Don Y
Post by bitrex
Maybe consider hosting the web server yourself, using a virtual
The advantage is that you can institute whatever policies you want.
The DISadvantage is that YOU have to implement those policies!
And, nothing prevents your site from being targeted for a [D]DoS
attack, etc.  Or, any other behavior that increases the cost to
you (in terms of your effort or servicing/hosting fees from
provider(s).
It's often easier (less hassle) to just avail yourself of some
free service to host the content and let THEM worry about
these issues.  (unless you enjoy dicking with this sort of thing)
OK, don't have to self-host. There are possible privacy/security
concerns using Cloudflare for private data/WAN applications but for
public-facing generally static web pages it seems like a no-brainer,
they have pretty generous free plans.
Don Y
2024-03-15 20:07:45 UTC
Permalink
Post by Don Y
Post by bitrex
Maybe consider hosting the web server yourself, using a virtual
The advantage is that you can institute whatever policies you want.
The DISadvantage is that YOU have to implement those policies!
And, nothing prevents your site from being targeted for a [D]DoS
attack, etc.  Or, any other behavior that increases the cost to
you (in terms of your effort or servicing/hosting fees from
provider(s).
It's often easier (less hassle) to just avail yourself of some
free service to host the content and let THEM worry about
these issues.  (unless you enjoy dicking with this sort of thing)
OK, don't have to self-host. There are possible privacy/security concerns using
Cloudflare for private data/WAN applications but for public-facing generally
static web pages it seems like a no-brainer, they have pretty generous free plans.
IME, most of these efforts are just a shitload of unplanned work that
you only discover later. And you are under some (self-imposed?) pressure
to keep it all running ASAP.

[I *really* don't like something imposing a timing constraint on my
actions. "Your site is down!" "Yeah. And?"]