Preservation of historical ftp site content

General discussion for all topics related to DOS, Windows, Linux, consoles, etc. Anything to do with games.
MrFlibble
Demoniac Demo maniac
Posts: 3754
Joined: Sun Dec 05, 2010 11:39 am

Post by MrFlibble »

Some time ago I checked gamers.org but it was down (both http and ftp). I thought it was undergoing maintenance, and when I checked back later it was working again. However, it's down again now, and has been for several days if not longer.

I was thinking, maybe it's time to start archiving those ftp sites that still have the old content of DOS and early Windows era games: the files themselves, the directory structure (if present), the date stamps on files, everything? The overall size of the files shouldn't be much of a problem with today's HDD capacities and Internet download speeds.

Here are a few potential candidates for archiving:

DOS, Win3.x and early Win95:
ftp://ftp.funet.fi/pub/msdos/games/
ftp://ftp.pl.freebsd.org/vol/rzm1/coast/games/
ftp://ftp.padua.org/pub/msdos/dos/games/local/
ftp://delphi.hs-niederrhein.de/pub/dos/games/
ftp://ftp.uni-potsdam.de/pub/systems/dos/games/
http://ftp.sunet.se/pub/games/PC/
ftp://ftp.uni-potsdam.de/pub/systems/win95/games/
ftp://ftp.fi.netbsd.org/.m/archive1l/ft ... n95/games/

Early 2000s:
ftp://ftp.farlep.net/pub/clubix/demo/
ftp://ftp.farlep.net/pub/clubix/demo2/
ftp://ftp.farlep.net/pub/clubix/demo3/
ftp://ftp.peliplaneetta.net/pelidemot/
http://ftp.gameaholic.com/pub/demos/

Official developer/publisher ftp sites:
ftp://ftp.3drealms.com/
ftp://ftp.atari.com/demos/
ftp://ftp.bluebyte.com/demos/
ftp://ftp.ea.com/pub/
ftp://ftp.idsoftware.com/
ftp://ftp.gtinteractive.com/demos/
ftp://ftp.lucasarts.com/demos/pc/
ftp://ftp.team17.com/pub/t17/demos/
ftp://ftp.ubisoft.com/
ftp://ftp.virginmedia.com/blueyondergam ... ndergames/
ftp://ftp.westwood.com/pub/
ftp://ftp.wizworks.com/demos/
http://download.mvpsoft.com/
dosraider
Admin
Posts: 9243
Joined: Tue Mar 15, 2005 2:06 pm
Location: ROTFLMAO in Belgium.

Post by dosraider »

Now yeah, if you wanted to back up everything worth backing up from that kind of site ..... that's much, much work!!

Can't say it didn't cross my mind, but it's really too time-consuming.

And yes, a shame some vanish.
wardrich wrote:The contrasts in personalities will deliver some SERIOUS lulz. I can't wait.

Post by MrFlibble »

archive.org has this kind of project, called the FTP Site Boneyard. It seems to be progressing at a slow pace though, and the snapshots of actual FTP sites are huge archive files (several GiB or more), not exactly easy to browse.

Post by dosraider »

The same thing happened to 'I have no idea how many' BBSes: it's simply too much data to download and archive, and even when you do get some of them, the pile of info is so huge you lose yourself in it.

Crap, I can't even manage to make a comprehensive catalog of the stuff I've piled up myself .... *sigh*
..... I guess there may be enough time in a lifespan to download all that stuff from the internet, but there certainly isn't enough time to look through all that data ....
The problem nowadays is not storage capacity; 1, 2 or even more TB HDDs are kinda cheap. The problem is the amount of data.

Man o man o manometer, on my PC for daily use (nothing really fancy: dual core @ 3 GHz, 4 GB RAM, Intel onboard graphics, but it runs nicely and silently) I have some 2 TB of storage capacity and I hardly know what's on it.......
.... and then add a lot of external HDDs ..... just keeping those externals a bit organized is hard enough already.

So yeah, at the end a lot of interesting data is getting lost forever.

Post by MrFlibble »

dosraider wrote:Problem ourdays is not the storage capacity, 1 or 2 or even more TB HDs are kinda cheap. Problem is the amount of data.
Absolutely. But that can be at least in part alleviated if a team is working on a project instead of a single person. It also helps if there is a possibility to easily search such archives (like Filewatcher for FTPs).

Post by dosraider »

I can only say you have some good and strong points.

Alas, as internet history proves over and over again, gathering such a group (even a large one) is possible, but it usually falls apart after a while, typically when peeps begin to realize how much donkey work it takes and how much of their valuable free time it consumes .......
Malvineous
Time zone stampede
Posts: 29
Joined: Sat Sep 07, 2013 4:25 pm
Location: Brisbane, Australia

Post by Malvineous »

If you use a program to just automatically download and mirror the whole site then it just becomes an issue of how much you can download and store. You (or others) can always go back to examine it later, but if the site disappears and nobody saved a copy, well it's too late then...

Post by MrFlibble »

Yeah, first I thought of using the Free Download Manager or FileZilla to get FTP site contents, then also learned about HTTrack. However, none of them seem to preserve date stamps on folders, and the date stamps for files get corrected for the host OS time zone. Is there any way to circumvent that?

Post by Malvineous »

When you say the timestamps are corrected for the local timezone, do you mean that 5pm on the server is converted to 5pm localtime? Or 5pm server time appears as (for example) 2am local time because no conversion is being done?

I'm not sure there's a way around this because I don't think servers report their timezone. You can probably compensate by changing your local timezone before starting the download, this way you can add/remove hours from all the downloaded files in one go. If you set your timezone to match the server's then if the server says 5pm, your download might say 2am depending on where you are.

Under Linux I normally use the wget command-line program for things like this, and I believe there are versions available for other OSes too. It is designed for mirroring sites, so it has plenty of options for this sort of thing, although none that I can see dealing with the timezone. Then again, under Linux you can change the timezone for just one program at a time (by setting the TZ environment variable when launching it), so maybe that's why.
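A minimal sketch of the per-program timezone trick, in Python for illustration: it only builds the wget command line and a copy of the environment with TZ overridden, so the change affects that one child process and nothing else. The helper name wget_mirror_command is made up for this example, and it assumes wget interprets the listing times in its local zone (worth checking against a file whose date you already know).

```python
import os


def wget_mirror_command(url, tz, dest="mirror"):
    """Hypothetical helper: return (argv, env) for a wget mirror run.

    The TZ override lives only in the returned env dict, so it applies to
    the wget child process alone; this process's own TZ is left untouched.
    """
    # --mirror turns on recursion and timestamping (-N) for the whole tree.
    argv = ["wget", "--mirror", "--directory-prefix=" + dest, url]
    env = dict(os.environ)
    env["TZ"] = tz  # e.g. "UTC"
    return argv, env


# Usage (needs wget and network access, so it is not run here):
#   import subprocess
#   argv, env = wget_mirror_command(
#       "ftp://ftp.funet.fi/pub/msdos/games/starmines/", "UTC")
#   subprocess.run(argv, env=env, check=True)
```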

However I'm not sure what the best solution is. If you treat everything as localtime, then you will always know that a file that says 5pm your time was changed at 5pm their time wherever in the world it was from. If you convert it, then it will say 2am on your PC and you'll have to remember what timezone the people were in at the time and do the conversion to figure out what their time was when the file was originally uploaded. I think either way you'd need to record what timezone the files are in if you're worried about that.

However, ftp.3drealms.com appears to be a Linux server, so its directory listings include the date only. The time is not shown because Unix-style listings (like "ls -l") leave out the time for entries older than about six months and show the year instead. So you'll never be able to get a precise timestamp anyway, only midnight on a given day. So be careful, because if you convert timestamps your dates might be out by a day. Imagine someone uploading a file at 11:30pm on the 2nd: the server now reports it as 00:00 on the 2nd, and if your timezone is four hours behind, your PC says 8pm on the 1st...
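To make the date-only problem concrete, here is a rough Python sketch that reads the date columns of a Unix-style listing line (the rottsource.zip line from the ftp.3drealms.com listing quoted further down this thread) and then shifts it into a zone four hours behind the server. The function name parse_list_date and the UTC server / UTC-4 client offsets are assumptions for the example.

```python
from datetime import datetime, timedelta, timezone

MONTHS = {name: i for i, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def parse_list_date(month, day, year_or_time, current_year):
    """Parse the three date columns of a Unix-style LIST line.

    Returns (naive datetime, has_time). Old entries show a year instead of
    HH:MM, so their time of day is unknown and comes out as midnight.
    """
    if ":" in year_or_time:
        hour, minute = (int(x) for x in year_or_time.split(":"))
        # Recent entries carry no year; assume the current one (simplified:
        # real clients also handle dates from late last year).
        return datetime(current_year, MONTHS[month], int(day), hour, minute), True
    return datetime(int(year_or_time), MONTHS[month], int(day)), False


line = "-rw-rw-rw-   1 root     root      4037414 Feb 17  2012 rottsource.zip"
fields = line.split()
stamp, has_time = parse_list_date(fields[5], fields[6], fields[7], current_year=2013)

# Hypothetical offsets: server on UTC, client four hours behind it.
server_tz = timezone.utc
client_tz = timezone(timedelta(hours=-4))
local_view = stamp.replace(tzinfo=server_tz).astimezone(client_tz)
print(stamp, has_time)  # 2012-02-17 00:00:00 False
print(local_view)       # 2012-02-16 20:00:00-04:00
```

Because the listing only carries a date, the parsed value lands on midnight, and any negative offset pushes it onto the previous day.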

Again, my experience is limited to Linux, but every time a file is created, deleted or renamed in a folder (including moving it in or out), the last-modified time of that folder changes. This means that most mirroring software does not set timestamps for folders, because they would be overwritten every time a file was written into the folder (so the software would have to keep track of every single folder and set its timestamp last, once all the files have been downloaded). However, that does mean that, at least on Linux servers, the last-modified date of a folder will probably match the time the newest file was added to it.
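If you do want folder dates, the approach described above is to apply them as the very last step. A sketch in Python, assuming you've already collected each folder's timestamp from the server listings into a dict (the restore_dir_times name and that dict are hypothetical):

```python
import os


def restore_dir_times(root, dir_times):
    """Set saved mtimes on folders once every file has been downloaded.

    dir_times maps a path relative to root to a POSIX timestamp (a
    hypothetical input, e.g. gathered from the listings during the crawl).
    Creating files in a folder updates that folder's mtime, so this must run
    last. utime() on a folder does not touch its parent's mtime, so the
    order of this loop does not matter.
    """
    for rel_path, ts in dir_times.items():
        os.utime(os.path.join(root, rel_path), (ts, ts))
```

Setting a file's or folder's time with utime() doesn't add or remove a directory entry, so it won't disturb the parent folder's timestamp; only the downloads themselves do.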

Post by dosraider »

I suppose MrFlibble is talking about the original file time stamp.

If you download a file dated, let's say, '17/02/2012', it will be time-stamped with today's date, losing the original time stamp.

And no, I don't know how to avoid this.

Post by Malvineous »

@dosraider: Most programs designed for mirroring sites will change the date on the downloaded file to match the server. But the problem is, what time do you use?

Let's say you're in Belgium and your timezone is UTC+1, but the server is in the US with a timezone of UTC-8. If a file was originally uploaded in the US at 23:00 on 17 Feb 2012, this moment was 08:00 on 18 Feb in Belgium.

So on your copy of the data you are keeping in Belgium, do you want the files to say they were last changed at 08:00 on Feb 18, because that's what time it was in Belgium when the files were uploaded? Or should it say 23:00 on Feb 17, because that's the time that was showing on the watch of the person who uploaded the files?
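The arithmetic in that example can be checked with Python's datetime and fixed UTC offsets. This is a simplification: fixed offsets ignore daylight saving, which doesn't matter in February here, and the variable names are just for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for February; real code could use zoneinfo.ZoneInfo(...)
# to get daylight saving rules as well.
us_server_tz = timezone(timedelta(hours=-8))
belgium_tz = timezone(timedelta(hours=1))

uploaded = datetime(2012, 2, 17, 23, 0, tzinfo=us_server_tz)
print(uploaded.isoformat())                           # 2012-02-17T23:00:00-08:00
print(uploaded.astimezone(belgium_tz).isoformat())    # 2012-02-18T08:00:00+01:00
print(uploaded.astimezone(timezone.utc).isoformat())  # 2012-02-18T07:00:00+00:00
```

All three lines describe the same instant (comparing the aware datetimes with == gives True); they only differ in which wall clock they're written for, which is why noting the timezone alongside the archive keeps the choice reversible.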

Post by dosraider »

Personally, I couldn't care less about the server/client date stamp.
I surely don't intend to crawl such ftp servers for 'updated' content, or whatever.

What matters more to me is the files' original time stamps.

Post by Malvineous »

Yes but the point is what time zone do you use for the original time stamp?

Post by dosraider »

Seems I need to fine tune my Engrish ....... :laugh:

Time zones don't interest me in the least.

What I want is the date the file I download was last modified. The reason is quite simple: some versions of old games / patches / whatever were modified, had their code cleaned up, and so on ... but kept the same filename; call it laziness on the part of the dev/coder. Basic/GWBasic/QBasic files come to mind, for example ..... they rarely put a 'version' on them, they simply updated their code and kept the same name. That happened a lot in the earlier days, and not only with Basic files *sigh*..... for those you need the original time stamp.

FYI: Belgium is on CET, also means we follow the @#@#@#@# daylight saving time switches, man I hate that !

Post by MrFlibble »

Malvineous wrote:When you say the timestamps are corrected for the local timezone, do you mean that 5pm on the server is converted to 5pm localtime? Or 5pm server time appears as (for example) 2am local time because no conversion is being done?
Actually, it seems that every programme handles this differently when told to preserve original date stamps. I tried downloading a file from the same FTP, first with Free Download Manager and then with FileZilla, with completely different results. FDM added four hours to the date/time that is displayed when I view the FTP contents in Opera. FileZilla, on the other hand, added 24 hours (making the date one day off) and two more hours on top of that, for no apparent reason.

I've decided to double-check stuff right now, and I'm a bit confused. I took this FTP address:
ftp://ftp.funet.fi/pub/msdos/games/starmines/

Now it turns out that Opera and FileZilla display the date stamps differently. For one, Opera seems to show the time as 0:00:00 for everything. FileZilla displays time as well. Another thing is that one file, sm2v11.zip, has a date stamp from 01.01.1995 as displayed in Opera, and FDM keeps that date too. However, FileZilla displays the date as 25.02.2012.

What's more, this time FileZilla saved exactly the date stamps it displays for the site. I'll go back and double-check the other FTP I used as a test source yesterday.

Ah-hah, now I get it, the 24+2 hours in FileZilla happens when a site (this one in my test example) doesn't have any info on time, which as you said is typical of Linux servers.

Interesting thing though, I checked ftp.3drealms.com just now and FileZilla reports (and copies) time stamps for the files there.

So basically FileZilla does get original date stamps, and HTTrack does so as well.
dosraider wrote:What I want is when the file I download is last modified, reason is quiet simple, some versions of old games / patches / whatever used the file name but were last modified, code cleaned up, and so on ... but kept the same filename, call it laziness from the dev/coder. As for example Basic/GWBasic/QBasic files comes in mind ..... rarely they put on a 'version', they simply updated their code and kept same name, happend a lot in the earlier days, and not only with Basic files *sigh*..... for those you need the original time stamp.
Yeah, the file creation dates would be great but unfortunately not possible unless they zip it up again in a different file. At least file upload dates give some clue about the relative time of file creation, in that a file could not be created after it was uploaded :)
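On the "original date" point: ZIP archives record an MS-DOS style last-modified date/time (2-second resolution, no timezone) for each file stored inside them, so those dates survive no matter what the FTP server reports for the .zip itself. Note it's the last-modified time, not a creation date. A small Python sketch that lists them, using an in-memory archive with a made-up GAME.BAS entry:

```python
import io
import zipfile


def member_dates(zip_bytes):
    """Return {member name: (year, month, day, hour, minute, second)}."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {info.filename: info.date_time for info in zf.infolist()}


# Build a tiny archive in memory with a known (hypothetical) date stamp.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    info = zipfile.ZipInfo("GAME.BAS", date_time=(1995, 1, 1, 12, 30, 0))
    zf.writestr(info, '10 PRINT "HELLO"\n')

print(member_dates(buf.getvalue()))  # {'GAME.BAS': (1995, 1, 1, 12, 30, 0)}
```

Against a real download you'd pass the bytes of an archive from the mirror instead of the in-memory buffer.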

Post by Malvineous »

That's a bit odd then. I tried connecting directly to ftp.3drealms.com and I could not get the server to give me a time for /source/rottsource.zip (just a date.) What time did it give you? I wonder how it managed to get that time?

Post by MrFlibble »

FileZilla reports the date/time for rottsource.zip as 17.02.2012 4:55:16 (at this point I can't tell if FZ uses a 12- or 24-hour time format though).

Post by Malvineous »

That's really interesting. Some investigation reveals there are relatively new FTP commands (defined in RFC 3659, from 2007) called MLST and MLSD, which provide a directory listing in a format much easier for a program to parse.

A normal directory listing is platform specific, and it has always been problematic for graphical FTP clients to parse:

-rw-rw-rw-   1 root     root      4017201 Feb 17  2012 duke3dsource.zip
-rw-rw-rw-   1 root     root      4033649 Feb 17  2012 PREY_SDK_2006-10-13.zip
-rw-rw-rw-   1 root     root      4037414 Feb 17  2012 rottsource.zip
-rw-rw-rw-   1 root     root      4763385 Feb 17  2012 shadowwarriorsource.zip

It's a problem because it's the same as doing "ls" or "dir" on the server's platform, so Windows servers produce very different output from Linux ones, and other OSes are slightly different too.
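For illustration, a rough Python sketch of why clients struggle: one regex for the Unix-style lines above and another for the DOS/IIS style that Windows servers commonly send. The DOS-style sample line and both patterns are assumptions for the example and won't cover every server (e.g. ACL markers or localized month names).

```python
import re

# Unix "ls -l" style, like the listing above.
UNIX_LINE = re.compile(
    r"^(?P<perms>[-dl][rwxsStT-]{9})\s+\d+\s+\S+\s+\S+\s+(?P<size>\d+)\s+"
    r"(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+"
    r"(?P<year_or_time>\d{4}|\d{1,2}:\d{2})\s+(?P<name>.+)$"
)
# DOS/IIS style (assumed sample format): MM-DD-YY  HH:MMAM  size-or-<DIR>  name
DOS_LINE = re.compile(
    r"^(?P<date>\d{2}-\d{2}-\d{2})\s+(?P<time>\d{2}:\d{2}[AP]M)\s+"
    r"(?P<size><DIR>|\d+)\s+(?P<name>.+)$"
)


def classify_listing_line(line):
    """Return ("unix" | "dos" | "unknown", fields) for one LIST output line."""
    for kind, pattern in (("unix", UNIX_LINE), ("dos", DOS_LINE)):
        match = pattern.match(line)
        if match:
            return kind, match.groupdict()
    return "unknown", {}
```

Even when a Unix line parses cleanly, a year in the last date column still leaves you at midnight; no regex can recover a time the server didn't send.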

But MLSD produces a more detailed list, with full timestamps, in what appears to be a standard format:

modify=20120217011059;perm=fle;type=pdir;unique=FD01UF33;UNIX.group=504;UNIX.mode=0750;UNIX.owner=10001; ..
modify=20120217005443;perm=adfrw;size=4033649;type=file;unique=FD01U2803AE;UNIX.group=0;UNIX.mode=0666;UNIX.owner=0; PREY_SDK_2006-10-13.zip
modify=20120217005516;perm=adfrw;size=4037414;type=file;unique=FD01U2803AF;UNIX.group=0;UNIX.mode=0666;UNIX.owner=0; rottsource.zip
modify=20120217005550;perm=fle;type=cdir;unique=FD01U2803AD;UNIX.group=0;UNIX.mode=0755;UNIX.owner=0; .
modify=20120217005619;perm=adfrw;size=4017201;type=file;unique=FD01U2803B1;UNIX.group=0;UNIX.mode=0666;UNIX.owner=0; duke3dsource.zip
modify=20120217005549;perm=adfrw;size=4763385;type=file;unique=FD01U2803B0;UNIX.group=0;UNIX.mode=0666;UNIX.owner=0; shadowwarriorsource.zip

Here the server reports rottsource.zip as 2012-02-17 00:55:16. I'm not sure if it reports the server's timezone, but it would make sense for all these timestamps to be reported in UTC, then the local client can just compensate. Indeed, FileZilla tells me rottsource.zip was modified at 10:55:16, which is 10 hours ahead and I'm in UTC+10 so that makes sense.

It looks then like FileZilla is going to record the most accurate timestamps, and they will be adjusted for your local timezone.
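As a sketch of how little parsing MLSD needs, here's Python that splits one of the lines above into its facts and turns the modify value into a UTC datetime. RFC 3659 specifies that these time values are in UTC, which matches the 10-hour shift FileZilla shows in Brisbane. (If you're using Python anyway, ftplib's FTP.mlsd() already yields (name, facts) pairs; this just shows what's in the line.)

```python
from datetime import datetime, timedelta, timezone


def parse_mlsd_line(line):
    """Split one MLSD line into (name, facts).

    The line is 'fact=value;fact=value; name': the facts end with ';' and a
    single space separates them from the file name.
    """
    facts_part, name = line.split(" ", 1)
    facts = {}
    for fact in facts_part.rstrip(";").split(";"):
        key, _, value = fact.partition("=")
        facts[key.lower()] = value  # fact names are case-insensitive
    return name, facts


def modify_time(facts):
    """Turn the 'modify' fact (YYYYMMDDHHMMSS[.sss], UTC) into an aware datetime."""
    raw = facts["modify"].split(".")[0]  # drop optional fractional seconds
    return datetime.strptime(raw, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


line = ("modify=20120217005516;perm=adfrw;size=4037414;type=file;"
        "unique=FD01U2803AF;UNIX.group=0;UNIX.mode=0666;UNIX.owner=0; rottsource.zip")
name, facts = parse_mlsd_line(line)
brisbane = timezone(timedelta(hours=10))  # fixed UTC+10, as in the post
print(name, modify_time(facts).astimezone(brisbane))
# rottsource.zip 2012-02-17 10:55:16+10:00
```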

Post by MrFlibble »

Yup, I've just looked it up: UTC+04:00 is in fact the setting for Moscow (although I've always thought it was GMT+3), so everything matches.

I guess this should also explain the "offset by one day" for files from servers with no time stamp info - apparently FileZilla treats 0:00:00 as 24:00:00 and adding four more hours sets the date to the next day.

Post by Malvineous »

Interesting. Well at least FileZilla has an option (in the Site Manager, on the Advanced tab) to adjust the server's timezone offset, so for those servers you might be able to tweak it to get the timestamps you want. Or if there are no timestamps across the whole server, you could set them all after the download to something like 1970-01-01 00:00 which is often used to indicate an unknown timestamp.
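A rough Python sketch of doing either fix after the download instead of in FileZilla: shift every file's modified time by a fixed number of hours, or stamp everything with the 1970-01-01 epoch. restamp_tree is a hypothetical helper, and FAT-formatted drives can't store dates before 1980, so the epoch option assumes something like NTFS or a Linux filesystem.

```python
import os

UNKNOWN = 0.0  # 1970-01-01 00:00:00 UTC, the Unix epoch


def restamp_tree(root, offset_hours=0.0, unknown=False):
    """Hypothetical helper: shift every file's mtime under root by
    offset_hours, or set all of them to the epoch when unknown=True.

    Folder timestamps are left alone.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if unknown:
                new_time = UNKNOWN
            else:
                new_time = os.stat(path).st_mtime + offset_hours * 3600
            os.utime(path, (new_time, new_time))
```

Run it once per server, with the offset you'd otherwise have typed into the Site Manager; running it twice would shift the files twice.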