How To Archive Your Online Articles

When Yahoo Contributor Network (YCN) shut down recently, I had more than a hundred articles published on the site. Unless I did something to preserve that treasure trove (at least that’s what I considered it to be), all that content would simply disappear when the YCN site went away.

To make sure that didn’t happen, I wanted to create an online archive of my articles where they would be available in almost exactly the form in which they appeared on the YCN site.

Knowing that a number of these articles had already been stolen and republished on rogue websites, I needed my archive to be accessible online. That way I could simply provide a link to my original content in order to establish proof of authorship when filing DMCA copyright violation complaints.

On the other hand, because I would be republishing some of this work on other writing sites, I needed to ensure that my archive would not show up as duplicate content in web searches done through Google, Bing, or other search engines.

Since I already had a website I could use to host my files, I just had to figure out how to transfer my articles so that they both retained their original appearance and wouldn’t be listed by search engines.

After some trial and error, I came up with a three-step process to create such an archive, and I thought it might be useful to other online writers to know what I did. The steps are:

1. Copy your article web pages onto your computer

2. Upload your articles to your web site

3. Set up a robots.txt file to prevent search engines from seeing your files

I make no claim of this being the best way to go about creating such an archive; it’s simply the way I chose to do it.

So, here are the steps a writer could take to create an online archive similar to mine.

1. Copy your article web pages onto your computer

The first step is to get a copy of each of your article web pages, along with all files (like image files) necessary for the page to appear as it originally did. For me as a Windows user, this was a very simple though somewhat time-consuming process.

All you have to do is open the web page of each article in a browser, and do a Save As to your computer.

In Windows this is as simple as hitting Ctrl-S. That opens up a window that will allow you to save the article’s web page file, plus all the ancillary files that are necessary to retain its original appearance.

Saving your web page to a folder on your computer

Start by selecting or creating a folder on your computer to receive the downloaded files. Now, for each article file, open it in your browser and use Ctrl-S to save it into the folder you selected.

The Save As process will place two entities into your download folder. The first is the file named in the File name box. The second is a folder containing all the files necessary to allow the page to retain the appearance it had online.

Here’s how the Save As box looked when I pressed Ctrl-S to save an article called Pennsylvania’s “Benevolent Gesture” Bill Makes Sense into my Yahoo folder.

As you can see, both the web page file and the folder containing the ancillary files have the same name, except that the folder has “_files” added to the end of the name. This common name is what links the two together.

Important tips concerning file names

The name under which you save your web page will be its name from now on. That’s because the saved page refers to its ancillary files by the folder’s name, so if you rename either the web page file or its associated folder, the link between them will be broken. That happens even if you give both of them the same new name. The only reliable way to rename a downloaded web page is to open it in your browser and save it again under the new name. So, be sure to put your desired name into the File name box before saving the page.
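
Because each page and its folder are tied together by name, it can be worth checking a download folder for pages that have lost their companion folder. Here is a minimal Python sketch of such a check. The function name find_unpaired_pages is made up for illustration, and the script assumes the browser used the “_files” suffix shown in the index listing later in this article. A page that had no images or other ancillary files may legitimately have no folder at all, so treat the result as a list of pages to look at, not a list of errors.

```python
from pathlib import Path


def find_unpaired_pages(archive_dir, suffix="_files"):
    """Return the names of saved pages that have no companion folder.

    Hypothetical helper: assumes the browser names the folder
    "<page name><suffix>", e.g. My-First-Article_files.
    """
    archive = Path(archive_dir)
    missing = []
    for page in sorted(archive.iterdir()):
        if page.is_file() and page.suffix.lower() in (".html", ".htm"):
            companion = archive / (page.stem + suffix)
            if not companion.is_dir():
                missing.append(page.name)
    return missing
```

Calling find_unpaired_pages on your download folder returns an empty list when every saved page still has its folder.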

I should have modified the name of this file before saving it for a couple of reasons.

First of all, the name automatically given to it by the YCN site carries a lot of extra baggage I didn’t need (the part that says “-Yahoo Voices – voices.yahoo.com”). All I really wanted for the downloaded filename was the article title alone.

Watch out for “special” characters in the file name

The second reason I needed to choose a different name is that the article name has some non-standard characters in it. Although they don’t cause a problem on my Windows computer, when the article web page and its associated folder were uploaded to my web site, those non-standard characters prevented the linkage between the two from being recognized. The result was that although I could see all the written content of my page, all the formatting, as well as the images it contained, were lost.

Here’s how the original page looked on the YCN site:

But because of the interference caused by the non-standard characters in the name, here’s how it appeared on my web site:

Here are the non-standard characters that can get you into trouble

What were those non-standard characters that messed up my beautifully formatted page? Here are the ones I’ve found: ; : ‘ ’ “ ” –

These are the “smart” versions of double quotes, single quotes, and dashes that may be produced by a document editor like Microsoft Word, plus colons and semicolons. When my website server sees any of those characters in a file or folder name, it doesn’t know what to do with them. Here’s how the name of the file I uploaded looked in the file manager of my website:

Pennsylvania�s �Benevolent Gesture� Bill Makes Sense - Yahoo Voices - voices.yahoo.com.html

The easy solution is to either strip such characters out of the file name completely, or replace any “smart” characters with their simple equivalents. In other words, if I select a smart quote ( “ ) in the File name box, and type over it with that same character from the keyboard, ( “ ) becomes ( " ) and the problem is eliminated.
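
If you have many titles to clean up, a small script can suggest a safe name to type into the File name box. The following Python sketch is only an illustration: the name plain_name and the exact replacement table are assumptions, not part of any tool mentioned here. It covers the characters listed above. Smart single quotes and dashes become their keyboard equivalents, while colons, semicolons, and smart double quotes are stripped out, because Windows does not accept a plain double quote in a file name.

```python
# Characters that caused trouble, mapped to a plain replacement.
# An empty string means the character is simply removed.
REPLACEMENTS = {
    "\u2018": "'",  # left smart single quote
    "\u2019": "'",  # right smart single quote / apostrophe
    "\u201c": "",   # left smart double quote (removed)
    "\u201d": "",   # right smart double quote (removed)
    "\u2013": "-",  # en dash
    ":": "",
    ";": "",
}

_TABLE = str.maketrans(REPLACEMENTS)


def plain_name(title):
    """Return the title with the troublesome characters replaced or removed."""
    return title.translate(_TABLE)


print(plain_name("Pennsylvania\u2019s \u201cBenevolent Gesture\u201d Bill Makes Sense"))
# prints: Pennsylvania's Benevolent Gesture Bill Makes Sense
```

The script only suggests a name. The page itself should still be saved from the browser under that name, for the reasons given in the tips on file names.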

Get rid of spaces!

One final thing I would now do in renaming my downloaded web page is to replace all spaces in the name with dashes. So, “Bill Makes Sense” would become “Bill-Makes-Sense”. The reason for that is purely esthetic. In a web address, any space in a filename is automatically changed to %20. So, “Bill Makes Sense” would be seen as “Bill%20Makes%20Sense”. I’d rather see the dashes.
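
To see the difference for yourself, here is a short Python sketch that shows how a name looks once it is part of a web address, with and without the dashes. The helper names url_form and dashed are made up for this example. The quote function comes from Python’s standard urllib.parse module.

```python
from urllib.parse import quote


def url_form(filename):
    """Return the filename as it would appear inside a web address."""
    return quote(filename)


def dashed(filename):
    """Replace every space in the filename with a dash."""
    return filename.replace(" ", "-")


print(url_form("Bill Makes Sense"))          # prints: Bill%20Makes%20Sense
print(url_form(dashed("Bill Makes Sense")))  # prints: Bill-Makes-Sense
```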

Once you get your article web pages downloaded to your computer under the names you’d like them to have, the next step is to upload them to your web site.

2. Upload your articles to your web site

You will need to upload both the article file and its associated folder to the same folder on your website. The easiest way to do this is to use a program called an FTP client. That’s simply an application you run on your computer that allows you to bulk upload files to the chosen folder on your website.

The FTP client recommended by my web hosting service is FileZilla, and that’s the one I used. You can get more information about this program at https://filezilla-project.org/.

In researching FTP clients I ran across an interesting alternative you might want to check out. It’s called FireFTP. As the name suggests, it’s an add-on to the Firefox browser. Once you install FireFTP, it will appear on the browser’s Tools menu. You have but to click on it, and a simple, easy-to-use window opens that will allow you to quickly and easily upload your files.

You can see further information about FireFTP, and download it if you so desire, on cnet.com.
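
For readers who would rather script the upload than use a graphical FTP client, the same bulk transfer can be sketched with Python’s standard ftplib module. This is a rough sketch under stated assumptions, not a replacement for FileZilla: the function names plan_upload and upload_archive are invented for this example, the host, user name, password, and folder names are placeholders you would get from your own web host, and plain FTP sends your password unencrypted, so check whether your host offers a secure alternative.

```python
from ftplib import FTP, error_perm
from pathlib import Path, PurePosixPath


def plan_upload(local_dir):
    """List (local path, remote relative path) pairs for every file in local_dir."""
    root = Path(local_dir)
    return [
        (path, path.relative_to(root).as_posix())
        for path in sorted(root.rglob("*"))
        if path.is_file()
    ]


def upload_archive(host, user, password, local_dir, remote_dir):
    """Upload each page and its companion folder into remote_dir (hypothetical helper)."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.cwd(remote_dir)
        created = set()
        for local_path, remote_path in plan_upload(local_dir):
            # Create any missing remote subfolders, outermost first.
            parents = list(PurePosixPath(remote_path).parents)
            for parent in reversed(parents):
                folder = parent.as_posix()
                if folder != "." and folder not in created:
                    try:
                        ftp.mkd(folder)
                    except error_perm:
                        pass  # the folder most likely exists already
                    created.add(folder)
            # Send the file itself in binary mode so images arrive intact.
            with open(local_path, "rb") as handle:
                ftp.storbinary(f"STOR {remote_path}", handle)
```

Running plan_upload on your download folder first, without connecting to anything, is a safe way to review exactly which files would be sent.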

3. Set up your robots.txt file to prevent search engines from seeing your files

Search engines use web crawling robots to identify every file that is accessible from the internet. However, there is provision for people who don’t want these robots to see their files to opt out. It’s called a robots.txt file.

The robots.txt file, which is housed in the top-level directory of your web site, gives specific instructions to any web crawler about which folders or files on your site should be ignored.

In another article I give detailed instructions on how to set up a robots.txt file. Please see:

[ How to use robots.txt to hide your files from search engines ]
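
As a quick illustration of the idea, here is a minimal Python sketch that builds the text of a robots.txt file asking all crawlers to skip one folder, and then uses Python’s standard urllib.robotparser module to confirm the rule covers a given page. The function names are made up for this example, and “myArchive” and mywebsite.org are just the sample names used in the next section. Keep in mind that robots.txt is a request that well-behaved crawlers honor, not a lock on your files.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(folder):
    """Return robots.txt text asking every crawler to ignore one folder."""
    return f"User-agent: *\nDisallow: /{folder.strip('/')}/\n"


def is_blocked(robots_txt, url, agent="*"):
    """Return True if the given rules tell this crawler not to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


rules = build_robots_txt("myArchive")
print(rules)
print(is_blocked(rules, "http://mywebsite.org/myArchive/My-First-Article.html"))
# prints: True
```

The text returned by build_robots_txt is what would be saved as robots.txt in the top-level directory of the web site.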


How to get a Table of Contents for your uploaded files

Here’s one final tip I found very useful. If you enter the name of your archive folder (without any file names) into your browser, it will list the files that folder contains. For example, if your archive folder is hosted at

http://mywebsite.org/myArchive/

typing that into your browser will produce a page that looks something like this:

Index of /myArchive

  • Parent Directory
  • My-First-Article.html
  • My-First-Article_files/
  • My-Second-Article.html
  • My-Second-Article_files/
  ... and so on.

You can open any article simply by clicking on its link on the index page.

Also, I found it convenient to copy the index into a Microsoft Word document (Ctrl-A followed by Ctrl-C in the browser, then Ctrl-V to paste the list into Word). That way, I can use that Word document as a Table of Contents, and access any of my article files simply by holding down the Ctrl key while clicking on the link (Windows 7).
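
If your web host does not show a folder index, a similar list of links can be built from the copies on your own computer. The Python sketch below is one possible approach, not the method described above: the function name table_of_contents is invented for this example, and the web address is the sample one used earlier in this section. It assumes the files on your website have the same names as the ones in your local folder.

```python
from pathlib import Path
from urllib.parse import quote


def table_of_contents(local_dir, base_url):
    """Return one web address per saved page found in local_dir."""
    base = base_url.rstrip("/") + "/"
    return [
        base + quote(page.name)
        for page in sorted(Path(local_dir).iterdir())
        if page.is_file() and page.suffix.lower() in (".html", ".htm")
    ]
```

Printing each entry of table_of_contents("Yahoo", "http://mywebsite.org/myArchive/") gives a list of links that can be pasted into a document in the same way as the index page.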

My files look the way they should

My uploaded files appear on my website in almost exactly their original form, including, by the way, comments and most ads.

If you’d like to see the article I’ve been using as an example of the process, you can access it by clicking here.

There may be quicker and easier ways to do what I’ve done here, but for someone whose sole interest is in preserving his articles exactly as they originally looked, this works for me.

I hope it works for you as well.

© 2014 Ronald E. Franklin

36 comments

MsDora 2 years ago from The Caribbean

Thank you for these instructions. I'll have to practice. This is very useful. So thoughtful of you!


RonElFran 2 years ago from Mechanicsburg, PA Author

Thanks, MsDora. I saw a lot of confusion among writers on how to recover their articles when YCN went away. Hopefully we'll never have that issue on HP!


ologsinquito 2 years ago from USA

This is excellent information. Just today, I'm setting aside some time to back up my newer HP articles, after someone in the forums advised us to do this. Voted up and shared.


RonElFran 2 years ago from Mechanicsburg, PA Author

Thanks, ologsinquito. Setting up an archive before a hosting site disappears is the wise thing to do!


OldRoses 2 years ago from Franklin Park, NJ

I write all of my hubs in Word and then save those documents in my cloud drive. They are readily accessible from anywhere with an internet connection and are backed up by the provider. Many vendors such as Microsoft and Amazon provide free space on their cloud drives so storing your documents won't cost anything.


RonElFran 2 years ago from Mechanicsburg, PA Author

Hi, OldRoses. I do exactly the same in producing my hubs. But the further step I like to take is to archive the finished hub as it appears online. I spend a fair amount of time formatting the hub, adding photos and various capsules that aren't there in the initial Word version. Plus, I usually end up making a lot of corrections directly to the hub before and immediately after publication. So, I really like archiving the finished hub in its final form. Thanks for reading.


Laura335 2 years ago from Pittsburgh, PA

I've been there. Thanks for the reminder to make sure I back up all of my files!


RonElFran 2 years ago from Mechanicsburg, PA Author

You're very welcome, Laura335. And thank you!


SusanDeppner 2 years ago from Arkansas USA

Excellent information and you explained it very well. Thanks for sharing your hard work with the world!


Donna Cook 2 years ago

Terrific explanation! As a Squidoo refugee, I frantically printed my articles to an XPS file on my computer. I need to get them to the cloud for DMCA purposes. Thank you very much for the detailed info.


RonElFran 2 years ago from Mechanicsburg, PA Author

Thanks, Susan. I appreciate that!


mumsgather 2 years ago

Interesting way to backup. Backing up has always been a headache for us online writers. Thanks for sharing a way to do it effectively.


RonElFran 2 years ago from Mechanicsburg, PA Author

Thanks so much, Donna. From my experience at YCN, this is the kind of info I hope writers will pay attention to before the need arises.


RonElFran 2 years ago from Mechanicsburg, PA Author

Hi, mumsgather. Backing up is a definite headache, but a necessary one. I'll be very satisfied if this article makes it a little less of a headache for some writers. Thanks for reading and commenting.


esmonaco 2 years ago from Lakewood New York

Thanks for taking the time to write your method. I'm not that technical, so I need all of the help I can get :)


RonElFran 2 years ago from Mechanicsburg, PA Author

Thanks, esmonaco. My aim was to write it in a way that non-technical folks could follow. I hope it helps!


Linda BookLady 2 years ago from Post Falls, Idaho, USA

I've already saved all my Squidoo lenses to my computer. They look fine there so I won't be trying to upload them to my website. I couldn't let the site go down without the opportunity to save the pages as I originally intended them to look.


RonElFran 2 years ago from Mechanicsburg, PA Author

Hi Linda. It's great you've got all your lenses saved. That puts you ahead of a lot of folks! Just remember that hard disks do crash, so it's good to have two levels of backup on different storage devices.


Linda BookLady 2 years ago from Post Falls, Idaho, USA

Ron... I have one backup covered because I subscribe to Mozy.Com . . . I'll have to think of what other backup would be most appropriate.


RonElFran 2 years ago from Mechanicsburg, PA Author

Linda, IMO if you've got online "cloud" copies as well as the ones on your computer, you should be covered. It would be very unusual to lose both at the same time.


Linda BookLady 2 years ago from Post Falls, Idaho, USA

Thanks Ron... good to know you think that's safe. I like the Mozy service... though I've never had the occasion to need my stored information.


RonElFran 2 years ago from Mechanicsburg, PA Author

You're very welcome, Linda. Like you, I have Mozy and am very glad I've never yet had to use it to restore anything.


Sparrowlet 2 years ago from Massachusetts, USA

Very informative and helpful hub! I had no idea they were shutting down. I contributed some articles a few years ago, but I guess I didn't get the email about shutting down. So now all our articles revert back to us? Do you know what happens to the ones they paid directly for? Thanks for the helpful tips on storing articles. I believe I still have mine somewhere on my computer!


RonElFran 2 years ago from Mechanicsburg, PA Author

Thanks, Sparrowlet. Yes, the rights to all our YCN articles revert to us, including the ones for which we received up-front payments. I'm not sure they sent emails - I didn't get one. But the notice appeared on your dashboard screen when you signed in. I wonder if there aren't many YCN writers who didn't hear, and will be unpleasantly surprised to find that their articles have vanished into cyberspace.


mary615 22 months ago from Florida

As a former user at Bubblews, I'm glad I saved all my articles there. Most of the posts I wrote there were not worth saving though!

When filing a DMCA complaint, it is very important to have your original article available. I just went through that process (successfully).

Voted this UP, etc.


RonElFran 22 months ago from Mechanicsburg, PA Author

Thanks, mary615. I've had quite a few DMCA actions now, and having an archive of my articles has been very advantageous. My authorship has never been questioned. Like you I have many Bubblews articles, some worth saving and some not. I'm slowly copying and then deleting the ones I can use elsewhere. The rest I'll probably just leave there - it's not worth the trouble to take them down.


VirginiaLynne 22 months ago from United States

Thank you for this information--I have not saved my HubPages articles and this is terrific information about how to do it.


RonElFran 22 months ago from Mechanicsburg, PA Author

Thanks, VirginiaLynne. I've found it very helpful, especially in filing DMCA complaints.


Chelle Cordero 21 months ago from northeast USA

This is a wonderful source of info for all writers who like to keep a working online portfolio available as well. Thank you.


RonElFran 21 months ago from Mechanicsburg, PA Author

Thank you, Chelle. Having seen the confusion and stress some authors went through when their articles were about to be taken offline, I think this is very important.


Debbie Snack cake 21 months ago from Iowa

I just found your article, by accident; and I find it very helpful, thanks for sharing your information with us; I normally use, a USB stick, to save my files, and I can take them with me. And I'll never lose them again. But I will look into your method. And I started to follow you today; Hope you stop by, my profile sometime.


RonElFran 21 months ago from Mechanicsburg, PA Author

Thanks, Debbie. I hope the article will prove useful for you. Having multiple layers of backup is the safest course for preserving our work.


sallybea 20 months ago from Norfolk

RonElFran

This is very useful advice and I am very grateful to you for sharing this info.

Thank you so much.


RonElFran 20 months ago from Mechanicsburg, PA Author

Thanks, sallybea. It's the kind of info that remains very relevant in today's swiftly changing online environment.


adevwriting 13 months ago from United Countries of the World

@RonElFran Really good advice for writers. Every writer would feel bad if they were to lose their work in such ways. Sharing on Facebook and Google+!


RonElFran 11 months ago from Mechanicsburg, PA Author

adevwriting, as sites continue to shut down, some without warning, backing up is a must!
