Showing posts with label Ancient Tech. Show all posts

Thursday, March 20, 2014

Death of a Hard Drive

Diagnosis

You just never know what you're going to get hit with from one day to another. Late last week I got a call from our operator at the Five-Mile Pond Dam. He said that there was an odd alarm on the SCADA computer that wasn't going away when you acknowledge it. It was a strange alarm, but it was enough to point me in a direction:
MAC_DL Lost DB connection to SQL Server
There was also one for MAC_PTDL but since that part of the error didn't mean anything to me I figured it was probably secondary to the primary issue of the lost connection to the SQL Server. I don't pretend to be an SQL guy and I had nothing to do with how the computer was set up seven years ago. I do know that the SQL Service was running and it claimed that the database was online (this turned out to be too general to be useful).

I started to dig through the admin tools for the SCADA software (Cimplicity) and was on the verge of contacting the company that developed the SCADA for us. Before I did that I was trying to check everything I could find and stumbled into the Trends section. Trends are stored to the database but everything looked fine until I tried to close out of the Trends admin area. As soon as I tried to do that it told me that the database it was trying to use ("FiveMile") no longer existed. The list of available databases contained nothing that looked like it would be used for trends and definitely didn't contain one called "FiveMile."

Database Woes

As I said, I'm not a database guy. I am generally pretty savvy though and found my way into the SQL Database Admin tools pretty easily. At this point I found a list of available databases. This list included one called "FiveMile" but it was grayed out and next to it was the label "suspect".
FiveMile (suspect)
What does that even mean? I'm sure there are people out there who really know, but from what I found it basically means that something happened (computer crashed?) and when it started back up SQL tried to restart the database and found a problem that it couldn't repair. Thus it suspects there's something wrong.

I spent a great deal of time learning about suspect databases, emergency mode, running CHECKDB, and all sorts of related errors and status codes. It was rough. Once I got the DB back up, CHECKDB didn't want to run on this ancient XP machine with a 5GB database. The errors in the SCADA had changed but weren't quite gone. I thought I had made progress but couldn't be sure and was planning to do some more DB research when things changed...
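For anyone else staring at a "suspect" database, here is a rough sketch of the usual order of operations, written as a small Python helper that just builds the T-SQL text. It assumes SQL Server 2005 or later (I never confirmed the version on that machine), and the function name and the allow_data_loss switch are made up for illustration. The repair step can throw data away, so it is off by default.

```python
def suspect_db_statements(db_name, allow_data_loss=False):
    """Build T-SQL statements, in order, for inspecting a suspect database.

    Assumption: SQL Server 2005 or later. Nothing here talks to a server;
    it only returns the statement text.
    """
    quoted = "[" + db_name.replace("]", "]]") + "]"
    literal = "N'" + db_name.replace("'", "''") + "'"
    statements = [
        # List every database with its state (ONLINE, SUSPECT, EMERGENCY...).
        "SELECT name, state_desc FROM sys.databases;",
        # Emergency mode makes the database readable by sysadmins.
        f"ALTER DATABASE {quoted} SET EMERGENCY;",
        # Consistency check only; it reports problems without fixing them.
        f"DBCC CHECKDB ({literal}) WITH NO_INFOMSGS, ALL_ERRORMSGS;",
    ]
    if allow_data_loss:
        statements += [
            # Repair needs single-user mode and may discard damaged pages.
            f"ALTER DATABASE {quoted} SET SINGLE_USER WITH ROLLBACK IMMEDIATE;",
            f"DBCC CHECKDB ({literal}, REPAIR_ALLOW_DATA_LOSS);",
            f"ALTER DATABASE {quoted} SET MULTI_USER;",
        ]
    return statements


if __name__ == "__main__":
    for statement in suspect_db_statements("FiveMile"):
        print(statement)
```

Restoring from a known-good database backup is generally preferable to the repair option, if such a backup exists.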

Hardware Failure

I had assumed that the DB issues were stemming from the hard drive starting to fail. That wasn't quite right, though: it looks like the DB issues were the end of the failure, not the start. I had placed an order for a new hard drive the first day I was exploring the issues and I'm glad I did because the next day the computer died. It locked up on boot, and further boot attempts resulted in "No boot device found" sorts of errors. The site is 400 miles away so I couldn't verify, but since my remote access was gone I was pretty sure that was it.

These machines are old, not very good to begin with (Dell 830 tower) and in very dirty environments. I'm surprised it lasted 7 years straight. I had anticipated this though. Back in January when I was doing the network upgrades I decided to make backups of the computers (FiveMile and Quinabaug) just in case. It wasn't part of the initial plan and I didn't run it past upper management, I just did it.

I bought a USB to SATA (bare drive) adapter and wanted a rugged external drive to save it all on. After a TON of research I ended up with the Silicon Power Rugged Armor A80 external 1TB drive. I'm not yet at the point where I would say it's the best drive ever, but it's supposedly waterproof and dustproof. It has a good warranty and it includes a very short USB3 cable that slots into the side of it, which is VERY handy. I've never used any other cable with it.

For the actual backups I went with a program called DriveImageXML from Runtime Software. I think I went with it because it was free and sounded like it did a reasonably good job. 

So, back in January I anticipated a hardware failure and made a backup of the SCADA software at Five-Mile Pond Dam. Now, in March, the drive failed and it's time to test my backups...

A Simple Drive Restore

Everybody knows that you NEVER test your backups until you need them. I am, of course, being sarcastic, but we're not in a position to have a bunch of extra hardware lying around to actually perform those sorts of tests on. So I made backups, took them away with me and hoped that we'd never need them. Yet here we are, needing them.

My HOPE was that I could restore onto the new drive here at my house, then drive the 400 miles out to the dam, install it, then drive home the next morning. I had purchased a data-center-quality drive that was supposedly designed for long-term reliability, which should arrive Monday. I'd drive out Tuesday and come back Wednesday. 12 hours of driving in two days isn't my idea of fun, but without that computer there was no remote access and the operator would have to be onsite a lot more. So let's do it.

The restore went smoothly on Monday afternoon and it looked like the drive was all ready to just be plugged in and it would work. I haven't done many Windows restores so I assumed it would go smoothly and struck out Tuesday morning with every expectation of this being a cakewalk. I'd install the drive in 30 minutes then be off to dinner and the hotel where I would monitor it remotely during the evening, meet the operator briefly in the morning and head out. Oh the best laid plans...

Murphy is my project manager

I arrived at the site around 4:30pm after some delays and cracked open the PC. A dead drive doesn't really look any different from a good one, but seven years of industrial dust gives a drive a strange texture:

I suppose it could have been worse after seven years. Anyway, I easily swapped in the new drive and buttoned it up. Turned it on and... .... .... No love. No boot device present. Immediately I'm like, "it must be the Master Boot Record, it probably didn't get restored." I go to look up how to fix it and it requires the Windows install disk. Yeah... no idea where that is.

The operator thinks it's at the other plant so we run down there and manage to find a Windows 2000 Dell install disk. What's the OS on the machine at the other plant? I can't remember, might be 2000... (rookie error). So we grab the disk and head back. I boot into the repair console and run FIXBOOT, which is the typical first step. Still no luck.

Things get a little hazy here as I started to get nervous. My tech cred was rapidly disappearing as the operator patiently waits for his computer to come back online. I try a bunch of stuff which may or may not have been in the following order:
  1. Realize the partition needs to be "ACTIVE" to boot so I rip the drive out of the machine and hook it back up to my laptop to fix that. 
  2. Once that's done it turns out the FIXBOOT for Win2k won't help a WinXP install. (stupid) and now there's a MISSING NTLDR error. 
  3. Operator takes THREE more trips to the other plant before we FINALLY find the Windows XP Dell install CD.
  4. XP Repair console doesn't really like what I did with the Win2K repair console. Out of desperation I try FIXMBR, which it tells me might mess up my partition table, but I didn't read it. 
  5. My partition table gets messed up and the drive is dead. 
At this point it's over. There's nothing I can do at the plant. We pack the computer (big old Dell tower) and monitor, keyboard and mouse into my car and I tell the operator I'll fix it in the hotel tonight and we'll get it set up in the morning. I'm pretty down but determined to get this working.
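In hindsight, the "ACTIVE" flag and the partition table that FIXMBR trampled both live in the first 512-byte sector of the drive, and they're easy to inspect before taking anything on a 400-mile drive. Below is a minimal sketch, assuming a classic MBR layout (partition table at byte 446, four 16-byte entries, 0x55 0xAA signature at the end). The function names and the demo sector are made up for illustration; this is not a tool I actually used.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446   # partition table starts here in a classic MBR
ENTRY_SIZE = 16      # four 16-byte entries follow
ACTIVE_FLAG = 0x80   # status byte value marking the bootable partition


def describe_mbr(sector):
    """Return (signature_ok, entries) for one 512-byte MBR sector."""
    if len(sector) < SECTOR_SIZE:
        raise ValueError("expected a full 512-byte sector")
    signature_ok = bytes(sector[510:512]) == b"\x55\xaa"
    entries = []
    for index in range(4):
        start = TABLE_OFFSET + index * ENTRY_SIZE
        status = sector[start]            # 0x80 = active, 0x00 = inactive
        type_code = sector[start + 4]     # e.g. 0x07 is used by NTFS
        first_lba, sector_count = struct.unpack_from("<II", sector, start + 8)
        entries.append({
            "index": index,
            "active": status == ACTIVE_FLAG,
            "type": type_code,
            "first_lba": first_lba,
            "sectors": sector_count,
        })
    return signature_ok, entries


def build_demo_sector():
    """Synthetic sector for illustration only (not data from the real drive)."""
    sector = bytearray(SECTOR_SIZE)
    sector[TABLE_OFFSET] = ACTIVE_FLAG
    sector[TABLE_OFFSET + 4] = 0x07
    struct.pack_into("<II", sector, TABLE_OFFSET + 8, 63, 1_000_000)
    sector[510:512] = b"\x55\xaa"
    return bytes(sector)


if __name__ == "__main__":
    ok, parts = describe_mbr(build_demo_sector())
    print("boot signature present:", ok)
    for part in parts:
        print(part)
```

To check a real drive you would feed it the first 512 bytes of a raw disk image instead of the demo sector. A restored drive with no entry marked active is one plausible cause of a "no boot device" message.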

Determination and tenacity win the day

I know this is it. If I want to head home tomorrow morning I need to get this done tonight no matter what. Earlier I had realized that I forgot my USB CD Writer at home (STUPID) and I saw the potential of needing to burn an emergency boot disk of some sort. At the very least I didn't want to realize that I need to do that at 11:00pm and be SOL. So I tried a Target first and then a Staples and found an external SuperDrive and some blank CDs and DVDs. Then I grabbed some takeout (Subway) and hunkered down at the hotel.

First order of business was to restore the data again. I repartitioned the drive (set partition 0 to Active) and began the restore. Thankfully there's not a lot of data on the drive and it only takes an hour or so to restore from the external drive. During this time I'm talking to my wife and kids and the question on everyone's mind is, "You are still coming home tomorrow, right?" My answer is that this is still the plan, but in my heart I was not yet at a point where I was making progress.

With the data restored I pop it back into the tower and boot. No luck, just as I thought. The Windows Repair console didn't seem to help much last time so I tried something new. I had downloaded something called the Ultimate Boot CD and installed it on a USB drive. I tried to boot the Dell off the USB and it works. Unfortunately the Ultimate Boot CD is anything but straightforward. I spend about 30 minutes gingerly navigating the tools on the CD (USB) before giving up. Nothing makes sense and the documentation is all out of date.

Back to the Windows XP Repair Console. This time when I start the console it asks me which Windows install I want to repair, this is new, and lists C:\Windows. I'm very excited. It's recognized something new that it didn't last time and I eagerly selected that entry.
Please enter the Administrator password: 
Oh, no. I try all the standard options and nothing works. Every three attempts I have to reboot and launch the repair console again (about 5-10 minutes each time). I've got no idea. We only used the "Operator" user and never the Administrator so nobody has any idea. The repair console won't let me repair without the password. Ugh.

I remember anticipating this issue at the other plant back in January. I had made a CD of the Offline NT Password and Registry Editor, a tool I had found that supposedly allowed you to reset Windows passwords. Well if ever there was a time to test it out... I dig it out of my bag and try to boot off it, but it won't boot. Crap. Try again. Finally it boots. I don't know what the issue was but it does finally boot and the provided instructions for the tool are perfect. Everything goes smoothly, I reset the Admin password and I'm finally able to boot into the Repair Console! Whew!

Of course we're not done yet. I start again with the FIXBOOT command. It appears to work correctly and I reboot the machine. There's no boot device error this time which is GREAT, but instead I get another weird error:
Windows could not start because the following file is missing or corrupt: <Windows Root>\system32\hal.dll
HAL.DLL.
What the hell is that? First link off a Google search sends me to About.com but is promising. It provides a list that I start going through:

  1. Reboot, could be a fluke. Nope didn't work.
  2. Check bios boot order. Yup, it's fine.
  3. Run XP restore command. Looked this up, didn't like the look of it. Skipped it. 
  4. Repair boot.ini. Restored boot.ini from a backup, I'm sure it's fine.
  5. New Boot Sector. This is FIXBOOT. I did this.
  6. Recover bad sectors. New drive hopefully this isn't it.
  7. Restore HAL.DLL from XP install CD. Really? I just restored from a backup, this should be fine.
  8. Repair Install. Risky.
  9. Clean Install. No way...
Not feeling very good about this HAL.DLL thing. I decide to be more thorough. I go back up to boot.ini and start trying to look into it. The Repair console doesn't give you a lot of options and I eventually remember the MORE command to view text files. Running MORE on boot.ini gives me some crap that I don't fully understand. It shows a "path" to the hard drive where Windows is supposed to be installed. Doesn't look quite right. Seems like it SHOULD be saying partition 0, not partition 1. So I do some research into how to edit boot.ini, which leads me to the BOOTCFG command, which has an option to display the currently available OS installations. Sure enough it shows Windows on partition 0. So somehow there was something else on partition 0 of the original hard drive that I didn't back up (or restore), oops.

Some more searching lands me a bootcfg tutorial which helps me add the correct drive to boot.ini. 
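For the curious, that "path" in boot.ini is an ARC path of the form multi(0)disk(0)rdisk(0)partition(N)\WINDOWS, and the partition(N) part is what has to match where Windows actually lives on the restored drive. Here is a small sketch that pulls the numbers out of a boot.ini so the mismatch is easier to spot. The sample file contents and the partition numbers in it are invented for illustration, not copied from the machine, and the function names are my own.

```python
import re

# Matches e.g. multi(0)disk(0)rdisk(0)partition(2)\WINDOWS
ARC_PATTERN = re.compile(
    r'multi\((\d+)\)disk\((\d+)\)rdisk\((\d+)\)partition\((\d+)\)(\\[^=\s"]*)?',
    re.IGNORECASE,
)


def parse_arc_path(text):
    """Return the first ARC path found in text as a dict, or None."""
    match = ARC_PATTERN.search(text)
    if match is None:
        return None
    multi, disk, rdisk, partition = (int(g) for g in match.groups()[:4])
    return {
        "multi": multi,
        "disk": disk,
        "rdisk": rdisk,
        "partition": partition,
        "path": match.group(5) or "",
    }


def partitions_referenced(boot_ini_text):
    """Return the set of partition numbers mentioned anywhere in boot.ini."""
    return {int(m.group(4)) for m in ARC_PATTERN.finditer(boot_ini_text)}


# Hypothetical boot.ini contents, for illustration only.
SAMPLE_BOOT_INI = r"""[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(2)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(2)\WINDOWS="Microsoft Windows XP Professional" /fastdetect
"""

if __name__ == "__main__":
    print(parse_arc_path(SAMPLE_BOOT_INI))
    print(sorted(partitions_referenced(SAMPLE_BOOT_INI)))
```

If the partition number in the file differs from the one the repair console reports for the Windows install, the loader looks in the wrong place, which is consistent with the misleading hal.dll error.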

BOOM!

No more HAL.DLL errors and the machine boots up and seems to be running fine.

PHEW.

I have a new level of loathing for Windows but at least I can say I learned something. I ran some more tests and in the morning we set it back up in the plant and everything is running smoothly. YES! After a quick trip to Staples to return the stuff I bought the previous night I'm on my way home.

Wow. What a day. Anyway everyone was happy and I'm glad I was able to get everything working. I'm still not completely happy with the backup/restore process but it was WAY better than nothing. 

Sunday, June 30, 2013

Debugging a Model A

I got a call today from my father. Turns out that he was having a fuel problem with his new Model A Ford hotrod. The thought was that it was probably out of gas, but he wasn't sure because his fuel gauge didn't work.

He basically built the Model A from scratch. He pulled the rusted-out frame and body from the woods and rebuilt it from the ground up. It's been cool watching him build it up from nothing but I must say, it's incredibly complicated to rebuild a car, even an old one. I sometimes imagine trying to rebuild the van from the ground up and it's mind-boggling. The Model A is a 1930s car so it's not quite as bad. I can see now why Peter D has taken 8+ years to rebuild his '60s-era Beetle.

So back to the task at hand. Putting more gas in the tank did NOT help and we had to assume that the fuel filter was clogged. A quick trip to his house yields a fuel filter from an earlier vintage that should work fine. We swap out the filter and he gets about another mile further along before it dies again.



This time we find that the fuel line is clogged and the filter seems fine (the new filter was in a transparent housing). So a wire up the fuel line and a drain pan were necessary. And, lo and behold, the fuel line was clogged up with ... bugs. That's right, the fuel tank was full of dead ants and metal shavings. Well, not full, but enough to clog the fuel lines a few times. We babied it home and my father spent the rest of the day cleaning it out. Yuck.

I wish I'd gotten a picture of the actual bugs. It brings new meaning to the term debugging.

Saturday, June 29, 2013

More old mill stuff

I spent some more time exploring the mill the other day. Found some more cool stuff I wanted to post.

HELP!

Sir, we've got a leak in our penstock.


Are you feeling lucky, Mr. Bond?

For the record, this thing is very big and very rusty and if it fails there will be a very big problem...

Tuesday, June 18, 2013

How do you know your hydro site is working properly?

Got trained today at the Gilman plant. I use the term loosely because I'm not going to be an operator but I need to start understanding everything that goes on. My partner took me around and walked me through the "mechanical rounds" which the operators make in order to ensure that everything is running properly.

So how does one determine whether their hydro site is running properly? Essentially it comes down to checking about a hundred oil levels and temperature gauges. You see, if you run out of oil (or hydraulic fluid) then shit starts going south... fast. Similarly, a bearing that is going bad will generate more friction, which will cause its temperature to rise. I'm not 100% clear on how many of those temperatures are monitored via the PLC.

One of the reasons that Gilman has a full time staff, unlike all our other locations, is that it is not fully automated. One of my tasks will be to help bring it up to full automation, though I'm concerned that in doing that I may end up putting some people out of jobs...

Update on my desk

So, the "desk" I picked out of the old mill office ended up being a bigger find than I realized. You may have seen the old picture, it just looked like a reasonably good drafting table. Turns out to be a bit more.


So in addition to it being a drafting table, it's also a light table. The internal lights can be adjusted for different brightness levels and the surface is a nice frosted glass. This picture shows that it can (easily) be adjusted from completely horizontal to completely vertical. I'm not exactly sure why yet but I'm sure it will become apparent soon...

Finally, and most spectacularly, is the fact that it has an electric height control. There's a toe switch that allows you to raise it up to about my armpit level and down to my waist or even a bit lower. Very cool. It weighs about 200 lbs and has so far been painful to move around but I'm excited to finally get it into an office. Maybe by July...

Tomorrow I'll try to figure out the brand on this thing. I keep meaning to get that.


Friday, June 14, 2013

A ghost town attached to a hydro site

I mentioned the other day that the Gilman site used to be a paper mill and that the majority of the mill was now unused. Actually, I think my words were that the rest of the mill was dead. There are different flavors of dead at the mill.

There's the upper middle section that housed the paper machines themselves. Those areas are industrial dead. There's minimal lighting and tons of junk lying around. We're trying to sell the machines so we're trying to "preserve" it a bit with as little effort as possible. The outside of that area of the building looks pretty awesome. The paper machines ran so hot that there's tons of HVAC mounted on the outside of the building.


I love it.

The next section of the mill is the nasty middle office area. While not nasty back in its day, these lower level mill offices were used for managing the floor. Unfortunately, the roof failed...

Now it's all black mold and water damage. It's so bad I want to hold my breath while walking through it. I was exploring one day during the rain and it's just terrible. There's water coming in everywhere and a storm drain pipe split on the floor and just pours water in. It's just nasty.

Then we come to the "abandoned" type of dead of the main administrative offices. This area looks reasonably normal if a bit neglected. The really disconcerting part of it is that every calendar still shows 2007. There are a few areas where unopened mail is still sitting there postmarked 2007. We found someone's lunch (yuck) and a few other signs of an unexpected closure.

(Coffee?)

Then there's just all the weird stuff you find lying around. We were scavenging primarily for furniture but we were also just looking for anything that might be useful. The guy who ran the mill when it closed already grabbed a bunch of stuff and I heard that one of our (former) operators had been stealing stuff from the front office as well. So we weren't sure how much we'd find. 

One strange one was a fairly modern router wired into the office network, plugged into an old APC power supply and still on. There was no internet connection hooked in but it was placed near one. It was hacked in so strangely that it had to be reasonably new.

Turns out we had tried to access some information off the old mainframe (VAX) when we first purchased the plant and had hacked in this network. Apparently it didn't work but I'm not sure what the goal was. That router came with us for use elsewhere.

I found a nice new laptop, the only one left probably. I expect any others were stolen. It's about 2" thick and not very sleek. Pretty classic though. That keyboard has some serious travel too...
Damn kids and your trackpads...

Since furniture for PHS's new office was our first priority I was scrutinizing all the nice wooden desks in the administrative offices. They're all basically the same as the one above and not particularly interesting. Back at home I have an old oak drafting table that came out of another paper mill and I was hoping to come across something similar. Turns out I was in luck.


It doesn't particularly look like much here but that drafting table is incredibly solid and in really good shape. I'm excited to get it out of here and into our offices.

Overall there were some interesting and strange things we found while digging through the old mill offices. Mostly, though, it was just kind of sad. Whatever happened, it was very sudden and nobody really gave a crap. There's no sign of a bank coming through to sell off stuff. We just got the whole mill, as-is, when we bought the hydro portion.

Our problem now...

Tuesday, June 11, 2013

An overview of the Gilman site

The Gilman site is one of the hydro sites we're involved in. It's the biggest site we work with and it consists of four turbines of various sizes. It also happens to be attached to an old paper mill that has not run since 2007 (before we purchased the site).

There's an interesting history with the Gilman site, particularly with my family (and others who work with us). When my father finished college he spent a year in Illinois (where I was born) before heading back to Northern NH where he grew up. His first job when he got back was at the Gilman paper mill. At the time it was owned by Georgia Pacific and he was working on a wood boiler generator. We're still trying to get that generator back online but the paper mill is pretty much dead (more on that later).

The hydro site is in good shape though. As I said, we've got four turbines working which put out almost 4 MW when there's good water, which there is right now. What's really interesting though is the fact that the turbines provide a glimpse into the history of hydro.
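A quick note on units, since they're easy to mix up: megawatts (MW) describe how much power the turbines are putting out at a given moment, while megawatt-hours (MWh) describe the energy delivered over time. As a rough sketch, assuming the site held a steady 4 MW for a full day (a made-up scenario, not a measured figure):

```python
def energy_mwh(power_mw, hours):
    """Energy in megawatt-hours for a constant power output in megawatts."""
    return power_mw * hours


if __name__ == "__main__":
    # Hypothetical: a steady 4 MW held for 24 hours.
    print(energy_mwh(4.0, 24), "MWh")
```

In practice output rises and falls with the water, so a real daily total would come from metered data rather than a single multiplication.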

We have two turbines (called #3 and #4) which were installed in the 1930s. They've been upgraded, but there's still a great deal of original stuff on them. For example, the actual generators attached to them are original and still working. They look nothing like modern generators (which look like big electric motors) but there's no reason to remove them.

(1930s Generator)

And then there's this crazy contraption, which (I believe) controls the wicket gates for the turbine. We're replacing it with a single hydraulic cylinder, but in the old days it did something a little more complicated...
The WHOLE thing is being replaced by a single hydraulic cylinder...

These first two turbines are of a style known as a double camel-back. As far as I know, it's not a style used anymore. 

The next turbine was built in the 1960s and is known as the #2 Turbine. I believe it's a vertical Francis turbine and its generator is significantly smaller and more efficient than #3 or #4. I need to get a picture of #2. It looks completely different and is in a very different section of the power house.

The #1 Turbine is our newest (installed in the 1980s) and our most powerful. On a good day it will do 2.5 MW. It is a horizontal Kaplan, set up very much like the one in the linked image.


This is a picture of the generator and gear box for the #1 Turbine. It's big, don't get me wrong, but considering it's generating 3x the power, it's still pretty compact. 

Overall it's a good site. We've got a pretty sweet trash-rack setup that lets an operator clean all the racks then plow the crap off the dam. It's mostly sticks and stuff and is your standard river flotsam for the most part. We'll pull out trash when we find it.

The site also has these slick inflatable bags along the top of the dam. In good weather this allows us to increase our head by some amount (not sure how much... maybe a foot or two). More importantly though is that in heavy water, we can deflate them to allow more water over the dam to avoid flooding the power house (which seems to happen once a year anyway). It's incredible how strong water is. 







Saturday, June 8, 2013

Ancient technology


Debugging a dial up modem today. Needed to grab my debugger (headphones) before I was able to make any real progress. I'm unsure if the computer's date and time are causing problems but they certainly made me confused when I was reading the log file.
So after debugging and digging through the weird log files, apparently the remote computer is sending a "NO CARRIER" message. I'll probably need to dig into what that means on-site. I wonder if a USB modem and Windows 7 will work on my MacBook? Good times.
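"NO CARRIER" is one of the standard Hayes-style modem result codes; it generally means the modem never detected, or lost, the carrier signal from the other end. Here is a minimal sketch for picking result codes out of a log. The log lines and their format are invented for illustration (I don't know what the real log looks like), and the descriptions are my own short summaries.

```python
import re

# Common Hayes-style result codes with informal descriptions.
RESULT_CODES = {
    "NO CARRIER": "carrier signal never detected, or lost",
    "NO DIALTONE": "no dial tone on the line",
    "NO ANSWER": "remote end did not answer",
    "CONNECT": "connection established",
    "BUSY": "busy signal detected",
    "ERROR": "command not recognized or failed",
    "RING": "incoming call detected",
    "OK": "command accepted",
}

# Check longer codes first so multi-word codes are not shadowed by short ones.
_CODES_BY_LENGTH = sorted(RESULT_CODES, key=len, reverse=True)


def find_result_codes(log_lines):
    """Return (line_number, code) for each line containing a result code."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        upper = line.upper()
        for code in _CODES_BY_LENGTH:
            if re.search(r"\b" + re.escape(code) + r"\b", upper):
                hits.append((number, code))
                break  # report at most one code per line
    return hits


if __name__ == "__main__":
    # Hypothetical log excerpt, for illustration only.
    sample = ["ATDT5551234", "NO CARRIER"]
    for number, code in find_result_codes(sample):
        print(number, code, "-", RESULT_CODES[code])
```

Knowing whether NO CARRIER shows up right after dialing or in the middle of a session would help narrow down whether the line, the remote modem, or the answering software is at fault.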