Thursday, March 20, 2014

Death of a Hard Drive

Diagnosis

You just never know what you're going to get hit with from one day to another. Late last week I got a call from our operator at the Five-Mile Pond Dam. He said that there was an odd alarm on the SCADA computer that wasn't going away when you acknowledge it. It was a strange alarm, but it was enough to point me in a direction:
MAC_DL Lost DB connection to SQL Server
There was also one for MAC_PTDL but since that part of the error didn't mean anything to me I figured it was probably secondary to the primary issue of the lost connection to the SQL Server. I don't pretend to be an SQL guy and I had nothing to do with how the computer was set up seven years ago. I do know that the SQL Service was running and it claimed that the database was online (this turned out to be too general to be useful).

I started to dig through the admin tools for the SCADA software (Cimplicity) and was on the verge of contacting the company that developed the SCADA for us. Before I did that I was trying to check everything I could find and stumbled into the Trends section. Trends are stored to the database but everything looked fine until I tried to close out of the Trends admin area. As soon as I tried to do that it told me that the database it was trying to use ("FiveMile") no longer existed. The list of available databases contained nothing that looked like it would be used for trends and definitely didn't contain one called "FiveMile."

Database Woes

As I said, I'm not a database guy. I am generally pretty savvy though and found my way into the SQL Database Admin tools pretty easily. At this point I found a list of available databases. This list included one called "FiveMile" but it was grayed out and next to it was the label "suspect".
FiveMile (suspect)
What does that even mean? I'm sure there are people out there who really know, but from what I found it basically means that something happened (computer crashed?) and when it started back up SQL tried to restart the database and found a problem that it couldn't repair. Thus it suspects there's something wrong.

I spent a great deal of time learning about suspect databases and emergency mode and running DBCC CHECKDB and all sorts of related errors and status codes. It was rough. Once I got the DB back up, CHECKDB didn't want to run on this ancient XP machine with a 5GB database. The errors in the SCADA had changed but weren't quite gone. I thought I had made progress but couldn't be sure and was planning to do some more DB research when things changed...

Hardware Failure

I had assumed that the DB issues were stemming from the hard drive starting to fail. That wasn't quite right though, looks like the DB issues were the end of the failure, not the start. I had placed an order for a new hard drive the first day I was exploring the issues and I'm glad I did because the next day the computer died. Locked up on boot and further boot attempts resulted in "No boot device found" sort of errors. The site is 400 miles away so I couldn't verify but since my remote access was gone I was pretty sure that was it. 

These machines are old, not very good to begin with (Dell 830 tower) and in very dirty environments. I'm surprised it lasted 7 years straight. I had anticipated this though. Back in January when I was doing the network upgrades I decided to make backups of the computers (FiveMile and Quinabaug) just in case. It wasn't part of the initial plan and I didn't run it past upper management, I just did it.

I bought a USB to SATA (bare drive) adapter and wanted a rugged external drive to save it all on. After a TON of research I ended up with the Silicon Power Rugged Armor A80 external 1TB drive. I'm not yet at the point where I would say it's the best drive ever, but it's supposedly waterproof and dustproof. It has a good warranty and it includes a very short USB3 cable that slots into the side of it, which is VERY handy. I've never used any other cable with it.

For the actual backups I went with a program called DriveImageXML from Runtime Software. I think I went with it because it was free and sounded like it did a reasonably good job. 

So, back in January I anticipated a hardware failure and made a backup of the SCADA software at Five-Mile Pond Dam. Now, in March, the drive failed and it's time to test my backups...

A Simple Drive Restore

Everybody knows that you NEVER test your backups until you need them. I am, of course, being sarcastic, but we're not in a position to have a bunch of extra hardware lying around to actually perform those sorts of tests on. So I made backups, took them away with me and hoped that we'd never need them. Yet here we are, needing them.

My HOPE was that I could restore onto the new drive here at my house, then drive the 400 miles out to the dam and install it, then drive home the next morning. I had purchased a data-center quality drive that was supposedly designed for long-term reliability, which should arrive Monday. I'd drive out Tuesday and come back Wednesday. 12 hours of driving in two days isn't my idea of fun but without that computer there was no remote access and the operator would have to be onsite a lot more. So let's do it.

The restore went smoothly on Monday afternoon and it looked like the drive was all ready to just be plugged in and it would work. I haven't done many Windows restores so I assumed it would go smoothly and struck out Tuesday morning with every expectation of this being a cake walk. I'd install the drive in 30 minutes then be off to dinner and the hotel where I would monitor it remotely during the evening, meet the operator briefly in the morning and head out. Oh the best laid plans...

Murphy is my project manager

I arrived at the site around 4:30pm after some delays and cracked open the PC. A dead drive doesn't really look any different from a good one, but seven years of industrial dust gives a drive a strange texture:

I suppose it could have been worse after seven years. Anyway, I easily swapped in the new drive and buttoned it up. Turned it on and... .... .... No love. No boot device present. Immediately I'm like, "it must be the Master Boot Record, it probably didn't get restored." I go to look up how to fix it and it requires the Windows install disk. Yeah... no idea where that is.

The operator thinks it's at the other plant so we run down there and manage to find a Windows 2000 Dell install disk. What's the OS on the machine at the other plant? I can't remember, might be 2000... (rookie error). So we grab the disk and head back. I boot into the repair console and run FIXBOOT which is the typical first step. Still no luck.

Things get a little hazy here as I started to get nervous. My tech cred was rapidly disappearing as the operator patiently waits for his computer to come back online. I try a bunch of stuff which may or may not have been in the following order:
  1. Realize the partition needs to be "ACTIVE" to boot so I rip the drive out of the machine and hook it back up to my laptop to fix that. 
  2. Once that's done it turns out the FIXBOOT for Win2k won't help a WinXP install. (stupid) and now there's a MISSING NTLDR error. 
  3. Operator takes THREE more trips to the other plant before we FINALLY find the Windows XP Dell install CD.
  4. XP Repair console doesn't really like what I did with the Win2K repair console. Out of desperation I try FIXMBR, which it tells me might mess up my partition table, but I didn't read it. 
  5. My partition table gets messed up and the drive is dead. 
At this point it's over. There's nothing I can do at the plant. We pack the computer (big old dell tower) and monitor, keyboard and mouse into my car and I tell the operator I'll fix it in the hotel tonight and we'll get it set up in the morning. I'm pretty down but determined to get this working.

Determination and tenacity win the day

I know this is it. If I want to head home tomorrow morning I need to get this done tonight no matter what. Earlier I had realized that I forgot my USB CD Writer at home (STUPID) and I saw the potential of needing to burn an emergency boot disk of some sort. At the very least I didn't want to realize that I need to do that at 11:00pm and be SOL. So I tried a Target first and then a Staples and found an external SuperDrive and some blank CDs and DVDs. Then I grabbed some takeout (Subway) and hunkered down at the hotel.

First order of business was to restore the data again. I repartitioned the drive (set partition 0 to Active) and began the restore. Thankfully there's not a lot of data on the drive and it only takes an hour or so to restore from the external drive. During this time I'm talking to my wife and kids and the question on everyone's mind is, "You are still coming home tomorrow, right?" My answer is that this is still the plan, but in my heart I was not yet at a point where I was making progress.

With the data restored I pop it back into the tower and boot. No luck, just as I thought. The Windows Repair console didn't seem to help much last time so I tried something new. I had downloaded something called the Ultimate Boot CD and installed it on a USB drive. I tried to boot the Dell off the USB and it works. Unfortunately the Ultimate Boot CD is anything but straightforward. I spend about 30 minutes gingerly navigating the tools on the CD (USB) before giving up. Nothing makes sense and the documentation is all out of date.

Back to the Windows XP Repair Console. This time when I start the console it asks me which Windows install I want to repair, this is new, and lists C:\Windows. I'm very excited. It's recognized something new that it didn't last time and I eagerly selected that entry.
Please enter the Administrator password: 
Oh, no. I try all the standard options and nothing works. Every three attempts I have to reboot and launch the repair console again (about 5-10 minutes each time). I've got no idea. We only used the "Operator" user and never the Administrator so nobody has any idea. The repair console won't let me repair without the password. Ugh.

I remember anticipating this issue at the other plant back in January. I had made a CD of the Offline NT Password and Registry Editor, a tool I had found that supposedly allowed you to reset Windows passwords. Well if ever there was a time to test it out... I dig it out of my bag and try to boot off it, but it won't boot. Crap. Try again. Finally it boots. I don't know what the issue was but it does finally boot and the provided instructions for the tool are perfect. Everything goes smoothly, I reset the Admin password and I'm finally able to boot into the Repair Console! Whew!

Of course we're not done yet. I start again with the FIXBOOT command. It appears to work correctly and I reboot the machine. There's no boot device error this time which is GREAT, but instead I get another weird error:
Windows could not start because the following file is missing or corrupt: <Windows Root>\system32\hal.dll
HAL.DLL.
What the hell is that? First link off a Google search sends me to About.com but is promising. It provides a list that I start going through:

  1. Reboot, could be a fluke. Nope didn't work.
  2. Check bios boot order. Yup, it's fine.
  3. Run XP restore command. Looked this up, didn't like the look of it. Skipped it. 
  4. Repair boot.ini. Restored boot.ini from a backup, I'm sure it's fine.
  5. New Boot Sector. This is FIXBOOT. I did this.
  6. Recover bad sectors. New drive hopefully this isn't it.
  7. Restore HAL.DLL from XP install CD. Really? I just restored from a backup, this should be fine.
  8. Repair Install. Risky.
  9. Clean Install. No way...
Not feeling very good about this HAL.DLL thing. I decide to be more thorough. I go back up to boot.ini and start trying to look into it. The Repair console doesn't give you a lot of options and I eventually remember the MORE command to view text files. Running MORE on boot.ini gives me some crap that I don't fully understand. It shows a "path" to the hard drive where windows is supposed to be installed. Doesn't look quite right. Seems like it SHOULD be saying partition 0, not Partition 1. So I do some research into how to edit boot.ini which leads me to the BOOTCFG command which has an option to display the currently available OS installations. Sure enough it shows Windows on Partition 0. So somehow there was something else on Partition 0 of the original hard drive that I didn't back up (or restore), oops.
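For anyone who hits the same thing, here is a rough Python sketch of what that boot.ini "path" looks like and how you could check it. The file contents below are made up for illustration (I'm not reproducing the real file from this machine), and the function names are my own. The one thing worth knowing is that in these paths rdisk() counts from 0 while partition() counts from 1, so the numbers can look off by one compared to what other tools show.

```python
import re

# Hypothetical boot.ini contents, NOT copied from the real machine.
SAMPLE_BOOT_INI = r"""[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(2)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(2)\WINDOWS="Microsoft Windows XP Professional" /fastdetect
"""

# Matches the ARC-style path used in boot.ini, e.g.
# multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
ARC_PATH = re.compile(
    r"multi\((\d+)\)disk\((\d+)\)rdisk\((\d+)\)partition\((\d+)\)(\\[^=\s]*)"
)


def boot_entries(text):
    """Return (rdisk, partition, windows_dir) for each ARC path in the text."""
    entries = []
    for line in text.splitlines():
        match = ARC_PATH.search(line)
        if match:
            _multi, _disk, rdisk, partition, windows_dir = match.groups()
            entries.append((int(rdisk), int(partition), windows_dir))
    return entries


def mismatched(text, actual_partition):
    """List entries that point at a different partition than the real one."""
    return [e for e in boot_entries(text) if e[1] != actual_partition]


if __name__ == "__main__":
    # Assume (for illustration) Windows really lives on the first partition.
    for rdisk, partition, windows_dir in mismatched(SAMPLE_BOOT_INI, 1):
        print(f"boot.ini points at rdisk {rdisk}, partition {partition}, {windows_dir}")
```

With the sample above, both the default line and the operating systems line get flagged, which is the same kind of mismatch that leaves the loader looking for hal.dll in the wrong place.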

Some more searching lands me a bootcfg tutorial which helps me add the correct drive to boot.ini. 

BOOM!

No more HAL.DLL errors and the machine boots up and seems to be running fine.

FWEW. 

I have a new level of loathing for Windows but at least I can say I learned something. I ran some more tests and in the morning we set it back up in the plant and everything is running smoothly. YES! After a quick trip to Staples to return the stuff I bought the previous night I'm on my way home.

Wow. What a day. Anyway everyone was happy and I'm glad I was able to get everything working. I'm still not completely happy with the backup/restore process but it was WAY better than nothing. 

Thursday, February 13, 2014

Network Security at a Hydro Plant pt. 1

The small hydro industry is pretty old fashioned. Many of the people working in it come from old industrial backgrounds or have been doing hydro for a long time (or both). I mean absolutely no offense by this, it's just that you don't really fall into small hydro by accident. It's not an entry-level job out of college and most people have never even considered that a private individual could own a hydro power plant.

Through no fault of their own this typically means that from a computer/technical side the small hydro industry is pretty far behind. If you go to the hydro conferences (which I did) you don't see anything about network security, remote access, or even the HMI/SCADA systems that almost every plant has. Frankly I was surprised to see almost nothing regarding PLCs and automation at the hydro conference. There were vendors who would give you "water to wire" which it appeared would also include a PLC and HMI/SCADA but to call it secondary would make it seem more prominent than it was.

So we've been acquiring new plants which are badly in need of some TLC and many of our old dial-up remote access systems are now failing. This has led us to look at how we can put our plants online (the internet) without needlessly exposing them to hacking threats. If we're online there's always the possibility of hacking so we want to get the best security possible. The challenge was that we're securing multiple sites (11 at the time of writing this) and we're still small business size so we can't invest a ton of money in each site. Think about it, if we had only one location to secure with all our infrastructure it would be pretty straightforward, but we have no infrastructure and 11 separate sites to secure.

The Cyberoam CR-15wING Unified Threat Management appliance

Cyberoam CR-15wING

After some shopping around and experimentation we settled on the Cyberoam CR-15wING Unified Threat Management appliance. Cyberoam is a smaller player in the network security world (than CISCO) and they're currently big in Europe and Asia and still breaking into the US market. After my frustration with CISCO's ISA-550W and its subsequent EOL (we can talk about that later) I set out to find a solid network security appliance with reasonable cost and support contract.

Enter Cyberoam. On a whim I contacted them about our needs and how they could fit in and got a call right away with an offer to have an evaluation unit sent to me. After checking their prices (which were very reasonable) I agreed. I need to confess to not being a network guy so I don't want to pretend to be an expert, but I was quite overwhelmed by the CR-15. I can only compare it to the CISCO ISA-550W which was a small business network security appliance priced similarly. The Cyberoam interface was significantly faster and more stable than the CISCO. It appeared to be much more powerful as well. It was certainly more complicated for someone like me to deal with. Had Cyberoam technical support not spent three hours on the phone with me walking me through the setup I would have given up and sent it back. They were very patient and in the end I came to understand how much more secure the new SSL VPN was vs the CISCO IPSEC VPN.

I'm also beginning to understand more about how to segregate our control systems from the regular intranet at our sites. By restricting access to the control systems we can further prevent intrusion. I'm still working on understanding everything about the Cyberoam device but for our small hydro plants I believe we have a solid platform with a lot of room to grow. 

Look for part 2 with some more information about how we are securing our sites. 

Wednesday, February 12, 2014

Utility Belt Prototype

So, aside from hydro, I've been working on designing my own bags and utility belts. Back in July I finished my first prototype, but never blogged it.

The utility belt I designed and created in July was created specifically to allow me to carry a few specific items around with me easily while working:

  • Phone (sized for the iPhone 5 carried horizontally)
  • Flashlight (AAA size)
  • Knife (specifically the Leatherman CS4)
  • Pen (of some sort)
  • Pack of 3x5 index cards for notes
  • Something else (undesignated pouch)
There were a bunch of other items I had that I was playing around with but those were the most important. You can see some of my design ideas here.
The second from the top was chosen to be the first design I made.
I also spent a lot of time trying to decide on and then acquire the appropriate materials. I settled on waxed cotton canvas and wool as my main two materials. Waxed cotton ended up being very difficult to find in small quantities. Eventually I found someone on Etsy who was willing to sell me two yards of grey waxed cotton, which I'm not even close to being done with.

So it took me a good deal of time and experimentation, but back in July I produced my first prototype. I've been wearing it nearly every day since and I'm happy to say that my design was sound. My technique has continued to improve but ultimately it was a wild success as a prototype.
Prototype utility belt after six months of use. 
You can see the finished product here after I've been wearing it for about six months. I wasn't very good about finishing the edges so there's some fraying but all the seams are currently still solid.

So the design I came up with started with the belt. The belt is made from a backing of 1/16" industrial wool felt with 1.5" seatbelt nylon as a middle layer. A 1" nylon top layer holds the Cobra brand 1.5" buckle. These buckles are really sweet but quite expensive. They're technically overkill but I was set on using them and they're great. This belt is solid and doesn't slide around because of the extra friction offered by the wool bottom layer. It is somewhat adjustable (~4") allowing me to wear it over jeans or over insulated coveralls.

The pouches here are the same ones shown in the drawing and worked out quite well. At the time of construction I added another small pouch for soap stone (used to draw on metal). I used waxed canvas for just about everything but added some grippy black material I found at Ragged Mountain onto the pockets and a contrasting yellow to the inside. I decided to use strong magnetic clasps in the smallest size I could find. I got them from ... and they ended up being perfect. I didn't do a good job of reinforcing the flaps so I added extra stitching later which looks strange, but it works. The trick with the pouches was to use the industrial wool felt as the backing for them. The belt goes through the pouch between the felt and the canvas so the felt of the belt is against the felt of the pouch providing excellent friction. This keeps the pouches from sliding around your body in general use without making them hard to move if you want to move them. It was an idea I had that ended up working perfectly.

One idea that didn't really work well was the universal pouch that I added. The idea was to have a pouch that could hold a variety of things but ultimately it couldn't. I even added an extra long flap with a second magnetic snap so you could fasten it looser or tighter. It was a good idea, but the pocket itself was not well constructed. I didn't know how to make good pockets and things liked to fall out of it. Additionally it was just the wrong size. Nothing really fit in it well other than a granola bar. Oh well.

Ultimately I learned a lot and I'll post some of the other bags I've made since.

Prototype in progress.



When your dam is empty on the inside it's a ...

Hollow Dam.

Hollow is our latest project. It's another old Algonquin site out near Gouverneur, NY. A group of us took a nice drive out there this week to look at what it would take to give the site a control upgrade much like we did at Burt Dam. The weather held out pretty good for us and we got a lot done.

You can see the "open" turbine at the left. 
The site has some unique turbines. They are vertical cylinders that contain the turbine and the generator and act as a gate. The entire cylinder raises up by about three feet which opens the "gate" allowing water to flow into the cylinder and through the turbine. The turbines themselves have variable pitch blades allowing the operator to regulate the water flow and power generation to some degree. Unlike most of our plants, there's no real powerhouse around the turbines, which makes winter repairs rather annoying. Beyond this, though, they seem pretty reliable and produce about 1 MW (1,000 kW) at peak. The two machines are identical and are rated at 530 kW each but tend to max out around 500 which is still good.

This is the rig around the two turbines to allow for maintenance.
Because of how the turbines are configured, the control room is an entirely separate building down below the dam. It's a bit cramped but nice and quiet, unlike most of our plants where the controls are in the same building as the turbine. The control panel appears to be similar to a lot of the plants I've seen recently. It was upgraded in the 80's and from there the bare minimum was done to keep it running.

The PLC is an old Toshiba model (same as Burt) but is not functional at all. Instead of spending the money to get it working it was bypassed as much as possible and is now run only on manual control. For Hollow this is okay (not great) because the operator lives on the premises. The previous owners wouldn't have gotten away with that any other way.

Kinda cramped, but quiet and warm. 
Richard and I went through the panels and started looking at the electrical drawings. The panels are pretty cramped too, having been designed to be as compact as possible. You can see the two control panels and their corresponding breaker cabinets in the picture. Out of frame is another control panel primarily containing protective relays along with a few non-functioning switches and the PLC cabinet which is near the floor.

Unfortunately there isn't any good space on the panel for one of our touch screens. At this point I'm thinking that we'll probably build a mount for the screen and put it on the desk which is at the left of the picture. I really like the Beijer panel we used at Burt and the operator likes it too. The operator at Hollow is less technical than the one at Burt and I don't think putting a full PC there makes sense. We'll probably use a Wi-Fi Android tablet or iPad for him to keep an eye on things from his house. That will also give him access to the reporting tools we're setting up.

Overall it looks like a good little project. I'm not exactly looking forward to two weeks in Gouverneur (or maybe Watertown), but otherwise it looks like fun.

... so why is it called "Hollow Dam"? Well... we found/saw the remnants of the original dam...
Kinda looks like it was empty on the inside... 

Thursday, January 9, 2014

Solar at a Hydro Site

It's almost funny. After more than 10 years of my suggesting that we install solar at one of our hydro plants, we finally did it.

Renewable energy is an interesting topic. Most people end up being only partially informed about the different ways that renewable energy can be produced. Often this manifests itself as a negative attitude towards hydro ("damming up our rivers!!!!") and a positive one towards wind and solar. Being in the hydro business I have always found myself to be interested in all types of renewable energy though hydro is the best. :)

What I have REALLY been interested in is the combination of the different types. A hydro site has what we call "parasitic load" which is power that the plant needs to use in order to stay running: lights, computers, pumps, etc. In general, our contracts with the utilities require us to buy the power we use for the parasitic load BACK from the power companies rather than use some of the power we generate. We are required to sell every bit of what we generate. This is a perfect opportunity for secondary power generation. I suggested the use of solar or a small wind turbine to offset the parasitic load many many years ago. It just seemed like it would be awesome to have a renewable energy site that was fully renewable. (hmmm... that could be a bigger topic for another post)

And so...


As of January 8th, 2014, we now have a solar farm in Northern NH at the first hydro site we ever built. You can see by the picture that this is a pretty large set of solar panels. I mean, it's not huge, but you're not likely going to put it into your back yard.

This solar site is rated at 45 kW and runs at about 14% efficiency. Our expectation is that it will offset roughly 57,000 kWh of parasitic load. But what does that mean?

So at peak output, the solar panels produce 45 kW, which works out to 45 kWh of energy every hour. That would be about 394,000 kWh per year IF they could operate at peak output 24 hrs/day (45 kW x 8,760 hrs/year). Since it's solar and the sun goes down each day that's not possible. So you get an efficiency rating (the industry term for this is capacity factor). Solar can realistically generate about 14% of its potential power. So 394,000 x 14% = 55,000 kWh.

But what does THAT mean? So, an average household in 2011 used between 9,000 and 11,000 kWh of power in a year. So a solar site the size of the one we just put in (note the tire tracks for scale) would power FIVE or SIX houses for a year. Not very many.
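If you want to check the math yourself, here is a minimal Python sketch of the calculation. It assumes 8,760 hours in a year (24 x 365) and treats the 14% figure as the fraction of full-time peak output the panels actually deliver. The function name is just something I made up for illustration, and the household range is the 9,000 to 11,000 kWh quoted above.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_energy_kwh(rated_kw, capacity_factor):
    """Energy in kWh per year from a rated power (kW) and a 0-1 capacity factor."""
    return rated_kw * HOURS_PER_YEAR * capacity_factor


solar_kwh = annual_energy_kwh(45, 0.14)

# Household usage range quoted for 2011, in kWh per year.
homes_low = solar_kwh / 11_000
homes_high = solar_kwh / 9_000

print(f"Solar output: {solar_kwh:,.0f} kWh per year")
print(f"Houses powered: {homes_low:.1f} to {homes_high:.1f}")
```

That lands at a bit over 55,000 kWh a year and somewhere between five and six houses, depending on which end of the household range you pick.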

On top of that, 14% doesn't sound very good does it? That's part of the problem with solar and why it's so dang hard to really get solar onto the grid. The panels just take up so much space for so little power...

So how does that compare with hydro?

Hydro tends to be about 60% efficient. Hydro can run 24/7 but the efficiency takes into account water flow variation for dry seasons. The site that this is installed at is our smallest site which is on the very small side of small hydro. This site generally produces around 3,000,000 kWh per year. We call it a 600 kW site. That tiny hydro site will provide power for roughly 300 families.
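The hydro numbers can be sanity-checked by running the arithmetic the other way: start from the yearly production and the rated size and see what efficiency falls out. This is only a sketch. It assumes 8,760 hours per year and 10,000 kWh per household (the middle of the range used for the solar comparison), and the function name is mine.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years
KWH_PER_HOUSEHOLD = 10_000  # assumed midpoint of the 9,000-11,000 kWh range


def capacity_factor(annual_kwh, rated_kw):
    """Fraction of full-time rated output that a site actually delivers."""
    return annual_kwh / (rated_kw * HOURS_PER_YEAR)


hydro_annual_kwh = 3_000_000
hydro_rated_kw = 600

hydro_cf = capacity_factor(hydro_annual_kwh, hydro_rated_kw)
hydro_homes = hydro_annual_kwh / KWH_PER_HOUSEHOLD

print(f"Implied hydro efficiency: {hydro_cf:.0%}")
print(f"Households powered: {hydro_homes:.0f}")
```

It comes out to about 57%, which is close enough to the "about 60%" rule of thumb, and 300 households.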

Three hundred.

That's why we put so much effort into getting these small plants up and running. Hydro has such an excellent efficiency (compared to other renewables) that even a small plant can make a difference. For the record, wind power tends to have about a 20% efficiency.

Despite all that, I love the idea of a Hydro plant using solar to offset its parasitic load. It's brilliant. If more businesses did that it could really make a difference. That's where it needs to happen: businesses or perhaps neighborhoods. Small solar farms providing power to a neighborhood or business are awesome. Instead of one big farm we could have a bunch of small ones all over the place. Solar has "easy" going for it. No moving parts and no complicated systems. Just set it up and go. They're bigger than you think, we just need to think bigger.



A Long Hiatus

I need to apologize to anyone who had been reading this blog. I had been (by my standards) on quite a roll up until sometime in July. I don't have much of an excuse besides to say that things got a bit crazy. As part of the new year, I'm hoping to do a much better job keeping things updated. In an effort to document things in chronological order, I will be back-dating any posts about topics that took place in the past. Some posts won't require this kind of treatment, but others will. I'm not sure how the RSS feed works, but I would recommend checking the blog archive on the right to see if posts start appearing for the latter half of 2013.

Thank you to anyone still faithful enough to keep an eye on this blog.