Barr Code

Friday, June 06, 2008

RTOS Myth #4: The RTOS is in Charge

The Myth: The operating system is in charge and it decides when to switch from one application task to another.

The Truth: A real-time operating system (RTOS) is a very different beast than a multi-user desktop operating system, such as Linux. In fact, an RTOS is simply a library of functions plus a timer tick interrupt handler.

The only opportunities for an RTOS to effect a context switch from one task to another are:

1. If a running task deletes itself (or exits, if your OS allows that). In this case, a function in the RTOS library detects the lack of a running task and can directly invoke the scheduler function to select the next task to run.

2. When a running task blocks, which can only happen by making a function call into the RTOS library.

3. If a running task creates a new task with a priority higher than its own.

4. When a previously blocked task of higher priority is unblocked, which could happen as a result of:

a. The running task made a function call into the RTOS library (e.g., semaphore post).

b. An interrupt service routine executed with that side effect.

These four (or five, if you prefer) points of entry into the RTOS are the only mechanisms by which control of the CPU can be transferred from one task to another. They are called "scheduling points".

The implication here is that your application code is actually in charge. If it were to avoid calling into the RTOS library while simultaneously disabling interrupts, the running task could steal control of the CPU for any length of time.
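As a rough illustration of the scheduling points listed above, here is a toy, host-side model in C. Every name in it (toy_task_t, toy_schedule(), toy_unblock(), and so on) is hypothetical and does not correspond to any real RTOS API, and a real kernel would also save and restore CPU context. The sketch only tries to show that the scheduler is an ordinary function that runs when a task or ISR calls into the library, and at no other time.

```c
#include <assert.h>

/* Toy model only: all names are hypothetical, not a real RTOS API. */
enum toy_state { TOY_READY, TOY_BLOCKED, TOY_DELETED };

typedef struct {
    int priority;            /* larger number = higher priority (assumption) */
    enum toy_state state;
} toy_task_t;

#define TOY_NUM_TASKS 3
static toy_task_t toy_tasks[TOY_NUM_TASKS];
static int toy_running = -1;     /* index of the running task, -1 if none */
static int toy_sched_calls = 0;  /* how many times the scheduler has run */

/* The scheduler is an ordinary function: it runs only when called. */
static void toy_schedule(void)
{
    int best = -1;
    toy_sched_calls++;
    for (int i = 0; i < TOY_NUM_TASKS; i++) {
        if (toy_tasks[i].state == TOY_READY &&
            (best < 0 || toy_tasks[i].priority > toy_tasks[best].priority)) {
            best = i;
        }
    }
    toy_running = best;
}

/* Scheduling point 1: the running task deletes itself. */
void toy_delete_self(void)
{
    toy_tasks[toy_running].state = TOY_DELETED;
    toy_schedule();
}

/* Scheduling point 2: the running task blocks via a library call. */
void toy_block_self(void)
{
    toy_tasks[toy_running].state = TOY_BLOCKED;
    toy_schedule();
}

/* Scheduling point 4: a task (4a) or an ISR (4b) unblocks another task. */
void toy_unblock(int task)
{
    toy_tasks[task].state = TOY_READY;
    toy_schedule();
}

void toy_demo(void)
{
    /* Task 0 is low priority, task 1 medium, task 2 high but blocked. */
    toy_tasks[0] = (toy_task_t){ 1, TOY_READY };
    toy_tasks[1] = (toy_task_t){ 2, TOY_READY };
    toy_tasks[2] = (toy_task_t){ 3, TOY_BLOCKED };
    toy_schedule();
    assert(toy_running == 1);

    /* Application code that never calls the library never causes a switch. */
    int before = toy_sched_calls;
    for (volatile int i = 0; i < 1000; i++) { }
    assert(toy_sched_calls == before && toy_running == 1);

    toy_unblock(2);    assert(toy_running == 2);  /* e.g., a semaphore post */
    toy_block_self();  assert(toy_running == 1);
    toy_delete_self(); assert(toy_running == 0);
}
```

Calling toy_demo() from a test harness walks through three of the scheduling points. The busy loop in the middle stands in for application code that holds the CPU simply by not calling the library.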


Wednesday, March 19, 2008

Toward a Better Mutex API

A few months ago I blogged that mutexes and semaphores are distinct RTOS primitives. Unfortunately, the APIs of today's most popular commercial RTOSes only add to the confusion for application programmers.

For example, consider the VxWorks API, which not only forces mutexes and semaphores into an inappropriately common-looking API (semMXxx vs. semCXxx) but also adds a third "binary semaphore" type (semBxxx). Micrium's popular uC/OS-II API is preferable to this in that it at least has just two primitive types (OSMutexXxx and OSSemXxx). But the uC/OS-II API also forces programmers to use similar function name suffixes--Post() and Pend()--with each.

I propose a new and clearer API such as the following, which is based loosely on uC/OS-II's current API:


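/* Semaphores: signaling, created with an initial count */
int OSSemCreate(SEM * phSem, int cnt);
int OSSemPost(SEM hSem);
int OSSemPend(SEM hSem, int timeout);

/* Mutexes: mutual exclusion, always created available */
int OSMutexCreate(MUTEX * phMutex);
int OSMutexGet(MUTEX hMutex);
int OSMutexPut(MUTEX hMutex);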


Wind River would do well to eliminate semBxxx functions and rename the semCXxx functions as semXxx. In addition, the VxWorks API for mutexes should be changed to something like mutexXxx.

Note, too, that this new API is intended to force each mutex object to be created in the 'available' state (i.e., value = 1), as it already is in uC/OS-II. An additional feature of the mutex API should be that any OSMutexPut() call by a task that does not currently own that mutex should fail with an appropriate error code. Together these easily used mutex functions ensure correct usage of mutexes by application developers.
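To make the ownership rule concrete, here is a minimal host-side sketch of the three proposed mutex functions. Only the function names and parameter lists come from the proposal above. The struct layout, the error codes, the g_current_task variable, and the non-blocking behavior of OSMutexGet() are assumptions made for illustration; a real RTOS would block the caller instead of returning a "busy" code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error codes; the proposal only says "an appropriate error code". */
enum { OS_OK = 0, OS_ERR_PARAM = -1, OS_ERR_NO_MEM = -2,
       OS_ERR_BUSY = -3, OS_ERR_NOT_OWNER = -4 };

struct mutex_obj {
    int available;   /* 1 = free, 0 = owned */
    int owner;       /* task ID of the owner, -1 when free */
};
typedef struct mutex_obj *MUTEX;

#define MAX_MUTEXES 4
static struct mutex_obj g_pool[MAX_MUTEXES];
static int g_used = 0;
int g_current_task = 0;   /* stand-in for the kernel's notion of the running task */

/* Every mutex starts out available; there is no initial-value parameter. */
int OSMutexCreate(MUTEX *phMutex)
{
    if (phMutex == NULL) return OS_ERR_PARAM;
    if (g_used >= MAX_MUTEXES) return OS_ERR_NO_MEM;
    g_pool[g_used].available = 1;
    g_pool[g_used].owner = -1;
    *phMutex = &g_pool[g_used++];
    return OS_OK;
}

/* A real RTOS would block the caller here; this sketch just reports "busy". */
int OSMutexGet(MUTEX hMutex)
{
    if (!hMutex->available) return OS_ERR_BUSY;
    hMutex->available = 0;
    hMutex->owner = g_current_task;
    return OS_OK;
}

/* Only the task that currently owns the mutex may release it. */
int OSMutexPut(MUTEX hMutex)
{
    if (hMutex->available || hMutex->owner != g_current_task)
        return OS_ERR_NOT_OWNER;
    hMutex->available = 1;
    hMutex->owner = -1;
    return OS_OK;
}

void mutex_api_demo(void)
{
    MUTEX m;
    assert(OSMutexCreate(&m) == OS_OK);

    g_current_task = 1;
    assert(OSMutexGet(m) == OS_OK);

    g_current_task = 2;                          /* a different task ... */
    assert(OSMutexPut(m) == OS_ERR_NOT_OWNER);   /* ... cannot release it */

    g_current_task = 1;
    assert(OSMutexPut(m) == OS_OK);
}
```

Because there is no count parameter and no way for a non-owner to "post", the semaphore-style misuse patterns simply have no spelling in this API.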


Wednesday, March 12, 2008

RTOS Myth #3: Mutexes are Needed at the Task Level

The Myth: Mutexes are a useful intertask synchronization primitive, which you should expect to use frequently.

The Truth: Mutexes are a necessary feature of all real-time operating systems. However, best practice is to use them only inside functions that must be reentrant. That is, you should use mutexes only inside functions that are or could be called by two or more tasks.

Mutexes, as their name suggests, enforce mutual exclusion and thus eliminate race conditions between tasks. A mutex can be used to protect a global data structure accessed by two or more tasks. However, doing this properly requires that each user task have the actual address of the global data and allows for bugs in the failure of one task to acquire and release the associated mutex properly.

It is preferable always to abstract or encapsulate said global data structure as an object, which can only be read or written through a set of reentrant function calls. That way, both the address (and possibly internal format) of the data structure can be hidden and the mutex calls can be ensured to be coded correctly.

Note that mutexes are also necessary inside functions that control hardware through I/O registers, which are effectively global data structures.
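The encapsulation described above might look something like the following sketch. It uses a POSIX pthread mutex purely as a stand-in for whatever mutex your RTOS provides, and the sensor_store()/sensor_load() names and the data layout are invented for the example. The data and its mutex are file-private, so the only way a task can touch either one is through the two reentrant functions.

```c
#include <assert.h>
#include <pthread.h>

/* The shared data and its mutex are "static", so no task can reach them
 * except through the functions below. */
static struct { int reading; unsigned updates; } g_sensor;
static pthread_mutex_t g_sensor_lock = PTHREAD_MUTEX_INITIALIZER;

/* Reentrant writer: safe to call from any task. */
void sensor_store(int reading)
{
    pthread_mutex_lock(&g_sensor_lock);
    g_sensor.reading = reading;
    g_sensor.updates++;
    pthread_mutex_unlock(&g_sensor_lock);
}

/* Reentrant reader: returns a consistent snapshot of both fields. */
void sensor_load(int *reading, unsigned *updates)
{
    pthread_mutex_lock(&g_sensor_lock);
    *reading = g_sensor.reading;
    *updates = g_sensor.updates;
    pthread_mutex_unlock(&g_sensor_lock);
}

void sensor_demo(void)
{
    int r;
    unsigned n;
    sensor_store(21);
    sensor_store(22);
    sensor_load(&r, &n);
    assert(r == 22 && n == 2);
}
```

Callers never see the mutex at all, so there is no way for one of them to forget to acquire or release it.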

Of course, a priority inheritance capability should be present within the mutex API (put there by the RTOS vendor) to ensure that priority inversions cannot occur.


Thursday, March 06, 2008

RTOS Myth #2: RMA is for Academics

The Myth: The Rate Monotonic Algorithm (RMA) is an interesting theory but it has no practical meaning for users of real-time operating systems.

The Truth: For starters,

  • All of the popular real-time operating systems (e.g., VxWorks, ThreadX, and uC/OS-II) feature fixed-priority preemptive schedulers
  • RMA is the optimal fixed-priority scheduling algorithm (and note that dynamic-priority algorithms do not degrade gracefully)
  • Unless you use RMA to assign priorities to RTOS tasks, there are no task-specific performance guarantees; if the processor becomes overly busy in a brief period of time, a critical task may miss its deadline

In a nutshell, RMA is the one and only proper way to assign relative priorities to RTOS tasks with deadlines. (Shock of shocks: Deferring to your boisterous colleague Bill's insistence that his task is the most important isn't guaranteed to work!) There's a nice introduction to the RMA technique at http://www.netrino.com/Embedded-Systems/How-To/RMA-Rate-Monotonic-Algorithm/.

The principal benefit of RMA is that the performance of a set of tasks thus prioritized degrades gracefully. Your key "critical set" of tasks can be guaranteed (even proven a priori) to always meet its deadlines--even during periods of transient overload. Dynamic-priority operating systems cannot make this guarantee. Nor can static-priority RTOSes running tasks prioritized in other ways.
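For readers who want to see the mechanics, here is a small sketch of rate monotonic priority assignment plus the classic utilization-bound check from Liu and Layland, U <= n(2^(1/n) - 1). The task set, the type and function names, and the millisecond units are all made up for the example. It assumes independent periodic tasks whose deadlines equal their periods, and it numbers priorities the way uC/OS-II does (0 is highest). The bound is sufficient but not necessary: a task set that fails it may still be schedulable and needs a more exact analysis.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    const char *name;
    unsigned period_ms;   /* T: how often the task runs */
    unsigned wcet_ms;     /* C: worst-case execution time per period */
    int priority;         /* filled in by rma_assign(); 0 = highest */
} rma_task_t;

static int by_period(const void *a, const void *b)
{
    const rma_task_t *x = a, *y = b;
    return (x->period_ms > y->period_ms) - (x->period_ms < y->period_ms);
}

/* Rate monotonic assignment: the shorter the period, the higher the priority. */
void rma_assign(rma_task_t *tasks, size_t n)
{
    qsort(tasks, n, sizeof tasks[0], by_period);
    for (size_t i = 0; i < n; i++)
        tasks[i].priority = (int)i;
}

/* Sufficient test U <= n(2^(1/n) - 1), checked in the algebraically
 * equivalent form (1 + U/n)^n <= 2 so that pow() is not needed. */
int rma_passes_bound(const rma_task_t *tasks, size_t n)
{
    double u = 0.0, lhs = 1.0;
    if (n == 0) return 1;
    for (size_t i = 0; i < n; i++)
        u += (double)tasks[i].wcet_ms / tasks[i].period_ms;
    for (size_t i = 0; i < n; i++)
        lhs *= 1.0 + u / (double)n;
    return lhs <= 2.0;
}

void rma_demo(void)
{
    rma_task_t set[] = {
        { "logger",  100, 20, -1 },
        { "control",  10,  2, -1 },
        { "comms",    40,  8, -1 },
    };
    rma_assign(set, 3);
    assert(set[0].period_ms == 10  && set[0].priority == 0);
    assert(set[2].period_ms == 100 && set[2].priority == 2);
    assert(rma_passes_bound(set, 3));        /* U = 0.6, bound is about 0.78 */

    rma_task_t busy[] = { { "a", 10, 5, -1 }, { "b", 20, 8, -1 } };
    assert(!rma_passes_bound(busy, 2));      /* U = 0.9, bound is about 0.83 */
}
```

Note that nothing in the assignment step asks which task is "most important"; only the periods matter.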

Too many of today's real-time systems built with an RTOS are working by accident. Excess processing power can mask a lot of design sins. But if you haven't used RMA to assign priorities, it could just be a matter of time before you get burned.


Thursday, February 28, 2008

More Bad RTOS Information

The Internet, magazines, and conferences are filled with bad information about when to choose an RTOS. In short, the world wants to sell you an RTOS, even when you don't need one or the use of one would overly complicate your software design.

Here are two generalizations from a recent whitepaper:

Operating systems make programming more efficient and better structured, and their use is now frequently justified even in embedded solutions that are relatively small.


and

A clear benefit of using an RTOS is that it reduces time to market, because it simplifies development.


At best, this is misguided advice.

Here's the straight scoop. An RTOS may either "make programming more efficient and better structured" or less efficient and poorly structured; it depends on the nature of the requirements. In many cases, a design composed entirely of state machines is easier to code and works more reliably than one using an RTOS. In other cases, particularly closed-loop control systems, a simple main+ISR approach will work even better.
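For anyone who hasn't seen one, a main+ISR design can be as small as the following sketch. The names (adc_isr(), control_loop_step()) and the trivial control law are placeholders invented for the example, and the "ISR" is an ordinary function so the code can run on a host. On real hardware it would be attached to an interrupt vector, main() would simply call control_loop_step() inside for (;;), and the shared variables might need a brief interrupts-off window when read.

```c
#include <assert.h>
#include <stdbool.h>

/* Written by the ISR, consumed by the main loop. "volatile" because
 * they change outside the normal flow of main(). */
static volatile bool g_sample_ready = false;
static volatile int  g_sample = 0;
static int g_output = 0;

/* Hypothetical ISR: just captures the data and raises a flag. */
void adc_isr(int raw)
{
    g_sample = raw;
    g_sample_ready = true;
}

/* One pass of the background loop: run the control calculation only
 * when the ISR has delivered new data. Returns true if work was done. */
bool control_loop_step(void)
{
    if (!g_sample_ready) return false;
    g_sample_ready = false;
    g_output = 100 - g_sample;   /* placeholder control law, setpoint of 100 */
    return true;
}

int control_output(void) { return g_output; }

void main_isr_demo(void)
{
    assert(!control_loop_step());    /* nothing to do yet */
    adc_isr(40);                     /* the "interrupt" arrives */
    assert(control_loop_step());
    assert(control_output() == 60);
    assert(!control_loop_step());    /* the flag was consumed */
}
```

There are no tasks, no stacks to size, and no mutexes here, which is the whole appeal when the requirements allow it.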


Monday, January 28, 2008

RTOS Myth #1: Mutexes and Semaphores are Interchangeable

The Myth: Mutexes and semaphores are similar--even interchangeable--operating system primitives.

The Truth: Mutexes and semaphores should always be used for distinct purposes, and should thus feature distinct APIs. (My recommendations to RTOS vendors are at the end.)

The cause of the confusion between mutexes and semaphores is historical, dating all the way back to the invention of the semaphore by Dijkstra in the mid-1960s. Before then, the interrupt-safe task synchronization and signaling mechanisms known to computer scientists were not efficiently scalable for use by more than two tasks. Dijkstra's scalable semaphore could be used for task synchronization (including mutual exclusion) as well as signaling.

After the introduction of commercial real-time operating systems (beginning with VRTX, ca. 1980) and the publication of a 1990 paper on priority inheritance protocols, it became apparent that mutexes needed to be more than just semaphores with a binary value. Because of the possibility of unbounded priority inversion, which would break RMA assumptions, ordinary semaphores cannot be used for mutual exclusion.

Many bad sources of information add to the general confusion by introducing the alternate names "binary semaphore" (for mutex) and "counting semaphore". The current Wikipedia entry for semaphore is a prime example.

The correct and appropriate solution is a distinct set of RTOS primitives: one for semaphores and another for mutexes. Mutexes must prevent unbounded priority inversion. The APIs for semaphores and mutexes should be as distinct as possible, as their use is quite different.
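One common way for a mutex to bound priority inversion is priority inheritance, which a plain semaphore cannot do because it has no notion of an owner. The following is a deliberately tiny model of that idea, not any vendor's implementation. The pi_ names are invented, larger numbers mean higher priority, and it handles only a single mutex with no waiter queue and no nesting.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int base_prio; int active_prio; } pi_task_t;  /* larger = more urgent */
typedef struct { pi_task_t *owner; } pi_mutex_t;

/* Returns 1 if the caller now owns the mutex, 0 if it would have to wait. */
int pi_mutex_get(pi_mutex_t *m, pi_task_t *caller)
{
    if (m->owner == NULL) {
        m->owner = caller;
        return 1;
    }
    /* Contention: lend the waiter's priority to the owner so that
     * medium-priority tasks cannot keep preempting it. */
    if (caller->active_prio > m->owner->active_prio)
        m->owner->active_prio = caller->active_prio;
    return 0;
}

/* Returns 1 on success, 0 if the caller does not own the mutex. */
int pi_mutex_put(pi_mutex_t *m, pi_task_t *caller)
{
    if (m->owner != caller) return 0;
    caller->active_prio = caller->base_prio;   /* give back any borrowed priority */
    m->owner = NULL;
    return 1;
}

void pi_demo(void)
{
    pi_task_t low  = { 1, 1 };
    pi_task_t high = { 3, 3 };
    pi_mutex_t m   = { NULL };

    assert(pi_mutex_get(&m, &low) == 1);
    assert(pi_mutex_get(&m, &high) == 0);   /* high must wait ... */
    assert(low.active_prio == 3);           /* ... so low is boosted */

    assert(pi_mutex_put(&m, &low) == 1);
    assert(low.active_prio == 1);           /* boost removed on release */
    assert(pi_mutex_get(&m, &high) == 1);
}
```

The boost exists only because the mutex records its owner. A semaphore that any task may post has nobody to boost.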


Wednesday, January 23, 2008

Are all RTOSes the Same?

Having just, coincidentally, returned from teaching a two-day hands-on RTOS course in Florida, I was greeted this morning by the following message from an RTOS company president in my inbox:

Recently, I have had a statement by you thrown at me. The statement essentially said that all RTOSes are the same, or something to that effect. Obviously, you and I both know that there are differences, some large and some small. The problem is that people listen to what you say and I think they may have misunderstood you. So what were you trying to say? I'd like to know so I can rebut them with your own words when they quote you.

This reply is probably best handled publicly. Having used several commercially successful RTOSes (including both of the current top two according to Embedded Systems Design); written one of my own (ADEOS) for a book; taught OS theory as adjunct faculty at the University of Maryland; and also written and spoken of non-preemptive RTOS alternatives in several venues, I am quite opinionated on the subject. A few years back I was even interviewed about RTOSes on a PBS television show called American Business Review.

Here are a few of my past RTOS articles, which may provide additional background for this post:

I believe the opening paragraph of that last article concisely sums up an opinion I've often expressed--and which may have been the basis of the remark aimed at the RTOS vendor who e-mailed.

Every commercial RTOS employs a priority-based preemptive scheduler. This despite the fact that real-time systems vary in their requirements and real-time scheduling doesn't have to be so uniform. Multitasking and meeting deadlines is certainly not a one-size-fits-all problem.

But my e-mail correspondent is correct that there are differences large and small. Here are some of the most obvious differences between the various commercial RTOSes:

  1. At the API level, each RTOS is unique. Though every RTOS has functions for creating a new task, acquiring a mutex, and posting to a message queue, the specific function names and parameter lists differ "by brand". Individual programmers may find one RTOS' API more comfortable or logical than another.
  2. To support true real-time scheduling via RMA, each RTOS must provide the following:

    1. A guarantee that the highest-priority task ready to use the CPU is the one actually running at all times
    2. A bounded worst-case context switch time
    3. A bounded worst-case interrupt latency
    4. A mechanism to automatically prevent unbounded priority inversion during mutex contention

    An RTOS that fails to do any one of these things is obviously different from the others, though most do all four. The specifics may vary, however. In particular, the worst-case times may differ from one RTOS to the next and from one processor to the next. In addition, the details of the chosen priority inversion workaround will make a difference in the RMA calculation mathematics.
  3. Some RTOSes can use the MMU and others can't.

Hopefully, this clarifies both that I think commercial RTOSes are somewhat commodity products and that there are, nonetheless, obvious differences.


Tuesday, November 06, 2007

Public Course on Multithreaded Programming

This coming January, I'll travel from chilly Baltimore to sunny Miami to teach an in-depth training course about the proper use of real-time operating systems to design multithreaded firmware. The aim of the class is to clarify the safe and correct use of RTOS primitives, such as mutexes, semaphores, and mailboxes.

The two-day course, called Multithreaded Programming with uC/OS-II, will be held January 22-23, 2008 at the Weston, Florida headquarters of RTOS vendor Micrium, just east of the Everglades. Registration is open to the public, but the total number of seats is limited.

The hands-on course involves a mix of lectures and a coordinated series of programming exercises. The target hardware is an ARM9 development board from STMicro. The increasingly popular uC/OS-II real-time operating system will serve as the reference API with compiler and debug tools from IAR Systems.

Full details, including registration instructions, are available at the Micrium website: http://www.micrium.com/support/training.html


Sunday, September 09, 2007

Embedded Industry Survey Results

I was quoted (mostly accurately :-) over at Embedded.com, in an analysis of the results of the 2007 survey of embedded system designers. There are some interesting year-on-year trends, including an increase in the use of C vs. C++ and a decrease in the use of both commercial and open-source operating systems such as VxWorks and Linux.


Tuesday, October 03, 2006

The End for Embedded Linux?

Last week at the Embedded Systems Conference in Boston, I moderated a panel discussion premised on the recent downward trending slope of Linux use in such systems. The panelists were Dr. Inder Singh (CEO, LynuxWorks), consultant Bill Gatliff, and John Carbone (VP of Marketing, Express Logic).

The graph to the left shows the operating systems use data. The source of this data is an annual (except 2003) subscriber survey by Embedded Systems Design (nee Embedded Systems Programming) magazine. To create this graph, I aggregated individual Linux distribution numbers, as well as combining data for pSOS and VxWorks under ISI acquirer Wind River Systems and Nucleus and VRTX under Accelerated acquirer Mentor. Similarly, all variants of DOS and Windows are lumped into Microsoft.

The question for the panel discussion revolved around the future trend: Will Linux's share growth return or has it peaked? Whatever the answer, Linux is clearly very popular with embedded software developers. And other surveys support this finding.

An interesting subplot concerns Wind River Systems (Nasdaq:WIND). When Wind acquired competitor Integrated Systems (ISI), the combined market share of ISI's pSOS and Wind's VxWorks products (according to the data cited above) was more than 30%. Today the combined share for the same two products has fallen to about 10%. Over the same era the company's stock price has fallen from a high of $60 to about $10. I see little reason to be optimistic about the company's future and noted that they were not even present at the aforementioned industry gathering.

Is VxWorks dead? Is the company's recurring market share around 10% simply due to past users at large companies continuing to use the product? How much has Linux contributed to the early demise of a previous market share leader? What do you think about the future of either operating system?


Saturday, September 16, 2006

Perils of Preemption

Embedded.com just picked up a paper I wrote for the upcoming Embedded Systems Conference in Boston. The paper is about the downsides of the dominant RTOS (real-time operating system) scheduling algorithm. It turns out that priority-based preemptive scheduling has one key benefit but more than ten important caveats.

Unfortunately, the formatting and editing were screwed up in several ways in Embedded.com's publication of this paper. But I have republished it at http://www.netrino.com/Embedded-Systems/How-To/Preemption-Perils.

I'll be speaking about alternatives to priority-based preemptive RTOSes in Boston on Tuesday, September 26.


Monday, March 03, 2003

Moving Targets

There are currently so many interesting operating systems and alternatives that it’s hard to choose—as we must for each project—just one. Within the priority-based preemptive category, you can choose based on worst-case latency, source code availability, upfront and/or recurring cost, memory usage, API/features, and numerous other criteria. Indeed, the realm of possible price/feature combinations has fragmented the market into many tiny niches.

Though there are a handful of well-known names that have the bulk of the market tied up, a sizeable number of smaller RTOS providers do quite a nice business on just a tiny fraction of total market share. And as the demand for embedded operating systems continues to accelerate, these smaller vendors need not even hold their market share numbers to continue to increase profits. That’s good—because they will continue to lose market share.

There are a lot of forces that will shape the RTOS marketplace going forward—as it goes even more toward the big guys. Not the least of these factors is that more of us will go off-the-shelf. Among subscribers to Embedded Systems Programming magazine, for example, the percentage using no OS or a proprietary alternative has fallen from 38% to about 18% in just the past five years. Extrapolating, perhaps we’ll all be using off-the-shelf OS code by 2007.

Competition from “desktop-lite” operating systems has also picked up. There are a large number of embedded designs that look (or can look) an awful lot like a PC inside, benefit from the low component costs in that market, and no more than dabble in the realm of real-time. What used to be a small ROM-DOS market has morphed into today’s WinCE/XP and Linux market—almost entirely in the last three years. In 2002, some 17% of you fell into that category; I suspect it’s not a coincidence that x86 architectures continued to dominate the list of 32-bit processor choices.

And then there’s consolidation. Though the pace of consolidation has slowed with the business cycle, the effects continue to be felt. Mostly it was the vendors of 32-bit solutions that picked up 8- and 16-bit competitors and debugging tools when times were good—so they could offer one-stop shopping. An up-and-coming Linux player even spent some of its paper wealth to acquire a commercial RTOS vendor for that same purpose. To compete, a large commercial vendor picked up an open source, though non-Linux, OS. When the buying resumes, as it surely will, where will it ultimately end?

Several of the technologies positioned to profit from these trends are not what we traditionally think of as “embedded.” Microsoft, which—love ‘em or hate ‘em—correctly understands they must make it in the embedded space to stay relevant in the coming decade, is trying hard to find the right combination of OS features and vertical markets. In many of these markets, they’re competing directly against the open source alternatives—and apparently losing in some. According to a recent article in EE Times, Linux is also beating out traditional RTOSes in key markets like consumer electronics.

Of all the traditional vendors, Wind River is probably in the best position to compete with these market forces and survive. We are fortunate in the embedded space to have lots of choice when it comes to operating systems. But the future may hold far fewer technological alternatives. It’s not clear to me that QNX, VxWorks, Nucleus, or any other RTOS is really distinguishable from another in the boardroom, or that the smaller players have enough to gain by staying in the business longer term. What do you think?


Thursday, December 12, 2002

Beyond the RTOS

Selecting a plural form for RTOS is hard; there is no one right way. Some possibilities, listed in order of increasing popularity (on the Web), include RTOS’s, RTOSes, and RTOSs. The first implies a possessive aspect that’s clearly not appropriate, so that variant is best avoided. Between the other two, the vast majority of trade journals have adopted RTOSes as their preferred style. Though, apparently, not everyone is yet convinced.

In addition to trying to standardize the language readers use to communicate with one another, an important role of trade journals is helping spread useful new techniques and best practices quickly. For example, numerous articles and columns in past issues of Embedded Systems Programming have helped popularize the use of RTOSes. Approximately half of that magazine's subscribers now use a commercial RTOS to get the job done.

Unfortunately, however, most of the differentiation between the hundred or so RTOS vendors is on the price and support side, leaving embedded programmers to develop their own solutions when a preemptive priority-based scheduler doesn’t fit the problem at hand. In fact, as RTOS vendors continue to argue against “rolling your own” and worry about lower-cost or no-cost competitors, I would argue that most are overlooking the technically obvious.

Static-priority preemptive schedulers, with priorities assigned rate or deadline monotonically, work very well in certain real-time systems with high degrees of both parallelism and periodicity. Telecom and datacom products, with their many communication channels, are often of this type.

But the tradeoffs, including increased interrupt latency and potential priority inversions, are significant. Though it can be made to fit some, the static-priority preemptive solution just doesn’t fit the needs of a large population of systems quite right. Some tasks are periodic, but many others are not. Some systems have hard deadlines, but many others do not (or can safely miss a few now and then). In such cases other types of multitasking (or even the lack thereof) may be preferable to the much-touted RTOS.

Alternative ways of structuring embedded software run the gamut from simple main()+ISR implementations to the use of dynamic priorities. For example, a simple executive is a low overhead technique that works quite well for hard real-time systems with harmonic deadline periods and a small number of things to get done. And state machines can be executed in a series of run-to-completion steps via a framework like that outlined in Miro Samek’s excellent book Practical Statecharts in C and C++ (CMP Books). Taking another approach, Java and Ada support threads natively, making the choice of a particular RTOS largely irrelevant.
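To give a flavor of the run-to-completion style, here is a bare-bones, switch-based state machine. It is not the framework from the book mentioned above, just a generic sketch. The oven states and events are invented for the example. Each call to oven_dispatch() handles exactly one event from start to finish and returns the next state, so there is never a half-processed event for anything else to preempt.

```c
#include <assert.h>

typedef enum { ST_IDLE, ST_HEATING, ST_DONE } state_t;
typedef enum { EV_START, EV_TEMP_REACHED, EV_RESET } event_t;

/* Handle one event to completion and return the next state.
 * Events that make no sense in the current state are ignored. */
state_t oven_dispatch(state_t s, event_t e)
{
    switch (s) {
    case ST_IDLE:
        return (e == EV_START) ? ST_HEATING : s;
    case ST_HEATING:
        if (e == EV_TEMP_REACHED) return ST_DONE;
        if (e == EV_RESET)        return ST_IDLE;
        return s;
    case ST_DONE:
        return (e == EV_RESET) ? ST_IDLE : s;
    }
    return s;
}

void oven_demo(void)
{
    state_t s = ST_IDLE;
    s = oven_dispatch(s, EV_TEMP_REACHED);  assert(s == ST_IDLE);     /* ignored */
    s = oven_dispatch(s, EV_START);         assert(s == ST_HEATING);
    s = oven_dispatch(s, EV_TEMP_REACHED);  assert(s == ST_DONE);
    s = oven_dispatch(s, EV_RESET);         assert(s == ST_IDLE);
}
```

In a real system the events would typically come from a queue filled by ISRs, with the main loop pulling one event at a time and dispatching it.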

My point? The hundred plus commercial RTOSes available today have too much in common technically. Not every embedded designer benefits from adding a static-priority preemptive scheduler; many software designs are, in fact, harder with preemption. RTOS vendors might do better to view themselves as providers of software frameworks for developing embedded software, and differentiate themselves by offering more than one technique. Just as there are currently various ways of pluralizing RTOS, there are also various ways of doing without one.


Thursday, September 20, 2001

Safety Patrol

When I was in the sixth grade, I was a member of my school’s Safety Patrol. It was my responsibility to ensure that younger children got on and off the school bus safely. “Safeties” wear bright orange sashes and help other kids cross streets adjacent to their bus stops. This is just one measure in a complex web of overlapping steps taken to protect the most vulnerable members of our communities.

As children and adults alike increasingly place their lives in the hands of computer hardware and software, we need to add layers of safety there as well. No software bug or hardware glitch (or combination) can ever be allowed to bring down an aircraft, whether there are hundreds of passengers on board or just a pilot. The failure of many other systems must be similarly prevented. But software and hardware do fail—perhaps inevitably. As engineers, we use system partitioning, redundancy, protection mechanisms, and other techniques to contain and work around failures when they do occur.

As software’s role in safety-critical systems continues to expand, I expect we’ll see a rapid increase in the number of civil lawsuits filed against companies that design and manufacture embedded systems. (Adding several new levels of meaning to the phrase project post mortem.) Indeed, there is anecdotal evidence that lawsuits of this sort may already be on the rise. With most of the action in hush-hush settlements outside the courtroom, though, the media hasn’t yet noticed the trend.

One organization that has definitely taken notice of the hazards posed by software in products is Underwriters Laboratories. An independent, not-for-profit product safety certification and ANSI-accredited standards organization, UL initiated a “Standard for Software in Programmable Components” in 1994. The resulting ANSI/UL-1998 standard addresses “the detailed safety-related characteristics of specific software in a product.”

In addition to focusing on top-down design and development processes, it may also be beneficial to utilize an operating system that’s been designed with safety-critical systems in mind. Above all else, an RTOS should not compromise the stability of the system. However, an operating system can go beyond and do many things to reduce the risks inherent in your application code. Keeping software tasks from overwriting each other’s data and stacks is merely the beginning of the matter.

In your rush to select an RTOS for use in a mission-critical system or life-critical medical device, do make sure you know what you’re getting, though. It turns out that one prominent new operating system marketed specifically for inclusion in products of these sorts has a potentially dangerous hole in its “innovative” protection mechanism. You don’t want to wind up on the wrong side of something like that in court.

Ultimately, the key to designing safety-critical systems is to include multiple layers of protection. The hardware, the operating system, and your application software must each do everything they can to prevent catastrophe—even if the fault itself lies outside that subsystem.
