During most of last summer and early fall, the EOS Ad Hoc Working Group on Production (AHWGP) worked feverishly to get improved estimates of computer loadings, data archival rates, and network traffic. On October 3, the early instrument teams (ASTER, CERES, LIS, MISR, MODIS, and MOPITT) submitted their initial scenarios to the modeling group at Hughes, together with revised network loadings from many of the IDS teams. About two weeks later, the AHWGP presented some initial results from this effort to the Investigators Working Group (IWG) at Hunt Valley, Maryland. Over the next several months, the results worked their way into a variety of estimates that were used for the EOSDIS Core System (ECS) Preliminary Design Review. This review was satisfactorily completed toward the end of February, and the review board was pleased with the progress being made toward a solid EOSDIS design.
When the AHWGP started its work, production was viewed as a "continuous process", with little data captured on either the discrete nature of the data product files or on the ebb and flow of process activations. When the new information came in from the AHWGP, one of the first jobs that the Project and Hughes undertook was to see how different the estimates were. Interestingly, the previous estimates of total MFLOPS (millions of floating point operations per second) and storage rates appear to be traceable to the new estimates. However, we are now in a position to provide more reliable engineering because the modeling effort can probe the effect of loading the computers and disks with queues of "jobs" waiting to be processed.
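To illustrate why discrete jobs matter, the following is a minimal sketch of a single-processor queue, not the Hughes model itself. The function name `simulate_queue`, the work units, the capacity, and both arrival patterns are hypothetical values chosen only for illustration. Both patterns deliver the same average load, so a "continuous process" view would treat them identically, yet only the bursty one builds a backlog.

```python
def simulate_queue(arrivals, capacity):
    """Step a single processor through time (hypothetical toy model).

    arrivals: work units arriving at each time step (illustrative numbers)
    capacity: work units the processor can complete per time step
    Returns (peak_backlog, idle_capacity, final_backlog).
    """
    backlog = 0          # work waiting in the queue after each step
    peak_backlog = 0     # largest backlog left at the end of any step
    idle_capacity = 0    # capacity that went unused because the queue was empty
    for work in arrivals:
        backlog += work
        done = min(backlog, capacity)
        idle_capacity += capacity - done
        backlog -= done
        peak_backlog = max(peak_backlog, backlog)
    return peak_backlog, idle_capacity, backlog


# Two assumed arrival patterns with the same mean load (4 units per step).
steady = [4] * 12          # work arrives evenly
bursty = [0, 0, 12] * 4    # the same work arrives in batches

print(simulate_queue(steady, capacity=5))  # (0, 12, 0)
print(simulate_queue(bursty, capacity=5))  # (7, 19, 7)
```

In this toy case the bursty pattern leaves both a larger backlog and more idle capacity than the steady one, even though the average rate is unchanged. That is the kind of effect an average-rate estimate cannot show and a queue-based model can.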
As a result of our improved understanding, production of the standard data products appears to fit within the resource envelope for EOSDIS; although the rate of processing has gone up (in MFLOPS), the rate at which data has to be archived has decreased markedly. It looks as though the decreased cost of storage offsets the increased cost of processing.
However, the work of the AHWGP (and related efforts) is far from complete. We have started to expand our data collection efforts to include instruments not on the early satellites. For the instrument teams we have worked with previously, we have begun trying to estimate the impact of validation, quality control, and various kinds of exceptions. Hughes has moved well along in being able to simulate both standard processing and the effects of various kinds of perturbations on the system. Most of the early instrument teams have begun to use these simulation results in designing their operational processing scenarios. We will be pulling these results together in time for the Critical Design Review of EOSDIS. Again, we would expect to get together in a Modeling Workshop, perhaps before the IWG meeting in Santa Fe.
It is also important to observe that the success of the Ad Hoc Working Group on PRODUCTION has led to an Ad Hoc Working Group on CONSUMPTION, led by Bill Emory and Dave Emmitt. The AHWGP had its hands full trying to deal with the collection of production information and is very pleased to have other hands pick up a critical part of collecting what we need to know for a successful EOSDIS.
If we take a longer view of what the AHWGP is trying to do, we can perhaps summarize it in terms of avoiding "unnecessary delay and capacity" in getting good data to data users. Our needs here parallel those of industry in designing efficient production of other goods. We also want to minimize the delays, maximize the efficiency of hardware and software use, and, most of all, avoid wasting our time and that of our user communities. In industry, such an approach is called "Just In Time" manufacturing. It aims to reduce the backlog of production (data products waiting to be processed or queries to be answered) and minimize wasted production capacity.
Just In Time manufacturing seems like a good metaphor for what we have to do in designing the production processing (and complex query answering) in EOSDIS. Production is not simple. A recent textbook for industrial engineers (Manufacturing Systems Engineering by Stanley B. Gershwin, pp. 15 and 16) has some interesting comments on what we have to do: "Complex systems that are poorly understood become increasingly complex over time. We experience such systems constantly in our daily lives, and such experiences are frustrating and wearing. Examples are unfortunately abundant: they include the tax code, the medical insurance system, many aspects of the legal system, price, wage, and rent control schemes, and many government social service agencies.
"I would propose the following mechanism to explain this phenomenon: when a system is poorly understood, simple rules are created to achieve some goal. They fail to move the system toward the goal. Instead, problems appear. More well-intentioned but misguided rules are added to solve the problems, but they only lead to new problems. This continues until nobody really understands what the rules are, what the goals of the organization are, or what the consequences of new rules would be. It also becomes increasingly difficult to change the system because of its dispiriting complexity.
"When rules proliferate, the system is poorly understood. Additional rules are worse than band-aids to cure cancer; they are the cancer. The only solution is to develop an understanding of the system. This understanding may be difficult to achieve, but the operating policies that result from such an understanding will be surprisingly simple, and the system will work."
That kind of understanding is what the AHWGP is trying to achieve. In future work, we will try to apply some of the theory behind Just In Time production to simplify what we have to do, and save us all the difficulty that comes from lack of understanding.