==================================================================
TO: All Sales & Service
FROM: Masood Jabbar, Larry Hambly
DATE: September 11, 2000
SUBJECT: Customer information about ecache memory Quality
==================================================================

This document contains information to help you answer customers' questions about reported ecache memory failures that may have disrupted their systems. Many customers are not experiencing any problems in this area, so be sure to use the information appropriately. Included are an overview, a brief history of the issue and our actions to improve reliability, and recommendations you can confidently make to customers now.

The information is intended to be used proactively with customers who have been affected (or continue to be affected) by ecache parity errors. It can also be used proactively with unaffected but concerned customers.

Please note: These points are NOT intended to be read verbatim or printed as a written communication; rather, they are talking points to help you answer customers' specific questions.

Overview:
---------
Some customers have experienced intermittent "ecache" memory errors that we now understand stem in part from a poor quality SRAM from our component. It should be noted that there are no architectural or data-corruption issues - it is a component that causes a system reboot after an "ecache parity error" message. Regardless, Sun must and will take full responsibility. That is the only way we can achieve our goal of 100 percent customer satisfaction.

From the beginning, the total number of occurrences has been infrequent enough to manage individually, customer-by-customer, through numerous technology, IT practice and process improvements. These dramatically reduced the occurrence rate. However, ANY amount of unscheduled downtime may be disruptive to our customers and is unacceptable to us.
We take this issue very seriously and have engaged literally hundreds of Sun's best minds across the company to analyze the root cause and develop solutions to eliminate it. While we continue to make these specific engineering and process changes, we are leaving no stone unturned in chartering significant corporate quality initiatives across Sun. To that end, we have established a company-wide 15-point Availability and Quality (A&Q) Program, driven by a Customer Advocacy Organization focused 100% on customer satisfaction.

The Bottom Line: This is THE Top Priority of Sun and nothing less than 100% customer satisfaction will be accepted. Revenue is a distant second!

Background History
------------------
In the Summer of 1999, Sun's services organization began tracking intermittent ecache parity errors on some systems that caused reboots. As part of a company-wide A&Q initiative, Sun began taking measures in early Fall to reduce the effects of noise, temperature and other environmental factors, and began sharing information with affected customers as well as analysts. Since these briefings involved engineering details and possible future products, we requested that customers sign an NDA.

By late Fall, Sun's technical team identified a discrepancy in reliability between our SRAM suppliers that resulted in two actions:

1) Short-term, we began taking steps to change suppliers. Earlier this year, we began ramping up production of the more highly reliable 5762 components, which today are used in E10000 and E6500 Enterprise servers.

2) Long-term, we began work on improvements to systems (hardware and software), test procedures, and components that would deliver higher reliability. These improvements have been made immediately in our product line as they have been identified and fully tested.

Management removed the need for NDAs at the beginning of the year as the technical team began to identify and roll in engineering and process improvements.
This communication was not implemented as efficiently as we intended. As a result, several NDAs have been requested well after the company removed the requirement. We want to reiterate that talking about this issue does not require an NDA. The only time you need to require an NDA is when you discuss future products or when the information you share with your customer includes proprietary information of our suppliers, partners or other customers.

As a direct result of our continuous improvement initiatives, this summer we began testing two alternatives that show promise in achieving total customer satisfaction.

The first and preferred alternative is "scrubber" software technology, which makes systems more resistant to ecache parity errors by constantly cleansing the data paths. Two versions of the scrubber software have been developed (and are being implemented on Sun's own systems):

a) A User Level Scrubber is targeted at systems with fairly high levels of idle CPU time. It is showing relative improvements on the order of 2-3X in quality over systems without the User Level Scrubber.

b) A Kernel Level Scrubber is showing considerable promise in meeting or exceeding reliability specifications on both idle and active systems. Early but incomplete testing is showing relative improvements of 4-7X over systems without the Kernel Level Scrubber.

Availability of the scrubbers is as follows:

-- User Level Scrubber for Solaris 2.5.1/2.6/Solaris 7/8: Available now at sunsolve.sun.com
-- Kernel Level Scrubber for Solaris 2.5.1/2.6: Available now at sunsolve.sun.com
-- Kernel Level Scrubber for Solaris 7/8: Available in late October. (Please see your SE or service manager ASAP.)

Please note: Current users of Solaris 2.5.1/2.6 should install the Kernel Level Scrubber as soon as possible.
Solaris 7/8 customers should install the User Level Scrubber as soon as possible, and migrate to the Kernel Level Scrubber as soon as convenient when it becomes available in late October.

Also please note that customers may ask about the effects of the scrubber software on performance: Early empirical evidence gathered on highly active, high performance systems suggests that the effect is small -- well under 5 percent and only with specific applications. We will continue to monitor and report the findings to you.

The second alternative is a new hardware component technology: mirrored SRAM. Sun's engineering teams began receiving initial samples of newly designed mirrored SRAM components earlier this summer. Laboratory tests suggest that -- like the scrubber software -- the re-engineered SRAMs will help us meet or exceed customer satisfaction goals. Information on field availability of Mirrored SRAM components will be forthcoming.

While all this is going on, we would like to ask you all to do the following:

Recommendations:
----------------
1) Recommend that customers implement Sun's Best Practices, account healthchecks, and HA/clustering if possible. These are located at Sun's SunSolve website at: http://bestpractices.central. As part of this exercise, prioritize the mission-criticality of your systems. Though customer dependent, empirical evidence suggests that these practices can deliver as much as a 2X improvement in reliability.

2) Recommend that Solaris 2.5.1/2.6 customers install the Kernel Level Scrubber as soon as possible. For Solaris 7/8 customers, recommend that they install the Kernel Level Scrubber as soon as it becomes available in late October.

Note: The efficiency of the software is greatest when combined with systems based on the newer 5762 components. In keeping with our goal of 100% customer satisfaction, we are willing to replace all older 5661 components with the newer, more reliable 5762 components upon customer request.
This process will take from four to eight months to complete as manufacturing of the newer component ramps up. We will be working first with customers whose systems have been affected. Enterprise Services and GSO will be working to create an account-specific replacement strategy and plan for affected systems. Start working now with your service and SE teams to begin creating your account migration plans. Please help affected customers prioritize the mission criticality of their affected systems for scheduling the replacement.

3) Work is progressing on the new mirrored SRAM components. Comparisons of early reliability statistics suggest that the reliability of the Kernel Scrubber/5762 solution is equal to the Mirrored SRAM solution in delivering on Sun's total customer satisfaction goals. If that remains the case, it would obviate the need for an intrusive module replacement. Additional information will be sent as it becomes available.

As we said, we will continue to work this issue aggressively and we will keep you posted on our progress. We especially apologize for the impact on our customers, and we are working very hard to regain their trust.

Continuous Communication
------------------------
In addition to the above, we also have plans to increase our communications to you with a separate communications program. We are committed to delivering:

1) A Special Edition of the McNealy Report this week, featuring Scott McNealy, Ed Zander, Masood Jabbar, Larry Hambly and John Shoemaker.

2) Tech Talks from Subject Experts on an ongoing basis. Among the highlighted experts will be:

-- Scrubber software: Steve Chessin, Distinguished Engineer
-- ORT Test: Eugene McCabe, VP Operations
-- USII/III Design: Anant Agrawal, VP Microelectronics
-- Best Practices: Anne Chasen
-- Data Collection: Kevin Terrill

3) A special internal website will go live by the middle of this week.
Information that will be included:

-- Current strategy and product information
-- Sun's Best Practices document
-- A glossary of terms, defined in simple language (e.g., what is an ecache parity error and what causes it)
-- Other primers that can be used to help communicate with customers
-- FAQs that will be refreshed with new questions as they arise
-- A list of vice presidents and directors to phone with specific questions
-- Links to other relevant sites

So, our goal is to keep you "armed with information." We want you and our customers to know the whats, wheres, whens, hows, and whys. In so doing, we will do our best for our customers.

Keep up the fine work and we will be back to you soon with more information.

Masood Jabbar, Larry Hambly