Q&A - Outage - SO/SONE


The Contact Us function at the top of every page on the tl9000.org website is the preferred means for asking questions and receiving answers from the subject matter experts of the TIA QuEST Forum. Over the last few years many questions have been answered through this means. The number of each question is the ticket number in the Contact Us tracking system.

These questions generally relate to the system outage measurements that impact end-user customers (SO) and network equipment (SONE).

Question 9030 — This question concerns outage reporting (SO and SONE) for Family 4.2.1 On-line Critical Operation Support Systems. Our systems may manage a single network element or an entire network. In general, an outage of our system does not impact the network being managed; only the management system itself is impacted, for example through the inability to add trails or survey the customer’s activity. The normalization unit for SO and SONE is “system”. How should we determine the percent of the system impacted if there is no impact on the network or if we do not readily know the size of the network?

Answer — SO and SONE are concerned with the loss of functionality of the product being measured. SO deals only with the loss of the primary function. SONE measures loss of any functionality. For Family 4.2.1, the primary function is network management. Additional functionality would include alarm reporting, performance monitoring, and any other OA&M function the product was designed to perform. The end-user here is the service provider using the equipment to manage their network and not the end-user of the network itself. Therefore, the impact to be considered is that caused by the loss of function in the network management system and not the network being managed. For guidance on the weighting of different partial outages of these products, please see the entry for 4.2.1 in Table A-3.

Question 9111 — We have questions about category 4.2.1.1. Our company has a Network Management System that manages one or more elements. The Network Management System manages Element Management Systems (EMS; for example, up to 10000 in one network). Which category applies to the management of big networks? Is category 4.2.1.1 for one EMS only? The benchmark data for 4.2.1.1 includes SO data. In a network management system we have no SO for the end users because traffic is not lost, so we think SO is not relevant. We expected SO not to be available in category 4.2.1.1, so this is not clear to us.

Answer — The definition for Family 4.2.1 On-line Critical Operations Support Systems is “Real time network management systems, demanding high availability, typically 24 hours a day and 7 days per week.” The number of elements being managed is not a direct factor, but products in this family typically manage large networks from a single point. SO measures the loss of primary functionality to the end-user of the product. The primary functionality is indicated by the bold text in the definition. The key point in applying the SO measurement to this category is that the user of the product is the service provider managing its network, not the end-user of the network being managed. Therefore, SO should measure loss of network management capability. Since the SO normalization for this category is system, the weighting of different partial outages shown for 4.2.1 in Table A-3 can be used as guidance on how to measure events that do not involve complete loss of functionality by the network manager.

Question 12346 — According to the TL 9000 Measurements Handbook, an event lasting more than 15 seconds is considered an outage; otherwise it is not. However, we have different outage criteria for different customers according to the SLAs contracted with them. For example, for one customer only an event lasting more than 3 minutes is considered an outage. Must we follow the 15-second criterion in the Measurements Handbook, or can we follow the criteria contracted with the customer?

Answer — The rules listed in the TL 9000 Measurements Handbook concerning the length of outage are to be followed. These are 6.1.4 b) 2) for the SO measure and 6.2.4 b) 3) for SONE. Customer SLAs do not modify these rules.

Question 10695 — Customers are expressly authorized to determine if a reported event is an outage, but the TL 9000 counting rules constrain which events are countable as outages. Can the customer insist for example that a 10-second service affecting outage (rather than 15 seconds) be treated as a valid outage measurement that we have to report to the MRS? Can the customer insist that loss of 10% (rather than the minimum 20%) of end-user mailboxes for a category 6.1 product is a reportable outage? If the customer contractually requires us to report TL 9000-compliant data to them but insists we use their non-compliant counting rules, can we call the customer report TL 9000 compliant and at the same time continue to report to TIA QuEST Forum different data we believe is genuinely valid?

Answer — The data for reported outages submitted to the MRS must follow the TL 9000 reporting rules. These rules cannot be modified by any agreement between the organization and its customer(s). To ensure comparability of the reported data, there are no exceptions to or modifications of these rules based on any SLA the organization may have with the customer. What the organization is required to report to its customer is between those two parties. The MRS can only comment on the correct interpretation of the TL 9000 Measurement rules. It cannot become involved with contractual discussions between the organization and any customer.

Question 10854 — There are several measurement requirements within TL 9000 that are difficult to support given our business model. As an example, we are not aware of customer outages, the number of outage incidents reported, or outage fix response times. Can we still proceed with implementing a TL 9000 program and seek certification with these holes in our data?

Answer — It is not unusual for customers not to report outage data to their suppliers, so you can be exempt from reporting system outage measurements in those cases where your customer doesn't supply outage data, including the number of outage incidents and the duration of these incidents. You can, therefore, proceed with implementing a TL 9000 program.

Note however that problems reported to you can be counted and fix time for those problems can be measured. These problems and fixes must be reported.

If no customers provide the data needed for you to report the outage measurement and you have no other means to collect the data, you may claim an exemption from reporting the measurement. As noted in Section 4.2.1 Customer Source Data in the Measurements Handbook, you would enter “Exempt” in the data submission. You are also required to document the justification for the exemption for review by your auditor.

Question 12444 — Our products are DC to AC inverters, AC and DC switch gear, etc. I am responsible for the TL 9000 measurement data submission. We have a problem with the SO measurement submission:

Almost none of our major customers can give us detailed SO data for our product in the telecommunication network. Could we enter "Exempt" when we submit the SO measurement for PC 5.3, according to Measurements Handbook Section 4.2.1?

Answer — Please see Section 6.1.5 of the TL 9000 Measurements Handbook for a full explanation of the requirements covering this case.

You are required to report measurement SO if any of your customers provide you with the needed information or if you can determine the information from internal sources.

If only a subset of your customers report SO to you, then you are to adjust SOs and the outage sub-measurements to cover only those customers that report the data.

If none of your customers report SO data to you and you are not able to obtain the data from your own records, then you may enter EXEMPT in all sub-measurements of the SO measurement on your data submissions.

You should also document this situation including showing that you have attempted to contact your customer for this data and that the customer refused to supply this data. This justification will be needed for your audits to TL 9000. It is not sufficient just to say that the customer didn't supply the data. You must show that you asked for the data and that the customer would not supply the data.
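As an illustration only, the reporting decision described in this answer can be sketched in Python. The function name, its parameters, and the returned labels are hypothetical and are not defined by the Measurements Handbook or the MRS. The sketch also assumes that internally sourced data, when available, covers all customers.

```python
def so_submission(customers_reporting: int, total_customers: int,
                  internal_data_available: bool) -> str:
    """Return a label for how SO would be submitted (hypothetical labels)."""
    if not 0 <= customers_reporting <= total_customers:
        raise ValueError("customers_reporting must be between 0 and total_customers")

    # No customer data and no internal records: an exemption may be claimed,
    # provided the refused data requests are documented for the auditor.
    if customers_reporting == 0 and not internal_data_available:
        return "EXEMPT"

    # Assumption: internal records, if present, cover the whole customer base.
    if internal_data_available or customers_reporting == total_customers:
        return "REPORT"

    # Only some customers report: SOs and the outage sub-measurements are
    # adjusted to cover only those customers.
    return "REPORT_REPORTING_CUSTOMERS_ONLY"


print(so_submission(0, 5, False))  # EXEMPT
print(so_submission(2, 5, False))  # REPORT_REPORTING_CUSTOMERS_ONLY
```

The sketch does not capture the documentation requirement itself; evidence that the data was requested and refused still has to be kept for the audit.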

Question 12365 — For optical amplification, how do I calculate the optical channel and network element counts? For WDM, how do I calculate the optical channel and network element counts?

Answer — Please refer to the glossary in the Measurements Handbook for a definition of network element. For the type of equipment in category 3.2.2.1.2.2, it is likely each network node will be a network element. The optical channel count would be the total number of optical channels that node can handle excluding any protection channels.

Question 11189 — Is there some relationship between outage and critical problems? In the glossary, there are some descriptions of problem report-critical examples, which read "such as product inoperability (total or partial outage)". Do all outages automatically become critical problems, no matter if they are SO or SONE? In our company, we have some different voices about the relationship.

Answer — There is not a defined relationship between outages and critical problem reports in TL 9000. However, it is not uncommon for an outage to result in a critical problem report, especially if the outage impacts end-customers, which will make the outage an SO reported outage. There are some outages that do not impact end-customers and only result in the total or partial loss of a network element, which impacts the service provider. These outages can result in a critical problem report depending on the impact to the service provider and other factors. For instance, the failure of a non-redundant network element that results in an outage will result in data for SO and/or SONE and would generally result in a countable problem report. Remember that all problem reports must originate with the customer. It is their decision to make a problem report based on an outage.

Question 11548 — In establishing a product-attributable outage, do performance issues count as outages? Specifically, a customer wants to include as a service impacting outage the following: 1) record one-way-audio condition on a fraction of voice calls, and 2) the situation where some calls are dropped after being established. Would either of these scenarios qualify as a "partial outage"?

Answer — Partial outages do not apply for SO. The Service Impact Outage Measurement (SO) counts all events “that result in a complete loss of primary functionality for all or part of the system for a duration greater than 15 seconds…”. There is no minimum number of users specified. Clearly a continuing one-way audio condition would meet this criterion and such an event would be reportable under SO. If the system was consistently dropping calls for a period of time, then that would be reportable also. For the Network Element Impact Outage Measurement (SONE), there are minimum amounts of traffic that must be impacted before the event is included in the reported data (Table A-3). If the minimum is met for the category, then the event would be reportable in SONE as well.

Question 11196 — What is the relationship between SO and SONE?

Answer — Outages that impact the end-user are reported in SO. Loss of a network element in whole or in part is reported in SONE if the outage meets the conditions for the category described in Table A-3 of Appendix A of the TL 9000 Measurements Handbook. An outage may be reported in SO only, in SONE only, in both, or in neither, depending on the nature of the outage as determined by the counting rules in 6.1.4 b), 6.2.4 b) and Table A-3.

There is no simple connection between SO and SONE.

Question 11110 — Service impact product-attributable SO3/SO4. We were out of compliance on our SO3/SO4 measurements and our service provider customer requested a corrective action that we have given them and also have fixed the issue with the deployment of a software patch. We have developed and delivered a solution to our customer that will result in the product performing at the agreed level. Until our customer deploys the SW solution the expectation is that the product will remain out of compliance as a direct result of our customer not deploying the solution. Question: Since this situation is caused by our customer not deploying the fix do we have to continue to accept these non-compliances against our SO3 and SO4 Measurements?

Answer — TL 9000 does not get involved with the setting of specific performance objectives for the TL 9000 measurements. That is between the organization and its customers. So, we cannot comment on whether your organization has to accept the non-compliances from your customer against your SO3 and SO4 performance. We can offer some insight into how the failure to implement the required fix may or may not impact the calculation of SO3/SO4. If the delay in deployment is due to the normal length of time it takes for the customer to validate the new software and install it on all systems, then any outages due to the problem fixed by the software change will still need to be included in the SO3/SO4 data. Your organization could, of course, offer to speed up the deployment by providing assistance in the form of field service personnel, etc. If the customer has decided to delay the deployment of the fix or to not deploy it at all due to reasons of its own not related to verification of the fix and the performance of the new software, then any new outages due to the problem fixed by the new software would be considered customer attributable and not counted in SO3/SO4. It is important to note that if the fix is only available to the customer in a software release that they must purchase, then the fix has not been delivered to them and all events related to the problem would still count in SO3/SO4. Review counting rule 6.1.4 b) 4).

Question 11219 — Our company recently released a bulletin to our clients, advising them that a software patch had been released to fix an observed problem in our product. The bulletin mentioned that the fix is necessary to avoid a system outage on our product. After reading the bulletin, the client decided not to apply the software patch to their system. Sometime later, the problem appeared on the client's system and an outage was observed by the client. The question: Should this outage be included in any of the TL 9000 measurements?

Answer — If the customer has decided to delay the deployment of the fix or to not deploy it at all due to reasons of its own not related to verification of the fix and the performance of the new software, then any new outages due to the problem fixed by the new software would be considered customer attributable and counted in SO1/SO2 and not counted in SO3/SO4. It is important to note that if the fix is only available to the customer in a software release that they must purchase, then the fix has not been delivered to them and all events related to the problem will count in SO3/SO4. See counting rule 6.1.4 b) 4). The same logic applies to SONE.
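The attribution logic in the two answers above can be sketched in Python as follows. This is a simplified illustration, not a restatement of counting rule 6.1.4 b) 4): the enum, function, and parameter names are hypothetical, and real cases may turn on facts that are not modeled here.

```python
from enum import Enum


class Attribution(Enum):
    PRODUCT = "product-attributable: counted in SO3/SO4"
    CUSTOMER = "customer-attributable: counted in SO1/SO2, not in SO3/SO4"


def attribute_outage(fix_requires_purchase: bool,
                     customer_declined_for_own_reasons: bool) -> Attribution:
    """Classify an outage caused by a problem for which a fix already exists."""
    # A fix that is only available in a release the customer must purchase
    # is treated as not delivered, so the outage stays product-attributable.
    if fix_requires_purchase:
        return Attribution.PRODUCT

    # The customer chose to delay or skip deployment for reasons unrelated
    # to verifying the fix or the performance of the new software.
    if customer_declined_for_own_reasons:
        return Attribution.CUSTOMER

    # Otherwise (for example, normal validation and rollout time), the
    # outage still counts against the product.
    return Attribution.PRODUCT


print(attribute_outage(False, True).name)  # CUSTOMER
print(attribute_outage(True, True).name)   # PRODUCT
```

The purchase check is placed first because, per the answers, a fix that has not been delivered at no cost cannot be "declined" in the sense that shifts attribution to the customer.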

Question 12615 — When applying rule 4 of 6.1.4 b), when does fix deployment start:
1) When the customer acquires the fix through download or receipt of media?
2) When the customer first loads the fix on a lab or acceptance system for testing?
3) When the customer first loads the fix on an in-service system?
4) How do we know when the customer has commenced deployment?


Answer — Rule 4 is only concerned with a decision by the customer not to implement a fix that would have prevented an outage.

Rule 4 explicitly pulls in Rule 7 if there is an issue with the customer taking an excessively long time to deploy a fix. In those cases, the outage itself is still product-attributable for the outage frequency measures. Rule 7 does require the organization to keep detailed start and stop times for an excessive delay that is to be excluded from the product-attributable duration and counted as customer-attributable duration. If, with customer agreement, it has been determined that the customer has taken an excessively long time to deploy the fix (and the organization has offered to assist with that deployment per the no-cost part of the first clause in Rule 4), then outages which occur from that point on would be included in the product-attributable outage frequency but with zero duration, and in customer-attributable duration but with no frequency.
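The bookkeeping described in the last sentence can be sketched in Python. The class, field, and function names are hypothetical, and the sketch only covers outages caused by a problem for which a fix has already been delivered; it is an illustration of the split between frequency and duration, not an implementation of Rules 4 and 7.

```python
from dataclasses import dataclass


@dataclass
class OutageTally:
    product_frequency: int = 0       # product-attributable outage count
    product_minutes: float = 0.0     # product-attributable duration
    customer_frequency: int = 0      # customer-attributable outage count
    customer_minutes: float = 0.0    # customer-attributable duration


def record_outage(tally: OutageTally, minutes: float,
                  after_agreed_excessive_delay: bool) -> OutageTally:
    """Add one outage caused by the already-fixed problem to the tally."""
    # The event always counts in product-attributable frequency.
    tally.product_frequency += 1
    if after_agreed_excessive_delay:
        # Zero product-attributable duration; the minutes move to
        # customer-attributable duration, with no customer frequency.
        tally.customer_minutes += minutes
    else:
        tally.product_minutes += minutes
    return tally


tally = OutageTally()
record_outage(tally, 30, after_agreed_excessive_delay=False)
record_outage(tally, 45, after_agreed_excessive_delay=True)
print(tally.product_frequency, tally.product_minutes)    # 2 30.0
print(tally.customer_frequency, tally.customer_minutes)  # 0 45.0
```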

Question 12156 — How does SOTS implement Category Table A-3? The SOTS record does not include the category-specific details needed to answer the questions in Table A-3, so should we assume the customer has already excluded the unreportable NE outages? How are outages recorded in the SOTS data record that meet SO counting rules but not SONE counting rules, and the reverse?

Answer — The SOTS template provides all the information required to report all the TL 9000 outage measurements. As noted in the description of the SOTS template, the partial outage information is to be completed in accordance with Table A-3. The value in the form can therefore be used as is for the outage calculations. The same is true of the other data fields in the SOTS record. They are to be taken at face value when performing the calculations for SO, SONE, and SSO. For more information on SOTS see https://tl9000.org/sots/overview.html

Question 9539 — Regarding Outage Calculations: Using category 3.3.1 as an example, the TL 9000 Measurements Handbook says a partial outage is recognized when there is a loss of 5% or more of provisioned capacity. Outage Downtime calculation begins at this time. Does outage downtime end when the loss of provisioned capacity drops below 5% or when the system is 100% restored (0% loss)?

Answer — The information in Table A-3 defines events that are to be reported as a partial outage. Once the event exhibits one or more of the conditions, it is reportable in the Network Element Outage measurement. Outage downtime continues until the event is over, that is, until all functional capability is restored to the network element (100% functionality restored).

Question 12561 — I am a service-provider employee responsible for working with our suppliers to measure performance and develop reliability improvement initiatives. We have established a supplier report card using many of the standard TL 9000 measurements to measure supplier performance.

I have a question on the counting rules for Partial impact outages. We are seeing a number of outages where a portion of the outage is below the TL 9000 5% provisioned capacity impact threshold and the balance of the outage is above it. As an example, we recently had an outage that was 943 minutes in duration. The first 937 minutes of the event had a 4% impact to capacity and the last 6 minutes had a 14% impact.

How should we be treating these outages when calculating the partial impact duration? Do we disregard the portion of the outage that is below the 5% threshold, or since a portion of the event was above the threshold, weight the duration for each portion of the outage by its impact and sum the weighted durations for an overall impact?

Answer — Per the rules for the SONE measure, only the time after the 5% threshold was reached would be counted as a partial impact outage. The time where the impact was below the 5% threshold would not be counted in SONE. The entire time would be counted in the SO service impact measure as there is no minimum customer impact floor for that measure. SO does not include any loss of NE functionality other than customer traffic while SONE includes loss of OA&M, alarms, and other non-traffic related capabilities. These differences are why the two outage measurements are included in the TL 9000 measurement set.
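As a rough illustration of this answer, the Python sketch below computes the minutes countable in SONE for an event described as a sequence of (duration, impact) segments. The function name and the segment representation are hypothetical. The 5% default reflects the threshold cited in the question; the applicable value for a given category comes from Table A-3. Following the answer to Question 9539, the sketch assumes that once the threshold has been reached, counting continues until full restoration. It returns unweighted minutes only.

```python
def sone_countable_minutes(segments, threshold_pct=5.0):
    """Sum the minutes from the first segment at or above the threshold.

    segments: ordered (minutes, impact_pct) pairs covering the event from
    its start until all functionality is restored.
    """
    counting = False
    total = 0.0
    for minutes, impact_pct in segments:
        # Counting starts when the partial-outage threshold is first met.
        if not counting and impact_pct >= threshold_pct:
            counting = True
        # Assumption: once started, counting runs to the end of the event.
        if counting:
            total += minutes
    return total


# The event from the question: 937 minutes at 4%, then 6 minutes at 14%.
event = [(937, 4.0), (6, 14.0)]
print(sone_countable_minutes(event))     # 6.0 minutes countable in SONE
print(sum(m for m, _ in event))          # 943 minutes, the full event, for SO
```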

Question 13114 — I have a query regarding outage impact assessment for a partial outage on an MSC. The MSC in question supports both mobility traffic and gateway traffic (land-to-land). For simplicity's sake, let us suppose that these call volumes are roughly equal (50-50). Let us suppose that there was a degradation in the system that affected only the mobility traffic, and that 25% of the mobility traffic was impacted. In such a scenario, our assessment is that the MSC had a 12.5% impact at the nodal level (25% of 50% of the total traffic on the MSC). Can you please confirm that this is an accurate assessment of the impact?

Answer — Yes, this is an accurate assessment.
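The arithmetic in this question can be written as a short Python sketch. The function name and input format are hypothetical; each traffic class is given as its share of total traffic on the node and the percentage of that class that was affected.

```python
def nodal_impact_pct(traffic_classes):
    """Return the overall impact on the node, in percent.

    traffic_classes: (share_of_total_pct, impacted_within_class_pct) pairs
    whose shares are expected to add up to 100.
    """
    return sum(share * impacted / 100.0 for share, impacted in traffic_classes)


# Mobility traffic: 50% of the total, 25% of it impacted.
# Gateway traffic: 50% of the total, not impacted.
print(nodal_impact_pct([(50, 25), (50, 0)]))  # 12.5
```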