
GEM PRECARE PdM PREDICTIVE MAINTENANCE

GEM Precare IIoT Platform Predictive Maintenance Solution

Abstract

Ensuring optimal machine operation is of central importance in any industrial activity. Any degradation in operation, be it in availability, performance, or quality, impacts the bottom line, and in high-volume manufacturing that impact can run to millions of dollars. Planned maintenance at regularly scheduled intervals is therefore a broadly accepted practice in the industry. However, such a practice is sub-optimal in serving the goal of optimal machine operation. On the one hand, regularly scheduled maintenance can be overkill, leading to unnecessary scheduled machine downtime and therefore productivity loss. On the other hand, it does not entirely prevent an unforeseen machine breakdown. The Holy Grail is to be able to predict, under any circumstance, when to schedule necessary maintenance before a breakdown occurs. Big data analytics and machine learning are the primary tools available to manufacturers for making that prediction.

Maintenance Strategies Compared

Maintenance strategies can be categorized as follows:
  1. Reactive maintenance. Maintenance is only performed when the machine breaks down.
  2. Preventive maintenance. Maintenance is performed at regular intervals.
  3. Conditional maintenance. Maintenance is scheduled in anticipation of a breakdown, based on monitoring of the asset and using prior experience to assess the need to schedule maintenance.
  4. Predictive maintenance. Maintenance is scheduled in anticipation of a breakdown, using big data analytics and machine learning to predict the probability of a failure by models trained on discerning usage and wear data patterns. Since maintenance is scheduled not later and not sooner, but at the right moment, predictive maintenance can also be called just-in-time maintenance.
Reactive maintenance sits at the highest end of the cost impact scale and at the lowest end of the maintenance anticipation scale, whereas predictive maintenance sits at the lowest end of the cost impact scale and at the highest end of the maintenance anticipation scale.

Another important distinction is that predictive maintenance can still predict the need for maintenance ahead of time when machine conditions change more rapidly than usual. Under such conditions the preventive maintenance strategy has to fall back to reactive maintenance. Under the conditional maintenance strategy the unusually rapid changes may not be recognized early enough, which again forces a fall back to reactive maintenance.

An important aspect of predictive maintenance is that machine learning can learn from historical data at a much faster rate than humans can. For instance, if years of recorded machine data are already available, machine learning algorithms can learn from this data in a matter of hours, whereas a person would need days, weeks, or even longer to analyze the data and learn from it. Even then, a person may not be able to learn from the data at all if it is available only in numeric form and not in a form that a person can readily perceive. Furthermore, if the person leaves the manufacturing organization after having learned from the data, the knowledge is lost and the process has to start all over again.

GEM Precare Predictive Maintenance Workflow

GEM Precare is an Industrial IoT (IIoT) solutions platform that combines real-time machine data acquisition with big data analytics and machine learning. GEM Precare agents installed at the machine (i.e., edge) stream data in real-time to the GEM Precare cloud for big data analytics and training of predictive maintenance machine learning models. 
These agents are able to collect data through any means, any protocol and any physical interface, including directly from sensors, from networked data stores, over OPC UA client-server protocols, over Industrial Ethernet, over SCADA, UART, etc. The agents are highly portable to any embedded ARM or x86 hardware platform, with support for any type of physical I/O and network port.

The agents create digital twins for the machines, so that the data provided to the GEM Precare PdM module carries the correct semantic meaning rather than arriving as a raw stream of 1’s and 0’s or numbers. This greatly facilitates human interpretation as well as pre-processing and feature extraction, both of which are, alongside large pools of data, very important parts of a machine learning workflow.
The next stage in the workflow is the training and validation of two or more predictive models or two or more variations of the same model. Although this stage is typically performed in the cloud due to its highly scalable inter-connectivity, storage and compute resources, it is possible to perform this also at the edge (i.e., machine learning at the edge) with the right hardware platform.

In the final step the model or model variation with the best prediction accuracy is selected and deployed. This however does not mean that the cycle is complete and no more training is needed. A machine learning algorithm is able to extract from the training data a statistical model with specific model parameter values that best fit the data. As long as the underlying statistical model or model parameter values don’t change, the algorithm will perform as expected. 
However, small changes will cause the model parameters to drift, and over time this drift becomes large enough that the algorithm’s accuracy is noticeably degraded. For this reason the GEM Precare PdM module continues the learning and validation process in perpetuity.

Model Specificity and Right Data Set
Two points deserve particular attention:
  • The highly specific nature of a trained machine learning model.
  • The use of representative training and validation data sets.
Usage patterns and configurations of machines in a manufacturing operation may differ from machine to machine, even when all machines are of the same make and model. For instance, one of two identical machines is operated 5 hours per day over 3 months, then 10 hours per day over the next 9 months, while the other machine is used 10 hours per day for the first 10 months, then 7 hours per day over the next 2 months. The first machine is configured to probe 100 points per chip, while the second machine is configured to probe 50 points per chip.

Since machine learning inherently bases model parameters on the data it is trained on, each machine’s data will result in a different set of values for the model parameters. Hence, applying the parameter values obtained for one machine to a different machine will most likely not yield the same prediction accuracy. Unless the training and validation data sets cover the usage patterns and configurations of all the machines, machine learning therefore has to be performed separately for each machine.

The makeup of the training and validation data sets is of special importance. If they are not properly constituted, the training of a perfectly suitable machine learning model will yield unusable parameter values. If a training and validation data set contains very few failure events, the model will be trained to recognize no failures at all, yet its accuracy on the training and validation data sets will be very high, for instance better than 99%. Such data sets are referred to as skewed, and machine learning models trained and validated on them yield very poor results in real-life situations.
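The effect of a skewed data set can be illustrated with a small sketch. The labels below are synthetic and purely illustrative (5 failure events in 1,000 samples is an assumed ratio, not GEM Precare data), and the "model" is a deliberately degenerate one that never predicts a failure.

```python
# Illustrative only: synthetic labels, not real machine data.
# 1 = failure event, 0 = normal operation (assumed ratio: 5 in 1,000).
labels = [0] * 995 + [1] * 5

# A degenerate "model" that never predicts a failure.
predictions = [0] * len(labels)

# Accuracy: share of samples classified correctly.
correct = sum(1 for y, p in zip(labels, predictions) if y == p)
accuracy = correct / len(labels)

# Recall: share of the real failure events that the model detected.
true_positives = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
recall = true_positives / sum(labels)

print(f"accuracy = {accuracy:.1%}")
print(f"recall   = {recall:.1%}")
```

The accuracy is above 99% although not a single failure is detected, which is why a metric such as recall on the failure class is worth checking alongside accuracy when failure events are rare.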

GEM Precare PdM Triggers
As explained before, the aim of predictive maintenance is to schedule maintenance not too soon, not too late, but just in time. The relevant question is therefore:
        “In how many days from today will maintenance be necessary?” 

The GEM Precare PdM solution asks the same question in a different way:
      “Has the probability for a failure exceeded the specified threshold for the specified number of days from today?”
Formulating the question in this way is more practical, since in real life the resources needed to perform the maintenance have to be scheduled a minimum number of hours or days in advance. One can then specify a probability threshold above which maintenance is scheduled the specified number of days in advance.

Of course, this alone does not rule out a reactive maintenance event. For example, assume that the specified number of days (P) in the above question is 30, and that the rolling 30-day prediction usually exhibits a gradually increasing probability (case 1). Due to some unforeseen circumstance, however, the probability follows a much steeper upward trend (case 2). Following this steeper trend, the probability is likely to cross the threshold within a much shorter period a (a << P), for example within the next 5 days instead of the next 30 days.
Therefore, the GEM Precare PdM solution also monitors the slope of the trend along which the probability progresses. If this slope indicates that the preset probability threshold will be exceeded in fewer than the specified number of days, Precare issues notifications as it counts down towards the day on which the failure probability is predicted to cross the threshold.
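The slope monitoring described above can be sketched as follows. The source does not say how GEM Precare estimates the trend, so this sketch assumes a simple least-squares straight line through recent probability readings. The function names, sample values, and the 85% threshold are hypothetical and chosen for illustration only.

```python
def days_until_threshold(days, probs, threshold):
    """Extrapolate a least-squares line through (day, probability) readings.

    Returns the number of days after the last reading at which the line
    reaches `threshold`, 0.0 if the last reading is already at or above it,
    or None if the trend is flat or falling. Linear extrapolation is an
    assumption of this sketch, not a documented GEM Precare method.
    """
    n = len(days)
    if n < 2:
        raise ValueError("need at least two readings to estimate a slope")
    mean_x = sum(days) / n
    mean_y = sum(probs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, probs))
    slope = sxy / sxx
    if probs[-1] >= threshold:
        return 0.0
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_day = (threshold - intercept) / slope
    return max(crossing_day - days[-1], 0.0)


def needs_early_warning(days_left, notice_days):
    """True if the threshold is expected to be crossed in fewer than P days."""
    return days_left is not None and days_left < notice_days


P = 30            # specified number of days of advance notice
THRESHOLD = 85.0  # failure probability threshold in percent

# Case 1: gradual increase. Case 2: much steeper increase.
gradual = days_until_threshold([0, 1, 2, 3, 4], [10, 11, 12, 13, 14], THRESHOLD)
steep = days_until_threshold([0, 1, 2, 3, 4], [5, 15, 25, 35, 45], THRESHOLD)

print(gradual, needs_early_warning(gradual, P))  # 71.0 False
print(steep, needs_early_warning(steep, P))      # 4.0 True
```

In case 2 the extrapolated crossing lies well inside the 30-day window, which is the situation in which countdown notifications would be issued.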

GEM Precare PdM Example
The following results were obtained for real-world training and validation data sets. The data sets contained sensor and machine configuration data for operation both within and outside of normal margins. The machine learning model used was based on a neural network consisting of an input layer, two hidden layers, and an output layer that provides the probability of a failure within a specified number of days from today.
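For readers unfamiliar with this kind of network, the shape of such a model can be sketched in plain Python. Only the overall structure (input layer, two hidden layers, one probability output) comes from the description above. The layer sizes, the ReLU and sigmoid activations, the random untrained weights, and the four example features are all assumptions made for illustration.

```python
import math
import random


def dense(inputs, weights, biases):
    """One fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def init_layer(rng, n_in, n_out):
    """Random, untrained weights; a real model would learn these from data."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def failure_probability(features, layers):
    """Forward pass: input -> hidden 1 -> hidden 2 -> single sigmoid output."""
    (w1, b1), (w2, b2), (w3, b3) = layers
    h1 = relu(dense(features, w1, b1))
    h2 = relu(dense(h1, w2, b2))
    logit = dense(h2, w3, b3)[0]
    return sigmoid(logit)


rng = random.Random(0)
# Assumed sizes: 4 input features, two hidden layers of 8 units, 1 output.
layers = [init_layer(rng, 4, 8), init_layer(rng, 8, 8), init_layer(rng, 8, 1)]
features = [0.2, 0.7, 0.1, 0.9]  # hypothetical normalized sensor readings

p = failure_probability(features, layers)
print(f"untrained output: {p:.3f}")
```

Because the weights are untrained, the printed value carries no meaning. The sketch only shows how a feature vector is mapped to a single value between 0 and 1 that can be read as a failure probability.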

The results are listed in the table below. The probability stays small in the first 85 days, but starts to increase thereafter, and accelerates from day 100 towards day 160, with the actual failure occurring on day 159.
Days from Today    Failure Probability (%)
 25                 0.004
 50                 0.736
 75                 0.976
 85                 1.660
 90                 5.187
 95                 6.340
100                 4.800
110                12.970
115                51.790
125                91.500
150                90.120
160                97.620
Actual Failure: Day 159
The 1st plot below shows graphically how the failure probability evolves from day 1 to day 160. The interpolation illustrates how the failure probability accelerates as mentioned before. The 2nd plot shows how the failure probability evolves from day 1 to day 100. The 3rd plot shows how the failure probability evolves from day 1 to day 110.

With the failure probability threshold set to 85%, our PdM solution forecasts from the data available up to day 100 (2nd plot) that the failure will occur on day 212 with a probability of 98%. Given that the actual failure occurs on day 159, this prediction is 67% accurate. Repeating the forecast with the data available up to day 110 (3rd plot) yields a forecast for the failure to occur on day 157 with a probability of 100%. Given that the actual failure occurs on day 159, the accuracy of this prediction is 99%.

By continuing to predict as more data becomes available, Precare PdM is able to latch on to a trend towards failure, resulting in rising predicted failure probabilities at increasing accuracy. In the example here, if the minimum lead time for allocating maintenance/repair resources is 30 days, the prediction at day 100 would hold off reserving maintenance resources until day 182 (day 212 minus 30 days), but then bring this forward to day 127 (day 157 minus 30 days) when the forecast changes from day 212 at 98% to day 157 at 100%.
These results illustrate the impressive accuracy with which GEM Precare PdM is able to forecast just-in-time maintenance with ample time to book the resources to perform the maintenance/repair. These results also illustrate how GEM Precare PdM is able to react to a sudden rapid deterioration of the machine’s condition, therefore avoiding reactive maintenance at high cost impact.
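The accuracy and scheduling figures quoted in this example can be reproduced with a few lines of arithmetic. The text does not state how accuracy is computed. The formula below, one minus the forecast error relative to the actual failure day, is an assumption that happens to match the quoted 67% and 99%. The function names are hypothetical.

```python
ACTUAL_FAILURE_DAY = 159
LEAD_TIME_DAYS = 30  # minimum advance notice for booking resources


def forecast_accuracy(predicted_day, actual_day):
    """Assumed definition: 1 - |predicted - actual| / actual."""
    return 1 - abs(predicted_day - actual_day) / actual_day


def reservation_day(predicted_day, lead_time_days):
    """Latest day on which resources must be reserved for the forecast."""
    return predicted_day - lead_time_days


# Forecast failure days: 212 (data up to day 100), 157 (data up to day 110).
for predicted in (212, 157):
    acc = forecast_accuracy(predicted, ACTUAL_FAILURE_DAY)
    reserve = reservation_day(predicted, LEAD_TIME_DAYS)
    print(f"forecast day {predicted}: accuracy {acc:.0%}, reserve by day {reserve}")
# forecast day 212: accuracy 67%, reserve by day 182
# forecast day 157: accuracy 99%, reserve by day 127
```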

OEE Impacts
OEE (Overall Equipment Effectiveness) is an important KPI with which manufacturers measure how effectively their manufacturing assets are utilized. OEE is expressed as:

        OEE = availability × performance × quality

Each of these components is measured as follows:
  • Availability: the ratio of actual production time (unplanned downtime excluded) to total run time (unplanned downtime included).
  • Performance: the ratio of actual to optimal machine throughput.
  • Quality: the ratio of good units to total units produced.
GEM Precare compiles these KPIs automatically from the data collected by the GEM Precare agents and provides the ability to zoom in on problem areas. Each of the three OEE components is positively affected by the ability to predict when to perform maintenance. Just-in-time maintenance increases equipment availability compared with any of the other maintenance strategies mentioned before. The same holds true for performance and quality, since maintenance may improve the throughput of a machine and may reduce the number of defects in manufactured units, respectively.
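As a worked example of the OEE formula and the three ratios defined above, the sketch below computes OEE for one shift. The input figures and the function name are hypothetical. GEM Precare derives these values from agent data, which is not modeled here.

```python
def oee(production_time_h, total_run_time_h,
        actual_throughput, optimal_throughput,
        good_units, total_units):
    """Return (availability, performance, quality, oee) as fractions."""
    availability = production_time_h / total_run_time_h
    performance = actual_throughput / optimal_throughput
    quality = good_units / total_units
    return availability, performance, quality, availability * performance * quality


# Hypothetical shift: 7 h producing out of an 8 h run, 90 of an optimal
# 100 units per hour, 950 good units out of 1,000 produced.
availability, performance, quality, result = oee(7, 8, 90, 100, 950, 1000)

print(f"availability = {availability:.1%}")
print(f"performance  = {performance:.1%}")
print(f"quality      = {quality:.1%}")
print(f"OEE          = {result:.1%}")
```

Because the three components are multiplied, moderate losses in each compound: 87.5%, 90%, and 95% yield an OEE just under 75%.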

Customer Journey

Deploying GEM Precare PdM on the manufacturing floor starts with the deployment of GEM Precare agents for data acquisition and the creation of digital twins of the machines to which predictive maintenance is to be applied. The data being captured must contain machine failure events as well as the historical data leading up to those events. This is very important because training uses a supervised method, in which failure events are labeled so that the machine learning algorithm can distinguish them from the rest of the data. Supervised learning is similar to learning by example.
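One way to picture the labeling step is sketched below. The source states only that failure events are labeled. The scheme shown here, marking every sample taken within a given number of days before a failure as positive, is an assumption chosen to match a model that outputs the probability of failure within a specified number of days. All names and values are hypothetical.

```python
def label_samples(sample_days, failure_days, horizon_days):
    """Label a sample 1 if a failure occurs on that day or within
    `horizon_days` after it, else 0 (assumed labeling scheme)."""
    return [
        int(any(0 <= failure - day <= horizon_days for failure in failure_days))
        for day in sample_days
    ]


sample_days = [100, 120, 129, 140, 159, 160]
failure_days = [159]   # recorded failure event
horizon_days = 30      # advance notice the model should provide

labels = label_samples(sample_days, failure_days, horizon_days)
print(list(zip(sample_days, labels)))
# [(100, 0), (120, 0), (129, 1), (140, 1), (159, 1), (160, 0)]
```

Samples without any recorded failure inside the horizon remain negative, which is why the captured data must include both failure events and the history leading up to them.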

If a repository of historical machine data is already available, this data set can be used as a starting point, either on its own or in conjunction with the acquisition of new data, provided that semantics can be assigned to the data for feature extraction and preprocessing. Note that a pre-existing data set does not obviate the need for ongoing acquisition of machine data, for the reasons explained before. Therefore, the deployment of the agents is a necessary step.

One or more machine learning models are then trained and evaluated, and their results are compared for the specified number of days of advance notification for maintenance. The most accurate model is selected. If its accuracy does not meet the desired minimum, the attributes of the most promising model are tuned further (e.g., the number of nodes or layers in a neural network model). Once the desired accuracy is achieved on the training and validation data sets, the model is deployed.

Conclusions

GEM Precare provides manufacturers with a powerful solution for IIoT connectivity of any machine for big data and predictive analytics. GEM Precare PdM takes advantage of machine IIoT data, using powerful machine learning algorithms and methodologies, to predict failures and to determine when to perform just-in-time maintenance/repairs with enough time to allocate resources. This has an immediate positive impact not only on gross margins and operating margins, but also on revenues, as it improves OEE. Furthermore, GEM Precare PdM enables just-in-time maintenance scheduling customized per individual machine, and it learns on a continuous basis, thereby adjusting to changing machine usage, configurations, and other factors. Finally, GEM Precare PdM’s very high accuracy not only rivals that of a human expert, but its learning is achieved at a much faster rate than is humanly possible. Therefore, manufacturers deploying the GEM Precare PdM solution improve their OEE and their bottom line.
