Researchnology Co. – Page 2 – Transform R&D into competitive advantage

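Qualification Procedure for Integrated Circuit Manufacturing Equipment and Processes

The purpose of qualification is to improve the performance of manufacturing processes and equipment. Describing and quantifying the operating characteristics of processes and equipment with statistical methods is the key to whether a qualification succeeds.

The qualification procedure consists of three stages. Stage one: establish the process and equipment baseline. Stage two: characterize and optimize the process and confirm its stability. Stage three: improve equipment reliability and demonstrate manufacturability and competitiveness. Completing each stage requires several important statistical methods. Before describing how to use those methods, this article first reviews each stage of qualification and the goals to reach. The figure below shows the flow chart of the qualification stages.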

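Stage one: establish the process and equipment baseline.

Performance of gauges, measurement systems, processes, and equipment

Project planning and initial training.

This is the first step, which mainly covers project planning and the related business arrangements. Project members and the customer define the goals of the project and the resources needed to reach them. This includes an itemized list of the final deliverables, the data collection methods, how results will be reported, and the baseline used to judge success. Training on the measurement systems, production equipment, and software starts here and continues throughout the whole project.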

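Gauge Capability Study

The purpose of a gauge capability study is to ensure that the measurement tool can perform the required measurements: it must stay stable and accurate over time, with measured values very close to the true values. The variation of the measurements must stay within the range allowed for the specific measurement target, and any unexpected behavior of the tool must also be held within statistical process control limits. Every measurement tool in the project goes through a gauge capability study. The following posts will present several examples based on statistical methods, one section at a time.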
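As a rough illustration of the accuracy and variation checks described above, the sketch below computes the bias, the repeatability, and a precision-to-tolerance (P/T) ratio for one gauge. The readings, the reference value, and the specification limits are all hypothetical, and the 6-sigma form of P/T is only one common convention; the source does not state which metric it uses.

```r
set.seed(11)

# Hypothetical study: one reference part measured 30 times by the same gauge
true_value <- 250.0                          # assumed reference value (arbitrary units)
readings <- rnorm(30, mean = 250.2, sd = 0.4)

bias <- mean(readings) - true_value          # accuracy: average offset from the reference
sigma_gauge <- sd(readings)                  # repeatability: spread of repeated readings

# Precision-to-tolerance ratio against hypothetical specification limits
lsl <- 240
usl <- 260
pt_ratio <- 6 * sigma_gauge / (usl - lsl)

cat(sprintf("bias = %.3f, sigma = %.3f, P/T = %.3f\n", bias, sigma_gauge, pt_ratio))
```

A smaller P/T ratio means the gauge consumes less of the tolerance band; a frequently quoted rule of thumb treats values below roughly 0.1 as adequate, but the acceptable level depends on the application.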

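Passive data collection

Sources of variability, evaluating the sampling design, establishing ongoing stability.

Summary
Process and equipment hardware improvement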

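Stage two: characterize and optimize the process and confirm its stability.

Stage three: improve equipment reliability and demonstrate manufacturability and competitiveness.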

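The Motorola Iron Man plan is a test plan for improving the reliability of production processes, techniques, and software; it is also called the overnight equipment reliability improvement program. The equipment is pushed to its limits, that is, kept running at its extremes for more than 1,000 hours without stopping, to simulate real production conditions and to observe and uncover hardware and software problems. Typically, daytime is used to trace the root causes of failures in the manufacturing flow and to propose and implement improvements; at night the process is run at its extremes, and the failures are observed and recorded. To be efficient, fabs usually run the Iron Man plan and apply the resulting improvements in actual production at the same time.

The Marathon plan is a method of quantifying equipment utilization by simulating real usage. The equipment typically runs 24 hours a day for several weeks, and statistics such as the mean, median, and variability of the time between maintenance events are recorded. From these, the performance of the equipment in actual production, and the cost it adds to silicon wafer production, can be estimated. The method can also be used to test hardware and software reliability so that countermeasures can be prepared early.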
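The statistics named in the Marathon plan can be computed in a few lines of R. The sketch below is a minimal example on simulated intervals; the interval data, the assumed repair time, and the availability formula are illustrative assumptions, not figures from the source.

```r
set.seed(5)

# Hypothetical hours between maintenance events during a multi-week run
hours_between <- rexp(40, rate = 1 / 12)     # assumed mean of 12 hours

mtbf <- mean(hours_between)                  # mean time between maintenance events
med  <- median(hours_between)                # median interval
cv   <- sd(hours_between) / mtbf             # variability relative to the mean

# Availability under an assumed average repair time of 1.5 hours
repair_hours <- 1.5
availability <- mtbf / (mtbf + repair_hours)

cat(sprintf("mean = %.1f h, median = %.1f h, CV = %.2f, availability = %.3f\n",
            mtbf, med, cv, availability))
```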

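Simple Quantitative Methods for Improving Product Quality and Technical Content (1)

Product quality and technical content are the foundation of a company's survival, whatever its size. It is commonly believed that improving core competitiveness requires large investment, and that is indeed true. But business owners should not therefore overlook practices close at hand that can be improved right away. These methods need little investment yet can deliver clear results. In this upcoming series of articles, we will focus on methods for quantifying the factors of production, to help business owners raise the market competitiveness of their products in a short time. We invite our customers to follow along.

The methods in this series focus on the following four areas:

Collecting simple, reliable data
Listening to user feedback
Basic data analysis skills of the R&D team
Quantitative factorial analysis methods

Collecting simple, reliable data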

Publish R Markdown Document on WordPress

It is possible to write an R Markdown document and then publish it on a WordPress website. WordPress is software that manages the interaction between web visitors and the web server. It is written in PHP, but website owners do not need to know PHP to keep a site running. The knit2wp() function in the knitr package, together with the RWordPress package, knits an R Markdown file and posts it to WordPress directly from R code run within RStudio. Save “post.RMD” in the working directory, then run this code:

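library(RWordPress)  # XML-RPC client used by knit2wp()
library(knitr)       # provides knit2wp()

# RWordPress reads these options; note the lowercase "p" in "Wordpress"
options(WordpressLogin = c(your_own_user_name = 'your_password'),
        WordpressURL = 'https://yourwordpressaddress.com/xmlrpc.php')

knit2wp(input = 'post.RMD', title = 'RWordPress Package', publish = FALSE, action = "newPost")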

This knits “post.RMD” and uploads the result as a draft. Next, the website owner logs into the admin page of the WordPress site, opens the draft, and pushes the Publish button to publish the document.

After publication, the owner uses the post ID to update the post. The post ID can be found in the URL of the edit page: open the post in the post editor and look at the URL in the browser's address bar. For example, the URL for this post is

http://researchnology.com/wp-admin/post.php?post=378&action=edit

Here the post ID is “378”. To post an edit of this document, issue this command in R:

knit2wp(input = 'post.RMD', title = 'RWordPress Package', publish = FALSE, action = "editPost", postid = 378)

That's all there is to it. The RWordPress package is not actively maintained as of Dec. 21, 2021, so if WordPress changes in the future, the above steps may fail. The R package “blogdown” is said to offer some functions similar to RWordPress. Let's check it out.

Note this article itself is an R Markdown document. For more details on using R Markdown see http://rmarkdown.rstudio.com. You can embed an R code chunk like this:

summary(cars)

##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00

Including Plots

You can also embed plots, for example:

[Figure: plot of chunk pressure]

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.

SeMa Technology Series (1)

Statistical methods reinforcing semiconductor manufacturing – review and discussion of U.S. Semiconductor Industry Qualification Plan

Outline: A quick summary of the Qualification Plan, how statistical methods fit into each of its three stages, with actual case references, and a quick discussion of the contribution of statistical methods. The focus is on the substantiated contribution (vs. unverified, theoretical suppositions) of each statistical method.

Relevance of Statistical Methods in Manufacturing: Even in today’s high-tech manufacturing environment, with the very precise measurements and powerful processing capabilities that advanced technologies provide, statistical methods remain relevant. They support the demand for stricter equipment baseline settings, the discovery of new processing techniques to solve emerging problems, a thorough understanding of process and equipment capacities, and control of the manufacturing process. For the majority of industries, which do not employ high-precision equipment, traditional statistical methods should stay alive and well for a long time.

Three areas that we focus on to achieve immediate impacts for tech businesses are:

Gauge study. High-precision equipment can achieve even better goals
Minimizing experiment runs for expensive experiments. Design experiments to suit specific production environments. Optimization of a process. Discover the cause of failures or defects.
Measure and control process and equipment capacity

We will discuss how statistical methods improve each of the three areas, with actual case examples. In each case example, we will emphasize how the statistical methods accelerated the effort, made the impossible possible, saved expenses, and delivered better products. Below is a diagram of what an ideal manufacturing qualification plan should look like; each of the three stages involves many of the statistical techniques we will discuss in this sequence of articles.

Data Analytics Skills that Accelerate Scientific Discovery (1)

The following main skills are essential for researchers and technology innovators:

Data summarizing knowledge
Uncertainty and quantification of uncertainty
Predictive models
Design and analysis of experimental data

None of these is trivial or easy. We will discuss the above topics in separate posts, aiming at practical applications that provide immediate benefits. Further study, such as university courses or advanced texts, is always welcome. In each post, we will first summarize the basic knowledge, then illustrate how it may be applied in a real-world setting using one or more scientific and technological examples.

1. Data Summarization Basics

Data summarization is the basic skill underlying all data analysis methods. A good understanding of the data provides a foundation for finding the best method to tackle scientific and technological problems. To understand data, the first step is to check

Types of the data (numerical, categorical, or a mix of both)
Structure of the data (a series, multiple series such as in a table, unstructured such as texts or images)

For numerical data, to summarize the data we need to focus on

The center of the data (mean, median, mode, quantile)
The variation of the data (variance, max, min, range)
The distribution pattern (symmetric vs. tailed, the direction of skewness)

For categorical data, to summarize we need to check

The frequencies or relative frequencies of each category
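The checks listed above for numerical and categorical data map onto a handful of base R functions. The sketch below uses the built-in mtcars data set purely as a stand-in; the choice of columns and the moment-based skewness formula are illustrative assumptions.

```r
# Numerical series: miles per gallon from the built-in mtcars data set
x <- mtcars$mpg

center    <- c(mean = mean(x), median = median(x))
quartiles <- quantile(x, probs = c(0.25, 0.5, 0.75))
spread    <- c(variance = var(x), min = min(x), max = max(x), range = diff(range(x)))

# Moment-based sample skewness: a positive value means a longer right tail
skewness <- mean((x - mean(x))^3) / (mean((x - mean(x))^2))^1.5

# Categorical series: number of cylinders treated as a category
cyl      <- factor(mtcars$cyl)
freq     <- table(cyl)            # counts per category
rel_freq <- prop.table(freq)      # shares that sum to 1

print(center)
print(quartiles)
print(spread)
print(skewness)
print(freq)
print(round(rel_freq, 3))
```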

If the data contain multiple series, such as the columns of a table, then in addition to summarizing each individual series we need to check the statistical relationships between the series (the columns or variables in the table). The most common statistical relationship is linear correlation. An association can be measured between two numerical series, between a numerical and a categorical series, and between two categorical series. More about that will be described later. A complete correlation matrix helps us see which pairs of series are closely related. Linear correlation paints a direct picture of the association between the series, and often it tells us how these series are related. Note that this gives only very basic knowledge: many relationships are hidden quite deep and need more advanced methods to discover, which we will introduce later.
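A correlation matrix for several numerical series can be computed with cor(). The sketch below again uses the built-in mtcars data as a stand-in and also picks out the most strongly related pair; the selected columns are arbitrary.

```r
# Several numerical columns from the built-in mtcars data set
vars <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]

# Pairwise Pearson correlations, using only complete rows
corr_matrix <- cor(vars, use = "complete.obs")
print(round(corr_matrix, 2))

# Find the most strongly related pair, ignoring the diagonal
off_diag <- abs(corr_matrix)
diag(off_diag) <- 0
strongest <- which(off_diag == max(off_diag), arr.ind = TRUE)[1, ]
cat("Most correlated pair:",
    rownames(corr_matrix)[strongest["row"]], "and",
    colnames(corr_matrix)[strongest["col"]], "\n")
```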

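Learning Preventive Maintenance by Maintaining Power Transformers (Part 1)

Preventive maintenance is a highly efficient maintenance approach in modern industrial technology. With historical data and statistical models, equipment that is about to fail can be identified quickly, which can cut operating costs substantially. With open-source R packages, the method is easy to learn and easy to start using; this article is a quick introduction.

Keywords

Statistical models, preventive maintenance, correlation matrix plot, histogram, power transformer, prevention of equipment explosions, power transmission grid, power supply stability, modern industrial accelerator, failure probability, R statistical software.

I.

An ordinary power transformer in the transmission grid steps high-voltage electricity down to low voltage and delivers it to ordinary users. But if it is not serviced in time, it can explode. Why? The transformer is filled with cooling oil. Without the oil, the enormous heat produced by stepping down the voltage would burn the transformer out immediately. But in a high-voltage environment, the oil undergoes chemical reactions that produce gases such as methane, ethane, ethylene, acetylene, hydrogen, and carbon monoxide. When these gases build up to a certain level, they trigger an explosion. To keep the power supply stable, the utility must service transformers before an accident happens. But finding the ones that need service among tens of thousands of transformers is not easy, and not every transformer that has reached a certain age needs service. Here I introduce a modern accelerator for industrial maintenance: the preventive maintenance method.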

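II.

Any study needs data. We obtained the service records of 31,031 transformers from a large U.S. electric utility. The data record each transformer's time in service, whether it has failed, the type of failure, and the gas content in the transformer tank. Modern instruments can draw a small sample of the gas inside the transformer and quickly measure the content of each gas. Through our analysis, we want to find which gas, or which gases, are highly correlated with transformer failure. We also want to estimate how the failure probability changes over time and with each gas. This helps us service transformers selectively rather than by age alone, because many transformers, even very old ones, carry no risk of failure as long as the internal gases have not reached a certain level.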

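III.

With the “corrplot” package for the open-source statistical software R, we can easily draw an intuitive two-dimensional correlation matrix plot. The exact commands are in the R file available for download at the end.

In this correlation matrix plot, dark blue indicates positive correlation and dark red indicates negative correlation. The two variables of each pair are labeled on the corresponding row and column. We should first look at the variables that are relatively highly correlated with the second row, “whether the transformer failed”. These are methane, hydrogen, ethane, and total gas; that is, methane, hydrogen, ethane, and total gas are fairly strongly positively correlated with transformer failure. In contrast, ethylene, acetylene, carbon monoxide, and carbon dioxide content show no correlation with whether the transformer failed. We also notice relatively high positive correlations among the gases themselves. This plot gives a direct view of the gases that potentially lead to transformer failure.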

#################### Read in RDS data
library(corrplot)  # provides corrplot()
Transformer <- readRDS("data/Transformer.RDS")
my_data <- Transformer[, c(6, 12:19)]

#################### Calculate and display the correlation in correlogram
par(mfrow=c(1,1))
res <- cor(my_data, use = "complete.obs")
corrplot(res, type = "upper", order = "hclust", tl.col = "black", tl.srt = 45)

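IV.

Next we use histograms to analyze the distribution of each gas. We compare how the distribution of each gas differs between failed and intact transformers. If the difference is large for a gas, that gas can help find transformers that are about to fail. Red marks failed transformers and blue marks intact ones. We also draw fitted density curves to help tell them apart.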

#################### histograms of gas levels of transformers (Hydrogen only)
library(dplyr)  # provides %>%, filter() and select()
failure <- unlist(my_data %>% filter(Eventual_Failure == 1) %>% select(Hydrogen))
operational <- unlist(my_data %>% filter(Eventual_Failure == 0) %>% select(Hydrogen))

# shift by 2 before taking logs so that zero readings stay finite
log_failure <- log(failure+2)
log_operational <- log(operational+2)

hist(log_operational, freq=FALSE, col='skyblue', border=FALSE, xlim=c(0, 15), ylim=c(0, 0.4),
     ylab="Density", breaks=seq(0, 15, length.out=16),
     main = "", xlab="log(Hydrogen)")
hist(log_failure, freq=FALSE, add=TRUE, col=scales::alpha('red', 0.25), border=FALSE, 
     breaks=seq(0, 15, length.out=16))
y1 <- density(log_operational, bw=0.7)
y2 <- density(log_failure, bw=0.7)
lines(y1, col = "blue", lty=2, lwd=2)
lines(y2, col = "red", lty=2, lwd=2)
title(main="Hydrogen")

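Browsing the plots, we find that several gases differ considerably (values are on the log scale used in the histograms). For methane, failed transformers mostly fall between 7 and 9, while intact ones mostly fall between 0 and 5. For ethane, failed transformers mostly fall between 5 and 9, while intact ones are mostly below 5. Hydrogen, acetylene, and total gas can also be told apart slightly. Carbon monoxide, carbon dioxide, ethylene, and acetylene seem to show little difference. So the utility only needs to measure methane, ethane, and hydrogen to get a rough idea of whether a transformer needs service. For example, a transformer with methane between 7 and 9 should be serviced, and one with ethane between 5 and 7 should be inspected.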
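The rule of thumb above can be written as a simple threshold check. The sketch below applies a cutoff of 7 on log-scale methane (a simplification of the 7 to 9 band) to simulated readings and tabulates how often the rule is right. The data are hypothetical and are not the utility's records.

```r
set.seed(1)

# Hypothetical log-scale methane readings; NOT the utility's data
log_methane <- c(rnorm(900, mean = 3, sd = 1.5),   # intact units
                 rnorm(100, mean = 8, sd = 1.0))   # failed units
failed <- rep(c(0, 1), times = c(900, 100))

# Rule read off the histograms: flag a unit when log(methane) is 7 or higher
flagged <- as.integer(log_methane >= 7)

confusion <- table(failed = failed, flagged = flagged)
print(confusion)

# Share of failed units the rule catches, and share of intact units it flags
catch_rate <- mean(flagged[failed == 1])
false_alarm_rate <- mean(flagged[failed == 0])
cat(sprintf("catch rate = %.2f, false alarm rate = %.3f\n", catch_rate, false_alarm_rate))
```

Even on clean simulated data the rule misses some failed units and flags some intact ones, which is the limitation a single-gas threshold carries.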

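Summary

But this approach is rather coarse and not precise enough. For example, some transformers keep working normally even with fairly high methane, and some with ethane above 5 have not failed. Conversely, quite a few transformers failed before any gas reached its danger level. One important factor has not been considered yet: the age of the transformer itself. In the next video, we will show how a statistical model that uses both age and the gas contents can estimate the failure probability of the equipment more accurately.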
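The source does not name the statistical model it will use. One common choice for a yes/no failure outcome is logistic regression, sketched below on simulated data with age and a single gas as predictors. The column names, coefficients, and sample size are all hypothetical.

```r
set.seed(42)
n <- 2000

# Simulated stand-in data; column names and coefficients are hypothetical
age <- runif(n, 0, 40)                        # years in service
log_methane <- rnorm(n, mean = 4, sd = 2)     # log-scale gas reading
true_logit <- -8 + 0.08 * age + 0.9 * log_methane
failure <- rbinom(n, size = 1, prob = plogis(true_logit))
sim <- data.frame(failure, age, log_methane)

# Logistic regression: failure probability as a function of age and gas
fit <- glm(failure ~ age + log_methane, data = sim, family = binomial)
print(round(coef(summary(fit)), 3))

# Predicted failure probability for two hypothetical units with the same gas level
new_units <- data.frame(age = c(5, 35), log_methane = c(8, 8))
p_fail <- predict(fit, newdata = new_units, type = "response")
print(round(p_fail, 3))
```

Because both predictors enter the same model, two units with the same gas reading can receive different failure probabilities depending on age.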

video-R-demo-script Download

transformer data Download

Design of Experiment (DOE) Response Surface Methods (RSM) to Optimize Wafer MOSFET Polysilicon Gate Etching Production with R in 10 Minutes

In integrated circuit (IC) manufacturing, engineers need to ensure that the polysilicon lines on the wafer stand perfectly vertical. There are millions of these tiny lines within a square millimeter, and they are all created together in a plasma chamber. Today we introduce an experimental method for finding the best equipment settings: the response surface method.

Reactive-ion etching (RIE) is a silicon wafer etching technology used in chip fabrication. It uses chemically reactive plasma to remove patterned silicon dioxide “film” deposited on wafers. The plasma is generated under a low-pressure vacuum by a radio frequency (RF) electromagnetic field, with chemical gas vapor injected in. The right combination of RF power, vacuum pressure, and hydrogen bromide (HBr) gas injected into the etch chamber is the key to a quality silicon wafer. Engineers need to ensure that the profile of the polysilicon gates is anisotropic, that is, the walls of the etched lines are perpendicular to the substrate.

In this study, the engineers would like to find the right processing settings for this etching equipment. As this is a million-dollar business, we are going to help them, using design of experiment methods.
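To give a feel for what a response surface analysis involves, here is a minimal sketch in base R. It builds a central composite design in coded units for the three factors named above, fits a full second-order model, and solves for the stationary point. The design choice, the simulated response, and the variable names are assumptions for illustration; the actual design and data are in the downloadable files.

```r
set.seed(7)

# Central composite design in coded units for three factors
corners <- expand.grid(power = c(-1, 1), pressure = c(-1, 1), hbr = c(-1, 1))
a <- 1.682  # axial distance commonly used for a rotatable 3-factor design
axial <- data.frame(power    = c(-a, a, 0, 0, 0, 0),
                    pressure = c(0, 0, -a, a, 0, 0),
                    hbr      = c(0, 0, 0, 0, -a, a))
center <- data.frame(power = rep(0, 6), pressure = rep(0, 6), hbr = rep(0, 6))
design <- rbind(corners, axial, center)

# Simulated response (hypothetical): a quality score peaking near (0.3, -0.2, 0.5)
true_opt <- c(0.3, -0.2, 0.5)
design$score <- with(design,
  90 - 4 * (power - true_opt[1])^2 - 3 * (pressure - true_opt[2])^2 -
    5 * (hbr - true_opt[3])^2) + rnorm(nrow(design), sd = 0.3)

# Full second-order (quadratic) response surface model
fit <- lm(score ~ power + pressure + hbr +
            I(power^2) + I(pressure^2) + I(hbr^2) +
            power:pressure + power:hbr + pressure:hbr, data = design)
cf <- coef(fit)

# Stationary point x_s = -0.5 * solve(B) %*% b, where b holds the linear
# coefficients and B the quadratic terms (diagonal) and half the interactions
b <- cf[c("power", "pressure", "hbr")]
B <- matrix(0, 3, 3, dimnames = list(names(b), names(b)))
diag(B) <- cf[c("I(power^2)", "I(pressure^2)", "I(hbr^2)")]
B["power", "pressure"] <- B["pressure", "power"] <- cf["power:pressure"] / 2
B["power", "hbr"]      <- B["hbr", "power"]      <- cf["power:hbr"] / 2
B["pressure", "hbr"]   <- B["hbr", "pressure"]   <- cf["pressure:hbr"] / 2
stationary <- drop(-0.5 * solve(B, b))
print(round(stationary, 2))

# All eigenvalues of B negative means the stationary point is a maximum
print(eigen(B)$values)
```

The stationary point is reported in coded units; converting it back to physical settings requires the actual factor ranges used in the experiment.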

Data and sample R commands (user needs to load data to R)

Silicon-gate-etching-data Download

video-demo-R-script Download

View the video

Logistic Growth Curve Forecasts a Receding Trend of Confirmed Cases and a Total Stabilization around 50,000 near February 20th

The pattern is getting clearer. My 3-parameter logistic growth model, fitted to data as of February 10, 2020, forecasts that total confirmed cases will stabilize around 50,000. The model has an R-squared of 0.9994.

Such a near-perfect fit with 27 data points is not surprising. China’s enormous national effort to collect patient data results in high-quality data, which closely follow a natural growth pattern.
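A 3-parameter logistic growth curve can be fitted in R with nls() and the self-starting model SSlogis(). The sketch below runs on simulated cumulative counts, not the reported case data, and the noise level and true parameters are assumptions chosen only to make the example run.

```r
set.seed(123)

# Simulated cumulative counts over 27 days; NOT the reported case data
day <- 1:27
true_curve <- 50000 / (1 + exp(-(day - 18) / 3.5))
cases <- true_curve * exp(rnorm(length(day), sd = 0.03))
sim <- data.frame(day, cases)

# Three-parameter logistic: Asym / (1 + exp((xmid - day) / scal))
fit <- nls(cases ~ SSlogis(day, Asym, xmid, scal), data = sim)
est <- coef(fit)
print(round(est, 2))

# R-squared computed from residual and total sums of squares
r_squared <- 1 - sum(residuals(fit)^2) / sum((sim$cases - mean(sim$cases))^2)
print(round(r_squared, 4))
```

Here Asym is the level at which the curve stabilizes, xmid is the day of fastest growth, and scal controls how quickly the curve flattens.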

Dawn of the Battle Against 2019-nCoV Emerges

With the latest confirmed 2019-nCoV infection counts as of Feb. 7, 2020, we can estimate the time when new confirmed cases will stop emerging, as indicated in the forecast shown below. The forecast is based on a simple order-6 polynomial model fitted to confirmed cases, using data since the official data release on December 29, 2019.

The significance of this prediction is that it potentially confirms the actual incubation period of the virus in humans. Most experts have estimated it to be somewhere between 12 and 15 days. If Feb. 10, or a few days after, turns out to be the date after which no significant number of new confirmed cases appears, then this estimate is reasonable, as China started sealing off Wuhan, and then cutting off essentially all forms of people-to-people transmission, 17 days prior to the estimated Feb. 10. It would also support the view that no new form of the virus is causing similar infections.
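For readers who want to try an order-6 polynomial fit like the one mentioned above, here is a minimal sketch using lm() with poly(). The daily counts are simulated and the forecast horizon is arbitrary; this is not the author's data or script.

```r
set.seed(2020)

# Simulated daily new-case counts; NOT the reported data
day <- 1:40
new_cases <- pmax(0, 3000 * exp(-((day - 30) / 8)^2) + rnorm(40, sd = 80))
sim <- data.frame(day, new_cases)

# Order-6 polynomial in day; poly() builds orthogonal terms for numerical stability
fit <- lm(new_cases ~ poly(day, 6), data = sim)

# Forecast a few days past the end of the data
future <- data.frame(day = 41:50)
future$forecast <- predict(fit, newdata = future)
print(round(future, 0))
```

High-order polynomials can swing sharply outside the fitted range, so forecasts more than a few days ahead should be read with caution.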

Computer Simulation Study for Estimating 2019-nCoV Prevalence and People-to-people Infection Rate (as of 2020-01-27)

Common infectious disease studies of spreading pace and infection rates in a population rely heavily on fitting predetermined epidemic formulas, which in reality bear little resemblance to how a flu virus actually spreads. Here I propose a stochastic simulation model that mimics the actual virus spreading mechanism among people; see the model diagram below.

STOCHASTIC VIRUS SPREADING model

In this stochastic simulation model, we mimic the actual virus spreading mechanism through computer simulations of how the virus is passed from person to person, and how people may or may not develop symptoms and therefore pass the virus to more people the next day. Those who develop symptoms are removed from the spreading chain. We randomly choose the following, each based on a probability distribution:

(1) The number of people a virus carrier might meet each day,
(2) The number of infected people who would develop symptoms, and
(3) Who among those with confirmed infection would be quarantined from the population.

Samples are drawn on a daily basis, according to defined probabilities. This differs from fitting a deterministic model after the fact, as most published epidemic studies do today, and much more closely reflects how the virus actually spreads in real life.
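The three random draws listed above can be sketched as a small daily simulation loop. Everything numeric below is a hypothetical placeholder, including the per-contact transmission probability, which the text does not mention; the distributions (Poisson and binomial) are also assumptions.

```r
set.seed(2019)

# All parameter values below are hypothetical placeholders
n_days        <- 30
mean_contacts <- 5      # average number of people a carrier meets per day
p_transmit    <- 0.05   # assumed chance that a contact becomes infected
p_symptom     <- 0.20   # daily chance that a carrier develops symptoms
p_quarantine  <- 0.90   # chance that a symptomatic carrier is quarantined

carriers <- 10          # people currently spreading the virus
history <- data.frame(day = seq_len(n_days), carriers = NA_real_,
                      new_infections = NA_real_, quarantined = NA_real_)

for (d in seq_len(n_days)) {
  # (1) number of people each carrier meets today, and how many get infected
  contacts <- rpois(carriers, lambda = mean_contacts)
  new_infections <- sum(rbinom(carriers, size = contacts, prob = p_transmit))
  # (2) carriers who develop symptoms today
  symptomatic <- rbinom(1, size = carriers, prob = p_symptom)
  # (3) symptomatic carriers who are quarantined and leave the spreading chain
  quarantined <- rbinom(1, size = symptomatic, prob = p_quarantine)

  history$carriers[d] <- carriers
  history$new_infections[d] <- new_infections
  history$quarantined[d] <- quarantined
  carriers <- carriers - quarantined + new_infections
}

print(tail(history))
```

Running the loop many times with different seeds gives a distribution of outcomes rather than a single curve, which is the main difference from fitting a deterministic formula.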