设为首页收藏本站

EPS数据狗论坛

 找回密码
 立即注册

QQ登录

只需一步,快速开始

查看: 2940|回复: 4

工业企业数据库匹配问题

  [复制链接]

1

主题

2028

金钱

5243

积分

高级用户

发表于 2017-9-11 15:24:51 | 显示全部楼层 |阅读模式
向各位大神请教两个在我使用stata进行对历年的中国工业企业数据库匹配时遇到的问题:
     (1)Brandt(2012)公布的code中几个主要的匹配变量究竟是什么?比如geographic code是指邮政编码还是其他什么?
     (2)我根据Brandt(2012)公布的code对1998——2007的数据进行匹配时,在做到匹配连续三年数据时(也就是Step 70: Create a three-year balanced sample )遇到了“variable code does not uniquely identify observations in the master data”这样的错误,无法继续下去。发现用gen code =id_`j'+string(revenue_`j')+string(employment_`j')+string(profit_`j')+province_`j' 生成的code都显示为“...”
     (3)关于stata数据管理,编程方面的教材有没有推荐?急需快速入门。

10

主题

304

金钱

485

积分

入门用户

发表于 2017-9-11 15:34:08 | 显示全部楼层
将“gen code...”中的province删掉以,把merge 1:1换成merge命令试一下,之前也遇到过
回复 支持 反对

使用道具 举报

0

主题

1061

金钱

5697

积分

高级用户

发表于 2017-9-12 09:08:16 | 显示全部楼层
一段工业企业数据库的匹配样本代码,亲测可用,你可以参考一下
  1. clear all
  2. set more off

  3. global PATH "/Volumes/TOSHIBA EXT/Projects/NBS/China Industry Business Performance Data/Match Over Years"
  4. cd "$PATH"

  5. ******************************************************************************
  6. * Part 1
  7. * Befor run this do-file, orignial_1998.dta ~ original_2007 must be already
  8. * generated. Whic means that do-files 1998.do ~ 2007.do have already been
  9. * runned.
  10. *
  11. * Generate a id variable (id_in_source) for further combining data set
  12. * after match over years.
  13. *
  14. * Only keep match variables and id_in_source for the next steps
  15. *****************************************************************************

  16. forvalues i = 1998/2007{
  17.         disp "File `i'"
  18.         use `"../original_`i'.dta"',clear
  19. *        gen id_in_source = _n
  20.         if `i'==2003{
  21.                 gen town = address
  22.         }
  23.         gen cic = cic_adj
  24.         replace cic = real(industry_code) if cic == .
  25.         /*
  26.         if year <2003{
  27.                 gen cic = cic_adj
  28.         }
  29.         else{
  30.                 gen cic = cic03
  31.         }
  32.         */

  33.         if year<2004{
  34.                 gen revenue = sales_revenue
  35.         }
  36.         else{
  37.                 gen revenue = operating_revenue
  38.         }
  39.         gen profit = total_profit
  40.         if year ==1999 | year == 2002{
  41.                 gen employment = staff
  42.         }
  43.         keep id_in_source firm_id firm_name legal_person town province ///
  44.              telephone zip product1 founding_year cic region_code revenue ///
  45.                  employment profit
  46.         destring founding_year revenue employment profit,replace force
  47.         tostring cic,replace format(%04.0f)
  48.         rename firm_id id
  49.         rename firm_name name
  50.         rename founding_year bdat
  51.         rename region_code dq
  52.         rename product1 product1_
  53.         rename telephone phone
  54.         foreach var of varlist *{
  55.                 rename `var' `var'`i'
  56.         }
  57.         compress
  58.         saveold m`i'.dta,replace
  59. }
  60. forval i =1998/2007{
  61.         use m`i'.dta,clear
  62.         des,short
  63. }
  64. *******************************************************************************
  65. * Part 2 Match Firms over years
  66. * current directory should contain m1998.dta ~ m2007.dta after Part 1
  67. *******************************************************************************




  68. **************************************
  69. *                                                                         *
  70. *STAGE 1: Match Two Consecutive Years*
  71. *                                                                         *
  72. **************************************


  73. *change lower case to upper case in "firm id"
  74. forval i =1998/2007{
  75.         use m`i'.dta,clear
  76.         replace id`i' = strupper(id`i')
  77.         compress
  78.         saveold m`i'.10.dta,replace
  79. }
  80. forval i =1998/2007{
  81.         use m`i'.10.dta,clear
  82.         des,short
  83. }
  84. forval i = 1998/2006{
  85.         local j = `i'+1
  86.        
  87.         **Step 10: Match by firm ID (faren daima)**
  88.        
  89.         *deal with duplicates of IDs (There are a few firms that have same IDs)
  90.         disp "Step 10 "
  91.         use m`i'.10.dta,clear
  92.         bysort id`i': keep if _N>1
  93.         compress
  94.         saveold duplicates_ID`i'.dta,replace
  95.        
  96.         use m`i'.10.dta,clear
  97.         bysort id`i': drop if _N>1
  98.         rename id`i' id
  99.         sort id
  100.         keep *`i' id
  101.         compress
  102.         saveold match`i'.1.dta,replace
  103.        
  104.         use m`j'.10.dta,clear
  105.         bysort id`j': keep if _N>1
  106.         compress
  107.         saveold duplicates_ID`j'.dta,replace
  108.        
  109.         use m`j'.10.dta,clear
  110.         bysort id`j': drop if _N>1
  111.         rename id`j' id
  112.         keep *`j' id
  113.         sort id
  114.         compress
  115.         saveold match`j'.1.dta,replace
  116.        
  117.         use match`i'.1.dta,clear
  118.         merge 1:1 id using match`j'.1.dta
  119.         keep if _m==3
  120.         gen id`i' = id
  121.         rename id id`j'
  122.         drop _merge
  123.         gen match_method_`i'_`j'="ID"
  124.         gen match_status_`i'_`j'="3"
  125.         compress
  126.         saveold matched_by_ID`i'_`j'.dta,replace
  127.        
  128.        
  129.         **step20: match by firm names**
  130.        
  131.         *match those unmatched firms in previous step by firm names*
  132.         disp "Step 20 "       
  133.         use match`i'.1.dta,clear
  134.         merge 1:1 id using match`j'.1.dta
  135.         keep if _m==1
  136.         rename id id`i'
  137.         append using duplicates_ID`i'.dta
  138.         bysort name`i': keep if _N>1
  139.         keep *`i'
  140.         compress
  141.         saveold duplicates_name`i'.dta,replace
  142.        
  143.         use match`i'.1.dta,clear
  144.         merge 1:1 id using match`j'.1.dta
  145.         keep if _m==1
  146.         rename id id`i'
  147.         append using duplicates_ID`i'.dta
  148.         bysort name`i': drop if _N>1
  149.         rename name`i' name
  150.         sort name
  151.         keep *`i' name
  152.         compress
  153.         saveold unmatched_by_ID`i'.dta,replace
  154.        
  155.         use match`i'.1.dta,clear
  156.         merge 1:1 id using match`j'.1.dta
  157.         keep if _m==2
  158.         rename id id`j'
  159.         append using duplicates_ID`j'.dta
  160.         bysort name`j': keep if _N>1
  161.         keep *`j'
  162.         compress
  163.         saveold duplicates_name`j'.dta,replace       

  164.         use match`i'.1.dta,clear
  165.         merge 1:1 id using match`j'.1.dta
  166.         keep if _m==2
  167.         rename id id`j'
  168.         append using duplicates_ID`j'.dta
  169.         bysort name`j': drop if _N>1
  170.         rename name`j' name
  171.         sort name
  172.         keep *`j' name
  173.         compress
  174.         saveold unmatched_by_ID`j'.dta,replace
  175.        
  176.         use unmatched_by_ID`i'.dta,clear
  177.         merge 1:1 name using unmatched_by_ID`j'.dta
  178.         keep if _m==3
  179.         gen name`i' = name
  180.         rename name name`j'
  181.         drop _m
  182.         gen match_method_`i'_`j'="firm name"
  183.         gen match_status_`i'_`j'="3"
  184.         compress
  185.         saveold matched_by_name`i'_`j'.dta,replace
  186.        
  187.        
  188.        
  189.         **step 30: match by the names of legal person representatives**
  190.        
  191.         *match those unmatched firms in previous steps by firm legal person representatives*
  192.         disp "Step 30 "
  193.         use unmatched_by_ID`i'.dta,clear
  194.         merge 1:1 name using unmatched_by_ID`j'.dta
  195.         keep if _m == 1
  196.         rename name name`i'
  197.         append using duplicates_name`i'.dta
  198.         replace legal_person`i' = "." if legal_person`i' == ""
  199.         gen code1 = legal_person`i' + substr(dq`i',1,4)
  200.         bysort code1: keep if _N>1
  201.         keep *`i'
  202.         compress
  203.         saveold duplicates_code1_`i'.dta,replace
  204.        
  205.         use unmatched_by_ID`i'.dta,clear
  206.         merge 1:1 name using unmatched_by_ID`j'.dta
  207.         keep if _m == 1
  208.         rename name name`i'
  209.         append using duplicates_name`i'.dta
  210.         replace legal_person`i' = "." if legal_person`i' == ""
  211.         gen code1 = legal_person`i' + substr(dq`i',1,4)
  212.         bysort code1: drop if _N>1
  213.         sort code1
  214.         keep code1 *`i'
  215.         compress
  216.         saveold unmatched_by_ID_and_name`i'.dta,replace
  217.        
  218.         use unmatched_by_ID`i'.dta,clear
  219.         merge 1:1 name using unmatched_by_ID`j'.dta
  220.         keep if _m == 2
  221.         rename name name`j'
  222.         append using duplicates_name`j'.dta
  223. *        replace legal_person`j' = "." if legal_person`j' == ""
  224.         gen code1 = legal_person`j' + substr(dq`j',1,4)
  225.         bysort code1: keep if _N>1
  226.         keep *`j'
  227.         compress
  228.         saveold duplicates_code1_`j'.dta,replace       
  229.        
  230.         use unmatched_by_ID`i'.dta,clear
  231.         merge 1:1 name using unmatched_by_ID`j'.dta
  232.         keep if _m == 2
  233.         rename name name`j'
  234.         append using duplicates_name`j'.dta
  235. *        replace legal_person`j' = "." if legal_person`j' == ""
  236.         gen code1 = legal_person`j' + substr(dq`j',1,4)
  237.         bysort code1: drop if _N>1
  238.         sort code1
  239.         keep code1 *`j'
  240.         compress
  241.         saveold unmatched_by_ID_and_name`j'.dta,replace
  242.        
  243.         use unmatched_by_ID_and_name`i'.dta,clear
  244.         disp _N
  245.         merge 1:1 code1 using unmatched_by_ID_and_name`j'.dta
  246.         keep if _m==3
  247.         drop _m code1
  248.         gen match_method_`i'_`j' = "legal_person"
  249.         gen match_status_`i'_`j' = "3"
  250.         compress
  251.         saveold matched_by_legalperson`i'_`j'.dta,replace
  252.        
  253.        
  254.         **Step 40: match by phone number + geographic code + industry code**
  255.        
  256.         *match those unmatched firms in previous steps by phone number + geographic code + industry code*
  257.         disp "Step 40 "       
  258.         use unmatched_by_ID_and_name`i'.dta,clear
  259.         merge 1:1 code1 using unmatched_by_ID_and_name`j'.dta
  260.         keep if _m==1
  261.         drop code1
  262.         append using duplicates_code1_`i'.dta
  263.         replace phone`i' = "." if phone`i' == ""
  264.         gen code2 = substr(dq`i',1,4)+substr(cic`i',1,3)+phone`i'
  265.         bysort code2 : keep if _N>1
  266.         keep *`i'
  267.         compress
  268.         saveold duplicates_code2_`i'.dta,replace
  269.        
  270.         use unmatched_by_ID_and_name`i'.dta,clear
  271.         merge 1:1 code1 using unmatched_by_ID_and_name`j'.dta
  272.         keep if _m==1
  273.         drop code1
  274.         append using duplicates_code1_`i'.dta
  275.         replace phone`i' = "." if phone`i' == ""
  276.         gen code2 = substr(dq`i',1,4)+substr(cic`i',1,3)+phone`i'
  277.         bysort code2 : drop if _N>1
  278.         keep code2 *`i'
  279.         sort code2
  280.         compress
  281.         saveold unmatched_by_ID_and_name_and_legalperson`i'.dta,replace       
  282.        
  283.         use unmatched_by_ID_and_name`i'.dta,clear
  284.         merge 1:1 code1 using unmatched_by_ID_and_name`j'.dta
  285.         keep if _m==2
  286.         drop code1
  287.         append using duplicates_code1_`j'.dta
  288. *        replace phone`j' = "." if phone`j' == ""
  289.         gen code2 = substr(dq`j',1,4)+substr(cic`j',1,3)+phone`j'
  290.         bysort code2 : keep if _N>1
  291.         keep *`j'
  292.         compress
  293.         saveold duplicates_code2_`j'.dta,replace       

  294.         use unmatched_by_ID_and_name`i'.dta,clear
  295.         merge 1:1 code1 using unmatched_by_ID_and_name`j'.dta
  296.         keep if _m==2
  297.         drop code1
  298.         append using duplicates_code1_`j'.dta
  299. *        replace phone`j' = "." if phone`j' == ""
  300.         gen code2 = substr(dq`j',1,4)+substr(cic`j',1,3)+phone`j'
  301.         bysort code2 : drop if _N>1
  302.         sort code2
  303.         keep code2 *`j'
  304.         compress
  305.         saveold unmatched_by_ID_and_name_and_legalperson`j'.dta,replace               
  306.        
  307.         use unmatched_by_ID_and_name_and_legalperson`i'.dta,clear
  308.         merge 1:1 code2 using unmatched_by_ID_and_name_and_legalperson`j'.dta
  309.         keep if _m==3
  310.         drop _m code2
  311.         gen match_method_`i'_`j' = "phone number"
  312.         gen match_status_`i'_`j' = "3"
  313.         compress
  314.         saveold matched_by_phone`i'_`j'.dta,replace
  315.        
  316.        
  317.         **step 50: match by code = founding year + geographic code + industry code+ name of town + name of main product
  318.        
  319.         *match those unmatched firms in previous steps by founding year + geographic code + industry code+ name of town + name of main product
  320.         disp "Step 50 "       
  321.         use unmatched_by_ID_and_name_and_legalperson`i'.dta,clear
  322.         merge 1:1 code2 using unmatched_by_ID_and_name_and_legalperson`j'.dta
  323.         keep if _m==1
  324.         drop code2
  325.         append using duplicates_code2_`i'.dta
  326.         replace town`i' = "." if town`i' == ""
  327.         replace product1_`i' = "." if product1_`i' == ""
  328.         gen code3 = string(bdat`i')+substr(dq`i',1,4)+substr(cic`i',1,3)+town`i'+product1_`i'
  329.         bysort code3: keep if _N>1
  330.         keep *`i'
  331.         compress
  332.         saveold duplicates_code3_`i'.dta,replace
  333.        
  334.         use unmatched_by_ID_and_name_and_legalperson`i'.dta,clear
  335.         merge 1:1 code2 using unmatched_by_ID_and_name_and_legalperson`j'.dta
  336.         keep if _m==1
  337.         drop code2
  338.         append using duplicates_code2_`i'.dta
  339.         replace town`i' = "." if town`i' == ""
  340.         replace product1_`i' = "." if product1_`i' == ""
  341.         gen code3 = string(bdat`i')+substr(dq`i',1,4)+substr(cic`i',1,3)+town`i'+product1_`i'
  342.         bysort code3: drop if _N>1
  343.         sort code3
  344.         keep code3 *`i'
  345.         compress
  346.         saveold unmatched_by_ID_and_name_and_legalperson_and_phone_`i'.dta,replace

  347.         use unmatched_by_ID_and_name_and_legalperson`i'.dta,clear
  348.         merge 1:1 code2 using unmatched_by_ID_and_name_and_legalperson`j'.dta
  349.         keep if _m==2
  350.         drop code2
  351.         append using duplicates_code2_`j'.dta
  352. *        replace town`j' = "." if town`j' == ""
  353.         replace product1_`j' = "." if product1_`j' == ""
  354.         gen code3 = string(bdat`j')+substr(dq`j',1,4)+substr(cic`j',1,3)+town`j'+product1_`j'
  355.         bysort code3: keep if _N>1
  356.         keep *`j'
  357.         compress
  358.         saveold duplicates_code3_`j'.dta,replace

  359.         use unmatched_by_ID_and_name_and_legalperson`i'.dta,clear
  360.         merge 1:1 code2 using unmatched_by_ID_and_name_and_legalperson`j'.dta
  361.         keep if _m==2
  362.         drop code2
  363.         append using duplicates_code2_`j'.dta
  364. *        replace town`j' = "." if town`j' == ""
  365.         replace product1_`j' = "." if product1_`j' == ""
  366.         gen code3 = string(bdat`j')+substr(dq`j',1,4)+substr(cic`j',1,3)+town`j'+product1_`j'
  367.         bysort code3: drop if _N>1
  368.         sort code3
  369.         keep code3 *`j'
  370.         compress
  371.         saveold unmatched_by_ID_and_name_and_legalperson_and_phone_`j'.dta,replace
  372.        
  373.         use unmatched_by_ID_and_name_and_legalperson_and_phone_`i'.dta,clear
  374.         disp _N
  375.         merge 1:1 code3 using unmatched_by_ID_and_name_and_legalperson_and_phone_`j'.dta
  376.         keep if _m==3
  377.         drop _m code3
  378.         gen match_method_`i'_`j' = "code 3"
  379.         gen match_status_`i'_`j' = "3"
  380.         compress
  381.         saveold matched_by_code3_`i'_`j'.dta,replace
  382.        
  383.         use unmatched_by_ID_and_name_and_legalperson_and_phone_`i'.dta,clear
  384.         merge 1:1 code3 using unmatched_by_ID_and_name_and_legalperson_and_phone_`j'.dta
  385.         keep if _m == 1
  386.         drop _m code3
  387.         append using duplicates_code3_`i'.dta
  388.         gen match_method_`i'_`j' = ""
  389.         gen match_status_`i'_`j' = "1"
  390.         compress
  391.         saveold unmatched_by_ID_and_name_and_legalperson_and_phone_and_code2`i'.dta,replace
  392.        
  393.         use unmatched_by_ID_and_name_and_legalperson_and_phone_`i'.dta,clear
  394.         merge 1:1 code3 using unmatched_by_ID_and_name_and_legalperson_and_phone_`j'.dta
  395.         keep if _m == 2
  396.         drop _m code3
  397.         append using duplicates_code3_`j'.dta
  398.         gen match_method_`i'_`j' = ""
  399.         gen match_status_`i'_`j' = "2"
  400.         compress
  401.         saveold unmatched_by_ID_and_name_and_legalperson_and_phone_and_code2`j'.dta,replace
  402.        
  403.                        
  404.         **step 60: merge the matched and unmatched fils to create files of to consecutive years**       
  405.         disp "Step 60 "       
  406.         use matched_by_ID`i'_`j'.dta,clear
  407.         append using matched_by_name`i'_`j'.dta
  408.         append using matched_by_legalperson`i'_`j'.dta
  409.         append using matched_by_phone`i'_`j'.dta
  410.         append using matched_by_code3_`i'_`j'.dta
  411.         append using unmatched_by_ID_and_name_and_legalperson_and_phone_and_code2`i'.dta
  412.         append using unmatched_by_ID_and_name_and_legalperson_and_phone_and_code2`j'.dta       
  413.         compress
  414.         saveold m`i'-m`j'.dta,replace       
  415. }
  416. forval i = 1998/2006{
  417.         local j = `i'+1
  418.         use m`i'-m`j'.dta,clear
  419.         tab match_method_`i'_`j'
  420.         tab match_status_`i'_`j'
  421. }



  422. *********************************************
  423. *                                                                          
  424. *STAGE 2: Match over Three Consecutive Years
  425. *                                                                          
  426. *********************************************


  427. forvalues i = 1998/2005{
  428.         local j = `i'+1
  429.         local k = `i'+2
  430.        
  431.         **Step 70: Create a three-year balanced sample
  432.         disp "Step 70 "       
  433.         use m`i'-m`j'.dta,clear
  434.         keep if match_status_`i'_`j' == "1"
  435.         keep *`i'
  436.         compress
  437.         saveold unmatched`i'.10.dta,replace

  438.         use m`i'-m`j'.dta,clear
  439.         drop if match_status_`i'_`j' == "1"
  440.         gen code = id`j'+string(revenue`j')+string(employment`j')+string(profit`j')+province`j'
  441.         sort code
  442.         compress
  443.         saveold m`i'-m`j'.10.dta,replace
  444.        
  445.         use m`j'-m`k'.dta,clear
  446.         keep if match_status_`j'_`k' == "2"
  447.         keep *`k'
  448.         compress
  449.         saveold unmatched`k'.10.dta,replace

  450.         use m`j'-m`k'.dta,clear
  451.         drop if match_status_`j'_`k' == "2"
  452.         gen code = id`j'+string(revenue`j')+string(employment`j')+string(profit`j')+province`j'
  453.         sort code
  454.         compress
  455.         saveold m`j'-m`k'.10.dta,replace

  456.         use m`i'-m`j'.10.dta,clear
  457.         merge 1:1 code using m`j'-m`k'.10.dta
  458.         drop _m code
  459.         keep if match_status_`i'_`j'=="3" & match_status_`j'_`k'=="3"
  460.         gen match_status_`i'_`k'="3"
  461.         gen match_method_`i'_`k'="`j'"
  462.         compress
  463.         saveold balanced.m`i'-m`j'-m`k'.dta,replace
  464.        
  465.         **Step 80: Creat files for unmatched `i' firms and `k' firms**

  466.         disp "Step 80"
  467.         use m`i'-m`j'.10.dta,clear
  468.         merge 1:1 code using m`j'-m`k'.10.dta
  469.         drop _m code
  470.         drop if match_status_`i'_`j'=="3" & match_status_`j'_`k'=="3"
  471.         drop if id`i'==""
  472.         gen code = id`i'+string(revenue`i')+string(employment`i')+string(profit`i')+province`i'
  473.         sort code
  474.         compress
  475.         saveold unmatched`i'.15.dta,replace
  476.        
  477.         use unmatched`i'.15.dta,clear
  478.         keep *`i'
  479.         append using unmatched`i'.10.dta
  480.         compress
  481.         saveold unmatched`i'.20.dta,replace
  482.        
  483.        
  484.         use m`i'-m`j'.10.dta,clear
  485.         merge 1:1 code using m`j'-m`k'.10.dta
  486.         drop _m code
  487.         drop if match_status_`i'_`j'=="3" & match_status_`j'_`k'=="3"
  488.         drop if id`k'== ""
  489.         gen code = id`k'+string(revenue`k')+string(employment`k')+string(profit`k')+province`k'
  490.         sort code
  491.         compress
  492.         saveold unmatched`k'.15.dta,replace
  493.        
  494.         use unmatched`k'.15.dta,clear
  495.         keep *`k'
  496.         append using unmatched`k'.10.dta
  497.         compress
  498.         saveold unmatched`k'.20.dta,replace
  499.        
  500.        
  501.         use m`i'-m`j'.10.dta,clear
  502.         merge 1:1 code using m`j'-m`k'.10.dta
  503.         drop _m code
  504.         drop if match_status_`i'_`j'=="3" & match_status_`j'_`k'=="3"
  505.         gen code = id`j'+string(revenue`j')+string(employment`j')+string(profit`j')+province`j'
  506.         sort code
  507.         compress
  508.         saveold unmatched`j'.15.dta,replace
  509.        
  510.        
  511.        
  512.         **Step 90: Match `i' firms and `k' firms by firm ID and name**
  513.        
  514.        
  515.         *ID*
  516.         disp "Step 90"
  517.         use unmatched`i'.20.dta,clear
  518.         bysort id`i': keep if _N>1
  519.         compress
  520.         saveold duplicates_ID`i'.dta,replace
  521.        
  522.         use unmatched`i'.20.dta,clear
  523.         bysort id`i': drop if _N>1
  524.         rename id`i' id
  525.         keep *`i' id
  526.         sort id
  527.         compress
  528.         saveold match`i'.1.dta,replace
  529.        
  530.         use unmatched`k'.20.dta,clear
  531.         bysort id`k': keep if _N>1
  532.         compress
  533.         saveold duplicates_ID`k'.dta,replace
  534.                
  535.         use unmatched`k'.20.dta,clear
  536.         bysort id`k': drop if _N>1
  537.         rename id`k' id
  538.         keep *`k' id
  539.         sort id
  540.         compress
  541.         saveold match`k'.1.dta,replace
  542.        
  543.         use match`i'.1.dta,clear
  544.         merge 1:1 id using match`k'.1.dta
  545.         keep if _m==3
  546.         gen id`i'=id
  547.         rename id id`k'
  548.         drop _m
  549.         gen match_method_`i'_`k'="`j'"
  550.         gen match_status_`i'_`k'="3"
  551.         compress
  552.         saveold matched_by_ID`i'_`k'.dta,replace
  553.        
  554.         *name*
  555.        
  556.         use match`i'.1.dta, clear
  557.         merge 1:1 id using match`k'.1.dta
  558.         keep if _merge==1
  559.         rename id id`i'
  560.         append using duplicates_ID`i'.dta
  561.         bysort name`i': keep if _N>1
  562.         keep *`i'
  563.         compress
  564.         saveold duplicates_name`i'.dta, replace
  565.        
  566.         use match`i'.1.dta, clear
  567.         merge 1:1 id using match`k'.1.dta
  568.         keep if _merge==1
  569.         rename id id`i'
  570.         append using duplicates_ID`i'.dta
  571.         bysort name`i': drop if _N>1
  572.         rename name`i' name
  573.         sort name
  574.         keep name *`i'
  575.         compress
  576.         saveold unmatched_by_ID`i'.dta, replace

  577.         use match`i'.1.dta, clear
  578.         merge 1:1 id using match`k'.1.dta
  579.         keep if _merge==2
  580.         rename id id`k'
  581.         append using duplicates_ID`k'.dta
  582.         bysort name`k': keep if _N>1
  583.         keep *`k'
  584.         compress
  585.         saveold duplicates_name`k'.dta, replace

  586.         use match`i'.1.dta, clear
  587.         merge 1:1 id using match`k'.1.dta
  588.         keep if _merge==2
  589.         rename id id`k'
  590.         append using duplicates_ID`k'.dta
  591.         bysort name`k': drop if _N>1
  592.         rename name`k' name
  593.         sort name
  594.         keep name *`k'
  595.         compress
  596.         saveold unmatched_by_ID`k'.dta, replace
  597.        
  598.         use unmatched_by_ID`i'.dta, clear
  599.         merge 1:1 name using unmatched_by_ID`k'.dta
  600.         keep if _merge==3
  601.         gen name`i'=name
  602.         rename name name`k'
  603.         drop _merge
  604.         gen match_method_`i'_`k'="firm name"
  605.         gen match_status_`i'_`k'="3"
  606.         compress
  607.         saveold matched_by_name`i'_`k'.dta, replace
  608.        

  609.         use unmatched_by_ID`i'.dta, clear
  610.         merge 1:1 name using unmatched_by_ID`k'.dta
  611.         keep if _merge==1
  612.         rename name name`i'
  613.         keep *`i'
  614.         append using duplicates_name`i'.dta
  615.         gen match_method_`i'_`k'=""
  616.         gen match_status_`i'_`k'="1"
  617.         compress
  618.         saveold unmatched_by_ID_and_name_`i'.dta, replace       
  619.        
  620.        
  621.         use unmatched_by_ID`i'.dta, clear
  622.         merge 1:1 name using unmatched_by_ID`k'.dta
  623.         keep if _merge==2
  624.         rename name name`k'
  625.         keep *`k'
  626.         append using duplicates_name`k'.dta
  627.         gen match_method_`i'_`k'=""
  628.         gen match_status_`i'_`k'="2"
  629.         compress
  630.         saveold unmatched_by_ID_and_name_`k'.dta, replace
  631.        
  632.        

  633.         **step 100: merge the files**
  634.         disp "Step 100"
  635.         use matched_by_ID`i'_`k'.dta, clear
  636.         append using matched_by_name`i'_`k'.dta
  637.         append using unmatched_by_ID_and_name_`i'.dta
  638.         append using unmatched_by_ID_and_name_`k'.dta
  639.         compress
  640.         saveold m`i'-m`k'.dta, replace
  641.        
  642.         use m`i'-m`k'.dta, clear
  643.         gen code = id`i'+string(revenue`i')+string(employment`i')+string(profit`i')+province`i'
  644.         sort code
  645.         *drop if code == "..."
  646.         merge code using unmatched`i'.15.dta
  647.         drop code _merge
  648.         sort id`i'
  649.         compress
  650.         saveold m`i'-m`k'.05.dta, replace
  651.                
  652.         *deal with disagreement (_merge==5 if "update" is used)*

  653.         use m`i'-m`k'.05.dta, clear
  654.         gen code = id`k'+string(revenue`k')+string(employment`k')+string(profit`k')+province`k'
  655.         sort code
  656.        
  657.         merge code using unmatched`k'.15.dta, update
  658.         keep if _merge==5
  659.         drop *`k'
  660.         drop code _merge
  661.         sort id`i'
  662.         compress
  663.         compress
  664.         saveold m`i'-m`k'.disagree.dta, replace


  665.         use m`i'-m`k'.05.dta, clear
  666.         merge id`i' using m`i'-m`k'.disagree.dta
  667.         drop if _merge==3
  668.         drop _merge
  669.         append using m`i'-m`k'.disagree.dta
  670.        
  671.         gen code = id`k'+string(revenue`k')+string(employment`k')+string(profit`k')+province`k'
  672.         sort code
  673.         merge code using unmatched`k'.15.dta, update
  674.         drop code _merge
  675.         gen code = id`j'+string(revenue`j')+string(employment`j')+string(profit`j')+province`j'
  676.         sort code
  677.         merge code using unmatched`j'.15.dta, update
  678.         drop code _merge
  679.         compress
  680.         saveold m`i'-m`k'.dta.10.dta, replace
  681.        

  682.         use m`i'-m`k'.dta.10.dta, clear
  683.         append using balanced.m`i'-m`j'-m`k'.dta
  684.         drop match_status_`i'_`j'
  685.         drop match_status_`j'_`k'
  686.         drop match_status_`i'_`k'
  687.         drop match_method_`i'_`j'
  688.         drop match_method_`j'_`k'
  689.         drop match_method_`i'_`k'
  690.         gen match_status_`i'_`j'_`k'="`i'-`j'-`k'" if id`i'!=""&id`j'!=""&id`k'!=""
  691.         replace match_status_`i'_`j'_`k'="`i'-`j' only" if id`i'!=""&id`j'!=""&id`k'==""
  692.         replace match_status_`i'_`j'_`k'="`j'-`k' only" if id`i'==""&id`j'!=""&id`k'!=""
  693.         replace match_status_`i'_`j'_`k'="`i'-`k' only" if id`i'!=""&id`j'==""&id`k'!=""
  694.         replace match_status_`i'_`j'_`k'="`i' no match" if id`i'!=""&id`j'==""&id`k'==""
  695.         replace match_status_`i'_`j'_`k'="`j' no match" if id`i'==""&id`j'!=""&id`k'==""
  696.         replace match_status_`i'_`j'_`k'="`k' no match" if id`i'==""&id`j'==""&id`k'!=""
  697.         compress
  698.         saveold unbalanced.`i'-`j'-`k'.dta, replace
  699.                
  700. }       


  701. forval i = 1998(1)2005{
  702.           local j=`i'+1
  703.         local k=`i'+2

  704.         use unbalanced.`i'-`j'-`k'.dta, clear
  705.         tab match_status_`i'_`j'_`k'
  706. }



  707. *************************************
  708. *                                              *
  709. * STAGE 3: Create a Ten-Year Panel *
  710. *                                               *
  711. *************************************


  712. use unbalanced.1998-1999-2000.dta, clear
  713. tab match_status_1998_1999_2000
  714. gen code=id2000+string(revenue2000)+string(employment2000)+string(profit2000)+province2000
  715. sort code
  716. compress
  717. saveold test1.dta, replace


  718. **step 110: add 2001 from 1999-2000-2001**

  719. use unbalanced.1999-2000-2001.dta, clear
  720. tab match_status_1999_2000_2001
  721. keep if match_status_1999_2000_2001=="1999-2000-2001"|match_status_1999_2000_2001=="2000-2001 only"
  722. gen code=id2000+string(revenue2000)+string(employment2000)+string(profit2000)+province2000

  723. sort code
  724. compress
  725. saveold test2.dta, replace

  726. use test1.dta, clear
  727. merge code using test2.dta
  728. tab _merge
  729. drop _merge code
  730. gen code=id1999+string(revenue1999)+string(employment1999)+string(profit1999)+province1999
  731. sort code
  732. compress
  733. saveold test3.dta, replace

  734. use unbalanced.1999-2000-2001.dta, clear
  735. tab match_status_1999_2000_2001
  736. keep if match_status_1999_2000_2001=="1999-2001 only"
  737. gen code=id1999+string(revenue1999)+string(employment1999)+string(profit1999)+province1999
  738. sort code
  739. compress
  740. saveold test4.dta, replace


  741. use test3.dta, clear
  742. merge code using test4.dta, update
  743. tab _merge
  744. drop code _merge
  745. compress
  746. saveold test5.dta, replace


  747. use test3.dta, clear
  748. merge code using test4.dta, update replace
  749. tab _m
  750. keep if _merge==5
  751. keep *2001
  752. compress
  753. saveold test6.dta, replace


  754. use unbalanced.1999-2000-2001.dta, clear
  755. keep if match_status_1999_2000_2001=="2001 no match"
  756. display _N
  757. compress
  758. saveold test7.dta, replace

  759. use test5.dta, clear
  760. append using test6.dta
  761. dis _N
  762. append using test7.dta
  763. dis _N
  764. gen code=id2001+string(revenue2001)+string(employment2001)+string(profit2001)+province2001
  765. sort code
  766. compress
  767. saveold test1.dta, replace



  768. **step 120: add 2002 from 2000-2001-2002**

  769. use unbalanced.2000-2001-2002.dta, clear
  770. tab match_status_2000_2001_2002
  771. keep if match_status_2000_2001_2002=="2000-2001-2002"|match_status_2000_2001_2002=="2001-2002 only"
  772. gen code=id2001+string(revenue2001)+string(employment2001)+string(profit2001)+province2001
  773. sort code
  774. compress
  775. saveold test2.dta, replace

  776. use test1.dta, clear
  777. merge code using test2.dta
  778. tab _merge
  779. drop _merge code
  780. gen code=id2000+string(revenue2000)+string(employment2000)+string(profit2000)+province2000
  781. sort code
  782. compress
  783. saveold test3.dta, replace

  784. use unbalanced.2000-2001-2002.dta, clear
  785. tab match_status_2000_2001_2002
  786. keep if match_status_2000_2001_2002=="2000-2002 only"
  787. gen code=id2000+string(revenue2000)+string(employment2000)+string(profit2000)+province2000
  788. sort code
  789. compress
  790. saveold test4.dta, replace

  791. use test3.dta, clear
  792. merge code using test4.dta, update
  793. tab _merge
  794. drop code _merge
  795. compress
  796. saveold test5.dta, replace

  797. use test3.dta, clear
  798. merge code using test4.dta, update replace
  799. keep if _merge==5
  800. keep *2002      
  801. compress
  802. saveold test6.dta, replace

  803. use unbalanced.2000-2001-2002.dta, clear
  804. keep if match_status_2000_2001_2002=="2002 no match"
  805. display _N
  806. compress
  807. saveold test7.dta, replace

  808. use test5.dta, clear
  809. append using test6.dta
  810. dis _N
  811. append using test7.dta
  812. dis _N
  813. gen code=id2002+string(revenue2002)+string(employment2002)+string(profit2002)+province2002
  814. sort code
  815. compress
  816. saveold test1.dta, replace




  817. **step 130: add 2003 from 2001-2002-2003**

  818. use unbalanced.2001-2002-2003.dta, clear
  819. tab match_status_2001_2002_2003
  820. keep if match_status_2001_2002_2003=="2001-2002-2003"|match_status_2001_2002_2003=="2002-2003 only"
  821. gen code=id2002+string(revenue2002)+string(employment2002)+string(profit2002)+province2002
  822. sort code
  823. compress
  824. saveold test2.dta, replace

  825. use test1.dta, clear
  826. merge code using test2.dta
  827. tab _merge
  828. drop _merge code
  829. gen code=id2001+string(revenue2001)+string(employment2001)+string(profit2001)+province2001
  830. sort code
  831. compress
  832. saveold test3.dta, replace

  833. use unbalanced.2001-2002-2003.dta, clear
  834. tab match_status_2001_2002_2003
  835. keep if match_status_2001_2002_2003=="2001-2003 only"
  836. gen code=id2001+string(revenue2001)+string(employment2001)+string(profit2001)+province2001
  837. sort code
  838. compress
  839. saveold test4.dta, replace

  840. use test3.dta, clear
  841. merge code using test4.dta, update
  842. tab _merge
  843. drop code _merge
  844. compress
  845. saveold test5.dta, replace

  846. use test3.dta, clear
  847. merge code using test4.dta, update replace
  848. keep if _merge==5
  849. keep *2003
  850. compress
  851. saveold test6.dta, replace

  852. use unbalanced.2001-2002-2003.dta, clear
  853. keep if match_status_2001_2002_2003=="2003 no match"
  854. display _N
  855. compress
  856. saveold test7.dta, replace

  857. use test5.dta, clear
  858. append using test6.dta
  859. dis _N
  860. append using test7.dta
  861. dis _N
  862. gen code=id2003+string(revenue2003)+string(employment2003)+string(profit2003)+province2003
  863. sort code
  864. compress
  865. saveold test1.dta, replace


  866. **step 140: add 2004 from 2002-2003-2004 **

  867. use unbalanced.2002-2003-2004.dta, clear
  868. tab match_status_2002_2003_2004
  869. keep if match_status_2002_2003_2004=="2002-2003-2004"|match_status_2002_2003_2004=="2003-2004 only"
  870. gen code=id2003+string(revenue2003)+string(employment2003)+string(profit2003)+province2003
  871. sort code
  872. compress
  873. saveold test2.dta, replace

  874. use test1.dta, clear
  875. merge code using test2.dta
  876. tab _merge
  877. drop _merge code
  878. gen code=id2002+string(revenue2002)+string(employment2002)+string(profit2002)+province2002
  879. sort code
  880. compress
  881. saveold test3.dta, replace

  882. use unbalanced.2002-2003-2004.dta, clear
  883. tab match_status_2002_2003_2004
  884. keep if match_status_2002_2003_2004=="2002-2004 only"
  885. gen code=id2002+string(revenue2002)+string(employment2002)+string(profit2002)+province2002
  886. sort code
  887. compress
  888. saveold test4.dta, replace

  889. use test3.dta, clear
  890. merge code using test4.dta, update
  891. tab _merge
  892. drop code _merge
  893. compress
  894. saveold test5.dta, replace

  895. use test3.dta, clear
  896. merge code using test4.dta, update replace
  897. keep if _merge==5
  898. keep *2004
  899. compress
  900. saveold test6.dta, replace

  901. use unbalanced.2002-2003-2004.dta, clear
  902. keep if match_status_2002_2003_2004=="2004 no match"
  903. display _N
  904. compress
  905. saveold test7.dta, replace

  906. use test5.dta, clear
  907. append using test6.dta
  908. dis _N
  909. append using test7.dta
  910. dis _N
  911. gen code=id2004+string(revenue2004)+string(employment2004)+string(profit2004)+province2004
  912. sort code
  913. compress
  914. saveold test1.dta, replace





  915. **step 150: add 2005 from 2003-2004-2005 **

  916. use unbalanced.2003-2004-2005.dta, clear
  917. tab match_status_2003_2004_2005
  918. keep if match_status_2003_2004_2005=="2003-2004-2005"|match_status_2003_2004_2005=="2004-2005 only"
  919. gen code=id2004+string(revenue2004)+string(employment2004)+string(profit2004)+province2004
  920. sort code
  921. compress
  922. saveold test2.dta, replace

  923. use test1.dta, clear
  924. merge code using test2.dta
  925. tab _merge
  926. drop _merge code
  927. gen code=id2003+string(revenue2003)+string(employment2003)+string(profit2003)+province2003
  928. sort code
  929. compress
  930. saveold test3.dta, replace

  931. use unbalanced.2003-2004-2005.dta, clear
  932. tab match_status_2003_2004_2005
  933. keep if match_status_2003_2004_2005=="2003-2005 only"
  934. gen code=id2003+string(revenue2003)+string(employment2003)+string(profit2003)+province2003
  935. sort code
  936. compress
  937. saveold test4.dta, replace

  938. use test3.dta, clear
  939. merge code using test4.dta, update
  940. tab _merge
  941. drop code _merge
  942. compress
  943. saveold test5.dta, replace

  944. use test3.dta, clear
  945. merge code using test4.dta, update replace
  946. keep if _merge==5
  947. keep *2005
  948. compress
  949. saveold test6.dta, replace

  950. use unbalanced.2003-2004-2005.dta, clear
  951. keep if match_status_2003_2004_2005=="2005 no match"
  952. display _N
  953. compress
  954. saveold test7.dta, replace

  955. use test5.dta, clear
  956. append using test6.dta
  957. dis _N
  958. append using test7.dta
  959. dis _N
  960. gen code=id2005+string(revenue2005)+string(employment2005)+string(profit2005)+province2005
  961. sort code
  962. compress
  963. saveold test1.dta, replace



  964. **step 160: add 2006 from 2004-2005-2006 **


  965. use unbalanced.2004-2005-2006.dta, clear
  966. tab match_status_2004_2005_2006
  967. keep if match_status_2004_2005_2006=="2004-2005-2006"|match_status_2004_2005_2006=="2005-2006 only"
  968. gen code=id2005+string(revenue2005)+string(employment2005)+string(profit2005)+province2005
  969. sort code
  970. compress
  971. saveold test2.dta, replace


  972. use test1.dta, clear
  973. merge code using test2.dta
  974. tab _merge
  975. drop _merge code
  976. gen code=id2004+string(revenue2004)+string(employment2004)+string(profit2004)+province2004
  977. sort code
  978. compress
  979. saveold test3.dta, replace

  980. use unbalanced.2004-2005-2006.dta, clear
  981. tab match_status_2004_2005_2006
  982. keep if match_status_2004_2005_2006=="2004-2006 only"
  983. gen code=id2004+string(revenue2004)+string(employment2004)+string(profit2004)+province2004
  984. sort code
  985. compress
  986. saveold test4.dta, replace

  987. use test3.dta, clear
  988. merge code using test4.dta, update
  989. tab _merge
  990. drop code _merge
  991. compress
  992. saveold test5.dta, replace

  993. use test3.dta, clear
  994. merge code using test4.dta, update replace
  995. keep if _merge==5
  996. keep *2006
  997. compress
  998. saveold test6.dta, replace

  999. use unbalanced.2004-2005-2006.dta, clear
  1000. keep if match_status_2004_2005_2006=="2006 no match"
  1001. display _N
  1002. compress
  1003. saveold test7.dta, replace

  1004. use test5.dta, clear
  1005. append using test6.dta
  1006. dis _N
  1007. append using test7.dta
  1008. dis _N
  1009. gen code=id2006+string(revenue2006)+string(employment2006)+string(profit2006)+province2006
  1010. sort code
  1011. compress
  1012. saveold test1.dta, replace


  1013. **step 170: add 2007 from 2005-2006-2007 **


  1014. use unbalanced.2005-2006-2007.dta, clear
  1015. tab match_status_2005_2006_2007
  1016. keep if match_status_2005_2006_2007=="2005-2006-2007"|match_status_2005_2006_2007=="2006-2007 only"
  1017. gen code=id2006+string(revenue2006)+string(employment2006)+string(profit2006)+province2006
  1018. sort code
  1019. compress
  1020. saveold test2.dta, replace


  1021. use test1.dta, clear
  1022. merge code using test2.dta
  1023. tab _merge
  1024. drop _merge code
  1025. gen code=id2005+string(revenue2005)+string(employment2005)+string(profit2005)+province2005
  1026. sort code
  1027. compress
  1028. saveold test3.dta, replace

  1029. use unbalanced.2005-2006-2007.dta, clear
  1030. tab match_status_2005_2006_2007
  1031. keep if match_status_2005_2006_2007=="2005-2007 only"
  1032. gen code=id2005+string(revenue2005)+string(employment2005)+string(profit2005)+province2005
  1033. sort code
  1034. compress
  1035. saveold test4.dta, replace

  1036. use test3.dta, clear
  1037. merge code using test4.dta, update
  1038. tab _merge
  1039. drop code _merge
  1040. saveold test5.dta, replace

  1041. use test3.dta, clear
  1042. merge code using test4.dta, update replace
  1043. keep if _merge==5
  1044. keep *2007
  1045. compress
  1046. saveold test6.dta, replace

  1047. use unbalanced.2005-2006-2007.dta, clear
  1048. keep if match_status_2005_2006_2007=="2007 no match"
  1049. display _N
  1050. compress
  1051. saveold test7.dta, replace

  1052. use test5.dta, clear
  1053. append using test6.dta
  1054. append using test7.dta

  1055. drop match_status*
  1056. compress
  1057. saveold unbalanced.1998--2007.dta, replace




  1058. /*
  1059. /*erase files*/
  1060. local file_list: dir . files "*.dta"
  1061. foreach file of local file_list{
  1062.         if "`file'" == "unbalanced.1998--2007.dta"{
  1063.                 continue
  1064.         }
  1065.         disp "erase `file'"
  1066.         erase "`file'"
  1067. }
  1068. */
  1069. *use "unbalanced.1998--2007.dta",clear
  1070. keep id_in_source*
  1071. gen id_in_panel=_n
  1072. reshape long id_in_source, i(id_in_panel) j(year)
  1073. drop if id_in_source == .
  1074. sort id_in_panel year
  1075. saveold "PanelID_1998-2007.dta",replace
复制代码


回复 支持 反对

使用道具 举报

1

主题

2028

金钱

5243

积分

高级用户

 楼主| 发表于 2017-9-29 11:26:24 | 显示全部楼层
lym6w0 发表于 2017-9-12 09:08
一段工业企业数据库的匹配样本代码,亲测可用,你可以参考一下

好感谢   
回复 支持 反对

使用道具 举报

11

主题

39

金钱

175

积分

入门用户

发表于 2018-3-19 08:58:05 | 显示全部楼层
关于第二问,因为有重复,所以用merge会产生错误提示。可以在step70中的sort code下加入以下这行,去重。         duplicates drop code, force
回复 支持 反对

使用道具 举报

您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

关闭

站长推荐上一条 /1 下一条

客服中心
关闭
在线时间:
周一~周五
8:30-17:30
QQ群:
653541906
联系电话:
010-85786021-8017
在线咨询
客服中心

意见反馈|网站地图|手机版|小黑屋|EPS数据狗论坛 ( 京ICP备09019565号-3 )   

Powered by BFIT! X3.4

© 2008-2028 BFIT Inc.

快速回复 返回顶部 返回列表