Version: Drupal 7
Page to scrape: http://drupalgarden.cn/forum/1202.html
Code:
<?php
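// Create and save a node, then print its nid.
// Must run inside a bootstrapped Drupal 7 site, since it calls node_save().
function create_node($title, $uid, $body, $type) {
  $node = new stdClass();       // Create the object explicitly.
  $node->is_new = TRUE;
  $node->type = $type;
  node_object_prepare($node);   // Fill in Drupal's default node properties.
  $node->language = LANGUAGE_NONE;
  $node->title = $title;
  $node->uid = $uid;            // Set after node_object_prepare(), which assigns the current user.
  $node->body[LANGUAGE_NONE][0]['value'] = $body;
  node_save($node);
  print $node->nid;
}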
 
// Fetch a URL with cURL. Returns the response body, or, when $img is TRUE,
// the downloaded size in KB.
function _get_contents($url, $img = FALSE) {
  $dir = pathinfo($url);
  $host = $dir['dirname'];
  $refer = $host . '/';

  $ch = curl_init();
  $user_agent = "Baiduspider+(+http://www.baidu.com/search/spider.htm)"; // Pose as the Baidu spider.

  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_HEADER, 0);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
  curl_setopt($ch, CURLOPT_REFERER, $refer);
  curl_setopt($ch, CURLOPT_TIMEOUT, 800);
  curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);

  $file_contents = curl_exec($ch);
  if ($img == TRUE) {
    $content_length = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD);
    curl_close($ch);  // Close the handle on this branch as well.
    return round($content_length / 1024, 2);
  }
  curl_close($ch);
  return $file_contents;
}
$url = "http://drupalgarden.cn/forum/1202.html";
$content = _get_contents($url, FALSE);
preg_match_all('/<div class="forum-post-content">(.*?)<\/div>/is', $content, $a);
preg_match_all('/<h2[^>]*>(.*?)<\/h2>/is', $content, $t);
// Only create the node when both patterns matched something.
if (!empty($a[1][0]) && !empty($t[1][0])) {
  $body = $a[1][0];
  $title = trim(strip_tags($t[1][0]));
  $type = 'article';
  $uid = 1;
  create_node($title, $uid, $body, $type);
}
?>
From here you can wrap this in a foreach or for loop over the nid and scrape all of 龙马's forum in under five minutes. 龙马, please don't beat me up :)
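A rough sketch of what such a loop could look like. It assumes the forum pages are numbered consecutively and all follow the same `<id>.html` URL pattern as the page above, which the post does not confirm. The names build_forum_urls, extract_post and scrape_range are made up for this example, and fetching and saving are passed in as callbacks so the loop can be tried without a network connection or a Drupal site.

```php
<?php
// Hypothetical helper: build forum page URLs for a range of page ids.
// Assumes every page lives at <base>/<id>.html.
function build_forum_urls($base, $first, $last) {
  $urls = array();
  for ($id = $first; $id <= $last; $id++) {
    $urls[] = $base . '/' . $id . '.html';
  }
  return $urls;
}

// Hypothetical helper: pull the title and body out of one page's HTML.
// Returns NULL when either pattern fails to match.
function extract_post($html) {
  if (!preg_match('/<h2[^>]*>(.*?)<\/h2>/is', $html, $t)) {
    return NULL;
  }
  if (!preg_match('/<div class="forum-post-content">(.*?)<\/div>/is', $html, $a)) {
    return NULL;
  }
  return array('title' => trim(strip_tags($t[1])), 'body' => trim($a[1]));
}

// Hypothetical driver: fetch each URL, extract the post, and hand it to $save.
// Pages that cannot be fetched or parsed are skipped. Returns the number saved.
function scrape_range($urls, $fetch, $save, $delay = 0) {
  $saved = 0;
  foreach ($urls as $url) {
    $post = extract_post((string) call_user_func($fetch, $url));
    if ($post !== NULL) {
      call_user_func($save, $post['title'], $post['body']);
      $saved++;
    }
    if ($delay > 0) {
      sleep($delay);  // Pause between requests to go easy on the server.
    }
  }
  return $saved;
}

// Possible use inside Drupal, with _get_contents() and create_node() defined:
//   function save_article($title, $body) { create_node($title, 1, $body, 'article'); }
//   $urls = build_forum_urls('http://drupalgarden.cn/forum', 1200, 1202);
//   scrape_range($urls, '_get_contents', 'save_article', 1);
```

Skipping pages where the patterns do not match keeps the loop from saving empty nodes when an id does not exist or the markup differs.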
Only the nid increments; nothing else happens.
No new node is created. What's going on?
It worked.
I had to change a few bits of the code to get it working. How do I do the loop?
        
    
Why does the auto-incrementing nid have no effect?
Why is no new node created?