Setting Up Lucene Logging


Lucene can produce its own operational log; I just discovered this in the source code. Here is a log file I just generated:

 

IFD [Wed Dec 22 22:08:20 CST 2010; main]: setInfoStream deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@15dfd77
IW 0 [Wed Dec 22 22:08:20 CST 2010; main]: setInfoStream: dir=org.apache.lucene.store.SimpleFSDirectory@G:\package\lucene_test_dir lockFactory=org.apache.lucene.store.NativeFSLockFactory@1027b4d mergePolicy=org.apache.lucene.index.LogByteSizeMergePolicy@c55e36 mergeScheduler=org.apache.lucene.index.ConcurrentMergeScheduler@1ac3c08 ramBufferSizeMB=16.0 maxBufferedDocs=-1 maxBuffereDeleteTerms=-1 maxFieldLength=10000 index=
maxFieldLength 10000 reached for field contents, ignoring following tokens
... (the line above repeats for every additional document whose contents field hit the 10,000-token maxFieldLength limit) ...
IW 0 [Wed Dec 22 22:08:23 CST 2010; main]: optimize: index now 
IW 0 [Wed Dec 22 22:08:23 CST 2010; main]: flush: now pause all indexing threads
IW 0 [Wed Dec 22 22:08:23 CST 2010; main]:   flush: segment=_0 docStoreSegment=_0 docStoreOffset=0 flushDocs=true flushDeletes=true flushDocStores=false numDocs=104 numBufDelTerms=0
IW 0 [Wed Dec 22 22:08:23 CST 2010; main]:   index before flush 
IW 0 [Wed Dec 22 22:08:23 CST 2010; main]: DW: flush postings as segment _0 numDocs=104
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: DW:   oldRAMSize=2619392 newFlushedSize=1740286 docs/MB=62.663 new/old=66.439%
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: flushedFiles=[_0.nrm, _0.tis, _0.fnm, _0.tii, _0.frq, _0.prx]
IFD [Wed Dec 22 22:08:24 CST 2010; main]: now checkpoint "segments_1" [1 segments ; isCommit = false]
IFD [Wed Dec 22 22:08:24 CST 2010; main]: now checkpoint "segments_1" [1 segments ; isCommit = false]
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: LMP: findMerges: 1 segments
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: LMP:   level 6.2247195 to 6.2380013: 1 segments
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: CMS: now merge
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: CMS:   index: _0:C104->_0
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: CMS:   no more merges pending; now return
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: CMS: now merge
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: CMS:   index: _0:C104->_0
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: CMS:   no more merges pending; now return
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: now flush at close
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: flush: now pause all indexing threads
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]:   flush: segment=null docStoreSegment=_0 docStoreOffset=104 flushDocs=false flushDeletes=true flushDocStores=true numDocs=0 numBufDelTerms=0
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]:   index before flush _0:C104->_0
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]:   flush shared docStore segment _0
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: flushDocStores segment=_0
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: closeDocStores segment=_0
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: DW: closeDocStore: 2 files to flush to segment _0 numDocs=104
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: flushDocStores files=[_0.fdt, _0.fdx]
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: CMS: now merge
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: CMS:   index: _0:C104->_0
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: CMS:   no more merges pending; now return
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: now call final commit()
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: startCommit(): start sizeInBytes=0
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: startCommit index=_0:C104->_0 changeCount=3
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: now sync _0.nrm
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: now sync _0.tis
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: now sync _0.fnm
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: now sync _0.tii
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: now sync _0.frq
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: now sync _0.fdx
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: now sync _0.prx
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: now sync _0.fdt
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: done all syncs
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: commit: pendingCommit != null
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: commit: wrote segments file "segments_2"
IFD [Wed Dec 22 22:08:24 CST 2010; main]: now checkpoint "segments_2" [1 segments ; isCommit = true]
IFD [Wed Dec 22 22:08:24 CST 2010; main]: deleteCommits: now decRef commit "segments_1"
IFD [Wed Dec 22 22:08:24 CST 2010; main]: delete "segments_1"
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: commit: done
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: at close: _0:C104->_0

Next is my indexing class; the code is largely borrowed from the demo that ships with Lucene.

 

The Indexer class builds the index:

 

package my.firstest.copy;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.PrintStream;
import java.util.Date;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Indexer {
	private static File INDEX_DIR = new File("G:/package/lucene_test_dir");
	private static final File docDir = new File("G:/package/lucene_test_docs");

	public static void main(String[] args) throws Exception {
		if (!docDir.exists() || !docDir.canRead()) {
			System.out.println("索引的文件不存在!");
			System.exit(1);
		}
		File[] files = INDEX_DIR.listFiles();
		int fileCount = (files == null) ? 0 : files.length;
		if (fileCount != 0) {
			System.out.println("Old index files exist, deleting them first");
			for (int i = 0; i < fileCount; i++) {
				files[i].delete();
				System.out.println("File " + files[i].getAbsolutePath() + " is deleted!");
			}
		}
		Date start = new Date();
		IndexWriter writer = new IndexWriter(FSDirectory.open(INDEX_DIR),
				new StandardAnalyzer(Version.LUCENE_CURRENT), true,
				IndexWriter.MaxFieldLength.LIMITED);
		writer.setUseCompoundFile(false);
		//writer.setMergeFactor(2);
		writer.setInfoStream(new PrintStream(new File("G:/package/lucene_test_log/log.txt")));
	    System.out.println("MergeFactor -> "+writer.getMergeFactor());
	    System.out.println("maxMergeDocs -> "+writer.getMergeFactor());
		indexDocs(writer, docDir);
		writer.optimize();
		writer.close();
		Date end = new Date();
		System.out.println("takes "+(end.getTime() - start.getTime())
				+ "milliseconds");
	}

	protected static void indexDocs(IndexWriter writer, File file)
			throws IOException {
		if (file.canRead()) {
			if (file.isDirectory()) {
				String[] files = file.list();
				if (files != null) {
					for (int i = 0; i < files.length; i++) {
						indexDocs(writer, new File(file, files[i]));
					}
				}
			} else {
				System.out.println("adding " + file);
				try {
					writer.addDocument(FileDocument.Document(file));
				} catch (FileNotFoundException fnfe) {
					// file disappeared or cannot be opened; skip it
				}
			}
		}
	}

}

 

 FileDocument:

 

package my.firstest.copy;

import java.io.File;
import java.io.FileReader;
import org.apache.lucene.document.DateTools;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class FileDocument {

	public static Document Document(File f)
			throws java.io.FileNotFoundException {

		Document doc = new Document();
		doc.add(new Field("path", f.getPath(), Field.Store.YES,
				Field.Index.NOT_ANALYZED));
		doc.add(new Field("modified", DateTools.timeToString(f.lastModified(),
				DateTools.Resolution.MINUTE), Field.Store.YES,
				Field.Index.NOT_ANALYZED));
		doc.add(new Field("contents", new FileReader(f)));
		return doc;
	}
	private FileDocument() {
	}
}
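As a quick cross-check against the log (not part of the original post), a reader can be opened on the finished index to confirm that the document count matches the numDocs=104 reported above. A minimal sketch, assuming the same Lucene 2.9/3.0-era API and index path as the listings above (the IndexChecker class name is made up for illustration):

package my.firstest.copy;

import java.io.File;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.FSDirectory;

public class IndexChecker {
	public static void main(String[] args) throws Exception {
		File indexDir = new File("G:/package/lucene_test_dir");
		IndexReader reader = IndexReader.open(FSDirectory.open(indexDir));
		try {
			// Should agree with the "numDocs=104" lines in the log above.
			System.out.println("numDocs -> " + reader.numDocs());
			System.out.println("maxDoc  -> " + reader.maxDoc());
		} finally {
			reader.close();
		}
	}
}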
 

 

The key line is writer.setInfoStream(new PrintStream(new File("G:/package/lucene_test_log/log.txt")));

In Lucene's source code, many places are filled with blocks like this:

 

 

if (infoStream != null) {
  message("init: hit exception on init; releasing write lock");
}

 

The message method is:

 

public void message(String message) {
    if (infoStream != null)
      infoStream.println("IW " + messageID + " [" + new Date() + "; " + Thread.currentThread().getName() + "]: " + message);
  }

 

Here, infoStream is a field of IndexWriter:

 

private PrintStream infoStream = null;

 

If you never set this field, it stays null.

You can set it with writer.setInfoStream(PrintStream infoStream).

Once it is set, the log output is written automatically to the file you specified.
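One detail worth noting (not in the original post): the PrintStream opened in the Indexer above is never closed, and a PrintStream built from a File buffers its output, so the tail of the log could be lost if the process ends abruptly. Below is a minimal sketch, assuming the same Lucene 2.9/3.0-era API and the same paths as above (the LoggedIndexer class name is made up for illustration), that closes the stream only after the writer:

package my.firstest.copy;

import java.io.File;
import java.io.PrintStream;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class LoggedIndexer {
	public static void main(String[] args) throws Exception {
		// Paths assumed to match the original Indexer.
		File indexDir = new File("G:/package/lucene_test_dir");
		File docDir = new File("G:/package/lucene_test_docs");
		PrintStream infoStream =
				new PrintStream(new File("G:/package/lucene_test_log/log.txt"));
		try {
			IndexWriter writer = new IndexWriter(FSDirectory.open(indexDir),
					new StandardAnalyzer(Version.LUCENE_CURRENT), true,
					IndexWriter.MaxFieldLength.LIMITED);
			// This single call is what produces the IW/IFD/DW/CMS lines shown above.
			writer.setInfoStream(infoStream);
			try {
				Indexer.indexDocs(writer, docDir); // reuse the recursive walker above
				writer.optimize();
			} finally {
				writer.close();
			}
		} finally {
			infoStream.close(); // flush any buffered log messages
		}
	}
}

Closing the writer before the stream matters: as the log shows, IndexWriter.close() still emits several messages (the final commit and sync lines), so the PrintStream has to outlive it.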

 
