How to quickly and accurately read and analyze very large log files (over 100 MB)? - C# Tutorial - 爱易网页


Date: 2014-05-18

How can I quickly and accurately read and analyze a very large log file (over 100 MB)? Waiting online~~~
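I need to build a log analysis tool. I don't have the log file format yet, but I'm told the files may be larger than 100 MB.
Suppose the format looks like this: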

2007-11-01 18:20:42,983 [4520] INFO GetXXX() SERVICE START
2007-11-01 18:21:42,983 [4520] WARNING Some error is about to occur
2007-11-01 18:22:42,983 [4520] ERROR Some error occurred
2007-11-01 18:23:59,968 [4520] INFO Program finished
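
A minimal sketch of turning one line of the format above into a record, assuming the layout is exactly "timestamp [thread] LEVEL message" with a fixed-width 23-character timestamp (the real format is still unknown). LogEntry and LogLineParser.Parse are hypothetical names, and the record syntax needs C# 9 or later:

```csharp
using System;
using System.Globalization;

// Hypothetical record for one line such as
// "2007-11-01 18:20:42,983 [4520] INFO GetXXX() SERVICE START"
public sealed record LogEntry(DateTime Time, int ThreadId, string Level, string Message);

public static class LogLineParser
{
    // Assumed fixed-width timestamp prefix: "yyyy-MM-dd HH:mm:ss,fff" is 23 chars.
    private const string TimeFormat = "yyyy-MM-dd HH:mm:ss,fff";
    private const int TimeLength = 23;

    // Returns null for lines that do not fit the assumed format
    // (e.g. blank lines or continuation lines of a stack trace).
    public static LogEntry? Parse(string line)
    {
        if (line.Length < TimeLength ||
            !DateTime.TryParseExact(line.Substring(0, TimeLength), TimeFormat,
                CultureInfo.InvariantCulture, DateTimeStyles.None, out DateTime time))
            return null;

        // Thread id: the text between the first '[' and ']' after the timestamp.
        int open = line.IndexOf('[', TimeLength);
        int close = open < 0 ? -1 : line.IndexOf(']', open + 1);
        if (close < 0 ||
            !int.TryParse(line.Substring(open + 1, close - open - 1), out int threadId))
            return null;

        // Level: the next space-delimited token; the remainder is the message.
        string rest = line.Substring(close + 1).TrimStart();
        int space = rest.IndexOf(' ');
        string level = space < 0 ? rest : rest.Substring(0, space);
        if (level.Length == 0) return null;
        string message = space < 0 ? string.Empty : rest.Substring(space + 1);
        return new LogEntry(time, threadId, level, message);
    }
}
```

Parsing the fixed-width timestamp with TryParseExact avoids running a regex on every line.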

Say I need to count how many errors, how many warnings, etc. occurred on a given day.
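
For per-day counts, one sequential pass is often enough for a file of this size. The sketch below assumes the sample format, UTF-8 text (a GBK-encoded log would need a different Encoding), and that the level is the first token after "]"; DailyLevelCounter is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class DailyLevelCounter
{
    // Counts lines per (day, level) in one sequential pass. Assumes each line
    // starts with "yyyy-MM-dd HH:mm:ss,fff [thread] LEVEL ..." as in the sample.
    public static Dictionary<(string Day, string Level), int> Count(string path)
    {
        var counts = new Dictionary<(string Day, string Level), int>();
        // A 1 MB buffer plus SequentialScan keeps disk access large and sequential.
        using var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
            FileShare.ReadWrite, bufferSize: 1 << 20, options: FileOptions.SequentialScan);
        using var reader = new StreamReader(stream, Encoding.UTF8);

        string? line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length < 10) continue;             // blank or malformed line
            int close = line.IndexOf(']');
            if (close < 0) continue;                    // no "[thread]" part
            string rest = line.Substring(close + 1).TrimStart();
            int space = rest.IndexOf(' ');
            string level = space < 0 ? rest : rest.Substring(0, space);
            var key = (line.Substring(0, 10), level);   // e.g. ("2007-11-01", "ERROR")
            counts[key] = counts.TryGetValue(key, out int n) ? n + 1 : 1;
        }
        return counts;
    }
}
```

It is worth timing this baseline before adding threads; if it already runs at disk speed, extra threads have little to gain.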

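My current idea is to use two threads, one reading the file and one analyzing it,
sharing a 10-line buffer: read 10 lines, analyze 10 lines,
with reading and analysis synchronized using the producer-consumer pattern.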
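
The producer-consumer idea can be sketched with .NET's BlockingCollection<T> (.NET 4.0+), which handles the blocking and the "no more data" signal. PipelineCounter, the batch size and the "] LEVEL " matching rule are illustrative assumptions; batches much larger than 10 lines cut the synchronization overhead:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class PipelineCounter
{
    // Producer: reads lines into batches. Consumer: scans each batch for "] LEVEL ".
    public static int CountLevel(string path, string level, int batchSize = 10_000)
    {
        // Bounded to 4 batches so the reader cannot run far ahead of the analyzer.
        using var batches = new BlockingCollection<List<string>>(boundedCapacity: 4);

        Task producer = Task.Run(() =>
        {
            try
            {
                var batch = new List<string>(batchSize);
                foreach (string line in File.ReadLines(path))
                {
                    batch.Add(line);
                    if (batch.Count == batchSize)
                    {
                        batches.Add(batch);                 // blocks while the queue is full
                        batch = new List<string>(batchSize);
                    }
                }
                if (batch.Count > 0) batches.Add(batch);
            }
            finally
            {
                batches.CompleteAdding();                   // ends the consumer's loop
            }
        });

        int count = 0;
        string token = "] " + level + " ";
        foreach (List<string> batch in batches.GetConsumingEnumerable())
            foreach (string line in batch)
                if (line.Contains(token)) count++;

        producer.Wait();                                    // surfaces any read error
        return count;
    }
}
```

If the per-line analysis is cheap, the consumer mostly waits on the disk, so the second thread may gain little.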

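The problem is that this still doesn't feel very efficient.
1. Can I open several threads to read the log file in chunks at the same time, e.g. five threads, splitting the file into five parts to read and analyze?
If so, how should they be synchronized, and how is multithreaded chunked file reading implemented?
2. I don't know much about file I/O. Which reads a file faster, BinaryReader or BufferedReader,
and which is more convenient for building my data structures?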
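
On question 1: several workers can each read a byte range, provided the boundaries are moved to line starts so no line is split or counted twice. Each worker opens its own FileStream, so no locking is needed, and per-range counts are summed at the end. On question 2: .NET has no BufferedReader (that is a Java class); StreamReader is the usual choice for text lines, while BinaryReader is meant for binary-formatted data. This sketch works on raw bytes and assumes UTF-8 text; ParallelChunkCounter is a hypothetical name, and on a single spinning disk parallel seeks may well be slower than one sequential read:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

public static class ParallelChunkCounter
{
    // Splits the file into `parts` byte ranges. Each interior boundary is moved
    // forward to just after the next '\n', so every line lies in exactly one range.
    public static long[] LineAlignedBoundaries(string path, int parts)
    {
        using var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read);
        long length = fs.Length;
        var bounds = new long[parts + 1];
        bounds[parts] = length;
        for (int i = 1; i < parts; i++)
        {
            long pos = Math.Max(length * i / parts, bounds[i - 1]);
            fs.Seek(pos, SeekOrigin.Begin);
            int b;
            while ((b = fs.ReadByte()) != -1 && b != '\n') { }  // finish the current line
            bounds[i] = fs.Position;                             // next line start, or EOF
        }
        return bounds;
    }

    // Counts lines containing "] LEVEL ", processing the ranges in parallel.
    public static int CountLevel(string path, string level, int parts = 5)
    {
        long[] bounds = LineAlignedBoundaries(path, parts);
        string token = "] " + level + " ";
        var counts = new int[parts];                             // one slot per range, no locks

        Parallel.For(0, parts, i =>
        {
            int size = checked((int)(bounds[i + 1] - bounds[i]));
            var buffer = new byte[size];
            int read = 0;
            using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
            {
                fs.Seek(bounds[i], SeekOrigin.Begin);
                int n;
                while (read < size && (n = fs.Read(buffer, read, size - read)) > 0) read += n;
            }
            // Splitting on '\n' is safe: in UTF-8 that byte never occurs inside a multi-byte character.
            string text = Encoding.UTF8.GetString(buffer, 0, read);
            foreach (string line in text.Split('\n'))
                if (line.Contains(token)) counts[i]++;
        });
        return counts.Sum();
    }
}
```

Each worker holds its whole range in memory (about 20 MB each for 100 MB split five ways); a streaming reader per range would avoid that.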

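That's all I can think of for speeding up the analysis so far. Does anyone have better ideas or suggestions? A ready-made example would be even better :)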

------Solution--------------------
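Use a regex, e.g. Regex.Matches(text, "2007-11-01").

That finds every line logged on 2007-11-01; add the level (ERROR, WARNING, ...) to the pattern and you know how many errors occurred that day, and which ones.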
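
A sketch of the regex approach with the level captured in the pattern, matched line by line instead of over the whole file in one string. It assumes the sample format; RegexDayCounter is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class RegexDayCounter
{
    // Counts lines per level for one day. The date alone would match every line
    // of that day, so the pattern also captures the level after "[thread]".
    public static Dictionary<string, int> CountByLevel(string path, string day)
    {
        var pattern = new Regex(
            "^" + Regex.Escape(day) + @" \S+ \[\d+\] (?<level>[A-Z]+)\b",
            RegexOptions.Compiled | RegexOptions.CultureInvariant);
        var counts = new Dictionary<string, int>();

        // Match line by line rather than loading 100+ MB into a single string.
        foreach (string line in File.ReadLines(path))
        {
            Match m = pattern.Match(line);
            if (!m.Success) continue;
            string level = m.Groups["level"].Value;
            counts[level] = counts.TryGetValue(level, out int n) ? n + 1 : 1;
        }
        return counts;
    }
}
```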
------Solution--------------------
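When searching a very large file, avoid frequent small reads: disk access is what costs the most time.
So load the file into memory in chunks and search each chunk.
Also, because the timestamps are in order, binary search can speed things up further.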
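
The binary-search idea can be sketched for the question's format: bisect on byte offsets, skip the partial line at each probe, compare that line's timestamp, then scan the small remaining window linearly. It assumes lines are in time order and start with "yyyy-MM-dd HH:mm:ss,fff"; SortedLogSearch and the 64 KB window are hypothetical choices:

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

public static class SortedLogSearch
{
    private const int StampLength = 23;              // "2007-11-01 18:20:42,983"
    private const long LinearWindow = 64 * 1024;     // stop bisecting below this range

    // Byte offset of the first line whose timestamp is >= target,
    // or the file length if there is none. Lines must be in time order.
    public static long FindFirstAtOrAfter(string path, DateTime target)
    {
        using var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read);
        long length = fs.Length;
        long lo = 0;       // a line start; every line before it is < target
        long hi = length;

        while (hi - lo > LinearWindow)
        {
            long mid = lo + (hi - lo) / 2;
            long start = NextLineStart(fs, mid);     // skip the partial line at mid
            DateTime? t = start < length ? ReadStamp(fs, start) : null;
            if (t.HasValue && t.Value < target) lo = start;
            else hi = mid;
        }

        // Linear scan of what is left, starting from a known line start.
        for (long pos = lo; pos < length; pos = NextLineStart(fs, pos + 1))
        {
            DateTime? t = ReadStamp(fs, pos);
            if (t.HasValue && t.Value >= target) return pos;
        }
        return length;
    }

    // Smallest line start >= pos: 0, or the byte right after a '\n'.
    private static long NextLineStart(FileStream fs, long pos)
    {
        if (pos == 0) return 0;
        fs.Seek(pos - 1, SeekOrigin.Begin);
        int b;
        while ((b = fs.ReadByte()) != -1 && b != '\n') { }
        return fs.Position;
    }

    // Parses the fixed-width timestamp at the start of a line, or returns null.
    private static DateTime? ReadStamp(FileStream fs, long lineStart)
    {
        fs.Seek(lineStart, SeekOrigin.Begin);
        var buf = new byte[StampLength];
        int n = 0, r;
        while (n < buf.Length && (r = fs.Read(buf, n, buf.Length - n)) > 0) n += r;
        string s = Encoding.ASCII.GetString(buf, 0, n);
        if (DateTime.TryParseExact(s, "yyyy-MM-dd HH:mm:ss,fff",
                CultureInfo.InvariantCulture, DateTimeStyles.None, out DateTime t))
            return t;
        return null;
    }
}
```

The returned offset can be passed to Seek before streaming the rest of the day, so only the relevant part of the file is read.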
------Solution--------------------
Here is some code I wrote a while back, for reference:

C# code

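using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Search a very large log file for a given time
// Sample lines: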
//<166>Mar 31 2007 23:38:50: %PIX-6-302013: Built outbound TCP connection 731528465 for outside:62.241.53.2/443 (62.241.53.2/443) to inside:10.65.160.105/2918 (61.167.117.238/35049)
//
//<167>Mar 31 2007 23:38:50: %PIX-7-710005: UDP request discarded from 10.65.156.20/137 to inside:10.65.255.255/netbios-ns
//
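string vFileName = @"C:\temp\sunday2007-04-01.log"; // file to search
DateTime vDateTime = DateTime.Parse("Apr 01 2007 01:09:25",
    System.Globalization.CultureInfo.InvariantCulture); // time to search for (culture-independent parse)
byte[] vBuffer = new byte[0x1000]; // 4 KB read buffer
int vReadLength; // number of bytes actually read
long vCurrPostion; // current probe position
long vBeginPostion; // start of the search range
long vEndPostion; // end of the search range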

using FileStream vFileStream = new FileStream(vFileName, FileMode.Open, FileAccess.Read); // closed at end of scope (C# 8+)
vBeginPostion = 0;
vEndPostion = vFileStream.Length;
while (true)
{
    vCurrPostion = vBeginPostion + (vEndPostion - vBeginPostion) / 2; // recompute the probe position (middle of the range)
    vFileStream.Seek(vCurrPostion, SeekOrigin.Begin);

    vReadLength = vFileStream.Read(vBuffer, 0, vBuffer.Length);
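    string vText = Encoding.ASCII.GetString(vBuffer, 0, vReadLength);
    // The probe usually lands mid-line; the regex finds the first full timestamp after it
    Match vMatch = Regex.Match(vText, 
        @"(\r\n)?<\d+>(?<datetime>\w+ \d+ \d+ \d+:\d+:\d+):");
    if (!vMatch.Success) break; // no timestamp found in this chunk
    DateTime vTempTime = DateTime.Parse(vMatch.Groups["datetime"].Value,
        System.Globalization.CultureInfo.InvariantCulture);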
    if (vTempTime == vDateTime)
    {
        vBeginPostion = vCurrPostion;
        vEndPostion = vCurrPostion;
    }
    else if (vDateTime > vTempTime)
    {
        vBeginPostion = vCurrPostion; // the time here is earlier than the target: search later in the file
    }
    else
    {
        vEndPostion = vCurrPostion; // the time here is later than the target: search earlier in the file
    }
    if (vEndPostion - vBeginPostion < 0x1000) break;
}

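vCurrPostion = Math.Min(vBeginPostion, vEndPostion); // approximate position found
// Scan backward (toward the start of the file) to the first line with the target time
string vTemp = string.Empty; // text carried across the chunk boundary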
vBeginPostion = Math.Max(vCurrPostion - 0x1000, 0);
vEndPostion = vBeginPostion + 0x1000;
while (true)
{
    bool vLoop = false; // whether to keep looping
    vFileStream.Seek(vBeginPostion, SeekOrigin.Begin);
    vReadLength = vFileStream.Read(vBuffer, 0, vBuffer.Length);
    string vText = Encoding.ASCII.GetString(vBuffer, 0, vReadLength) + vTemp;
    MatchCollection vMatches = Regex.Matches(vText,
        @"(\r\n)?<\d+>(?<datetime>\w+ \d+ \d+ \d+:\d+:\d+):[^\r\n]+\r\n");
    if (vMatches.Count <= 0) break;
    for (int i = 0; i < vMatches.Count; i++)
    {
        DateTime vTempTime = DateTime.Parse(vMatches[i].Groups["datetime"].Value,
            System.Globalization.CultureInfo.InvariantCulture);
        if (vTempTime == vDateTime)
        {       
            if (i == 0 && vBeginPostion > 0)
            {
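                // The first line in this chunk already has the target time,
                // so keep scanning toward the start of the file
                if (vBeginPostion - 0x1000 >= 0)
                {
                    // Carry the head of this chunk into the earlier read so a line split
                    // across the boundary can still match; guard against short text
                    vTemp = vText.Substring(0, Math.Min(180, vText.Length));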
                    vBeginPostion = vBeginPostion - 0x1000;
                    vLoop = true;
                }
