[펌글] StringList에 있는 동일 문자열 갯수 세기 또는 StringList 검색 속도 증가
절름발이 프로그래머/Delphi / 2009. 7. 13. 01:41
http://www.matsgefvert.se/blog/archives/651
에서 가져온 글입니다.
StringList상에 있는 동일한 문자열 개수 세기 샘플 코드와
StringList의 검색속도를 증가시키는 방법이 나와있습니다.
----------------------------------------------------------------------------------
Speeding Up Delphi’s TStringList.IndexOf
1 June 2009, 15:33 ? Music, Software Development
Delphi’s TStringList is one of the objects I love the most. If it’s sorted (StringList.Sorted := true) then you can use it to parse huge chunks of data quickly.
For instance, looping through an enormous amount of IP addresses and keeping count of how many times they appeared, is easily done using the following code (not compiled or checked for syntax errors):
ls := TStringList.Create;
ls.Sorted := true;
for ip in ipAddresses do begin
n := ls.IndexOf(ip);
if n = -1 then
ls.AddObject(ip, TObject(1))
else
ls.Objects[n] := TObject(Integer(ls.Objects[n]) + 1);
end;
It’s very efficient. Since TStringList.IndexOf always does a binary search, it operates in log2(n) time, and using Objects as integers allows us to keep track of count without messing with the string data.
But there are things we can do to speed it up. For instance, TStringList.IndexOf relies on TStringList.Find, which itself uses AnsiCompareStr, which is a slow Windows call, taking locale and its mother into consideration. Overriding this with our own method should be worthwhile. (The code below is adapted straight from the Classes unit.)
type
TStringListEx = class(TStringList)
public
function Find(const S: string; var Index: Integer): Boolean; override;
end;
function TStringListEx.Find(const S: string; var Index: Integer): Boolean;
var
L, H, I, C: Integer;
begin
Result := False;
L := 0;
H := Count - 1;
while L <= H do
begin
I := (L + H) shr 1;
C := CompareStr(Get(I), S);
if C < 0 then L := I + 1 else
begin
H := I - 1;
if C = 0 then
begin
Result := True;
if Duplicates <> dupAccept then L := I;
end;
end;
end;
Index := L;
end;
We’ve replaced AnsiCompareStr with Delphi’s own CompareStr, which is a highly optimized FastCode routine. There are some drawbacks ? things will always be sorted in byte order and no case-sensitivity is done. But we don’t care about this ? it can always be done afterwards; right now, speed is the main importance.
And it turns out that using the above code, in pure examples, can slash execution time with up to about 80%. Dramatic savings, indeed. In my own example, where I analyze ftp log data, I managed to cut execution time on 122 MB of data down from 7 seconds down to 3.1 seconds.
Best of all, since TStringList.Find is declared virtual, we don’t need to change any types anywhere, just do a TStringListEx.Create instead of a TStringList.Create and you’re good to go.