Ben bir süre oturdu ve sadece ilgi uğruna, tam tüylenmek FSM ayrıştırıcı yazdı.
Bu regex ile görmek olası değil bir kaç özellikleri vardır (en azından PHP altında, Perl özyinelemeli regex ile yapabilirdi, ama PHP, henüz bu özelliği yok).
- Akıllı ve yığın tabanlı dirsek ayrıştırma
- AnyBracket Destek
- Modüler
- Genişletilebilir.
- Sözdizimi yanlış olduğunda, nerede biraz söyleyebilirim.
Verilen orada bir kod butload burada ve bunun bir sürü biraz bizzare ve yeni kodlayıcılar da kıvrık olduğunu, ancak, onun oldukça harika şeyler ne açısından.
Onun değil bir bitmiş ürün, sadece Someting ben birlikte attı, ancak çalışıyor ve ben bulabilirim herhangi bir hata yok.
Ben Normalde İstisnalar ve etajer kullanmak daha iyi olurdu yerlerde bir sürü 'die' var, yani bir temizleme ve Refactor sadece yayıyoruz önce tercih olacaktır.
Bu yorumlama makul bir miktar var, ama sonlu-durum-işleme anlamak zor olurdu ayrıca asıl mesele comment olsaydı ben hissediyorum.
# Pretty Colour Debug of the tokeniser in action.
# Uncomment to use.
function debug( $title, $stream, $msg, $remaining ){
# print chr(27) ."[31m$title" . chr(27) ."[0m\n";
# print chr(27) ."[33min:$stream" . chr(27) ."[0m\n";
# print chr(27) ."[32m$msg" . chr(27) ."[0m\n";
# print chr(27) ."[34mstream:$remaining" . chr(27) ."[0m\n\n";
}
# Simple utility to store a captured part of the stream in one place
# and the remainder somewhere else
# Wraps most the regexy stuff
# Insprired by some Perl Regex Parser I found.
function get_token( $regex, $input ){
$out = array(
'success' => false,
'match' => '',
'rest' => ''
);
if( !preg_match( '/^' . $regex . '/' , $input, $matches ) ){
die("Could not match $regex at start of $input ");
#return $out; # error condition, not matched.
}
$out['match'] = $matches[1];
$out['rest'] = substr( $input, strlen( $out['match'] ) );
$out['success'] = true;
debug( 'Scan For Token: '. $regex , $input, "matched: " . $out['match'] , $out['rest'] );
return $out;
}
function skip_space( $input ){
return get_token('(\s*)', $input );
}
# Given $input and $opener, find
# the data stream that occurs until the respecive closer.
# All nested bracket sets must be well balanced.
# No 'escape code' implementation has been done (yet)
# Match will contain the contents,
# Rest will contain unprocessed part of the string
# []{}() and bracket types are currently supported.
function close_bracket( $input , $opener ){
$out = array(
'success' => false,
'match' => '',
'rest' => ''
);
$map = array( '(' => ')', '[' => ']', '{' => '}', chr(60) => '>' );
$nests = array( $map[$opener] );
while( strlen($input) > 0 ){
$d = get_token( '([^()\[\]{}' . chr(60). '>]*?[()\[\]{}' . chr(60) . '>])', $input );
$input = $d['rest'];
if( !$d['success'] ){
debug( 'Scan For ) Bailing ' , $input, "depth: $nests, matched: " . $out['match'] , $out['rest'] );
$out['match'] .= $d['match'];
return $out; # error condition, not matched. brackets are imbalanced.
}
# Work out which of the 4 bracket types we got, and
# Which orientation it is, and then decide if were going up the tree or down it
end($nests);
$tail = substr( $d['match'], -1, 1 );
if( $tail == current($nests) ){
array_pop( $nests );
} elseif ( array_key_exists( $tail, $map ) ){
array_push( $nests, $map[$tail] );
} else {
die ("Error. Bad bracket Matching, unclosed/unbalanced/unmatching bracket sequence: " . $out['match'] . $d['match'] );
}
$out['match'] .= $d['match'] ;
$out['rest' ] = $d['rest'];
debug( 'Scan For ) running' , $input, "depth: $nests, matched: " . $out['match'] , $out['rest'] );
if ( count($nests) == 0 ){
# Chomp off the tail bracket to just get the body
$out['match'] = substr( $out['match'] , 0 , -1 );
$out['success'] = true;
debug( 'Scan For ) returning ' , $input, "matched: " . $out['match'] , $out['rest'] );
return $out;
}
else {
}
}
die('Scan for closing ) exhausted buffer while searching. Brackets Missmatched. Fix this: \'' . $out['match'] . '\'');
}
# Given $function_name and $input, expects the form fnname(data)
# 'data' can be any well balanced bracket sequence
# also, brackets used for functions in the stream can be any of your choice,
# as long as you're consistent. fnname[foo] will work.
function parse_function_body( $input, $function_name ){
$out = array (
'success' => false,
'match' => '',
'rest' => '',
);
debug( 'Parsing ' . $function_name . "()", $input, "" , "" );
$d = get_token( "(" . $function_name . '[({\[' . chr(60) . '])' , $input );
if ( !$d['success'] ){
die("Doom while parsing for function $function_name. Not Where its expected.");
}
$e = close_bracket( $d['rest'] , substr($d['match'],-1,1) );
if ( !$e['success'] ){
die("Found Imbalanced Brackets while parsing for $function_name, last snapshot was '" . $e['match'] . "'");
return $out; # inbalanced brackets for function
}
$out['success'] = true;
$out['match'] = $e['match'];
$out['rest'] = $e['rest'];
debug( 'Finished Parsing ' . $function_name . "()", $input, 'body:'. $out['match'] , $out['rest'] );
return $out;
}
function parse_query( $input ){
$eat = skip_space( $input );
$get = parse_function_body( $eat['rest'] , 'get' );
if ( !$get['success'] ){
die("Get Token Malformed/Missing, instead found '" . $eat['rest'] . "'");
}
$eat = skip_space( $get['rest'] );
$where = parse_function_body( $eat['rest'], 'where' );
if ( !$where['success'] ){
die("Where Token Malformed/Missing, instead found '" . $eat['rest'] . "'");
}
$eat = skip_space( $where['rest'] );
$sort = parse_function_body( $eat['rest'], 'sort' );
if( !$sort['success'] ){
die("Sort Token Malformed/Missing, instead found '" . $eat['rest'] . "'");
}
return array(
'get' => $get['match'],
'where' => $where['match'],
'sort' => $sort['match'],
'_Trailing_Data' => $sort['rest'],
);
}
$structure = parse_query("get[max(fieldname1),min(fieldname2),fieldname3]where(something=something) sort(fieldname2 asc)");
print_r($structure);
$structure = parse_query("get(max(fieldname1),min(fieldname2),fieldname3)where(something=something) sort(fieldname2 asc)");
print_r($structure);
$structure = parse_query("get{max(fieldname1),min(fieldname2),fieldname3}where(something=something) sort(fieldname2 asc)");
print_r($structure);
$structure = parse_query("get" . chr(60) . "max(fieldname1),min(fieldname2),fieldname3" . chr(60). "where(something=something) sort(fieldname2 asc)");
print_r($structure);
All of the above print_r($structure) lines should produce this:
Array
(
[get] => max(fieldname1),min(fieldname2),fieldname3
[where] => something=something
[sort] => fieldname2 asc
[_Trailing_Data] =>
)