Discussion:
[Haskell-cafe] parsec manyTill documentation question
Peter Schmitz
2010-09-25 03:46:43 UTC
I am new to parsec and having difficulty understanding the
explanation of manyTill in
http://legacy.cs.uu.nl/daan/download/parsec/parsec.html.

(I really appreciate having this doc by the way; great reference.)
manyTill :: GenParser tok st a -> GenParser tok st end -> GenParser tok st [a]
(manyTill p end) applies parser p zero or more times until parser
end succeeds. Returns the list of values returned by p. This parser
can be used to scan comments:
simpleComment = do{ string "<!--"
                  ; manyTill anyChar (try (string "-->"))
                  }
Note the overlapping parsers anyChar and string "<!--", and
therefore the use of the try combinator.
First, I would have expected it to instead say:

Note the overlapping parsers anyChar and string "-->", ...

since anyChar begins reading input beginning with the char *after*
string "<!--". Use of anyChar here will potentially overlap with
what it is reading towards: (string "-->").

Second, manyTill, by definition, keeps applying p (anyChar) until
end (string "-->") is satisfied, so I would expect one could just
write:

manyTill anyChar (string "-->")

Assuming the documentation is correct on both counts, I would really
appreciate any explanation someone could offer.

Thanks very much, (really like Haskell & parsec)
-- Peter


(If anyone knows of a collection of parsec demos or good examples, I
would appreciate a link; thanks)
Evan Laforge
2010-09-25 04:30:18 UTC
[ sorry, forgot reply to all ]
Post by Peter Schmitz
simpleComment = do{ string "<!--"
                  ; manyTill anyChar (try (string "-->"))
                  }
Note the overlapping parsers anyChar and string "<!--", and
therefore the use of the try combinator.
First, I would have expected it to instead say:
Note the overlapping parsers anyChar and string "-->", ...
Yes, I think the doc just made a mistake there.  In fact, it looks
like the same mistake is in the current doc at
http://hackage.haskell.org/packages/archive/parsec/3.1.0/doc/html/Text-Parsec-Combinator.html
Post by Peter Schmitz
Second, manyTill, by definition, keeps applying p (anyChar) until
end (string "-->") is satisfied, so I would expect one could just
write:
manyTill anyChar (string "-->")
The problem is that "-->" has multiple characters.  So if you have
"-not end comment", it will match the '-' against the (string "-->").
Since it doesn't backtrack by default, it's committed now and will
fail when it hits the 'n'.  The 'try' will make it backtrack to
'anyChar' when the second '-' of "-->" fails to match.
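
The behaviour described above can be checked with a small sketch. It assumes the parsec 3 module layout (Text.Parsec, Text.Parsec.String); the names commentNoTry, sample and run, and the sample input itself, are made up for illustration and are not from the manual. manyTill tries the end parser first at every position, so a partial match of "-->" that has already consumed a '-' fails the whole parse unless it is wrapped in try.

```haskell
import Text.Parsec (anyChar, manyTill, parse, string, try)
import Text.Parsec.String (Parser)

-- The comment parser from the manual: `try` undoes a partial match of
-- "-->" so that anyChar can consume the character instead.
simpleComment :: Parser String
simpleComment = do
  _ <- string "<!--"
  manyTill anyChar (try (string "-->"))

-- Hypothetical variant without `try`, for comparison only.
commentNoTry :: Parser String
commentNoTry = do
  _ <- string "<!--"
  manyTill anyChar (string "-->")

-- Made-up sample input with a lone '-' inside the comment body.
sample :: String
sample = "<!-- a-b -->"

-- Helper (an assumption of this sketch): discard the error message and
-- keep only success or failure.
run :: Parser a -> String -> Maybe a
run p = either (const Nothing) Just . parse p ""

withTry :: Maybe String
withTry = run simpleComment sample

withoutTry :: Maybe String
withoutTry = run commentNoTry sample
```

In GHCi, withTry evaluates to Just " a-b " and withoutTry to Nothing: without try, string "-->" consumes the '-' before 'b', fails on 'b', and manyTill never falls back to anyChar.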
Post by Peter Schmitz
(If anyone knows of a collection of parsec demos or good examples, I
would appreciate a link; thanks)
I thought the parsec source included some example parsers for simple
languages?  In any case, there is lots of material floating around,
though I found parsec so intuitive and the docs so good that I just
started hacking.  I think the 'build scheme in haskell' tutorial uses
parsec for the parsing.
Stephen Tetley
2010-09-25 07:34:36 UTC
Post by Evan Laforge
I thought the parsec source included some example parsers for simple
languages?  In any case, there is lots of material floating around,
[Snip]
The best documentation is Daan Leijen's original manual, plus the
original source distribution which has example parsers for Henk, Tiger
and Mondrian.

Both are available from here - the original poster was working with
the HTML version of the manual - there is also a PDF version:

http://legacy.cs.uu.nl/daan/parsec.html

It would be nice if the Hackage package added the examples back into
the distribution.

The parser in the Scheme in 48 hours tutorial isn't a great example of
Parsec as it doesn't use the Token module. Not using the Token module
means the Scheme parser does hacky things such as parseNumber, which
uses /read/ - this is double work: Parsec already handles numbers, so
it doesn't need to call out to another parser (Haskell's builtin read).


http://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours/Parsing
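
As a rough illustration of the Token module point, here is a minimal sketch that lets Parsec parse the number itself instead of collecting digits and calling read. It assumes the parsec 3 module names (Text.Parsec.Token, Text.Parsec.Language) and uses emptyDef as a placeholder language definition; the names lexer, number and run are hypothetical and are not taken from the tutorial.

```haskell
import Text.Parsec (parse)
import Text.Parsec.Language (emptyDef)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as Tok

-- A token parser built from an empty language definition
-- (no comments, no reserved words); a real parser would supply its own.
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser emptyDef

-- `integer` parses an optionally signed integer literal and skips
-- trailing whitespace, so no call to `read` is needed.
number :: Parser Integer
number = Tok.integer lexer

-- Helper (an assumption of this sketch): keep only success or failure.
run :: Parser a -> String -> Maybe a
run p = either (const Nothing) Just . parse p ""
```

For example, run number "42" gives Just 42 and run number "abc" gives Nothing.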
Peter Schmitz
2010-09-27 21:38:47 UTC
Stephen,
Thanks much for the pointer to the examples in the sources; found them.
(It's nice to learn from the coding style used by the authors.)
-- Peter

Peter Schmitz
2010-09-27 21:22:19 UTC
Post by Evan Laforge
Post by Peter Schmitz
simpleComment = do{ string "<!--"
                  ; manyTill anyChar (try (string "-->"))
                  }
Note the overlapping parsers anyChar and string "<!--", and
therefore the use of the try combinator.
First, I would have expected it to instead say:
Note the overlapping parsers anyChar and string "-->", ...
Yes, I think the doc just made a mistake there.  In fact, it looks
like the same mistake is in the current doc at
http://hackage.haskell.org/packages/archive/parsec/3.1.0/doc/html/Text-Parsec-Combinator.html
Evan,
Thanks very much for the typo confirmation, the explanation about
backtracking below, and the tip about the source distribution for the
examples.
I need to remember that multi-character strings imply backtracking, and
that backtracking is not the default, hence "try". Thanks.
-- Peter
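
The same rule applies to any choice between multi-character strings, not only to manyTill. A small sketch, assuming the parsec 3 module names; the parsers keyword and keywordNoTry, the helper run, and the strings "let" and "lambda" are made-up examples chosen only because they share a first character.

```haskell
import Text.Parsec (parse, string, try, (<|>))
import Text.Parsec.String (Parser)

-- With `try`, a failed partial match of "let" is rewound, so the
-- second alternative sees the input from the start.
keyword :: Parser String
keyword = try (string "let") <|> string "lambda"

-- Without `try`, string "let" consumes the 'l' of "lambda" before
-- failing, and <|> does not try the second alternative.
keywordNoTry :: Parser String
keywordNoTry = string "let" <|> string "lambda"

-- Helper (an assumption of this sketch): keep only success or failure.
run :: Parser a -> String -> Maybe a
run p = either (const Nothing) Just . parse p ""
```

Here run keyword "lambda" gives Just "lambda", while run keywordNoTry "lambda" gives Nothing; both parsers accept "let".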
Post by Evan Laforge
Post by Peter Schmitz
Second, manyTill, by definition, keeps applying p (anyChar) until
end (string "-->") is satisfied, so I would expect one could just
manyTill anyChar (string "-->")
The problem is that "-->" has multiple characters.  So if you have
"-not end comment", it will match the '-' against the (string "-->").
Since it doesn't backtrack by default, it's committed now and will
fail when it hits the 'n'.  The 'try' will make it backtrack to
'anyChar' when the second '-' fails to match.
Post by Peter Schmitz
(If anyone knows of a collection of parsec demos or good examples, I
would appreciate a link; thanks)
I thought the parsec source included some example parsers for simple
languages?  In any case, there is lots of material floating around,
though I found parsec so intuitive and the docs so good that I just
started hacking.  I think the 'build scheme in haskell' tutorial uses
parsec for the parsing.