Comment Stripper - new problem 242

source link: https://www.codeabbey.com/index/forum_topic/86e730c6471b98d56327b8656d608828

Rodion (admin)     2021-11-26 20:16:18

Hi! You are completely right, such a case is not covered. But it is addressed in the problem statement:

Suppose that no symbol other than a single quote, a double quote or a backslash can follow an escaping backslash

That is, we do not try to cover the full, real Python syntax. I suspect it is still possible with regexps, but they would look extremely horrible... Such a problem should really be solved with a state machine rather than a regexp (which, of course, contains a state machine inside, but we have only a limited way of "programming" it)...
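To make the state-machine idea concrete, here is a minimal Python sketch of a one-line comment stripper written as an explicit state machine. It assumes exactly the simplification quoted above (a backslash inside a string is followed only by a quote or another backslash), ignores triple-quoted strings, and also drops whitespace before the comment. The name strip_comment is hypothetical, not from the problem.

```python
def strip_comment(line: str) -> str:
    """Remove a trailing '#' comment from one line of Python source.

    Assumptions: no triple-quoted strings, and inside a string a backslash
    simply escapes the next character (as the problem statement allows).
    """
    quote = None      # None = outside a string; else the opening quote char
    escaped = False   # True right after a backslash inside a string
    for i, ch in enumerate(line):
        if quote is None:
            if ch == '#':
                # Comment starts here; drop it and the blanks before it.
                return line[:i].rstrip()
            if ch in ('"', "'"):
                quote = ch            # enter a string
        elif escaped:
            escaped = False           # escaped character is taken literally
        elif ch == '\\':
            escaped = True            # the next character is escaped
        elif ch == quote:
            quote = None              # closing quote: back to code
    return line                       # no comment on this line
```

Raw strings such as r'\'' need no special case here, because Python tokenizes a raw literal with the same backslash-quote rule as a normal one and only skips escape processing when building the value.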

BTW, it's curious to see that someone (you) decided to start with the latest problems. Surely that's better: more diversity :)

HouseDwarf     2021-11-26 20:41:25

Ah, my mistake - I only skim-read the problem statement and didn't notice that. I figured the disclaimers would essentially mean you wouldn't do something tricky like:

print(r'\\'') # this is a bit evil

in the test cases :-).

The reason I decided to do some of the later problems is that I occasionally talk to someone else who uses the site regularly, and they describe the problems they're doing. After that I sometimes get the urge to do the problem we were talking about.

Rodion (admin)     2021-11-26 21:45:50

Feels like my long lost cousin ... :)

Ha-ha, that note is more puzzling than clarifying - either you met your cousin, or you just think you did, or you think that person could be your cousin if you had one...

But probably this wasn't meant to be clarifying at all!

As a side note, I dare say I expected this problem to be very simple, and the solution by HouseDwarf is even simpler than mine - but after it hung here for a few days before the first solutions came in, I started suspecting I misjudged it a bit...

print(r'\\'') # this is a bit evil

Honestly, I'm no expert in Python, so this made me go and check - this isn't syntactically correct at all, right?

HouseDwarf     2021-11-26 21:58:22

Ah, I just ran that in the interpreter and it is indeed not correct. I misunderstood the semantics of raw strings. It looks like they mean "accept a normal string literal, but don't convert any of the escapes when converting it to its value". I thought raw strings would have different rules from normal string literals about what may follow a backslash. I found it surprising, for instance, that:

print(r'\'')   # is fine - prints: \'
print(r'\\'')  # is not fine
print(r'\\\'') # is fine - prints: \\\'
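A quick way to settle questions like this without an interactive session is to ask the built-in compile() whether a snippet parses. The helper below (is_valid is a hypothetical name) checks the three lines above. They are embedded in ordinary, non-raw Python strings, so each doubled backslash in the list stands for a single backslash in the probed source.

```python
def is_valid(source: str) -> bool:
    """Return True if `source` compiles as Python code."""
    try:
        compile(source, "<probe>", "exec")
    except SyntaxError:
        return False
    return True

# The three lines from the post, as ordinary (non-raw) string literals.
samples = [
    "print(r'\\'')",      # print(r'\'')
    "print(r'\\\\'')",    # print(r'\\'')
    "print(r'\\\\\\'')",  # print(r'\\\'')
]
for s in samples:
    print(s, is_valid(s))
# prints:
# print(r'\'') True
# print(r'\\'') False
# print(r'\\\'') True
```

The middle line fails because r'\\' is already a complete literal, so the final quote opens a new string that never closes.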

In that case, if a raw string really means "parse a normal string literal but don't process the escapes when converting it to its value", then I can't think of much more tricky stuff you can do... The only other trick I can think of is:

print(''' this's evil ''')

Which would break solutions naive to triple quotes...
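For checking hand-written strippers against cases like this one, Python's standard tokenize module can serve as a reference oracle: it already understands raw strings, string prefixes and triple quotes, and reports comments as COMMENT tokens. This is a sketch of that swapped-in technique (it is not a regexp, so it is not an answer to the problem itself). strip_comments_tokenize is a hypothetical name, and the sketch assumes the input is complete, valid Python source.

```python
import io
import tokenize

def strip_comments_tokenize(source: str) -> str:
    """Drop every comment from complete Python source using the tokenizer."""
    # Physical lines, split on "\n" exactly as the tokenizer will read them.
    lines = io.StringIO(source).readlines()
    # Map 1-based row -> column where that row's COMMENT token starts.
    cuts = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            row, col = tok.start
            cuts[row] = col
    out = []
    for row, line in enumerate(lines, start=1):
        if row in cuts:
            ending = "\n" if line.endswith("\n") else ""
            # Keep the code before the comment, minus trailing whitespace.
            line = line[:cuts[row]].rstrip() + ending
        out.append(line)
    return "".join(out)
```

Running a candidate solution and this oracle over the same lines and comparing the results is a cheap way to hunt for inputs the candidate gets wrong.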

Rodion (admin)     2021-11-26 22:05:14

gardengnome & HouseDwarf

Oh my, I'm either blind or too stupid. Thanks :)

By the way, I remember that long ago I first misread gnome as genome and was quite puzzled. Is there any curious story behind this username, if it's not a secret?

Which would break solutions naive to triple quotes...

Ah, that's true. I vaguely remembered r'...', u'...' and triple quotes, but thought the latter were mainly for multi-line strings. Well, I hate even to think about writing a regexp to cover all of this. Anyway, the initial problem was inspired by an exercise in the ancient K&R book, which was probably written in pre-regex times :)

Rodion (admin)     2021-11-26 22:10:20

Aha, thanks for this note, I'll try to improve the problem statement.

A regexp could match either just the comment (provided that the replacement is an empty string) or more (but then one needs to use capture groups in the replacement part, I guess). You have freedom in this sense.

What format should the replacement pattern take?

Not sure I understand the question, but I'd better add an alternative example with a capture group:

(.*)#.* $1

Ignoring quoted strings, this simple regexp captures everything before the sharp sign (the last one on the line, since .* is greedy), and the replacement swaps the whole matched part for the content of the captured group. Hm. I shall try to write it down carefully in the morning :)

But one can just play with the "regexp" button to get the idea...
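In Python's re module the (.*)#.* $1 example looks like the sketch below. Python writes the group reference in the replacement as \1 (or \g<1>) rather than $1; the site's own regexp flavour may differ, which is an assumption here. The greedy and lazy variants show where the captured prefix ends, and the last call shows why quoted strings are the real difficulty. The function names are hypothetical.

```python
import re

# "(.*)#.* $1" in Python syntax: group 1 is referenced as \1, not $1.
GREEDY = re.compile(r"(.*)#.*")   # (.*) extends to the LAST '#'
LAZY = re.compile(r"(.*?)#.*")    # (.*?) stops at the FIRST '#'

def strip_greedy(line: str) -> str:
    return GREEDY.sub(r"\1", line)

def strip_lazy(line: str) -> str:
    return LAZY.sub(r"\1", line)

print(repr(strip_greedy("x = 1 # a # b")))  # 'x = 1 # a '
print(repr(strip_lazy("x = 1 # a # b")))    # 'x = 1 '
print(repr(strip_lazy("s = '#'")))          # "s = '"  (the string is cut)
```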

HouseDwarf     2021-11-27 09:29:56

I thought about this problem a little more and wrote a solution that is triple-quote aware (I've posted it as my solution now - I think it's quite readable for what it is). I've spent a while trying to think of anything that would break it, but I can't think of any special cases it can't handle. If anyone else can, I'd be interested to know :-).

Edit: to be clear - I'd be interested in any valid line of Python that breaks my solution (other than triple-quoted strings spanning multiple lines). After messing around with raw strings a bit, I think my solution should handle them.
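HouseDwarf's posted solution is not reproduced in this thread, so the following is only an independent sketch of one way to make a single regexp triple-quote aware. Alternatives for '''...''' and """...""" are tried before the one-quote forms; inside a triple-quoted string a quote is accepted only when it is not followed by two more of the same quote; every string found is put back through capture group 1, while a comment matches outside the group and is dropped. It uses Python's re module, treats a backslash as escaping the next character (an assumption in line with the problem), and the names are hypothetical.

```python
import re

# String alternatives, longest delimiters first so ''' is not read as ''.
TRIPLE_SQ = r"'''(?:[^'\\]|\\[\s\S]|'(?!''))*'''"   # may span lines
TRIPLE_DQ = r'"""(?:[^"\\]|\\[\s\S]|"(?!""))*"""'   # may span lines
SINGLE_SQ = r"'(?:[^'\\\n]|\\.)*'"                  # one line only
SINGLE_DQ = r'"(?:[^"\\\n]|\\.)*"'                  # one line only

# Group 1 captures a whole string literal; a comment matches outside it.
PATTERN = re.compile(
    "(" + "|".join([TRIPLE_SQ, TRIPLE_DQ, SINGLE_SQ, SINGLE_DQ]) + r")|[ \t]*#.*"
)

def strip_comments_re(source: str) -> str:
    # Strings are replaced by themselves; comments (and the blanks before
    # them) by the unmatched group 1, which re.sub substitutes as "".
    return PATTERN.sub(r"\1", source)
```

Because the triple-quote alternatives allow newlines, passing the whole file at once (rather than line by line) also covers triple-quoted strings that span several lines.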

Oh, and by the way, because of that comment I decided to look up when K&R was written and when regexes became a thing. K&R was written in 1978, while regular expressions were devised in the 1950s. I looked this up because I think of regular expressions as "mathsy" and C as "engineery", so I thought regular expressions would be older. My main memory of regular expressions is automata theory and language hierarchies at university, where they had a very "theoretical" feel. I think the kind of regular expressions we use day to day aren't actually regular expressions, as (depending on the implementation) they add operations, such as backreferences, that are strictly more powerful than formal regular expressions.
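Backreferences are the standard example of such an extension. The language of strings a^n b a^n (the same number of a's on both sides of a single b) is not regular, as the pumping lemma shows, yet Python's re recognizes it with one backreference. MIRROR and in_mirror_language are illustrative names.

```python
import re

# Matches a^n b a^n: a run of a's, a 'b', then the very same run again.
# \1 must repeat whatever group 1 captured, which no finite automaton
# can remember for unbounded n.
MIRROR = re.compile(r"(a*)b\1")

def in_mirror_language(s: str) -> bool:
    return MIRROR.fullmatch(s) is not None

print([in_mirror_language(s) for s in ["b", "aba", "aabaa", "aaba", "aab"]])
# [True, True, True, False, False]
```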

Also, when thinking about writing a regular expression to match this, I considered that if a regular expression were too difficult to write by hand, it might be easier to write a DFA and then convert it to a regular expression. I didn't previously know whether a DFA-to-RE conversion algorithm existed, but figured one probably did, since DFAs and regular expressions are equivalent in power. Anyway, I looked it up and found that such DFA-to-RE conversion algorithms do indeed exist. I don't know how far down the regular expression rabbit hole you want to go on this site, but I think a problem where the easiest way to get a regular expression that solves it is to write a DFA-to-RE compiler could be fun XD.
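One such algorithm is state elimination: wrap the DFA in a fresh start state and a fresh accept state joined to it by empty (epsilon) edges, then remove the original states one at a time, replacing every path i -> k -> j with the label R(i,k) R(k,k)* R(k,j) alternated into R(i,j). The sketch below implements that textbook construction and emits Python re syntax. The function names and the example DFA are illustrative, and the output regex is correct but far from minimal.

```python
import re
from itertools import product

def union(a, b):
    """Alternation; None stands for the empty language (no edge)."""
    if a is None:
        return b
    if b is None or a == b:
        return a
    return f"(?:{a}|{b})"

def concat(*parts):
    """Concatenation; None (no path) absorbs, "" (epsilon) is neutral."""
    if any(p is None for p in parts):
        return None
    return "".join(f"(?:{p})" for p in parts if p)

def star(a):
    """Kleene star; the star of no-edge or of epsilon is epsilon."""
    return f"(?:{a})*" if a else ""

def dfa_to_regex(states, start, accepting, delta):
    """Convert a DFA to a regex by state elimination.

    delta maps (state, symbol) -> state; symbols are literal characters.
    """
    START, END = object(), object()   # fresh states that cannot clash
    edge = {(START, start): ""}
    for q in accepting:
        edge[(q, END)] = ""
    for (q, sym), p in delta.items():
        edge[(q, p)] = union(edge.get((q, p)), re.escape(sym))
    remaining = [START, *states, END]
    for k in states:
        remaining.remove(k)
        loop = star(edge.get((k, k)))
        # Reroute every path i -> k -> j around k.
        for i, j in product(remaining, remaining):
            via_k = concat(edge.get((i, k)), loop, edge.get((k, j)))
            edge[(i, j)] = union(edge.get((i, j)), via_k)
    result = edge.get((START, END))
    return "(?!)" if result is None else result   # (?!) never matches

# Example: strings over {a, b} containing an even number of a's.
EVEN_A = dfa_to_regex(
    ["even", "odd"], "even", {"even"},
    {("even", "a"): "odd", ("even", "b"): "even",
     ("odd", "a"): "even", ("odd", "b"): "odd"},
)
```

Use the result with re.fullmatch, since the construction describes whole strings rather than substrings.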
