

How do I get Perl to read the values of my html form as unicode?
source link: https://www.codesd.com/item/how-do-i-get-perl-to-read-the-values-of-my-html-form-as-unicode.html
I have an HTML form that sends data to a .cgi page. Here is the HTML:
<HTML>
<BODY BGCOLOR="#FFFFFF">
<FORM METHOD="post" ACTION="test.cgi">
<B>Write to me below:</B><P>
<TEXTAREA NAME="feedback" ROWS=10 COLS=50></TEXTAREA><P>
<CENTER>
<INPUT TYPE=submit VALUE="SEND">
<INPUT TYPE=reset VALUE="CLEAR">
</CENTER>
</FORM>
</BODY>
</HTML>
Here is the Perl script for test.cgi:
#!/usr/bin/perl
use utf8;
use encoding('utf8');
require Encode;
require CGI;
# The following accepts the data from the form and puts it in %FORM
if ($ENV{'REQUEST_METHOD'} eq 'POST') {
read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
@pairs = split(/&/, $buffer);
foreach $pair (@pairs) {
($name, $value) = split(/=/, $pair);
$value =~ tr/+/ /;
$value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
$FORM{$name} = $value;
}
# The following generates the html for the page
print "Content-type: text/html\n\n";
print "<HTML>\n";
print "<HEAD>\n";
print "<TITLE>Thank You!</TITLE>\n";
print "</HEAD>\n";
print "<BODY BGCOLOR=#FFFFCC TEXT=#000000>\n";
print "<H1>Thank You!</H1>\n";
print "<P>\n";
print "<H3>Your feedback is greatly appreciated.</h3><BR>\n";
print "<P>\n<P>\n";
print "The user wrote:\n\n";
print "<P>\n";
# This is print statement A
print "$FORM{'feedback'}<br>\n";
$FORM{'feedback'}=~s/(\w)/ $1/g;
# This is print statement B
print "$FORM{'feedback'}\n";
print "</BODY>\n";
print "</HTML>\n";
exit(0);
}
This all works the way it's supposed to if the user enters English text. However, this will eventually be used in a product where users enter Chinese text. Here's an example of the problem: if the user enters "中文" into the form, Print Statement A prints "中文". However, Print Statement B (which prints $FORM{'feedback'} after the regex has been run) prints " 2 0 0 1 3; 2 5 9 9 1; ". What I want it to print is "中 文". If you want to see this, go to http://thedeandp.com/chinese/input.html and try it yourself.
So basically, what I've figured out is that when Perl reads in the form, it treats each byte as a character, so the regex adds a space between bytes. Chinese characters are encoded as multiple bytes per character, so the regex breaks each character apart by inserting spaces between its bytes, and that produces the output seen in Print Statement B. I've tried things like $value = Encode::decode_utf8($value) to get Perl to treat the data as Unicode, but nothing has worked so far.
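The difference between octets and characters described above can be sketched in isolation. This is only an illustration, assuming the form value arrives as UTF-8 octets; the helper name space_out and the hard-coded octets are hypothetical and not part of the original script.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Insert a space before every word character, as the question's regex does.
sub space_out {
    my ($text) = @_;
    $text =~ s/(\w)/ $1/g;
    return $text;
}

# "中文" as the six raw UTF-8 octets a form handler would receive
# (assumption: the browser submitted the form as UTF-8).
my $octets = "\xE4\xB8\xAD\xE6\x96\x87";

# Decoding turns the six octets into a two-character string.
my $chars = decode('UTF-8', $octets);

# The regex now sees whole characters, so each one gets its own space.
my $spaced_chars = space_out($chars);

# Encode back to octets before writing to the output stream.
my $output_octets = encode('UTF-8', $spaced_chars);

printf "octets: %d, characters: %d\n", length($octets), length($chars);
binmode(STDOUT, ':raw');
print $output_octets, "\n";
```

The pattern is decode on input, work on characters, encode on output; the regex itself is unchanged.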
That CGI style could be improved while fixing your encoding/decoding issue. Try this:
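#!/usr/bin/perl
use strict;
use warnings;
use Encode;
use CGI ":standard";
use HTML::Entities;

# Declare the response as UTF-8 and print the page header.
print
    header("text/html; charset=utf-8"),
    start_html("Thank you!"),
    h1("Thank You!"),
    h3("Your feedback is greatly appreciated.");

# Decode the submitted octets into a character string before using them.
if ( my $feedback = decode_utf8( scalar param("feedback") ) )
{
    # Escape HTML in the user's text, then encode back to octets for output.
    print
        p("The user wrote:"),
        blockquote( encode_utf8( encode_entities($feedback) ) );
}

print end_html();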
Proper encoding and decoding between octets/bytes and UTF-8 is necessary to avoid surprises and to let Perl behave as you'd expect.
For example, you can drop this in:
h4("Which capitalizes as:"),
blockquote( encode_utf8( uc $feedback ) );
And see character conversions work like so: å™ç∂®r£ ➟ Å™Ç∂®R£
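The same effect can be checked outside CGI. The sketch below uses a hard-coded sample ("år" written as UTF-8 octets, an illustrative value not taken from the answer) to compare uc on undecoded octets with uc on a decoded string.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "år" as raw UTF-8 octets: "å" is the two octets C3 A5.
my $octets = "\xC3\xA5r";

# On undecoded octets, uc only changes the ASCII letter "r".
my $upper_octets = uc $octets;

# After decoding, uc sees the character U+00E5 and maps it to U+00C5.
my $upper_chars = uc decode('UTF-8', $octets);

binmode(STDOUT, ':raw');
print $upper_octets, "\n";                   # octets C3 A5 52
print encode('UTF-8', $upper_chars), "\n";   # octets C3 85 52
```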
Update: added encode_entities. NEVER print user input back without escaping the HTML. Update to the update: depending on the setup, encode_entities will also end up escaping the UTF-8 characters as entities (you can have it escape only ['"<>], for example).
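To illustrate that last point, here is a short sketch comparing the default call with one that passes an explicit set of characters as the second argument to encode_entities. The sample input and the variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use HTML::Entities qw(encode_entities);

# Hypothetical user input: markup wrapped around "中文", decoded to characters.
my $feedback = decode('UTF-8', "<b>\xE4\xB8\xAD\xE6\x96\x87</b>");

# Default set: markup characters and the non-ASCII characters all become entities.
my $all = encode_entities($feedback);

# Explicit set: only the HTML-significant characters are escaped,
# so the Chinese characters stay as characters.
my $markup_only = encode_entities($feedback, q{<>&"'});

binmode(STDOUT, ':raw');
print $all, "\n";
print encode('UTF-8', $markup_only), "\n";
```

Either form prevents the markup from being interpreted; the second keeps the page source readable when the response is already declared as UTF-8.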