NAME

HTML::Encoding - Determine the encoding of HTML/XML/XHTML documents

SYNOPSIS

    use HTML::Encoding 'encoding_from_http_message';
    use Encode;  # provides decode()

    # $resp is assumed to be an HTTP response object obtained elsewhere
    my $enco = encoding_from_http_message($resp);
    my $utf8 = decode($enco => $resp->content);

WARNING

The interface and implementation are guaranteed to change before this module reaches version 1.00! Please send feedback to the author of this module.

DESCRIPTION

HTML::Encoding helps to determine the encoding of HTML and XML/XHTML documents.

DEFAULT ENCODINGS

Most routines need to know some suspected character encodings, which can be provided through the encodings option. This option always defaults to the $HTML::Encoding::DEFAULT_ENCODINGS array reference, which means the following encodings are considered by default:

    * ISO-8859-1

If you change the values or pass custom values to the routines, note that Encode must support them in order for this module to work correctly.

ENCODING SOURCES

encoding_from_xml_document, encoding_from_html_document, and encoding_from_http_message return in list context the encoding source and the encoding name. Possible encoding sources are:

    * protocol (Content-Type: text/html; charset=encoding)

Note that encoding_from_xml_declaration() determines the encoding even if the XML declaration is not well-formed or violates other requirements of the relevant XML specification, as long as it can find an encoding pseudo-attribute in the provided string. This means XML processors must apply further checks to determine whether the entity is well-formed, etc.

[...] <meta> elements using encoding_from_meta_element.

xml_declaration_from_octets($octets)

    Attempts to find a ">" character in the byte string $octets using the encodings in $encodings and, upon success, attempts to find a preceding "<" character.

is_text_xml will only be checked if is_xml matches as well.

html_default

    Default encoding for documents determined (by is_html) as HTML, defaults to ISO-8859-1.

xml_default

    Default encoding for documents determined (by is_xml) as XML, defaults to UTF-8.

text_xml_default

    Default encoding for documents determined (by is_text_xml) as text/xml, defaults to undef, in which case the default is ignored. This should be set to US-ASCII if desired, as this module is by default inconsistent with RFC 3023, which requires that US-ASCII be assumed for text/xml documents without a charset parameter in the HTTP header.
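
The fallback options described above (html_default, xml_default, text_xml_default) can be pictured with a small stand-alone sketch. It is not the module's implementation: default_encoding_for is a hypothetical helper, and the three content-type patterns are assumptions chosen for illustration rather than the module's documented defaults. Only the default encoding values, and the rule that is_text_xml is consulted only when is_xml matches, come from the text.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Encoding): choose a fallback
# encoding for a media type when no other encoding source is available.
sub default_encoding_for {
    my ($type, %opt) = @_;
    my %o = (
        # Assumed patterns, for illustration only.
        is_html          => qr{^text/html$}i,
        is_xml           => qr{^.+/(?:.+\+)?xml$}i,
        is_text_xml      => qr{^text/(?:.+\+)?xml$}i,
        # Default values as stated in the documentation.
        html_default     => 'ISO-8859-1',
        xml_default      => 'UTF-8',
        text_xml_default => undef,    # undef: this default is ignored
        %opt,                         # caller-supplied overrides
    );

    return $o{html_default} if $type =~ $o{is_html};

    if ($type =~ $o{is_xml}) {
        # is_text_xml is only consulted once is_xml has matched.
        if ($type =~ $o{is_text_xml} && defined $o{text_xml_default}) {
            return $o{text_xml_default};
        }
        return $o{xml_default};
    }

    return undef;    # neither HTML nor XML: no default applies
}

# Example: opting in to the RFC 3023 behaviour for text/xml.
my $enc = default_encoding_for('text/xml', text_xml_default => 'US-ASCII');
print "$enc\n";
```

With the module itself, the same choice would presumably be expressed by passing text_xml_default as an option to the relevant routine; the exact calling convention should be checked against the module's documentation.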