Toggle menu
15
236
70
27.6K
Kenshi Wiki
Toggle preferences menu
Toggle personal menu
Not logged in
Your IP address will be publicly visible if you make any edits.


Implements Lua functions mw.text.decode, mw.text.encode in a module.

{{#invoke:decodeEncode|decode|s=Source text©}}Source text©

See List of XML and HTML character entity references.

Decode (© → ©)

Decodes Named Entities from entity name into a regular (unicode) character:
©©
>>

All well-defined named entities are decoded (HTML Named character references, formally: as defined in the PHP table).

A regular, rendered sentence:
"At 100 °F, & with a "burning" sun above, we , we ⁄walked⁄."
In code:
"At 100 °F, & with a "burning" sun above, we ⁄walked⁄." -- wikitext
Processing:
{{#invoke:decodeEncode|decode|s=At 100 °F, & with a "burning" sun above, we ⁄walked⁄.}}
At 100 °F, & with a "burning" sun above, we ⁄walked⁄. -- In code: straight characters, no named entities.
Renders, again:
"At 100 °F, & with a "burning" sun above, we ⁄walked⁄."

Decode a reduced set only

By setting |subset_only=true, only these five entity names are decoded: '&lt;', '&gt;', '&amp;', '&quot;', '&nbsp;' (that is, into '<', '>', '&', '"', ' ').

Note: There is a difference with the relevant Lua parameter. (This only concerns your task if you also work directly with the Lua mw.text.decode function). Lua documentation defines parameter |decodeNamedEntities=, having this effect: when omitted or false, only the reduced set of entities is recognized and decoded. This use of 'false' is inverted in using |subset_only=: |decodeNamedEntities=false = |subset_only=true.
Also, this module ignores the "omitted" logic: |subset_only= should be set explicitly to 'true' to be effective.

Encode (© → &copy;)

Function encode encodes some entity-named characters into that name (for example: &&amp;).

Regular sentence:

"At >100 °F, & with a "burning" sun above, we walked. ©"

In code:

"At >100 °F, & with a "burning" sun above, we walked. ©"

Encode:

{{#invoke:decodeEncode|encode|s=At >100 °F, & with a "burning" sun above, we walked. ©|charset=&<>{{!}}°"'&©}}
At &gt;100 &#176;F, &amp; with a &quot;burning&quot; sun above, we walked. &#169;
Renders as:
"At >100 °F, & with a "burning" sun above, we walked. ©"

character set to encode

Per Lua documentation, only a small set of characters is processed. The characterset can be set (expanded) by using |charset=.

Example: |charset=<>" \'& (the default), |charset=<>°"'&©{{!}}; characters not in the default will be replaced by their decimal entity: ©&#169; (hexadecimal number, not decimal nor named &copy;)

Known issues

  • 13 Sep 2021: NOTE: The encode function with user-supplied charset is now used productively in {{R/superscript}} and {{R/ref}}. Before implementing breaking changes here, these templates need to be adjusted accordingly!
  • 26 Sep 2021: U+2009 THIN SPACE (&thinsp;, &ThinSpace;)
Note: Possible bug: Decoding &ThinSpace; works, but &thinsp; doesn't.
Resolved in code.
  • 4 Feb 2023: U+03B5 ε GREEK SMALL LETTER EPSILON (&epsi;, &epsilon;)
See Module talk:DecodeEncode  Bug report: bad decoding of U+03B5 ε (epsilon)
Resolved in code.

See also


require('strict')
local p = {}

local function _getBoolean( boolean_str )
	-- from: module:String; adapted
	-- requires an explicit true
	local boolean_value

	if type( boolean_str ) == 'string' then
		boolean_str = boolean_str:lower()
		if boolean_str == 'true' or boolean_str == 'yes' or boolean_str == '1' then
			boolean_value = true
		else
			boolean_value = false
		end
	elseif type( boolean_str ) == 'boolean' then
		boolean_value = boolean_str
	else
		boolean_value = false
	end
	return boolean_value
end

function p.decode( frame )
	local s = frame.args['s'] or ''
	local subset_only = _getBoolean(frame.args['subset_only'] or false)

	return p._decode( s, subset_only )
end

function p._decode( s, subset_only )
	-- U+2009 THIN SPACE: workaround for bug: HTML entity &thinsp; is decoded incorrect. Entity &ThinSpace; gets decoded properly
	s = mw.ustring.gsub( s, '&thinsp;', '&ThinSpace;' )
	-- U+03B5 ε GREEK SMALL LETTER EPSILON: workaround for bug (phab:T328840): HTML entity &epsilon; is decoded incorrect for gsub(). Entity &epsi; gets decoded properly
	s = mw.ustring.gsub( s, '&epsilon;', '&epsi;' )

	local ret = mw.text.decode( s, not subset_only )

	return ret
end

function p.encode( frame )
	local s = frame.args['s'] or ''
	local charset = frame.args['charset']

	return p._encode( s, charset )
end

function p._encode( s, charset )
	-- example: charset = '_&©−°\\\"\'\=' -- do escape with backslash not %;
	local ret

	if charset and charset ~= '' then
		ret = mw.text.encode( s, charset )
	else
		-- use default: chartset = '<>&"\' ' (outer quotes = lua required; space = NBSP)
		ret = mw.text.encode( s )
	end 
	
	return ret
end

return p