Writing language definition files

From Bluefish Wiki
Revision as of 22:38, 11 March 2014

Bluefish language definition files

All syntax highlighting and autocompletion is defined in Bluefish language definition files, which have the extension .bflang2. In the source code they can be found in data/bflang/

On Linux they are installed in /usr/share/bluefish/bflang/

Example files

shell.bflang2 is the simplest example of what a language definition can look like. php.bflang2 is probably the most complex example, with many included files and several syntax types supported within one another (javascript, css, html and php itself). There is also sample.bflang2, which describes more or less the same as this wiki page.

Editing bflang files

If you store a bflang2 file in your Bluefish settings directory ~/.bluefish/ it takes priority over the system-wide installed files. So if you are going to change a bflang2 file, just copy it into ~/.bluefish/ and edit the copy.

If you start Bluefish from the command line it will output errors and warnings about the bflang2 files that are loaded. So after you have edited a bflang2 file, test it and look at the output in the terminal.

Including files

The format of the file

The file format is XML.

It starts with a root tag <bflang>:

<bflang name="Shell" version="2.0" >
</bflang>

Inside the root tag there are three sections: the header section, the properties section and the definition section.
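
As a rough sketch, the three sections sit next to each other inside the root tag. The tag names <properties> and <definition> are assumed here to match the section names used on this page; compare with shell.bflang2 before relying on them.

```xml
<bflang name="Shell" version="2.0">
	<header>
		<!-- mime types, options and highlight names; always loaded -->
	</header>
	<properties>
		<!-- comment types, smart indenting, spell check; loaded on demand -->
	</properties>
	<definition>
		<!-- the syntax itself, as one single top-level context -->
	</definition>
</bflang>
```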

The header section

The header section is always loaded for each bflang2 file. The rest of the file is loaded "on demand", so only if it is needed.

<header>
	<mime type="application/x-shellscript"/>
	<option name="show_in_menu" default="1"/>
	<highlight name="value" style="value"  />
</header>

The mime tag in the header

The mime tag specifies for which mime types this definition file is used. Multiple mime types can be specified. Sometimes a file doesn't have a specific mime type, or the mime type is not defined on many systems. In that case the detected mime type is often something like text/plain. For such files Bluefish supports a combination of mime type and extension. To detect a file type that ends in .fake you add:

<mime type="application/x-fake"/>
<mime type="text/plain?fake"/>

The option tag in the header

The option tag defines an option that is used further on in the language file

<option name="allphpfunctions" default="1" description="All php functions" />

A special note: all language files share one list of option names and their descriptions. So if two or more options have the same name, they will get the same description in Bluefish. If their descriptions differ, it is undefined which description is used.

There are a few special (hardcoded) option names:

In this example a block named 'php block' is made optionally foldable (or not). The '_foldable' suffix is hardcoded in Bluefish.

<option name="php block_foldable" default="1" description="Allow the PHP block to fold"/>

Whether or not to load the reference data for this language (saves memory)

<option name="load_reference" default="1"/>

Whether or not to load the auto completion data for this language (saves memory)

<option name="load_completion" default="1" />

Whether or not to close <tag> in the auto-completion

<option name="autocomplete_tags" default="1" />

Whether or not to show this language by default in the menu

<option name="show_in_menu" default="0"/>
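
Put together, a header that uses these hardcoded option names might look like the sketch below. The mime type and the default values are purely illustrative, and the 'php block' name only makes sense if the definition section really defines a block with that name.

```xml
<header>
	<mime type="application/x-fake"/>
	<!-- show this language in the menu by default -->
	<option name="show_in_menu" default="1"/>
	<!-- skip the reference data, keep the auto completion data -->
	<option name="load_reference" default="0"/>
	<option name="load_completion" default="1"/>
	<!-- close tags during auto-completion -->
	<option name="autocomplete_tags" default="1"/>
	<!-- name of a block plus the hardcoded _foldable suffix -->
	<option name="php block_foldable" default="1" description="Allow the PHP block to fold"/>
</header>
```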

The highlight tag in the header

The highlight tag declares which element types are defined in the file, and which style is applied to each of these types by default. Users can alter these styles in the preferences panel.

So if an element in this file has attribute highlight="foo", this section should have <highlight name="foo" style="somestyle"/>. Look at other language files and try to re-use their styles.

For the end-user it is convenient if styles are re-used. All languages that define a comment should use style 'comment' by default.

<highlight name="comment" style="comment" />

Some users may like the same color for all keywords, others may prefer a different style for storage types and language keywords. So use a different 'highlight' name for each, so that users can assign a different text style if they want.

<highlight name="storage-types" style="keyword" />
<highlight name="keyword" style="keyword" />
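
To illustrate how the two ends connect, the sketch below pairs these header lines with elements that refer to them. The element tag is not described on this page, so the pattern attribute and the example words are assumptions modelled on the shipped language files.

```xml
<!-- in the header section: two names, both using style 'keyword' by default -->
<highlight name="storage-types" style="keyword" />
<highlight name="keyword" style="keyword" />

<!-- in the definition section: each element picks one of the names above -->
<element pattern="int" highlight="storage-types" />
<element pattern="return" highlight="keyword" />
```

A user who wants storage types in another color can then change only the 'storage-types' entry in the preferences.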

The properties section

The properties section is similar to the header, but it is loaded on-demand. As long as there is no syntax scanning needed for this type of file, the properties section is not yet loaded.

The comment tag in the properties section

The comment tag defines which types of line comments and block comments exist in this language. The smart comment function (shift-ctrl-c) uses this information to comment or uncomment text.

	<comment id="cm.cblockcomment" type="block" start="/*" end="*/" />
	<comment id="cm.htmlcomment" type="block" start="&lt;!--" end="--&gt;" />
	<comment id="cm.cpplinecomment" type="line" start="//" />
	<comment id="cm.scriptcomment" type="line" start="#" />

The smartindent and smartoutdent tags in the properties section

The smartindent characters are those that, when followed by a return, should increase the indenting. A smartoutdent character, typed immediately after auto-indenting has set the indenting, should decrease that auto-indenting again.

	<smartindent characters="{" />
	<smartoutdent characters="}" />

The default_spellcheck tag in the properties section

default_spellcheck defines whether regions that are not highlighted are checked by the spell checker. This is typically enabled for markup languages like HTML and XML, and disabled (or left out, because the default is 0) for all programming languages.

	<default_spellcheck enabled="1" />
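
For a C-like programming language, the tags described above could be combined roughly as follows. This is only a sketch: the <properties> wrapper name is assumed from the section name, and the choice of comment types is illustrative.

```xml
<properties>
	<!-- comment types used by the smart comment function -->
	<comment id="cm.cblockcomment" type="block" start="/*" end="*/" />
	<comment id="cm.cpplinecomment" type="line" start="//" />
	<!-- indent after an opening brace, outdent on a closing brace -->
	<smartindent characters="{" />
	<smartoutdent characters="}" />
	<!-- programming language: do not spell check unhighlighted regions -->
	<default_spellcheck enabled="0" />
</properties>
```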

The definition section

The definition section is where the syntax is really described.

A language definition always starts with a <context> tag, and the definition section contains exactly one top-level context tag (which may have other context tags nested inside it).

The context tag in the definition section

<context symbols="&gt;&lt;&amp;; &#9;&#10;&#13;" commentid_block="cm.htmlcomment" commentid_line="none">

Or

<context symbols="LIST OF CHARACTERS" highlight="HIGHLIGHT-TYPE" id="IDENTIFIER" > 

A <context> tag should always define symbols. Symbols are those characters that may start or end an element.

The optional attribute highlight may specify a highlight type that is valid for the complete text region that has this context. This is useful for 'comment' or 'string' types of contexts, where the complete context is highlighted.

The optional attribute 'id' is used to define an identifier which can be used to re-use this context.

To re-use a context, use <context idref="IDENTIFIER" /> where IDENTIFIER refers to a previously defined context. The file is parsed top to bottom.
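
A hedged sketch of id and idref in use: two opening patterns share one inner context. The way contexts are nested inside elements, and the attributes pattern and ends_context, are not described on this page and are assumed from the shipped language files; 'c.php' and 'php-block' are hypothetical names.

```xml
<context symbols="&lt;&gt;? &#9;&#10;&#13;">
	<!-- first use: the context is defined here and gets an id -->
	<element pattern="&lt;?php" highlight="php-block">
		<context symbols="?&gt; &#9;&#10;&#13;" id="c.php">
			<element pattern="?&gt;" highlight="php-block" ends_context="1" />
		</context>
	</element>
	<!-- second use: refers back to the context defined above -->
	<element pattern="&lt;?" highlight="php-block">
		<context idref="c.php" />
	</element>
</context>
```

Because the file is parsed top to bottom, the element that carries the id has to come before the element that uses idref.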

What are symbols

Symbols are characters that may start or end a pattern. Try to highlight for example:

 char *rc_char(char*chara);
 ^^^^          ^^^^

Only two of the four 'char' need to be highlighted. How does the scanner know which ones to highlight? In the above example there are several symbols, such as whitespace, brackets and operators:

 char *rc_char(char*chara);
^    ^^       ^    ^     ^^

Notice that the occurrences of 'char' that should be highlighted are all in between symbols.

To detect function strlen in the following examples (language C):

i=strlen(a);
i+strlen(a);
i*strlen (a);

we need at least symbols =+*(

In most languages all whitespace is a symbol (" " = space, &#9; = tab, &#10; = newline, &#13; = carriage return).
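
For the strlen lines above, a context could list the operators, the opening bracket and all whitespace as symbols. This is a minimal sketch: the element tag with its pattern attribute and the highlight name 'function' are assumptions, and the header would need a matching highlight tag for that name.

```xml
<!-- symbols: = + * ( plus space, tab, newline and carriage return -->
<context symbols="=+*( &#9;&#10;&#13;">
	<element pattern="strlen" highlight="function" />
</context>
```

The space matters for the third line, where strlen is followed by a space rather than directly by the bracket.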

In xml/sgml/html only '<>&;' are symbols, but within a tag " and ' are symbols as well.

The element tag in the definition section

The tag tag in the definition section