Writing language definition files

From Bluefish Wiki

Revision as of 09:53, 7 November 2014

Bluefish language definition files

All syntax highlighting and autocompletion is defined in Bluefish language definition files, saved as .bflang2 files. In the source code they can be found in data/bflang/.

On Linux they are installed in /usr/share/bluefish/bflang/

Example files

shell.bflang2 is the simplest example of what a language definition can look like. php.bflang2 is probably the most complex example, with many included files and several syntax types supported inside one another (javascript, css, html and php itself). There is also sample.bflang2, which describes more or less the same as this wiki page.

Editing bflang files

If you store a bflang2 file in your bluefish settings directory ~/.bluefish/ it has higher priority than the system wide installed files. So if you are going to change a bflang2 file, just copy it (and any files it includes) into ~/.bluefish/

If you start bluefish from the commandline it will output errors and warnings about the bflang2 files that are loaded. So after you have edited a bflang2 file, test it, and look at the output in the terminal.

Including files

The top of a bflang file may define new entities that will include another file. For example the line

<!ENTITY css-rules SYSTEM "css-rules.bfinc">

defines that &css-rules; should be replaced by the contents of css-rules.bfinc (which should be placed in the same directory).

This makes it easier to re-use syntax. CSS is for example used in html, php and css itself.
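
As a minimal sketch of where such an entity declaration and its reference might sit, the fragment below assumes the declaration is placed in an internal DTD subset at the top of the file, which is where standard XML expects entity declarations. The language name "Example" and the placement of the reference are hypothetical and only illustrate the mechanism.

```xml
<?xml version="1.0"?>
<!DOCTYPE bflang [
	<!-- declares the entity; css-rules.bfinc is expected in the same directory -->
	<!ENTITY css-rules SYSTEM "css-rules.bfinc">
]>
<bflang name="Example" version="2.0">
	<!-- ... header and other sections ... -->
	<!-- hypothetical placement: this reference expands to the contents of css-rules.bfinc -->
	&css-rules;
</bflang>
```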

The format of the file

The file format is XML.

It starts with a root tag <bflang>:

<bflang name="Shell" version="2.0" >
</bflang>

Inside the root tag there are three sections
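
A rough skeleton of the overall layout could look like the following. The tag names <properties> and <definition> are assumed here from the section names used on this page; check an installed file such as shell.bflang2 for the exact spelling.

```xml
<bflang name="Shell" version="2.0">
	<header>
		<!-- always loaded: mime types, options, highlight names -->
	</header>
	<properties>
		<!-- assumed tag name; loaded on demand: comment styles, smart indenting, spell check default -->
	</properties>
	<definition>
		<!-- assumed tag name; loaded on demand: one top-level context describing the syntax -->
	</definition>
</bflang>
```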

The header section

The header section is always loaded for each bflang2 file. The rest of the file is loaded "on demand", so only if it is needed.

<header>
	<mime type="application/x-shellscript"/>
	<option name="show_in_menu" default="1"/>
	<highlight name="value" style="value"  />
</header>

The mime tag in the header

The mime tag specifies for which mime types this definition file is used. Multiple mime types can be specified. Sometimes a file doesn't have a specific mime type, or the mime type is not defined on many systems. In that case the reported mime type is often something like text/plain. For such cases Bluefish supports a combination of mime type and extension. To detect a file type whose name ends in .fake, add

<mime type="application/x-fake"/>
<mime type="text/plain?fake"/>

The option tag in the header

The option tag defines an option that is used further on in the language file

<option name="allphpfunctions" default="1" description="All php functions" />

A special note: all language files share one list of option names and their descriptions. So if two or more options have the same name, they will get the same description in Bluefish. If their descriptions differ, it is undefined which description is used.

An option is a boolean value that is referred to in class and notclass attributes.

Adding

class="allphpfunctions"

means the tag is enabled if the user option is enabled.

Adding

notclass="allphpfunctions"

means the tag is disabled if the user option is enabled.

These attributes exist for <element />, <tag />, <group /> and <autocomplete />

hardcoded option names

There are a few special (hardcoded) option names:

In the next example a block named 'php block' is made optionally foldable (or not). Read more about block detection in the <element /> section. The '_foldable' suffix is hardcoded in bluefish.

<option name="php block_foldable" default="1" description="Allow the PHP block to fold"/>

Whether or not to load the reference data for this language (saves memory)

<option name="load_reference" default="1"/>

Whether or not to load the auto completion data for this language (saves memory)

<option name="load_completion" default="1" />

Whether or not to close <tag> in the auto-completion

<option name="autocomplete_tags" default="1" />

Whether or not to show this language by default in the menu

<option name="show_in_menu" default="0"/>

Referring to an option further on in the language file, in tag or element

Since 2.2.5 Bluefish supports boolean variables inside the language file (which thus have value 0 or 1). There are two ways these can be used: with the prefix option: (where a boolean value is expected) and with the prefix condition: (where one of two string values is chosen depending on the option).

<element pattern="foo" highlight="condition:foo_as_string?string:function" >
	<autocomplete enable="option:autocomplete_foo" />
</element>

The highlight tag in the header

The highlight tag defines which element types exist in the file, and which style is applied to each of these types by default. The user can alter these styles in the preferences panel.

So if an element in this file has attribute highlight="foo", the header section should have <highlight name="foo" style="somestyle"/>. Look at other language files and try to re-use their styles.

For the end-user it is convenient if styles are re-used. All languages that define a comment should use style 'comment' by default.

<highlight name="comment" style="comment" />

Some users may like the same color for all keywords; others may prefer a different style for storage types and language keywords. So use a different 'highlight' name for each, so that users can assign a different text style if they want.

<highlight name="storage-types" style="keyword" />
<highlight name="keyword" style="keyword" />
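
To illustrate how these names are then referred to, here is a small sketch. The two patterns are illustrative choices for a C-like language, not taken from a shipped file.

```xml
<!-- in the header: two highlight names that share one default style -->
<highlight name="storage-types" style="keyword" />
<highlight name="keyword" style="keyword" />

<!-- in the definition: each element refers to a highlight name, not directly to a style -->
<element pattern="int" highlight="storage-types" />
<element pattern="while" highlight="keyword" />
```

Both patterns look the same by default, but a user who assigns another style to 'storage-types' changes only the first one.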

The properties section

The properties section is similar to the header, but it is loaded on-demand. As long as there is no syntax scanning needed for this type of file, the properties section is not yet loaded.

The comment tag in the properties section

The comment tag defines which types of line comments and block comments exist in this language. The smart comment function (shift-ctrl-c) uses this information to comment or uncomment text.

	<comment id="cm.cblockcomment" type="block" start="/*" end="*/" />
	<comment id="cm.htmlcomment" type="block" start="&lt;!--" end="--&gt;" />
	<comment id="cm.cpplinecomment" type="line" start="//" />
	<comment id="cm.scriptcomment" type="line" start="#" />

The smartindent and smartoutdent tags in the properties section

The smartindent characters specify which characters, when followed by a return, should increase the indenting. The smartoutdent characters specify which characters, when typed immediately after auto-indenting has set the indenting, should decrease that indenting again.

	<smartindent characters="{" />
	<smartoutdent characters="}" />

The default_spellcheck tag in the properties section

default_spellcheck defines whether regions that are not highlighted are checked by the spell checker. This is typically enabled for markup languages like HTML and XML, and disabled (or left out, because the default is 0) for all programming languages.

	<default_spellcheck enabled="1" />

The definition section

The definition section is where the syntax is really described.

A language definition always starts with a <context> tag, and contains ONE SINGLE context tag (which may have other context tags as children).

The concept of contexts

Different positions in a file may have a different syntax. An HTML example: inside a comment you can have a < character without breaking the syntax. That means that inside an HTML comment the syntax scanner is only looking for the end of the comment, while outside the comment it is looking for tags or entities. Thus the syntax scanner runs in two different contexts: the main context (where a tag, entity or comment may be started), and the comment context (where only the end of the comment is relevant). Inside a tag we have yet another context, because there we only look for attributes. We may also have CSS or javascript inside HTML, and inside javascript we can again have a comment, and so on. The HTML syntax currently has 465 contexts.

The syntax scanner always is in one single context.

The context tag in the definition section

<context symbols="&gt;&lt;&amp;; &#9;&#10;&#13;" commentid_block="cm.htmlcomment" commentid_line="none">

Or

<context symbols="LIST OF CHARACTERS" highlight="HIGHLIGHT-TYPE" id="IDENTIFIER" > 

A <context> tag should always define symbols. Symbols are those characters that may start or end an element.

The optional attribute highlight may specify a highlight type that is valid for the complete text region that has this context. Useful for 'comment' or 'string' type of contexts where the complete context is highlighted

The optional attributes commentid_block and commentid_line may specify how the comment toggle function should work in this context. The value should refer to the comment section in the properties.
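
A context-wide highlight might be used as in the following sketch of a double-quoted string. The pattern, the symbols and the highlight name 'string' are illustrative and would need a matching <highlight> entry in the header.

```xml
<!-- hypothetical string context: everything up to the closing quote is styled as 'string' -->
<element pattern="&quot;" highlight="string">
	<context symbols="&quot;" highlight="string">
		<element pattern="&quot;" highlight="string" ends_context="1" />
	</context>
</element>
```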

What are symbols

Symbols are characters that may start or end a pattern. Try to highlight for example:

 char *rc_char(char*chara);
 ^^^^          ^^^^

Only two of the four occurrences of 'char' need to be highlighted. How does the scanner know which ones to highlight? In the above example there are several symbols, such as whitespace, brackets and operators:

 char *rc_char(char*chara);
^    ^^       ^    ^     ^^

Notice that the occurrences of 'char' that should be highlighted are all in between symbols.

To detect function strlen in the following examples (language C):

i=strlen(a);
i+strlen(a);
i*strlen (a);

we need at least symbols =+*(
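
Put into a context, that could look like the sketch below. The highlight name 'function' and the extra symbols (closing bracket, semicolon and whitespace) are assumptions added so that the three lines above scan cleanly.

```xml
<!-- =+*( are the minimum from the text; ) ; and whitespace are added for completeness -->
<context symbols="=+*(); &#9;&#10;&#13;">
	<element pattern="strlen" highlight="function" />
</context>
```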

In most languages all whitespace characters are symbols (space, &#9; = tab, &#10; = newline, &#13; = carriage return).

In xml/sgml/html only '<>&;' are symbols, but within a tag " and ' are symbols as well.

Advanced use of context tags

The optional attribute id is used to define an identifier which can be used to re-use this context. To re-use a context, use

<context idref="IDENTIFIER" />

where IDENTIFIER refers to a previously defined context with an id. The file is parsed top to bottom, so the referenced context must be defined earlier in the file.


Inside a context tag

Inside a context tag are usually the tags element, tag, group. For advanced usage it can have another context.

Advanced use of context tags

If there is a context directly inside another context, it must have its id set. Such a context is defined but not yet used. It is used once it is referred to with <context idref="" />
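
The following sketch shows one way id and idref might work together: a comment context is defined once and then reused from a nested context. The id value 'c.example.comment', the patterns and the highlight names are all made up for illustration.

```xml
<context symbols="/*{} &#9;&#10;&#13;">
	<element pattern="/*" highlight="c-style-comment">
		<!-- first use: define the context and give it an id -->
		<context id="c.example.comment" symbols="*/&#9;&#10;&#13;" highlight="c-style-comment">
			<element pattern="*/" highlight="c-style-comment" ends_context="1" />
		</context>
	</element>
	<element pattern="{" highlight="brackets">
		<context symbols="/*{} &#9;&#10;&#13;">
			<element pattern="}" highlight="brackets" ends_context="1" />
			<element pattern="/*" highlight="c-style-comment">
				<!-- later use: refer back to the context defined earlier in the file -->
				<context idref="c.example.comment" />
			</element>
		</context>
	</element>
</context>
```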

The element tag in the definition section

<element pattern="while" highlight="keyword"/>

<element> defines an element that is highlighted, that can be autocompleted, or that starts a new context.

It always needs the attribute 'pattern', which defines the pattern that will be looked for in this context.

The pattern can be defined in 'regular expression' style; to do this add attribute is_regex="1". However, there is only limited regular expression support. You may use:

  • a range of characters such as [a-z0-9;']
  • an inverted range such as [^0-9]
  • operators such as ? (zero or one), + (one or more), and * (zero or more)
  • subpatterns such as ab(fg)?

<element pattern="'[^']*'" is_regex="1" highlight="string"/>

A pattern may be case insensitive; to enable this set case_insens="1".

To highlight the pattern use attribute highlight="TYPE", where TYPE should be defined within the <header> section of the language file.
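
A short sketch combining these attributes follows. The patterns and the highlight names 'value' and 'keyword' are illustrative and assume matching <highlight> entries in the header.

```xml
<!-- limited regex support: one or more digits -->
<element pattern="[0-9]+" is_regex="1" highlight="value" />
<!-- case insensitive: should match select, SELECT, Select, ... -->
<element pattern="select" case_insens="1" highlight="keyword" />
```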

Element re-use

<element> may have attribute 'id' so this element may be referred to later. To re-use element 'foo' later in the file use

<element idref="foo" />

Block detection

Next is a block detection example

<element id="bracket{" pattern="{" starts_block="1" highlight="brackets" block_name="Bracket block" />
<element pattern="}" ends_block="1" blockstartelement="bracket{" highlight="brackets" />

An element may start or end a block. A block consists of two patterns (start and end); the contents between the start and the end can be hidden when the block is 'folded'.

To make a pattern a block start, define starts_block="1" and give the element an 'id' attribute.

To specify a pattern that ends a block, use ends_block="1" together with blockstartelement="FOO", where FOO is the id of the start-of-block element.

Because this block has a name ('Bracket block') it can be selected by the user in the expand/collapse popup menu. You can also create an option 'Bracket block_foldable' in the header options so the user may decide if this block may fold or not. If you don't need either the block_name can be left empty.
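
Combined with the hardcoded '_foldable' suffix described in the header section, this might look like the sketch below. The description text is made up.

```xml
<!-- in the header: the option name is the block_name followed by '_foldable' -->
<option name="Bracket block_foldable" default="1" description="Allow the bracket block to fold" />

<!-- in the definition: the block elements from the example above -->
<element id="bracket{" pattern="{" starts_block="1" highlight="brackets" block_name="Bracket block" />
<element pattern="}" ends_block="1" blockstartelement="bracket{" highlight="brackets" />
```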

An element may start a new context

Next is a context example, a javascript comment

<element pattern="/*" highlight="c-style-comment">
	<context symbols="*/&#9;&#10;&#13;" highlight="c-style-comment">
		<element pattern="*/" highlight="c-style-comment" ends_context="1" />
	</context>
</element>

Whenever this pattern is found, the engine switches to the new context and scans only for the patterns defined in that context. To set this up, define <context></context> between <element> and </element>. Within this <context> there are entirely different patterns. There can be only one context within an element.

In most languages the context ends somewhere too. To make the scanner switch back to the previous context, define an element INSIDE the inner context that has ends_context="NUM", where NUM specifies the number of contexts that are ended by this element. Because contexts may be nested, there may be several contexts inside each other.

Basically, context switches work like a stack. Let's take the example

i = 1;
/* text */
i = 1 + 1;

The pattern '/*' exists in the initial context. When it is found, the initial context is pushed on the context stack and the scanner switches to a new context (for c-style-comment). In this context there is only a single pattern: '*/'. The scanner continues until it finds */; at that point it pops one context from the stack, and thus in this example it continues with the initial context.

Next is a nested context example: inside a php comment, there may be the end of the php block. Note that this element has ends_context="2", so it ends both the comment context and the php context.

<element pattern="&lt;?php" highlight="php-block">
	<context symbols="?*/+-=*&amp;&lt;&gt;&#9;&#10;&#13;">
		<element pattern="?&gt;" highlight="php-block" ends_context="1" />
		<element pattern="/*" highlight="c-style-comment">
			<context symbols="*/&#9;&#10;&#13;" highlight="c-style-comment">
				<element pattern="*/" highlight="c-style-comment" ends_context="1" />
				<element pattern="?&gt;" highlight="php-block" ends_context="2" />
			</context>
		</element>
	</context>
</element>

Auto completion

A pattern may also be autocompletable. To enable this add

<autocomplete enable="1" />

Often it is convenient if not only the pattern itself is completed, but some common characters are appended as well. Use append="STRING" to define the characters that are appended. The cursor position AFTER auto completion can be set back a number of characters; this is defined by the attribute backup_cursor.

<autocomplete append="() {" backup_cursor="3" />

A regular expression pattern may be autocompletable as well, but autocompleting the pattern text itself usually makes no sense because the pattern matches many different strings. Use string="STRING" to autocomplete STRING in this context.

<autocomplete string="import" />

Making auto completion more user configurable

Suppose you want to make auto-completion with or without semicolon configurable. Just add two <autocomplete /> entries, one with a class="" and the other with a notclass="" Then define an option in the header that can be enabled or disabled by the user.

<element pattern="abort"><reference>Aborting a Program</reference>
   <autocomplete append="();" class="autocompl_with_semicolon" backup_cursor="2" />
   <autocomplete append="()" notclass="autocompl_with_semicolon" backup_cursor="1" />
</element>

The tag tag in the definition section

The next example shows an xml/sgml tag with attributes

<tag name="body" highlight="tag" attributes="style,class,id" attribhighlight="attribute" />

Because there are many languages that use sgml/xml/html style patterns, there is <tag> for convenience.

It should have the attribute 'name' to specify the name of the tag.

The attribute 'attributes' defines the attributes that are valid for this tag.

To highlight the tag use highlight="TYPE", where TYPE is a highlight type defined in the <header> section. To highlight the attributes use attribhighlight="TYPE".

The next example shows the equivalent of the above <tag>, written with <element>. As you can see, a single tag needs a lot of text. That is why the convenience tag <tag> was created.

<element id="&lt;body" pattern="&lt;body" highlight="tag" starts_block="1">
	<context symbols="&gt;\&quot;=' &#9;&#10;&#13;" >
		<element pattern="style" highlight="attribute" />
		<element pattern="class" highlight="attribute" />
		<element pattern="id" highlight="attribute" />
		<element id="__internal_tag_string_d__" pattern="&quot;[^&quot;]*&quot;" is_regex="1" highlight="string" />
		<element id="__internal_tag_string_s__" pattern="'[^']*'" is_regex="1" highlight="string" />
		<element pattern="/&gt;" ends_context="1" highlight="tag" />
	</context>
</element>
<element pattern="&lt;/body&gt;" highlight="tag" ends_block="1" blockstartelement="&lt;body" />

starting a new context

A <tag> may also start a new context, just as <element> does.

auto completion

The next example shows autocompletion for tags

<tag name="img" attributes="style,class,id,src,width,height"
		autocomplete_append=">" attrib_autocomplete_append="=&quot;&quot;" attrib_autocomplete_backup_cursor="2"/>

A <tag> is autocompleted automatically. It also has an 'attrib_autocomplete_append' attribute.

The next example shows auto closing options for tags

<tag name="br" no_close="1" />

A <tag> will automatically suggest </tag> for autocompletion (if this is not disabled for the complete language file). Some tags don't need a closing tag because they close themselves: <tag />. For these use no_close="1". Typical examples in html are br, img, hr and input.

The next example shows how to enable SGML short tags. This tells the autocompletion that the tag is not closed and also does not end in '/>' (thus it is not proper xml syntax). Instead of suggesting <img /> it will suggest <img>

<tag name="img" sgml_shorttag="1" />

In XML or XHTML a tag always needs to be closed, either as <img /> or as <img></img>. In SGML <img> is also allowed. Set sgml_shorttag="1" to enable this.

The group tag in the definition section

Often there are many elements that need the same attribute, such as highlight or autocomplete.

To make this easier you can group these elements inside <group>.

<group highlight="keyword" >
	<autocomplete enable="1" />
	<element pattern="for"/>
	<element pattern="while"/>
</group>

Supported attributes are:

  • highlight
  • autocomplete
  • autocomplete_append
  • class
  • case_insens
  • is_regex

groups for tags

Many <tag> entries can have the same attributes as well, so these can also be grouped inside <group>

<group  attribhighlight="attribute" highlight="tag" attrib_autocomplete_append="=&quot;&quot;"  >
	<autocomplete append=">" />
	<tag name="p" attributes="style,id,width"/>
	<tag name="div" attributes="style,id" />
</group>

Supported attributes are:

  • highlight
  • attribhighlight
  • attrib_autocomplete_append
  • class

Autocomplete options in groups

Suppose you have a lot of <element /> tags where you want to make auto completion configurable. You do not need to add autocomplete tags to each and every <element />, you can add them to a group:

<group highlight="libc-function" >
        <autocomplete append="();" class="autocompl_with_semicolon" backup_cursor="2" />
        <autocomplete append="()" notclass="autocompl_with_semicolon" backup_cursor="1" />
        <element pattern="a64l"><reference>Encode Binary Data</reference></element>
        <element pattern="abort"><reference>Aborting a Program</reference></element>
..... etc.


groups that can be disabled/enabled with an option

A special usage of <group> is to allow the user to disable or enable a section of the file. If the <header> section has <option name="allphpfunctions" default="1" description="All php functions" /> we can put this option into effect like this:

<group class="allphpfunctions">
	<element pattern="mysql_query" />
	<element pattern="mysql_fetch_row" />
	<element pattern="mysql_fetch_array" />
</group>

The reverse is also supported, using the notclass attribute. This can be used to make an option that disables one section but enables a different section.

<group notclass="mysetting">
	<element pattern="foo" />
</group>
<group class="mysetting">
	<element pattern="bar" />
</group>

Advanced group class/notclass values

Some parts of language definition files can be included in different languages. That is why there are some special options defined. The option is_LANG is always defined, where LANG is the name of the current language.

For example, CSS is included in HTML and in PHP. But in CSS-in-HTML the pattern <?php should do something different than in CSS-in-HTML-in-PHP. The following example does just that:

<group class="is_PHP">
   <element idref="e.php.short.open" />
</group>

Advanced option: conditional execution

Since Bluefish 2.2.7 patterns can be made "conditional". They are still compiled into the DFA table, but their actions (starting a context, highlighting, starting a block) depend on a condition, such as whether or not they are in a certain context.

condition_mode=""

  • 1 = valid if relation with context matches,
  • 2 = invalid if relation with context matches,
  • 3 = valid if relation with block matches
  • 4 = invalid if relation with block matches

condition_relation=""

  • -1 = any parent
  • 0 = direct parent
  • 1 = grandparent
  • etc.

condition_contextref=""

  • refers to the id of a context or block-starting-element


An example is used in the CSS highlighting include file. If it is included in a CSS file this pattern is not executed, but if it is included in an HTML HEAD STYLE section, this pattern will be executed:

<element id="end-style-tag" pattern="&lt;/style&gt;" highlight="html-tag" ends_context="3"
	condition_mode="1" condition_relation="2" condition_contextref="c.html.css.main"/>


Deep understanding of the Bluefish syntax scanning

You do not need to understand this to change or write a language file. It is provided for a deeper understanding of the internals.

Scanning with a DFA table

Let's use a very simple language file:

<context symbols=" ;(){}[]:\&#34;\\',&gt;&lt;*&amp;^%!+=-|/?#&#9;&#10;&#13;." dump_dfa_chars="()*;char" dump_dfa_run="1">
 <element pattern="(" id="lparen" starts_block="1" highlight="brackets" />
 <element pattern=")" highlight="brackets" ends_block="1" blockstartelement="lparen" />
 <element pattern="char" highlight="function" />
</context>

Bluefish compiles each context into a DFA table. Because we use the attribute dump_dfa_chars, Bluefish will show the DFA table for these characters:

***************** print subset of DFA table for context 1
        '('  ')'  '*'  ';'  'c'  'h'  'a'  'r' : match
   0:    2    3    0    0    4    1    1    1  :    0 	this is the startstate
   1:    0    0    0    0    1    1    1    1  :    0 	this is the identstate
   2:    0    0    0    0    0    0    0    0  :    1 (
   3:    0    0    0    0    0    0    0    0  :    2 )
   4:    0    0    0    0    1    5    1    1  :    0
   5:    0    0    0    0    1    1    6    1  :    0
   6:    0    0    0    0    1    1    1    7  :    0
   7:    0    0    0    0    1    1    1    1  :    3 char
*****************
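
To make the rows above easier to read, here is a simplified Python construction for literal patterns only. It is not the algorithm from the Bluefish DFA compiler; it is a small trie builder whose fallback rules were chosen so that it reproduces the dumped rows. The name compile_literals and the rule "a state reached by consuming a symbol falls back to 0 for every character" are assumptions inferred from the dump.

```python
# Symbols of the example context (entities decoded) and the dumped columns.
SYMBOLS = set(" ;(){}[]:\"\\',><*&^%!+=-|/?#\t\n\r.")
COLUMNS = "()*;char"

def compile_literals(patterns, symbols, columns):
    """Build table rows (restricted to `columns`) and the match column."""
    trans = [{}, {}]               # state 0 = startstate, state 1 = identstate
    after_symbol = [False, False]  # was the state reached by consuming a symbol?
    match = [0, 0]                 # 0 = no match, otherwise the pattern number
    for number, pattern in enumerate(patterns, start=1):
        state = 0
        for ch in pattern:
            if ch not in trans[state]:
                trans.append({})
                after_symbol.append(ch in symbols)
                match.append(0)
                trans[state][ch] = len(trans) - 1
            state = trans[state][ch]
        match[state] = number
    rows = []
    for state in range(len(trans)):
        row = []
        for ch in columns:
            if ch in trans[state]:
                row.append(trans[state][ch])   # pattern continues
            elif ch in symbols or after_symbol[state]:
                row.append(0)                  # back to the startstate
            else:
                row.append(1)                  # fall into the identstate
        rows.append(row)
    return rows, match

rows, match = compile_literals(["(", ")", "char"], SYMBOLS, COLUMNS)
for state, row in enumerate(rows):
    print(state, row, match[state])
```

Each printed line corresponds to one row of the dump: the state number, the transitions for the eight dumped characters, and the match number.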

Let's scan the following text with this table:

char *rc_char(char*chara);

Because we used the attribute dump_dfa_run, we get to see how the scanner walks through this table:

context 1: '  ' in    0 makes    0 --> restart scanning (found a symbol, no match?)
context 1: 'c ' in    0 makes    4
context 1: 'h ' in    4 makes    5
context 1: 'a ' in    5 makes    6
context 1: 'r ' in    6 makes    7
context 1: '  ' in    7 makes    0 --> a symbol or the pattern ends on a symbol, the previous was a match (char)
context 1: '  ' in    0 makes    0 --> restart scanning (found a symbol, no match?)
context 1: '* ' in    0 makes    0 --> restart scanning (found a symbol, no match?)
context 1: 'r ' in    0 makes    1 --> nothing matches, go to identstate
context 1: 'c ' in    1 makes    1 .....identstate
context 1: '_ ' in    1 makes    1 .....identstate
context 1: 'c ' in    1 makes    1 .....identstate
context 1: 'h ' in    1 makes    1 .....identstate
context 1: 'a ' in    1 makes    1 .....identstate
context 1: 'r ' in    1 makes    1 .....identstate
context 1: '( ' in    1 makes    0 --> restart scanning (found a symbol, no match?)
context 1: '( ' in    0 makes    2
context 1: 'c ' in    2 makes    0 --> a symbol or the pattern ends on a symbol, the previous was a match (()
context 1: 'c ' in    0 makes    4
context 1: 'h ' in    4 makes    5
context 1: 'a ' in    5 makes    6
context 1: 'r ' in    6 makes    7
context 1: '* ' in    7 makes    0 --> a symbol or the pattern ends on a symbol, the previous was a match (char)
context 1: '* ' in    0 makes    0 --> restart scanning (found a symbol, no match?)
context 1: 'c ' in    0 makes    4
context 1: 'h ' in    4 makes    5
context 1: 'a ' in    5 makes    6
context 1: 'r ' in    6 makes    7
context 1: 'a ' in    7 makes    1 --> nothing matches, go to identstate
context 1: ') ' in    1 makes    0 --> restart scanning (found a symbol, no match?)
context 1: ') ' in    0 makes    3
context 1: '; ' in    3 makes    0 --> a symbol or the pattern ends on a symbol, the previous was a match ())
context 1: '; ' in    0 makes    0 --> restart scanning (found a symbol, no match?)
context 1: '\0' in    0 makes    0 --> restart scanning (found a symbol, no match?)
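
The walk above can be replayed with a short Python sketch that uses the dumped table directly. It is a minimal model of the behaviour visible in the trace, not the Bluefish scanner itself. The function names and the fallback for characters outside the dumped columns (symbols go to state 0, other characters go to the identstate, except from states 2 and 3) are assumptions inferred from the dump.

```python
COLUMNS = "()*;char"
TABLE = [
    [2, 3, 0, 0, 4, 1, 1, 1],  # 0: startstate
    [0, 0, 0, 0, 1, 1, 1, 1],  # 1: identstate
    [0, 0, 0, 0, 0, 0, 0, 0],  # 2: matched "("
    [0, 0, 0, 0, 0, 0, 0, 0],  # 3: matched ")"
    [0, 0, 0, 0, 1, 5, 1, 1],  # 4: seen "c"
    [0, 0, 0, 0, 1, 1, 6, 1],  # 5: seen "ch"
    [0, 0, 0, 0, 1, 1, 1, 7],  # 6: seen "cha"
    [0, 0, 0, 0, 1, 1, 1, 1],  # 7: matched "char"
]
MATCH = {2: "(", 3: ")", 7: "char"}
SYMBOLS = set(" ;(){}[]:\"\\',><*&^%!+=-|/?#\t\n\r.")

def next_state(state, ch):
    if ch in COLUMNS:
        return TABLE[state][COLUMNS.index(ch)]
    if ch in SYMBOLS or ch == "\0":
        return 0
    # Assumption: any other character behaves like an identifier character.
    return 0 if state in (2, 3) else 1

def scan(text):
    """Return (start, end, pattern) for every match found in text."""
    matches = []
    data = text + "\0"          # the trace ends on a terminating '\0'
    state, start, i = 0, 0, 0
    while i < len(data):
        new = next_state(state, data[i])
        if new == 0:
            if state in MATCH:
                matches.append((start, i, MATCH[state]))
            if state != 0:
                # Restart scanning: the same character is tried again
                # from the startstate.
                state, start = 0, i
                continue
            # A symbol in the startstate that matches nothing is skipped.
            i += 1
            start = i
            continue
        state = new
        i += 1
    return matches

for start, end, name in scan("char *rc_char(char*chara);"):
    print(start, end, name)
```

As in the trace, rc_char and chara end up in the identstate and are not reported, because a match is only accepted when the pattern is followed by a symbol.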

Code documentation

If you want a deep understanding of how the syntax scanner works, please read the documentation included in the Bluefish source code:

  • src/bftextview2.h describes the overall design of the scanner and defines some of the types
  • src/bftextview2_private.h describes most of the internal types
  • src/bftextview2.c has the code for the widget, which invokes the scanner and the spell checker
  • src/bftextview2_langmgr.c has the code that parses the language file and invokes the DFA compiler
  • src/bftextview2_patcompile.c has the code that compiles the DFA table
  • src/bftextview2_scanner.c has the code for the scanner and its cache
  • src/bftextview2_autocomp.c has the auto-completion code
  • src/bftextview2_markregion.c has the code that keeps track of which parts of the document have changed and need rescanning
  • src/bftextview2_spell.c has the code for context-sensitive spell checking
  • src/bftextview2_identifier.c has the code that keeps track of identifiers (for example, names of user-defined variables) so you can jump to them or autocomplete them