I am trying to filter undesired messages in ejabberd. I took some direction from this post. Here is the function snippet which gets executed via the filter_packet hook:

check_stanza({From, To, #xmlel{name = StanzaType}} = Input) ->
    case StanzaType of
        <<"message">> ->
            ?DEBUG("filtering packet...~nFrom: ~p~nTo: ~p~nPacket: ~p~nResult: ",
                   [From, To, Input]),
            %% check_stanza_type(AccessRule, Input)
            Input;
        _ ->
            %% Let every other stanza type through unchanged.
            Input
    end.

The packet printed in the log:

{{jid,<<"test25">>,<<"localhost">>,<<"Administrators-MacBook-Pro-6">>,
<<"test25">>,<<"localhost">>,<<"Administrators-MacBook-Pro-6">>},{jid,
<<"test24">>,<<"localhost">>,<<"Administrators-MacBook-Pro-6">>,
<<"test24">>,<<"localhost">>,<<"Administrators-MacBook-Pro-6">>},{xmlel,
<<"message">>,[{<<"type">>,<<"chat">>},{<<"id">>,<<"purpleaed2ec77">>},
{<<"to">>,<<"test24@localhost/Administrators-MacBook-Pro-6">>}],[{xmlel,
<<"active">>,[{<<"xmlns">>,<<"http://jabber.org/protocol/chatstates">>}],
[]},{xmlel,<<"body">>,[],[{xmlcdata,<<"MESSAGE BODY GOES HERE">>}]}]}}

My requirement: extract the message's body and filter out abusive words. For example, if a user sends "message body goes here", the following sequence should occur:

  • The sender's packet is intercepted by the module via the hook (done)
  • The body of the message is extracted and its words are checked against a set of blocked words. The data can be in Mnesia or MySQL (pending)
  • The altered packet (with the filtered body) is let through to the receiving client

The receiver will get "message body goes ****" if "here" was an undesired word.

I am new to Erlang, and it's a small community with few good articles out there, so I need some advice on the best way to achieve the above. There is a great post on how to do the same with Elixir support, but I want to stick to Erlang. Any help would be appreciated.

UPDATE

Thanks, Amiramix. Here is the code to replace a particular word:

{xmlel, Syntax, _Attrs, OuterBody} = Xmlel,

NewXmlel =
    case Syntax of
        <<"message">> ->
            XmlelBody = lists:keyfind(<<"body">>, 2, OuterBody),   % {xmlel,<<"body">>,[],[{xmlcdata,<<"HI">>}]}
            {xmlel, _BodySyntax, _, Innerbody} = XmlelBody,         % Innerbody = [{xmlcdata,<<"HI">>}]
            Body = proplists:get_value(xmlcdata, Innerbody),        % <<"HI">>

            TmpList = re:replace(Body, <<"HI$">>, <<"**">>),
            NewBody = binary:list_to_bin(TmpList),                  % <<"**">>
            NewInnerBody = lists:keyreplace(xmlcdata, 1, Innerbody, {xmlcdata, NewBody}),   % [{xmlcdata,<<"**">>}]
            NewXmlelBody = setelement(4, XmlelBody, NewInnerBody),  % {xmlel,<<"body">>,[],[{xmlcdata,<<"**">>}]}

            NewOuterBody = lists:keyreplace(<<"body">>, 2, OuterBody, NewXmlelBody),
            setelement(4, Xmlel, NewOuterBody);
        _ ->
            Xmlel   % not a message: leave it untouched
    end.

Since it would be difficult to keep checking each word in the body against many blocked words, I want to send the extracted body to a Python script which does this for me. Any suggestions on how to extract MESSAGE BODY GOES HERE from <<"MESSAGE BODY GOES HERE">>?

1 Answer

The log doesn't match the code, i.e. there is no "filtering packet..." in the output, so I can't give you exact code to put in the check_stanza function. I also don't know enough about ejabberd to validate it. However, I would like to give you some guidance on how to deal with such structures in Erlang so that you can more easily do what you want yourself.

First of all re-format the structure so that it's clear how the data is nested:

{
  {jid,
   <<"test25">>,
   <<"localhost">>,
   <<"Administrators-MacBook-Pro-6">>,
   <<"test25">>,
   <<"localhost">>,
   <<"Administrators-MacBook-Pro-6">>
  },
  {jid,
   <<"test24">>,
   <<"localhost">>,
   <<"Administrators-MacBook-Pro-6">>,
   <<"test24">>,
   <<"localhost">>,<<"Administrators-MacBook-Pro-6">>
  },
  {xmlel, <<"message">>,
   [
    {<<"type">>, <<"chat">>},
    {<<"id">>, <<"purpleaed2ec77">>},
    {<<"to">>, <<"test24@localhost/Administrators-MacBook-Pro-6">>}
   ],
   [
    {xmlel, <<"active">>,
     [{<<"xmlns">>, <<"http://jabber.org/protocol/chatstates">>}], []
    },
    {xmlel, <<"body">>, [],
     [{xmlcdata, <<"MESSAGE BODY GOES HERE">>}]
    }
   ]
  }
}.

You have one external tuple with three tuples inside:

{ {jid, ...}, {jid, ...}, {xmlel, ...} }.

It doesn't look quite right to me that the outer data is a tuple; I would expect it to be a list, e.g.:

[ {jid, ...}, {jid, ...}, {xmlel, ...} ].

But maybe that's how it is (the check_stanza clause in your question does match on a three-element tuple), so just please make sure that you are logging it correctly.

To modify the body you would need to do the following steps:

  1. Extract xmlcdata that contains the body
  2. Modify the body
  3. Store it back in the main structure

Before proceeding, please copy the whole structure into the Erlang shell and store it as a variable, so that you can follow along in your own shell. Don't forget to add the variable name at the beginning and '.' at the end:

Erlang/OTP 18 [erts-7.2.1] [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V7.2.1  (abort with ^G)
1> M =
1> {
1>   {jid,
1>    <<"test25">>,
1>    <<"localhost">>,
1>    <<"Administrators-MacBook-Pro-6">>,
1>    <<"test25">>,
1>    <<"localhost">>,
1>    <<"Administrators-MacBook-Pro-6">>
1>   },
1>   {jid,
1>    <<"test24">>,
1>    <<"localhost">>,
1>    <<"Administrators-MacBook-Pro-6">>,
1>    <<"test24">>,
1>    <<"localhost">>,<<"Administrators-MacBook-Pro-6">>
1>   },
1>   {xmlel, <<"message">>,
1>    [
1>     {<<"type">>, <<"chat">>},
1>     {<<"id">>, <<"purpleaed2ec77">>},
1>     {<<"to">>, <<"test24@localhost/Administrators-MacBook-Pro-6">>}
1>    ],
1>    [
1>     {xmlel, <<"active">>,
1>      [{<<"xmlns">>, <<"http://jabber.org/protocol/chatstates">>}], []
1>     },
1>     {xmlel, <<"body">>, [],
1>      [{xmlcdata, <<"MESSAGE BODY GOES HERE">>}]
1>     }
1>    ]
1>   }
1> }.
{{jid,<<"test25">>,<<"localhost">>,
      <<"Administrators-MacBook-Pro-6">>,<<"test25">>,
      <<"localhost">>,<<"Administrators-MacBook-Pro-6">>},
 {jid,<<"test24">>,<<"localhost">>,
      <<"Administrators-MacBook-Pro-6">>,<<"test24">>,
      <<"localhost">>,<<"Administrators-MacBook-Pro-6">>},
 {xmlel,<<"message">>,
        [{<<"type">>,<<"chat">>},
         {<<"id">>,<<"purpleaed2ec77">>},
         {<<"to">>,
          <<"test24@localhost/Administrators-MacBook-Pro-6">>}],
        [{xmlel,<<"active">>,
                [{<<"xmlns">>,<<"http://jabber.org/protocol/chatstates">>}],
                []},
         {xmlel,<<"body">>,[],
                [{xmlcdata,<<"MESSAGE BODY GOES HERE">>}]}]}}
2>

Now just typing M. in the shell will print out the whole structure (cut for brevity):

2> M.
{{jid,<<"test25">>,<<"localhost">>,
      <<"Administrators-MacBook-Pro-6">>,<<"test25">>,
(...)
         {xmlel,<<"body">>,[],
                [{xmlcdata,<<"MESSAGE BODY GOES HERE">>}]}]}}

If the data is indeed a tuple you can get the last sub-tuple with this code:

3> {_, _, Xmlel} = M.

Again, typing just Xmlel. in the shell will print out the content of that variable ('_' is the anonymous, or "don't care", variable). Xmlel itself is a tuple, so the last list can be extracted the same way:

4> {xmlel, _, _, L} = Xmlel.

<<"message">> is matched to the first '_' and then the first list to the second '_'. The second list is then bound to L:

6> L.
[{xmlel,<<"active">>,
        [{<<"xmlns">>,<<"http://jabber.org/protocol/chatstates">>}],
        []},
 {xmlel,<<"body">>,[],
        [{xmlcdata,<<"MESSAGE BODY GOES HERE">>}]}]

You want the tuple that contains the <<"body">> value, e.g.:

7> T = lists:keyfind(<<"body">>, 2, L).
{xmlel,<<"body">>,[], [{xmlcdata,<<"MESSAGE BODY GOES HERE">>}]}

Please check the lists:keyfind/3 documentation for information about the arguments to that function, and the Erlang documentation for the relevant modules if you need an explanation of what the other functions do.
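
One thing to keep in mind is that lists:keyfind/3 returns false when nothing matches, which will happen for messages that carry no body at all (a message with only the chat-state child, for instance). Matching the result directly against {xmlel, _, _, BL} would then crash. A minimal sketch of a safer lookup follows; the module name body_lookup and the function find_body/1 are made up for illustration:

```erlang
-module(body_lookup).
-export([find_body/1]).

%% Children is the fourth element of the <<"message">> xmlel tuple.
%% Returns {ok, BodyTuple} when a <body> element is present and none
%% otherwise, so the caller can decide what to do with body-less stanzas.
find_body(Children) ->
    case lists:keyfind(<<"body">>, 2, Children) of
        {xmlel, <<"body">>, _Attrs, _BodyChildren} = Body ->
            {ok, Body};
        _NotFound ->
            %% false (no match) or a tuple that is not a body element
            none
    end.
```

With the list L from above, body_lookup:find_body(L) would give {ok, T}, and a list without a body element would give none.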

Finally, we want the list that contains the body element:

8> {xmlel, _, _, BL} = T.

BL is now bound to a proplist. To get the body, simply do:

16> Body = proplists:get_value(xmlcdata, BL).
<<"MESSAGE BODY GOES HERE">>
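
Note that proplists:get_value/2 only returns the first xmlcdata entry, and returns undefined when there is none. I am assuming here that the text of a body could arrive split into several {xmlcdata, Binary} chunks; if that can happen, a small helper that joins all of them is safer. The module name body_text and the function cdata/1 are hypothetical:

```erlang
-module(body_text).
-export([cdata/1]).

%% Concatenate every {xmlcdata, Bin} chunk found in a children list,
%% skipping any other kind of child. Returns <<>> when the list holds
%% no character data at all.
cdata(Children) ->
    iolist_to_binary([Chunk || {xmlcdata, Chunk} <- Children]).
```

For the BL above this gives the same binary as proplists:get_value/2, it just does not depend on the text being in a single chunk.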

Let's replace the string and reconstruct the structure:

21> TmpList = re:replace(Body, <<"HERE$">>, <<"*****">>).
[<<"MESSAGE BODY GOES ">>,<<"*****">>]

23> binary:list_to_bin(TmpList).
<<"MESSAGE BODY GOES *****">>

24> NewBody = binary:list_to_bin(TmpList).
<<"MESSAGE BODY GOES *****">>
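
Two caveats about the calls above. The pattern <<"HERE$">> only matches at the very end of the body, and as far as I know re:replace/3 hands back the subject unchanged (a plain binary, not a list) when nothing matches, in which case binary:list_to_bin/1 would fail. Passing options to re:replace/4 avoids both problems. This is a sketch under the assumption that Word is a plain literal word without regular expression metacharacters; mask_word and mask/2 are names I made up:

```erlang
-module(mask_word).
-export([mask/2]).

%% Replace every occurrence of Word in Body, ignoring case, with as many
%% asterisks as Word has bytes. {return, binary} makes re:replace/4
%% return a binary in all cases, including when there is no match.
%% ASSUMPTION: Word is a literal word; it is used as a regex as is.
mask(Body, Word) ->
    Stars = binary:copy(<<"*">>, byte_size(Word)),
    re:replace(Body, Word, Stars, [global, caseless, {return, binary}]).
```

mask_word:mask(<<"MESSAGE BODY GOES HERE">>, <<"here">>) should then produce the body with the last word turned into four asterisks, and a body without the word comes back as it was.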

Now the new body is the NewBody variable. We replace tuples in a list with lists:keyreplace/4:

28> NewBL = lists:keyreplace(xmlcdata, 1, BL, {xmlcdata, NewBody}).
[{xmlcdata,<<"MESSAGE BODY GOES *****">>}]

And we replace elements in a tuple with setelement/3:

31> NewT = setelement(4, T, NewBL).
{xmlel,<<"body">>,[], [{xmlcdata,<<"MESSAGE BODY GOES *****">>}]}

To be fair, the tuple {xmlel, <<"body">>, [], List} is most likely an Erlang record named xmlel (the #xmlel{name = StanzaType} pattern in your check_stanza function suggests as much). With the record definition at hand you could replace the list in a more semantically correct way. Assuming the field that holds the child list is called children, that would look like:

32> NewT = T#xmlel{children = NewBL}.

If that's indeed a record, then its definition must be in one of the .hrl files available somewhere in the ejabberd code, ready for you to include in your code and use. If the definition of that record changes, you only need to recompile your code and it should still work. With setelement there is the risk that the code stops working if the size of the tuple changes, so please keep that in mind. I will continue using setelement as that is simpler for me at this moment (record definitions need to be imported into the shell with rr before they can be used).
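
To show what the record-based update looks like inside a module, here is a small sketch. It declares its own copy of the record with the fields name, attrs and children, which is an assumption on my side about how ejabberd defines it; in real code you would include ejabberd's header instead of redefining the record. The module and function names are hypothetical:

```erlang
-module(xmlel_record_demo).
-export([set_children/2]).

%% ASSUMPTION: local stand-in for ejabberd's xmlel record, for
%% illustration only. Include the real .hrl file in production code.
-record(xmlel, {name = <<>>, attrs = [], children = []}).

%% Return El with its child list replaced. The field is addressed by
%% name, so the code does not depend on the position inside the tuple.
set_children(#xmlel{} = El, NewChildren) ->
    El#xmlel{children = NewChildren}.
```

Because a record is a tagged tuple underneath, the function accepts the plain {xmlel, Name, Attrs, Children} tuples seen in the log.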

Now three operations are left: replacing the <<"body">> tuple in the main list L, then replacing L in the <<"message">> tuple, and finally replacing that tuple in the main structure:

35> NewL = lists:keyreplace(<<"body">>, 2, L, NewT).
[{xmlel,<<"active">>,
        [{<<"xmlns">>,<<"http://jabber.org/protocol/chatstates">>}],
        []},
 {xmlel,<<"body">>,[],
        [{xmlcdata,<<"MESSAGE BODY GOES *****">>}]}]

41> NewXmlel = setelement(4, Xmlel, NewL).
{xmlel,<<"message">>,
       [{<<"type">>,<<"chat">>},
        {<<"id">>,<<"purpleaed2ec77">>},
        {<<"to">>,
         <<"test24@localhost/Administrators-MacBook-Pro-6">>}],
       [{xmlel,<<"active">>,
               [{<<"xmlns">>,<<"http://jabber.org/protocol/chatstates">>}],
               []},
        {xmlel,<<"body">>,[],
               [{xmlcdata,<<"MESSAGE BODY GOES *****">>}]}]}

42> NewM = setelement(3, M, NewXmlel).
{{jid,<<"test25">>,<<"localhost">>,
      <<"Administrators-MacBook-Pro-6">>,<<"test25">>,
      <<"localhost">>,<<"Administrators-MacBook-Pro-6">>},
 {jid,<<"test24">>,<<"localhost">>,
      <<"Administrators-MacBook-Pro-6">>,<<"test24">>,
      <<"localhost">>,<<"Administrators-MacBook-Pro-6">>},
 {xmlel,<<"message">>,
        [{<<"type">>,<<"chat">>},
         {<<"id">>,<<"purpleaed2ec77">>},
         {<<"to">>,
          <<"test24@localhost/Administrators-MacBook-Pro-6">>}],
        [{xmlel,<<"active">>,
                [{<<"xmlns">>,<<"http://jabber.org/protocol/chatstates">>}],
                []},
         {xmlel,<<"body">>,[],
                [{xmlcdata,<<"MESSAGE BODY GOES *****">>}]}]}}

Now NewM contains the same message as M but with the body replaced as required.

This was fairly long because I coded each step separately for clarity. In reality, when using that in your code, you will be able to make these steps shorter, especially if you can include and use the appropriate record definitions.
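
As a rough idea of what the shortened version could look like, here is a sketch that folds all of the steps into one function and runs the body through a list of blocked words. It rests on several assumptions: the packet is the {From, To, Xmlel} tuple shown in the log, the record has the fields name, attrs and children (declared locally here only so that the example compiles on its own), and the blocked words are plain literals. The names body_filter and censor/2 are mine, not part of ejabberd:

```erlang
-module(body_filter).
-export([censor/2]).

%% ASSUMPTION: local stand-in for ejabberd's xmlel record.
-record(xmlel, {name = <<>>, attrs = [], children = []}).

%% Mask every word of Blocked in the body of a message packet.
%% Anything that is not a <<"message">> stanza is returned untouched.
censor({From, To, #xmlel{name = <<"message">>, children = Children} = Msg},
       Blocked) ->
    NewChildren = [censor_child(C, Blocked) || C <- Children],
    {From, To, Msg#xmlel{children = NewChildren}};
censor(Packet, _Blocked) ->
    Packet.

%% Rewrite the character data of a <body> element; keep other children.
censor_child(#xmlel{name = <<"body">>, children = BodyChildren} = Body,
             Blocked) ->
    Body#xmlel{children = [censor_cdata(C, Blocked) || C <- BodyChildren]};
censor_child(Other, _Blocked) ->
    Other.

%% Apply every blocked word, one after another, to a cdata chunk.
censor_cdata({xmlcdata, Text}, Blocked) ->
    {xmlcdata, lists:foldl(fun mask/2, Text, Blocked)};
censor_cdata(Other, _Blocked) ->
    Other.

%% Replace a literal Word, ignoring case, with asterisks of equal length.
mask(Word, Text) ->
    Stars = binary:copy(<<"*">>, byte_size(Word)),
    re:replace(Text, Word, Stars, [global, caseless, {return, binary}]).
```

Calling body_filter:censor(M, [<<"here">>]) on the M from the shell session should return the same structure with the last word of the body masked. Note that this matches substrings too, so "there" would also be affected; wrapping each word in \b anchors is one way to restrict it to whole words.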