© The Z/Yen Group of Companies 2008
Best Execution Compliance Automation
Towards An Equities Compliance Workstation
Michael Mainelli and Mark Yeandle
for submission to the Journal of Risk Finance
Feasibility of Best Execution Compliance Automation Using Dynamic Anomaly and Pattern Response Systems
Abstract
Purpose – Forthcoming requirements in MiFID and RegNMS mean that buy-side and sell-side firms need to find ways of showing regulators that they are sifting through their trading volumes in a justifiable, methodical manner looking for anomalous trades and investigating them, in order to prove ‘best execution’. The objective was to see if a SVM/DAPR approach could help identify equity trade anomalies for compliance investigation.
Design/methodology/approach – A major stock exchange, a computer systems supplier, four brokers and a statistical firm undertook a co-operative research project to determine whether automated statistical processing of trade and order information could provide a tighter focus on the most likely trades for best execution compliance investigation.
Findings – The support vector machine approach worked on UK equities and has significant potential for other markets such as foreign exchange, fixed income and commodities.
Research limitations/implications – The research has implications for risk professionals as a generic approach to trading anomaly detection. The prototype compliance workstation can be trialled.
Originality/value – Automated anomaly detection could transform the role of compliance and risk in financial institutions.
Keywords - best execution, MiFID, RegNMS, compliance, support vector machine (SVM), equities trading, Dynamic Anomaly and Pattern Response (DAPR), predictive systems, market surveillance
Paper type - Research Paper
Research Project Objective
In 2005 a joint research project was agreed between the London Stock Exchange,
Sun Microsystems and Z/Yen Limited. The primary objective was to investigate the
feasibility of using support vector machine & dynamic anomaly and pattern
response (SVM/DAPR) techniques to automate the detection of best execution anomalies for management investigation
(see
Best Execution Compliance, New Techniques
for Managing Compliance Risk). To meet this objective, the research would also compare the
results of the SVM/DAPR technique with current techniques such as VWAP and comparisons with the
current best price. The research would also evaluate how useful SVM/DAPR techniques are
in providing a tighter set of trades for further investigation.
Some of the questions that the research attempted to answer are:
- How large is the universe of anomalous trades?
- Using SVM/DAPR techniques, how many trades actually warrant investigation and what proportion of the universe do these represent?
- What do firms do now to monitor execution quality?
- Could a SVM/DAPR approach provide a benchmark for measuring best execution better than VWAP or comparisons with best price?
The “null hypothesis” was effectively that automated sifting and selection will
be unable to
identify potentially anomalous trades any better, if at all, than existing
processes.
Approach
Research Team
Research team roles were allocated to:
- London Stock Exchange, who provided Execution Quality Service and tick data, as well as help in recruiting members as participants;
- Sun Microsystems, who provided direct financial backing and further services and equipment, particularly Sun Solaris equipment for large volume SVM/DAPR;
- Z/Yen, who provided PropheZy and VizZy, as well as project management and researchers to build the SVM/DAPR systems and conduct the data trials. A Z/Yen director, Michael Mainelli, was the principal point of contact for the research project and Mark Yeandle was the Project Manager.
Overall Approach
Following an informal trial in 2004, this research
project was conducted from June to December 2005 after the team recruited four brokers
prepared to examine off-book equity trades. The research began by surveying participants’
existing approaches to best execution compliance. The team then built SVM/DAPR systems
for each broker taking three months of trading data for training in order to test a
fourth month of data for anomalous trades. The research team sent back to each broker a set of
trades that the SVM/DAPR system identified as worth being investigated. The key criterion
for investigation was that the trade price seemed anomalous. The research team then
worked on contrasting the SVM/DAPR output with alternative methods of identifying
outliers. These alternatives included:
- VWAP;
- best price at the time of the trade;
- cluster analysis.
Methodology
The research followed an overall methodology as follows, though many tasks were conducted in parallel:
- review previous relevant research;
- recruit four participants;
- outline the data requirements;
- assess current best execution monitoring practices;
- collect data from the brokers and the LSE;
- analyse, validate and pre-process the data;
- build trial SVM/DAPR systems for various parameters to ensure historic data has predictive capacity;
- select approach to anomaly identification and build full SVM/DAPR systems;
- develop visualisations of a possible compliance workstation;
- feed back data and visualisations to the participants;
- collect feedback from the participants;
- prepare final research document for publication.
Participants
The four brokers who participated in the research provided:
- staff time to explain current best execution compliance approaches, factors, coefficients and benchmarks;
- a dataset of four months’ off-book and on-book equities trading data (September through December 2004);
- staff time to investigate a sample of trades that the SVM/DAPR approach found anomalous.
The research team agreed to keep the identities of the participants confidential. Their profiles are:
- Broker A is an independent UK-based firm offering stockbroking services, asset management and financial planning advice to members of the public;
- Broker B is a USA-based global investment bank serving corporations, institutions, governments and high-net-worth investors worldwide;
- Broker C is an Asian-based financial services group engaged in four main business areas - Global Markets, Investment Banking, Merchant Banking and Asset Management;
- Broker D is a European-based global investment bank and asset management business dealing with individuals, corporations, the public sector and other not-for-profit organisations.
In return for participating in this research the brokers received:
- research results including working documents not subject to confidentiality;
- a presentation of the approach and results for their business;
- an example of a SVM/DAPR system on their data and sample visualisation of their data in a prototype compliance workstation.
PropheZy, A Support Vector Machine/Dynamic Anomaly and Pattern Response (SVM/DAPR)
Implementation
This study used classification and prediction tools based on SVM mathematics to
undertake predictive analysis of the data. SVMs are algorithms that develop classification
and regression rules from data. SVMs result from classification algorithms first
proposed by Vladimir Vapnik in the 1960s, arising from his work in Statistical Learning
Theory [Vapnik, 1995, 1998]. SVMs are based on some wonderfully direct mathematical ideas about
data classification and provide a clear direction for machine learning
implementations. While some of the ideas behind SVMs date back to the 1960s, computer implementations
of SVMs did not arise until the 1990s with the introduction of a computer-based approach at
COLT-92 [Boser, B., Guyon, I. and Vapnik, V., 1992].
SVMs are now used as core components in many applications where computers
classify instances of data (e.g. to which defined set does this group of variables belong), perform
regression estimation and identify anomalies (novelty detection). SVMs have been
successfully applied in time series analysis, reconstructing chaotic systems and principal component
analysis.
SVM applications are diverse, including credit scoring (good or bad credit),
disease classification, handwriting recognition, image classification, bioinformatics
and database marketing, to name a few.
SVMs are said to be independent of the dimensionality of feature space as the
main idea behind their classification technique is to separate the classes in many data
dimensions with surfaces (hyperplanes) that maximise the margins between them, applying the
structural risk minimisation principle. The data points needed to describe the
classification algorithmically are primarily those closest to the hyperplane boundaries, the
“support vectors”. Thus, only a small number of points are required in many complex
feature spaces. SVMs can work well with small datasets, though the structure of the training and
test data is an important determinant of the effectiveness of the SVM in any specific
application.
SVMs compete forcefully with neural networks as well as other machine learning
and data mining algorithms as tools for solving pattern recognition problems. Where SVMs
do not perform well, it is arguable that the algorithmic rules behind the support vector
algorithm do not so much reflect incapabilities of the learning machine (as in the case of
an overfitted artificial neural network) as irregularities of the data. In
short, current opinion holds that if the data in the domain is predictive, SVMs are
highly likely to be capable of producing a predictive algorithm. Importantly, SVMs are robust
tools (understandable implementations, simple algorithmic validation, better
classification rates, overfitting avoidance, fewer false positives and faster performance) in
practical applications. “The SVM does not fall into the class of ‘just another algorithm’
as it is based on firm statistical and mathematical foundations concerning generalisation
and optimisation theory” [Burbridge & Buxton, 2001]. However, comparative tests with
other techniques indicate that while SVMs are highly likely to be capable of
predicting, in some applications SVMs may not be the best approach for any specific dataset. “In
short, our results confirm the potential of SVMs to yield good results, especially for
classification, but their overall superiority cannot be attested” [Meyer, Leisch, Hornik, 2002].
PropheZy and VizZy are two software packages developed by Z/Yen for
classification and visualisation of data [www.zyen.com/Products/Prophezy/prophezy.htm,
www.zyen.com/Products/Vizzy/vizzy.htm]. Together they constitute a SVM/DAPR
environment. PropheZy implements a SVM on a server (though it can be used in a local
client/server mode). Naturally, as in any field of computing, there are a number of variant SVM
implementations, of which PropheZy implements three types - C-SVC, nu-SVC and binary. Further, of
statistical importance in replicating results is the “kernel function”. PropheZy
implements four types of kernel function - linear, radial basis function, sigmoid and
polynomial. The SVM types and kernel function types are described in detail in Vapnik [1995,
1998]. For this study, the SVM implementation used was C-SVC and the kernel function was
linear.
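For reference, the C-SVC problem with a linear kernel is usually written in the SVM literature in the form sketched below. The symbols are the conventional ones rather than anything taken from PropheZy: training pairs (x_i, y_i) with labels y_i in {-1, +1}, weight vector w, offset b, slack variables xi_i and penalty parameter C. How PropheZy combines such two-class machines for multi-class problems is not described here, so this should be read as background only.

```latex
\min_{w,\,b,\,\xi}\;\; \tfrac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i\,\bigl(w^{\top} x_i + b\bigr) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, n
```

A new observation x is then classified by the sign of w^T x + b, and the training points for which the margin constraint is active are the support vectors.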
So far, the PropheZy server SVM has been implemented on a Linux server, a Sun
Solaris server and a Windows NT server. PropheZy implements the user-interface to the server
SVM via XML (extensible mark-up language). The XML user-interface can be via an HTML page,
directly through a bulk file loader command line or by use of an Excel add-in that
performs XML data submission from spreadsheets to the server SVM and displays results back in
Excel spreadsheets. For this study, the PropheZy implementation was the Sun Solaris
server using a bulk file loader command line. VizZy provided visualisation of clustering,
histogram, Voronoi and data cube diagrams from tabular data output by PropheZy.
Z/Yen has benchmarked PropheZy against standardised machine learning tests, e.g.
appropriate StatLog test sets [Michie, Spiegelhalter and Taylor, 1994], in order
to validate the SVM with good to excellent results. Z/Yen has trialled PropheZy extensively
in financial services applications and sees great promise for SVMs and other
Dynamic Anomaly and Pattern Response (DAPR) techniques in areas such as compliance [Mainelli,
2005], trade anomaly detection and scorecards [Mainelli, 2004] as well as regression and
value prediction [Mainelli, Harris and Helmore-Simpson, 2003].
Data
For the purposes of this research, a discrete, easily quantifiable and readily
obtainable set of data was required. SETS, the London Stock Exchange’s trading service for
UK blue chip securities, is an electronic order book that can execute hundreds of trades
a second. Securities traded on SETS include all the FTSE 100 securities and the most liquid
FTSE 250 securities, along with some others. The prices quoted on SETS are the best
available at the time and, as such, trades fulfilled ‘on-book’ might be viewed as representing best
execution. However, SETS-traded equities are also regularly traded outside the electronic
order book (‘off-book’ trades).
The research team decided that they should look at off-book trades of SETS
securities as this was the area requiring proof of best execution. A large number of trades
are conducted off-book and it is often these trades that attract attention from compliance
departments. There are a substantial number of these trades that occur outside the current
bid/offer spread. This is usually for a very good reason – typically the size of the trade
(in relation to the usual size of trades in that security) or specific client
instructions regarding timing of execution.
The total number of trades included within the research was approximately
190,000 with a value of over £54bn. These were from four brokers and covered their off-book
trades from September 2004 to December 2004 inclusive. It should be noted that although
exchange traded securities were selected as an ideal dataset for this research, the information
used in building the model was only that which would be available for non-exchange
traded instruments.
Data Preparation & Validation
Each of the four participants provided large text files containing details of
their off-book trades for the four months from September 2004 to December 2004 inclusive. The
first thing that the research team did was to validate this data against the data provided by
the LSE. The team also studied the movement of the market in general over the period [Diagram
1] in order to establish overall volume and price patterns.
Diagram 1 – Movement of the All Share Index
September to December 2004

From the data supplied by all parties, the team generated files of transactions with the following fields (* indicates that the data was codified into numeric values from text):
- SEDOL code (share code, e.g. 0316893 = Eurotunnel);*
- market segment of share (e.g. pharmaceutical, retail);*
- trade date;
- trade time;
- trade size;
- trade price;
- trade code;
- buy/sell indicator (buy=1, sell=0);*
- participant code buyer (counterparty);*
- participant code seller (counterparty);*
- settlement due date.
The following fields were then added:
- market sector (e.g. FTSE 100, FTSE 250 - downloaded from LSE website);*
- day of the week (1 to 5 - derived from date);
- closing price for the previous 10 days;
- inside (0) or outside (1) the bid/offer spread;
- consideration (a calculated field - price multiplied by the quantity of shares);
- 5 day % price movement (a calculated field);
- bid price (from LSE files – linked using SEDOL Code);
- mid price (from LSE files – linked using SEDOL Code);
- offer price (from LSE files – linked using SEDOL Code);
- VWAP movement (calculated from previous trades executed by each firm within the previous 5 days);
- FTSE index movement (of the relevant index for the share) since the last trade;
- price volatility - the standard deviation of the closing prices of the previous 12 days (12 days based on discussions with market participants and previous research);
- % size of bid/offer spread;
- liquidity (average number of shares traded per day, over the 88 trading days between September and December);
- return versus mid price: (trade price – mid price) / mid price;
- return versus previous closing price: (trade price – previous closing price) / previous closing price;
- % of liquidity of the trade (trade size / liquidity);
- 3 day index movement for the index in which the share is listed;
- number of trades in a day, for each share and each day;
- time since last trade: the number of 10 minute slots since the previous trade of that share by that broker;
- day of the week of the last trade (1 to 5) - when the share was last traded by that broker.
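Several of the calculated fields in the list above follow directly from the formulas given and can be sketched in Python. The function names are hypothetical, and two details are assumptions rather than statements from the research: the spread is expressed as a percentage of the mid price, and VWAP is taken in its usual sense of traded value divided by traded volume.

```python
def consideration(price, size):
    """Price multiplied by the quantity of shares."""
    return price * size

def return_versus(trade_price, reference_price):
    """(trade price - reference) / reference, e.g. versus mid or previous close."""
    return (trade_price - reference_price) / reference_price

def outside_spread(trade_price, bid, offer):
    """1 if the trade price lies outside the bid/offer spread, else 0."""
    return 1 if trade_price < bid or trade_price > offer else 0

def spread_pct(bid, offer, mid):
    """Assumption: spread size expressed as a percentage of the mid price."""
    return (offer - bid) / mid * 100.0

def pct_of_liquidity(trade_size, avg_daily_shares):
    """Trade size relative to the average number of shares traded per day."""
    return trade_size / avg_daily_shares

def vwap(trades):
    """Volume weighted average price of (price, size) pairs."""
    total_size = sum(size for _, size in trades)
    return sum(price * size for price, size in trades) / total_size
```

Each function works on a single trade (or a list of trades for `vwap`), so the same code could be applied row by row to the transaction files.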
Compliance Workstation Prototype
During the course of the project the team built a prototype “Compliance
Workstation”. The Compliance Workstation combined a number of software tools, PropheZy, VizZy,
FractalEdge and Decisionality within an Excel framework. The Compliance Workstation provided a
number of features, specifically the ability to:
- construct predictive tests on any trade characteristic in order to spot anomalies;
- spot anomalies using cluster analysis;
- display the results visually, specifically showing predicted versus actual differences in three dimensions.
Diagram 2 shows a visualisation with the predicted price movement bands in blue plotted against the actual price movement bands in purple. The length of the yellow link shows the difference between the actual and predicted values.
Diagram 2 – A 3-dimensional VizZy presentation of anomalous trades

- provide a ‘drill down’ tool for a compliance officer to home in on specific trades. This tool, Fractal Intelligence, allows the user to drill down through a variety of hierarchies. Diagram 3 shows a set of sell trades that fell outside the bid/offer spread, arranged in 10 circles running clockwise from the top and showing trades with increasing differences between actual and predicted price movement bands.
Diagram 3 - A ‘drill-down’ and data visualisation tool showing anomalous trades

Initial Tests of Predictive Capability
The research team initially conducted a range of tests on the data to assess the
predictive capacity of the data. Many of the heuristics used in the construction of these
tests were informed by previous studies of share price prediction [Cao & Tay, Huang,
Nakamori & Wang], as well as previous Z/Yen client work on share liquidity and other analyses.
The first set of tests used the SVM to predict the counterparty with which each
transaction was conducted. This was conducted on two datasets. (1) In the first dataset
there were 52 different counterparties and the SVM was able to correctly predict the
counterparty in over 43% of the trades tested. (2) In the second dataset there were 11 different
counterparties and over 61% were correctly predicted.
The second set of tests used the SVM to predict the share that was traded in
each transaction. This was again conducted on two datasets. (3) In the first dataset
there were 150 different shares and the SVM was able to correctly predict the share in
nearly 8% of the trades tested. (4) In the second dataset there were 82 different shares and over
18% were correctly predicted.
One question was whether the predictions made by the SVM were different to those
that could have been made by a simpler method, a “naïve classifier”. The team compared the
SVM output with a random classification based on the observed probabilities of each class.
This was done by conducting a statistical significance test based on Liddell’s exact test
for paired proportions [Liddell, 1983], which examines the difference between the two
alternatives by looking at where each method was correct when the other was incorrect. The tests
show that for Brokers A, B and D, there is significant evidence that the predictions made
by the SVM are much better than those that would be made by a naïve classifier. The
evidence for Broker C (a much smaller dataset) was weaker, but nevertheless supported the
performance of the SVM.
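The comparison with a naïve classifier can be illustrated with a short Python sketch. The first function gives the accuracy expected from guessing each class at random with its observed probability. The second is an exact binomial test on the discordant pairs (cases where exactly one of the two methods was correct), which is the standard exact form of a paired-proportions test. Treating it as equivalent in every detail to the Liddell procedure used by the team is an assumption, and the function names are hypothetical.

```python
from math import comb

def naive_expected_accuracy(class_counts):
    """Expected accuracy when guessing each class with its observed probability."""
    total = sum(class_counts)
    return sum((count / total) ** 2 for count in class_counts)

def exact_paired_p_value(svm_only_correct, naive_only_correct):
    """Two-sided exact binomial p-value on the discordant pairs.

    Under the null hypothesis each discordant pair is equally likely to
    favour either method, so the counts follow Binomial(n, 0.5).
    """
    n = svm_only_correct + naive_only_correct
    if n == 0:
        return 1.0
    k = max(svm_only_correct, naive_only_correct)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)
```

For example, `exact_paired_p_value(10, 0)` is about 0.002, so ten cases in which only the SVM was right and none in which only the naïve classifier was right would count as significant evidence at conventional levels.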
A third set of tests was run to see whether the SVM could predict how much
the VWAP of a share would move. For this set of tests, VWAP movement was banded into twenty
equal bands across the range of VWAP, between -0.3% and +0.3% movement, with each band
having a size of 0.03%. Tests conducted by building a SVM with three months’ data were
significantly more accurate than those built with only one month’s data. Increasing the dataset to
four months gave no significant increase in accuracy. (5-8) The team therefore ran all
datasets using a three month history to train the SVM. The average accuracy over all four brokers
was over 25%. Over 52% of predictions were within 4 bands.
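The VWAP banding described above can be sketched in a few lines of Python. This is an illustration only: the function name, the zero-based band numbering and the treatment of movements beyond the ±0.3% range (clamped into the end bands) are assumptions, not details taken from the research.

```python
import math

LOWER_PCT = -0.3       # lower edge of the banded range, in percent
BAND_WIDTH_PCT = 0.03  # twenty bands of 0.03% cover -0.3% to +0.3%
N_BANDS = 20

def vwap_movement_band(movement_pct):
    """Return a band index 0..19 for a VWAP movement expressed in percent."""
    raw = math.floor((movement_pct - LOWER_PCT) / BAND_WIDTH_PCT)
    # Assumption: movements outside the range are clamped into the end bands.
    return max(0, min(N_BANDS - 1, raw))
```

With this numbering, a prediction counts as "within 4 bands" when the absolute difference between the predicted and actual index is at most 4.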
The conclusion was that the data did have predictive capacity when used by a SVM.
Results & Analysis
Price Movement Band Tests
After conducting a number of trial tests and speaking to several industry
experts, the research team decided that the most relevant test was to predict price movement.
In order to keep the SVM construction and the analysis simpler, the team calculated
twenty price movement bands. First, the team calculated the percentage by which a share price
had moved between the trade in question and the previous trade. Then the team calculated
the natural logarithm of the absolute price movement. Using the mean and standard deviation
of all price movement logarithms in the training set, the team created a normalised
price movement variable by subtracting the mean value from each observation and dividing by the
standard deviation. Based on these normalised variables, the trades were split into
twenty equal price movement bands, with the probability of any trade being in any particular
band at 5%. By using this method of banding, the trades with the largest share price
movements fall into the highest bands and the trades with the smallest movements are contained in
the lowest bands.
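A minimal Python sketch of this banding follows. It assumes that the twenty equal-probability bands are cut at quantiles of the standard normal distribution, which is one way of giving each band a 5% probability; the research may instead have used empirical quantiles. The small floor applied before taking logarithms (to cope with a zero price movement) and the function names are also assumptions.

```python
import math
from statistics import NormalDist, mean, stdev

N_BANDS = 20
FLOOR = 1e-6  # assumption: avoids log(0) when the price did not move

def log_abs_move(pct_move):
    """Natural logarithm of the absolute percentage price movement."""
    return math.log(max(abs(pct_move), FLOOR))

def fit_normaliser(training_pct_moves):
    """Mean and standard deviation of the log movements in the training set."""
    logs = [log_abs_move(m) for m in training_pct_moves]
    return mean(logs), stdev(logs)

def price_movement_band(pct_move, mu, sigma):
    """Band 0 (smallest movements) to 19 (largest), each with 5% probability."""
    z = (log_abs_move(pct_move) - mu) / sigma
    probability = NormalDist().cdf(z)
    return min(int(probability * N_BANDS), N_BANDS - 1)
```

The normaliser is fitted on the training set only and then reused for the trades being predicted, in line with the description above.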
Tests were then conducted using three months of data to train the model and
perform price movement band predictions for trades on the following day (the first test used 1
September to 30 November as a training set and predicted 1 December; the second test used 2
September to 1 December as a training set and predicted 2 December, etc.). This ‘rolling’
approach covered the 21 trading days of December 2004.
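The rolling scheme can be sketched in Python as a generator of training and prediction windows. Two assumptions are made: each training window runs up to the day before the predicted day and reaches back a fixed 91 days (which gives 1 September for a prediction on 1 December), and the non-trading days in December 2004 are the weekends plus the bank holidays on 27 and 28 December. The function names are hypothetical.

```python
from datetime import date, timedelta

# Assumption: UK bank holidays falling on weekdays in December 2004.
HOLIDAYS = {date(2004, 12, 27), date(2004, 12, 28)}

def trading_days(start, end, holidays=HOLIDAYS):
    """Yield weekdays between start and end (inclusive) that are not holidays."""
    day = start
    while day <= end:
        if day.weekday() < 5 and day not in holidays:
            yield day
        day += timedelta(days=1)

def rolling_windows(start, end, lookback_days=91):
    """Yield (train_start, train_end, predict_day) for each trading day."""
    for predict_day in trading_days(start, end):
        train_start = predict_day - timedelta(days=lookback_days)
        train_end = predict_day - timedelta(days=1)
        yield train_start, train_end, predict_day
```

Calling `rolling_windows(date(2004, 12, 1), date(2004, 12, 31))` yields one window per trading day, each of which would be used to build a fresh model before predicting that day's trades.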
The team looked at the predictions of price movement band and compared these to
the actual price movement band. The overall results were that over 9% of records were
predicted correctly, 47% were within 4 bands, 74% were within 9 bands and 93% were within
14 bands. After examination and discussion with industry experts, the trades where the
prediction differed from the actual result by 15 bands or more were defined as ‘outliers’.
On average 7% of trades were statistically anomalous or ‘outliers’. If all statistically
anomalous trades that were transacted at ‘best’ price (or better) are excluded, only 1% of
all trades remain for possible investigation. The daily results for each of the four
brokers are shown in both tabular [Tables 1, 2, 3, 4] and graphical form [Charts 1, 2, 3, 4]. The
accuracy of the SVM at correctly predicting the exact price movement band out of 20
varies between 8% and 11.75% for the different brokers. The accuracy of the SVM at predicting
within 4 price movement bands varies between 45% and 56% for the different brokers.
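The summary statistics quoted above (exact matches, predictions within 4, 9 and 14 bands, and outliers at a difference of 15 bands or more) can be computed as in the following Python sketch. The function name and the dictionary keys are hypothetical; the band values in the usage note are made up for illustration and are not project data.

```python
def summarise_band_differences(actual_bands, predicted_bands, outlier_threshold=15):
    """Proportions of trades by absolute difference between actual and predicted band."""
    diffs = [abs(a - p) for a, p in zip(actual_bands, predicted_bands)]
    n = len(diffs)
    return {
        "exact": sum(d == 0 for d in diffs) / n,
        "within_4": sum(d <= 4 for d in diffs) / n,
        "within_9": sum(d <= 9 for d in diffs) / n,
        "within_14": sum(d <= 14 for d in diffs) / n,
        "outliers": sum(d >= outlier_threshold for d in diffs) / n,
    }
```

For instance, actual bands `[0, 5, 10, 19, 3]` against predicted bands `[0, 7, 19, 2, 3]` give two exact matches, three predictions within 4 bands and one outlier out of five trades.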
Table 1 – Analysis of the 21 price movement
band tests for Broker A (21 trading days in December 2004)

Chart 1 – Differences for Broker A between
actual and predicted price movement bands – out of 20 bands – yellow line shows
the % of ‘outliers’, trades where the difference is 15 or more

Table 2 – Analysis of the 21 price movement band tests for Broker B (21
trading days in December 2004)

Chart 2 – Differences for Broker B between actual and predicted price
movement bands – out of 20 bands – yellow line shows the % of ‘outliers’, trades
where the difference is 15 or more

Table 3 – Analysis of the 21 price movement band tests for Broker C (21
trading days in December 2004)

Chart 3 – Differences for Broker C between actual and predicted price
movement bands – out of 20 bands – yellow line shows the % of ‘outliers’, trades
where the difference is 15 or more

Table 4 – Analysis of the 21 price movement band tests for Broker D (21
trading days in December 2004)

Chart 4 – Differences for Broker D between actual and predicted price
movement bands – out of 20 bands – yellow line shows the % of ‘outliers’, trades
where the difference is 15 or more

The team was keen to establish that these results were better than those of a
“naïve classifier” and conducted Liddell’s test for paired proportions. Again, the
predictions made for Brokers A, B and D are significantly better than those that would be
made by a naïve classifier. The evidence for Broker C (a much smaller dataset) was weaker,
but still supported the performance of the SVM. The team also put in a handful of ‘wacky’
trades, i.e. trades where the variables were copied from real trades, but the actual
price movements were artificially exaggerated or muted. The SVM correctly identified four out of
the five as anomalous.
Filtering Outliers
An outlier, or anomalous trade, was defined as a trade where the predicted price
movement band differs from the actual price movement band by 15 bands or more out of 20: either
a very high price movement was predicted but a low price movement was observed, or a very
low price movement was predicted but a high price movement was observed.
Table 5 – An analysis of outliers by broker
| Broker | Number of December trades | Number of outliers (First Filter) | % outliers (First Filter) | Number of trades outside bid/offer (Second Filter) | % outside bid/offer (Second Filter) | Outliers outside bid/offer (Combined Filters) | % outliers outside bid/offer (Combined Filters) |
|---|---|---|---|---|---|---|---|
| A | 2,232 | 109 | 4.88% | 56 | 2.51% | 1 | 0.04% |
| B | 6,530 | 312 | 4.78% | 2,879 | 44.09% | 124 | 1.90% |
| C | 294 | 6 | 2.04% | 11 | 3.74% | 1 | 0.34% |
| D | 28,623 | 2,220 | 7.76% | 2,621 | 9.16% | 277 | 0.97% |
| Overall | 37,679 | 2,647 | 7.03% | 5,567 | 14.77% | 403 | 1.07% |
Table 5 indicates that when using the SVM as a filter [First Filter], on
average 7% of off-book trades are defined as outliers. 7% is still too many outliers for a
detailed manual investigation. A second filter is therefore needed. A trade is unlikely
to fail best execution if it was conducted at the best prevailing price (or better),
though there are some arguments that very large trades might be capable of exceptional
improvement under certain conditions. When trades inside the bid/offer spread are excluded [Second
Filter] and this is combined with the first filter, the proportion of trades that are both
outliers and outside the bid/offer spread is approximately 1%.
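The two filters can be expressed as a short Python sketch. The `Trade` record and its field names are hypothetical; the threshold of 15 bands follows the outlier definition used in the tests, and the second filter simply keeps trades flagged as outside the bid/offer spread.

```python
from dataclasses import dataclass

@dataclass
class Trade:
    trade_id: str
    actual_band: int       # actual price movement band, 0..19
    predicted_band: int    # band predicted by the SVM, 0..19
    outside_spread: bool   # True if executed outside the bid/offer spread

def first_filter(trades, threshold=15):
    """Keep statistical outliers: prediction and outcome differ by 15 bands or more."""
    return [t for t in trades if abs(t.actual_band - t.predicted_band) >= threshold]

def second_filter(trades):
    """Keep trades executed outside the bid/offer spread."""
    return [t for t in trades if t.outside_spread]

def for_investigation(trades):
    """Apply both filters to obtain the short list for manual review."""
    return second_filter(first_filter(trades))
```

Applied to a day's trades, `for_investigation` returns the small subset that is both statistically anomalous and not obviously at best price, which is the set a compliance officer would review first.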
In order to further verify the results shown in Tables 1 to 4, the team analysed
the quality of predictions within the ‘extreme’ bands [Table 6]. These are the three bands
with the lowest price movement (0, 1 and 2) and the three with the highest price
movements (17, 18 and 19).
Table 6 – An analysis of outliers from
the extreme bands – 3 lowest and 3 highest bands
| | Actual Band | Correct | Within 0-4 | Within 5-9 | Within 10-14 | Outliers | Total |
|---|---|---|---|---|---|---|---|
| Low Price Movements | 0 | 1,034 | 1,143 | 84 | 84 | 712 | 2,023 |
| | 1 | 59 | 867 | 74 | 202 | 415 | 1,558 |
| | 2 | 18 | 1,041 | 72 | 612 | 83 | 1,808 |
| High Price Movements | 17 | 448 | 1,510 | 28 | 56 | 247 | 1,841 |
| | 18 | 226 | 1,343 | 30 | 26 | 181 | 1,580 |
| | 19 | 588 | 1,473 | 34 | 8 | 88 | 1,603 |
| | Overall | 2,373 | 7,377 | 322 | 988 | 1,726 | 10,413 |
| | % | 22.8% | 70.8% | 3.1% | 9.5% | 16.6% | 100% |
It can be seen from this table that 22.8% of predictions within the six
extreme bands were correct and 70.8% were within 4 bands. Only 16.6% of predictions from these
bands were defined as outliers.
There was no outstanding single feature about the trades that were identified as
outliers. The team wanted to assess how the outliers identified by the SVM method differed
from outliers identified by the two most common methods in use at present, comparison
with VWAP and comparison with current market price. An inspection of the list of outliers
showed very little cross-over with either of these methods.
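One simple way to quantify such cross-over is to take, for a benchmark measure, the same number of trades as there are SVM outliers, ranked by the size of that measure, and count how many trades appear in both lists. The Python sketch below illustrates the idea; the function names and the choice of ranking by absolute value are assumptions, and this is not necessarily how the team carried out its inspection.

```python
def top_n_by_measure(measure_by_trade, n):
    """Trade ids of the n trades with the largest absolute value of a measure."""
    ranked = sorted(measure_by_trade, key=lambda tid: abs(measure_by_trade[tid]), reverse=True)
    return set(ranked[:n])

def crossover(svm_outlier_ids, measure_by_trade):
    """Fraction of SVM outliers that are also among the benchmark's top trades."""
    svm = set(svm_outlier_ids)
    if not svm:
        return 0.0
    benchmark = top_n_by_measure(measure_by_trade, len(svm))
    return len(svm & benchmark) / len(svm)
```

Here `measure_by_trade` would map each trade id to its VWAP movement or to its distance from best market price.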
The average VWAP movement of the 2,647 SVM outliers is 0.28%. The average VWAP
movement of the 2,647 trades with the highest VWAP movement is 4.68% - nearly 17 times
higher. When the SVM outliers in a random nine day sample of trades are plotted against 10 bands
of increasing VWAP movement [Table 7], there is no significant correlation:
Table 7 – An analysis of SVM outliers by band
of increasing VWAP
| Bands of increasing VWAP | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Percentage of SVM outliers | 13% | 6% | 8% | 6% | 7% | 10% | 13% | 10% | 23% | 4% |
The average distance from best market price at the time of trade (from the
bid price for sells and the offer price for buys) of the 2,647 SVM outliers is 5.43%. The
average distance from best market price of the 2,647 trades with the highest distance is
18.75% - nearly 3.5 times higher. If the SVM outliers are plotted against distance from
best market price, there is again no significant correlation [Table 8].
Table 8 – An analysis of SVM outliers by band
of increasing distance from best price
| Bands of increasing distance from best price | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Percentage of SVM outliers | 8% | 9% | 11% | 8% | 17% | 12% | 11% | 8% | 9% | 6% |
The team also briefly compared the SVM outliers with a set of outliers
produced by single-link hierarchical clustering. A random sample of trades was clustered and
the outliers identified. The result of this clustering is shown in Diagram 4. The
most similar trades are linked by a horizontal line towards the bottom of the
vertical axis (and left on the horizontal axis). Trades linked higher up (and further right) are
less similar to one another. None of the outliers corresponded with the SVM outliers from the
same dataset. However, the team intends to conduct further research in this area.
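For readers unfamiliar with single-link clustering, the following Python sketch shows the idea on a handful of points. It is a naive implementation for illustration only: the Euclidean distance, the two-dimensional points in the usage note and the rule of treating the trades that join the tree at the greatest height as outliers are all assumptions, since the features and distance measure used by the team are not stated.

```python
import math

def single_link_merges(points):
    """Naive single-link agglomeration; returns a list of (height, left, right) merges."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single link: cluster distance is the closest pair of members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((d, clusters[a], clusters[b]))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

def join_heights(points):
    """Height at which each point first joins another point or cluster."""
    heights = {}
    for d, left, right in single_link_merges(points):
        for side in (left, right):
            if len(side) == 1 and side[0] not in heights:
                heights[side[0]] = d
    return heights
```

With the points `[(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]`, the first four join at a height of 1 while the last joins only at about 12.7, so it would sit at the far right of a dendrogram like Diagram 4.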
Diagram 4 – Clustering of trades with the most
similar trades linked at the bottom and the ‘outliers’ at the far right of the diagram.

One question that was raised during the research was how well the SVM
performed for trades in a single share. Several shares with a reasonable number of trades were
examined. The overall level of outlier prediction and the distribution of differences between
actual and predicted price movement band are very similar to those for datasets over all shares [Chart
6].
Chart 6 – The difference between actual and
predicted price movement band for a single share plotted against
current share price

Chart 6 suggests that during a time of rapid change in market prices the SVM
does note turning points between trend and uncertainty, and between uncertainty
and trend.
The team also examined how well the SVM could predict the magnitude of a single
share’s price movement [Chart 7]. By plotting the actual share price movement since the
last trade against the SVM predicted price movement band it is evident that the SVM does
achieve reasonable predictions of price movement and ultimately responds well to changes
in the market. The three outliers for this frequently traded share are plotted in
green. It is interesting to note that in all three cases the SVM presages major changes, but
for this broker they hadn’t occurred at that point. The team would recommend further
research in this area.
Chart 7 – Movement of share price (in pence)
and predicted price movement band for a single share

Another aspect of the results which interested the team was the rate of decay
of accuracy over time as it would determine how often a predictive model would need to be
rebuilt for any installation of an automated compliance system. The majority of the tests
conducted were making predictions for the next full day. It is noticeable that pred |