The Insider's Guide To Planning C166 Family Designs - Part I
poor quality of reproduction. However a high quality PDF version is available here.
The Insider's Guide To Planning C166 Family Designs - Part II
The Insider's Guide To Planning C166 Family Designs - Part III
The Insider's Guide To Planning C166 Family Designs - Part IV
The Insider's Guide To Planning C166 Family Designs - Part V
The Insider's Guide To Planning C166 Family Designs - Part VI
Issue B
Second Edition
166 Designer's Guide
This guide contains basic information that is useful when doing your first 166 family design. There are many simple facts which, if known at the outset, can save a lot of time and money. Overall, it is intended as a complement to the user manuals, putting things into a practical context. Some of the material can be found in the 166 family databooks, but most of it is simply the result of our practical experience and so is only to be found here. Topics covered are those that are not obvious or are often missed. Where the user manuals provide a satisfactory explanation, you are referred to them rather than the information being duplicated here. This is by no means a complete reference work, and you are directed to the excellent work by one of the architecture's original designers, Karl-Heinz Mattheis, available in the German language.
Note: While every effort has been made to ensure the accuracy of the information contained within this guide, Hitex cannot be held responsible for the consequences of any errors contained therein. Any subjective or anecdotal information presented is not necessarily the official view of either Hitex Development Tools Ltd. or Siemens Plc.
Ulrich Beier
Hitex produces the largest range of 166 family emulation and simulation tools available from any manufacturer. By using both standard part and bondout-based technology, Hitex can uniquely provide the optimal emulation method for all 166 variants, whatever the application. Besides supplying the development tools, Hitex is also pleased to help and advise new and prospective 166 users in all aspects of hardware and software design, as this guide demonstrates - we are at your service!
RISC Architectures For Embedded Applications
Introduction
Behind The 166's Near-RISC Core
Conventional CISC Bottle-necks
The RISC Architecture For Embedded Control
Basic Definitions:
Bus Interface
RISC Interrupt Response
Registers And Multi-Tasking
Coping With RISC Instruction Set (Apparent) Omissions
RISC And Real World Peripherals
1. Getting Started With The 166
1.1 Basic Considerations
1.1.1 Family Overview
1.1.2 Fundamental Design Factors
1.2.1 Setting The CPU Hardware Configuration Options (166)
1.2.2 Setting The CPU Hardware Configuration Options (167)
1.3 Calculating The Pull-Down Resistor Values
Pull Down Resistor Calculation
1.4 Pull-Up Resistor Calculations
Pull Up Resistor Calculation
1.4 Setting The Configuration Without Pulldown Resistors
1.5 Port 0 Configuration Functions
1.6 Reset Control
2. Clock Speeds And Sources
2.1 166 Variants
2.2 165 And Basic 167 Variants
2.3 167SR & CR Variants
2.4 Generating The Clock
2.4.1 Designing Clock Circuits
2.4.2 Oscillator Modules
2.4.3 Designing Crystal Oscillator Circuits
2.4.4 Crystal Oscillator Components Test Procedure
2.4.5 Typical Component Values
2.4.6 Laying Out Clock Circuits
2.4.7 Symptoms Of A Poor Clock
3. Bus Modes
3.1 Flexible Bus Interface
3.2 Setting The Bus Mode
3.2.1 166 Variants
3.2.2 C165/7 Derivatives
3.3 Setting The Overall Addressing Capabilities
3.4 External Memory Access Times (167 Derivatives Only)
3.5 Expanding The Basic 166's Memory Space
4. Interfacing To External Devices
4.1 The Integral Chip Selects (167/5/4/3/1)
4.2 Setting The Number Of Chip Selects
4.3 READ/WRITE Chip Selects
4.4 Replacing Address Lines With Chip Selects
4.5 Generating Extra Chip Selects
4.6 Confirming How The Pull-Down Resistors Are Configured
4.7 Generating Waitstates And Controlling Bus Cycle Timings
5. Interfacing To External Memory Devices
5.1 Using Byte-Wide Memory Devices In 16-bit 167 Systems
5.2 Using The 166 With Byte-Wide Memories
5.3 Using DRAM With The 166 Family
6. Single Chip 166 Family Considerations
6.1 Single Chip Operation
6.2 In-Circuit Reprogrammability Of FLASH EPROM
6.3 Total Security For Proprietary Software
6.4 Keeping An External Bus
6.5 Hitex's In-Circuit FLASH Programming Utility Toolkit
6.5 Accommodating In-Circuit FLASH Programming
6.7 In-Circuit FLASH Programming Via CAN
7. The Basic Memory Map
7.1 On-Chip RAM Regions
7.1.1 166 Variants
7.1.2 167CR & 167SR, C165, Some 161 Variants
7.1.4 C167CS, C161CS
7.2 Planning The Memory Map
7.2.1 External ROM Applications
7.2.2 Internal ROM Applications
7.3 A Typical 167 System Memory Map
7.4 How CPU Throughput Is Related To The Bus Mode
7.5 Implications Of Bus Mode/Trading Port Pins For IO
8. System Programming Issues
8.1 Serial Port Baud Rates
8.1.1 166 Variants
Baudrates for 20 MHz
Baudrates for 16 MHz
8.1.2 Enhanced Baudrate Generator On 167 Variants
8.1.3 The Synchronous Port On The 167
8.2 Interrupt Performance
8.2.1 Conventional Interrupt Servicing Factors
8.2.2 Event-Driven Data Transfers Via The PEC System
PEC Usage Examples
8.2.3 Extending The PEC Address Ranges And Sizes Above 64K
8.2.4 Software Interrupts
8.2.5 Hardware Traps
8.2.6 Interrupt Vectors And Booting Up The 166
8.2.7 Interrupt Structure
8.3 The Bootstrap Loader
8.3.1 On-Chip Bootstrap Booted Systems
8.3.2 Freeware Bootstrap Utilities For 167
8.4 166 Family Stacks
8.5 Power Consumption
8.6 Understanding The DPPs
8.6.1 166 Derivatives
8.6.2 167 Derivatives
9. Allocating Pins/Port Pins In Your Application
9.1 General Points About Parallel IO Ports
9.2 Allocating Port Pins To Your Application
9.3 Port 0
Port 0 Pin Allocations:
9.4 Port 1
9.5 Port 2
9.5.1 The CAPCOM Unit
9.5.2 Time-Processor Unit Versus CAPCOM
9.5.3 32-bit Period Measurements
9.5.4 Generating PWM With The 166 CAPCOM Unit
9.5.5 Sinewave Synthesis Using The CAPCOM
9.5.6 Automotive Applications Of CAPCOM1
9.5.7 Digital To Analog Conversion Using The CAPCOM Unit
9.5.8 Timebase Generation
9.5.9 Software UARTs
9.6 Port 3
9.6.1 Using GPT1
9.6.2 Using GPT2
9.7 Port 4
9.7.1 Interfacing To CAN Networks
9.8 Port 5
9.8.1 166 Analog To Digital Convertor
9.8.2 167 Analog To Digital Convertor
9.8.3 Over-Voltage Protected Analog Inputs
9.8.4 167/4-Specific Enhancements
- wait-for-ADDAT-read mode
- channel injection
- programmable sampling times
9.8.5 Matching The A/D Inputs To Signal Sources
9.8.6 165/3
9.9 Port 6 (167)
9.10 Port 7 (167 Only)
50ns PWM Module/High Resolution Digital To Analog Convertor
9.11 Port 8 (167 Only)
9.12 Summary Of Port Pin Interrupt Capabilities
9.12.1 Interrupts From Port Pins
9.12.2 166 Variants
9.12.3 167 Variants
9.13 Typical 166 Family Applications
9.13.1 Automotive Applications
9.13.2 Industrial Control Applications
9.13.3 Telecommunications Applications
9.13.4 Transport Applications
9.13.5 Consumer Applications
9.13.6 Instrumentation Applications
10. 166 Compatibility With Other Architectures
11. Mounting 166 Family Devices
11.1 Package Types
11.2 Connecting Emulators To 166 Family Devices
11.2.1 Socketed Devices
11.2.2 The "PressON" Emulation Connector
11.3 166 Family PCBs
11.4 CAD Symbols
12. Direct PCB Emulation Interfaces For 166 Designs
12.1 The Problem
12.2 The ROMless Solution - ICEconnect166
12.3 The ROM/ROMless Solution - QuadConnect
13. Getting New Boards Going
13.1 External Bus Design Pitfalls
13.2 Single Chip Designs
13.3 Testing The System
14. Conclusion
15. Acknowledgements
16. Feedback
17. Contact Addresses
Appendix 1 - Siemens C166 Family Part Numbers
RISC Architectures For Embedded Applications
Introduction
The 166 CPU core makes extensive use of Reduced Instruction Set Computer (RISC) concepts to achieve its blend of very high performance at modest cost. To understand why RISC techniques are especially suited to high-speed real time embedded systems, it is useful to examine in detail how they grew out of the traditional Complex Instruction Set Computers (CISC) that reached their peak in the late 1980's to early 1990's.
Behind The 166's Near-RISC Core
The reason behind the abandonment of traditional Complex Instruction Set Computers (CISC) has been the quest for ever greater throughput. The demands of workstations involved in CAD tasks and, latterly, advanced video games have been the real driving force behind this. Traditionally, microprocessors have been designed with assembler instruction sets geared towards making the assembler programmer's life easier through the extensive use of microcode to produce ever more powerful instructions. By providing single assembler instructions that perform, for instance, three operand multiplication, the assembler programmer (and HLL compiler writer) has been relieved of the job of achieving the same result with simpler instructions.
The need for the CPU to be able to recognise and act on (decode) many hundreds of different instructions requires complex silicon and many clock cycles. The greater the silicon area, the greater the cost of the device and the power consumed. With physical limitations acting to restrict achievable clock speeds on silicon devices, the number of cycles per instruction is obviously very significant in gaining higher performance.
RISCs tend to shift the burden of programming from the microcoder to the assembler programmers and compiler writers. Work both within academia and commercial manufacturers has proved that a suitably programmed RISC machine can achieve a far higher throughput than a CISC for a given clock speed.
Strangely, the embedded world has been slow to question the suitability of the CISC-based microcontroller. Whilst at the very top end, devices such as the i80960 have enjoyed some success, for more commonplace embedded tasks, RISC is almost unknown. With the increasing complexity of modern control algorithms, the need for greater processing power is set to become an issue in anything but the simplest applications. In addition, here more than in the workstation world, the worst-case response time to non-deterministic events is crucial, an area where CISCs are especially poor.
Many current high-end microcontrollers are based on existing CISC architectures such as the 8086, 68000 etc., which in common with 8-bit devices such as the 8051, have an internal structure that dates back up to 19 years. With the silicon vendor's need to give existing users an upgrade path, apparently new designs are often based closely on the existing architecture/instruction set, so protecting the user's investment in expensive assembler-code.
Like workstations, microcontrollers are tending to be programmed in a high level language (HLL) to reduce coding times and enhance maintainability. Inevitably, even with the best compilers, some loss of performance is encountered, emphasising again the need for improved CPU performance.
In addition to straightforward data processing, microcontrollers must also handle real-world peripherals such as A/D converters, PWM's, timers, Ports, PLL's etc., all of which require real time processing.
Conventional CISC Bottle-necks
1. Long And Unpredictable Interrupt Latencies
Complicated "labour-saving" instructions must hold the CPU's entire attention during execution, thus preventing real-world generated interrupts from being serviced. Unpredictable latency times result, which can cause serious problems in hard real-time systems. One approach to overcoming the CISC's poor real-time response has been to bolt a secondary "time processor" onto the core to try and off-load the time-critical portions. However, this results in an awkward design and the need to use a very terse microcode to program it, in addition to the more usual C and assembler for the CISC core itself.
2. Vast Instruction Sets Give Slow Decoding
Loaded instruction must be recognised from potentially many hundreds or even thousands of possibilities. Decoding is thus complicated and lengthy.
3. Frequent Accesses To Slow Memory Devices
Data is typically fetched from off-chip memory and placed in accumulator-type registers. Mathematical or logical operations are performed and then the result is written back to memory. The value is likely to be required again in the course of the procedure, thus requiring further movements to and from off-chip memory.
4. Slow Procedure Calling
When calling subroutines with parameters (essential in good HLL programming), parameters must be individually pushed on to stack. They must then be moved through accumulator register(s) for processing before being returned via stack to caller.
5. Strictly One Job At A Time
Each peripheral device or interrupt source must have dedicated service routine which at the least will require the PSW, PC to be stacked and restored and data removed from or fed to peripheral device.
6. Software Has To Be Structured To Suit Architecture
Embedded systems frequently contain many separate real time tasks which together form a complete system. Conventional CPU's make switching between tasks slow. Often, many registers have to be stacked to free them up for the incoming task. This problem is aggravated by the use of HLL compilers which tend to use a large number of local variables in library functions which must be preserved.
7. Redundant Instructions And Addressing Modes
With the move to HLLs, compilers are tending to dictate what instructions should be provided in silicon.
In practice, compilers tend to only make use of a small number of addressing modes. This results in a large number of unused addressing modes which serve only to complicate the opcode decoding process.
8. Inconsistent Instruction Sets
Instruction sets that have evolved tend to be difficult to use due to large number of different basic types and the inconsistent addressing modes allowed.
9. Bus Not Fully Utilised
Whilst complex instructions are being executed, bus is idle.
The RISC Architecture For Embedded Control
To show how RISC design is used to improve microcontroller throughput, the 166 is used as an example.
Basic Definitions:
1 state time = 2 * 1/oscillator frequency
- fundamental unit of time recognised within processor system.
1 machine cycle = 2 * state time
- minimum time required to perform the simplest meaningful task within cpu.
The unit of state times is used when making comparisons between RISCs and CISCs as this removes any dependency on clock frequency.
- All state time counts are given in single chip operation mode for both 80C196 and 166.
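As a rough illustration of the definitions above, the following C sketch converts an oscillator frequency into state times and machine cycles. The 40 MHz oscillator used in the usage note is an assumption, not a figure taken from this guide; it is chosen only because it makes four machine cycles come out at the 400ns quoted for interrupt vectoring. The function names are hypothetical.

```c
#include <assert.h>

/* Period of one oscillator cycle in nanoseconds, for a frequency in MHz. */
static double osc_period_ns(double osc_mhz)
{
    assert(osc_mhz > 0.0);
    return 1000.0 / osc_mhz;
}

/* 1 state time = 2 * 1/oscillator frequency */
static double state_time_ns(double osc_mhz)
{
    return 2.0 * osc_period_ns(osc_mhz);
}

/* 1 machine cycle = 2 * state time */
static double machine_cycle_ns(double osc_mhz)
{
    return 2.0 * state_time_ns(osc_mhz);
}
```

With the assumed 40 MHz oscillator, state_time_ns(40.0) gives 50 ns and machine_cycle_ns(40.0) gives 100 ns.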
Bus Interface
To maximise the rate at which instructions are executed, RISC CPU's are very heavily pipelined. Here, on any given machine cycle, up to 4 instructions may be processed by overlapping the various steps thus:
FETCH: - get opcode from program store
DECODE: - identify opcode from a small list and fetch operands
EXECUTE: - perform operation denoted by opcode
WRITE-BACK: - result returned to specified location
Thus although the instruction takes four machine cycles, it is apparently executed in just one (2 state times). Pipelining has considerable benefits for speeding sequential code execution as the bus is guaranteed to be fully occupied.
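The overlap can be sketched with a small, idealised model in C. It assumes a perfect four-stage pipeline with no branches, stalls or multi-cycle instructions, which is a simplification of any real device; the function names are hypothetical.

```c
#include <assert.h>

#define PIPE_STAGES 4u  /* FETCH, DECODE, EXECUTE, WRITE-BACK */

/* Machine cycles for n instructions with no overlap: each instruction
   passes through all four stages before the next one starts. */
static unsigned cycles_unpipelined(unsigned n)
{
    return n * PIPE_STAGES;
}

/* Machine cycles with an ideal pipeline: the first instruction needs
   PIPE_STAGES cycles to emerge, then one completes on every cycle. */
static unsigned cycles_pipelined(unsigned n)
{
    return (n == 0u) ? 0u : PIPE_STAGES + (n - 1u);
}

/* Stage (0 = FETCH .. 3 = WRITE-BACK) occupied by instruction i during
   cycle c, or -1 if it is not in the pipeline. Instruction i is assumed
   to enter FETCH in cycle i. */
static int stage_of(unsigned i, unsigned c)
{
    if (c < i || c - i >= PIPE_STAGES) {
        return -1;
    }
    return (int)(c - i);
}
```

In this model, 100 sequential instructions need 103 machine cycles rather than 400, and during cycle 3 instructions 0 to 3 occupy WRITE-BACK, EXECUTE, DECODE and FETCH respectively.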
RISC Interrupt Response
In the 166, branches to interrupts make use of the injected instruction technique and so vectoring to a service routine is achieved in only 4 machine cycles (400ns). Complex but necessary instructions such as MUL and DIV (5 and 10 cycles respectively) stretch this, but it is interesting to note that the 80C166 does provide these as interruptible instructions.
Very fast interrupt service is crucial in high-end applications such as engine management systems, servo drives and radar systems where real-world timings are used in DSP-style calculations. As these normally form part of a larger closed control loop, erratic latency times manifest themselves as an undesirable jitter in the controlled variable.
Registers And Multi-Tasking
Traditional microcontrollers have one or more special registers which can be used for mathematical, logical or Boolean operations. In the 8051, there is a single "accumulator" with 8 other registers which may be used for handling local variables or intermediate results in complex calculations. These additional registers are also used to access memory locations via indirect and/or indexed addressing.
As pointed out in bottle-necks 3 and 4 above, conventional CPU's spend much time moving data from slow memory areas into active registers. The RISC offers a very large number of general purpose registers which may be used for locals, parameters and intermediates. The 166 provides 16 word-wide general purpose registers (GPRs), each of which is effectively an accumulator, indirect pointer and index. With such a large number of GPR's available, it becomes realistic to keep all locals and intermediates within the CPU throughout quite large procedures. This can yield a great increase in speed.
Further significant benefits are derived from the RISC technique of register windowing. As has been said, up to 16 registers are available for use by the program. However, by making the active register bank movable within a larger on-chip RAM, the job of real time multi-tasking is considerably eased.
Central to this is the concept of a "Context Pointer" (CP), which defines the current absolute base address of the active bank. Thus a reference to "R0" means the register at the address indicated by the CP. Thereafter, the 16 registers originating from CP are accessed by a fast 4-bit offset.
The best example of how the CP is exploited is perhaps a background task and a real-time interrupt co-existing. When the interrupt occurs, rather than pushing all GPR's onto the stack, the CP of the current register bank is stacked and simply switched to a new value, determined at link time, to yield a fresh register bank. This results in a complete context switch in just one machine cycle but does rule out the use of recursion.
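The fixed-bank context switch described above can be modelled in a few lines of C. This is only a sketch of the idea, not device code: the RAM size, the separate small stack array and all names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define RAM_WORDS   512u   /* hypothetical size of the on-chip RAM model */
#define STACK_DEPTH 16u    /* hypothetical depth of the simplified stack */

static uint16_t ram[RAM_WORDS];      /* on-chip RAM, one element per word */
static uint16_t cp;                  /* Context Pointer: word index of R0 */
static uint16_t stack[STACK_DEPTH];  /* simplified system stack */
static unsigned stack_used;

/* Access GPR Rn of the active bank: base taken from CP plus a 4-bit offset. */
static uint16_t *gpr(unsigned n)
{
    assert(n < 16u && cp + n < RAM_WORDS);
    return &ram[cp + n];
}

/* Interrupt entry: stack the old CP and switch to a bank fixed in advance. */
static void enter_isr(uint16_t isr_bank)
{
    assert(stack_used < STACK_DEPTH);
    stack[stack_used++] = cp;
    cp = isr_bank;
}

/* Interrupt exit: restore the interrupted task's register bank. */
static void leave_isr(void)
{
    assert(stack_used > 0u);
    cp = stack[--stack_used];
}
```

Because only the CP changes, none of the background task's registers are copied; after leave_isr() they are found exactly as they were left. The fixed bank address is also why this scheme cannot be re-entered.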
A hybrid method, which permits re-entrancy, uses the stack pointer to calculate the new CP dynamically.
Here, on entering the interrupt, the number of registers now required is subtracted from the current SP and the result placed in CP, with the old CP stacked. Thus the new register bank is located at the top of the old stack, with the old CP and then the new stack following on immediately afterwards. On exiting the interrupt routine, the original register bank is restored by POPping the old CP from the stack. The SP is reinstated by adding the size of the new register bank onto the current SP.
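A minimal C model of this SP-relative scheme is given below. It assumes a word-addressed memory array and a stack that grows downwards; the memory size and the names enter_isr and leave_isr are hypothetical, and the real instruction sequence used on the device is not shown here.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 256u   /* hypothetical size of the RAM model */

static uint16_t mem[MEM_WORDS];  /* on-chip RAM model, word addressed */
static uint16_t sp = MEM_WORDS;  /* stack pointer, assumed to grow downwards */
static uint16_t cp = 0;          /* context pointer: word index of R0 */

static void push(uint16_t v)
{
    assert(sp > 0u);
    mem[--sp] = v;
}

static uint16_t pop(void)
{
    assert(sp < MEM_WORDS);
    return mem[sp++];
}

/* Entry: carve nregs words out of the stack for the new bank, point CP
   at it, then stack the old CP immediately below the new bank. */
static void enter_isr(unsigned nregs)
{
    assert(nregs >= 1u && nregs <= 16u && sp > nregs);
    uint16_t old_cp = cp;
    sp = (uint16_t)(sp - nregs);  /* bank size subtracted from SP */
    cp = sp;                      /* result becomes the new CP */
    push(old_cp);                 /* old CP stacked */
}

/* Exit: pop the old CP, then add the bank size back onto SP. */
static void leave_isr(unsigned nregs)
{
    cp = pop();
    sp = (uint16_t)(sp + nregs);
}
```

Since each entry allocates its bank from wherever the stack currently is, nested or re-entrant calls each receive a fresh bank, and unwinding them in reverse order restores both SP and CP.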
A further RISC refinement is register window overlapping whereby when a new procedure is called, part of the new register bank defined by CP' is coincident with the original at CP:
          R3'   ; Register for subroutine's locals and intermediates
          R2'   ; Register for subroutine's locals and intermediates
     R7   R1'   ; Common register, R7 == R1'
CP'  R6   R0'   ; Common register, R6 == R0'
     R5         ; Register for caller's locals and intermediates
     R4         ; Register for caller's locals and intermediates
     R3         ; Register for caller's locals and intermediates
     R2         ; Register for caller's locals and intermediates
     R1         ; Register for caller's locals and intermediates
CP   R0         ; Register for caller's locals and intermediates
MODULE 1
; *** Assignment Of GPRs To Local Variables - Caller ***
x_var  LIT 'R0' ; Local variable
y_var  LIT 'R1' ; Local variable
parm1  LIT 'R6' ; Passed parameter 1
parm2  LIT 'R7' ; Passed parameter 2
result LIT 'R6' ; Value returned from sub routine
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
MODULE 2
; *** Assignment Of GPRs To Local Variables - Sub Routine ***
a_var  LIT 'R2' ; Local variable
b_var  LIT 'R3' ; Local variable
input1 LIT 'R0' ; Received parameter 1
input2 LIT 'R1' ; Received parameter 2
ret1   LIT 'R0' ; Final result returned in R0
Fig. A - Giving GPR's Meaningful Names
By using some forethought, the programmer should arrange for any value to be passed to the sub routine to be located in the common area, so that all the normal loading and unloading of parameters is avoided. This technique can be used in either absolute or SP-relative register bank modes.
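The overlapping windows of Fig. A can be mimicked in C by treating the register file as an array and the CP as an index into it. The sketch below is an illustrative model only; the array size and the functions reg, add_sub and call_add are hypothetical names, and the six-register offset simply mirrors the R6 == R0' overlap drawn above.

```c
#include <assert.h>
#include <stdint.h>

#define RAM_WORDS 64u   /* hypothetical size of the RAM model */

static uint16_t ram[RAM_WORDS];

/* Rn of the bank whose base (CP) is the given word index. */
static uint16_t *reg(uint16_t cp, unsigned n)
{
    assert(n < 16u && cp + n < RAM_WORDS);
    return &ram[cp + n];
}

/* Sub routine: receives input1/input2 in its R0/R1, returns the sum in R0. */
static void add_sub(uint16_t cp_callee)
{
    *reg(cp_callee, 0) = (uint16_t)(*reg(cp_callee, 0) + *reg(cp_callee, 1));
}

/* Caller: places parm1/parm2 in R6/R7, then moves the window up by six
   registers so that R6/R7 are seen by the sub routine as R0'/R1'. */
static uint16_t call_add(uint16_t cp_caller, uint16_t a, uint16_t b)
{
    *reg(cp_caller, 6) = a;
    *reg(cp_caller, 7) = b;
    add_sub((uint16_t)(cp_caller + 6u));  /* CP' = CP + 6 registers */
    return *reg(cp_caller, 6);            /* result comes back in R6 */
}
```

No parameter is copied at any point: the caller's R6 and the sub routine's R0' are the same storage location, which is the saving the text describes.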
To get the best from a RISC's registers, the location of data needs close consideration: although highly orthogonal, the limited number of addressing modes provided for MUL and DIV for example, can appear somewhat restrictive. Fortunately though, most operands involved will already be in registers, so eliminating the need for many addressing techniques. As might be expected, the instructions with the widest range of addressing modes are the simple data moves - the fact that RISC's are the result of very careful analysis of the requirements for fast execution becomes obvious after a short acquaintance!
Coping With RISC Instruction Set (Apparent) Omissions
With largely single machine cycle execution, some conventional "fast" instructions such as CLEAR, INC and DEC become redundant. Therefore, to keep the total number of instructions to a minimum, RISC's simply omit them. Examples are given below:
Instruction      80C196   States   80C166       States
Clear Word       CLR      4        AND Rn,#0    2
Decrement Word   DEC      4        SUB Rn,#01   2
Increment Word   INC      4        ADD Rn,#01   2
- all direct addressing mode
Three-operand instructions are also commonplace in CISCs but not present in RISCs. Although additional instructions are required, the overall number of states is still less than the three operand CISC equivalent, plus the shorter RISC instructions allow greater opportunity for interrupt servicing.
The following example illustrates this:
Perform: z = x + y
80C196 (CISC)
z,x and y are directly addressed memory locations
x DW 1
y DW 1
z DW 1
ADD z,x,y ; 5 states - no interrupt possible
166 (RISC)
z,x and y are memory locations, Rw is a GPR
x DW 1
y DW 1
z DW 1
MOV Rw,x ; 2 states
; * Interruptable here
ADD Rw,y ; 2 states
; * Interruptable here
MOV z,Rw ; 2 states
;
; 6 states
One extra state required when using RISC approach. However, if the variables are assigned recognising that this is a RISC:
x and y are memory locations, z is a GPR
x DW 1
y DW 1
z LIT 'R0' ; z is assigned to GPR R0 via a LITeral definition
MOV z,x ; 2 states
; * Interruptable here
ADD z,y ; 2 states
;
; 4 states - 1 state saved over CISC.
The above was chosen as a worst case RISC, best case CISC example.
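The arithmetic behind this comparison can be checked with a short C sketch. The state counts are those given in the listings above; the helper names are hypothetical, and the second function assumes that an interrupt can only be accepted between instructions, as the "Interruptable here" comments suggest.

```c
#include <assert.h>
#include <stddef.h>

/* Total states taken by an instruction sequence. */
static unsigned total_states(const unsigned *states, size_t n)
{
    assert(states != NULL || n == 0);
    unsigned sum = 0;
    for (size_t i = 0; i < n; ++i) {
        sum += states[i];
    }
    return sum;
}

/* Longest run of states during which no interrupt can be accepted,
   assuming interrupts are only taken between instructions. */
static unsigned longest_uninterruptible(const unsigned *states, size_t n)
{
    unsigned worst = 0;
    for (size_t i = 0; i < n; ++i) {
        if (states[i] > worst) {
            worst = states[i];
        }
    }
    return worst;
}

static const unsigned cisc_add[] = { 5 };        /* ADD z,x,y             */
static const unsigned risc_mem[] = { 2, 2, 2 };  /* MOV Rw,x ADD Rw,y MOV z,Rw */
static const unsigned risc_gpr[] = { 2, 2 };     /* MOV z,x  ADD z,y      */
```

The totals are 5, 6 and 4 states respectively, while the longest uninterruptible stretch falls from 5 states for the three-operand instruction to 2 states for either RISC sequence.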