PC Desktop Getting Started
This guide walks you through everything required to automate PC desktop applications with Midscene: install dependencies, configure model credentials, and run your first JavaScript script.
Control PC desktop with JavaScript: https://github.com/web-infra-dev/midscene-example/tree/main/computer/javascript-sdk-demo
Integrate Vitest for testing: https://github.com/web-infra-dev/midscene-example/tree/main/computer/vitest-demo
Set up API keys for the model
Set your model configuration in environment variables. For details, see Model strategy and Model configuration.
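As a rough illustration, a script can fail fast when model settings are missing from the environment. The variable names below (`MIDSCENE_MODEL_NAME`, `MIDSCENE_MODEL_API_KEY`, `MIDSCENE_MODEL_BASE_URL`) are assumptions chosen for the example; take the authoritative names from Model configuration. `findMissingEnv` is a hypothetical helper and not part of Midscene.

```typescript
// Assumed variable names, for illustration only.
// Confirm the real names in the Model configuration page.
const ASSUMED_MODEL_ENV_VARS = [
  'MIDSCENE_MODEL_NAME',
  'MIDSCENE_MODEL_API_KEY',
  'MIDSCENE_MODEL_BASE_URL',
];

// Hypothetical helper: return the names that are unset or blank,
// in the same order as the `required` list.
function findMissingEnv(
  env: Record<string, string | undefined>,
  required: string[] = ASSUMED_MODEL_ENV_VARS,
): string[] {
  return required.filter((name) => (env[name] ?? '').trim() === '');
}
```

Calling `findMissingEnv(process.env)` at the top of a script and throwing when the result is non-empty gives a clearer error than a failed model request later on.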
System Requirements
Node.js
Node.js 18.19.0 or higher is required.
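The version requirement can also be checked in code before any automation starts. The sketch below compares a version string against the 18.19.0 minimum stated above; `parseVersion` and `meetsMinimumNode` are hypothetical helper names, not Midscene APIs.

```typescript
// Parse "v18.19.0" or "18.19.0" into [major, minor, patch].
function parseVersion(version: string): [number, number, number] {
  const match = /^v?(\d+)\.(\d+)\.(\d+)/.exec(version);
  if (!match) throw new Error(`Unrecognized version string: ${version}`);
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// True when `current` is at least `minimum` (defaults to 18.19.0).
function meetsMinimumNode(current: string, minimum = '18.19.0'): boolean {
  const [cMajor, cMinor, cPatch] = parseVersion(current);
  const [mMajor, mMinor, mPatch] = parseVersion(minimum);
  if (cMajor !== mMajor) return cMajor > mMajor;
  if (cMinor !== mMinor) return cMinor > mMinor;
  return cPatch >= mPatch;
}
```

In a real script the input would typically be `process.version`, for example `meetsMinimumNode(process.version)`.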
Platform-Specific Dependencies
macOS: Accessibility permissions are required for keyboard and mouse control. When you run the script for the first time, macOS will prompt you to grant access. Go to System Settings > Privacy & Security > Accessibility and enable permissions for the application running your script (e.g., Terminal, iTerm2, VS Code, WebStorm, or other IDEs). For more details, see nut.js macOS setup.
Linux: ImageMagick is required for screenshot functionality.
Headless Linux (CI): To run desktop automation on a headless Linux server (e.g. GitHub Actions), install Xvfb and its dependencies, then enable headless mode:
Xvfb creates a virtual display so that mouse, keyboard, and screenshot operations work without a physical monitor. See API Reference for details.
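On Linux, X11 programs locate their display through the `DISPLAY` environment variable, and a virtual display from Xvfb fills that role on CI. The sketch below shows one way a script might decide whether a virtual display is needed. `needsVirtualDisplay` is a hypothetical helper, not a Midscene API, and it deliberately ignores Wayland-only sessions. How headless mode itself is enabled is covered in the API Reference.

```typescript
// Hypothetical helper: on Linux, an empty or unset DISPLAY suggests
// there is no X display to draw on, so a virtual one (Xvfb) is needed.
// Simplification: Wayland-only sessions are not considered here.
function needsVirtualDisplay(
  platform: string,
  env: Record<string, string | undefined>,
): boolean {
  if (platform !== 'linux') return false;
  return (env.DISPLAY ?? '').trim() === '';
}
```

A CI entry point could call `needsVirtualDisplay(process.platform, process.env)` and switch on headless mode only when it returns true, so the same script still runs on a developer machine with a real monitor.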
A complete demo of testing Obsidian (an Electron app) on headless Linux CI with @midscene/computer: https://github.com/web-infra-dev/midscene-example/tree/main/computer/electron-demo
Try Playground (no code)
Playground is the fastest way to validate the connection and observe AI-driven steps without writing code. It shares the same core as @midscene/computer, so anything that works here will behave the same once scripted.
- Launch the Playground CLI:
- Click the gear icon in the Playground window, then paste your API key configuration. Refer back to Model configuration if you still need credentials.
Start exploring
After configuration, you can start using Midscene right away. It provides several key operation tabs:
- Act: interact with the page. This is Auto Planning, corresponding to aiAct. For example:
- Query: extract JSON data from the interface, corresponding to aiQuery. Similar methods include aiBoolean(), aiNumber(), and aiString() for directly extracting booleans, numbers, and strings.
- Assert: understand the page and check a condition, corresponding to aiAssert. An error is thrown if the condition is not met.
- Tap: click on an element. This is Instant Action, corresponding to aiTap.
For the difference between Auto Planning and Instant Action, see the API document.
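To show how the four tabs map onto code, here is a minimal sketch written against a hypothetical `AgentLike` interface. The method names (`aiAct`, `aiQuery`, `aiAssert`, `aiTap`) come from the list above. The parameter and return types are assumptions, and the calculator prompts are invented for illustration. Check the API document for the real signatures.

```typescript
// Hypothetical, minimal shape of an agent exposing the methods named above.
// The signatures are assumptions; the API document is authoritative.
interface AgentLike {
  aiAct(instruction: string): Promise<unknown>;
  aiQuery<T>(demand: string): Promise<T>;
  aiAssert(assertion: string): Promise<void>;
  aiTap(target: string): Promise<unknown>;
}

// One call per Playground tab, in a plausible order for a small task.
async function tourPlaygroundTabs(agent: AgentLike): Promise<string[]> {
  // Tap (Instant Action): click a single, clearly described element.
  await agent.aiTap('the Calculator icon in the taskbar');

  // Act (Auto Planning): describe a goal and let the model plan the steps.
  await agent.aiAct('compute 12 plus 30 using the on-screen buttons');

  // Query: describe the data shape you want back.
  const digits = await agent.aiQuery<string[]>(
    'string[], the digits currently shown on the calculator display',
  );

  // Assert: rejects (throws) when the condition does not hold.
  await agent.aiAssert('the calculator display shows 42');

  return digits;
}
```

Because the function only depends on the interface, it can be exercised with a stub in unit tests and handed a real agent once the SDK is installed.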
Integration with Midscene Agent
Once Playground works, move to a repeatable script with the JavaScript SDK.
Step 1. Install dependencies
Step 2. Write your first script
Create example.ts:
Step 3. Run the script
Multi-Display Support
If you have multiple displays, you can control a specific one:
Example Usage
Basic Mouse Operations
Keyboard Operations
Query Information
Complex Workflows
Environment Check
You can check if your system is properly configured:

