Data Science Virtual Machine (DSVM) is a virtual machine with pre-installed popular libraries for data analytics and machine learning. A DSVM can be used as an environment for training models and experimenting with data.
The image is based on Ubuntu and includes pre-installed software:
- Conda, a package manager with Python 2.7 (environment
py27
) and Python 3.10 (py310
). - Jupyter Notebook and JupyterLab, tools for interactive and reproducible computations.
- Data analysis, scientific computing and data visualisation libraries: Pandas, NumPy, SciPy, Matplotlib.
- Machine Learning libraries: PyTorch, CatBoost, TensorFlow, scikit-learn, Keras.
- PySpark, a library for interacting with Apache Spark™ and building distributed data processing pipelines.
- NLTK, a suite of natural language processing libraries and data.
- Docker®, a container management system.
- Git, a version control system.
- NVIDIA® Data Center Driver, CUDA® Toolkit 12, and Container Toolkit for accelerating machine learning and other compute-intensive applications on NVIDIA GPUs available in Nebius.ai.
- Optimised libraries and instruments for working with images: scikit-image, opencv-python, Pillow.
-
Click the button in this card to go to VM creation. The image will be automatically selected under Image/boot disk selection.
-
Under Disks, follow these recommendations for a boot disk:
- Type: SSD.
- Size: 30 GB or more.
-
Under Computing resources, follow these recommendations:
- vCPUs: 2 or more.
- RAM: 2 GB or more.
-
Paste the public key from the pair into the SSH key field.
-
Create the VM.
-
In your web browser, go to
http://<VM's_IP_address>:8888
to access the UI. The password is your VM’s ID; to get it, follow the guide. -
In the UI, open the terminal.
-
Initialize Conda and restart the terminal:
conda init bash bash
-
Activate a Conda environment with Python 3.10 (even if you are going to use Python 2.7; you will be able to switch later):
conda activate py310
-
Set a new password and restart the Jupyter Lab daemon:
jupyter-lab password sudo systemctl restart jupyter-lab
-
If you are going to use Python 2.7, switch to it:
conda activate py27
- Analysis and prediction of user behavior.
- Analysis of system operation and prediction of failures.
- Customer segmentation.
- Classification of images, documents, and any types of data.
- Recommendation systems.
- Speech synthesis and recognition services.
- Dialog engines.
Nebius AI technical support responds as quickly as possible, aiming to respond within 4 hours. Learn more about technical support.
Software | Version |
---|---|
Ubuntu | 20.04 LTS |
CatBoost | 1.2 |
Conda | 23.5.0 |
Docker | 24.0.2 |
Git | 2.25.1 |
JypiterLab | 3.6.3 |
Keras | 2.11.0 |
Matplotlib | 3.7.1 |
NLTK | 3.7 |
NVIDIA CUDA Toolkit | 12.0.1 |
NVIDIA Container Toolkit | 1.13.2 |
NVIDIA Data Center Driver | 535.54.03 |
NumPy | 1.22.3 |
Pandas | 1.4.2 |
Pillow | 9.4.0 |
PySpark | 3.2.1 |
PyTorch | 2.0.1 |
SciPy | 1.8.1 |
TensorFlow | 2.11.0 |
opencv-python | 4.6.0 |
scikit-image | 0.19.2 |
scikit-learn | 1.1.1 |
accelerate | 0.17.1 |
datasets | 2.9.0 |
transformers | 4.27.1 |
torchvision | 0.15.2 |